Linear Regression.. Simple understanding..

A uni-variate regression captures linear relationship between an independent variable and a dependent variable. For each value of X  and corresponding Y values are plotted as shown below. Intentionally I have put Y as 1.5 * X. There are lots of notes on regression analysis and there could be much more theoretical explanations.

image 

But some of points that need to be noted are

  • Each point on scatter plotter graph is an instance of a data and Regression line aims to generate pattern of relationship between X and Y variables.

When we try to draw a regression line for above data there are 2 degrees of freedom for us to use.

1. Angle of regression line wrt. to X and Y axis (Called slope).

2. Intercept of regression line i.e where would regression line cross Y axis when X = 0.

For above example, regression line is as below, where  slope is 1.5 and intercept is 0.

image

Excel provides a way to control intercept and thus angle or slope of regression line. Above case is regression line with intercept = 0. Changing intercept values, forces a different slope as regression line should be such that there should be minimum deviation of estimated from actual value. That is why if we notice last case where intercept is 1000, regression line R2 value is negative indicating, regression line is NOT a best fit line for provided data and infact it has opposite slope. In other words, last case, though our data points have a positive correlation using intercept, a negative correlation has been forced.

Regression Line with intercept = 25

image

Regression Line with intercept = 50

image

Regression Line with intercept = 100

image

 

Regression Line with intercept = 1000

image

This was a simple thought process (for those of you who play with stats gurus, this is a trivial thing), but when have to understand “How is excel computing values of slope and intercept” in such a manner that estimated value  is as near to actual value (Gradient descent??)