Ohio Supercomputer Center

WebED Education, Outreach and Training

A Short Course: Using Modeling and Statistics

Overview of Linear Regression

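Linear regression is a statistical technique for modeling a linear relationship between a dependent variable (Y) and one or more independent variables (X1, X2, X3 … Xn), and for measuring the strength of that relationship.  The dependent variable is the one being affected (the effect), and the independent variables are treated as the presumed causes of that effect.  Regression by itself measures association; it does not prove that the independent variables cause the effect.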

The general form of the linear equation is:

Y = aX + b

Where:

Y = the dependent variable
X = the independent variable
a = a coefficient equivalent to the slope of the line
b = the Y intercept of the line (the value of Y where the line crosses the Y axis, that is, where X = 0)
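
As a minimal sketch of how a and b can be estimated, the Python below fits the line by ordinary least squares, one common way to choose the slope and intercept. The function name fit_line and the five data points are hypothetical, invented only for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data, invented for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

a, b = fit_line(xs, ys)
print(f"Y = {a:.3f}X + {b:.3f}")
```

The sketch assumes at least two distinct X values; if every X is the same, the slope is undefined and the division fails.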

In the case of a multiple regression with three independent variables:

Y = a1X1 + a2X2 + a3X3 + b
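
The three-variable form can be fitted the same way. The sketch below is one possible approach, not the method used by any particular package: it builds the least-squares normal equations and solves them by Gaussian elimination. The names solve and fit_multiple are hypothetical, and the data were constructed exactly from Y = 2X1 - 1X2 + 0.5X3 + 3 so that the recovered coefficients can be checked by eye.

```python
def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multiple(rows, ys):
    """Return [a1, ..., ak, b] for the least-squares fit of Y = a1*X1 + ... + ak*Xk + b."""
    design = [list(r) + [1.0] for r in rows]  # trailing 1.0 column carries the intercept b
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data built from Y = 2*X1 - 1*X2 + 0.5*X3 + 3 (no noise)
rows = [(1, 2, 0), (2, 1, 1), (3, 4, 2), (4, 3, 5), (5, 6, 1), (6, 5, 3)]
ys = [3.0, 6.5, 6.0, 10.5, 7.5, 11.5]

a1, a2, a3, b = fit_multiple(rows, ys)
print(f"Y = {a1:.3f}X1 + {a2:.3f}X2 + {a3:.3f}X3 + {b:.3f}")
```

Real data contain noise, so the fitted coefficients would only approximate the underlying relationship. The sketch also assumes that no independent variable is an exact linear combination of the others.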

The graph below illustrates a simple regression line and how the line approximates the distribution of the data points. Note that not all of the real data points for Y fall on the regression line.  The regression statistics measure the degree of fit between the line and the data points.  The deviations of the data points from the line are squared and summed, and that sum is compared with the total squared deviation of Y from its mean to create a measure called the coefficient of determination, or R² statistic (R² = 1 - SSresidual / SStotal).  This statistic measures the goodness of fit, that is, the proportion of the variance in Y explained by the model, where 0 represents no variance explained and 1.0 represents 100% of the variance explained.
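
The coefficient of determination can be computed directly from the squared deviations. The following is a small sketch under stated assumptions: the helper name r_squared and the data are hypothetical, and the line is fitted by ordinary least squares inside the example so that the block stands on its own.

```python
def r_squared(ys, predicted):
    """Return 1 - SS_residual / SS_total for observed and predicted Y values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # scatter around the line
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                # scatter around the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical data, invented for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

# Least-squares slope and intercept for these points
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

predicted = [slope * x + intercept for x in xs]
r2 = r_squared(ys, predicted)
print(f"R squared = {r2:.4f}")
```

A model that always predicts the mean of Y leaves the residual sum equal to the total sum, so its value is 0; a line that passes through every point leaves no residual, so its value is 1.0.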

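The model also returns statistics for judging whether the results are statistically significant.  For the model as a whole, the relevant statistic is the F statistic, and Excel reports its p-value as "Significance F".  We want that p-value to be less than 0.05, meaning that a relationship this strong would arise by chance less than 5% of the time if no real relationship existed.  This is often described as being 95% confident of the result.  The individual coefficients are judged the same way, using the t statistic and p-value reported for each one.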
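
To make the significance test concrete, here is a rough sketch for the one-variable case only. It relies on the fact that, with a single independent variable, the F statistic equals the square of the slope's t statistic, so the p-value can be taken from Student's t distribution with n - 2 degrees of freedom. The helper names (t_pdf, two_sided_p, regression_f_test) and the data are hypothetical, and the p-value is obtained by simple numerical integration, an approximation chosen here to avoid extra libraries; a statistics package would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10000):
    """Approximate two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def regression_f_test(xs, ys):
    """Return (F, p) for a simple regression of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_reg = slope ** 2 * sxx            # variation explained by the line
    f = ss_reg / (ss_res / (n - 2))      # 1 and n - 2 degrees of freedom
    return f, two_sided_p(math.sqrt(f), n - 2)


# Hypothetical data, invented for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

f_stat, p_value = regression_f_test(xs, ys)
print(f"F = {f_stat:.2f}, p-value = {p_value:.2e}")
print("significant at the 0.05 level" if p_value < 0.05 else "not significant at the 0.05 level")
```

The sketch assumes at least three data points and a fit that is not perfect, since a residual sum of zero would make the F statistic undefined.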

The table below illustrates a typical regression output from an Excel regression analysis. 

 


Please contact Al-Azad Iqbal or Steve Gordon for Questions and Comments - Updated 10/2/07