Introduction
Ordinary least squares (OLS) regression, the most common way of fitting a linear regression model, appears in many academic papers, research essays, and theses. In this blog entry, we’ll show you how to generate, interpret, and illustrate basic OLS findings in Stata. In another blog entry, we’ve covered more advanced concepts related to OLS regression in Stata, including residual testing, leverage, multicollinearity, and other diagnostics.
What You’ll Need
Any flavor of Stata.
Entering Data
You can enter data into Stata by typing commands into the Command window, or you can open the Data Editor and enter data as you would in Microsoft Excel. The easiest approaches are to type data into the Data Editor or to import data from Excel.
Here, we’ll show you how to enter data using commands.
Let’s say you have data on the heights (in inches) and weights (in pounds) of 15 people. The heights, in sequential order of your 15 subjects, are as follows: 67, 72, 75, 80, 60, 65, 68, 69, 69, 70, 70, 80, 76, 60, 60. The weights, in the same sequential order of the 15 subjects, are as follows: 150, 240, 270, 300, 160, 180, 170, 175, 175, 190, 190, 260, 240, 140, 130. Try typing these values into the Data Editor. Alternatively, starting from an empty dataset (run clear first if you already have data in memory that you no longer need), paste this code into the Command window to load the data:
set obs 15
gen height = .
gen weight = .
replace height = 67 in 1
replace height = 72 in 2
replace height = 75 in 3
replace height = 80 in 4
replace height = 60 in 5
replace height = 65 in 6
replace height = 68 in 7
replace height = 69 in 8
replace height = 69 in 9
replace height = 70 in 10
replace height = 70 in 11
replace height = 80 in 12
replace height = 76 in 13
replace height = 60 in 14
replace height = 60 in 15
replace weight = 150 in 1
replace weight = 240 in 2
replace weight = 270 in 3
replace weight = 300 in 4
replace weight = 160 in 5
replace weight = 180 in 6
replace weight = 170 in 7
replace weight = 175 in 8
replace weight = 175 in 9
replace weight = 190 in 10
replace weight = 190 in 11
replace weight = 260 in 12
replace weight = 240 in 13
replace weight = 140 in 14
replace weight = 130 in 15
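If you would rather not type 30 replace commands, Stata’s input command can load the same values as rows of height and weight pairs. The following is a minimal sketch, assuming you run it from a do-file and do not need whatever data are currently in memory, because clear discards them:

```stata
* Start from an empty dataset (this discards any data in memory)
clear

* Each row is one subject: height in inches, then weight in pounds
input height weight
67 150
72 240
75 270
80 300
60 160
65 180
68 170
69 175
69 175
70 190
70 190
80 260
76 240
60 140
60 130
end

* Check the result against the values listed above
list, separator(0)
```

Either approach should leave you with the same 15 observations, so use whichever you find easier to proofread.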
Running the Regression
Having entered these variables into Stata, you can use the following code to generate your OLS regression model. The dependent variable (weight) comes first, followed by the predictor (height):
regress weight height
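After regress finishes, Stata keeps the key statistics in memory as stored results, so you can retrieve them instead of copying them from the output table. Here is a minimal sketch, assuming the height and weight variables created above are in memory:

```stata
* Refit the model without printing the table, then read the overall F test
quietly regress weight height

* Numerator and denominator degrees of freedom, followed by the F statistic
display "F(" e(df_m) ", " e(df_r) ") = " %6.2f e(F)

* Exact p-value for the F test, computed from the stored results
display "p = " %10.8f Ftail(e(df_m), e(df_r), e(F))

* Show everything regress stored
ereturn list
```

The regress output table reports the same F statistic and its p-value in the upper right corner, so this is simply a way to see more decimal places or reuse the values in later commands.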
The overall model is statistically significant. In APA format, you could write that there is a significant linear relationship between weight and height, F(1, 13) = 67.41, p < .001. Looking at the coefficient table, you would write your regression equation as follows:
Weight = 7.19(Height) – 301.09
Thus, each additional inch of height is associated with a predicted increase of 7.19 pounds of body weight. You could use the equation above to predict weight given height. For example, a person 71 inches tall would be predicted to have the following weight:
Weight = 7.19(71) – 301.09, or 209.40 pounds
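You can also let Stata do this arithmetic with the unrounded coefficients. The sketch below assumes regress weight height is the most recent estimation command. Because it uses full precision, the result (about 209.5 pounds) differs slightly from a hand calculation with coefficients rounded to two decimals:

```stata
* Refit the model so the coefficients are available as _b[]
quietly regress weight height

* Predicted weight at 71 inches, using the unrounded intercept and slope
display _b[_cons] + _b[height] * 71

* The same prediction with a standard error and 95% confidence interval
margins, at(height = 71)

* Fitted values for every observation (weight_hat is an illustrative name)
predict weight_hat, xb
list height weight weight_hat, separator(0)
```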
Don’t forget the coefficient of determination, which Stata reports as R-squared in the regression output: 0.8383. In a regression with a single predictor, this is the square of the correlation between the two variables, (0.9156)^2. In other words, in your dataset, approximately 83.83% of the variation in weight is explained by variation in height.
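If you want to confirm this relationship yourself, the following sketch compares the stored R-squared with the squared correlation. It assumes the same height and weight variables are in memory:

```stata
* R-squared as stored by regress
quietly regress weight height
display "R-squared = " %6.4f e(r2)

* Pearson correlation between the two variables, then its square
correlate weight height
display "r = " %6.4f r(rho)
display "r squared = " %6.4f r(rho)^2
```

This equality holds only when the model has one predictor. With several predictors, rely on the R-squared that regress reports.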
Scatterplots for Regression
You should take advantage of Stata’s customizable graphing features to generate regression scatterplots that contain (a) the OLS line of best fit and (b) the 95% confidence interval (CI). The line of best fit is the prediction line that shows the linear trend in your data, and the 95% CI band shows how much uncertainty surrounds that fitted line. The code for creating this graph in Stata, using the data and variable names above, is as follows:
graph twoway (lfitci weight height) (scatter weight height)
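You can also set titles and axis labels in the command itself, which makes the graph reproducible. The version below is a sketch. The title text, legend labels, and file name weight_height.png are illustrative choices rather than requirements. The /// line continuations work only in a do-file, so run the command from the Do-file Editor instead of the Command window:

```stata
* Same graph as above, with a title, axis labels, and a relabeled legend
twoway (lfitci weight height) (scatter weight height), ///
    title("Weight by height") ///
    xtitle("Height (inches)") ///
    ytitle("Weight (pounds)") ///
    legend(order(2 "Line of best fit" 1 "95% CI" 3 "Observed"))

* Save the graph to the current working directory (PNG export may not be
* available in every Stata installation, for example console mode)
graph export "weight_height.png", replace
```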
You can adjust many of the properties of the scatterplot by opening Stata’s Graph Editor.
Conclusion
OLS regression is a common statistical procedure in many academic papers, research essays, and theses. In this blog entry, we demonstrated how to run a simple OLS regression in Stata, interpret the output, and graph the results. In another blog entry, we’ve demonstrated some of Stata’s more advanced regression features.
BridgeText can help you with all of your statistical analysis needs.