Introduction
After running a linear regression in a program such as Stata, you will want to run some diagnostics and check the model’s assumptions. These checks are part of the more advanced ordinary least squares (OLS) regression procedures commonly conducted and reported in quantitatively oriented academic papers, research essays, and theses. In this blog entry, we’ll walk through a real-world example of heteroskedasticity and explain how to address it.
What is Heteroskedasticity?
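Heteroskedasticity occurs when the variance of a regression’s errors is not constant across observations. For example, the errors may spread out more as an independent variable increases. WallStreetMojo has an excellent definition of heteroskedasticity that you should check out.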
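Written as a formula for the candy example used below, the distinction looks like this. The symbols (β₀, β₁, and the error term ε for child i) are standard regression notation that we’ve chosen for illustration:

```latex
\begin{aligned}
\text{candy}_i &= \beta_0 + \beta_1\,\text{pocket}_i + \varepsilon_i \\
\text{Homoskedasticity:}\quad \operatorname{Var}(\varepsilon_i \mid \text{pocket}_i) &= \sigma^2 \quad \text{(the same for every child } i\text{)} \\
\text{Heteroskedasticity:}\quad \operatorname{Var}(\varepsilon_i \mid \text{pocket}_i) &= \sigma_i^2 \quad \text{(differs across children)}
\end{aligned}
```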
Data Example
For non-specialists, the best way to understand statistical concepts is often to dive into an example, so let’s examine one. How about a dataset that tracks children’s spending on candy as a function of their monthly pocket money? We can graph the relationship between these two variables (treating candy spending as the dependent variable and pocket money as the independent variable), and, as expected, spending on candy rises as pocket money goes up.
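If you want to follow along without the original dataset, here is a minimal sketch that simulates hypothetical data and draws the scatter plot with a fitted line. The variable names candy and pocket come from this post; everything else (40 children, the seed, the pocket-money range, and an error spread that grows with pocket money) is an assumption chosen for illustration, so your numbers will not match the results reported below.

```stata
* Hypothetical simulated data, NOT the dataset analyzed in this post
clear
set seed 2024
set obs 40
* Monthly pocket money in dollars (assumed range)
generate pocket = runiform(20, 300)
* Slope of 0.0373 mirrors the post's estimate; error SD grows with pocket
generate candy = 1 + 0.0373*pocket + rnormal(0, 0.01*pocket)

* Scatter plot of candy spending against pocket money, with a fitted line
twoway (scatter candy pocket) (lfit candy pocket), ytitle("Candy spending") xtitle("Monthly pocket money")
```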
Running the Regression
The ordinary least squares (OLS) regression turns out to be statistically significant, F(1, 38) = 19.26, p < .0001. For every additional $100 in pocket money, a child spends an estimated $3.73 more on candy. Given the statistical significance, you might be inclined to stop here, but you still need to test for heteroskedasticity: it leaves the OLS coefficients unbiased but makes the usual standard errors, and therefore the p values, unreliable.
regress candy pocket
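A few of the reported numbers can be checked directly after regress. The sketch below reuses hypothetical simulated data (an assumption, not the post’s dataset). With a single predictor, the overall F statistic equals the squared t statistic for that predictor, and the denominator degrees of freedom are N − 2, so F(1, 38) implies 40 children. The lincom line rescales the slope to a $100 change, assuming pocket is measured in dollars.

```stata
* Hypothetical simulated data, NOT the dataset analyzed in this post
clear
set seed 2024
set obs 40
generate pocket = runiform(20, 300)
generate candy = 1 + 0.0373*pocket + rnormal(0, 0.01*pocket)

regress candy pocket
* With one predictor, the overall F equals the squared t statistic
display "F = " e(F) "   t squared = " (_b[pocket]/_se[pocket])^2
* Residual degrees of freedom are N - 2 (here 40 - 2 = 38)
display "residual df = " e(df_r)
* Estimated change in candy spending for a 100-dollar increase in pocket money
lincom 100*pocket
```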
Heteroskedasticity Test and RVF Plot
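In Stata, we can test for heteroskedasticity by typing hettest after a regression (current Stata documentation lists this command as estat hettest). By default, it runs the Breusch-Pagan/Cook-Weisberg test, whose null hypothesis is that the error variance is constant. Here are the results for the model above:
Because the null hypothesis is constant variance, a p value below .05 is evidence of heteroskedasticity at the conventional 5% level. Here, p = .0079, so we reject the assumption of constant variance, and heteroskedasticity needs to be addressed. Reporting the result of a heteroskedasticity test is likely to be sufficient for diagnostic purposes, but, depending on the nature of your paper, essay, or thesis, you can also generate a residuals-vs-fitted (RVF) plot to illustrate heteroskedasticity. In an RVF plot, heteroskedasticity typically appears as a fan or funnel shape, with the vertical spread of the residuals changing as the fitted values change, as WallStreetMojo explains.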
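To see what the default test actually computes, the sketch below reproduces it by hand on hypothetical simulated data (an assumption, not the post’s dataset). Our understanding of the default version is: divide the squared residuals by the maximum-likelihood variance estimate (RSS/N), regress that on the fitted values, and take half the model sum of squares as a chi-squared statistic with 1 degree of freedom. The by-hand value should match estat hettest; if it does not in your Stata version, trust the built-in command.

```stata
* Hypothetical simulated data, NOT the dataset analyzed in this post
clear
set seed 2024
set obs 40
generate pocket = runiform(20, 300)
generate candy = 1 + 0.0373*pocket + rnormal(0, 0.01*pocket)

regress candy pocket
estat hettest
scalar bp_stata = r(chi2)

* Reproduce the default statistic by hand
predict double yhat, xb
predict double ehat, residuals
* Squared residuals scaled by the ML variance estimate RSS/N
generate double z = ehat^2 / (e(rss)/e(N))
quietly regress z yhat
* Score statistic: half the auxiliary model sum of squares, chi2(1)
scalar bp_manual = e(mss)/2
display "by hand: chi2(1) = " bp_manual "   p = " chi2tail(1, bp_manual)
display "estat hettest: chi2(1) = " bp_stata
```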
In Stata, you can use the following code to generate an RVF plot with rvfplot, where the yline(0) option adds a horizontal reference line at zero so that the spread of the residuals around zero is easier to see:
rvfplot, yline(0)
Here’s what that RVF plot looks like:
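rvfplot is a convenience command. The sketch below builds the same kind of plot by hand from the regression’s fitted values and residuals, which can be handy if you want to customize it further. It uses hypothetical simulated data (an assumption, not the post’s dataset); the variable names fitted and resid are our own choices.

```stata
* Hypothetical simulated data, NOT the dataset analyzed in this post
clear
set seed 2024
set obs 40
generate pocket = runiform(20, 300)
generate candy = 1 + 0.0373*pocket + rnormal(0, 0.01*pocket)

regress candy pocket
rvfplot, yline(0)

* Same plot built by hand from the stored regression results
predict double fitted, xb
predict double resid, residuals
scatter resid fitted, yline(0) ytitle("Residuals") xtitle("Fitted values")
```

With a constant in the model, OLS residuals average to zero, which is why the reference line sits at 0.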
Fixing Heteroskedasticity in Stata
In Stata (version 15 or later), you can address heteroskedasticity by fitting a heteroskedastic linear regression with the hetregress command:
hetregress candy pocket
The coefficient for pocket money barely changes, which is not surprising: heteroskedasticity affects the precision of the estimates more than the coefficients themselves. You can now report the coefficient from this model. The important thing is that you’ve tested for heteroskedasticity in your linear regression and, having found it, fitted a heteroskedastic linear regression in place of plain OLS.
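As a sketch of how the variance model can be made explicit, the block below uses hetregress’s het() option, which names the variables assumed to drive the error variance (here we assume pocket money does; the variance is modeled as an exponential function of it). By default, hetregress fits by maximum likelihood; the twostep option requests two-step GLS instead. A common alternative, shown second, keeps the OLS coefficients and replaces only the standard errors with heteroskedasticity-robust ones. The data are hypothetical and simulated, not the post’s dataset.

```stata
* Hypothetical simulated data, NOT the dataset analyzed in this post
clear
set seed 2024
set obs 40
generate pocket = runiform(20, 300)
generate candy = 1 + 0.0373*pocket + rnormal(0, 0.01*pocket)

* Requires Stata 15 or later.
* het(pocket): assume the log error variance is linear in pocket money
hetregress candy pocket, het(pocket)

* Alternative: OLS coefficients with heteroskedasticity-robust standard errors
quietly regress candy pocket
scalar b_ols = _b[pocket]
regress candy pocket, vce(robust)
```

The robust-standard-error route leaves the point estimate identical to plain OLS and changes only its standard error, whereas hetregress reweights observations according to the variance model, so its coefficient can differ slightly.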
BridgeText can help you with all of your statistical analysis needs.