Introduction
Checking whether a continuous variable is normally distributed is a common step in many statistical procedures. In this blog entry, you’ll learn how to run the Shapiro-Wilk test of normality in Stata.
Generate Data
Let’s generate two variables of 100 observations each, one drawn from a normal distribution (x) and one drawn from a uniform distribution (y), to demonstrate the Shapiro-Wilk test in Stata.
set obs 100
drawnorm x, mean(100) sd(15)
gen y = runiform(55,145)
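Because both variables are drawn at random, your p values will differ from the ones reported in this post unless you fix the random-number seed. Here is a minimal sketch of a reproducible version; the seed value 12345 is an arbitrary assumption, not the seed used for the results shown here.

```stata
* Start from an empty dataset so set obs does not conflict with data in memory
clear
* Arbitrary seed (an assumption); any fixed value makes the draws repeatable
set seed 12345
set obs 100
drawnorm x, mean(100) sd(15)
gen y = runiform(55,145)
* Quick check of the means, standard deviations, and ranges
summarize x y
```

Rerunning the block with the same seed gives the same x and y every time, so anyone following along sees the same test results you do.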
Run the Shapiro-Wilk Test
You can run the Shapiro-Wilk test on more than one variable at a time. Try:
swilk x y
Here’s what you get:
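The null hypothesis of the Shapiro-Wilk test is that the variable is normally distributed. When the p value is > .05, you fail to reject that hypothesis, so the data are consistent with normality (although this does not prove normality). When the p value is < .05, you reject normality. In this case, x shows no evidence of non-normality (p = .67109), and y is not normal (p = .00077).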
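If you want to use the results in later commands rather than read them off the screen, swilk leaves its statistics in r(), including r(W) and r(p). When you test several variables at once, r() holds the results for the last variable only, so the sketch below loops over the variables one at a time. The data setup and the seed are assumptions added so the block runs on its own.

```stata
* Recreate the example data (arbitrary seed, an assumption)
clear
set seed 12345
set obs 100
drawnorm x, mean(100) sd(15)
gen y = runiform(55,145)

* Run the test quietly on each variable and print W and p from r()
foreach v of varlist x y {
    quietly swilk `v'
    display "`v': W = " %6.4f r(W) ", p = " %7.5f r(p)
}
```

The same r(p) value can feed an if condition in a do-file, for example to choose between a parametric and a nonparametric test.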
Confirm Visually with Histograms
We can also draw histograms to check the shape of each variable’s distribution. The histogram command takes one variable, so the histograms need to be run one at a time. Try:
hist x, bin(20) freq scheme(s1color)
And you get:
Next, try:
hist y, bin(20) freq scheme(s1color)
And you get:
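The histograms are consistent with what the Shapiro-Wilk test told us. Look at the classic bell curve shape of the histogram of x, which is what you expect from a normally distributed variable. The histogram of y is roughly flat across its range, which is what you expect from a uniform distribution.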
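To make the visual comparison easier, you can overlay a normal density curve on each histogram and show the two graphs side by side. This is a sketch rather than part of the original walkthrough: the graph names hx and hy are made up for illustration, and the data setup and seed are assumptions added so the block runs on its own.

```stata
* Recreate the example data (arbitrary seed, an assumption)
clear
set seed 12345
set obs 100
drawnorm x, mean(100) sd(15)
gen y = runiform(55,145)

* normal overlays a normal density; name() keeps each graph in memory
hist x, bin(20) freq normal name(hx, replace)
hist y, bin(20) freq normal name(hy, replace)

* Place the two stored graphs in a single figure
graph combine hx hy
```

With the overlay, the bars for x should track the curve closely, while the bars for y should sit above it in the tails and below it in the middle.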
Variations
You can run the Shapiro-Wilk test on a subset of data. Let’s say you only wanted to run it for the first 50 observations of x. Try:
swilk x in 1/50
And you get:
Now let’s say your data included a categorical variable, gender, and you wanted to test the normality of x within each category. Let’s create that variable first, then run the Shapiro-Wilk test separately for its two values:
gen gender = 1
replace gender = 2 in 51/100
label define gender 1 "Male" 2 "Female"
label value gender gender
by gender, sort: swilk x
And you get:
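The grouped test can also be written with the bysort shorthand, and a single group can be tested with an if qualifier. The sketch below rebuilds a small version of the example so it runs on its own; the seed and the use of cond() to assign gender are assumptions made for brevity, not the steps used above.

```stata
* Recreate x and a two-level gender variable (arbitrary seed, an assumption)
clear
set seed 12345
set obs 100
drawnorm x, mean(100) sd(15)
gen gender = cond(_n <= 50, 1, 2)

* Equivalent to: by gender, sort: swilk x
bysort gender: swilk x

* Test only the observations where gender is 2
swilk x if gender == 2
```

Keep in mind that each group here has only 50 observations, so the test has less power to detect departures from normality than it does on the full sample.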