Chi Squared Analysis in Stata

Jan 8

The purpose of a Chi squared test of independence is to determine whether two categorical variables are independent of each other. In this blog post, we’ll show you how to use Stata to conduct and interpret a Chi squared test. You’ll also learn how to create graphics to support a Chi squared analysis.

Load Dataset

Let’s load nlsw88, a demographic dataset that ships with Stata, and describe its variables:

sysuse nlsw88, clear
describe

State Hypotheses

Let’s say that we want to determine whether race and marital status are independent of each other. Before running the test, state your hypotheses:

H0: Race and marital status are independent of each other.
HA: Race and marital status are not independent of each other.
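The test compares the observed count O_ij in each cell of the marital-status-by-race table with the count E_ij we would expect if H0 were true. For reference, the standard Pearson statistic (the one reported by tab with the chi2 option) is written below, where R_i and C_j are the row and column totals, N is the total number of observations, and r and c are the numbers of rows and columns:

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
df = (r - 1)(c - 1)
```

With married (2 categories) as rows and race (3 categories) as columns, df = (2 - 1)(3 - 1) = 2. A large χ² means the observed counts sit far from the counts expected under independence.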

Run and Interpret a Chi Squared Analysis

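A good starting point for a Chi squared analysis is a table of the values you’re comparing. The following code cross-tabulates marital status by race, shows the expected counts, and runs the test: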

tab married race, chi2 expected

Each cell of this table contains two counts: the observed count on top and, just below it, the count we would expect if the two variables were independent. For example, the table shows that 487 single people are white and 309 single people are black. If race and marital status were independent of each other, we would have expected about 586 single people to be white and about 209 single people to be black.
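The expected counts come straight from the table’s margins: for any cell, the expected count is its row total times its column total, divided by the grand total. The do-file sketch below recomputes the expected number of single white respondents that way. It assumes the coding described later in this post (married = 0 for single, race = 1 for white), and the local macro names are only illustrative.

```stata
sysuse nlsw88, clear

// Marginal totals, counted over observations that tabulate would use
quietly count if married == 0 & !missing(race)
local n_single = r(N)            // row total: single
quietly count if race == 1 & !missing(married)
local n_white = r(N)             // column total: white
quietly count if !missing(married, race)
local n_total = r(N)             // grand total

// Expected count = row total x column total / grand total
local e_single_white = `n_single' * `n_white' / `n_total'
display "Expected single white respondents: " %6.1f `e_single_white'
```

The displayed value should match the expected count of about 586 shown in the table.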

Is this disparity between observed and expected counts larger than we would expect by chance? The Pearson Chi squared statistic reported below the table answers that question: it is statistically significant, p < .001. We therefore reject H0 and conclude that race and marital status are not independent of each other.
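If you want to reuse the test statistic and p-value (for example, in a report), tabulate leaves them behind as stored results. A minimal do-file sketch, relying on the r() results Stata documents for a two-way tabulate: r(chi2), r(p), and the numbers of rows and columns, r(r) and r(c).

```stata
sysuse nlsw88, clear

// Run the test without printing the table; results are kept in r()
quietly tabulate married race, chi2

// Degrees of freedom = (rows - 1) x (columns - 1)
local df = (r(r) - 1) * (r(c) - 1)
display "Pearson chi2(" `df' ") = " %8.2f r(chi2) "   p = " %9.3g r(p)
```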

To interpret this finding, compare the observed and expected counts in the table. Notice that (a) more black people were actually single (309) than expected to be single (209), and (b) fewer black people were actually married (274) than expected to be married (374). Also notice that (a) fewer white people were actually single (487) than expected to be single (586), and (b) more white people were actually married (1,150) than expected to be married (1,051). Let’s set aside the other race category, as most people in the dataset were either black or white.
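To see which cells contribute most to the overall result, tabulate can also print each cell’s contribution, (O - E)²/E, to the Pearson statistic through its cchi2 option. A short sketch; the cells with the largest contributions are the ones that depart most from independence:

```stata
sysuse nlsw88, clear

// Observed counts, expected counts, and each cell's contribution to chi2
tabulate married race, chi2 expected cchi2
```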

The conclusion is that white people were likelier than black people to be married, which also implies, logically, that black people were likelier than white people to be single.
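Row percentages state the same conclusion directly: within each race, what share of respondents is single and what share is married. A sketch that puts race in the rows so that the row option gives within-race percentages:

```stata
sysuse nlsw88, clear

// Within-race percentages; nofreq hides the raw counts
tabulate race married, row nofreq
```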

Visualize the Results

Notice that, in this dataset, being married is coded as 1, whereas being single is coded as 0. Confirm with the following code:

codebook married
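A small do-file sketch that checks, for married, that the mean of a 0/1 variable equals the proportion of observations coded 1 (the local macro name is illustrative):

```stata
sysuse nlsw88, clear

// Share of respondents coded 1 (married) among nonmissing values
quietly count if married == 1
local n_married = r(N)
quietly count if !missing(married)
local share = `n_married' / r(N)

// The mean of a 0/1 variable is that same share
quietly summarize married
display "Share married: " %6.4f `share' "   Mean of married: " %6.4f r(mean)
```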

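Because married is a 0/1 variable, its mean within each race is the proportion of that group who are married. You can therefore use a 95% confidence interval plot to compare white and black survey respondents. ciplot is a community-contributed command rather than part of official Stata, so install it from SSC first:

ssc install ciplot
ciplot married, by(race)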

Here’s what you get:

The plot shows that the mean of married (the proportion married) and its 95% CI are substantially higher for white respondents than for black respondents. You can also display these means and 95% CIs as a table:

by race, sort: ci means married

Here’s what you get:

Filter Data

Having noticed that very few people in this analysis are of a race other than black or white, you might want to run the Chi squared analysis and generate the 95% CI plots on a subset of the dataset that excludes people in the “other” race category.

First, check how race is coded:

codebook race

The output shows that race is coded 1 for white, 2 for black, and 3 for people of any other race. To run your Chi squared analysis on white and black respondents only, add an if qualifier:

tab married race if race < 3, chi2 expected

Because you have specified if race < 3, the results only include people of race = 1 (white) or race = 2 (black). Here are the adjusted results:

As you can confirm, the results change only slightly, and your interpretation and conclusions stay the same. Using Stata’s if qualifier, however, you were able to focus on the two main races in the dataset.
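To compare the two runs side by side, and to apply the same filter to several commands without repeating the if qualifier, one option is Stata’s preserve/keep/restore pattern. A minimal do-file sketch (the local macro names are illustrative); restore brings back the full dataset at the end:

```stata
sysuse nlsw88, clear

// Full sample: all three race categories
quietly tabulate married race, chi2
local chi2_all = r(chi2)
local p_all = r(p)

// Temporarily keep only white (1) and black (2) respondents
preserve
keep if inlist(race, 1, 2)
quietly tabulate married race, chi2
local chi2_wb = r(chi2)
local p_wb = r(p)
by race, sort: ci means married    // 95% CIs for the subset
restore                            // full dataset is back in memory

display "All races:       chi2 = " %8.2f `chi2_all' "   p = " %9.3g `p_all'
display "White and black: chi2 = " %8.2f `chi2_wb' "   p = " %9.3g `p_wb'
```

Note that dropping the third race category also lowers the degrees of freedom from 2 to 1.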

BridgeText can help you with all of your statistical analysis needs.
