Introduction
A Chi squared test of independence determines whether two categorical variables are independent of each other. In this blog, we’ll show you how to use R to conduct a Chi squared test.
Install Libraries
You can begin by installing the ggplot2 package, which includes the dataset used below, if you haven’t already:
install.packages("ggplot2")
Access Dataset
Let’s call up this dataset on diamonds:
library(ggplot2)
View(diamonds)
In RStudio, the dataset will now open in a viewer tab at the top left of your screen.
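If you are working outside RStudio, or just want a quick check in the console, a minimal sketch like the one below can confirm that the two variables used in the rest of this post are categorical. The variable names cut_levels, color_levels and n_rows are our own, chosen for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Console alternative to View(): show the structure of the two columns of interest
str(diamonds[, c("cut", "color")])

# Both columns should be factors, i.e. categorical variables
cut_levels <- levels(diamonds$cut)
color_levels <- levels(diamonds$color)
print(cut_levels)
print(color_levels)

# Number of diamonds (rows) in the dataset
n_rows <- nrow(diamonds)
print(n_rows)
```

A Chi squared test of independence works on counts of categories, so it is worth confirming that both columns are factors before building the table.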
Generate a Table and State Hypotheses
A good beginning for Chi squared analysis is to generate a contingency table, that is, a table of counts for each combination of the two variables you’re comparing. Let’s say that we want to determine whether diamond color and diamond cut are independent of each other. State your hypotheses:
H0: Diamond color and diamond cut are independent of each other.
HA: Diamond color and diamond cut are not independent of each other.
Next, let’s generate a table using the following code:
table(diamonds$cut, diamonds$color)
Here’s what you get:
Thus, for example, 163 diamonds are color D and have a fair cut, 224 diamonds are color E and have a fair cut, etc. From here, you can get the Chi squared results as follows (note that we will capture the results using a variable name, res):
res <- chisq.test(table(diamonds$cut, diamonds$color))
print(res)
Here are the results of the chi squared test:
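The numbers in the printed summary can also be pulled out of res one at a time, which is handy if you want to report or reuse them. The sketch below rebuilds the table with labeled dimensions (the names tab and df_by_hand are ours, for illustration) and checks the degrees of freedom by hand.

```r
library(ggplot2)  # provides the diamonds dataset

# Same contingency table as above, with the two dimensions labeled
tab <- table(cut = diamonds$cut, color = diamonds$color)
res <- chisq.test(tab)

# Individual components of the test result
print(res$statistic)  # the Chi squared statistic
print(res$parameter)  # degrees of freedom
print(res$p.value)    # p-value

# For an r x c table, degrees of freedom are (r - 1) * (c - 1)
df_by_hand <- (nrow(tab) - 1) * (ncol(tab) - 1)
print(df_by_hand)
```

With 5 cuts and 7 colors, the degrees of freedom work out to (5 - 1) * (7 - 1) = 24, which should match res$parameter.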
Therefore, we reject the null hypothesis, as p < .05. The variables of diamond color and diamond cut are not independent of each other. What does this mean? In order to get closer to a real-world interpretation of Chi squared findings, you can generate a table of what the results would have looked like had the variables truly been independent of each other. Try the following code:
res$expected
Here is the table of expected values:
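You can now compare the observed and expected values to make specific inferences based on colors and cuts. Cells where the observed count is well above or below the expected count show which combinations of color and cut drive the association.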
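To make that comparison concrete, here is a minimal sketch showing where the expected values come from and one common way to measure how far each cell departs from them. Under independence, each expected count is the row total times the column total, divided by the grand total. The names expected_by_hand and pearson_by_hand are ours, for illustration; res$observed, res$expected and res$residuals are components returned by chisq.test.

```r
library(ggplot2)  # provides the diamonds dataset

tab <- table(diamonds$cut, diamonds$color)
res <- chisq.test(tab)

# Expected count per cell under independence:
# (row total * column total) / grand total
expected_by_hand <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Should agree with the expected values reported by chisq.test
print(all.equal(as.vector(expected_by_hand), as.vector(res$expected)))

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_by_hand <- (res$observed - res$expected) / sqrt(res$expected)
print(all.equal(as.vector(pearson_by_hand), as.vector(res$residuals)))

# Positive values: more diamonds than expected in that cell; negative: fewer
print(round(res$residuals, 2))
```

Residuals far from zero flag the color and cut combinations that occur more or less often than independence would predict. How large a residual has to be before it counts as notable is a judgment call, so treat any cutoff as a rule of thumb.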
BridgeText can help you with all of your statistical analysis needs.