Introduction
Poisson regression is an underutilized but powerful regression model. In this blog entry, we’ll show you how to run and interpret a Poisson regression model in R.
Access Data and Libraries
Install ggplot2 if you have not already done so, then load it:
install.packages("ggplot2")
library(ggplot2)
Now access the dataset warpbreaks:
warpbreaks
head(warpbreaks)
Why Poisson Regression?
Poisson regression is typically used when your outcome / dependent variable is a count variable. In warpbreaks, breaks is a count variable: It represents the number of times a break occurs given a specific wool and tension level. That makes Poisson regression well-suited to the warpbreaks data.
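A quick check can confirm that breaks behaves like a count before you model it. The sketch below is illustrative only: the object names is.count and cell.means are our own, and it relies only on base R functions.

```r
# warpbreaks ships with base R (datasets package), so no download is needed
str(warpbreaks)

# A count variable should contain only non-negative whole numbers
is.count <- all(warpbreaks$breaks >= 0 &
                warpbreaks$breaks == round(warpbreaks$breaks))
print(is.count)

# Average number of breaks for each wool / tension combination
cell.means <- aggregate(breaks ~ wool + tension, data = warpbreaks, FUN = mean)
print(cell.means)
```

The table of cell means gives a first look at how breaks vary by wool and tension before any model is fitted.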
Run the Poisson Regression
Let’s quantify the number of breaks as a function of wool and tension. The model below includes the two main effects only, with no wool-by-tension interaction term. Try the following code:
poisson.output <- glm(formula = breaks ~ wool + tension, data = warpbreaks, family = poisson)
print(summary(poisson.output))
Here’s what you get:
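Note that all three predictor coefficients are statistically significant. Because the coefficients are on the log scale, exponentiating them gives rate ratios, which multiply the expected number of breaks. Holding tension constant, Wool B is associated with e^-0.21 (about 0.81) times as many breaks as Wool A, or roughly 19% fewer. Holding wool constant, tension M is associated with e^-0.32 (about 0.73) times as many breaks as tension L, and tension H is associated with e^-0.52 (about 0.60) times as many breaks as tension L. Tension L is the reference level for both tension comparisons.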
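The exponentiated values quoted above can be reproduced directly in R. This is a minimal sketch that refits the same model so it runs on its own; the names rate.ratios and percent.change are illustrative and not part of the original code.

```r
# Refit the model from this post so the block is self-contained
poisson.output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Coefficients are on the log scale; exponentiate them to get rate ratios
rate.ratios <- exp(coef(poisson.output))
print(round(rate.ratios, 2))

# Percent change in expected breaks relative to the reference level
# (wool A for woolB, tension L for tensionM and tensionH); intercept dropped
percent.change <- (rate.ratios[-1] - 1) * 100
print(round(percent.change, 1))
```

A rate ratio below 1 means fewer expected breaks than the reference level, and the percent change expresses the same comparison in more familiar terms.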
You can also get the exponentiated results in a table instead of calculating them by hand. First, install broom if you have not already done so:
install.packages("broom")
Next:
library(broom)
tidy(poisson.output, conf.int = TRUE, exponentiate = TRUE)
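Another way to read the fitted model is to turn it into expected break counts for each combination of wool and tension. The sketch below uses base R's predict() with type = "response"; the object name new.data and the column expected.breaks are our own illustrative choices.

```r
# Refit the model from this post so the block is self-contained
poisson.output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Build one row per wool / tension combination
new.data <- expand.grid(wool = levels(warpbreaks$wool),
                        tension = levels(warpbreaks$tension))

# type = "response" returns predictions on the count scale, not the log scale
new.data$expected.breaks <- predict(poisson.output, newdata = new.data,
                                    type = "response")
print(new.data)
```

The first row (wool A, tension L) is the reference combination, so its expected count equals the exponentiated intercept.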
Visualize Data
Try the following R code:
library(ggplot2)
boxplot.wool <- ggplot(warpbreaks, aes(x = wool, y = breaks)) + geom_boxplot(col = "firebrick")
boxplot.wool
Here, the boxplot suggests that wool B tends to have fewer breaks than wool A. Next, try the following R code:
boxplot.tension <- ggplot(warpbreaks, aes(x = tension, y = breaks)) + geom_boxplot(col = "firebrick")
boxplot.tension
And here, the boxplot suggests fewer breaks at tensions M and H than at tension L.