Introduction
Logistic regression is conducted with an outcome or dependent variable that can have only two values, typically 0 or 1. This type of statistical procedure is often used in quantitative academic essays, research papers, and theses that analyze binary outcomes (for example, wins or losses, or heads or tails). In this blog, you’ll learn how to carry out a logistic regression with odds ratio reporting in R.
Example Scenario and R Code
Please install broom if you have not already done so:
install.packages("broom")
Now let’s imagine a scenario in which you want to model the relationship between an urban population rate and murder rate, working with U.S. states. Begin by looking at the built-in USArrests dataset:
USArrests
As you can see, there is not yet an outcome variable that is coded 0 / 1, so let’s create one. Let’s say that we consider any murder rate of more than 9 to be high, and any murder rate of 9 or below to be low or moderate (or, in precise terminology, non-high).
Try the following code:
murder <- USArrests
murder$murder.dummy <- ifelse(murder$Murder > 9, 1, 0)
print(murder)
Now, every state with a murder rate of more than 9 per 100,000 is coded as 1 (a high-murder state), and every state with a murder rate of 9 or below is coded as 0 (a non-high-murder state).
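Before modeling, it can help to confirm that the new variable was coded as intended. The following is a minimal sketch in base R; it rebuilds the murder data frame from the code above so that it runs on its own.

```r
# Rebuild the dummy variable so this block runs on its own
murder <- USArrests
murder$murder.dummy <- ifelse(murder$Murder > 9, 1, 0)

# Count how many states fall in each category (0 = non-high, 1 = high)
table(murder$murder.dummy)

# Cross-check the coding against the threshold:
# a state is coded 1 if and only if its murder rate is above 9
coding_ok <- all((murder$Murder > 9) == (murder$murder.dummy == 1))
print(coding_ok)

# List the states coded as high-murder
rownames(murder)[murder$murder.dummy == 1]
```

If coding_ok prints TRUE, the dummy variable matches the rule described above.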
Logistic Regression Model
Now try the following code:
logfit_murder <- glm(murder.dummy ~ UrbanPop, data = murder, family = binomial)
library(broom)
tidy(logfit_murder, conf.int = TRUE, exponentiate = TRUE)
Neither term in the model is statistically significant, as you can determine by looking at the p.value column for both the intercept and UrbanPop; the UrbanPop row is the one that answers the research question. In order to interpret the odds ratio (OR) estimate associated with UrbanPop, begin by recalling that a high-murder state is 1, whereas a non-high-murder state is 0. Given this coding approach, if the OR were exactly 1, the urban population percentage of a state would not alter the odds of that state being a high-murder state. If the OR had been below 1, and statistically significant, each additional percentage point of urban population would be associated with lower odds of a state being high-murder. If the OR had been above 1, and statistically significant, each additional percentage point of urban population would be associated with higher odds of a state being high-murder.
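The odds ratios that tidy() returns with exponentiate = TRUE are the model’s log-odds coefficients passed through exp(). The sketch below reproduces that step in base R. The names or_est and or_ci are chosen here for illustration, and the interval is a Wald interval from confint.default(), so it may differ slightly from the one tidy() reports if a different interval method is used there.

```r
# Rebuild the data and model so this block runs on its own
murder <- USArrests
murder$murder.dummy <- ifelse(murder$Murder > 9, 1, 0)
logfit_murder <- glm(murder.dummy ~ UrbanPop, data = murder, family = binomial)

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
or_est <- exp(coef(logfit_murder))

# Wald 95% confidence intervals, also moved to the odds ratio scale
# (illustrative choice of interval method)
or_ci <- exp(confint.default(logfit_murder, level = 0.95))

# Show the estimates next to their interval bounds
print(cbind(OR = or_est, or_ci))
```

An interval for UrbanPop that contains 1 tells the same story as a non-significant p value.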
In the model above, the OR for UrbanPop is not statistically significant, p = .784. Note that 1 falls within the 95% confidence interval of the OR (0.954, 1.040). Therefore, the model provides no evidence that a state’s urban population affects its odds of being high-murder.
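If you also want a single test of the model as a whole, one common option is a likelihood ratio test that compares the fitted model to an intercept-only model. The sketch below computes it in base R from the deviances stored in the glm object; the names lr_stat, lr_df, and lr_p are chosen here for illustration.

```r
# Rebuild the data and model so this block runs on its own
murder <- USArrests
murder$murder.dummy <- ifelse(murder$Murder > 9, 1, 0)
logfit_murder <- glm(murder.dummy ~ UrbanPop, data = murder, family = binomial)

# The drop in deviance from the intercept-only model to the fitted model
# is the likelihood ratio test statistic
lr_stat <- logfit_murder$null.deviance - logfit_murder$deviance

# Degrees of freedom = number of predictors added (here, 1)
lr_df <- logfit_murder$df.null - logfit_murder$df.residual

# p value from the chi-square distribution
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p.value = lr_p))
```

With only one predictor in the model, this test and the p value for UrbanPop should lead to the same conclusion.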
What if the OR had been significant and you wanted to interpret it? Assume that the OR had been 1.011, with a p value of .047.
To interpret this OR, subtract 1 from it and multiply the result by 100 to express it as a percent change in the odds:
1.011 - 1.00 = 0.011
0.011 * 100 = 1.1%
Therefore, every additional percentage point of urban population in a state would have increased the odds of that state being a high-murder state by 1.1%.
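The same arithmetic can be wrapped in a small helper. The function or_to_percent() below is a hypothetical name written for this example, not part of broom or base R. It also handles changes of more than one unit, because odds ratios multiply: the OR for a change of k units is the one-unit OR raised to the power k.

```r
# Hypothetical helper: convert an odds ratio into a percent change in the odds
# or    = odds ratio for a one-unit increase in the predictor
# units = size of the increase you want to describe (default: 1 unit)
or_to_percent <- function(or, units = 1) {
  (or^units - 1) * 100
}

# One additional percentage point of urban population
or_to_percent(1.011)      # about 1.1

# Ten additional percentage points: 1.011^10, not 10 * 1.1
or_to_percent(1.011, 10)  # about 11.6
```

Note that the ten-point result is slightly larger than 10 times 1.1%, because the effect compounds on the odds scale.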
BridgeText can help you with all of your statistical analysis needs.