Conducting an ANOVA and Tukey's Test in Python

Jul 26

To run an analysis of variance (ANOVA) and Tukey's test in Python, we'll need to make use of the scipy and statsmodels libraries. Please note, you'd have to install these libraries if they aren't already installed. Here is a coded example:

import numpy as np
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Let's say we have a dataset 'df' with a categorical column 'Group' and a numerical column 'Values'
df = pd.DataFrame({
'Group': np.repeat(['A', 'B', 'C'], 20),
'Values': np.concatenate([np.random.normal(5, 1, 20), np.random.normal(6, 1, 20), np.random.normal(7, 1, 20)])
})
# Conduct one-way ANOVA
model = ols('Values ~ C(Group)', data=df).fit()

anova_table = sm.stats.anova_lm(model, typ=2) # Type 2 ANOVA DataFrame
print(anova_table)
# Conduct Tukey's test
tukey = pairwise_tukeyhsd(endog=df['Values'], groups=df['Group'], alpha=0.05)
print(tukey)

Here’s what you get:

Highlighted above are the two p values you need to consult. PR(>F) is where Python records the statistical significance of the ANOVA, which, if below .05, is significant. Here, p is much smaller than .05, so the ANOVA is significant. However, this does not tell you which group is significantly different from which other group. Here, you need to consult p-adj. Wherever this number is < .05, the pairwise comparison is significant. Therefore,

The mean of A is larger than the mean of B, because p = .0001, and the mean difference is positive.
The mean of A is larger than the mean of C, because p < .0001, and the mean difference is positive.
The mean of B is larger than the mean of C, because p = .0419, and the mean difference is positive.

In order to run an ANOVA, you need to ensure that data are normally distributed. Let’s look at the normality for A, B, and C using the Shapiro-Wilk test:

# Split the groups
group_a = df['Values'][df['Group'] == 'A']
group_b = df['Values'][df['Group'] == 'B']
group_c = df['Values'][df['Group'] == 'C']
# Perform Shapiro-Wilk test
w_a, p_value_a = stats.shapiro(group_a)
w_b, p_value_b = stats.shapiro(group_b)
w_c, p_value_c = stats.shapiro(group_c)
# Print results
print(f"Group A: W={w_a}, p-value={p_value_a}")
print(f"Group B: W={w_b}, p-value={p_value_b}")
print(f"Group C: W={w_c}, p-value={p_value_c}")

Here’s what you get:

In the output, a W close to 1 and a large p-value (e.g., > 0.05) for each group would suggest that the null hypothesis (the data came from a normal distribution) should not be rejected. If, for example, all of your groups had non-normal distributions, then, in Python, you could perform non-parametric one-way ANOVA using the Kruskal-Wallis test from the scipy library. However, there is no direct equivalent to Tukey's test for non-parametric data. Instead, we often use post-hoc pairwise comparisons with a correction for multiple tests, such as the Dunn's test. Here's an example of code:

import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.stats.multicomp import MultiComparison
# Let's say we have a dataset 'df' with a categorical column 'Group' and a numerical column 'Values'
df = pd.DataFrame({
  'Group': np.repeat(['A', 'B', 'C'], 20),
  'Values': np.concatenate([np.random.normal(5, 1, 20), np.random.normal(6, 1, 20), np.random.normal(7, 1, 20)])
})
# Conduct Kruskal-Wallis test
h, p_value = stats.kruskal(df['Values'][df['Group'] == 'A'],
df['Values'][df['Group'] == 'B'],
df['Values'][df['Group'] == 'C'])
print(f"Kruskal-Wallis test: H={h}, p-value={p_value}")
# If p-value is significant, perform post-hoc pairwise comparisons with Dunn's test
if p_value < 0.05:
  # MultiComparison class requires statsmodels 0.10.0 or higher
  mc = MultiComparison(df['Values'], df['Group'])
  result = mc.allpairtest(stats.ranksums, method='Holm')
  print(result[0]) # result[0] gives the summary of tests