Introduction
A paired t test in Python compares two sets of values that represent the same (or matched) subjects measured twice on a continuous variable. In this blog entry, we’ll show you how to run a paired t test in Python and pair it with appropriate graphics.
Enter Data
Let’s use the example of 31 people who went out on dates in 2021, then again in 2022 after getting help from a dating coach. Each value is the number of dates one person went on. Try entering these data into Python using:
import scipy.stats as stats
pre_coach = [12, 15, 0, 23, 18, 10, 10, 9, 8, 10,
25, 21, 13, 18, 3, 21, 22, 12, 12, 11,
3, 15, 28, 41, 0, 0, 7, 10, 12, 12, 12]
post_coach = [15, 18, 2, 23, 19, 11, 11, 7, 10, 12,
35, 31, 11, 16, 7, 23, 22, 12, 12, 11,
7, 12, 30, 50, 4, 4, 8, 9, 11, 18,
22]
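Because a paired test matches each person's first value with that same person's second value, it is worth confirming that the two lists line up before going further. Here is a minimal sketch of that check; it is our own addition rather than a required step, and it uses only built-in Python:

```python
pre_coach = [12, 15, 0, 23, 18, 10, 10, 9, 8, 10,
             25, 21, 13, 18, 3, 21, 22, 12, 12, 11,
             3, 15, 28, 41, 0, 0, 7, 10, 12, 12, 12]
post_coach = [15, 18, 2, 23, 19, 11, 11, 7, 10, 12,
              35, 31, 11, 16, 7, 23, 22, 12, 12, 11,
              7, 12, 30, 50, 4, 4, 8, 9, 11, 18,
              22]

# A paired test needs exactly one "before" and one "after" value per person
assert len(pre_coach) == len(post_coach), "the two lists must be the same length"

print(len(pre_coach))  # number of people in the sample
```

If the assertion fails, a value was probably dropped or duplicated while typing in the data.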
Run the T Test
Now try the following code:
stats.ttest_rel(pre_coach, post_coach)
Here’s what you get: a t statistic of about -3.496 and a two-tailed p value of about .001492.
By default, Python subtracts the second vector (post_coach, meaning the number of dates after working with the dating coach) from the first vector (pre_coach, meaning the number of dates before working with the dating coach). Therefore, the fact that the t statistic is negative (t = -3.496) means that people had more dates after working with the dating coach. How many more dates? Try the following code to learn:
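import numpy as np
print(np.mean(post_coach))  # mean dates after coaching
print(np.mean(pre_coach))  # mean dates before coaching
print(np.mean(post_coach) - np.mean(pre_coach))  # mean difference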
The mean is roughly 15.58 dates after coaching and 13.32 dates before, so the mean difference is roughly 2.26 dates.
Note that the statistical significance of this difference is discussed below.
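To see where the t statistic comes from, you can rebuild it by hand from the per-person differences. The sketch below is our own illustration, not part of the SciPy call: it takes the mean of the differences and divides it by its standard error, using only Python's standard library. The variable names (diffs, mean_diff, sd_diff, t_stat) are ours.

```python
import math
import statistics

pre_coach = [12, 15, 0, 23, 18, 10, 10, 9, 8, 10,
             25, 21, 13, 18, 3, 21, 22, 12, 12, 11,
             3, 15, 28, 41, 0, 0, 7, 10, 12, 12, 12]
post_coach = [15, 18, 2, 23, 19, 11, 11, 7, 10, 12,
              35, 31, 11, 16, 7, 23, 22, 12, 12, 11,
              7, 12, 30, 50, 4, 4, 8, 9, 11, 18,
              22]

# Per-person differences, first list minus second list (pre minus post)
diffs = [pre - post for pre, post in zip(pre_coach, post_coach)]

n = len(diffs)
mean_diff = statistics.mean(diffs)   # negative when post is larger on average
sd_diff = statistics.stdev(diffs)    # sample standard deviation (n - 1 denominator)

# t is the mean difference divided by its standard error
t_stat = mean_diff / (sd_diff / math.sqrt(n))

print(f"mean difference = {mean_diff:.2f}")
print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
```

The t value printed here should agree with the one from stats.ttest_rel, which is a handy way to confirm which direction the subtraction runs.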
Interpret the Results
In a paired samples t test, there is one p value for a two-tailed hypothesis and another for a one-tailed hypothesis. The results above (including p = .001492) are for a two-tailed test in which your hypotheses are as follows:
H0: The mean number of dates before working with the dating coach is equal to the mean number of dates after working with the dating coach.
HA: The mean number of dates before working with the dating coach is not equal to the mean number of dates after working with the dating coach.
If you were hypothesizing that dates are greater after the intervention of the dating coach, which the descriptive statistics suggest, you would divide the p value that Python provided by 2, resulting in p = .0007459. Whenever the sample means fall in the direction your alternative hypothesis predicts, the one-tailed p is half of the two-tailed p.
If you were hypothesizing that dates decrease after the intervention of the dating coach, which is not borne out by the means, your p is 1 - .0007459, or .9993.
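Rather than halving the p value by hand, you can ask SciPy for a one-tailed test directly. The following is a minimal sketch, assuming SciPy 1.6 or later, where ttest_rel accepts an alternative argument. The result names (two_sided, more_after, fewer_after) are ours.

```python
import scipy.stats as stats

pre_coach = [12, 15, 0, 23, 18, 10, 10, 9, 8, 10,
             25, 21, 13, 18, 3, 21, 22, 12, 12, 11,
             3, 15, 28, 41, 0, 0, 7, 10, 12, 12, 12]
post_coach = [15, 18, 2, 23, 19, 11, 11, 7, 10, 12,
              35, 31, 11, 16, 7, 23, 22, 12, 12, 11,
              7, 12, 30, 50, 4, 4, 8, 9, 11, 18,
              22]

# Default: two-tailed test (the means are simply not equal)
two_sided = stats.ttest_rel(pre_coach, post_coach)

# "less": the pre_coach mean is less than the post_coach mean,
# i.e. people had more dates after coaching
more_after = stats.ttest_rel(pre_coach, post_coach, alternative="less")

# "greater": the pre_coach mean is greater than the post_coach mean,
# i.e. people had fewer dates after coaching
fewer_after = stats.ttest_rel(pre_coach, post_coach, alternative="greater")

print(two_sided.pvalue)
print(more_after.pvalue)
print(fewer_after.pvalue)
```

Because the direction depends on the order of the two arguments, swapping pre_coach and post_coach would also mean swapping "less" and "greater".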
Graphics
Try the following code to visualize the results of the paired samples t test:
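import matplotlib.pyplot as plt
data = [pre_coach, post_coach]
fig, ax = plt.subplots(figsize=(10, 7))
# Side-by-side box plots: box 1 is pre_coach, box 2 is post_coach
bp = ax.boxplot(data)
ax.set_xticklabels(["pre_coach", "post_coach"])
ax.set_ylabel("Number of dates")
plt.show()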
BridgeText can help you with all of your statistical analysis needs.