Why Effect Size Matters in T-Tests

Feb 24

Even conscientious, skilled graduate students can forget to report effect sizes as a complement to statistical significance. Everyone knows that inferential results need a p value, but what about effect size? In this post, we’ll walk you through what effect sizes are, using practical examples in R and scenarios that arise in many graduate theses and PhD dissertations (and even in more advanced undergraduate work).

Let’s say you’re interested in whether ADHD-diagnosed students who listened to synthwave music before a test (synth) performed better than those who did not listen to music (silence). You run an independent samples t-test in which 60 students have been randomly assigned to the synth and silence groups, 30 per group. You measure performance as a test score from 0 to 100. If you like, try running the code yourself in RStudio:

library(ggplot2)

# Set seed for reproducibility
set.seed(123)

# Generate mock data: 30 students per group
n_group <- 30
performance_synth <- rnorm(n_group, mean = 80, sd = 10)
performance_silence <- rnorm(n_group, mean = 70, sd = 10)

data <- data.frame(
  group = rep(c("synth", "silence"), each = n_group),
  performance = c(performance_synth, performance_silence)
)

# Perform an independent samples t-test (equal variances assumed).
# Groups are ordered alphabetically, so the reported difference is silence minus synth.
t_test_result <- t.test(performance ~ group, data = data, var.equal = TRUE)

# Print the t-test results
print(t_test_result)

# Calculate Cohen's d for effect size, using the pooled standard deviation
mean_synth <- mean(data$performance[data$group == "synth"])
mean_silence <- mean(data$performance[data$group == "silence"])

sd_pooled <- sqrt(((n_group - 1) * var(data$performance[data$group == "synth"]) +
                   (n_group - 1) * var(data$performance[data$group == "silence"])) /
                  (n_group + n_group - 2))

cohens_d <- (mean_synth - mean_silence) / sd_pooled

# Print Cohen's d
print(cohens_d)

# Optional: plot the individual scores with an overlaid boxplot for each group
ggplot(data, aes(x = group, y = performance, color = group)) +
  geom_jitter(width = 0.2, alpha = 0.6) +  # Add jitter to points for better visualization
  geom_boxplot(alpha = 0.5) +              # Overlay boxplot for distribution overview
  scale_color_manual(values = c(silence = "blue", synth = "red")) +  # Manual color assignment
  labs(x = "Group", y = "Performance", title = "Test Performance by Group") +
  theme_minimal()                          # Use a minimal theme for a clean look

The output shows a statistically significant difference: the synthwave group outperformed the silence group.

Great, your result is significant, but what does it mean? What if someone asks you about the practical effect of your findings? After all, it’s possible for findings to be statistically significant but too weak to matter much in practice.

That’s where Cohen’s d comes in. Cohen’s d is a statistical metric used to quantify the size of an effect observed in your research. It measures the difference between two means in standard deviation units, providing a scale-independent assessment of effect size. This measure is particularly useful when comparing the effectiveness of interventions or the impact of variables in various fields of study, including psychology, education, and the social sciences. For graduate students embarking on thesis writing, understanding and applying Cohen’s d can significantly enhance the depth of your analysis and the clarity of your findings.
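To make "scale-independent" concrete, here is a minimal sketch. The helper name cohens_d_pooled is hypothetical (it is not a base R function), and the scores are made up for illustration. It computes d once with scores in points and once with the same scores rescaled to proportions:

```r
# Hypothetical helper: Cohen's d for two independent samples,
# using the pooled standard deviation.
cohens_d_pooled <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  sd_pooled <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2))
  (mean(x) - mean(y)) / sd_pooled
}

# Made-up scores on a 0 to 100 scale
scores_a <- c(82, 75, 91, 68, 88, 79)
scores_b <- c(71, 64, 80, 59, 77, 70)

d_points <- cohens_d_pooled(scores_a, scores_b)

# The same scores expressed as proportions (0 to 1)
d_proportions <- cohens_d_pooled(scores_a / 100, scores_b / 100)

print(d_points)
print(d_proportions)
print(isTRUE(all.equal(d_points, d_proportions)))  # TRUE: rescaling leaves d unchanged
```

Because both the mean difference and the pooled standard deviation are divided by the same factor, the rescaling cancels out, which is why d can be compared across studies that use different measurement scales.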

The value of Cohen’s d tells us how many standard deviations apart the means of two groups are. A p value tells you how surprising the observed difference would be if there were no true effect; it says nothing about how big the effect is. Cohen’s d provides insight into the magnitude of the effect, offering a more nuanced understanding of your results.
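The gap between significance and magnitude is easy to see in a simulation. The sketch below is illustrative only: the sample size, means, and seed are assumptions chosen to make the point, not values from the synthwave example. With 100,000 students per group and a true difference of just 0.3 points, the t-test should come out clearly significant while d stays tiny:

```r
# Illustrative simulation: a tiny true difference with a very large sample.
set.seed(42)  # arbitrary seed, used only for reproducibility

n_big <- 100000                                # hypothetical per-group sample size
group_a <- rnorm(n_big, mean = 70.3, sd = 10)  # true difference of 0.3 points
group_b <- rnorm(n_big, mean = 70.0, sd = 10)

# Independent samples t-test assuming equal variances
big_test <- t.test(group_a, group_b, var.equal = TRUE)

# Cohen's d with a pooled standard deviation (equal group sizes)
sd_pooled_big <- sqrt((var(group_a) + var(group_b)) / 2)
d_big <- (mean(group_a) - mean(group_b)) / sd_pooled_big

print(big_test$p.value)  # expected to fall far below 0.05
print(d_big)             # expected to sit well under the 0.2 "small" benchmark
```

A reader who saw only the p value here might assume an important finding; the effect size shows that the difference is a small fraction of a standard deviation.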

Cohen suggested the following benchmarks for interpreting the size of d:

Small effect: d = 0.2. A small effect is one that is observable, but may not matter much in practical terms. It’s the kind of effect that requires a large sample size to detect statistically.
Medium effect: d = 0.5. This is considered a noticeable effect size, likely to be visible to the naked eye and of practical significance in many contexts.
Large effect: d = 0.8. A large effect is one that is very noticeable and likely to have practical implications in most situations. It is the kind of effect that can often be detected with a moderate sample size.
Very large effect: While Cohen did not explicitly define a cutoff for very large effects, values of d that are 1.2 or greater are often considered very large, indicating a difference of more than one standard deviation between groups. These effects are substantial and usually have important practical implications.
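The link between effect size and required sample size can be checked with base R's power.t.test. This is a sketch under stated assumptions: 80% power and a two-sided alpha of 0.05 are conventional choices, not values taken from this post, and setting sd = 1 simply makes delta equal to d.

```r
# Per-group sample size needed to detect each benchmark effect
# with 80% power at a two-sided alpha of 0.05.
benchmarks <- c(small = 0.2, medium = 0.5, large = 0.8)

n_per_group <- sapply(benchmarks, function(d) {
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
               type = "two.sample", alternative = "two.sided")$n
})

# Round up, since you cannot recruit a fraction of a participant
print(ceiling(n_per_group))
```

The required sample size falls steeply as d grows: a small effect needs hundreds of participants per group, while a large effect needs only a few dozen.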

In the synthwave example, d was over 0.85, above the 0.8 benchmark, and can therefore be considered a large effect.
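If you compute many effect sizes, a small lookup helper saves time. The function interpret_d below is a hypothetical name, and two of its choices are assumptions rather than anything Cohen prescribed: it treats each benchmark as the lower bound of a band, and it labels anything under 0.2 as "negligible".

```r
# Hypothetical helper: map Cohen's d onto the benchmark labels.
# The sign is ignored, because it only reflects which group was subtracted from which.
interpret_d <- function(d) {
  cutoffs <- c(0.2, 0.5, 0.8, 1.2)
  labels <- c("negligible", "small", "medium", "large", "very large")
  labels[findInterval(abs(d), cutoffs) + 1]
}

print(interpret_d(c(0.1, 0.35, 0.85, -1.4)))
# "negligible" "small" "large" "very large"
```

Treat the labels as a starting point for discussion rather than a verdict; what counts as a meaningful effect depends on your field and research question.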

In your thesis, when discussing experimental results or comparing groups, citing Cohen’s d alongside p values can provide a comprehensive view of your findings. It not only demonstrates the statistical significance of your results but also their practical significance, offering a robust argument for the relevance and impact of your research. Additionally, reporting effect sizes is increasingly seen as a best practice in research, enhancing the transparency, replicability, and interpretability of your findings.
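One way to make joint reporting a habit is to generate the p value and the effect size in the same step. The sketch below is only one possible approach: report_t_test is a hypothetical helper, and the output format is an assumption that you should adapt to your style guide. The data are regenerated with the same seed and settings as the synthwave example.

```r
# Hypothetical helper: run an equal-variance t-test and return
# the test statistic, p value, and Cohen's d in one string.
report_t_test <- function(x, y) {
  test <- t.test(x, y, var.equal = TRUE)
  n_x <- length(x)
  n_y <- length(y)
  sd_pooled <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2))
  d <- (mean(x) - mean(y)) / sd_pooled
  # Report very small p values as an inequality instead of "0.000"
  p_text <- if (test$p.value < 0.001) "p < .001" else sprintf("p = %.3f", test$p.value)
  sprintf("t(%d) = %.2f, %s, d = %.2f",
          as.integer(test$parameter), test$statistic, p_text, d)
}

# Same mock data as the synthwave example
set.seed(123)
synth <- rnorm(30, mean = 80, sd = 10)
silence <- rnorm(30, mean = 70, sd = 10)

result <- report_t_test(synth, silence)
print(result)
```

Passing synth first means the t statistic and d are both positive when the synth group scores higher, which keeps the sign of the two numbers consistent in your write-up.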

BridgeText can help you with all of your statistics needs.
