Generating z Scores in Python

Jan 6

Introduction

A z score expresses each data point as its distance from the mean, measured in standard deviations. For example, if a normally distributed variable, iq, has a mean of 100 and a standard deviation of 15, then an IQ of 100 has a z score of 0, an IQ of 85 has a z score of -1, and an IQ of 115 has a z score of 1. In this post, we'll show you how to use Python to compute a z score for every value of a normally distributed variable.
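As a quick check of that arithmetic, the sketch below plugs the example's mean (100) and standard deviation (15) into z = (x - mean) / sd. The helper name z_score is hypothetical and is used only for illustration.

```python
# Population parameters taken from the IQ example above
mean_iq = 100
sd_iq = 15


def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd


# Reproduce the three z scores quoted in the example
for score in (100, 85, 115):
    print(score, z_score(score, mean_iq, sd_iq))
```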

Generate Normally Distributed Data

First, let’s generate a normally distributed variable, iq, with 5,000 observations, a mean of 100, and a standard deviation of 15.

import numpy as np
import scipy.stats as stats
from numpy.random import normal

# Draw 5,000 values from a normal distribution with mean 100 and sd 15
iq2 = normal(loc=100, scale=15, size=5000)
# Round to whole IQ points, keeping a flat one-dimensional array
iq = np.round(iq2)
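The draw above changes every time you run it. If you want the same 5,000 values on each run, one option, assuming NumPy 1.17 or later, is a seeded Generator from np.random.default_rng. The seed value and the make_iq helper below are hypothetical choices for illustration, not part of the original code.

```python
import numpy as np

# Hypothetical seed; any fixed integer makes the draw repeatable
SEED = 42


def make_iq(seed, n=5000, mean=100, sd=15):
    """Draw n normal values and round them to whole IQ points."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=sd, size=n).round()


iq_a = make_iq(SEED)
iq_b = make_iq(SEED)
print(iq_a.shape)                   # one-dimensional array of 5,000 values
print(np.array_equal(iq_a, iq_b))   # same seed, same data
```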

You can confirm that the IQ values (rounded to whole numbers) have a mean very close to 100 and a standard deviation very close to 15:

print(np.mean(iq))
print(np.std(iq))
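Note that np.std uses ddof=0 by default, which divides by n (the population formula), while many statistics packages report the sample standard deviation, which divides by n - 1. The sketch below regenerates a sample so it runs on its own (the seed is an arbitrary assumption) and shows that the two estimates differ only by a factor of sqrt(n / (n - 1)), which is negligible with 5,000 observations.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, for repeatability
iq = rng.normal(loc=100, scale=15, size=5000).round()

# Default ddof=0: divide by n (population standard deviation)
sd_pop = np.std(iq)
# ddof=1: divide by n - 1 (sample standard deviation)
sd_sample = np.std(iq, ddof=1)

n = iq.size
print(sd_pop, sd_sample)
# The two estimates differ only by a factor of sqrt(n / (n - 1))
print(np.isclose(sd_sample, sd_pop * np.sqrt(n / (n - 1))))
```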

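Next, run the Shapiro-Wilk test on IQ. If p > .05, the test finds no significant departure from normality in this sample. Strictly, this means normality is not rejected rather than proven, and because the data are random, an occasional run may give p < .05 by chance even though the values were drawn from a normal distribution.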

shapiro_test = stats.shapiro(iq)
print(shapiro_test)
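The object returned by stats.shapiro exposes the test statistic and p-value as named attributes, so the decision rule can be written out explicitly. In this sketch the seed and the alpha of 0.05 are assumptions (0.05 is a convention, not a rule), and the sample is regenerated so the block runs on its own.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)  # hypothetical seed
iq = rng.normal(loc=100, scale=15, size=5000).round()

alpha = 0.05  # conventional significance level (an assumption)
result = stats.shapiro(iq)
print(f"W = {result.statistic:.4f}, p = {result.pvalue:.4f}")

# A large p-value means "no evidence against normality", not proof of it
if result.pvalue > alpha:
    print("Normality not rejected at alpha = 0.05")
else:
    print("Normality rejected at alpha = 0.05")
```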

Create z Scores

The following code subtracts the mean of IQ from every IQ value and divides the result by the standard deviation, producing a z score for each observation.

z = (iq-np.mean(iq))/np.std(iq)
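SciPy also provides scipy.stats.zscore, which uses ddof=0 by default and therefore matches the manual formula above. The sketch below (with an arbitrary seed so it runs on its own) checks that the two agree and that the resulting z scores have mean 0 and standard deviation 1, as they should by construction.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)  # hypothetical seed
iq = rng.normal(loc=100, scale=15, size=5000).round()

# Manual z scores, as in the post
z = (iq - np.mean(iq)) / np.std(iq)

# scipy.stats.zscore defaults to ddof=0, matching np.std
z_scipy = stats.zscore(iq)

print(np.allclose(z, z_scipy))             # same values either way
print(np.isclose(z.mean(), 0.0, atol=1e-9))  # standardized mean is 0
print(np.isclose(z.std(), 1.0))            # standardized sd is 1
```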

You can now view the IQ values and their z scores side by side in the Python integrated development environment (IDE) of your choice. You can also compute the z score for any IQ value, whether or not it appears in your sample, from the sample mean and standard deviation. For example, you can return the z score for an IQ of 125 as follows:

pred_z125 = (125-np.mean(iq))/np.std(iq)
print(pred_z125)
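If you need z scores for several arbitrary IQ values, a small helper avoids repeating the formula. The function name z_for and the seed are hypothetical; the helper simply applies the same (value - mean) / sd calculation, with np.std's default ddof=0, and prints each score next to its z score.

```python
import numpy as np

rng = np.random.default_rng(3)  # hypothetical seed
iq = rng.normal(loc=100, scale=15, size=5000).round()


def z_for(values, sample):
    """Z score(s) for any value(s), using the sample's mean and sd (ddof=0)."""
    return (np.asarray(values, dtype=float) - np.mean(sample)) / np.std(sample)


# A single score, which need not appear in the sample
print(z_for(125, iq))

# Several scores at once, printed next to their z scores
scores = np.array([70, 85, 100, 115, 130])
for score, z in zip(scores, z_for(scores, iq)):
    print(f"IQ {score:>3}: z = {z:+.2f}")
```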

BridgeText provides statistical testing, analysis, coding, and interpretation services.