How
Contents
How#
Calculate measures of spread and tendency#
Calculate a mean#
We can calculate the mean of a set of data using statistics.mean
which takes an
iterable.
Tip
statistics.mean(data)
For example to calculate the mean of \((1, 5, 10, 12, 13, 20)\):
import statistics as st
data = (1, 5, 10, 12, 13, 20)
st.mean(data)
10.166666666666666
Calculate a median#
We can calculate the median of a set of data using statistics.median
which takes an
iterable.
Tip
statistics.median(data)
For example to calculate the median of \((1, 5, 10, 12, 13, 20)\):
import statistics as st
data = (1, 5, 10, 12, 13, 20)
st.median(data)
11.0
Calculate the population standard deviation#
We can calculate the population standard deviation of a set of data using statistics.pstdev
which takes an
iterable.
Tip
statistics.pstdev(data)
For example to calculate the population standard deviation of \((1, 5, 10, 12, 13, 20)\):
import statistics as st
data = (1, 5, 10, 12, 13, 20)
st.pstdev(data)
6.039223643997813
Calculate the sample standard deviation#
We can calculate the sample standard deviation of a set of data using statistics.stdev
which takes an
iterable.
Tip
statistics.stdev(data)
For example to calculate the sample standard deviation of \((1, 5, 10, 12, 13, 20)\):
import statistics as st
data = (1, 5, 10, 12, 13, 20)
st.stdev(data)
6.6156380392723015
Calculate the population variance#
We can calculate the population variance of a set of data using statistics.pvariance
which takes an
iterable.
Tip
statistics.pvariance(data)
For example to calculate the population variance of \((1, 5, 10, 12, 13, 20)\):
import statistics as st
data = (1, 5, 10, 12, 13, 20)
st.pvariance(data)
36.47222222222222
Calculate the sample variance#
We can calculate the sample variance of a set of data using statistics.variance
which takes an
iterable.
Tip
statistics.variance(data)
For example to calculate the sample variance of \((1, 5, 10, 12, 13, 20)\):
import statistics as st
data = (1, 5, 10, 12, 13, 20)
st.variance(data)
43.766666666666666
Calculate the maximum#
We can calculate the maximum of a set of data use max
which takes an iterable:
Tip
max(data)
For example to calculate the maximum of \((1, 5, 10, 12, 13, 20)\):
data = (1, 5, 10, 12, 13, 20)
max(data)
20
Calculate the minimum#
We can calculate the minimum of a set of data use max
which takes an iterable:
Tip
min(data)
For example to calculate the minimum of \((1, 5, 10, 12, 13, 20)\):
data = (1, 5, 10, 12, 13, 20)
min(data)
1
Calculate quantiles#
To calculate cut points dividing data in to \(n\) intervals of equal probability
we can use statistics.quantiles
which takes an iterable and a number of
intervals.
Tip
statistics.quantiles(data, n)
For example to calculate the cut points that divide \((1, 5, 10, 12, 13, 20)\) in to 4 intervals of equal probability (in this case the quantiles are called quartiles):
import statistics as st
data = (1, 5, 10, 12, 13, 20)
st.quantiles(data, n=4)
[4.0, 11.0, 14.75]
Calculate the sample covariance#
To calculate the sample covariance of two data sets we can use
statistics.covariance
which takes two iterables.
Tip
statistics.covariance(first_data_set, second_data_set)
For example to calculate the sample covariance of \(x=(1, 5, 10, 12, 13, 20)\) and \(y=(3, -3, 6, -2, 1, 2)\):
import statistics as st
x = (1, 5, 10, 12, 13, 20)
y = (3, -3, 6, -2, 1, 2)
st.covariance(x, y)
1.1666666666666674
Calculate the Pearson correlation coefficient#
To calculate the correlation coefficient of two data sets we can use
statistics.correlation
which takes two iterables.
Tip
statistics.correlation(first_data_set, second_data_set)
For example to calculate the correlation coefficient of \(x=(1, 5, 10, 12, 13, 20)\) and \(y=(3, -3, 6, -2, 1, 2)\):
import statistics as st
x = (1, 5, 10, 12, 13, 20)
y = (3, -3, 6, -2, 1, 2)
st.correlation(x, y)
0.05325222181462787
Fit a line of best fit#
To carry out linear regression to fit a line of best fit between two data sets
we can use statistics.linear_regression
which takes two iterables and returns
a tuple with the slope and the intercept of the line.
Tip
statistics.linear_regression(first_data_set, second_data_set)
For example to calculate the correlation coefficient of \(x=(1, 5, 10, 12, 13, 20)\) and \(y=(-3, -14, -31, -6, -40, -70)\):
import statistics as st
x = (1, 5, 10, 12, 13, 20)
y = (-3, -14, -31, -6, -40, -70)
st.linear_regression(x, y)
LinearRegression(slope=-3.2338156892612333, intercept=5.543792840822537)
How to create an instance of the normal distribution#
A normal distribution with mean \(\mu\) and standard deviation \(\sigma\) can be
created using statistics.NormalDist
:
Tip
statistics.NormalDist(mu, sigma)
For example to create the normal distribution with \(\mu=3\) and \(\sigma=.5\):
import statistics as st
distribution = st.NormalDist(mu=3, sigma=.5)
distribution
NormalDist(mu=3.0, sigma=0.5)
How to use the cumulative distribution function of a normal distribution#
For an instance of a normal distribution with mean \(\mu\) and \(\sigma\), the
cumulative distribution function which gives \(F(x)=P(X<x)\) (the probability that
the normally distributed random variable is less than \(X\)) can be accessed using
statistics.NormaDist.cdf
.
Tip
distribution = statistics.NormalDist(mu, sigma)
distribution.cdf(x)
For example to find the probability that \(X<2\) for a normally distributed random variable with \(\mu=3\) and \(\sigma=.5\):
import statistics as st
distribution = st.NormalDist(mu=3, sigma=.5)
distribution.cdf(2)
0.02275013194817921
How to use the inverse cumulative distribution function of a normal distribution#
For an instance of a normal distribution with mean \(\mu\) and \(\sigma\), the
inverse cumulative distribution function which for a given \(p\) gives \(x\) such that \(p=P(X<x)\)
can be accessed using statistics.NormaDist.inv_cdf
.
Tip
distribution = statistics.NormalDist(mu, sigma)
distribution.inv_cdf(p)
For example to find the value of \(X\) for which a normally distributed random variable with \(\mu=3\) and \(\sigma=.5\) will be less than with probability \(.7\).
import statistics as st
distribution = st.NormalDist(mu=3, sigma=.5)
distribution.inv_cdf(.7)
3.2622002563540202