Solutions
Question 1
1. For each of the following sets of data, calculate:
- The mean,
- The median,
- The max,
- The min,
- The population standard deviation,
- The sample standard deviation,
- The population variance,
- The sample variance,
- The quartiles (the set of \(n=4\) quantiles),
- The deciles (the set of \(n=10\) quantiles).
1. `data_set_1 = (…)`
import statistics as st
data_set_1 = (
74,
-7,
58,
82,
60,
3,
49,
85,
24,
99,
73,
76,
11,
-4,
61,
87,
93,
13,
1,
28,
)
The mean,
st.mean(data_set_1)
48.3
The median,
st.median(data_set_1)
59.0
The max,
max(data_set_1)
99
The min,
min(data_set_1)
-7
The population standard deviation,
st.pstdev(data_set_1)
35.1441318003447
The sample standard deviation,
st.stdev(data_set_1)
36.05711842998112
The population variance,
st.pvariance(data_set_1)
1235.11
The sample variance,
st.variance(data_set_1)
1300.1157894736841
The quartiles (the set of \(n=4\) quantiles),
st.quantiles(data_set_1, n=4)
[11.5, 59.0, 80.5]
The deciles (the set of \(n=10\) quantiles),
st.quantiles(data_set_1, n=10)
[-3.5, 4.6, 16.3, 36.4, 59.0, 68.2, 75.4, 84.4, 92.4]
2. `data_set_2 = (…)`
import statistics as st
data_set_2 = (
65,
59,
81,
81,
76,
93,
91,
88,
55,
97,
86,
94,
79,
54,
63,
56,
58,
77,
85,
88,
)
The mean,
st.mean(data_set_2)
76.3
The median,
st.median(data_set_2)
80.0
The max,
max(data_set_2)
97
The min,
min(data_set_2)
54
The population standard deviation,
st.pstdev(data_set_2)
14.202464574854606
The sample standard deviation,
st.stdev(data_set_2)
14.571421200057106
The population variance,
st.pvariance(data_set_2)
201.71
The sample variance,
st.variance(data_set_2)
212.32631578947368
The quartiles (the set of \(n=4\) quantiles),
st.quantiles(data_set_2, n=4)
[60.0, 80.0, 88.0]
The deciles (the set of \(n=10\) quantiles),
st.quantiles(data_set_2, n=10)
[55.1, 58.2, 63.6, 76.4, 80.0, 83.4, 87.4, 90.4, 93.9]
3. `data_set_3 = (…)`
import statistics as st
data_set_3 = (
0.31,
-0.13,
0.19,
0.46,
-0.27,
-0.06,
0.20,
0.42,
-0.07,
0.11,
-0.11,
-0.43,
-0.36,
0.45,
-0.42,
0.11,
0.08,
0.31,
0.48,
0.17,
)
The mean,
st.mean(data_set_3)
0.07200000000000001
The median,
st.median(data_set_3)
0.11
The max,
max(data_set_3)
0.48
The min,
min(data_set_3)
-0.43
The population standard deviation,
st.pstdev(data_set_3)
0.28690765064738166
The sample standard deviation,
st.stdev(data_set_3)
0.2943610386118237
The population variance,
st.pvariance(data_set_3)
0.082316
The sample variance,
st.variance(data_set_3)
0.08664842105263158
The quartiles (the set of \(n=4\) quantiles),
st.quantiles(data_set_3, n=4)
[-0.125, 0.11, 0.31]
The deciles (the set of \(n=10\) quantiles),
st.quantiles(data_set_3, n=10)
[-0.414,
-0.242,
-0.098,
-0.003999999999999998,
0.11000000000000001,
0.18200000000000002,
0.277,
0.398,
0.4590000000000001]
4. `data_set_4 = (…)`
import statistics as st
data_set_4 = (
2,
4,
2,
2,
2,
2,
2,
3,
2,
2,
2,
4,
2,
4,
2,
2,
3,
4,
3,
4,
)
The mean,
st.mean(data_set_4)
2.65
The median,
st.median(data_set_4)
2.0
The max,
max(data_set_4)
4
The min,
min(data_set_4)
2
The population standard deviation,
st.pstdev(data_set_4)
0.852936105461599
The sample standard deviation,
st.stdev(data_set_4)
0.8750939799154206
The population variance,
st.pvariance(data_set_4)
0.7275
The sample variance,
st.variance(data_set_4)
0.7657894736842106
The quartiles (the set of \(n=4\) quantiles),
st.quantiles(data_set_4, n=4)
[2.0, 2.0, 3.75]
The deciles (the set of \(n=10\) quantiles),
st.quantiles(data_set_4, n=10)
[2.0, 2.0, 2.0, 2.0, 2.0, 2.6, 3.0, 4.0, 4.0]
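The same ten calls are repeated for every data set, so they can be gathered into one helper. The following is a minimal sketch: the function name `summarise` and the dictionary keys are illustrative choices, not part of the `statistics` module. It relies on `st.quantiles` using its default `method="exclusive"`, which is what the quartiles and deciles above correspond to.

```python
import statistics as st


def summarise(data):
    """Return the ten summary statistics asked for in question 1.

    `summarise` and its dictionary keys are hypothetical names chosen for
    this sketch; every calculation is a call already used above.
    """
    return {
        "mean": st.mean(data),
        "median": st.median(data),
        "max": max(data),
        "min": min(data),
        "population standard deviation": st.pstdev(data),
        "sample standard deviation": st.stdev(data),
        "population variance": st.pvariance(data),
        "sample variance": st.variance(data),
        "quartiles": st.quantiles(data, n=4),
        "deciles": st.quantiles(data, n=10),
    }


data_set_4 = (2, 4, 2, 2, 2, 2, 2, 3, 2, 2, 2, 4, 2, 4, 2, 2, 3, 4, 3, 4)
summary = summarise(data_set_4)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Calling `summarise` on each of the four data sets in turn should reproduce the values listed for that data set.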
Question 2
2. Calculate the sample covariance and the correlation coefficient for the following pairs of data sets from question 1:
1. `data_set_1` and `data_set_4`
st.covariance(data_set_1, data_set_4)
-12.468421052631578
st.correlation(data_set_1, data_set_4)
-0.39515342199380205
2. `data_set_3` and `data_set_4`
st.covariance(data_set_3, data_set_4)
0.04126315789473684
st.correlation(data_set_3, data_set_4)
0.1601870630717755
3. `data_set_2` and `data_set_3`
st.covariance(data_set_2, data_set_3)
0.057263157894736905
st.correlation(data_set_2, data_set_3)
0.013350362425512118
4. `data_set_1` and `data_set_2`
st.covariance(data_set_1, data_set_2)
77.16842105263159
st.correlation(data_set_1, data_set_2)
0.1468745962708178
Question 3
3. For each of the data sets from question 1, obtain the covariance and correlation coefficient of the data set with itself.
1. `data_set_1 = (…)`
st.covariance(data_set_1, data_set_1)
1300.1157894736843
st.correlation(data_set_1, data_set_1)
1.0
2. `data_set_2 = (…)`
st.covariance(data_set_2, data_set_2)
212.32631578947368
st.correlation(data_set_2, data_set_2)
1.0
3. `data_set_3 = (…)`
st.covariance(data_set_3, data_set_3)
0.08664842105263158
st.correlation(data_set_3, data_set_3)
1.0
4. `data_set_4 = (…)`
st.covariance(data_set_4, data_set_4)
0.7657894736842106
st.correlation(data_set_4, data_set_4)
1.0
Question 4
4. Obtain a line of best fit for the pairs of data sets from question 2.
1. `data_set_1` and `data_set_4`
st.linear_regression(data_set_1, data_set_4)
LinearRegression(slope=-0.009590238926087555, intercept=3.113208540130029)
2. `data_set_3` and `data_set_4`
st.linear_regression(data_set_3, data_set_4)
LinearRegression(slope=0.47621361582195443, intercept=2.6157126196608194)
3. `data_set_2` and `data_set_3`
st.linear_regression(data_set_2, data_set_3)
LinearRegression(slope=0.00026969411531406506, intercept=0.05142233900153683)
4. `data_set_1` and `data_set_2`
st.linear_regression(data_set_1, data_set_2)
LinearRegression(slope=0.05935503720316409, intercept=73.43315170308718)
Question 5
5. Given a collection of 250 individuals whose height is normally distributed with mean 165 and standard deviation 5, what is the expected number of individuals with height between 150 and 160?
We start by creating the distribution:
distribution = st.NormalDist(165, 5)
distribution
NormalDist(mu=165.0, sigma=5.0)
Now let us find the probability of the random variable being between 150 and 160:
probability = distribution.cdf(160) - distribution.cdf(150)
probability
0.15730535589982697
The expected number of individuals is thus given by:
probability * 250
39.32633897495674
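The same probability can be obtained by standardising: heights of 150 and 160 are 3 and 1 standard deviations below the mean, so the calculation reduces to the standard normal distribution. A minimal sketch of that cross-check (the variable names are illustrative):

```python
import math
import statistics as st

mean, standard_deviation = 165, 5
standard_normal = st.NormalDist()  # defaults to mean 0, standard deviation 1

# Convert the two heights to z-scores.
z_lower = (150 - mean) / standard_deviation
z_upper = (160 - mean) / standard_deviation

probability = standard_normal.cdf(z_upper) - standard_normal.cdf(z_lower)
expected_number = 250 * probability
print(z_lower, z_upper, expected_number)
```

Since the number of individuals must be a whole number, the result is best read as roughly 39 individuals.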
Question 6
6. Consider a class test where the scores are normally distributed with mean 65 and standard deviation 5.
1. What is the probability of failing the class test (a score less than 40)?
We start by creating the distribution:
distribution = st.NormalDist(65, 5)
distribution
NormalDist(mu=65.0, sigma=5.0)
The probability is given by:
distribution.cdf(40)
2.8665157186802404e-07
2. What proportion of the class gets a first class mark (a score above 70)?
The probability is given by:
1 - distribution.cdf(70)
0.15865525393145707
3. What is the mark that only 5% of the class would expect to get more than?
For this, we use the inverse cdf, evaluated at \(0.95\) rather than \(0.05\): a mark that only 5% of the class scores above is equivalent to a mark that 95% of the class scores below.
distribution.inv_cdf(.95)
73.22426813475735
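As a check, the mark found above can be fed back into the cdf: the proportion of the class scoring above it should come out as 5%. A minimal sketch (the names `mark` and `proportion_above` are illustrative):

```python
import math
import statistics as st

distribution = st.NormalDist(65, 5)
mark = distribution.inv_cdf(0.95)

# Proportion of the class expected to score above the mark.
proportion_above = 1 - distribution.cdf(mark)
print(mark, proportion_above)
```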