(a) Generating the summary statistics in Excel, we get the output:
Substituting the means and standard deviations into CV = (s / mean) × 100 to compute the coefficient of variation (in %):
Variable | Mean | s | CV(%) |
x1 | 79.04 | 12.28 | (12.28 / 79.04) × 100 = 15.54 |
x2 | 79.48 | 12.50 | 15.73 |
x3 | 81.48 | 11.77 | 14.44 |
x4 | 162.04 | 24.04 | 14.84 |
Comparing the spread relative to the mean is the same as comparing the CVs. The CVs of all four variables are approximately equal (between about 14.4% and 15.7%), so they do not appear to differ appreciably. We may say that:
Yes, the spread is about the same. Yes, the tests are about the same level of difficulty.
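The CV calculation above can be sketched in Python as follows. This is a minimal illustration using the rounded means and standard deviations from the summary table; the function name `coefficient_of_variation` is an assumption made for this example, and s = 12.28 is used for x1 because that is the value consistent with the reported CV of 15.54. Because the inputs are rounded, a result may differ from the table in the last digit.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage."""
    return sd / mean * 100


# (mean, standard deviation) pairs, rounded as in the summary table.
# s = 12.28 for x1 is the value consistent with the reported CV of 15.54.
summary = {
    "x1": (79.04, 12.28),
    "x2": (79.48, 12.50),
    "x3": (81.48, 11.77),
    "x4": (162.04, 24.04),
}

cvs = {name: coefficient_of_variation(m, s) for name, (m, s) in summary.items()}

for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.2f}%")

# Gap between the largest and smallest CV, in percentage points
cv_range = max(cvs.values()) - min(cvs.values())
print(f"Range of CVs: {cv_range:.2f} percentage points")
```

A range of little more than one percentage point supports the conclusion that the relative spread is about the same for all four variables.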
(b) Computing the correlation coefficient:
Pair | r | r^{2} |
x1,x2 | 0.90 | 0.81 |
x1,x3 | 0.89 | 0.80 |
x1,x4 | 0.95 | 0.90 |
x2,x3 | 0.85 | 0.72 |
x2,x4 | 0.93 | 0.86 |
x3,x4 | 0.97 | 0.95 |
The highest correlation coefficient, 0.97, is observed between x_{3} and x_{4}. Hence, we may say that:
x_{3}, because it has the highest correlation with x_{4}. Yes, the other two still have a lot of influence because of their high correlations with x_{4} (0.95 for x_{1} and 0.93 for x_{2}).
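The comparison in part (b) can be sketched as a short Python snippet. It uses only the rounded correlations with x4 reported in the table, and the variable names are assumptions made for this example. Squaring a rounded r can differ from the tabulated r^{2} in the second decimal place, since the table was presumably computed from unrounded values.

```python
# Correlations of each predictor with x4, rounded as in the table
r_with_x4 = {"x1": 0.95, "x2": 0.93, "x3": 0.97}

# Predictor with the largest absolute correlation with x4
strongest = max(r_with_x4, key=lambda name: abs(r_with_x4[name]))

# r^2: proportion of the variation in x4 explained by each predictor alone
r_squared = {name: r ** 2 for name, r in r_with_x4.items()}

print(f"Strongest single predictor of x4: {strongest}")
for name, value in r_squared.items():
    print(f"{name}: r^2 is about {value:.2f}")
```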
(c) Running a multiple regression by regressing x_{4} on the predictors x_{1},x_{2} and x_{3}, we get the output:
(d) The fitted regression equation can be expressed as:
\hat{x}_{4} = -4.34 + 0.36x_{1} + 0.54x_{2} + 1.17x_{3}
where the intercept is estimated to be -4.34 and the slope of x_{1} is estimated to be 0.36, the other predictors in the model being held constant; similarly, the slopes of x_{2} and x_{3} are estimated to be 0.54 and 1.17. Hence, we may say that:
If we hold all other explanatory variables fixed, then each coefficient can be interpreted as a slope.
Here, the slope of x_{3} can be interpreted as follows: the mean of x_{4} increases by 1.17 units for each unit increase in x_{3}, with x_{1} and x_{2} held fixed. Hence, if the x_{3} score increases by 14, x_{4} is expected to increase by (14)(1.17) = 16.38 points.
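The slope interpretation can be checked numerically with a small Python sketch. It assumes the rounded coefficients quoted in this solution (-4.34, 0.36, 0.54, 1.17); the function name `predict_x4` and the baseline scores of 70 are hypothetical choices made only for illustration.

```python
# Rounded coefficients quoted in this solution
INTERCEPT = -4.34
SLOPES = {"x1": 0.36, "x2": 0.54, "x3": 1.17}


def predict_x4(x1, x2, x3):
    """Predicted x4 from the fitted equation with rounded coefficients."""
    return INTERCEPT + SLOPES["x1"] * x1 + SLOPES["x2"] * x2 + SLOPES["x3"] * x3


# Hypothetical baseline scores, chosen only for illustration
baseline = predict_x4(70, 70, 70)

# Raise x3 by 14 points while holding x1 and x2 fixed
raised = predict_x4(70, 70, 70 + 14)

change = raised - baseline
print(f"Change in predicted x4: {change:.2f}")
```

The change depends only on the x3 slope and the size of the increase, not on the baseline chosen, which is what holding the other predictors fixed means.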
(e) We test the significance of the slope coefficients by testing the hypotheses:
H_{0}: \beta_{i} = 0
Vs
H_{1}: \beta_{i} \neq 0, for i = 1, 2, 3
Predictor | t | P-value |
x1 | 2.93 | 0.008 |
x2 | 5.38 | 0.000 |
x3 | 11.33 | 0.000 |
Since the p-values of the t tests for all three predictors are below the 5% level of significance (0.008, 0.000, 0.000 < 0.05), we reject all three null hypotheses; there is sufficient evidence that the coefficients of x_{1}, x_{2} and x_{3} differ from zero.
If a coefficient is different from zero, then it contributes to the regression equation.
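The decision rule used in part (e) can be sketched in Python as below. The p-values are the rounded ones from the table, and their assignment to x1, x2 and x3 in that order is an assumption, since the table rows are unlabeled. A reported 0.000 means the p-value rounds to zero at three decimals, not that it is exactly zero.

```python
ALPHA = 0.05  # 5% level of significance

# Rounded p-values from the table; row order x1, x2, x3 is assumed
p_values = {"x1": 0.008, "x2": 0.000, "x3": 0.000}

decisions = {
    name: "reject H0" if p < ALPHA else "fail to reject H0"
    for name, p in p_values.items()
}

for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]:.3f} -> {decision}")
```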
(f) The 90% CI for each slope can be constructed using the formula:
b_{i} \pm t_{c} SE(b_{i})
where the critical value of t is obtained as:
t_{c} = 1.714
Predictor | Lower Limit | Upper Limit |
x1 | 0.36 - 1.714 × 0.12 = 0.15 | 0.36 + 1.714 × 0.12 = 0.56 |
x2 | 0.37 | 0.72 |
x3 | 0.99 | 1.34 |
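The interval for the first slope can be reproduced with a short Python sketch. It assumes the rounded estimate 0.36, standard error 0.12 and critical value 1.714 quoted above; the function name `slope_confidence_interval` is hypothetical. Because these inputs are rounded, the limits may differ from the table in the last digit.

```python
def slope_confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper) limits of estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin


# Rounded values quoted for the first slope in this solution
lower, upper = slope_confidence_interval(estimate=0.36, std_error=0.12, t_critical=1.714)

print(f"90% CI for the slope: {lower:.3f} to {upper:.3f}")

# The interval excludes zero, in line with rejecting H0 in part (e)
excludes_zero = lower > 0 or upper < 0
print(f"Interval excludes zero: {excludes_zero}")
```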
For x_{1} = 68, x_{2} = 72, x_{3} = 75, the predicted value of x_{4} is:
\hat{x}_{4} = -4.34 + 0.36(68) + 0.54(72) + 1.17(75)
≈ 146.49 ≈ 146
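The prediction can be evaluated with a few lines of Python. This sketch plugs the given scores into the equation using the coefficients rounded to two decimals as printed here, so its result (about 146.8) is slightly above the 146.49 quoted above, which presumably comes from the full-precision regression output. The dictionary names are assumptions made for this example.

```python
# Coefficients rounded to two decimals, as printed in this solution
coefficients = {"intercept": -4.34, "x1": 0.36, "x2": 0.54, "x3": 1.17}

# Scores at which x4 is to be predicted
scores = {"x1": 68, "x2": 72, "x3": 75}

predicted_x4 = coefficients["intercept"] + sum(
    coefficients[name] * value for name, value in scores.items()
)

print(f"Predicted x4 with rounded coefficients: {predicted_x4:.2f}")
```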
