The regression model being estimated is
\[ \text{Salary} = \beta_0 + \beta_1\,\text{Master\_Degree} + \varepsilon \]
where Salary is the monthly salary (in dollars) and
Master_Degree is the number of years since being awarded a master's degree.
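a) The error term is \(\varepsilon\). It captures the variation in monthly salary that is not explained by Master_Degree.
The assumptions are the standard ones for simple linear regression: the errors have mean zero, have constant variance, are independent of one another, and are normally distributed.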
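b) The estimated values of the coefficients are read from the regression output.
The estimated value of the intercept is \(\hat{\beta}_0 \approx 6589.12\) (the value implied by the slope below and the prediction in part g).
The estimated value of the slope is \(\hat{\beta}_1 = 150.95\).
Ans: The estimated equation is
\[ \widehat{\text{Salary}} = 6589.12 + 150.95\,\text{Master\_Degree} \]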
c) The estimated value of the slope coefficient is 150.95. The positive value indicates that the number of years since being awarded a master's degree and the monthly salary move in the same direction. For each additional year since the degree was awarded, the estimated monthly salary increases by $150.95 on average.
This is reasonable, provided the person is gainfully employed in the field of interest (for example, as a data scientist) during these years, so that the years since the degree correspond to years of work experience.
d) A 90% confidence interval corresponds to a significance level of \(\alpha = 0.10\).
The critical t value is obtained using \(\alpha/2 = 0.05\).
The number of observations is n=80. The degrees of freedom for t is n-2=80-2=78.
Using the t tables, the critical value of t for \(\alpha/2 = 0.05\) is 1.671 for df=60 and 1.658 for df=120.
The value for df=78 lies between these two values. We can either interpolate or use Excel to get the exact value.
Using =T.INV.2T(0.1,78) we get a t value of 1.665
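The interpolation option mentioned above can be sketched in a few lines of Python. This is only an illustration: the helper name interpolate_t is made up for this example, and the choice between interpolating in df or in 1/df is an assumption about how one reads the table, not something the question specifies.

```python
def interpolate_t(df, df_lo, t_lo, df_hi, t_hi, harmonic=True):
    """Interpolate a critical t value between two rows of a t table.

    harmonic=True interpolates in 1/df, which tracks the table more
    closely than interpolating in df directly.
    """
    if harmonic:
        weight = (1 / df - 1 / df_hi) / (1 / df_lo - 1 / df_hi)
    else:
        weight = (df_hi - df) / (df_hi - df_lo)
    # weight is the share given to the lower-df (larger) table value
    return t_hi + weight * (t_lo - t_hi)


# Table values quoted above: t = 1.671 at df=60 and t = 1.658 at df=120
t_linear = interpolate_t(78, 60, 1.671, 120, 1.658, harmonic=False)
t_harmonic = interpolate_t(78, 60, 1.671, 120, 1.658)
print(round(t_linear, 4), round(t_harmonic, 4))
```

Interpolating in 1/df lands on the same 1.665 that Excel returns, while plain linear interpolation gives a slightly larger value of about 1.667.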
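We know the standard error of the slope estimate from the output: \(s_{\hat{\beta}_1} \approx 13.52\) (consistent with the slope and the t statistic in part e, 150.95/11.165).
The 90% confidence interval is
\[ \hat{\beta}_1 \pm t_{0.05,\,78}\; s_{\hat{\beta}_1} = 150.95 \pm 1.665 \times 13.52 = 150.95 \pm 22.51 \]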
ans: 90% confidence interval for the slope is [128.44,173.46]
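As a quick check of the interval arithmetic, here is a minimal Python sketch. It assumes the standard error of the slope can be recovered as slope divided by the t statistic reported in part e (150.95/11.165), since the output table itself is not shown. The variable names are illustrative.

```python
slope = 150.95    # estimated slope from part c
t_stat = 11.165   # t statistic for the slope from part e
t_crit = 1.665    # critical t for alpha/2 = 0.05 and df = 78

# Assumption: standard error inferred from slope / t statistic
se_slope = slope / t_stat

margin = t_crit * se_slope
lower, upper = slope - margin, slope + margin
print(f"SE = {se_slope:.2f}, 90% CI = [{lower:.2f}, {upper:.2f}]")
```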
e) There would be a positive relationship between monthly salary and number of years, if the slope coefficient is positive (that is >0).
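That is, we want to test the following hypotheses:
\[ H_0: \beta_1 \le 0 \quad \text{versus} \quad H_a: \beta_1 > 0 \]
The hypothesized value of the slope is \(\beta_1 = 0\).
The test statistic is
\[ t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{150.95}{13.52} = 11.165 \]
This is a one-tailed (right-tailed) test (the alternative hypothesis has ">").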
the p-value is P(T>11.165).
Using the excel function, =T.DIST.RT(11.165,78) we get p-value=3.83E-18
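For readers without Excel, the right-tail probability can be approximated with the Python standard library alone. The sketch below is not what T.DIST.RT does internally. It simply integrates the Student t density numerically with Simpson's rule. The function names, integration width, and step count are choices made for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_right_tail(t, df, width=60.0, steps=6000):
    """Approximate P(T > t) with Simpson's rule on [t, t + width].

    Assumes the density beyond t + width is negligible, which holds
    for moderate df such as the 78 used here. steps must be even.
    """
    h = width / steps
    total = t_pdf(t, df) + t_pdf(t + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3


p_value = t_right_tail(11.165, 78)
print(f"p-value = {p_value:.2e}")
```

The result is on the order of 1e-18, in line with the Excel value quoted above, so the conclusion does not depend on which tool is used.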
We will reject the null hypothesis if the p-value is less than the significance level.
Here the p-value of 0.000 is less than the significance level 0.05.
Hence we reject the null hypothesis.
We conclude that there is a significant positive relationship between monthly salary and number of years.
f) Using the ANOVA table

the degrees of freedom for Residuals is df=n-2=80-2=78
The Mean square residuals is MSE=126408.2.
The sum of squares of the residuals is SSE = MSE × df = 126408.2 × 78 = 9859839.6
Sum of square Total is SST=20514650.04
The coefficient of determination is
\[ R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{9859839.6}{20514650.04} = 0.5194 \]
The value of the coefficient of determination is 0.5194. It indicates that 51.94% of the variation in monthly salary is explained by the variation in the number of years since being awarded a master's degree.
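The ANOVA arithmetic in part f can be reproduced with a short Python sketch. It uses only the figures quoted above (MSE, residual degrees of freedom, and SST). The variable names are illustrative.

```python
mse = 126408.2        # mean square of residuals from the ANOVA table
df_resid = 78         # residual degrees of freedom, n - 2
sst = 20514650.04     # total sum of squares

sse = mse * df_resid  # residual sum of squares
ssr = sst - sse       # regression (explained) sum of squares
r_squared = 1 - sse / sst

print(f"SSE = {sse:.1f}, SSR = {ssr:.2f}, R^2 = {r_squared:.4f}")
```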
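g) The expected value of salary for Master_Degree=20 is
\[ \widehat{\text{Salary}} = \hat{\beta}_0 + \hat{\beta}_1 \times 20 = \hat{\beta}_0 + 150.95 \times 20 = 9608.12 \]
The expected monthly salary of a data scientist who was awarded a master's degree 20 years ago is $9,608.12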