Pick a minimum of 20 observations on any subject. This will include a dependent variable plus two independent variables that you may think are either negatively or positively correlated with the dependent variable. List the observed data (include the source). Then do the following:
a. State before doing any calculations whether you think they are positively or negatively correlated. What is your rationale?
Example: I test for a correlation between the quantity of coffee that people buy (Y) with the price of coffee (X1) and the household income (X2).
I hypothesize that there is a negative correlation between quantity and price because people like to buy goods at lower rather than higher prices. I also hypothesize that there is a positive correlation between the quantity of coffee and household income because people can buy more coffee when their income increases.
b. Draw a graph of each of the two independent variables with the dependent variable either by hand or by using Excel. (Do this by inserting an XY/Scatter chart.)
c. Use Excel to do the necessary regression. Give the values for the y-intercept, b1 and b2. Write out the equation. Also show R-square, the F-statistic and its p-value and the t-statistics with their respective p-values.
d. Test for multicollinearity using the rule that the two independent variables are multicollinear if their correlation coefficient is .70 or greater (implying r-square is .49 or greater). If they are multicollinear, give a brief statement on why you think that is the case.
e. Pretend that this was an assignment from your manager and communicate your findings to the manager in 100 words or less. You should assume in preparing this memo:
I ONLY NEED HELP WITH (E). THE REST IS JUST FOR REFERENCE. Thank you!!!
Answer: In a regression model, Y is known as the dependent variable, whose value depends upon some Xs, which are independent variables. In this case, the quantity of coffee bought depends on the price and the household income. Here, Y is the amount of coffee bought and X1 = price of coffee and X2 = household income.
To see whether there is a relationship between the given Y and Xs, we fit a regression model. This is a statistical model that tells us whether a set of variables affects a particular dependent variable and, if so, how much the dependent variable changes with a 1-unit change in an independent variable. To fit this model, we use the following steps:
a. Draw the scatter plot of each X against Y. This helps us see whether there is a linear relationship between the variables, because linear regression is appropriate only when the relationship between the dependent variable and the independent variable(s) is roughly linear.
b. Fit the regression model. The regression model in this case is
$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Here, $\beta_0$ is the y-intercept. It is the predicted value of Y when X1 and X2 are both 0, that is, the amount of coffee the model predicts at zero price and zero household income.
X1 = price of coffee. $\beta_1$ is the amount by which Y changes when X1 changes by 1 unit while X2 is held constant. Thus, if the price of coffee increases by 1 unit, the amount of coffee purchased changes by $\beta_1$ units.
X2 = household income. $\beta_2$ is the amount by which Y changes when X2 changes by 1 unit while X1 is held constant. Thus, if household income increases by 1 unit, the amount of coffee purchased changes by $\beta_2$ units.
For these independent variables to affect the dependent variable, $\beta_1$ and $\beta_2$ must not be equal to 0. To test this, we use the t-test for the coefficients. For each coefficient, we hypothesize that it is 0 and use the t-test to see whether the data contradict that hypothesis. If a coefficient is 0, the corresponding X has no linear relationship with Y in this model. We decide on the basis of a p-value, which comes as an output of the t-test for coefficients. The p-value is the probability, assuming the coefficient really is 0, of obtaining a t-statistic at least as extreme as the one observed. For most tests, we use a 5% significance level. That is, if the p-value is less than 0.05, we reject the hypothesis and conclude that the coefficient is not 0; otherwise, we cannot conclude that it differs from 0.
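The fitting and t-test steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Excel output the assignment asks for: the price, income, and quantity values are synthetic numbers invented here, the helper names `solve` and `fit_ols` are hypothetical, and instead of computing p-values the sketch compares each t-statistic with 2.110, the two-sided 5% critical value of the t distribution with 17 degrees of freedom (20 observations minus 3 coefficients).

```python
import math

def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(x1, x2, y):
    """Return (coefficients, standard errors, t-statistics) for y = b0 + b1*x1 + b2*x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    residuals = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in residuals) / (len(y) - k)  # residual variance
    inv_diag = []
    for j in range(k):  # j-th diagonal entry of the inverse of X'X
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_diag.append(solve(xtx, unit)[j])
    std_err = [math.sqrt(sigma2 * d) for d in inv_diag]
    t_stats = [b / se for b, se in zip(beta, std_err)]
    return beta, std_err, t_stats

# Synthetic, made-up data (NOT real observations): 20 rows on a price/income grid.
price = [2.0 + 0.5 * (i % 5) for i in range(20)]
income = [30.0 + 5.0 * (i // 5) for i in range(20)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2]
quantity = [20.0 - 3.0 * p + 0.4 * inc + noise[i % 5]
            for i, (p, inc) in enumerate(zip(price, income))]

beta, std_err, t_stats = fit_ols(price, income, quantity)
T_CRITICAL = 2.110  # assumed: two-sided 5% critical value for 17 degrees of freedom
for name, b, t in zip(["intercept", "price", "income"], beta, t_stats):
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{name}: coefficient={b:.3f}, t={t:.2f}, {verdict}")
```

With real data, the same quantities appear in Excel's regression output as the Coefficients, Standard Error, and t Stat columns, alongside the p-values that this sketch does not compute.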
Also, since this is a model, it can be used to predict the value of the dependent variable for any chosen values of the independent variables. For those predictions to be useful, the model must fit the data well. How well it fits is summarized by the value of R2.
R-squared (R2) is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model. Whereas correlation explains the strength of the relationship between an independent and dependent variable, R-squared explains to what extent the variance of one variable explains the variance of the second variable. So, if the R2 of a model is 0.50, then approximately half of the observed variation can be explained by the model's inputs.
The higher the R2, the better the model fits the observed data.
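As a small worked example of the R2 definition above, the sketch below computes one minus the ratio of unexplained variation to total variation. The function name `r_squared` and the four actual and predicted values are hypothetical, chosen only to keep the arithmetic easy to follow; the predictions are not the output of an actual fit.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions account for."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in actual)                # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical values, not real observations.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 6.0, 7.0]
r2 = r_squared(actual, predicted)
print(f"R2 = {r2:.2f}")
```

Here the unexplained variation is 2 and the total variation is 20, so R2 is 1 - 2/20 = 0.90: the predictions account for about 90% of the variation in the observed values.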
There's also the concept of multicollinearity. This happens when the independent variables are correlated with each other. In that case it is hard to separate their individual effects: the coefficient estimates become unstable and their standard errors grow, so the t-tests may understate the importance of a variable. A separate question is whether the variables interact, that is, whether the effect of one depends on the level of the other. For example, to check whether the effect of price depends on household income, we would introduce a third variable, Price of Coffee*Household Income, which captures the interaction effect of these variables on the amount of coffee purchased.
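The multicollinearity rule from part (d) of the question can be checked with a short sketch like the one below. The helper names `pearson_r` and `is_multicollinear` and the five price and income values are hypothetical. Applying the 0.70 threshold to the absolute value of the correlation is an assumption made here, so that a strong negative correlation is flagged as well.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def is_multicollinear(xs, ys, threshold=0.70):
    """Apply the 0.70 rule; using abs() is an assumption of this sketch."""
    return abs(pearson_r(xs, ys)) >= threshold

# Hypothetical values, not real observations: income rises along with price.
price = [2.0, 2.5, 3.0, 3.5, 4.0]
income = [30.0, 34.0, 41.0, 44.0, 52.0]
r = pearson_r(price, income)
print(f"r = {r:.3f}, r-square = {r * r:.3f}, multicollinear: {is_multicollinear(price, income)}")
```

In Excel, the same coefficient can be obtained with the CORREL function applied to the two independent-variable columns.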
In estimating a regression based on monthly observations from January 1987 to December 2002 inclusive, you find that the coefficient on the independent variable is positive and significant at the 0.05 level. You are concerned, however, that the t−statistic on the independent variable may be inflated because of serial correlation between the error terms. Therefore, you examine the Durbin-Watson statistic, which is 1.8953 for this regression. (3.1) Based on the value of the Durbin-Watson statistic, what can you say about...
Dummy Variable Regression: Choose any metric variable as the
dependent variable (you can use the same one that you used in Part
A) and choose gender as an independent variable. Also choose one
more metric variable as an additional independent variable. Once
again, however, you must sort through the metric independent
variables until you find one that, along with gender, produces a
significant F-calc. Use alpha = .05 here as well. You
only need to report the model that produced...
The equation of the regression line between two variables x (independent variable) and y (dependent variable) is given by y-hat = -3x + 2; and the correlation coefficient is r = -.95. The possible x-values range from 1 to 10. Which of the following statements are correct? I. The variable y is strongly positive correlated to the variable x. II. The variable y is strongly negative correlated to the variable x. III. If x = 5, one would predict that...
2. According to Cohen's (1988) guidelines, an r of -0.56 would be considered a correlation 3. If two variables are correlated people who have low scores on one variable will tend to have low scores on the other variable. 4. Calculating a correlation coefficient is only appropriate when there is a relation between two variables. 5. A correlation value of would indicate that there was no association between the two variables. 6. regression enables one to predict an individual's score...
4. Part of an Excel output relating X (independent variable) and Y (dependent variable) is shown below. Fill in all the blanks marked with "?". Summary Output Regression Statistics Multiple R ? R Square ? Adjusted R Square 0.8125 Standard Error 1.3693064 Observations 7 ANOVA df SS MS F Significance F Regression ? 50.625 ? ? ? Residual ? 9.375 ? Total 6 60 Coefficients Standard Error. t Stat P-value Lower 95% Intercept 13.75 1.398341. 9.833082 0.0001853 10.15555 x -1.125...
Suppose you estimate a Linear Regression with quantity of sales as the dependent variable and price and income as independent variables. From this Linear Regression, you get an Adjusted R-squared of 0.2045. When you add the month of the year as an independent variable to the Linear Regression, the Adjusted R- squared is 0.1846. What does this indicate? a) The Goodness-of-Fit as measured by Adjusted R-squared has gotten better b) Adding the month of the year as an independent variable...
Consider a multiple regression model of the dependent variable y on independent variables x1, X2, X3, and x4: Using data with n 60 observations for each of the variables, a student obtains the following estimated regression equation for the model given: y0.35 0.58x1 + 0.45x2-0.25x3 - 0.10x4 He would like to conduct significance tests for a multiple regression relationship. He uses the F test to determine whether a significant relationship exists between the dependent variable and He uses the t...
4. Testing for significance Aa Aa Consider a multiple regression model of the dependent variable y on independent variables x1, x2, X3, and x4: Using data with n = 60 observations for each of the variables, a student obtains the following estimated regression equation for the model given: 0.04 + 0.28X1 + 0.84X2-0.06x3 + 0.14x4 y She would like to conduct significance tests for a multiple regression relationship. She uses the F test to determine whether a significant relationship exists...
13. Regressions for Decision Making (20 points) The station manager of a local television station is interested in predicting the will watch in the viewing area. The explanatory variables are: age (n years years), and family size (number of family members in household). The multiple regression n predicting the amount of television (in hours) that people education (highest level obtained, in output from Excel is shown 0 6644 05598 R-Square of Estimate ANOVA Table 13.9682 5.6413 4.6561 0.3134 14.8564 0,0000...
(16 pts) Suppose you have the output from an Excel linear regression. The dependent variable is ntrip, see definitions below Regression Statistics Multiple R R Square Adjusted R Square Standard Error Observations 0.534 0.386 0.370 1.414 785 ANOVA df sS MS Regression Residual Total 2 113.5355 56.76777 156.2694 782 284.0761 0.363269 784 397.6116 Standard Coefficients Error tStat P-value Intercept hhsize wrkrcnt 1.500 0.250 0.150 0.049 20.7860.000 0.016 12.857 .000 0.027 5.551 0.000 NAME |Type- ntrip Numeric # of trips made...