Question

Use the data set Orange in R. Include the R code as well as the output in your file, with appropriate answers to the questions. Answer the following questions in your document:
1. Fit a simple linear regression using circumference as response and age as predictor. Is there a significant linear relationship between the two variables? State the null and alternative hypotheses, the test statistic and the p-value.
2. Find which observation has the largest residual (in absolute value). Give the value of this residual, the response value of the observation associated with this residual, and the fitted value for this observation.
3. Construct a 90% CI for the true mean circumference at age equal to 600.
4. Construct a 90% PI for a new observation at age equal to 800.
Do parts 2, 3 and 4.
Answer #1

2) First we fit a linear regression with circumference as the response and age as the predictor. Then we compute the residuals from the fitted model and find the largest absolute residual, together with the corresponding response value and fitted value. The R code is given below:

d <- Orange
y <- d$circumference  # response
x <- d$age            # predictor

l <- lm(y ~ x)        # fitted model
summary(l)
y_hat <- predict(l)   # fitted values of y
res <- abs(y - y_hat) # absolute residuals
maxres <- max(res)    # largest absolute residual
idx <- which(res == maxres)  # observation(s) attaining it
y_maxres <- y[idx]           # corresponding response
y_hat_maxres <- y_hat[idx]   # corresponding fitted value
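
As a cross-check, the same quantities can be read off with R's built-in residuals(), fitted() and which.max(). This is a minimal sketch, not part of the original answer: the names fit, i and largest are illustrative, and the model is refitted with a formula on the data frame so that the block runs on its own.

```r
# Refit the model directly on the built-in Orange data set
fit <- lm(circumference ~ age, data = Orange)

res <- residuals(fit)      # signed residuals: observed - fitted
i <- which.max(abs(res))   # index of the largest absolute residual

# Collect the quantities the question asks for (illustrative names)
largest <- data.frame(
  observation = unname(i),
  residual    = unname(res[i]),
  response    = Orange$circumference[i],
  fitted      = unname(fitted(fit)[i])
)
print(largest)
```

Keeping the signed residual (rather than only its absolute value) shows whether the model over- or under-predicts for that tree.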

3) Here we have to find a 90% CI, so the level is \alpha=0.1. Let the linear regression model be y=\beta_0+\beta_1x+e, where e\sim N(0,\sigma^2). Let the true mean circumference at x_0 be \mu_{y|x_0}=E(y|x_0)=\beta_0+\beta_1x_0. The fitted mean circumference is then \hat\mu_{y|x_0}=\hat\beta_0+\hat\beta_1x_0.

Since the least-squares estimators are unbiased, E(\hat\mu_{y|x_0})=\mu_{y|x_0}. Now, V(\hat\mu_{y|x_0})=V(\hat\beta_0+\hat\beta_1x_0)=V(\hat\beta_0)+x_0^2V(\hat\beta_1)+2x_0Cov(\hat\beta_0,\hat\beta_1), where

V(\hat\beta_0)=\sigma^2[\frac{1}{n}+\frac{\bar x^2}{s_{xx}}], V(\hat\beta_1)=\sigma^2(\frac{1}{s_{xx}}), Cov(\hat\beta_0,\hat\beta_1)=-\sigma^2(\frac{\bar x}{s_{xx}})

Putting these values in, we get V(\hat\mu_{y|x_0})=\sigma^2[\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]
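
The substitution can be written out term by term. The following is a short sketch of the algebra, using only the three variance and covariance expressions already stated:

```latex
\begin{aligned}
V(\hat\mu_{y|x_0})
 &= \sigma^2\left[\frac{1}{n}+\frac{\bar x^2}{s_{xx}}\right]
  + x_0^2\,\frac{\sigma^2}{s_{xx}}
  - 2x_0\,\frac{\sigma^2\bar x}{s_{xx}} \\
 &= \sigma^2\left[\frac{1}{n}+\frac{\bar x^2-2x_0\bar x+x_0^2}{s_{xx}}\right] \\
 &= \sigma^2\left[\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}\right]
\end{aligned}
```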

Therefore \frac{\hat\mu_{y|x_0}-\mu_{y|x_0}}{\sqrt{\sigma^2[\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]}}\sim N(0,1) and \frac{(n-2)\hat\sigma^2}{\sigma^2}\sim \chi^2_{n-2}, and these two quantities are independent.

So \frac{\hat\mu_{y|x_0}-\mu_{y|x_0}}{\sqrt{\hat\sigma^2[\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]}}\sim t_{n-2}

The 90% CI can be obtained from the following equation

P(-t_{0.05;n-2}\le\frac{\hat\mu_{y|x_0}-\mu_{y|x_0}}{\sqrt{\hat\sigma^2[\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]}}\le t_{0.05;n-2})=0.90

which gives the 90% CI as

(\hat\mu_{y|x_0}-t_{0.05;n-2}\sqrt{\hat\sigma^2[\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]}, \hat\mu_{y|x_0}+t_{0.05;n-2}\sqrt{\hat\sigma^2[\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]})

Here s_{xx}=\sum_{i=1}^{n}(x_i-\bar x)^2=8225644, n=35, x_0=600, t_{0.05;33}=1.69236, \bar x=922.1429, \hat\sigma^2=SS_{Res}/33=18594.74/33=563.477, and \hat\mu_{y|x_0}=\hat\beta_0+\hat\beta_1x_0=17.3997+0.1068\times 600=81.4797

Putting these values in, we get the 90% CI for the true mean circumference at age 600, which is (73.3267, 89.6327)
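
The hand calculation can be checked against R's predict() with interval = "confidence". This is a sketch rather than part of the original answer; fit and ci_600 are illustrative names, and the model is refitted with a formula so that newdata can be matched by the column name age. Because the hand calculation uses coefficients rounded to four decimals, the limits may differ slightly in the second decimal place.

```r
# Refit on the data frame so that newdata can refer to the column `age`
fit <- lm(circumference ~ age, data = Orange)

# 90% confidence interval for the mean circumference at age 600
ci_600 <- predict(fit, newdata = data.frame(age = 600),
                  interval = "confidence", level = 0.90)
print(ci_600)   # columns: fit, lwr, upr
```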

4) Here we need to find a 90% PI for a new observation corresponding to age=800.

Here y_0=\beta_0+\beta_1x_0+e_0 and \hat y_0=\hat\beta_0+\hat\beta_1x_0. Take z=\hat y_0-y_0. Then E(z)=0 and V(z)=\sigma^2[1+\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]. The variance of z is derived as before; the only extra term is V(y_0)=\sigma^2, which simply adds because the new observation is independent of the fitted line. In the same way we get

\frac{\hat y_0-y_0}{\sqrt{\hat\sigma^2[1+\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]}}\sim t_{n-2}

The 90% PI can be obtained from the following equation

P(-t_{0.05;n-2}\le\frac{\hat y_0-y_0}{\sqrt{\hat\sigma^2[1+\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]}}\le t_{0.05;n-2})=0.90

which gives the 90% PI as

(\hat y_0-t_{0.05;n-2}\sqrt{\hat\sigma^2[1+\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]}, \hat y_0+t_{0.05;n-2}\sqrt{\hat\sigma^2[1+\frac{1}{n}+\frac{(x_0-\bar x)^2}{s_{xx}}]})

The values of all quantities remain the same except x_0, which is equal to 800 here, so \hat y_0=\hat\beta_0+\hat\beta_1x_0=17.3997+0.1068\times 800=102.8397

Putting in all the other values, we get the desired 90% PI for a new observation at age 800, which is (62.0597, 143.6197)

All the calculations done in R are given below:
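
As a check on the prediction interval, predict() with interval = "prediction" can be used. This is a sketch under the same assumptions as any formula-based refit: fit, pi_800 and ci_800 are illustrative names not used in the original answer. The confidence interval at the same age is computed alongside to show the effect of the extra \sigma^2 term.

```r
fit <- lm(circumference ~ age, data = Orange)
new_tree <- data.frame(age = 800)

# 90% prediction interval for a single new observation at age 800
pi_800 <- predict(fit, newdata = new_tree,
                  interval = "prediction", level = 0.90)

# 90% confidence interval for the mean at the same age, for comparison
ci_800 <- predict(fit, newdata = new_tree,
                  interval = "confidence", level = 0.90)

print(rbind(prediction = pi_800[1, ], confidence = ci_800[1, ]))
```

Both intervals share the same centre; the prediction interval is wider because it also accounts for the error of the new observation itself.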

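x_bar <- mean(x)           # mean age
sxx <- sum((x - x_bar)^2)  # corrected sum of squares of age
n <- length(y)             # number of observations
tab_t <- qt(0.95, n - 2)   # t critical value with n - 2 = 33 df
res1 <- y - y_hat          # signed residuals
ssres <- sum(res1^2)       # residual sum of squares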
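
The quantities above can be combined into the two interval formulas. The following is a minimal, self-contained sketch: the helper interval_at() and its new_obs argument are hypothetical names introduced here for illustration, with new_obs = TRUE adding the extra 1 inside the bracket for a prediction interval.

```r
x <- Orange$age
y <- Orange$circumference
fit <- lm(y ~ x)

n <- length(y)
x_bar <- mean(x)
sxx <- sum((x - x_bar)^2)
sigma2_hat <- sum(residuals(fit)^2) / (n - 2)  # SSRes / (n - 2)
t_crit <- qt(0.95, df = n - 2)                 # two-sided 90% level
b <- unname(coef(fit))                         # (intercept, slope)

# Hypothetical helper: CI for the mean (new_obs = FALSE) or
# PI for a new observation (new_obs = TRUE) at x0
interval_at <- function(x0, new_obs = FALSE) {
  centre <- b[1] + b[2] * x0
  se <- sqrt(sigma2_hat * (as.numeric(new_obs) + 1 / n + (x0 - x_bar)^2 / sxx))
  c(lower = centre - t_crit * se, fit = centre, upper = centre + t_crit * se)
}

ci_600 <- interval_at(600)                  # question 3
pi_800 <- interval_at(800, new_obs = TRUE)  # question 4
print(rbind(ci_600, pi_800))
```

Because this uses the unrounded coefficients, the limits can differ from the hand-computed ones in the second decimal place.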
