Question

This problem uses the Wage dataset from the ISLR package in R.

  1. In this part of the problem, we will find the polynomial function of age that best fits the wage data. For each polynomial degree p = 0, 1, 2, ..., 10:

    i. Fit a linear regression to predict wage as a function of age, age^2, ..., age^p (include an intercept as well). Note that the p = 0 model is an “intercept-only” model.

    ii. Use 5-fold cross validation to estimate the test error for this model. Save both the test error and the training error.

    (c) Plot both the test error and training error (on the same plot) for each of the models estimated above as a function of p. What do you observe about the training error as p increases? What about the test error? Based on your results, which model should you select and why?
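The steps above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not a reference solution. The helper names cv_poly() and plot_cv_errors() are hypothetical. It uses poly(x, degree = p), which builds orthogonal polynomials: these span the same space as age, age^2, ..., age^p, so the fitted values are the same as with raw powers, but the fit is numerically more stable at p = 10. "Training error" is taken to mean the average in-sample MSE of the model fitted on the four training folds; the MSE of a single fit to all rows is an equally reasonable reading. Fold assignment is random, so the exact numbers depend on the seed.

```r
# Sketch: polynomial regression of y on x for degrees p = 0..max_degree,
# with k-fold cross-validation. Base R only (stats, graphics, utils).
# cv_poly() and plot_cv_errors() are hypothetical helper names.
cv_poly <- function(x, y, max_degree = 10, k = 5, seed = 1) {
  set.seed(seed)
  dat <- data.frame(x = x, y = y)
  # Randomly assign each row to one of k folds of (nearly) equal size;
  # the same folds are reused for every degree p.
  folds <- sample(rep(seq_len(k), length.out = nrow(dat)))
  res <- data.frame(p = 0:max_degree, train_mse = NA_real_, test_mse = NA_real_)
  for (p in 0:max_degree) {
    # p = 0 is the intercept-only model. For p >= 1, poly() builds an
    # orthogonal basis spanning the same space as x, x^2, ..., x^p.
    form <- if (p == 0) y ~ 1 else y ~ poly(x, degree = p)
    train_err <- numeric(k)
    test_err <- numeric(k)
    for (j in seq_len(k)) {
      train <- dat[folds != j, ]
      test <- dat[folds == j, ]
      fit <- lm(form, data = train)
      # Training error: in-sample MSE on the k - 1 training folds
      train_err[j] <- mean(residuals(fit)^2)
      # Test error: MSE on the held-out fold
      test_err[j] <- mean((test$y - predict(fit, newdata = test))^2)
    }
    res$train_mse[p + 1] <- mean(train_err)
    res$test_mse[p + 1] <- mean(test_err)
  }
  res
}

# Training and CV test error on the same plot, as a function of p
plot_cv_errors <- function(res) {
  errs <- as.matrix(res[, c("train_mse", "test_mse")])
  matplot(res$p, errs, type = "b", pch = c(1, 2), lty = 1,
          col = c("blue", "red"), xlab = "Polynomial degree p",
          ylab = "Mean squared error")
  legend("topright", legend = c("Training error", "CV test error"),
         col = c("blue", "red"), pch = c(1, 2), lty = 1)
}

# Apply to the Wage data only if the ISLR package is installed
if (requireNamespace("ISLR", quietly = TRUE)) {
  data("Wage", package = "ISLR")
  wage_res <- cv_poly(Wage$age, Wage$wage)
  print(wage_res)
  plot_cv_errors(wage_res)
  # One common rule: pick the degree with the smallest CV test error
  cat("Degree with lowest CV error:",
      wage_res$p[which.min(wage_res$test_mse)], "\n")
}
```

Because the models are nested and fitted by least squares on the same training folds, the training error can only stay the same or decrease as p grows. The CV test error has no such guarantee and typically levels off or rises once the extra terms start fitting noise. Choosing the p with the smallest CV error is one common rule; a simpler model whose CV error is nearly as low is often preferred.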
