Question

(Referencing problem 6.1 from 'Data Mining for Business Analytics Concepts, Techniques, and Applications in R' Shmueli,...

(Referencing problem 6.1 from 'Data Mining for Business Analytics Concepts, Techniques, and Applications in R' Shmueli, et al.) 6.1.d.iii. reads as follows: "Use stepwise regression with the three options (backward, forward, both) to reduce the remaining predictors as follows: Run stepwise on the training set. Choose the top from each stepwise run. Then use each of these models separately to predict the validation set. Compare RMSE, MAPE, and mean error, as well as lift charts. Finally, describe the best model."

For the bolded parts of the book question, I'm not clear on what I'm supposed to do. I ran stepwise regression with each of the three options (backward, forward, and both), but it's not clear to me what in the outputs I should use to 'choose the top from each stepwise run'. Any insight on how to proceed would be greatly appreciated.

Answer #1

Whichever direction stepwise regression is run in (backward, forward, or both), it works the same way at a high level: the full list of candidate variables is fed to the procedure, and, based on the specified entry and/or stay criteria, it returns a reduced, final list of predictors. For example, if there are 100 predictors initially, the stepwise output might keep 40 of them. These form a good subset that explains the response nearly as well as the full set of predictors does. The objective is to reduce the number of predictors and obtain a parsimonious model.

The instruction to "choose the top from each stepwise run" and then use each model separately to predict the validation set can be explained as follows:

Suppose there are 1000 observations (records, rows), 100 predictor variables, and one response variable. The first step is to divide the data randomly into training and validation (test) datasets, usually in a 7:3 ratio. The training dataset would then contain 700 records and the validation dataset 300. Set the validation dataset aside for the time being.
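As a rough illustration of the 7:3 split, here is a minimal Python sketch. The function name train_valid_split, the seed, and the stand-in data are all made up for this example; the book itself works in R, so treat this only as a picture of the idea.

```python
import random

def train_valid_split(rows, train_frac=0.7, seed=1):
    """Shuffle row indices, then cut them into training and validation parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)          # fixed seed so the split is repeatable
    n_train = round(train_frac * len(rows))
    train = [rows[i] for i in idx[:n_train]]
    valid = [rows[i] for i in idx[n_train:]]  # held out until the models are fitted
    return train, valid

rows = list(range(1000))                      # stand-in for 1000 records
train, valid = train_valid_split(rows)
print(len(train), len(valid))                 # 700 300
```

Every record lands in exactly one of the two parts, and the validation part is not touched while the stepwise runs are carried out.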

Next, run stepwise regression with the forward, backward, and both options on the 700 training records. Then choose the top m variables from each run, that is, the reduced set of predictors you actually want to build the model on. If the final model returned by a stepwise run is already acceptably small, its predictors can be used as they are; otherwise, if m is not prespecified, pick it subjectively, say m = 30. This gives three lists of variables, one each for the forward, backward, and both methods, each containing (in this example) 30 variables. Now run three separate regressions on the 700 training records, one per variable list. This produces three models.
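To make the idea of a stepwise run concrete, here is a small sketch of the forward direction only, written in plain Python on synthetic data. It assumes AIC as the entry criterion (R's step() function also uses AIC by default, but other software uses p-value thresholds), and the helper names ols_fit, aic, and forward_select are invented for this illustration. Backward and both follow the same pattern, with removal steps as well.

```python
import math
import random

def ols_fit(X, y):
    """Least-squares coefficients via the normal equations; intercept comes first."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def aic(X, y, cols):
    """AIC up to an additive constant: n*log(RSS/n) + 2*(number of coefficients)."""
    sub = [[r[c] for c in cols] for r in X]
    beta = ols_fit(sub, y)
    rss = sum((yi - (beta[0] + sum(b * v for b, v in zip(beta[1:], r)))) ** 2
              for r, yi in zip(sub, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * (len(cols) + 1)

def forward_select(X, y):
    """Add the predictor that lowers AIC the most; stop when no addition helps."""
    chosen, remaining = [], list(range(len(X[0])))
    best = aic(X, y, chosen)
    while remaining:
        score, col = min((aic(X, y, chosen + [c]), c) for c in remaining)
        if score >= best:
            break
        chosen.append(col)
        remaining.remove(col)
        best = score
    return chosen

# Synthetic training data: only predictors 0 and 2 actually drive the response.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 1) for r in X]
chosen = forward_select(X, y)
print(chosen)
```

The list the procedure ends with is the reduced predictor set for that run. A model refitted on just those columns is what gets carried forward to the validation step.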

Next, score each of the three models on the 300 validation records, that is, use each model to predict the response for those records. Scoring can be done with the same package used to fit the models. Then compute RMSE, MAPE, and mean error, and draw lift charts, on the validation set for each of the three models. Compare these to choose the best model: the lower the RMSE and MAPE, and the closer the mean error is to zero, the better the model.
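The three numeric measures can be sketched as follows. This assumes the error is defined as actual minus predicted and that MAPE is reported as a percentage; the function name accuracy_measures and the tiny example vectors are made up for illustration.

```python
import math

def accuracy_measures(actual, predicted):
    """Mean error, RMSE, and MAPE (in percent) for one model's predictions."""
    errors = [a - p for a, p in zip(actual, predicted)]  # assumed sign: actual - predicted
    n = len(errors)
    return {
        "ME": sum(errors) / n,                            # signed; near 0 means little bias
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

actual = [10.0, 20.0, 40.0]        # made-up validation responses
predicted = [12.0, 18.0, 40.0]     # made-up predictions from one model
m = accuracy_measures(actual, predicted)
print(m)
```

Calling this once per model on the same validation records gives three sets of figures to compare side by side. Note that MAPE is undefined when an actual value is zero.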
