Question

You are given the SmallSample.csv data. Using this data, complete the following tasks in an R script.

  1. Read the SmallSample.csv data and create a data frame variable called smallsample.
  2. Show the first six records to check the data frame format.
  3. Show the structure of the data.
  4. Check every column’s class: age and income should be numeric; gender, marital, and risk should be factor; numkids (number of kids) should be integer. If any column is not the correct type, change it (hint: as.numeric() and factor()).
  5. For the factor columns, display and check their levels and the frequency of each level.
  6. Calculate and display the mean of age and of income over all the records.
  7. Calculate and display the mean of income for each category of risk.
  8. Create 3 bins for income using the equal-width strategy and label them “low”, “medium”, and “high”. Add a new column called INCOME LEVEL to the smallsample data frame and assign the binning results to it.
  9. Display how many records fall into the “low” income level category.

The CSV file contains more than 300 records and looks like the following:

    AGE INCOME GENDER MARITAL NUMKIDS RISK
    31 59193 Female married 1 good risk
    45 58381 Male married 1 good risk
    43 57388 Female married 0 bad loss
    41 56470 Male married 0 bad loss
    46 55554 Female married 0 good risk
    32 54792 Male married 1 good risk
    44 53983 Female married 1 bad profit
    44 53550 Male married 1 bad loss
    32 52973 Male married 1 bad profit
    39 52495 Female married 1 good risk
    44 51498 Male married 0 bad loss
    33 50631 Female married 0 good risk
    38 50076 Male married 1 bad profit
    35 49600 Male married 1 good risk
    34 49007 Male married 1 bad profit
    37 48061 Female married 1 good risk
    39 47161 Male married 1 good risk
    38 46823 Male married 0 good risk
    36 45949 Female married 1 bad profit
    30 45715 Male married 0 good risk
    42 45584 Female married 1 good risk
    43 45390 Female married 0 good risk
    35 45238 Female married 0 good risk
    38 45103 Female married 0 good risk
    37 44936 Female married 1 bad profit
    30 44756 Female married 1 good risk
    42 44597 Female married 1 bad loss

Answer #1

# Set the working directory to the folder that contains SmallSample.csv (adjust this path for your machine).
setwd('E:\\projects\\work\\HomeworkLib\\R\\SmallSample\\')
# Read SmallSample.csv into a data frame called smallsample.
smallsample <- read.csv('SmallSample.csv', header = TRUE, sep = ",")

# Please show the first six records to check data frame format
head(smallsample,6)

# Please show the structure of the data.
str(smallsample)

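# Check every column's class: AGE and INCOME should be numeric; GENDER, MARITAL, and RISK should be factor; NUMKIDS (number of kids) should be integer.

class(smallsample$AGE)
class(smallsample$INCOME)
class(smallsample$GENDER)
class(smallsample$MARITAL)
class(smallsample$NUMKIDS)
class(smallsample$RISK)

# Convert the columns to the expected types (in R 4.0+ read.csv reads text columns as character and whole numbers as integer).
smallsample$AGE <- as.numeric(smallsample$AGE)
smallsample$INCOME <- as.numeric(smallsample$INCOME)
smallsample$GENDER <- factor(smallsample$GENDER)
smallsample$MARITAL <- factor(smallsample$MARITAL)
smallsample$RISK <- factor(smallsample$RISK)
smallsample$NUMKIDS <- as.integer(smallsample$NUMKIDS)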
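
One way to confirm that every column ends up with the type the exercise asks for is to look at all the classes in a single call. The following is a minimal sketch, not part of the original answer: `demo` is a hypothetical stand-in built from the first three rows shown in the question (the real file is not available here), and `num_cols`, `factor_cols`, and `col_classes` are illustrative names.

```r
# Hypothetical stand-in for SmallSample.csv: the first three rows from the question,
# with text columns left as character, as read.csv() would do in R 4.0 or later.
demo <- data.frame(
  AGE     = c(31L, 45L, 43L),
  INCOME  = c(59193L, 58381L, 57388L),
  GENDER  = c("Female", "Male", "Female"),
  MARITAL = c("married", "married", "married"),
  NUMKIDS = c(1L, 1L, 0L),
  RISK    = c("good risk", "good risk", "bad loss"),
  stringsAsFactors = FALSE
)

# Convert each group of columns to the type the exercise asks for.
num_cols    <- c("AGE", "INCOME")
factor_cols <- c("GENDER", "MARITAL", "RISK")
demo[num_cols]    <- lapply(demo[num_cols], as.numeric)
demo[factor_cols] <- lapply(demo[factor_cols], factor)
demo$NUMKIDS      <- as.integer(demo$NUMKIDS)

# One call shows the class of every column side by side.
col_classes <- sapply(demo, class)
print(col_classes)
```

The same `sapply(smallsample, class)` call can be run on the real data frame before and after the conversion step.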

# For the factor columns, display and check their levels and the frequency of each level.
levels(smallsample$GENDER)
table(smallsample$GENDER)
levels(smallsample$MARITAL)
table(smallsample$MARITAL)
levels(smallsample$RISK)
table(smallsample$RISK)

# Please calculate and display the mean of age and income for all the records
mean(smallsample$AGE)
mean(smallsample$INCOME)

# Please calculate and display the mean of income for each category of risk
aggregate(smallsample$INCOME, list(RISK = smallsample$RISK), mean)
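
For a per-group mean, `aggregate()` also has a formula interface that labels the output columns, and `tapply()` is a base-R alternative that returns a named vector. The sketch below is only an illustration: `demo` is a hypothetical data frame holding six rows taken from the sample in the question, so its means will differ from those of the full file.

```r
# Hypothetical stand-in: six rows taken from the sample shown in the question.
demo <- data.frame(
  INCOME = c(59193, 58381, 57388, 56470, 55554, 53983),
  RISK   = factor(c("good risk", "good risk", "bad loss",
                    "bad loss", "good risk", "bad profit"))
)

# Formula interface: mean of INCOME within each level of RISK.
by_risk <- aggregate(INCOME ~ RISK, data = demo, FUN = mean)
print(by_risk)

# tapply() computes the same means and returns them as a named numeric vector.
risk_means <- tapply(demo$INCOME, demo$RISK, mean)
print(risk_means)
```

Both results are ordered by the factor levels of RISK, so they can be compared row by row.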

# Create 3 equal-width bins for income, labeled "low", "medium", and "high",
# and assign the result to a new column of smallsample. The task calls the column
# INCOME LEVEL; INCOME_LEVEL is used here so that the name is a valid R identifier.

smallsample$INCOME_LEVEL <- cut(smallsample$INCOME, breaks = 3, include.lowest = TRUE, labels = c("low", "medium", "high"))

# Please display how many records fall into “low” income level category.
nrow(smallsample[smallsample$INCOME_LEVEL=='low',])
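
When `breaks` is a single number, `cut()` divides the range of the data into that many equal-width intervals and widens the two outer edges slightly so that the minimum and maximum are included. The sketch below makes the edges explicit and checks that both approaches assign the same labels. It is an illustration under stated assumptions: `income` is a hypothetical vector of eight values taken from the sample in the question, and `edges`, `manual`, `auto`, and `low_count` are illustrative names.

```r
# Hypothetical input: eight INCOME values taken from the sample in the question.
income <- c(59193, 58381, 57388, 52973, 50076, 47161, 45103, 44597)

# Equal-width edges: split [min, max] into 3 intervals of the same width.
edges <- seq(min(income), max(income), length.out = 4)
print(edges)

# Binning with explicit edges; include.lowest = TRUE keeps the minimum in "low".
manual <- cut(income, breaks = edges, include.lowest = TRUE,
              labels = c("low", "medium", "high"))

# Binning as in the answer: let cut() derive 3 equal-width intervals itself.
auto <- cut(income, breaks = 3, include.lowest = TRUE,
            labels = c("low", "medium", "high"))

# Frequency of each level, and the number of records labeled "low".
print(table(manual))
low_count <- sum(manual == "low")
print(low_count)
```

For these eight values the width is (59193 - 44597) / 3, and three of them fall in the “low” bin. `sum(x == "low")` counts matches directly, while `table()` shows all three levels at once; the counts for the full file will differ.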
