Problem


The UC Irvine Machine Learning repository contains many datasets for conducting computer science research. One dataset is the Haberman's Survival dataset, available at http://archive.ics.uci.edu/ml/datasets/Haberman’s+Survival and also included online with the source code for the book. The file “haberman.data” contains survival data for breast cancer patients in comma-separated value (CSV) format. The first field is the patient’s age at the time of surgery, the second field is the year of the surgery, the third field is the number of positive axillary nodes detected, and the fourth field is the survival status. The survival status is 1 if the patient survived 5 years or longer and 2 if the patient died within 5 years.

Write a program that reads the CSV file and calculates the average number of positive axillary nodes detected for patients who survived 5 years or longer, and the average number of positive axillary nodes detected for patients who died within 5 years. A significant difference between the two averages suggests that the number of positive axillary nodes detected may be useful for predicting survival time. Your program should ignore the age and year fields for each record.

Step-by-Step Solution
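
One possible approach to the problem above is sketched below. It is not the book's solution. The problem does not name a programming language, so Python is an assumption here, and the function name average_nodes_by_status is hypothetical. The sketch assumes each record has exactly the four integer fields described in the problem and that "haberman.data" sits in the working directory. It reads only the third and fourth fields, accumulates a running total and count per survival status, and divides at the end.

```python
import csv
from collections import defaultdict
from pathlib import Path


def average_nodes_by_status(path):
    """Return {survival_status: mean number of positive axillary nodes}."""
    totals = defaultdict(int)   # sum of node counts per survival status
    counts = defaultdict(int)   # number of patients per survival status
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 4:
                continue  # skip blank or incomplete lines
            # Fields: age, year of surgery, positive nodes, survival status.
            # Age (row[0]) and year (row[1]) are ignored, as the problem asks.
            nodes = int(row[2])
            status = int(row[3])
            totals[status] += nodes
            counts[status] += 1
    return {status: totals[status] / counts[status] for status in counts}


if __name__ == "__main__":
    data_file = Path("haberman.data")  # assumed location of the dataset
    if data_file.exists():
        averages = average_nodes_by_status(data_file)
        labels = {1: "survived 5 years or longer", 2: "died within 5 years"}
        for status in sorted(averages):
            label = labels.get(status, f"status {status}")
            print(f"Average positive nodes ({label}): {averages[status]:.2f}")
    else:
        print("haberman.data not found in the current directory")
```

Keeping a total and a count per status, rather than storing every record, means the file is read in a single pass with constant memory. Comparing the two printed averages is only a first look; judging whether the difference is statistically significant would need a further test that the problem does not ask for.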
