The UC Irvine Machine Learning repository contains many datasets for conducting computer science research. One dataset is the Haberman’s Survival dataset, available at http://archive.ics.uci.edu/ml/datasets/Haberman's+Survival and also included online with the source code for the book. The file “haberman.data” contains survival data for breast cancer patients in comma-separated value (CSV) format. The first field is the patient’s age at the time of surgery, the second field is the year of the surgery, the third field is the number of positive axillary nodes detected, and the fourth field is the survival status. The survival status is 1 if the patient survived 5 years or longer and 2 if the patient died within 5 years.
Write a program that reads the CSV file and calculates two averages: the average number of positive axillary nodes detected for patients who survived 5 years or longer, and the average number detected for patients who died within 5 years. A significant difference between the two averages would suggest that the number of positive axillary nodes detected can be used to predict survival time. Your program should ignore the age and year fields for each record.
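One possible approach is sketched below in Python, which is an assumption, since the exercise does not name a language. The function name average_nodes_by_status and the sample records are made up for illustration and are not taken from haberman.data. The sketch keeps a running sum and count for each survival status, reads only the third and fourth fields, and skips blank or short lines.

```python
import csv
import io


def average_nodes_by_status(lines):
    """Return {status: average node count} for rows of age,year,nodes,status.

    `lines` is any iterable of CSV text lines, such as an open file.
    (Hypothetical helper; the name is not from the exercise.)
    """
    totals = {}  # status -> [sum of node counts, number of patients]
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        nodes = int(row[2])   # third field: positive axillary nodes
        status = int(row[3])  # fourth field: 1 = survived, 2 = died
        entry = totals.setdefault(status, [0, 0])
        entry[0] += nodes
        entry[1] += 1
    # Age (row[0]) and year (row[1]) are deliberately ignored.
    return {status: total / count for status, (total, count) in totals.items()}


# Made-up sample records, used here in place of the real file.
sample = io.StringIO("30,64,1,1\n30,62,3,1\n34,59,0,2\n34,66,9,2\n")
averages = average_nodes_by_status(sample)
print("Survived 5 years or longer:", averages.get(1))
print("Died within 5 years:", averages.get(2))
```

To run this against the real data, the sample could be replaced by the opened file, for example by calling average_nodes_by_status(f) inside "with open("haberman.data", newline="") as f:", assuming the file is in the working directory. Comparing the two averages by eye is only a first step; judging whether the difference is significant would need a statistical test, which the exercise does not require.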