In the early 1900s, several investigators were interested in predicting behavioural and social outcomes among people based on physical characteristics. Macdonnell (1902) reports a correlation matrix for the following seven physical variables measured on 3000 British criminals: (1) head length (HEADLEN), (2) head breadth (HEADBDTH), (3) face breadth (FACEBDTH), (4) left finger length (FINGLEN), (5) left forearm length (FOREARM), (6) left foot length (FOOT), and (7) height (HEIGHT). Assume that all original variables were measured in centimetres. The R output of a principal components analysis based on Macdonnell's correlation matrix is presented below.

PC5 PC6 PC7 PC4 PC3 PC2 PC1 0.005455 0.067843 -.016505 -.087441 0.882232 0.364447 0.639506 0.512374 -.234869 0.276214 HEADLEN 0.034908 0.017690 -.083151 0.687188 255886 0.211852 0.295062 0.437587 0.455737 0.450262 HEADBDTH 0.033762 0.318252 .074653 0.102549 -.382679 -.069875 -.697667 FACEBDTH FINGLEN 0.503388 0.618945 0.103419 -.784745 0.290290 0.113410 0.053181 0.038723 .276601 -.178344 -.179459 -.036635 FOREARM 0.034273 .870496 0.014469 -.059009 FOOT 0.352716 0.233015 -.769546 -.006241 -.083677 0.435716 HEIGHT

The eigenvalues of the correlation matrix are as follows:

       Eigenvalue   Proportion   Cumulative
PC1    3.79931      0.542759     0.54276
PC2    1.50195      0.214565     0.75732
PC3    0.65048      0.092926     0.85025
PC4    0.35994      0.051419     0.90167
PC5    0.33915      0.048450     0.95012
PC6    0.23525      0.033608     0.98373
PC7    0.11391      0.016274     1.00000

(a) One of the goals of principal components analysis is to reduce the dimension of the original data. How would you choose the number of principal components to retain for subsequent analyses? In this example, how many components would you retain? (5 marks)

(b) Another goal of PCA is to visualise the dataset. Given an observation of the dataset below (the criminal's name is John Doe), and based on the number of PCs you have chosen, write down the coordinates of John Doe in the new coordinate system. Show your intermediate and final results to three significant figures. (6 marks)

HEADLEN HEADBDTH FACEBDTH FINGLEN 10.0 FOREARM FOOT HEIGHT 30.0 26.0 180.0 16.0 14.0 20.0

(c) Is it more appropriate for this example to perform PCA based on standardised (scaled) or unstandardised (unscaled) variables? Justify your answer. (4 marks)
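
As a rough aid for part (a), the following Python sketch recomputes the proportion and cumulative proportion of variance from the seven eigenvalues quoted in the output, and applies two common retention rules. The function names and the 80% cut-off are assumptions made for illustration; they are not part of the question, and other criteria (for example a scree plot) may be equally valid.

```python
# Eigenvalues of the correlation matrix, copied from the output in the question.
eigenvalues = [3.79931, 1.50195, 0.65048, 0.35994, 0.33915, 0.23525, 0.11391]


def variance_table(eigs):
    """Return (proportions, cumulative proportions) for a list of eigenvalues."""
    total = sum(eigs)
    proportions = [e / total for e in eigs]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


def kaiser_count(eigs):
    """Count components with eigenvalue above 1 (Kaiser's rule for correlation-based PCA)."""
    return sum(1 for e in eigs if e > 1.0)


def count_for_threshold(eigs, threshold):
    """Smallest number of leading components whose cumulative proportion reaches threshold."""
    _, cumulative = variance_table(eigs)
    for k, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return k
    return len(eigs)


if __name__ == "__main__":
    proportions, cumulative = variance_table(eigenvalues)
    for i, (e, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
        print(f"PC{i}: eigenvalue={e:.5f} proportion={p:.4f} cumulative={c:.4f}")
    print("Kaiser's rule retains:", kaiser_count(eigenvalues))
    # The 0.80 cut-off is an assumed convention, not something the question specifies.
    print("80% cumulative variance retains:", count_for_threshold(eigenvalues, 0.80))
```

The two rules need not agree, so a written answer should state which criterion is used and why.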