Sol:
Please see the R snippet
# read the data into R dataframe
data.df <- read.csv("singlebirths.csv", header = TRUE)
str(data.df)
# reshape the data to long format (columns: variable = group, value)
library(reshape2)
melt.df <- na.omit(melt(data.df))   # na.omit drops the empty cells of shorter columns
# fit a linear model
mod1 <- lm(value ~ variable, data = melt.df)
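# check the ANOVA table for part 1 of the question
anova(mod1)
# the F test is significant (p = 0.0172 < 0.05), so at least one group mean differs
a1 <- aov(value ~ variable, data = melt.df)
summary(a1)   # same F test, shown in the results below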
# part 2: perform a post hoc test, such as Tukey's HSD
posthoc <- TukeyHSD(x = a1, conf.level = 0.95)
posthoc
# plot the results for a visual representation
plot(posthoc)
# for part c, create new dummy variables:
# Midwest vs South (var1) and Rural vs Urban (var2)
melt.df$var1 <- ifelse(melt.df$variable %in% c("MR", "MU"), "Midwest", "South")
melt.df$var2 <- ifelse(melt.df$variable %in% c("SR", "MR"), "Rural", "Urban")
# perform the ANOVA again; you may also choose to do a two-sample t test
a2 <- aov(value ~ var1, data = melt.df)
a3 <- aov(value ~ var2, data = melt.df)
# summary of the results
summary(a2)
summary(a3)
Change the file path to the location of the CSV file on your local machine, and enter the data exactly as shown in the table above.
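If the table is entered with one column per group (MR, MU, SR, SU), a shorter column leaves an NA in the CSV, which is why na.omit() is applied after reshaping. The sketch below shows the reshape and the part c grouping on a small hypothetical table with made-up numbers. It uses base R's stack() as a stand-in for reshape2::melt() (stack() names its columns `values` and `ind` instead of `value` and `variable`), and it assumes the first letter of each code is the region (M/S) and the second is rural/urban (R/U), which matches the ifelse() mapping in the snippet.

```r
# Hypothetical wide table with made-up numbers (not the homework data).
# The SR column is one value shorter, so it holds an NA, as read.csv() would produce.
wide <- data.frame(
  MR = c(10, 12, 11),
  MU = c(15, 14, 16),
  SR = c(9, 10, NA),
  SU = c(12, 13, 11)
)

# Base-R counterpart of reshape2::melt(): one row per value, group code in `ind`
long <- na.omit(stack(wide))

# Derive the two grouping variables from the letters of each group code
# (assumed coding: first letter M/S = region, second letter R/U = rural/urban)
code <- as.character(long$ind)
long$region <- ifelse(substr(code, 1, 1) == "M", "Midwest", "South")
long$area   <- ifelse(substr(code, 2, 2) == "R", "Rural", "Urban")

print(long)
table(long$region, long$area)
```

Deriving the labels from the letters gives the same grouping as listing the codes with %in%; it is just less typing when there are many groups.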
The results are
> summary(a1)
Df Sum Sq Mean Sq F value Pr(>F)
variable 3 943136 314379 4.652 0.0172 *
Residuals 15 1013628 67575
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
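As a quick check on how the F value and p-value in the table are obtained, the sketch below recomputes them from the sums of squares and degrees of freedom printed above: each mean square is Sum Sq / Df, F is the ratio of the two mean squares, and the p-value is the upper tail of the F(3, 15) distribution.

```r
# Entries copied from the summary(a1) table above
ss_between <- 943136;  df_between <- 3    # "variable" row
ss_within  <- 1013628; df_within  <- 15   # "Residuals" row

ms_between <- ss_between / df_between     # mean square between groups
ms_within  <- ss_within / df_within       # mean square within groups (MSE)
f_value    <- ms_between / ms_within
p_value    <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(F = f_value, p = p_value), 4)
```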
> summary(a2)
Df Sum Sq Mean Sq F value Pr(>F)
var1 1 328280 328280 3.427 0.0816 .
Residuals 17 1628484 95793
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(a3)
Df Sum Sq Mean Sq F value Pr(>F)
var2 1 432305 432305 4.821 0.0423 *
Residuals 17 1524459 89674
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
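Since var1 and var2 each have only two levels, the "t test" alternative gives the same answer as these ANOVAs, provided it is the pooled-variance version (var.equal = TRUE) rather than R's default Welch test: F equals t squared and the p-values coincide. The sketch below demonstrates this on R's built-in `sleep` dataset rather than the homework data.

```r
# Built-in `sleep` data: numeric response `extra`, two-level factor `group`
fit   <- aov(extra ~ group, data = sleep)
tab   <- summary(fit)[[1]]
f_aov <- tab[["F value"]][1]
p_aov <- tab[["Pr(>F)"]][1]

# Pooled-variance two-sample t test on the same data
tt <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

# With two groups, F equals t squared and the p-values agree
c(F = f_aov, t_squared = unname(tt$statistic)^2)
c(p_aov = p_aov, p_t = tt$p.value)

# For the homework data the analogous call would be, e.g.:
# t.test(value ~ var2, data = melt.df, var.equal = TRUE)
```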
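The result of the post hoc test is shown below. Only MU-MR (p adj = 0.047) and SR-MU (p adj = 0.022) have adjusted p-values below 0.05, and only their confidence intervals exclude 0.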
> posthoc
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = value ~ variable, data = melt.df)
$variable
diff lwr upr p adj
MU-MR 479.00 5.150837 952.84916 0.0471395
SR-MR -99.45 -602.042935 403.14293 0.9394047
SU-MR 36.80 -437.049163 510.64916 0.9958684
SR-MU -578.45 -1081.042935 -75.85707 0.0217381
SU-MU -442.20 -916.049163 31.64916 0.0714508
SU-SR 136.25 -366.342935 638.84293 0.8617146
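To see where the "p adj" column comes from, the sketch below recomputes two rows with the Tukey-Kramer formula: the difference is divided by sqrt(MSE/2 · (1/n_i + 1/n_j)) and referred to the studentized range distribution with 4 groups and 15 residual df (ptukey/qtukey). MSE = 67575 and df = 15 come from the summary(a1) table; the group sizes (5 records for MR and MU, 4 for SR) are an assumption inferred from the degrees of freedom and the interval widths, not something the output states.

```r
# Tukey-Kramer interval and adjusted p-value for one pairwise difference.
# mse, k and df come from summary(a1); n_i and n_j are assumed group sizes.
tukey_pair <- function(diff, n_i, n_j, mse = 67575, k = 4, df = 15, conf = 0.95) {
  se     <- sqrt(mse / 2 * (1 / n_i + 1 / n_j))   # Tukey-Kramer scale factor
  q_stat <- abs(diff) / se                        # studentized range statistic
  q_crit <- qtukey(conf, nmeans = k, df = df)     # critical value for the interval
  c(diff  = diff,
    lwr   = diff - q_crit * se,
    upr   = diff + q_crit * se,
    p_adj = ptukey(q_stat, nmeans = k, df = df, lower.tail = FALSE))
}

mu_mr <- tukey_pair(479.00, n_i = 5, n_j = 5)    # MU-MR, assuming 5 records each
sr_mu <- tukey_pair(-578.45, n_i = 4, n_j = 5)   # SR-MU, assuming 4 SR and 5 MU records
round(rbind(mu_mr, sr_mu), 4)
```

If the assumed group sizes are right, the rows should reproduce the MU-MR and SR-MU lines of the TukeyHSD output up to rounding of the MSE.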
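The visual graph is the output of plot(posthoc):
[Figure: Tukey HSD 95% family-wise confidence intervals for the six pairwise differences in group means]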
As we are comparing a quantitative variable across the levels of a categorical variable, a one-way ANOVA is a suitable choice for checking whether the mean values differ across the categories. When only two categories are compared (as in part c), you may also choose to do a two-sample t test.
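For reference, these are the standard one-way ANOVA hypotheses and F statistic behind that choice, written for the four groups here (k = 4 groups, N total records, n_g records in group g, group means \bar{y}_g and overall mean \bar{y}):

```latex
H_0:\ \mu_{MR} = \mu_{MU} = \mu_{SR} = \mu_{SU}
\qquad H_1:\ \text{not all } \mu_g \text{ are equal}

F = \frac{SS_B/(k-1)}{SS_W/(N-k)}, \qquad
SS_B = \sum_{g=1}^{k} n_g\,(\bar{y}_g - \bar{y})^2, \qquad
SS_W = \sum_{g=1}^{k} \sum_{i=1}^{n_g} (y_{gi} - \bar{y}_g)^2
```

Under H_0, F follows an F distribution with (k - 1, N - k) degrees of freedom; the ANOVA table above has 3 and 15, which corresponds to k = 4 and N = 19.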