Exercises

For these exercises, we will be using the following dataset:

library(downloader)
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/mice_pheno.csv"
filename <- basename(url)


We will remove the lines that contain missing values:

dat <- na.omit( dat )

1. Use dplyr to create a vector x with the body weight of all males on the control (chow) diet. What is this populationâ€™s average?

2. Now use the rafalib package and use the popsd function to compute the population standard deviation.

3. Set the seed at 1. Take a random sample $X$ of size 25 from x. What is the sample average?

4. Use dplyr to create a vector y with the body weight of all males on the high fat (hf) diet. What is this populationâ€™s average?

5. Now use the rafalib package and use the popsd function to compute the population standard deviation.

6. Set the seed at 1. Take a random sample $Y$ of size 25 from y. What is the sample average?

7. What is the difference in absolute value between $\bar{y} - \bar{x}$ and $$\bar{X}-\bar{Y}? 8. Repeat the above for females. Make sure to set the seed to 1 before each sample call. What is the difference in absolute value between $\bar{y} - \bar{x}$ and$$\bar{X}-\bar{Y}\$?

9. For the females, our sample estimates were closer to the population difference than with males. What is a possible explanation for this?

• A) The population variance of the females is smaller than that of the males; thus, the sample variable has less variability.
• B) Statistical estimates are more precise for females.
• C) The sample size was larger for females.
• D) The sample size was smaller for females.