Population, Samples, and Estimates Exercises
Exercises
For these exercises, we will be using the following dataset:
library(downloader)
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/mice_pheno.csv"
filename <- basename(url)
download(url, destfile=filename)
dat <- read.csv(filename)
We will remove the lines that contain missing values:
dat <- na.omit( dat )
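As a quick illustration of what `na.omit` does, here is a minimal sketch on a small made-up data frame. The values are hypothetical and are not taken from the mice data, and the column names are only assumed to resemble those in the file. Any row with an `NA` in any column is dropped.

```r
# Toy data frame with hypothetical values (not the mice data)
toy <- data.frame(
  Sex = c("F", "M", "M", "F"),
  Diet = c("chow", "hf", "chow", "hf"),
  Bodyweight = c(21.5, NA, 30.2, 25.7)
)

# na.omit() drops every row that has at least one missing value
toy_clean <- na.omit(toy)

nrow(toy)        # number of rows before removing missing values
nrow(toy_clean)  # number of rows after removing missing values
```

Comparing `nrow(dat)` before and after the `na.omit` call in the same way shows how many mice were removed.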
- Use `dplyr` to create a vector `x` with the body weight of all males on the control (`chow`) diet. What is this population’s average?
- Now use the `popsd` function from the `rafalib` package to compute the population standard deviation.
- Set the seed at 1. Take a random sample of size 25 from `x`. What is the sample average?
- Use `dplyr` to create a vector `y` with the body weight of all males on the high fat (`hf`) diet. What is this population’s average?
- Now use the `popsd` function from the `rafalib` package to compute the population standard deviation.
- Set the seed at 1. Take a random sample of size 25 from `y`. What is the sample average?
- What is the difference in absolute value between the difference of the two population averages (taken in the same order) and the difference of the two sample averages, $\bar{X}-\bar{Y}$?
- Repeat the above for females. Make sure to set the seed to 1 before each `sample` call. What is the difference in absolute value between the difference of the two population averages (taken in the same order) and the difference of the two sample averages, $\bar{X}-\bar{Y}$?
- For the females, our sample estimates were closer to the population difference than with males. What is a possible explanation for this?
  - A) The population variance of the females is smaller than that of the males; thus, the sample variable has less variability.
  - B) Statistical estimates are more precise for females.
  - C) The sample size was larger for females.
  - D) The sample size was smaller for females.
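Two ideas in these exercises are easy to mix up: the population standard deviation versus the value returned by `sd`, and the need to reset the seed before every `sample` call. The following is a minimal sketch on made-up numbers, not the mice data. The helper `pop_sd` is a hypothetical stand-in, written in base R so the sketch needs no packages, and is assumed to behave like `popsd` from `rafalib` by dividing by the population size.

```r
# Hypothetical helper: population SD divides by N,
# whereas sd() divides by N - 1
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

# Made-up "population" of body weights (illustrative values only)
pop <- c(22.1, 25.4, 27.8, 30.0, 31.2, 33.5, 28.9, 26.3, 24.7, 29.6)

mean(pop)    # population average
pop_sd(pop)  # population standard deviation
sd(pop)      # sample-style estimate, slightly larger than pop_sd(pop)

# Resetting the seed immediately before each sample() call
# makes the random draw reproducible
set.seed(1)
s1 <- sample(pop, 5)
set.seed(1)
s2 <- sample(pop, 5)
identical(s1, s2)  # TRUE

# The sample average is a random estimate of the population average
abs(mean(s1) - mean(pop))
```

If the seed is not reset before the second `sample` call, the two draws will generally differ, and so will the sample averages.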