Monte Carlo Exercises
Exercises
We have used Monte Carlo simulation throughout this chapter to demonstrate statistical concepts; namely, sampling from the population. We mostly applied this to demonstrate the statistical properties related to inference on differences in averages. Here, we will consider examples of how Monte Carlo simulations are used in practice.

Imagine you are William Sealy Gosset and have just mathematically derived the distribution of the t-statistic when the sample comes from a normal distribution. Unlike Gosset, you have access to computers and can use them to check the results.
Let’s start by creating an outcome. Set the seed at 1, use rnorm to generate a random sample of size 5 from a standard normal distribution, then compute the t-statistic sqrt(5)*mean(X)/sd(X), where X is the sample and sd(X) is its sample standard deviation. What value do you observe?
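The steps in this first exercise can be sketched in a few lines of R. This is a minimal sketch, not an official solution: the variable names N, X, and tstat are illustrative choices, and the statistic is assumed to be the one-sample form sqrt(N)*mean(X)/sd(X) used later in these exercises.

```r
# One Monte Carlo outcome: a single t-statistic from a sample of size 5.
set.seed(1)                          # seed requested by the exercise
N <- 5                               # sample size
X <- rnorm(N)                        # sample from a standard normal distribution
tstat <- sqrt(N) * mean(X) / sd(X)   # sd() uses the N-1 denominator
tstat
```

The same value is returned by t.test(X)$statistic, which tests the sample mean against 0 by default.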
You have just performed a Monte Carlo simulation using rnorm, a random number generator for normally distributed data. Gosset’s mathematical calculation tells us that this random variable follows a t-distribution with N-1 degrees of freedom, where N is the sample size. Monte Carlo simulations can be used to check the theory: we generate many outcomes and compare them to the theoretical result. Set the seed to 1 and generate t-statistics as done in exercise 1. What percent are larger than 2?
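One way to repeat the simulation many times is with replicate. In the sketch below, the number of replicates B = 1000 is an assumption made for illustration, since the text above does not fix it; the names tstats, mc_prop, and theory are likewise illustrative.

```r
# Monte Carlo check of the upper tail of the t-statistic.
set.seed(1)
N <- 5       # sample size, as in exercise 1
B <- 1000    # ASSUMPTION: number of replicates, not specified in the text

tstats <- replicate(B, {
  X <- rnorm(N)
  sqrt(N) * mean(X) / sd(X)
})

mc_prop <- mean(tstats > 2)        # Monte Carlo proportion larger than 2
theory  <- 1 - pt(2, df = N - 1)   # tail probability under the t-distribution
c(monte_carlo = mc_prop, theory = theory)
```

Multiplying mc_prop by 100 gives the percent asked for. Increasing B reduces the Monte Carlo error of the estimate.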
The answer to exercise 2 is very similar to the theoretical prediction: 1-pt(2,df=4). We can check several such quantiles using the qqplot function.
To obtain quantiles for the t-distribution we can generate percentiles from just above 0 to just below 1: B=100; ps = seq(1/(B+1), 1-1/(B+1), len=B), and compute the quantiles with qt(ps,df=4). Now we can use qqplot to compare these theoretical quantiles to those obtained in the Monte Carlo simulation. Use the Monte Carlo simulation developed for exercise 2 to corroborate that the t-statistic follows a t-distribution for several values of N, the sample size.
For which sample sizes does the approximation work best?
 A) Larger sample sizes.
 B) Smaller sample sizes.
 C) The approximations are spot on for all sample sizes.
 D) None. We should use CLT instead.
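The quantile comparison described in this exercise could be set up as follows. This is a sketch under stated assumptions: the sample sizes in Ns and the number of simulated t-statistics B_sim are hypothetical values chosen for illustration, and the identity line is added only as a visual reference.

```r
# Q-Q plots of simulated t-statistics against theoretical t quantiles.
set.seed(1)
B  <- 100                                         # number of percentiles, as in the text
ps <- seq(1/(B+1), 1 - 1/(B+1), length.out = B)   # just above 0 to just below 1

B_sim <- 1000              # ASSUMPTION: number of simulated t-statistics
Ns    <- c(5, 10, 50, 100) # ASSUMPTION: sample sizes to compare

op <- par(mfrow = c(2, 2))
for (N in Ns) {
  tstats <- replicate(B_sim, {
    X <- rnorm(N)
    sqrt(N) * mean(X) / sd(X)
  })
  qqplot(qt(ps, df = N - 1), tstats,
         main = paste("N =", N),
         xlab = "Theoretical t quantiles",
         ylab = "Monte Carlo t-statistics")
  abline(0, 1)             # points near this line indicate agreement
}
par(op)
```

Because qqplot interpolates when its two inputs differ in length, B and B_sim do not need to match.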
Use Monte Carlo simulation to corroborate that the t-statistic comparing two means, obtained with normally distributed (mean 0 and sd) data, follows a t-distribution. In this case we will use the t.test function with var.equal=TRUE. With this argument the degrees of freedom will be df=2*N-2, with N the sample size of each group. For which sample sizes does the approximation work best?
 A) Larger sample sizes.
 B) Smaller sample sizes.
 C) The approximations are spot on for all sample sizes.
 D) None. We should use CLT instead.
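A two-sample version of the simulation might look like the sketch below. The helper two_sample_tstats is a hypothetical name, not a function from any package. The standard deviation of the data is not given in the text, so the default sd = 1 of rnorm is assumed, and the values of B and Ns are illustrative.

```r
# Two-sample t-statistics from t.test with equal variances assumed.
set.seed(1)
B  <- 1000                                        # ASSUMPTION: number of replicates
ps <- seq(1/(B+1), 1 - 1/(B+1), length.out = B)

# Hypothetical helper: B t-statistics, each comparing two groups of size N.
two_sample_tstats <- function(N, B) {
  replicate(B, {
    x <- rnorm(N)   # ASSUMPTION: sd = 1 (rnorm default)
    y <- rnorm(N)
    unname(t.test(x, y, var.equal = TRUE)$statistic)
  })
}

Ns <- c(3, 10, 30)  # ASSUMPTION: sample sizes per group
op <- par(mfrow = c(1, 3))
for (N in Ns) {
  tstats <- two_sample_tstats(N, B)
  qqplot(qt(ps, df = 2 * N - 2), tstats,
         main = paste("N =", N),
         xlab = "Theoretical t quantiles",
         ylab = "Monte Carlo t-statistics")
  abline(0, 1)
}
par(op)
```

The degrees of freedom that t.test actually used can be read from the parameter element of its result, which gives a quick check of the df=2*N-2 formula.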

Is the following statement true or false? If instead of generating the sample with X=rnorm(15), we generate it with binary data X=rbinom(n=15,size=1,prob=0.5), then the t-statistic tstat <- sqrt(15)*mean(X) / sd(X) is approximated by a t-distribution with 14 degrees of freedom.

Is the following statement true or false? If instead of generating the sample with X=rnorm(N) with N=500, we generate the data with binary data X=rbinom(n=500,size=1,prob=0.5), then the t-statistic sqrt(N)*mean(X)/sd(X) is approximated by a t-distribution with 499 degrees of freedom.
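Both binary-data statements can be examined with the same kind of simulation. The sketch below computes the statistic exactly as written in the two statements; binary_tstats is a hypothetical helper name and B = 1000 is an assumed number of replicates. With binary data a sample can consist of identical values, in which case sd(X) is 0 and the statistic is not finite, so such replicates are dropped before plotting.

```r
# t-statistics from binary data, compared with t quantiles.
set.seed(1)
B  <- 1000                                        # ASSUMPTION: number of replicates
ps <- seq(1/(B+1), 1 - 1/(B+1), length.out = B)

# Hypothetical helper: B statistics sqrt(N)*mean(X)/sd(X) from binary samples.
binary_tstats <- function(N, B) {
  replicate(B, {
    X <- rbinom(n = N, size = 1, prob = 0.5)
    sqrt(N) * mean(X) / sd(X)   # NaN or Inf when all values in X are equal
  })
}

op <- par(mfrow = c(1, 2))
for (N in c(15, 500)) {
  tstats <- binary_tstats(N, B)
  tstats <- tstats[is.finite(tstats)]   # drop degenerate samples
  qqplot(qt(ps, df = N - 1), tstats,
         main = paste("Binary data, N =", N),
         xlab = "Theoretical t quantiles",
         ylab = "Monte Carlo statistics")
  abline(0, 1)
}
par(op)
```

Points that fall close to the identity line support the approximation; systematic departures argue against it.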
We can derive approximations of the distribution of the sample average or the t-statistic theoretically. However, suppose we are interested in the distribution of a statistic for which a theoretical approximation is not immediately obvious.
Consider the sample median as an example. Use a Monte Carlo simulation to determine which of the following best approximates the distribution of the median of a sample taken from a normally distributed population with mean 0 and standard deviation 1.
 A) Just like for the average, the sample median is approximately normal with mean 0 and SD 1/sqrt(N).
 B) The sample median is not approximately normal.
 C) The sample median is t-distributed for small samples and normally distributed for large ones.
 D) The sample median is approximately normal with mean 0 and SD larger than 1/sqrt(N).
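A simulation for the sample median could be organized as in the sketch below. The number of replicates B and the sample sizes in Ns are assumptions chosen for illustration. For each N the code prints the mean and SD of the simulated medians next to 1/sqrt(N), the SD of the sample average, and draws a normal Q-Q plot.

```r
# Monte Carlo distribution of the sample median of standard normal data.
set.seed(1)
B  <- 10000          # ASSUMPTION: number of replicates
Ns <- c(5, 30, 100)  # ASSUMPTION: sample sizes to examine

op <- par(mfrow = c(1, 3))
for (N in Ns) {
  medians <- replicate(B, median(rnorm(N)))
  cat(sprintf("N = %3d  mean = %6.3f  SD = %5.3f  1/sqrt(N) = %5.3f\n",
              N, mean(medians), sd(medians), 1 / sqrt(N)))
  qqnorm(medians, main = paste("Sample medians, N =", N))
  qqline(medians)    # reference line for a normal distribution
}
par(op)
```

Comparing the printed SD with 1/sqrt(N) and checking how closely the Q-Q points follow the reference line addresses both parts of the question.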