{pagebreak}

Exercises

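  1. A test for cystic fibrosis has an accuracy of 99%. Specifically, we mean that:

    $$\mbox{Prob}(+ \mid D) = 0.99, \quad \mbox{Prob}(- \mid \mbox{no } D) = 0.99$$

    The cystic fibrosis rate in the general population is 1 in 3,900, so $$\mbox{Prob}(D) = 1/3900$$.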

    If we select a random person and they test positive, what is the probability that they have cystic fibrosis, $$\mbox{Prob}(D \mid +)$$? Hint: use Bayes' rule.
  2. (Advanced) First download some baseball statistics.

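     tmpfile <- tempfile()
     tmpdir <- tempdir()
     download.file("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip", tmpfile)
     ## list the files contained in the zip archive
     filenames <- unzip(tmpfile, list = TRUE)
     ## extract Batting.csv into tmpdir and read it
     players <- read.csv(unzip(tmpfile, files = "Batting.csv", exdir = tmpdir), as.is = TRUE)
     ## remove the extracted file and the downloaded archive
     unlink(file.path(tmpdir, "Batting.csv"))
     file.remove(tmpfile)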
    

    We will use the dplyr package to obtain data from 2010, 2011, and 2012, keeping players with more than 500 at bats (AB > 500).

     library(dplyr)
     dat <- filter(players, yearID >= 2010, yearID <= 2012) %>% mutate(AVG = H / AB) %>% filter(AB > 500)
    

    What is the average of these batting averages?

  3. What is the standard deviation of these batting averages?

  4. Use exploratory data analysis to decide which of the following distributions approximates our AVG:
    • A) Normal.
    • B) Poisson.
    • C) F-distribution.
    • D) Uniform.
  5. It is April and after 20 at bats, José Iglesias is batting .450 (which is very good). We can think of this as a binomial distribution with 20 trials and probability of success $$p$$. Our sample estimate of $$p$$ is .450. What is our estimate of the standard deviation of this average? Hint: the batting average is a binomial sum divided by 20.

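  6. The binomial is approximated by the normal, so our sampling distribution is approximately normal with mean $$Y = 0.450$$ and SD $$\sigma$$, the estimate from the previous exercise. Earlier we used a baseball database to determine that our prior distribution is normal with mean $$\mu$$ and SD $$\tau$$, the average and standard deviation of the batting averages computed above. We also saw that the posterior mean prediction of the true batting average $$\theta$$ is a weighted average of the prior mean and the observed average:

    $$\mbox{E}(\theta \mid Y) = B\mu + (1 - B)Y \mbox{ with } B = \frac{\sigma^2}{\sigma^2 + \tau^2}$$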

    What is your Bayes prediction for the batting average going forward?
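
To check your arithmetic for the first and last exercises, a minimal R sketch such as the following may help. It is not the only way to set these up. The function names `prob_disease_given_positive` and `posterior_mean` are hypothetical helpers introduced here. We assume that "99% accuracy" means both Prob(+ | D) and Prob(- | no D) equal 0.99. We write `Y` for the observed average, `sigma` for its estimated SD, and `mu` and `tau` for the prior mean and SD. The values of `mu` and `tau` below are illustrative placeholders, not results computed from the database.

```r
## Bayes' rule for a diagnostic test: returns Prob(D | +)
## sens = Prob(+ | D), spec = Prob(- | no D), prev = Prob(D)
prob_disease_given_positive <- function(sens, spec, prev) {
  true_pos  <- sens * prev              # Prob(+ and D)
  false_pos <- (1 - spec) * (1 - prev)  # Prob(+ and no D)
  true_pos / (true_pos + false_pos)
}

## Posterior mean for a normal prior and a normal sampling distribution:
## a weighted average of the prior mean mu and the observed average Y
posterior_mean <- function(Y, sigma, mu, tau) {
  B <- sigma^2 / (sigma^2 + tau^2)  # weight given to the prior mean
  B * mu + (1 - B) * Y
}

## Exercise 1: ASSUMPTION that sensitivity and specificity are both 0.99
p_pos <- prob_disease_given_positive(sens = 0.99, spec = 0.99, prev = 1 / 3900)

## Exercise 5: observed average after 20 at bats and its estimated SD
Y <- 0.450
sigma <- sqrt(Y * (1 - Y) / 20)

## Exercise 6: ASSUMPTION, placeholder prior values;
## replace them with mean(dat$AVG) and sd(dat$AVG) from your own analysis
mu <- 0.275
tau <- 0.027
pred <- posterior_mean(Y, sigma, mu, tau)

print(c(p_pos = p_pos, sigma = sigma, pred = pred))
```

Because `B` lies between 0 and 1, the prediction always falls between the prior mean and the observed average. When `sigma` is large relative to `tau`, as it is after only 20 at bats, the prediction is pulled strongly toward the prior mean.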