Bayes Exercises
{pagebreak}
Exercises
-
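A test for cystic fibrosis has an accuracy of 99%. Specifically, we mean that:
$$\mbox{Prob}(+ \mid D) = 0.99, \quad \mbox{Prob}(- \mid \mbox{no } D) = 0.99$$
The cystic fibrosis rate in the general population is 1 in 3,900, that is, $$\mbox{Prob}(D) = 1/3900 \approx 0.00026$$.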
If we select a random person and they test positive, what is the probability that they have cystic fibrosis, $$\mbox{Prob}(D \mid +)$$? Hint: use Bayes' rule.
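As a rough sketch of how the hint applies, Bayes' rule can be written as a small R function. The sketch assumes that "99% accuracy" means both $$\mbox{Prob}(+ \mid D)$$ and $$\mbox{Prob}(- \mid \mbox{no } D)$$ equal 0.99. The function name `bayes_posterior` and its argument names are illustrative and not part of the exercise.

```r
# Posterior probability of disease given a positive test, via Bayes' rule.
# bayes_posterior and its argument names are hypothetical helpers.
bayes_posterior <- function(prevalence, sensitivity, specificity) {
  true_pos  <- sensitivity * prevalence              # Prob(+ | D) Prob(D)
  false_pos <- (1 - specificity) * (1 - prevalence)  # Prob(+ | no D) Prob(no D)
  true_pos / (true_pos + false_pos)
}

# Assumption: the 99% accuracy applies to both sick and healthy people.
prob_d_given_pos <- bayes_posterior(prevalence = 1 / 3900,
                                    sensitivity = 0.99,
                                    specificity = 0.99)
print(prob_d_given_pos)
```

Because the disease is rare, false positives from the large healthy population outnumber true positives, so the result is far below 0.99.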
- (Advanced) First download some baseball statistics:

    tmpfile <- tempfile()
    tmpdir <- tempdir()
    download.file("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip", tmpfile)
    ## this shows us the files in the archive
    filenames <- unzip(tmpfile, list = TRUE)
    players <- read.csv(unzip(tmpfile, files = "Batting.csv", exdir = tmpdir), as.is = TRUE)
    unlink(tmpdir)
    file.remove(tmpfile)

We will use the `dplyr` package to obtain data from 2010, 2011, and 2012, for players with more than 500 at bats (AB > 500).

    library(dplyr)
    dat <- filter(players, yearID >= 2010, yearID <= 2012) %>%
      mutate(AVG = H / AB) %>%
      filter(AB > 500)

What is the average of these batting averages?
-
What is the standard deviation of these batting averages?
- Use exploratory data analysis to decide which of the following distributions approximates our AVG:
- A) Normal.
- B) Poisson.
- C) F-distribution.
- D) Uniform.
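One common way to carry out this kind of exploratory check is to look at a histogram and a normal quantile-quantile plot. The sketch below uses a simulated stand-in vector, `avg_sim`, whose mean and SD are made-up values rather than results from the baseball data. With the real data one would pass `dat$AVG` instead.

```r
# Simulated stand-in for dat$AVG; the mean and SD below are made-up values,
# not results from the Lahman data.
set.seed(1)
avg_sim <- rnorm(200, mean = 0.27, sd = 0.03)

# Histogram: look at symmetry, number of peaks, and tail behavior.
hist(avg_sim, breaks = 20, main = "Simulated batting averages", xlab = "AVG")

# Normal quantile-quantile plot: points near the line suggest normality.
qq <- qqnorm(avg_sim)
qqline(avg_sim)

# Correlation between theoretical and sample quantiles (near 1 if normal-like).
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```

A strongly curved quantile-quantile plot, or a histogram that is clearly skewed or flat, would point away from the normal approximation.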
-
It is April and after 20 at bats, José Iglesias is batting .450 (which is very good). We can think of this as a binomial distribution with 20 trials and probability of success $$p$$. Our sample estimate of $$p$$ is .450. What is our estimate of the standard deviation? Hint: the batting average is a binomial sum divided by 20.
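The hint can be unpacked as follows: if the number of hits $$S$$ is binomial with $$n$$ trials and success probability $$p$$, then $$\mbox{Var}(S) = np(1-p)$$, so the average $$S/n$$ has standard deviation $$\sqrt{p(1-p)/n}$$. A minimal sketch of the plug-in estimate, with illustrative variable names:

```r
# Standard error of a sample proportion: sqrt(p (1 - p) / n).
p_hat <- 0.450   # observed batting average
n     <- 20      # number of at bats
se_hat <- sqrt(p_hat * (1 - p_hat) / n)
print(round(se_hat, 3))  # about 0.111
```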
-
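The binomial is well approximated by the normal distribution, so our sampling distribution is approximately normal with mean $$Y = 0.450$$ and SD $$\sigma$$, the standard deviation estimated in the previous exercise. Earlier we used a baseball database to determine that our prior distribution is approximately normal with mean $$\mu$$ and SD $$\tau$$, the average and standard deviation of the batting averages computed above. We also saw that the posterior mean gives a prediction of the batting average.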
What is your Bayes prediction for the batting average going forward?
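For a normal prior with mean $$\mu$$ and SD $$\tau$$ and a normal sampling distribution with observed mean $$Y$$ and SD $$\sigma$$, the standard posterior mean is $$\mbox{E}(\theta \mid Y) = B\mu + (1-B)Y$$ with $$B = \sigma^2/(\sigma^2+\tau^2)$$. The sketch below wraps this in a hypothetical helper, `posterior_mean`. The prior values `mu_demo` and `tau_demo` are placeholders for illustration only and should be replaced by the values computed from the data.

```r
# Posterior mean for a normal prior N(mu, tau^2) and a normal sampling
# distribution N(theta, sigma^2) with one observed average Y.
# posterior_mean is a hypothetical helper name.
posterior_mean <- function(Y, sigma, mu, tau) {
  B <- sigma^2 / (sigma^2 + tau^2)  # shrinkage weight on the prior mean
  B * mu + (1 - B) * Y
}

# Placeholder prior values for illustration only; with the real data use
# mu <- mean(dat$AVG) and tau <- sd(dat$AVG).
mu_demo  <- 0.27
tau_demo <- 0.03

Y     <- 0.450                    # observed average after 20 at bats
sigma <- sqrt(Y * (1 - Y) / 20)   # binomial standard error

pred <- posterior_mean(Y, sigma, mu_demo, tau_demo)
print(pred)
```

When $$\sigma$$ is much larger than $$\tau$$, $$B$$ is close to 1 and the prediction stays close to the prior mean. With more at bats $$\sigma$$ shrinks and the observed average gets more weight.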