Inference with bioc
Introduction
In this section we will cover inference in the context of genomics experiments. We apply some of the concepts we have covered in previous sections including t-tests, multiple comparisons and standard deviation estimates from hierarchical models.
We start by loading the pooling experiment data
library(Biobase)
library(maPooling)
data(maPooling)
pd=pData(maPooling)
individuals=which(rowSums(pd)==1)
And extracting the individual mice as well as their strain
individuals=which(rowSums(pd)==1)
individuals=individuals[-grep("tr",names(individuals))]
y=exprs(maPooling)[,individuals]
g=factor(as.numeric(grepl("b",names(individuals))))
T-test
We can now apply a t-test to each gene using the rowttest
function in the genefilter
package
library(genefilter)
##
## Attaching package: 'genefilter'
##
## The following object is masked from 'package:base':
##
## anyNA
tt=rowttests(y,g)
Now which genes do we report as statistically significant? For somewhat arbitrary reasons, in science p-values of 0.01 and 0.05 are used as cutoff. In this particular example we get
sum(tt$p.value<0.01)
## [1] 1578
sum(tt$p.value<0.05)
## [1] 2908
Multiple testing
We described multiple testing in detail in course 3. Here we provide a quick summary.
Do we report all these genes? Let’s explore what happens if we split the first group into two, forcing the null hypothesis to be true
set.seed(0)
shuffledIndex <- factor(sample(c(0,1),sum(g==0),replace=TRUE ))
nulltt <- rowttests(y[,g==0],shuffledIndex)
sum(nulltt$p.value<0.01)
## [1] 79
sum(nulltt$p.value<0.05)
## [1] 840
If we use the 0.05 cutoff we will be reporting 840 false positives. We have described several ways to adjust for this include the qvalue
method available in the qvalue
package. After this adjustment we include a smaller list of genes.
library(qvalue)
qvals = qvalue(tt$p.value)$qvalue
sum(qvals<0.05)
## [1] 1183
sum(qvals<0.01)
## [1] 538
And now the null case generates fewer false positives:
library(qvalue)
nullqvals = qvalue(nulltt$p.value)$qvalue
sum(nullqvals<0.05)
## [1] 0
sum(nullqvals<0.01)
## [1] 0