Inference: t-tests, multiple comparisons
In the previous section, we focused on a pair of genes to illustrate two aspects of variation. One of the genes appeared to have high between-mouse variation that was hidden in the act of pooling samples within strain. When strains were compared on the basis of the pooled data, there was an appearance of a significant strain effect for this gene (), but when individual-level data were used to perform the comparison, the strain effect was found to be very weak at best (). The lesson is to recognize that the most scientifically compelling questions concern biological variation, which can only be directly measured with good experimental design. Accurate interpretation of origin and size of biological variation requires appropriate statistical analysis.
In this section we will cover inference in the context of genome-scale experiments. There are several serious conceptual problems:
- there are many tests, often at least one test for each one of tens of thousands of features
- each feature (typically a gene) exhibits its own technical and biological variability
- there may be unmeasured or unreported sources of biological variation (such as time of day)
- many features are inherently interrelated, so the tests are not independent
We will apply some of the concepts we have covered in previous sections including t-tests and multiple comparisons; later we will compute standard deviation estimates from hierarchical models.
We start by loading the pooling experiment data
library(Biobase) library(maPooling) data(maPooling) pd=pData(maPooling) individuals=which(rowSums(pd)==1)
And extracting the individual mice as well as their strain
individuals=which(rowSums(pd)==1) individuals=individuals[-grep("tr",names(individuals))] y=exprs(maPooling)[,individuals] g=factor(as.numeric(grepl("b",names(individuals))))
We can now apply a t-test to each gene using the
rowttest function in the
Now which genes do we report as statistically significant? For somewhat arbitrary reasons, in science p-values of 0.01 and 0.05 are used as cutoff. In this particular example we get
NsigAt01 = sum(tt$p.value<0.01) NsigAt01
##  1578
NsigAt05 = sum(tt$p.value<0.05) NsigAt05
##  2908
We described multiple testing in detail in course 3. Here we provide a quick summary.
Do we report all the nominally significant genes identified above? Let’s explore what happens if we split the first group into two, forcing the null hypothesis to be true
set.seed(0) shuffledIndex <- factor(sample(c(0,1),sum(g==0),replace=TRUE )) nulltt <- rowttests(y[,g==0],shuffledIndex) NfalselySigAt01 = sum(nulltt$p.value<0.01) NfalselySigAt01
##  79
NfalselySigAt05 = sum(nulltt$p.value<0.05) NfalselySigAt05
##  840
If we use the 0.05 cutoff we will be reporting 840 false positives. We have described several ways to adjust for this including the
qvalue method available in the qvalue package. After this adjustment we acquire
a smaller list of genes.
library(qvalue) qvals = qvalue(tt$p.value)$qvalue sum(qvals<0.05)
##  1179
##  538
library(qvalue) nullqvals = qvalue(nulltt$p.value)$qvalue sum(nullqvals<0.05)
##  0
##  0
This addresses in a fairly general way the problem of inflating significance claims when performing many hypothesis tests at a fixed nominal level of significance.