Introduction

In this section we will cover inference in the context of genomics experiments. We apply some of the concepts we have covered in previous sections including t-tests, multiple comparisons and standard deviation estimates from hierarchical models.

We start by loading the pooling experiment data

library(Biobase)
library(maPooling)
data(maPooling)
pd=pData(maPooling)
individuals=which(rowSums(pd)==1)

And extracting the individual mice as well as their strain

individuals=which(rowSums(pd)==1)
individuals=individuals[-grep("tr",names(individuals))]
y=exprs(maPooling)[,individuals]
g=factor(as.numeric(grepl("b",names(individuals))))

T-test

We can now apply a t-test to each gene using the rowttest function in the genefilter package

library(genefilter)

## 
## Attaching package: 'genefilter'
## 
## The following object is masked from 'package:base':
## 
##     anyNA

tt=rowttests(y,g)

Now which genes do we report as statistically significant? For somewhat arbitrary reasons, in science p-values of 0.01 and 0.05 are used as cutoff. In this particular example we get

sum(tt$p.value<0.01)

## [1] 1578

sum(tt$p.value<0.05)

## [1] 2908

Multiple testing

We described multiple testing in detail in course 3. Here we provide a quick summary.

Do we report all these genes? Let’s explore what happens if we split the first group into two, forcing the null hypothesis to be true

set.seed(0)
shuffledIndex <- factor(sample(c(0,1),sum(g==0),replace=TRUE ))
nulltt <- rowttests(y[,g==0],shuffledIndex)
sum(nulltt$p.value<0.01)

## [1] 79

sum(nulltt$p.value<0.05)

## [1] 840

If we use the 0.05 cutoff we will be reporting 840 false positives. We have described several ways to adjust for this include the qvalue method available in the qvalue package. After this adjustment we include a smaller list of genes.

library(qvalue)
qvals = qvalue(tt$p.value)$qvalue
sum(qvals<0.05)

## [1] 1183

sum(qvals<0.01)

## [1] 538

And now the null case generates fewer false positives:

library(qvalue)
nullqvals = qvalue(nulltt$p.value)$qvalue
sum(nullqvals<0.05)

## [1] 0

sum(nullqvals<0.01)

## [1] 0

Inference with bioc

Introduction

T-test

Multiple testing