Adjusting with factor analysis exercises
Exercises
In this section we will use the sva function in the sva package (available from Bioconductor) and apply it to the following data:
library(sva)
library(Biobase)
library(GSE5859Subset)
data(GSE5859Subset)
- In a previous section we estimated factors using PCA, but we noted that the first factor was correlated with our outcome of interest:
s <- svd(geneExpression - rowMeans(geneExpression))
cor(sampleInfo$group, s$v[,1])
The sva function estimates factors, but downweights the genes that appear to correlate with the outcome of interest. It also tries to estimate the number of factors and returns the estimated factors like this:
sex = sampleInfo$group
mod = model.matrix(~sex)
svafit = sva(geneExpression, mod)
head(svafit$sv)
The resulting estimated factors are not that different from the PCs:
for(i in 1:ncol(svafit$sv)){
  print( cor(s$v[,i], svafit$sv[,i]) )
}
Now fit a linear model to each gene that includes these factors in place of month. Use the qvalue function (from the qvalue package) to compute q-values.
How many genes have q-value < 0.1?
- How many of these genes are from chrY or chrX?
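The per-gene linear model in the first exercise can be sketched in base R on simulated data. This is only an illustration under stated assumptions, not the solution to the exercise: the matrix y, the vectors group and batch, and the single factor sv are hypothetical stand-ins for geneExpression, sex, and svafit$sv. Here sv is taken from an SVD of the residuals rather than from sva, and Benjamini-Hochberg adjusted p-values from p.adjust are swapped in for the q-values that qvalue would return.

```r
# Hypothetical simulated data: 200 genes by 24 samples (not GSE5859Subset)
set.seed(1)
n_genes <- 200
n_samples <- 24
group <- rep(c(0, 1), each = n_samples / 2)     # outcome of interest
batch <- rep(c(0, 1), times = n_samples / 2)    # hidden factor, balanced across groups
y <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples)
y[1:10, ] <- y[1:10, ] + outer(rep(2, 10), group)  # first 10 genes differ by group
y <- y + outer(rnorm(n_genes), batch)              # gene-specific batch effect

# Stand-in for svafit$sv: first right singular vector of the residuals
# left after regressing out the outcome (a simplification, not what sva does)
X0 <- model.matrix(~group)
H0 <- X0 %*% solve(crossprod(X0)) %*% t(X0)     # hat matrix for the outcome-only model
res <- y - y %*% H0
sv <- svd(res)$v[, 1, drop = FALSE]

# Fit gene ~ group + sv for all genes at once with matrix algebra
X <- model.matrix(~group + sv)
XtX_inv <- solve(crossprod(X))
beta <- y %*% X %*% XtX_inv                     # genes by coefficients
resid <- y - beta %*% t(X)
df <- n_samples - ncol(X)                       # residual degrees of freedom
sigma <- sqrt(rowSums(resid^2) / df)            # per-gene residual standard deviation

k <- 2                                          # column of the group coefficient
se <- sigma * sqrt(XtX_inv[k, k])
tstat <- beta[, k] / se
pvals <- 2 * pt(-abs(tstat), df)                # two-sided p-values

# Benjamini-Hochberg adjustment, used here in place of qvalue
adj <- p.adjust(pvals, method = "BH")
n_sig <- sum(adj < 0.1)
print(n_sig)
```

For the exercise itself, the same steps would use geneExpression as y and model.matrix(~sex + svafit$sv) as the design, with the p-values passed to qvalue instead of p.adjust.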