MDS exercises
{pagebreak}
Exercises
- Using the z we computed in exercise 4 of the previous exercises:

    library(tissuesGeneExpression)
    data(tissuesGeneExpression)
    y = e - rowMeans(e)   # center each gene (row) across samples
    s = svd(y)
    z = s$d * t(s$v)      # row i of t(v) scaled by d[i]

  we can make an MDS plot:
    library(rafalib)
    ftissue = factor(tissue)
    mypar(1,1)
    plot(z[1,], z[2,], col=as.numeric(ftissue))
    # one legend entry and one color per tissue level
    legend("topleft", levels(ftissue), col=seq_along(levels(ftissue)), pch=1)
  Now run the function cmdscale on the original data:

    d = dist(t(e))     # Euclidean distances between samples
    mds = cmdscale(d)
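As a rough sketch of how the two sets of coordinates can be compared, the following uses a small simulated matrix rather than the tissue data. The names e_sim, s_sim, z_sim, and mds_sim are made up for this illustration. It also checks that the element-wise product used to build z agrees with multiplying by a diagonal matrix.

```r
# Illustration on simulated data (not the tissue data); all *_sim names are hypothetical.
set.seed(1)
e_sim <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)  # 50 "genes" x 10 "samples"

# Same recipe as above: center each row, take the SVD, scale the rows of t(v) by d
s_sim <- svd(e_sim - rowMeans(e_sim))
z_sim <- s_sim$d * t(s_sim$v)

# d is recycled down each column, so row i is scaled by d[i];
# this matches the matrix product diag(d) %*% t(v)
all.equal(z_sim, diag(s_sim$d) %*% t(s_sim$v))

# Classical MDS on Euclidean distances between samples
mds_sim <- cmdscale(dist(t(e_sim)), k = 2)

# Compare the first two dimensions; abs() because the sign of each dimension is arbitrary
abs(cor(z_sim[1, ], mds_sim[, 1]))
abs(cor(z_sim[2, ], mds_sim[, 2]))
```

Running the same comparison on z and mds from the tissue data answers the questions that follow.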
  What is the absolute value of the correlation between the first dimension of z and the first dimension in mds?
- What is the absolute value of the correlation between the second dimension of z and the second dimension in mds?
- Load the following dataset:

    library(GSE5859Subset)
    data(GSE5859Subset)

  Compute the svd and compute z.

    s = svd(geneExpression-rowMeans(geneExpression))
    z = s$d * t(s$v)

  Which dimension of z most correlates with the outcome sampleInfo$group?
- What is this max correlation?
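A minimal sketch of one way to scan every dimension for its correlation with an outcome, using simulated data with a planted group effect. The objects x_sim and group_sim, the matrix sizes, and the 1e-8 cutoff are assumptions made for this illustration and are not part of the exercise data.

```r
# Simulated stand-in for an expression matrix; names and sizes are hypothetical.
set.seed(1)
n_genes <- 200
n_samples <- 24
group_sim <- rep(c(0, 1), each = n_samples / 2)
x_sim <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples)
x_sim[1:50, group_sim == 1] <- x_sim[1:50, group_sim == 1] + 2  # planted effect

s_sim <- svd(x_sim - rowMeans(x_sim))
z_sim <- s_sim$d * t(s_sim$v)

# After row-centering, the last singular value is numerically zero, so that
# dimension is rounding noise; drop such dimensions before correlating.
# d is sorted in decreasing order, so the kept rows retain their original indices.
keep <- s_sim$d > 1e-8 * s_sim$d[1]

# Correlation of each retained dimension (row of z) with the outcome
cors <- apply(z_sim[keep, , drop = FALSE], 1, cor, y = group_sim)

which.max(abs(cors))                       # dimension with the largest |correlation|
max(abs(cors))                             # the correlation itself
order(abs(cors), decreasing = TRUE)[1:3]   # ranking, useful for "second highest"
```

Taking absolute values before ranking is a design choice here: the sign of each dimension returned by svd is arbitrary, so only the magnitude of the correlation is meaningful.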
- Which dimension of z has the second highest correlation with the outcome sampleInfo$group?
- Note these measurements were made during two months:

    sampleInfo$date

  We can extract the month this way:

    month = format( sampleInfo$date, "%m")
    month = factor( month)

  Which dimension of z has the second highest correlation with the outcome month?
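Because month is a factor, it generally has to be converted to numbers before calling cor(), which expects numeric input. A small sketch with made-up dates (dates_sim, month_sim, and v_sim are hypothetical and not taken from sampleInfo):

```r
# Made-up dates for illustration only
dates_sim <- as.Date(c("2005-06-10", "2005-06-23", "2005-10-07", "2005-10-28"))
month_sim <- factor(format(dates_sim, "%m"))

levels(month_sim)                  # "06" "10"
month_num <- as.numeric(month_sim)
month_num                          # 1 1 2 2 (integer codes of the levels)

# With only two levels the coding acts as an indicator variable, so the
# correlation is well defined up to sign.
v_sim <- c(0.1, -0.2, 1.3, 0.9)    # hypothetical values of one dimension
cor(v_sim, month_num)
```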
- What is this correlation?
- (Advanced) The same dimension is correlated with both the group and the date. The following are also correlated:

    table(sampleInfo$group,month)

  So is this first dimension related directly to group, or is it related only through the month? Note that the correlation with month is higher. This is related to batch effects, which we will learn about later.
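To make the question concrete, here is a deliberately artificial toy in which a variable depends on month only, yet still correlates with group because the two are unbalanced. All *_sim objects are invented for this sketch and say nothing about the real data.

```r
# Toy example: 24 hypothetical samples, two months, group mostly aligned with month
month_sim <- rep(c(0, 1), each = 12)
group_sim <- month_sim
flip <- c(1, 2, 13, 14)                     # four samples break the pattern
group_sim[flip] <- 1 - group_sim[flip]

# A variable driven by month plus a fixed alternating "noise" term (no group effect)
pc_sim <- 2 * month_sim + rep(c(-0.5, 0.5), times = 12)

table(group_sim, month_sim)                 # group and month are clearly associated

r_month <- cor(pc_sim, month_sim)           # about 0.89
r_group <- cor(pc_sim, group_sim)           # about 0.60, induced through month

# Within a single month the association with group vanishes in this toy
r_within <- cor(pc_sim[month_sim == 0], group_sim[month_sim == 0])
```

Comparing the overall correlation with the within-month correlation is only one informal way to probe whether an association runs through a second variable.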
- In exercise 3 we saw that one of the dimensions was highly correlated with sampleInfo$group. Now take the 5th column of U (the left singular vectors, s$u) and stratify by the gene chromosome. Remove chrUn and make a boxplot of the values of that column stratified by chromosome. Which chromosome looks different from the rest? Copy and paste the name as it appears in geneAnnotation.
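The stratify-and-plot step can be sketched with base R on simulated values. Here chr_sim and u_sim are invented stand-ins for the chromosome labels and the column of interest, and the labels chrA, chrB, and chrC are hypothetical.

```r
# Simulated stand-ins; none of these values come from geneAnnotation
set.seed(3)
chr_sim <- sample(c("chrA", "chrB", "chrC", "chrUn"), 400, replace = TRUE)
u_sim <- rnorm(400)
u_sim[chr_sim == "chrC"] <- u_sim[chr_sim == "chrC"] + 3   # planted shift

# Drop the unplaced entries (and any missing labels), then split by chromosome
keep <- !is.na(chr_sim) & chr_sim != "chrUn"
by_chr <- split(u_sim[keep], chr_sim[keep])

boxplot(by_chr, las = 2, ylab = "value")   # one box per remaining chromosome
sapply(by_chr, median)                     # numeric summary of the same comparison
```

Filtering before calling split() keeps the removed label from showing up as an empty box in the plot.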
- Given the answer to the last exercise, any guesses as to what sampleInfo$group represents?