Distance exercises
Exercises
If you have not done so already, install the data package tissuesGeneExpression:
library(devtools)
install_github("genomicsclass/tissuesGeneExpression")
The data represent RNA expression levels for eight tissues, each with several biological replicates. We call samples that we consider to be from the same population, such as liver tissue from different individuals, biological replicates:
library(tissuesGeneExpression)
data(tissuesGeneExpression)
head(e)
head(tissue)
-
How many biological replicates are there for hippocampus?
-
What is the distance between samples 3 and 45?
-
What is the distance between genes 210486_at and 200805_at?
-
If I run the command (don’t run it!):
d = as.matrix( dist(e) )
how many cells (number of rows times number of columns) will this matrix have?
-
Compute the distance between all pairs of samples:
d = dist( t(e) )
Read the help file for dist. How many distances are stored in d? Hint: What is the length of d?
-
Why is the answer to exercise 5 not ncol(e)^2?
- A) R made a mistake there.
- B) Distances of 0 are left out.
- C) Because we take advantage of symmetry: only the lower triangular matrix is stored, thus only ncol(e)*(ncol(e)-1)/2 values.
- D) Because it is equal to nrow(e)^2.
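Before working on the real data, it may help to see how dist behaves on a small example. The sketch below uses a made-up 4 by 3 matrix x as a stand-in for e (the names x, d, m, n and by_hand are only for illustration and are not part of the package). It shows that dist works on rows, so samples stored in columns need a transpose, and it compares the number of stored distances with the size of the full matrix:

```r
# Toy stand-in for e: 4 features (rows) by 3 samples (columns); values are made up.
set.seed(1)
x <- matrix(rnorm(12), nrow = 4, ncol = 3)

# dist() computes Euclidean distances between the ROWS of its argument,
# so transpose to get distances between samples (columns).
d <- dist(t(x))

# Number of distances stored in the "dist" object, compared with n * (n - 1) / 2.
n <- ncol(x)
length(d)
n * (n - 1) / 2

# The full matrix form has n rows and n columns, with zeros on the diagonal.
m <- as.matrix(d)
dim(m)

# Distance between samples 1 and 2 computed by hand,
# compared with the corresponding entry of the full matrix.
by_hand <- sqrt(sum((x[, 1] - x[, 2])^2))
all.equal(by_hand, m[1, 2])
```

The same pattern should carry over to e, with the caveat that calling dist(e) without the transpose compares genes rather than samples and produces a far larger result.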