## Principal Components Analysis

## Principal Component Analysis

We have already mentioned principal component analysis (PCA) above and noted its relation to the SVD. Here we provide further mathematical details.

#### Example: Twin heights

We started the motivation for dimension reduction with a simulated example and showed a rotation that is very much related to PCA.

Here we explain specifically what are the principal components (PCs).

Let be matrix representing our data. The analogy is that we measure expression from 2 genes and each column is a sample. Suppose we are given the task of finding a vector such that and it maximizes . This can be viewed as a projection of each sample or column of into the subspace spanned by . So we are looking for a transformation in which the coordinates show high variability.

Let’s try . This projection simply gives us the height of twin 1 shown in orange below. The sum of squares is shown in the title.

```
mypar(1,1)
plot(t(Y), xlim=thelim, ylim=thelim,
main=paste("Sum of squares :",round(crossprod(Y[,1]),1)))
abline(h=0)
apply(Y,2,function(y) segments(y[1],0,y[1],y[2],lty=2))
```

```
## NULL
```

```
points(Y[1,],rep(0,ncol(Y)),col=2,pch=16,cex=0.75)
```

Can we find a direction with higher variability? How about:

? This does not satisfy so let’s instead try

```
u <- matrix(c(1,-1)/sqrt(2),ncol=1)
w=t(u)%*%Y
mypar(1,1)
plot(t(Y),
main=paste("Sum of squares:",round(tcrossprod(w),1)),xlim=thelim,ylim=thelim)
abline(h=0,lty=2)
abline(v=0,lty=2)
abline(0,-1,col=2)
Z = u%*%w
for(i in seq(along=w))
segments(Z[1,i],Z[2,i],Y[1,i],Y[2,i],lty=2)
points(t(Z), col=2, pch=16, cex=0.5)
```

This relates to the difference between twins, which we know is small. The sum of squares confirms this.

Finally, let’s try:

```
u <- matrix(c(1,1)/sqrt(2),ncol=1)
w=t(u)%*%Y
mypar()
plot(t(Y), main=paste("Sum of squares:",round(tcrossprod(w),1)),
xlim=thelim, ylim=thelim)
abline(h=0,lty=2)
abline(v=0,lty=2)
abline(0,1,col=2)
points(u%*%w, col=2, pch=16, cex=1)
Z = u%*%w
for(i in seq(along=w))
segments(Z[1,i], Z[2,i], Y[1,i], Y[2,i], lty=2)
points(t(Z),col=2,pch=16,cex=0.5)
```

This is a re-scaled average height, which has higher sum of squares. There is a mathematical procedure for determining which maximizes the sum of squares and the SVD provides it for us.

#### The principal components

The orthogonal vector that maximizes the sum of squares:

is referred to as the first PC. The *weights* used to obtain this PC are referred to as the *loadings*. Using the language of rotations, it is also referred to as the *direction* of the first PC, which are the new coordinates.

To obtain the second PC, we repeat the exercise above, but for the residuals:

The second PC is the vector with the following properties:

and maximizes .

When is we repeat to find 3rd, 4th, …, m-th PCs.

`prcomp`

We have shown how to obtain PCs using the SVD. However, R has a function specifically designed to find the principal components. In this case, the data is centered by default. The following function:

```
pc <- prcomp( t(Y) )
```

produces the same results as the SVD up to arbitrary sign flips:

```
s <- svd( Y - rowMeans(Y) )
mypar(1,2)
for(i in 1:nrow(Y) ){
plot(pc$x[,i], s$d[i]*s$v[,i])
}
```

The loadings can be found this way:

```
pc$rotation
```

```
## PC1 PC2
## [1,] 0.7072304 0.7069831
## [2,] 0.7069831 -0.7072304
```

which are equivalent (up to a sign flip) to:

```
s$u
```

```
## [,1] [,2]
## [1,] -0.7072304 -0.7069831
## [2,] -0.7069831 0.7072304
```

The equivalent of the variance explained is included in the:

```
pc$sdev
```

```
## [1] 1.2542672 0.2141882
```

component.

We take the transpose of `Y`

because `prcomp`

assumes the previously discussed ordering: units/samples in row and features in columns.