The QR Factorization (Advanced)
We have seen that in order to calculate the LSE, we need to invert a matrix. In previous sections we used the function `solve`. However, `solve` is not numerically stable, so when coding the LSE computation we use the QR decomposition instead.
Remember that to minimize the RSS:

$$(Y - X\boldsymbol{\beta})^\top (Y - X\boldsymbol{\beta})$$

We need to solve:

$$X^\top X \hat{\boldsymbol{\beta}} = X^\top Y$$

The solution is:

$$\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top Y$$

Thus, we need to compute $(X^\top X)^{-1}$.
`solve` is numerically unstable
To demonstrate what we mean by numerically unstable, we construct an extreme case:
```r
n <- 50
M <- 500
x <- seq(1, M, len = n)
X <- cbind(1, x, x^2, x^3)
colnames(X) <- c("Intercept", "x", "x2", "x3")
beta <- matrix(c(1, 1, 1, 1), 4, 1)
set.seed(1)
y <- X %*% beta + rnorm(n, sd = 1)
```
The standard R function for the inverse gives an error:

```r
solve(crossprod(X)) %*% crossprod(X, y)
```
To see why this happens, look at the entries of $X^\top X$ on the log scale:

```r
log10(crossprod(X))
```

```
##           Intercept      x     x2     x3
## Intercept     1.699  4.098  6.625  9.203
## x             4.098  6.625  9.203 11.810
## x2            6.625  9.203 11.810 14.434
## x3            9.203 11.810 14.434 17.070
```
Note the difference of several orders of magnitude. On a digital computer, we can only represent numbers with a limited number of significant digits. When the entries of a matrix span this many orders of magnitude, the small entries are effectively treated as 0 relative to the large ones, which leads to operations that are practically divisions by 0 and produce errors.
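One way to quantify this problem is the condition number of $X^\top X$: when it exceeds roughly $10^{16}$, a 64-bit float, which carries about 16 significant digits, cannot reliably invert the matrix. The following illustrative sketch (in Python with NumPy, since the numerical issue is language-independent) rebuilds the same design matrix as above and checks its condition number:

```python
import numpy as np

# Rebuild the same design matrix used above: columns 1, x, x^2, x^3
n, M = 50, 500
x = np.linspace(1, M, n)
X = np.column_stack([np.ones(n), x, x**2, x**3])

# The condition number of X^T X is astronomically large, so direct
# inversion mixes entries that differ by ~15 orders of magnitude.
cond = np.linalg.cond(X.T @ X)
print(cond > 1e15)  # True: effectively singular in double precision
```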
The QR factorization is based on a mathematical result that tells us that we can decompose any full rank $N \times p$ matrix $X$ as:

$$X = QR$$

with:

- $Q$ a $N \times p$ matrix with $Q^\top Q = I$
- $R$ a $p \times p$ upper triangular matrix.
Upper triangular matrices are very convenient for solving systems of equations.
Example of upper triangular matrix
In the example below, the matrix on the left is upper triangular: it only has 0s below the diagonal. This greatly facilitates solving the system of equations:

$$
\begin{pmatrix}
1 & 2 & -1 \\
0 & 1 & 2 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
a \\ b \\ c
\end{pmatrix}
=
\begin{pmatrix}
6 \\ 4 \\ 1
\end{pmatrix}
$$

We immediately know that $c = 1$, which implies that $b + 2 = 4$. This in turn implies $b = 2$, and thus $a + 4 - 1 = 6$, so $a = 3$. Writing an algorithm to do this is straightforward for any upper triangular matrix.
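The algorithm just described is called back-substitution: solve the last equation, then work upward one row at a time. A minimal sketch (in Python/NumPy, as an illustration of what a back-substitution routine does internally):

```python
import numpy as np

def back_substitute(R, b):
    """Solve R beta = b for an upper triangular R, from the bottom row up."""
    p = len(b)
    beta = np.zeros(p)
    for i in range(p - 1, -1, -1):
        # Subtract the terms already solved for, then divide by the diagonal.
        beta[i] = (b[i] - R[i, i+1:] @ beta[i+1:]) / R[i, i]
    return beta

# A small upper triangular system: solves to a = 3, b = 2, c = 1
R = np.array([[1., 2., -1.],
              [0., 1., 2.],
              [0., 0., 1.]])
b = np.array([6., 4., 1.])
print(back_substitute(R, b))  # [3. 2. 1.]
```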
Finding LSE with QR
If we rewrite the equations of the LSE using $QR$ instead of $X$ we have:

$$X^\top X \hat{\boldsymbol{\beta}} = X^\top Y$$

$$(QR)^\top (QR) \hat{\boldsymbol{\beta}} = (QR)^\top Y$$

$$R^\top (Q^\top Q) R \hat{\boldsymbol{\beta}} = R^\top Q^\top Y$$

$$R^\top R \hat{\boldsymbol{\beta}} = R^\top Q^\top Y$$

$$(R^\top)^{-1} R^\top R \hat{\boldsymbol{\beta}} = (R^\top)^{-1} R^\top Q^\top Y$$

$$R \hat{\boldsymbol{\beta}} = Q^\top Y$$
$R$ being upper triangular makes solving this final equation more stable. Also, because $Q^\top Q = I$, we know that the columns of $Q$ are on the same scale, which stabilizes the right-hand side.
Now we are ready to find the LSE using the QR decomposition. To solve:

$$R \hat{\boldsymbol{\beta}} = Q^\top Y$$

we use `backsolve`, which takes advantage of the upper triangular nature of $R$.
```r
QR <- qr(X)
Q <- qr.Q(QR)
R <- qr.R(QR)
(betahat <- backsolve(R, crossprod(Q, y)))
```
```
##        [,1]
## [1,] 0.9038
## [2,] 1.0066
## [3,] 1.0000
## [4,] 1.0000
```
In practice, we do not need to do any of this thanks to the built-in `solve.qr` function:

```r
QR <- qr(X)
(betahat <- solve.qr(QR, y))
```
```
##             [,1]
## Intercept 0.9038
## x         1.0066
## x2        1.0000
## x3        1.0000
```
This factorization also simplifies the calculation of fitted values:

$$X\hat{\boldsymbol{\beta}} = QR\hat{\boldsymbol{\beta}} = QRR^{-1}Q^\top Y = QQ^\top Y$$
In R, we simply do the following:
```r
library(rafalib)
mypar(1, 1)
plot(x, y)
fitted <- tcrossprod(Q) %*% y
lines(x, fitted, col = 2)
```
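The identity $X\hat{\boldsymbol{\beta}} = QQ^\top Y$ says that the "hat matrix" projecting $Y$ onto the fitted values is simply $QQ^\top$. An illustrative check in Python/NumPy (with a generic random design matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

Q, R = np.linalg.qr(X)

# Fitted values two ways: X (X^T X)^{-1} X^T y versus Q Q^T y
hat_direct = X @ np.linalg.solve(X.T @ X, X.T @ y)
hat_qr = Q @ (Q.T @ y)
print(np.allclose(hat_direct, hat_qr))  # True
```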
To obtain the standard errors of the LSE, we note that:

$$(X^\top X)^{-1} = (R^\top R)^{-1}$$

The function `chol2inv` is specifically designed to find this inverse from a triangular factor such as $R$. So all we do is the following:
```r
df <- length(y) - QR$rank
sigma2 <- sum((y - fitted)^2) / df
varbeta <- sigma2 * chol2inv(qr.R(QR))
SE <- sqrt(diag(varbeta))
cbind(betahat, SE)
```
```
##                      SE
## Intercept 0.9038 4.508e-01
## x         1.0066 7.858e-03
## x2        1.0000 3.662e-05
## x3        1.0000 4.802e-08
```
This gives us identical results to the `lm` function:

```
##             Estimate Std. Error   t value   Pr(>|t|)
## XIntercept    0.9038  4.508e-01 2.005e+00  5.089e-02
## Xx            1.0066  7.858e-03 1.281e+02  2.171e-60
## Xx2           1.0000  3.662e-05 2.731e+04 1.745e-167
## Xx3           1.0000  4.802e-08 2.082e+07 4.559e-300
```
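The key identity that makes the standard-error computation above work, $(X^\top X)^{-1} = (R^\top R)^{-1}$, follows from $X^\top X = R^\top Q^\top Q R = R^\top R$ and can be confirmed numerically. An illustrative sketch in Python/NumPy (again with a generic random design matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
Q, R = np.linalg.qr(X)

# (X^T X)^{-1} equals (R^T R)^{-1}, which only needs the triangular factor R
inv_direct = np.linalg.inv(X.T @ X)
inv_from_R = np.linalg.inv(R.T @ R)
print(np.allclose(inv_direct, inv_from_R))  # True
```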