Collinearity exercises
Exercises
Consider these design matrices:
-
Which of the above design matrices does NOT have the problem of collinearity?
The following exercises are advanced. Let’s use the example from the lecture to visualize how there is not a single best $\hat{\beta}$ when the design matrix has collinearity of columns. An example can be made with:
sex <- factor(rep(c("female","male"),each=4))
trt <- factor(c("A","A","B","B","C","C","D","D"))
The model matrix can then be formed with:
X <- model.matrix( ~ sex + trt)
And we can see that the number of independent columns is less than the number of columns of X:
qr(X)$rank
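To see where the lost rank comes from, the sketch below checks one linear dependency among the columns. It assumes R's default treatment contrasts, under which `model.matrix` names the columns `(Intercept)`, `sexmale`, `trtB`, `trtC` and `trtD`; with other contrast settings the names and the dependency would differ. The variable `dependent` is introduced here only for illustration.

```r
# Rebuild the example so this block runs on its own
sex <- factor(rep(c("female", "male"), each = 4))
trt <- factor(c("A", "A", "B", "B", "C", "C", "D", "D"))
X <- model.matrix(~ sex + trt)

# Every male unit received treatment C or D, so the male column
# is exactly the sum of the C and D columns
dependent <- all(X[, "sexmale"] == X[, "trtC"] + X[, "trtD"])
dependent

# Compare the number of columns with the number of independent columns
c(columns = ncol(X), rank = qr(X)$rank)
```

Because one column can be written in terms of others, different coefficient vectors can produce exactly the same fitted values.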
Suppose we observe some outcome Y. For simplicity, we will use synthetic data:
Y <- 1:8
Now we will fix the values of two coefficients and optimize the remaining ones. We will fix $\beta_{male}$ and $\beta_D$. Then we will find the optimal values for the remaining betas, in terms of minimizing the residual sum of squares. We find the values that minimize:
$$ \sum_{i=1}^8 \left( Y_i - X_{i,male} \beta_{male} - X_{i,D} \beta_D - \mathbf{Z}_i \boldsymbol{\gamma} \right)^2 $$
where $X_{male}$ is the male column of the design matrix, $X_D$ is the D column, $\mathbf{Z}_i$ is a 1 by 3 matrix with the remaining column entries for unit $i$, and $\boldsymbol{\gamma}$ is a 3 x 1 matrix with the remaining parameters.
So all we need to do is redefine $Y$ as $Y^* = Y - X_{male} \beta_{male} - X_D \beta_D$ and fit a linear model. The following line of code creates this variable $Y^*$, after fixing $\beta_{male}$ to a value `a` and $\beta_D$ to a value `b`:
makeYstar <- function(a,b) Y - X[,2] * a - X[,5] * b
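The reason a single linear model fit suffices can be written out. Collect the three remaining columns (intercept, B and C) into an 8 x 3 matrix $\mathbf{Z}$ with rows $\mathbf{Z}_i$, and write $Y^*$ for the outcome with the two fixed terms subtracted. The criterion is then an ordinary least squares problem in $\boldsymbol{\gamma}$. Since the columns of $\mathbf{Z}$ are linearly independent in this example, the minimizer is the usual solution:

$$
\hat{\boldsymbol{\gamma}} = \arg\min_{\boldsymbol{\gamma}} \sum_{i=1}^8 \left( Y^*_i - \mathbf{Z}_i \boldsymbol{\gamma} \right)^2 = (\mathbf{Z}^\top \mathbf{Z})^{-1} \mathbf{Z}^\top Y^*
$$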
Now we’ll construct a function which, for a given value a and b, gives us back the sum of squared residuals after fitting the other terms.
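fitTheRest <- function(a,b) {
  # outcome with the two fixed terms subtracted
  Ystar <- makeYstar(a,b)
  # design matrix without the male and D columns
  Xrest <- X[,-c(2,5)]
  # least squares solution for the remaining coefficients
  betarest <- solve(t(Xrest) %*% Xrest) %*% t(Xrest) %*% Ystar
  residuals <- Ystar - Xrest %*% betarest
  sum(residuals^2)
}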
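As a sanity check, the residual sum of squares from the normal equations can be compared with what `lm` reports for the same regression. The sketch below is self-contained and uses a = 0 and b = 0 purely as illustrative values. The names `Ystar0`, `rss_solve` and `rss_lm` are introduced here and are not part of the exercise.

```r
sex <- factor(rep(c("female", "male"), each = 4))
trt <- factor(c("A", "A", "B", "B", "C", "C", "D", "D"))
X <- model.matrix(~ sex + trt)
Y <- 1:8

# Fix the male and D coefficients at illustrative values a = 0, b = 0
Ystar0 <- Y - X[, 2] * 0 - X[, 5] * 0
Xrest <- X[, -c(2, 5)]

# Residual sum of squares via the normal equations
betarest <- solve(t(Xrest) %*% Xrest) %*% t(Xrest) %*% Ystar0
rss_solve <- sum((Ystar0 - Xrest %*% betarest)^2)

# Same regression with lm; "- 1" drops lm's own intercept because
# Xrest already contains an intercept column
rss_lm <- deviance(lm(Ystar0 ~ Xrest - 1))

# TRUE when the two approaches agree up to numerical tolerance
all.equal(rss_solve, rss_lm)
```

For larger or less well-conditioned problems, `lm` (which uses a QR decomposition) is generally the safer choice than inverting `t(Xrest) %*% Xrest` directly.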
What is the sum of squared residuals when the male coefficient is 1 and the D coefficient is 2, and the other coefficients are fit using the linear model solution?
We can apply our function `fitTheRest` to a grid of values for $\beta_{male}$ and $\beta_D$, using the `outer` function in R. `outer` takes three arguments: a grid of values for the first argument, a grid of values for the second argument, and finally a function which takes two arguments.
Try it out:
outer(1:3,1:3,`*`)
We can run `fitTheRest` on a grid of values, using the following code (`Vectorize` is necessary because `outer` requires a vectorized function):
outer(-2:8,-2:8,Vectorize(fitTheRest))
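The need for `Vectorize` comes from how `outer` works: it calls the function once with two long vectors holding every pair of grid values, and expects a result of the same length. A minimal sketch with a hypothetical scalar-only function `pairSum` (not part of the exercise) shows the failure and the fix:

```r
# pairSum collapses its inputs to a single number, so it is not vectorized
pairSum <- function(a, b) sum(c(a, b))

# outer() passes all 9 pairs at once and gets back 1 number, which is an error
res <- try(outer(1:3, 1:3, pairSum), silent = TRUE)
failed <- inherits(res, "try-error")
failed

# Vectorize() wraps pairSum so that it is applied to one pair at a time
m <- outer(1:3, 1:3, Vectorize(pairSum))
m
```

`fitTheRest` is in the same situation: it is written for a single `a` and a single `b`, so it has to be wrapped before being passed to `outer`.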
In the grid of values, what is the smallest sum of squared residuals?