Collinearity exercises
Exercises
Consider these design matrices:

Which of the above design matrices does NOT have the problem of collinearity?

The following exercises are advanced. Let’s use the example from the lecture to visualize how there is no single best $\hat{\beta}$ when the design matrix has collinear columns. An example can be made with:
sex <- factor(rep(c("female","male"),each=4))
trt <- factor(c("A","A","B","B","C","C","D","D"))
The model matrix can then be formed with:
X <- model.matrix( ~ sex + trt)
And we can see that the number of independent columns is less than the number of columns of X:
qr(X)$rank
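To make the rank deficiency concrete, the following sketch compares the rank with the column count and checks one linear dependence among the columns. The column names `sexmale`, `trtC` and `trtD` assume R's default treatment contrasts, and `dep` is a name introduced here only for illustration.

```r
# Rebuild the design matrix so this block runs on its own
sex <- factor(rep(c("female", "male"), each = 4))
trt <- factor(c("A", "A", "B", "B", "C", "C", "D", "D"))
X <- model.matrix(~ sex + trt)

ncol(X)      # number of columns
qr(X)$rank   # number of linearly independent columns

# In this design only males received treatments C and D, so the male
# indicator equals the sum of the C and D indicators (assumes default contrasts)
dep <- X[, "sexmale"] - (X[, "trtC"] + X[, "trtD"])
all(dep == 0)
```

Because one column can be written in terms of others, shifting weight between the corresponding coefficients leaves the fitted values unchanged.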
Suppose we observe some outcome Y. For simplicity, we will use synthetic data:
Y <- 1:8
Now we will fix the value of two coefficients and optimize the remaining ones. We will fix $\beta_{male}$ and $\beta_D$. Then we will find the optimal values for the remaining betas, in terms of minimizing the residual sum of squares. We find the values that minimize:
$$\sum_{i=1}^{8} \left\{ (Y_i - X_{i,male}\,\beta_{male} - X_{i,D}\,\beta_D) - \mathbf{Z}_i \boldsymbol{\gamma} \right\}^2$$
where $X_{male}$ is the male column of the design matrix, $X_D$ is the D column, $\mathbf{Z}_i$ is a 1 by 3 matrix with the remaining column entries for unit $i$, and $\boldsymbol{\gamma}$ is a 3 x 1 matrix with the remaining parameters.
So all we need to do is redefine $Y$ as $Y^* = Y - X_{male}\,\beta_{male} - X_D\,\beta_D$ and fit a linear model. The following line of code creates this variable $Y^*$, after fixing $\beta_{male}$ to a value `a` and $\beta_D$ to a value `b`:
makeYstar <- function(a,b) Y - X[,2] * a - X[,5] * b
Now we’ll construct a function which, for given values of a and b, gives us back the sum of squared residuals after fitting the other terms.
fitTheRest <- function(a,b) {
  Ystar <- makeYstar(a,b)
  # drop the two fixed columns (male and D), keeping the remaining three
  Xrest <- X[,-c(2,5)]
  # least squares solution for the remaining coefficients
  betarest <- solve(t(Xrest) %*% Xrest) %*% t(Xrest) %*% Ystar
  residuals <- Ystar - Xrest %*% betarest
  sum(residuals^2)
}
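As a cross-check on the matrix algebra, the same quantity can be computed with `lm()`. This is a minimal sketch, not part of the original exercise: `rssWithLm` is a hypothetical helper name, and the block rebuilds `X` and `Y` so that it runs on its own. Note that once the two fixed columns are removed, the remaining three columns are linearly independent, which is why `solve()` can invert `t(Xrest) %*% Xrest`.

```r
sex <- factor(rep(c("female", "male"), each = 4))
trt <- factor(c("A", "A", "B", "B", "C", "C", "D", "D"))
X <- model.matrix(~ sex + trt)
Y <- 1:8

# Hypothetical helper: residual sum of squares for fixed a and b, via lm()
rssWithLm <- function(a, b) {
  Ystar <- Y - X[, 2] * a - X[, 5] * b   # outcome after removing the fixed terms
  Xrest <- X[, -c(2, 5)]                 # intercept plus the two remaining treatment columns
  fit <- lm(Ystar ~ Xrest - 1)           # "- 1" because Xrest already contains an intercept
  sum(resid(fit)^2)
}

qr(X[, -c(2, 5)])$rank   # rank of the reduced matrix
rssWithLm(1, 2)          # should agree with the matrix-algebra version for the same a and b
```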
What is the sum of squared residuals when the male coefficient is 1 and the D coefficient is 2, and the other coefficients are fit using the linear model solution?

We can apply our function `fitTheRest` to a grid of values for $\beta_{male}$ and $\beta_D$, using the `outer` function in R. `outer` takes three arguments: a grid of values for the first argument, a grid of values for the second argument, and finally a function which takes two arguments. Try it out:
outer(1:3,1:3,`*`)
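To see what `outer` returns, note that entry `[i, j]` of the result is the function applied to the i-th value of the first grid and the j-th value of the second. The sketch below also shows a function that only handles single values; `scalarOnly` is a hypothetical example, not something from the lecture.

```r
x <- 1:3
y <- 1:3
tab <- outer(x, y, `*`)
tab[2, 3]   # same as x[2] * y[3]

# outer() calls the function once with two long vectors, so the function
# must work element-wise; Vectorize() wraps a scalar-only function to do that
scalarOnly <- function(a, b) if (a > b) a else b   # hypothetical, not vectorized
outer(x, y, Vectorize(scalarOnly))
```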
We can run `fitTheRest` on a grid of values using the following code (`Vectorize` is necessary because `outer` requires a vectorized function):
outer(2:8,2:8,Vectorize(fitTheRest))
In the grid of values, what is the smallest sum of squared residuals?
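One way to inspect the grid is to store it and ask which cells attain the minimum. The sketch below is only an illustration under the setup above: `rssFor`, `a_grid`, `b_grid`, `rss` and `best` are names introduced here, and `lm.fit` is used in place of the explicit normal equations.

```r
sex <- factor(rep(c("female", "male"), each = 4))
trt <- factor(c("A", "A", "B", "B", "C", "C", "D", "D"))
X <- model.matrix(~ sex + trt)
Y <- 1:8

# Hypothetical helper: same idea as fitTheRest, using lm.fit for the least squares step
rssFor <- function(a, b) {
  Ystar <- Y - X[, 2] * a - X[, 5] * b
  sum(lm.fit(X[, -c(2, 5)], Ystar)$residuals^2)
}

a_grid <- 2:8
b_grid <- 2:8
rss <- outer(a_grid, b_grid, Vectorize(rssFor))

# Row and column indices of every cell that attains the minimum (up to rounding)
best <- which(abs(rss - min(rss)) < 1e-8, arr.ind = TRUE)
cbind(a = a_grid[best[, 1]], b = b_grid[best[, 2]])

# Optional picture of the surface over the grid
image(a_grid, b_grid, rss, xlab = "a (male coefficient)", ylab = "b (D coefficient)")
```

If more than one pair of values is listed, the minimum is not unique, which is the point the lecture makes about collinear design matrices.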