Understanding how our estimators behave is crucial for making accurate predictions. In my blog about Linear Regression, we covered the OLS estimator, which characterizes the weight vector of our linear regression model in terms of the design matrix $X$ and the response vector $y$:

$$\hat{w} = (X^T X)^{-1} X^T y$$
In many practical situations, we assume that $y$ is drawn from a normal distribution. The mean of the distribution is represented by $Xw$, where $w$ denotes the weights or coefficients assigned to our model. The covariance is given by $\sigma^2 I$, where $\sigma^2$ represents the variance of the errors and $I$ is the identity matrix:

$$y \sim \mathcal{N}(Xw, \sigma^2 I)$$
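To make this assumption concrete, here is a minimal NumPy sketch, with made-up values for $n$, $X$, $w$, and $\sigma$ (they are not from any particular dataset), that draws $y$ from $\mathcal{N}(Xw, \sigma^2 I)$ and computes the OLS estimate using the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem size and parameters (illustrative only).
n, d = 200, 3                         # observations, features
X = rng.normal(size=(n, d))           # fixed design matrix
w_true = np.array([1.5, -2.0, 0.5])   # "true" weights w
sigma = 0.8                           # error standard deviation

# Draw y ~ N(Xw, sigma^2 I): mean Xw plus independent Gaussian noise.
y = X @ w_true + sigma * rng.normal(size=n)

# OLS estimate w_hat = (X^T X)^{-1} X^T y.
# np.linalg.solve avoids forming the explicit inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)
```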
Given that $y$ adheres to a specific distribution, it logically follows that $\hat{w}$ also possesses its own distribution. Unraveling the parameters defining this distribution for $\hat{w}$ could provide valuable insights into the fundamental principles at play. To unpack this concept, let’s calculate the mean and variance of $\hat{w}$.
Evaluating the Mean:
We calculate the expected value of $\hat{w}$ using the formula

$$\mathbb{E}[\hat{w}] = \mathbb{E}\left[(X^T X)^{-1} X^T y\right]$$

Utilizing the linearity of expectation (and treating $X$ as fixed), this simplifies to:

$$\mathbb{E}[\hat{w}] = (X^T X)^{-1} X^T \, \mathbb{E}[y]$$

and because $\mathbb{E}[y] = Xw$, we can substitute $\mathbb{E}[y]$ with $Xw$:

$$\mathbb{E}[\hat{w}] = (X^T X)^{-1} X^T X w = w$$
What this tells us is that the mean of the MLE (i.e. the OLS estimator) of $w$ is indeed $w$ itself, the very parameter that determines the mean $Xw$ of our response variable $y$ in the model $y \sim \mathcal{N}(Xw, \sigma^2 I)$. In other words, under the assumption of a normal distribution for $y$, the MLE $\hat{w}$ is unbiased: it not only fits our data but also recovers the true parameter on average.
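As a quick sanity check on this unbiasedness result, the following sketch (reusing the same hypothetical setup as above, but self-contained) repeats the experiment many times with fresh noise and averages the resulting estimates; the average should land close to the true $w$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 200, 3, 0.8
X = rng.normal(size=(n, d))                 # fixed design across replications
w_true = np.array([1.5, -2.0, 0.5])
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)  # (X^T X)^{-1} X^T, reused each draw

# Average w_hat over many independent draws of y.
estimates = []
for _ in range(5000):
    y = X @ w_true + sigma * rng.normal(size=n)
    estimates.append(XtX_inv_Xt @ y)

print(np.mean(estimates, axis=0))  # ≈ w_true, illustrating E[w_hat] = w
```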
But this insight is only half of a larger picture, and an important question remains: can we realistically expect to obtain a result close to this expected value in practical scenarios? To address this, let’s consider the variance.
Evaluating the Variance:
Again, we know that $y \sim \mathcal{N}(Xw, \sigma^2 I)$, and therefore $\mathrm{Var}(y) = \sigma^2 I$. This can also be expressed in terms of expectations:

$$\mathrm{Var}(y) = \mathbb{E}\left[(y - \mathbb{E}[y])(y - \mathbb{E}[y])^T\right] = \sigma^2 I$$

Since $\mathbb{E}[y] = Xw$, we can adjust this formula to:

$$\mathbb{E}\left[(y - Xw)(y - Xw)^T\right] = \sigma^2 I$$

Expanding the left-hand side and rearranging the terms gives an expression for $\mathbb{E}[y y^T]$:

$$\mathbb{E}[y y^T] = \sigma^2 I + X w w^T X^T$$
This calculation plays a crucial role in understanding the behavior of our model. Specifically, it helps in calculating the variance of our OLS estimator, $\hat{w}$, which is given by the formula:

$$\mathrm{Var}(\hat{w}) = \mathbb{E}\left[(\hat{w} - \mathbb{E}[\hat{w}])(\hat{w} - \mathbb{E}[\hat{w}])^T\right]$$

From the earlier discussion, we know that $\mathbb{E}[\hat{w}] = w$, simplifying our variance formula to:

$$\mathrm{Var}(\hat{w}) = \mathbb{E}\left[(\hat{w} - w)(\hat{w} - w)^T\right] = \mathbb{E}[\hat{w}\hat{w}^T] - w w^T$$

Next, let’s substitute our OLS estimator formula $\hat{w} = (X^T X)^{-1} X^T y$ into the variance equation:

$$\mathrm{Var}(\hat{w}) = \mathbb{E}\left[(X^T X)^{-1} X^T y \, y^T X (X^T X)^{-1}\right] - w w^T$$

Applying the linearity of expectation, we have

$$\mathrm{Var}(\hat{w}) = (X^T X)^{-1} X^T \, \mathbb{E}[y y^T] \, X (X^T X)^{-1} - w w^T$$

Now, recall the expression we derived earlier, $\mathbb{E}[y y^T] = \sigma^2 I + X w w^T X^T$. Substituting this into our variance equation gives us:

$$\mathrm{Var}(\hat{w}) = (X^T X)^{-1} X^T \left(\sigma^2 I + X w w^T X^T\right) X (X^T X)^{-1} - w w^T$$

Expanding this and simplifying (the $X^T X$ factors cancel against $(X^T X)^{-1}$, and the $w w^T$ terms cancel out), we find

$$\mathrm{Var}(\hat{w}) = \sigma^2 (X^T X)^{-1}$$
This formula highlights how the variance of $\hat{w}$ depends on the variance of the errors in the regression model ($\sigma^2$) and on the design matrix $X$ through $(X^T X)^{-1}$.
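We can verify this result numerically as well. The sketch below (same hypothetical setup as before) compares the empirical covariance of $\hat{w}$ across many simulated datasets with the analytic expression $\sigma^2 (X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 200, 3, 0.8
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

# Simulate many datasets and collect the OLS estimates.
estimates = np.array([
    XtX_inv @ X.T @ (X @ w_true + sigma * rng.normal(size=n))
    for _ in range(20000)
])

empirical_cov = np.cov(estimates, rowvar=False)   # sample covariance of w_hat
analytic_cov = sigma**2 * XtX_inv                 # sigma^2 (X^T X)^{-1}
print(np.max(np.abs(empirical_cov - analytic_cov)))  # should be close to zero
```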
When $X^T X$ is nearly singular due to multicollinearity, some of its eigenvalues become very small. Consequently, the corresponding eigenvalues of its inverse become very large, since the eigenvalues of the inverse are the reciprocals of the eigenvalues of the original matrix. Because $\mathrm{Var}(\hat{w}) = \sigma^2 (X^T X)^{-1}$, these large eigenvalues imply that the variance of our OLS estimator is inflated. This inflation in variance leads to less reliable predictions, as it indicates a high level of uncertainty or instability in our coefficient estimates. Essentially, the high correlation among our predictors causes our model to be less certain about the true effect of each individual predictor.
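To see this effect in action, the following sketch builds a hypothetical design with two nearly identical columns, inspects the eigenvalues of $X^T X$ and of its inverse, and compares the resulting coefficient variances ($\sigma^2$ times the diagonal of $(X^T X)^{-1}$) against those of a well-conditioned design; the correlation strength (the 0.01 noise scale) is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 200, 0.8

# Well-conditioned design: two independent columns.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X_good = np.column_stack([x1, x2])

# Nearly collinear design: second column almost duplicates the first.
X_bad = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n)])

for name, X in [("independent", X_good), ("collinear", X_bad)]:
    XtX = X.T @ X
    eigvals = np.linalg.eigvalsh(XtX)                 # eigenvalues of X^T X
    var_w_hat = sigma**2 * np.diag(np.linalg.inv(XtX))  # Var(w_hat) diagonal
    print(name)
    print("  eigenvalues of X^T X:        ", eigvals)
    print("  eigenvalues of (X^T X)^{-1}: ", 1.0 / eigvals)
    print("  Var(w_hat) per coefficient:  ", var_w_hat)
```

In the collinear case, the smallest eigenvalue of $X^T X$ collapses toward zero, its reciprocal explodes, and the coefficient variances are orders of magnitude larger than in the independent case, which is exactly the instability described above.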