This is a brief summary of the Machine Learning course taught by Andrew Ng (Stanford) on Coursera.
You can find the lecture videos and additional materials at
https://www.coursera.org/learn/machine-learning/home/welcome
Normal Equation
$\theta = (X^TX)^{-1} X^T y$
- what if $X^TX$ is non-invertible? (singular/degenerate)
- Octave: `pinv(X' * X) * X' * y` works even if $X^TX$ is non-invertible (see the sketch below)
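As a quick illustration, here is a minimal Octave sketch of the normal equation on a small made-up housing dataset (the sizes and prices are placeholder numbers, not from the course materials):

```octave
% Tiny made-up dataset: house size (feet^2) -> price (in $1000s).
x = [2104; 1416; 1534; 852];
y = [460; 232; 315; 178];
m = length(y);

X = [ones(m, 1) x];              % design matrix with an intercept column of ones
theta = pinv(X' * X) * X' * y;   % normal equation: theta = (X'X)^-1 X'y
```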
What if $X^TX$ is non-invertible?
- redundant features (linearly dependent)
  - e.g., $x_1$ = size in $\mathrm{feet}^2$, $x_2$ = size in $\mathrm{m}^2$ (illustrated in the sketch after this list)
- too many features (e.g., $m \leq n$)
  - delete some features, or use regularization.
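To see how a redundant feature breaks invertibility, here is a small Octave sketch under made-up numbers: $x_2$ is just $x_1$ converted from $\mathrm{feet}^2$ to $\mathrm{m}^2$, so the columns of $X$ are linearly dependent.

```octave
% x2 is an exact multiple of x1, so X'X loses rank and is singular.
x1 = [2104; 1416; 1534; 852];    % size in feet^2
x2 = 0.092903 * x1;              % size in m^2 (linearly dependent on x1)
y  = [460; 232; 315; 178];
m  = length(y);

X = [ones(m, 1) x1 x2];
rank(X' * X)                     % 2 rather than 3: X'X is non-invertible

% Fix: delete the redundant feature and recompute theta.
X_fixed = [ones(m, 1) x1];
theta = pinv(X_fixed' * X_fixed) * X_fixed' * y;
```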
Summary:
If you ever find that X transpose X is singular, or alternatively non-invertible, what I would recommend is that you first look at your features and see if you have redundant features, like x1 and x2 above being linearly dependent, or being a linear function of each other. If you do have redundant features, you really don't need both of them, and just deleting one of them will solve your non-invertibility problem. So I would first think through my features and check if any are redundant; if so, keep deleting redundant features until they are no longer redundant.
If your features are not redundant, I would check whether I have too many features. If that's the case, I would either delete some features, if I can bear to use fewer features, or else consider using regularization.
Lecturer's note
When implementing the normal equation in Octave, we want to use the `pinv` function rather than `inv`. The `pinv` function will give you a value of $\theta$ even if $X^TX$ is not invertible.
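A minimal sketch of the difference, using a deliberately singular matrix (an illustration only, not from the course):

```octave
A = [1 2; 2 4];   % rank 1, so A is singular
inv(A)            % Octave warns the matrix is singular and returns Inf entries
pinv(A)           % the Moore-Penrose pseudoinverse is still well defined
```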
If $X^TX$ is non-invertible, the common causes might be:
- Redundant features, where two features are very closely related (i.e. they are linearly dependent)
- Too many features (e.g. m ≤ n). In this case, delete some features or use "regularization" (to be explained in a later lesson).
Solutions to the above problems include deleting a feature that is linearly dependent on another, or deleting one or more features when there are too many.