
BITS

[Machine Learning by Stanford] Computing Parameters Analytically - Normal Equation Noninvertibility

This is a brief summary of the Machine Learning course offered by Andrew Ng and Stanford on Coursera.

You can find the lecture videos and additional materials at

https://www.coursera.org/learn/machine-learning/home/welcome

 


Normal Equation

$\theta = (X^TX)^{-1} X^T y$

- What if $X^TX$ is non-invertible (singular/degenerate)?

- Octave: pinv(X' * X) * X' * y (returns a value for $\theta$ even if $X^TX$ is non-invertible)
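
One way to see why the pinv version still works, offered here as a standard linear-algebra fact rather than something stated in the lecture: writing $A^{+}$ for the Moore-Penrose pseudoinverse that pinv computes, the following identity holds for any $X$.

$(X^TX)^{+} X^T = X^{+}$

So the Octave expression evaluates $\theta = X^{+} y$, which is the least-squares solution with the smallest norm. When $X^TX$ is invertible, $(X^TX)^{+} = (X^TX)^{-1}$ and the expression reduces to the normal equation above.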

 

What if $X^TX$ is non-invertible?

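- Redundant features (linearly dependent)

e.g., $x_1$ = size in $\text{feet}^2$, $x_2$ = size in $\text{m}^2$. Since 1 m ≈ 3.28 feet, $x_1 \approx (3.28)^2 x_2$ for every example.

- Too many features relative to training examples (e.g., $m \leq n$, where $m$ is the number of training examples and $n$ is the number of features)

In this case, delete some features or use regularization.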
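
Both causes can be checked by comparing the rank of $X^TX$ with its size: the matrix is invertible only when the two are equal. The sketch below is not from the lecture. It uses only the Python standard library with exact fractions so that "singular" means exactly rank-deficient, and the helper names `rank` and `gram`, the house sizes, and the small matrices are all made up for illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below the pivot row.
        pivot = next(
            (r for r in range(found, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def gram(X):
    """Return X^T X as a list of lists."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


# 1 ft = 0.3048 m exactly, so 1 ft^2 = 0.3048^2 m^2.
SQM_PER_SQFT = Fraction(3048, 10000) ** 2

sizes_sqft = [1000, 1500, 2000, 2500]  # hypothetical house sizes

# Cause 1: columns are [intercept, size in ft^2, size in m^2].
X_redundant = [[1, s, s * SQM_PER_SQFT] for s in sizes_sqft]

# Cause 2: m = 2 examples but n = 3 features (plus the intercept column).
X_wide = [[1, 2, 3, 5], [1, 4, 1, 7]]

# Remedy for cause 1: drop the redundant m^2 column.
X_fixed = [row[:2] for row in X_redundant]

for name, X in [
    ("redundant features", X_redundant),
    ("too many features", X_wide),
    ("redundant column deleted", X_fixed),
]:
    G = gram(X)
    status = "invertible" if rank(G) == len(G) else "singular"
    print(f"{name}: rank {rank(G)} of {len(G)} -> {status}")
```

The rank of $X^TX$ equals the rank of $X$, which can never exceed the number of rows $m$, so with $m \leq n$ the $(n+1) \times (n+1)$ matrix $X^TX$ cannot have full rank. With floating-point data a numerical library's rank or condition-number routine would be the usual tool; exact fractions are used here only to keep the example unambiguous.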

 

Summary:

If you ever find that $X^TX$ is singular (that is, non-invertible), what I would recommend is to first look at your features and see whether any are redundant, like $x_1$ and $x_2$ above, which are linearly dependent because one is a linear function of the other. If you do have redundant features, you really don't need both: deleting one of them solves the non-invertibility problem. So I would first think through my features, check whether any are redundant, and keep deleting redundant features until none remain.

 

If your features are not redundant, I would check whether I have too many features. If that's the case, I would either delete some features, if I can bear to use fewer, or else consider using regularization.
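
As a preview of the regularization remedy, here is a minimal sketch, assuming the regularized form of the normal equation $\theta = (X^TX + \lambda L)^{-1} X^T y$ that the course introduces in a later lesson, where $L$ is the identity matrix with a 0 in the top-left entry so the intercept is not penalized. The data, the value of `lam`, and the helpers `det3` and `solve3` are made up for illustration; the example has $m = 2$ examples and $n = 2$ features, so $X^TX$ on its own is singular.

```python
from fractions import Fraction


def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
        - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
        + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0])
    )


def solve3(a, b):
    """Solve the 3x3 system a * theta = b by Cramer's rule."""
    d = det3(a)
    if d == 0:
        raise ValueError("matrix is singular")
    theta = []
    for k in range(3):
        # Replace column k of a with the right-hand side b.
        a_k = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        theta.append(Fraction(det3(a_k), d))
    return theta


# Hypothetical data: the first column of X is the intercept term.
X = [[1, 1, 2], [1, 3, 4]]
y = [1, 2]
lam = 1  # regularization strength, chosen arbitrarily

cols = list(zip(*X))
XtX = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
Xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]

# Add lam to every diagonal entry except the intercept's (top-left).
XtX_reg = [
    [XtX[i][j] + (lam if i == j and i > 0 else 0) for j in range(3)]
    for i in range(3)
]

print("det(X^T X) =", det3(XtX))
print("det(X^T X + lam * L) =", det3(XtX_reg))
print("theta =", [str(t) for t in solve3(XtX_reg, Xty)])
```

Cramer's rule is used only because it keeps a 3x3 example short and exact; for real data a linear solver from a numerical library would be the natural choice.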

 

Lecturer's note

When implementing the normal equation in Octave, we want to use the 'pinv' function rather than 'inv'. The 'pinv' function will give you a value of $\theta$ even if $X^TX$ is not invertible.

If $X^TX$ is noninvertible, common causes include:

 

  • Redundant features, where two features are very closely related (i.e. they are linearly dependent)
  • Too many features (e.g. m ≤ n). In this case, delete some features or use "regularization" (to be explained in a later lesson).

Solutions to the above problems include deleting a feature that is linearly dependent on another, or deleting one or more features when there are too many.