This is a brief summary of the Machine Learning course taught by Andrew Ng of Stanford on Coursera.
You can find the lecture videos and additional materials at
https://www.coursera.org/learn/machine-learning/home/welcome
Polynomial regression
Sometimes a non-linear model fits the data better, and polynomial regression should be applied.
A quadratic model might not work here, because the curve would eventually turn back down and predict lower prices for larger houses.
Feature scaling becomes necessary in a case like the one below, where the squared size and the cubed size are used as features.
Choice of features
You can use the square root of the size instead of the cubed size, as sketched below.
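A minimal sketch of the two alternatives (NumPy; the sizes are made-up illustrative values):

```python
import numpy as np

size = np.array([100.0, 400.0, 900.0])  # hypothetical house sizes in sq. ft.

# Cubic features: the fit can keep rising with size, but the feature ranges explode
X_cubic = np.column_stack([size, size ** 2, size ** 3])

# Square-root alternative: also monotonically increasing, but it flattens out,
# matching the intuition that price grows more slowly for very large houses
X_sqrt = np.column_stack([size, np.sqrt(size)])
```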
Quiz: Suppose you want to predict a house's price as a function of its size. Your model is
$h_{\theta} (x) = \theta_0 + \theta_1(size) + \theta_2 \sqrt{size}$
Suppose size ranges from 1 to 1000 ( $ft^2$ ). You will implement this by fitting a model
$h_{\theta} (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 $
Finally, suppose you want to use feature scaling (without mean normalization).
Which of the following choices for $x_1$ and $x_2$ should you use? (Note: $\sqrt{1000} \approx 32$)
Answer: $x_1 = \frac{size}{1000}, x_2 = \frac{\sqrt{size}}{32}$
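As a rough sketch of how that scaling would look in code (NumPy; the variable names are mine):

```python
import numpy as np

size = np.array([1.0, 250.0, 1000.0])  # sizes spanning the 1 to 1000 sq. ft. range

# Divide each feature by (roughly) its maximum so both land in [0, 1]
x1 = size / 1000.0          # size / 1000
x2 = np.sqrt(size) / 32.0   # sqrt(size) / 32, using sqrt(1000) ~ 32

print(x1.max(), x2.max())   # both close to 1.0
```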
Lecturer's Note
We can improve our features and the form of our hypothesis function in a couple of different ways.
We can combine multiple features into one. For example, we can combine $x_1$ and $x_2$ into a new feature $x_3$ by taking $x_1 \cdot x_2$.
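For instance, the lectures combine a lot's frontage and depth into a single area feature; a minimal sketch (NumPy; the numbers are made up):

```python
import numpy as np

frontage = np.array([50.0, 60.0, 80.0])    # x1: lot frontage in feet (made up)
depth    = np.array([100.0, 90.0, 120.0])  # x2: lot depth in feet (made up)

# Combine the two features into one: land area x3 = x1 * x2
area = frontage * depth
```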
Polynomial Regression
Our hypothesis function need not be linear (a straight line) if that does not fit the data well.
We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).
For example, if our hypothesis function is $h_{\theta}(x) = \theta_0 + \theta_1 x_1$, then we can create additional features based on $x_1$ to get the quadratic function $h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$ or the cubic function $h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3$.
In the cubic version, we have created new features $x_2$ and $x_3$, where $x_2 = x_1^2$ and $x_3 = x_1^3$.
To make it a square root function, we could do: $h_{\theta} (x) = \theta_0 + \theta_1 x_1 + \theta_2 \sqrt{x_1}$
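As a concrete sketch (NumPy; the data is synthetic, and I use a plain least-squares solve rather than gradient descent), fitting the cubic model is just linear regression on the expanded features:

```python
import numpy as np

# Synthetic data: price loosely follows a cubic in size (illustrative only)
size = np.linspace(1.0, 10.0, 20)
price = 3.0 + 2.0 * size + 0.5 * size ** 2 - 0.01 * size ** 3

# Design matrix [1, x1, x1^2, x1^3]: the new features are just powers of x1
X = np.column_stack([np.ones_like(size), size, size ** 2, size ** 3])

# Least-squares fit (equivalent to solving the normal equation)
theta, *_ = np.linalg.lstsq(X, price, rcond=None)
print(theta)  # approximately [3.0, 2.0, 0.5, -0.01]
```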
One important thing to keep in mind: if you choose your features this way, then feature scaling becomes very important.
e.g., if $x_1$ ranges from 1 to 1000, then $x_1^2$ ranges from 1 to 1,000,000 and $x_1^3$ from 1 to 1,000,000,000.
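A quick sketch of that blow-up, and of dividing each feature by its maximum to bring them back onto a comparable scale (NumPy):

```python
import numpy as np

x1 = np.arange(1.0, 1001.0)  # x1 ranges over 1 to 1000

for name, f in [("x1", x1), ("x1^2", x1 ** 2), ("x1^3", x1 ** 3)]:
    print(f"{name}: 1 to {f.max():.0f}")
# x1: 1 to 1000
# x1^2: 1 to 1000000
# x1^3: 1 to 1000000000

# Dividing each feature by its maximum puts them all in (0, 1]
x1_s, x2_s, x3_s = x1 / 1e3, x1 ** 2 / 1e6, x1 ** 3 / 1e9
```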