This is a brief summary of the Machine Learning course by Andrew Ng and Stanford on Coursera.
You can find the lecture videos and additional materials at
https://www.coursera.org/learn/machine-learning/home/welcome
Objective: How to fit the best possible straight line to our data.
Hypothesis: $h_{\theta}(x) = \theta_0 + \theta_1 x $
$\theta_i$ : Parameters
How to choose $\theta_i$?
Quiz: Which values of $(\theta_0, \theta_1)$ correspond to a good fit to the data (from the plot shown in the lecture)?
1. (0, 1)
2. (0.5, 1)
3. (1, 0.5)
4. (1, 1)
Answer: 2, i.e. $\theta_0 = 0.5$, $\theta_1 = 1$.
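As a quick worked check, plugging the answer into the hypothesis gives $h_{\theta}(x) = 0.5 + 1 \cdot x$; for example (the inputs here are chosen just for illustration), $h_{\theta}(1) = 1.5$ and $h_{\theta}(2) = 2.5$.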
Idea: Choose $\theta_0, \theta_1$ so that $h_{\theta}(x)$, the value we predict on input x, is close to y for the training examples (x,y)
$$\min_{\theta_0, \theta_1} \; \frac{1}{2m} \sum_{i=1}^m \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2$$
where $h_{\theta}(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$, i.e. we want this average squared error to be small.
The objective function for linear regression is to find the values of $\theta_0$ and $\theta_1$ that minimize $\frac{1}{2m}$ times the sum of squared errors between the predictions on the training set and the actual values. This quantity is the cost function, $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$, also known as the squared error function. The squared error cost function is a reasonable choice and works well for most regression problems; it is the most commonly used cost function for regression.
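Below is a minimal Python/NumPy sketch of this cost function (the course itself uses Octave/MATLAB; the function names and the tiny training set here are assumptions made purely for illustration):

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum((h_theta(x_i) - y_i)^2)"""
    m = len(x)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical training set: y happens to equal 0.5 + x, so the
# parameters from the quiz answer give zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 2.5, 3.5])

print(cost(0.5, 1.0, x, y))  # 0.0   (perfect fit)
print(cost(0.0, 1.0, x, y))  # 0.125 (worse fit, higher cost)
```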
Lecturer's Note
We can measure the accuracy of our hypothesis function by using a cost function. It takes an average of the squared differences between the results of the hypothesis on inputs from the x's and the actual outputs y's.
To break it apart, it is $\frac{1}{2} \overline{x}$, where $\overline{x}$ is the mean of the squares of $h_\theta (x^{(i)}) - y^{(i)}$, or the difference between the predicted value and the actual value.
This function is otherwise called the "squared error function", or "mean squared error". The mean is halved ($\frac{1}{2}$) as a convenience for the computation of gradient descent, as the derivative term of the square function will cancel out the $\frac{1}{2}$ term.
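To see why the halving is convenient, here is the derivative with respect to $\theta_1$ (the same cancellation happens for $\theta_0$):

$$\frac{\partial}{\partial \theta_1} \frac{1}{2m} \sum_{i=1}^m \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m} \sum_{i=1}^m 2 \left(h_{\theta}(x^{(i)}) - y^{(i)}\right) x^{(i)} = \frac{1}{m} \sum_{i=1}^m \left(h_{\theta}(x^{(i)}) - y^{(i)}\right) x^{(i)}$$

The factor of 2 from differentiating the square cancels the $\frac{1}{2}$, leaving a clean average.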