This is a brief summary of the Machine Learning course taught by Andrew Ng and offered by Stanford on Coursera.
You can find the lecture videos and additional materials at
https://www.coursera.org/learn/machine-learning/home/welcome
Hypothesis: $h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 = \theta^T x$ (with the convention $x_0 = 1$)
Parameters: $\theta_0, \theta_1, \ldots, \theta_n$ -> $\theta$, an $(n+1)$-dimensional vector
Cost Function: $J(\theta_0, \theta_1, ... \theta_n) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$
-> $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$
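The cost function above can be sketched in NumPy. This is my own illustrative implementation (the function name `compute_cost` and the tiny data set are not from the course); it assumes the design matrix already carries a leading column of ones for the bias term $x_0 = 1$.

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = (1/2m) * sum((h_theta(x) - y)^2), vectorized.

    X: (m, n+1) design matrix with a leading column of ones (x_0 = 1)
    y: (m,) target values
    theta: (n+1,) parameter vector
    """
    m = len(y)
    errors = X @ theta - y          # h_theta(x^{(i)}) - y^{(i)} for all i
    return (errors @ errors) / (2 * m)

# Tiny example: theta = (0, 1) fits y = x exactly, so the cost is zero.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(X, y, np.array([0.0, 1.0])))  # 0.0
```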
Quiz: When there are n features, we define the cost function as
$J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$.
For linear regression, which of the following are also equivalent and correct definitions of $J(\theta)$?
1. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$.
2. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=0}^n \theta_j x_j^{(i)} ) - y^{(i)} )^2$.
3. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=1}^n \theta_j x_j^{(i)} ) - y^{(i)} )^2$.
4. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=0}^n \theta_j x_j^{(i)}) - (\sum_{j=0}^n y_j^{(i)}))^2$.
Answer: 1 and 2. (Option 3 starts the inner sum at $j = 1$, dropping the bias term $\theta_0 x_0^{(i)}$; option 4 wrongly applies the sum over $j$ to $y^{(i)}$, which has no $j$ index.)
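A quick numerical check of the quiz (my own sketch, not course code) on a hand-picked two-example data set: options 1 and 2 compute the same value, while option 3 differs because it drops $\theta_0$.

```python
import numpy as np

# Two training examples with one feature (plus the bias column x_0 = 1).
X = np.array([[1.0, 2.0], [1.0, 3.0]])   # each row is (x_0, x_1)
y = np.array([0.0, 0.0])
theta = np.array([1.0, 1.0])             # (theta_0, theta_1)
m = len(y)

# Option 1: vectorized theta^T x^{(i)}.
opt1 = np.sum((X @ theta - y) ** 2) / (2 * m)
# Option 2: explicit inner sum over j = 0..n -- same value as option 1.
opt2 = np.sum((np.sum(theta * X, axis=1) - y) ** 2) / (2 * m)
# Option 3: inner sum starts at j = 1, silently dropping theta_0 x_0.
opt3 = np.sum((X[:, 1:] @ theta[1:] - y) ** 2) / (2 * m)

print(opt1, opt2, opt3)  # 6.25 6.25 3.25
```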
Gradient Descent:
Repeat {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0, ..., \theta_n)$
} (simultaneously update $\theta_j$ for every $j = 0, \ldots, n$)
-> Repeat {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta)$
}
Lecturer's Note
The gradient descent equation itself is generally the same form; we just have to repeat it for our 'n' features:
Repeat until convergence: {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_0^{(i)}$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_1^{(i)}$
$\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_2^{(i)}$
}
In other words,
Repeat until convergence: {
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)}$, for $j := 0, \ldots, n$
}
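The update rule above can be sketched as batch gradient descent in NumPy. This is an illustrative implementation under my own assumptions (function name, data set, learning rate, and iteration count are all chosen for the example, not taken from the course); note that `X.T @ errors` performs the sum over the $m$ training examples for every $j$ at once, giving the required simultaneous update.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent with simultaneous updates for all theta_j.

    X: (m, n+1) design matrix with a leading ones column (x_0 = 1)
    y: (m,) targets; theta: (n+1,) initial parameters; alpha: learning rate
    """
    m = len(y)
    for _ in range(num_iters):
        errors = X @ theta - y            # h_theta(x^{(i)}) - y^{(i)}
        gradient = (X.T @ errors) / m     # (1/m) * sum_i errors_i * x_j^{(i)}, all j
        theta = theta - alpha * gradient  # simultaneous update of every theta_j
    return theta

# Fit y = 1 + 2x on a small synthetic data set.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=2000)
print(np.round(theta, 2))  # approximately [1. 2.]
```

Computing the gradient before touching `theta` is what makes the update "simultaneous": every $\theta_j$ is updated using the old parameter vector, matching the rule in the lecturer's note.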