
[Machine Learning by Stanford] Gradient Descent for Multiple Variables

This is a brief summary of the Machine Learning course taught by Andrew Ng of Stanford on Coursera.

You can find the lecture video and additional materials at

https://www.coursera.org/learn/machine-learning/home/welcome

 


Hypothesis: $h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 = \theta^T x$, where we define $x_0 = 1$ so that $x$ and $\theta$ are both $(n+1)$-dimensional vectors

 

Parameters: $\theta_0, \theta_1, ... ,\theta_n$ -> collected into a single $(n+1)$-dimensional vector $\theta$

 

Cost Function: $J(\theta_0, \theta_1, ... ,\theta_n) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$

-> $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$, i.e. the same cost viewed as a function of the single vector $\theta$
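As a side note, here is a minimal NumPy sketch of the hypothesis and the cost function in this vectorized form; it assumes a design matrix X whose first column is all ones (the $x_0 = 1$ convention), and the names `hypothesis`, `cost`, `X`, `y` are my own, not from the course.

```python
import numpy as np

def hypothesis(theta, X):
    # h_theta(x) = theta^T x for every training example at once.
    # X is an (m, n+1) design matrix; its first column is all ones (x_0 = 1).
    return X @ theta

def cost(theta, X, y):
    # J(theta) = (1 / (2m)) * sum over i of (h_theta(x^(i)) - y^(i))^2
    m = len(y)
    errors = hypothesis(theta, X) - y
    return (errors @ errors) / (2 * m)
```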

 

Quiz: When there are n features, we define the cost function as

$J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$.

For linear regression, which of the following are also equivalent and correct definitions of $J(\theta)$?

 

1. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$.

2. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=0}^n \theta_j x_j^{(i)} ) - y^{(i)} )^2$.

3. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=1}^n \theta_j x_j^{(i)} ) - y^{(i)} )^2$.

4. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=0}^n \theta_j x_j^{(i)}) - (\sum_{j=0}^n y_j^{(i)}))^2$.

 

Answer: 1, 2
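To see why option 3 is wrong, a quick numeric check on random data helps: options 1 and 2 compute identical values, while option 3 starts the inner sum at $j = 1$ and so silently drops the intercept term $\theta_0 x_0^{(i)}$. The sketch below is only an illustration; the data and function names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # x_0 = 1 column
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

def J_option1(theta, X, y):
    # (theta^T x^(i) - y^(i))^2, vectorized
    return np.sum((X @ theta - y) ** 2) / (2 * m)

def J_option2(theta, X, y):
    # inner sum over j = 0..n written out explicitly
    preds = np.array([sum(theta[j] * X[i, j] for j in range(n + 1)) for i in range(m)])
    return np.sum((preds - y) ** 2) / (2 * m)

def J_option3(theta, X, y):
    # inner sum starts at j = 1, so theta_0 never enters the prediction
    preds = np.array([sum(theta[j] * X[i, j] for j in range(1, n + 1)) for i in range(m)])
    return np.sum((preds - y) ** 2) / (2 * m)

print(J_option1(theta, X, y), J_option2(theta, X, y))  # identical
print(J_option3(theta, X, y))                          # different
```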

 

Gradient Descent: 

Repeat {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0, ..., \theta_n)$

} (simultaneously update for every j = 0, ..., n)

-> Repeat {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta)$

} (simultaneously update for every j = 0, ..., n)
 

Lecturer's Note

The gradient descent equation itself is generally the same form; we just have to repeat it for our 'n' features:

Repeat until convergence: {

 $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_0^{(i)}$

 $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_1^{(i)}$

 $\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_2^{(i)}$

}

In other words,

Repeat until convergence: {

  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)}$,  for j := 0, ..., n

}
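Putting the note into code, here is a minimal NumPy sketch of batch gradient descent that performs the simultaneous update for every $j$ in one matrix product, $\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$; the function name, the zero initialization, and the fixed iteration count are illustrative choices, not part of the lecture.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    # X: (m, n+1) design matrix with a leading column of ones (x_0 = 1)
    # y: (m,) target vector
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        errors = X @ theta - y            # h_theta(x^(i)) - y^(i) for all i
        gradient = (X.T @ errors) / m     # (1/m) * sum_i (...) * x_j^(i) for every j at once
        theta = theta - alpha * gradient  # all theta_j updated simultaneously
    return theta
```

In practice this converges best when the features are on comparable scales and the learning rate $\alpha$ is chosen accordingly.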