This is a brief summary of the Machine Learning course taught by Andrew Ng and offered by Stanford on Coursera.
You can find the lecture videos and additional materials at
https://www.coursera.org/learn/machine-learning/home/welcome
Hypothesis: $h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4 = \theta^T x$ (with the convention $x_0 = 1$)
Parameters: $\theta_0, \theta_1, \ldots, \theta_n$ -> $\theta$, an $(n+1)$-dimensional vector
Cost Function: $J(\theta_0, \theta_1, ... \theta_n) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$
-> $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$
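The cost function above can be sketched in NumPy. This is my own illustrative implementation (the function name `compute_cost` and the tiny data set are not from the course); it assumes the design matrix already carries a leading column of ones for the bias term $x_0 = 1$.

```python
import numpy as np

def compute_cost(X, y, theta):
    """J(theta) = (1/2m) * sum((h_theta(x) - y)^2), vectorized.

    X: (m, n+1) design matrix with a leading column of ones (x_0 = 1)
    y: (m,) target values
    theta: (n+1,) parameter vector
    """
    m = len(y)
    errors = X @ theta - y          # h_theta(x^{(i)}) - y^{(i)} for all i
    return (errors @ errors) / (2 * m)

# Tiny example: theta = (0, 1) fits y = x exactly, so the cost is zero.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(X, y, np.array([0.0, 1.0])))  # 0.0
```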
Quiz: When there are n features, we define the cost function as
$J(\theta) = \frac{1}{2m}\sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$.
For linear regression, which of the following are also equivalent and correct definitions of $J(\theta)$?
1. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m (\theta^T x^{(i)} - y^{(i)})^2$.
2. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=0}^n \theta_j x_j^{(i)} ) - y^{(i)} )^2$.
3. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=1}^n \theta_j x_j^{(i)} ) - y^{(i)} )^2$.
4. $J(\theta) = \frac{1}{2m}\sum_{i=1}^m ((\sum_{j=0}^n \theta_j x_j^{(i)}) - (\sum_{j=0}^n y_j^{(i)}))^2$.
Answer: 1 and 2. (Option 3 starts the inner sum at $j = 1$, dropping the bias term $\theta_0 x_0^{(i)}$; option 4 wrongly applies the sum over $j$ to $y^{(i)}$, which has no $j$ index.)
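A quick numerical check of the quiz (my own sketch, not course code) on a hand-picked two-example data set: options 1 and 2 compute the same value, while option 3 differs because it drops $\theta_0$.

```python
import numpy as np

# Two training examples with one feature (plus the bias column x_0 = 1).
X = np.array([[1.0, 2.0], [1.0, 3.0]])   # each row is (x_0, x_1)
y = np.array([0.0, 0.0])
theta = np.array([1.0, 1.0])             # (theta_0, theta_1)
m = len(y)

# Option 1: vectorized theta^T x^{(i)}.
opt1 = np.sum((X @ theta - y) ** 2) / (2 * m)
# Option 2: explicit inner sum over j = 0..n -- same value as option 1.
opt2 = np.sum((np.sum(theta * X, axis=1) - y) ** 2) / (2 * m)
# Option 3: inner sum starts at j = 1, silently dropping theta_0 x_0.
opt3 = np.sum((X[:, 1:] @ theta[1:] - y) ** 2) / (2 * m)

print(opt1, opt2, opt3)  # 6.25 6.25 3.25
```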
Gradient Descent:
Repeat {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta_0, ..., \theta_n)$
} (simultaneously update $\theta_j$ for every $j = 0, \ldots, n$)
-> Repeat {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j} J(\theta)$
}
Lecturer's Note
The gradient descent equation itself is generally the same form; we just have to repeat it for our 'n' features:
Repeat until convergence: {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_0^{(i)}$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_1^{(i)}$
$\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_2^{(i)}$
}
In other words,
Repeat until convergence: {
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) x_j^{(i)}$, for $j := 0, \ldots, n$
}
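The update rule above can be sketched as batch gradient descent in NumPy. This is an illustrative implementation under my own assumptions (function name, data set, learning rate, and iteration count are all chosen for the example, not taken from the course); note that `X.T @ errors` performs the sum over the $m$ training examples for every $j$ at once, giving the required simultaneous update.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent with simultaneous updates for all theta_j.

    X: (m, n+1) design matrix with a leading ones column (x_0 = 1)
    y: (m,) targets; theta: (n+1,) initial parameters; alpha: learning rate
    """
    m = len(y)
    for _ in range(num_iters):
        errors = X @ theta - y            # h_theta(x^{(i)}) - y^{(i)}
        gradient = (X.T @ errors) / m     # (1/m) * sum_i errors_i * x_j^{(i)}, all j
        theta = theta - alpha * gradient  # simultaneous update of every theta_j
    return theta

# Fit y = 1 + 2x on a small synthetic data set.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=2000)
print(np.round(theta, 2))  # approximately [1. 2.]
```

Computing the gradient before touching `theta` is what makes the update "simultaneous": every $\theta_j$ is updated using the old parameter vector, matching the rule in the lecturer's note.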