
[Machine Learning by Stanford] Logistic Regression Model - Simplified Cost Function and Gradient Descent

This is a brief summary of the Machine Learning course taught by Andrew Ng (Stanford) on Coursera.

You can find the lecture video and additional materials at

https://www.coursera.org/learn/machine-learning/home/welcome

 


We can simplify the cost function by combining the y = 1 and y = 0 cases into a single expression:

$Cost(h_{\theta}(x), y) = -y \log(h_{\theta}(x)) - (1-y) \log(1-h_{\theta}(x))$
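As an illustration, here is a minimal Python sketch of this per-example cost, assuming a sigmoid hypothesis; the helper names (`sigmoid`, `cost_single`) are hypothetical, not from the lecture.

```python
import math

def sigmoid(z):
    # Logistic function: squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def cost_single(theta, x, y):
    # Per-example cost: -y*log(h) - (1-y)*log(1-h), where h = sigmoid(theta^T x).
    h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

# Example: x = [1.0, 2.0] (first entry is the bias feature), theta = [0.0, 1.0], y = 1.
# h = sigmoid(2.0) ≈ 0.88, so the cost ≈ -log(0.88) ≈ 0.13 -- small, since the prediction is confident and correct.
print(cost_single([0.0, 1.0], [1.0, 2.0], 1))
```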

 

Why this cost function instead of others that we could have chosen?

- it comes from the principle of maximum likelihood estimation in statistics, which gives an efficient way to find the parameters of different models (the short derivation below spells this out).
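In a bit more detail (a standard derivation, included here for completeness rather than taken from the lecture slides): the logistic model treats each label as a Bernoulli outcome, so for one training example $p(y \mid x; \theta) = h_{\theta}(x)^{y} (1-h_{\theta}(x))^{1-y}$ with $y \in \{0, 1\}$. Taking the negative log gives $-y \log(h_{\theta}(x)) - (1-y) \log(1-h_{\theta}(x))$, which is exactly the cost above, so minimizing $J(\theta)$ is the same as maximizing the likelihood of the training set.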

 

Given this cost function, to fit the parameters we try to find the parameters $\theta$ that minimize $J(\theta)$.

If you take the partial derivative term and plug it back in, we can write out the gradient descent algorithm as follows: repeat { $\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})\, x_j^{(i)}$ }, simultaneously updating every $\theta_j$.

 

What has changed is the definition of the hypothesis. Whereas for linear regression we had $h_{\theta}(x) = \theta^{T} x$, for logistic regression $h_{\theta}(x) = \frac{1}{1 + e^{-\theta^{T} x}}$. Thus, even though the update rule may look identical, this is actually not the same thing as gradient descent for linear regression.
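The following Python sketch makes the contrast concrete: one gradient descent step for logistic regression, where the only difference from the linear regression version is the hypothesis inside the error term. The helper names (`sigmoid`, `gradient_step`) are illustrative assumptions, not code from the course.

```python
import numpy as np

def sigmoid(z):
    # Logistic hypothesis: h_theta(x) = 1 / (1 + exp(-theta^T x)).
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha):
    # One simultaneous update of all theta_j:
    #   theta_j := theta_j - (alpha / m) * sum_i (h_theta(x_i) - y_i) * x_ij
    m = len(y)
    h = sigmoid(X @ theta)   # for linear regression this line would just be X @ theta
    return theta - (alpha / m) * (X.T @ (h - y))
```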

 

 

Quiz: Suppose you are running gradient descent to fit a logistic regression model with parameter $\theta \in \mathbb{R}^{n+1}$. Which of the following is a reasonable way to make sure the learning rate $\alpha$ is set properly and that gradient descent is running correctly?

 

a. Plot $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^{2}$ as a function of the number of iterations (i.e. the horizontal axis is the iteration number) and make sure $J(\theta)$ is decreasing on every iteration.

b. Plot $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_{\theta}(x^{(i)}) + (1-y^{(i)}) \log(1-h_{\theta}(x^{(i)})) \right]$ as a function of the number of iterations and make sure $J(\theta)$ is decreasing on every iteration.

c. Plot J(θ) as a function of θ and make sure it is decreasing on every iteration.

d. Plot J(θ) as a function of θ and make sure it is convex.

 

Answer: b
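To make option (b) concrete, here is a rough sketch of tracking $J(\theta)$ once per iteration so it can be plotted against the iteration number; the variable and function names are illustrative, not from the course.

```python
import numpy as np

def monitor_gradient_descent(theta, X, y, alpha, iters):
    m = len(y)
    costs = []
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))
        # Record J(theta) at the current parameters, then take one gradient step.
        costs.append(-(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h)))
        theta = theta - (alpha / m) * (X.T @ (h - y))
    # Plot costs against the iteration number; if alpha is set properly,
    # the curve should decrease on every iteration.
    return theta, costs
```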

 

Quiz: 

 

 

 

Answer: a

 

Lecturer's Note:

Simplified Cost Function and Gradient Descent

We can compress our cost function's two conditional cases into one case:

$Cost(h_{\theta}(x), y) = -y \log(h_{\theta}(x)) - (1-y) \log(1-h_{\theta}(x))$

Notice that when y is equal to 1, then the second term $(1-y)\log(1-h_{\theta}(x))$ will be zero and will not affect the result. If y is equal to 0, then the first term $-y\log(h_{\theta}(x))$ will be zero and will not affect the result.

We can fully write out our entire cost function as follows:

$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_{\theta}(x^{(i)})) + (1-y^{(i)}) \log(1-h_{\theta}(x^{(i)})) \right]$

A vectorized implementation is:

$h = g(X\theta)$

$J(\theta) = \frac{1}{m} \left( -y^{T} \log(h) - (1-y)^{T} \log(1-h) \right)$
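A rough NumPy translation of this vectorized cost (a sketch under the usual conventions: X holds one example per row with a leading column of ones; names are illustrative):

```python
import numpy as np

def cost_vectorized(theta, X, y):
    # h = g(X theta); J = (1/m) * (-y^T log(h) - (1-y)^T log(1-h))
    m = y.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (1.0 / m) * (-(y @ np.log(h)) - ((1 - y) @ np.log(1 - h)))
```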

Gradient Descent

Remember that the general form of gradient descent is:

Repeat { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ }

We can work out the derivative part using calculus to get:

Repeat { $\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})\, x_j^{(i)}$ }

Notice that this algorithm is identical to the one we used in linear regression. We still have to simultaneously update all values in theta.

A vectorized implementation is:

$\theta := \theta - \frac{\alpha}{m} X^{T} (g(X\theta) - \vec{y})$
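And a matching NumPy sketch of the vectorized update (again illustrative, not the course's own code):

```python
import numpy as np

def gradient_descent_vectorized(theta, X, y, alpha, iters):
    # Repeat: theta := theta - (alpha / m) * X^T (g(X theta) - y)
    m = y.shape[0]
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))
        theta = theta - (alpha / m) * (X.T @ (h - y))
    return theta
```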