

[Machine Learning by Stanford] Logistic Regression Model - Cost Function

This is a brief summary of the Machine Learning course provided by Andrew Ng and Stanford on Coursera.

You can find the lecture videos and additional materials at

https://www.coursera.org/learn/machine-learning/home/welcome

 


 

How do we fit the parameters $\theta$ in logistic regression?

We define the optimization objective, i.e. the cost function, that will be used to fit the parameters.

 

Assume that we have m training examples, and each example is represented by an (n+1)-dimensional feature vector,

with $x_0 = 1$ and $y \in \{0, 1\}$, because in a classification problem every label y is either 0 or 1.
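
For concreteness, here is a minimal NumPy sketch of this setup (the tiny dataset and parameter values are made up for illustration): m examples stored as rows, a column of ones supplying $x_0 = 1$, and the sigmoid hypothesis $h_{\theta}(x) = g(\theta^T x)$ from the previous lecture.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy dataset: m = 4 examples, n = 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [0.5, 1.5],
              [3.0, 3.0]])
y = np.array([0, 0, 1, 1])           # every label is 0 or 1

# Prepend x0 = 1 so each example becomes an (n+1)-dimensional vector.
m = X.shape[0]
X = np.hstack([np.ones((m, 1)), X])  # shape (m, n+1)

theta = np.zeros(X.shape[1])         # parameters we want to fit

# Hypothesis h_theta(x) = g(theta^T x), evaluated for all m examples at once.
h = sigmoid(X @ theta)
print(h)                             # all 0.5 when theta = 0
```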

Cost Function:

Linear Regression: $J(\theta) = \frac{1}{m}\sum_{i=1}^m \frac{1}{2} (h_{\theta}(x^{(i)}) - y^{(i)})^2$

$Cost(h_{\theta}(x^{(i)}), y^{(i)}) = \frac{1}{2} (h_{\theta}(x^{(i)}) - y^{(i)})^2$

-> This is the cost we want the learning algorithm to pay if it outputs the value $h_{\theta}(x^{(i)})$ when the actual label is $y^{(i)}$.

This is a non-convex cost function. If you were to run gradient descent on this sort of function, it is not guaranteed to converge to the global minimum.

 

In contrast, what we would like is a cost function $J(\theta)$ that is convex, i.e. a single bow-shaped function, so that if we run gradient descent we are guaranteed to converge to the global minimum.

 

And the problem with using the squared cost function (from linear regression) is that, because of the very non-linear sigmoid function that appears inside it, $J(\theta)$ ends up being a non-convex function if you define it as the squared cost.
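
As a rough illustration of why this happens (my own sketch, not from the lecture), the code below plugs the sigmoid hypothesis into the squared-error cost and evaluates $J(\theta)$ over a sweep of a single made-up parameter; the curve flattens into plateaus away from its minimum instead of growing like a single bowl, which is the non-convexity being described.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squared_error_cost(theta, X, y):
    """Linear-regression-style cost with a sigmoid hypothesis:
    J(theta) = (1/m) * sum_i  1/2 * (h_theta(x_i) - y_i)^2."""
    m = len(y)
    h = sigmoid(X @ theta)
    return (1.0 / m) * np.sum(0.5 * (h - y) ** 2)

# Toy one-parameter example (intercept omitted) just to trace the curve.
X = np.array([[-3.0], [-1.0], [2.0], [4.0]])
y = np.array([1, 0, 0, 1])

for t in np.linspace(-5.0, 5.0, 11):
    print(f"theta = {t:5.1f}   J = {squared_error_cost(np.array([t]), X, y):.4f}")
```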

 

Quiz: 

Answer: 2

 

So what we would like to do is instead come up with a different cost function that is convex, so that we can apply an algorithm like gradient descent and be guaranteed to find the global minimum.

 

Logistic regression cost function

 

$Cost(h_{\theta}(x), y) = -\log(h_{\theta}(x))$ if $y = 1$, $-\log(1 - h_{\theta}(x))$ if $y = 0$

 

Cost = 0 if $y = 1$ and $h_{\theta}(x) = 1$.

But as $h_{\theta}(x) \to 0$, Cost $\to \infty$.

This captures the intuition that if $h_{\theta}(x) = 0$ (i.e. we predict $P(y=1|x;\theta) = 0$) but in fact $y = 1$, we penalize the learning algorithm with a very large cost.
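
A minimal sketch of this per-example cost (my own illustration, not part of the lecture), checking the limiting behaviour described above for the $y = 1$ case:

```python
import numpy as np

def cost_per_example(h, y):
    """Logistic regression cost for a single example:
    -log(h)      if y = 1
    -log(1 - h)  if y = 0
    """
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# With y = 1: cost shrinks toward 0 as h_theta(x) -> 1
# and grows without bound as h_theta(x) -> 0.
for h in [0.999, 0.9, 0.5, 0.1, 0.001]:
    print(f"y = 1, h = {h:5.3f}   cost = {cost_per_example(h, 1):.4f}")
```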

 

Quiz: In logistic regression, the cost function for our hypothesis outputting (predicting) $h_{\theta}(x)$ on a training example that has label $y \in \{0, 1\}$ is:

$cost(h_{\theta}(x), y) = -\log h_{\theta}(x)$ if $y = 1$, $-\log(1 - h_{\theta}(x))$ if $y = 0$

Which of the following are true? Check all that apply.

a. If $h_{\theta}(x) = y$, then $cost(h_{\theta}(x), y) = 0$ (for $y = 0$ and $y = 1$).

b. If $y = 0$, then $cost(h_{\theta}(x), y) \to \infty$ as $h_{\theta}(x) \to 1$.

c. If $y = 0$, then $cost(h_{\theta}(x), y) \to \infty$ as $h_{\theta}(x) \to 0$.

d. Regardless of whether $y = 0$ or $y = 1$, if $h_{\theta}(x) = 0.5$, then $cost(h_{\theta}(x), y) > 0$.

 

Answer: a,b,d

 

Lecturer's Note:

Cost Function

We cannot use the same cost function that we use for linear regression because the Logistic Function will cause the output to be wavy, causing many local optima. In other words, it will not be a convex function.

Instead, our cost function for logistic regression looks like:

$J(\theta) = \frac{1}{m}\sum_{i=1}^m Cost(h_{\theta}(x^{(i)}), y^{(i)})$

$Cost(h_{\theta}(x), y) = -\log(h_{\theta}(x))$ if $y = 1$

$Cost(h_{\theta}(x), y) = -\log(1 - h_{\theta}(x))$ if $y = 0$

If our correct answer 'y' is 0, then the cost function will be 0 if our hypothesis function also outputs 0. If our hypothesis approaches 1, then the cost function will approach infinity.

If our correct answer 'y' is 1, then the cost function will be 0 if our hypothesis function outputs 1. If our hypothesis approaches 0, then the cost function will approach infinity.

Note that writing the cost function in this way guarantees that J(θ) is convex for logistic regression.
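
Putting the note together, here is a vectorized sketch of $J(\theta)$ (my own NumPy rendering of the piecewise cost above, on a hypothetical toy dataset; the small epsilon clip only guards the logarithm against predictions of exactly 0 or 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, eps=1e-12):
    """J(theta) = (1/m) * sum_i Cost(h_theta(x_i), y_i), with
    Cost = -log(h) if y = 1 and -log(1 - h) if y = 0."""
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1.0 - eps)              # keep log() finite
    cost_i = np.where(y == 1, -np.log(h), -np.log(1.0 - h))
    return cost_i.mean()

# Hypothetical toy data with x0 = 1 as the first column.
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 3.0]])
y = np.array([0, 1, 1])

print(logistic_cost(np.zeros(2), X, y))   # -log(0.5) ~= 0.6931 at theta = 0
```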