
[Machine Learning by Stanford] Classification and Representation - Hypothesis Representation

This is a brief summary of the Machine Learning course taught by Andrew Ng of Stanford on Coursera.

You can find the lecture videos and additional materials at

https://www.coursera.org/learn/machine-learning/home/welcome

 


What is the function we are going to use to represent our hypothesis when we have a classification problem?

 

Logistic Regression Model

$0 \leq h_{\theta}(x) \leq 1$

 

For linear regression, the hypothesis takes the form:

$h_{\theta}(x) = {\theta}^Tx$

With logistic regression, the hypothesis becomes:

1. $h_{\theta}(x) = g({\theta}^Tx)$

2. $g(z) = \frac{1}{1+e^{-z}}$

This function $g(z)$ is called the sigmoid function, or interchangeably the logistic function.

 

Equations 1 and 2 can be combined into:

$h_{\theta}(x) = \frac{1}{1 + e^{-\theta^T x}}$

For the sigmoid function, as $z$ goes to minus infinity, $g(z)$ approaches 0, and as $z$ goes to plus infinity, $g(z)$ approaches 1.
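
To make this limiting behavior concrete, here is a minimal NumPy sketch (my own illustration, not from the lecture) that evaluates $g(z)$ at a few points:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# g(z) -> 0 as z -> -inf, g(0) = 0.5, and g(z) -> 1 as z -> +inf.
print(sigmoid(-10.0))  # ~4.54e-05, close to 0
print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # ~0.99995, close to 1
```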

Interpretation of Hypothesis Output 

$h_{\theta}(x)$ = estimated probability that $y = 1$ on input $x$

Example: 

if $x = \left[\begin{array}{c} x_0 \\ x_1 \end{array}\right] = \left[\begin{array}{c} 1 \\ \text{tumorSize} \end{array}\right]$

$h_{\theta}(x) = 0.7 $ 

--> Tell the patient that there is a 70% chance of the tumor being malignant.

 

$h_{\theta}(x) = p(y=1 | x; \theta) $ -> "Probability that y = 1, given x, parameterized by $\theta$"
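
As a quick sketch of how this probability would be computed, the snippet below evaluates $h_{\theta}(x) = g(\theta^T x)$ for the tumor example. The parameter values in `theta` and the tumor size are made up purely for illustration (the lecture gives no concrete numbers); they are chosen so the output lands near 0.7:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x), read as P(y = 1 | x; theta)."""
    return sigmoid(theta @ x)

# Hypothetical parameters and input, for illustration only.
theta = np.array([-6.0, 0.13])   # [theta_0, theta_1]
x = np.array([1.0, 52.7])        # [x_0 = 1, x_1 = tumorSize]

p_malignant = h(theta, x)        # P(y = 1 | x; theta), approx. 0.70
p_benign = 1.0 - p_malignant     # P(y = 0 | x; theta), approx. 0.30
print(p_malignant, p_benign)
```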

 

Quiz

Suppose we want to predict, from data $x$ about a tumor, whether it is malignant ($y=1$) or benign ($y=0$). Our logistic regression classifier outputs, for a specific tumor, $h_{\theta}(x) = P(y=1 | x; \theta) = 0.7$, so we estimate that there is a 70% chance of this tumor being malignant. What should be our estimate for $P(y=0 | x; \theta)$, the probability the tumor is benign?

1. $P(y=0 | x; \theta) = 0.3$

2. $P(y=0 | x; \theta) = 0.7$

3. $P(y=0 | x; \theta) = 0.7^2$

4. $P(y=0 | x; \theta) = 0.3 \times 0.7$

 

Answer: 1

 

Because $y$ must be either 0 or 1, the following equations hold:

$P(y = 0 | x; \theta) + P(y = 1 | x; \theta) = 1$

$P(y = 0 | x; \theta) = 1 - P(y = 1 | x; \theta)$

So with $P(y = 1 | x; \theta) = 0.7$, the answer is $1 - 0.7 = 0.3$.

 

Lecturer's Note

Hypothesis Representation

We could approach the classification problem ignoring the fact that y is discrete-valued, and use our old linear regression algorithm to try to predict y given x. However, it is easy to construct examples where this method performs very poorly. Intuitively, it also doesn't make sense for $h_{\theta}(x)$ to take values larger than 1 or smaller than 0 when we know that y ∈ {0, 1}. To fix this, let's change the form for our hypotheses $h_{\theta}(x)$ to satisfy $0 \leq h_{\theta}(x) \leq 1$. This is accomplished by plugging $\theta^Tx$ into the Logistic Function.

 

Our new form uses the "Sigmoid Function," also called the "Logistic Function":

The function $g(z)$, defined above, maps any real number to the $(0, 1)$ interval, making it useful for transforming an arbitrary-valued function into a function better suited for classification.

$h_{\theta}(x)$ will give us the probability that our output is 1. For example, $h_{\theta}(x) = 0.7$ gives a probability of 70% that our output is 1. The probability that our prediction is 0 is just the complement of the probability that it is 1 (e.g., if the probability that it is 1 is 70%, then the probability that it is 0 is 30%).