
[Machine Learning by Stanford] Classification and Representation - Classification

This is a brief summary of the Machine Learning course taught by Andrew Ng of Stanford on Coursera.

You can find the lecture videos and additional materials at

https://www.coursera.org/learn/machine-learning/home/welcome

 


Classification problems are those where the variable y that you want to predict takes on a discrete value.

We will use logistic regression, the most widely used classification algorithm.

 

Classification Examples

- Email: Spam / Not Spam?

- Online Transactions: Fraudulent (Yes / No)?

- Tumor: Malignant / Benign?

 

$ y \in \{0, 1\} $

0: negative class (e.g., benign tumor); 1: positive class (e.g., malignant tumor)

This can be extended to multi-class problems: $ y \in \{0, 1, 2, 3\} $

 

We can apply linear regression, as in the lecture's tumor-size example, to classify whether a tumor is malignant or not.

Threshold the classifier output $h_{\theta}(x)$ at 0.5:

  if $h_{\theta}(x) \geq 0.5$, predict $y = 1$

  if $h_{\theta}(x) < 0.5$, predict $y = 0$
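
As a concrete sketch, thresholding a linear hypothesis at 0.5 could look like the following (NumPy; the `predict` helper and the parameter values are made up for illustration):

```python
import numpy as np

def predict(theta, X, threshold=0.5):
    """Map h_theta(x) = theta^T x to a class label by thresholding at 0.5."""
    h = X @ theta                        # linear hypothesis output, one value per row of X
    return (h >= threshold).astype(int)  # 1 if h >= 0.5, else 0

theta = np.array([-0.3, 0.18])           # made-up parameters
X = np.array([[1.0, 2.0],                # each row: [1, tumor size]
              [1.0, 7.0]])
print(predict(theta, X))                 # [0 1]: small tumor -> benign, large -> malignant
```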

 

In this example (the dataset depicted in the lecture's figure), linear regression is actually doing something reasonable.

 

But adding just one more training example (say, a malignant tumor with a very large size, far to the right) shifts the fitted line, leading to a worse hypothesis; the classifier won't be very effective. So applying linear regression to a classification problem is not a good idea.

 

Classification: y = 0 or 1

If we use linear regression, $h_{\theta}(x)$ can be > 1 or < 0, even though every label is y = 0 or 1.
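
A quick numeric sketch of both failure modes; the tumor sizes, labels, and helper functions below are made-up illustrations, not from the lecture:

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of h_theta(x) = theta_0 + theta_1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def boundary(theta):
    """Tumor size at which h_theta(x) crosses the 0.5 threshold."""
    return (0.5 - theta[0]) / theta[1]

# Made-up tumor sizes and labels (0 = benign, 1 = malignant)
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
print(boundary(fit_linear(x, y)))     # ~4.5: cleanly separates the two classes

# Add ONE extra, correctly labeled, very large malignant tumor
x2 = np.append(x, 50.0)
y2 = np.append(y, 1.0)
theta2 = fit_linear(x2, y2)
print(boundary(theta2))               # ~6.2: the boundary shifted right, so the
                                      # malignant tumor at x = 6 is now misclassified

# And the output is not confined to [0, 1]:
print(theta2[0] + theta2[1] * 50.0)   # ~1.15, even though every label is 0 or 1
```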

 

Logistic regression, the classification algorithm we will develop, satisfies $0 \leq h_{\theta}(x) \leq 1$.

(The word "regression" in the name is just historical; logistic regression is a classification algorithm, not a variant of linear regression.)
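
The sigmoid (logistic) function that produces this property is introduced in the next lecture; as a preview, a minimal sketch (the function names here are mine):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); its output always lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))

theta = np.array([0.5, -2.0])
print(h(theta, np.array([1.0, 100.0])))   # ~0.0: extreme input, output still in (0, 1)
print(h(theta, np.array([1.0, -100.0])))  # ~1.0
```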

 

Lecturer's Note

To attempt classification, one method is to use linear regression and map all predictions greater than 0.5 as a 1 and all less than 0.5 as a 0. However, this method doesn't work well because classification is not actually a linear function.

 

The classification problem is just like the regression problem, except that the values we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem in which y can take on only two values, 0 and 1. (Most of what we say here will also generalize to the multiple-class case.) For instance, if we are trying to build a spam classifier for email, then $x^{(i)}$ may be some features of a piece of email, and y may be 1 if it is a piece of spam mail, and 0 otherwise. Hence, y∈{0,1}. 0 is also called the negative class, and 1 the positive class, and they are sometimes also denoted by the symbols “-” and “+.” Given $x^{(i)}$, the corresponding $y^{(i)}$ is also called the label for the training example.
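
For instance, a single labeled training example for the spam classifier could be sketched like this (the feature names are made up for illustration):

```python
# One (made-up) training example (x_i, y_i) for a spam classifier
x_i = {"contains_free": 1, "num_links": 7, "sender_known": 0}  # features of one email
y_i = 1  # label: 1 = spam (positive class, "+"), 0 = not spam (negative class, "-")
```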