
BITS

[Machine Learning by Stanford] Introduction - Supervised Learning

This is a brief summary of the Machine Learning course taught by Andrew Ng of Stanford on Coursera.

You can find the lecture videos and additional materials at

https://www.coursera.org/learn/machine-learning/home/welcome

 


Predicting housing prices

- Assume you have a relevant dataset: the prices of different houses and their sizes.

- Your friend has a 750-square-foot house and wants to find out how much it is worth.

- One way to estimate the price is to fit a straight line through the data.

- Another, possibly better, way is to fit a quadratic function (a second-order polynomial) to the data.
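
The two fits above can be sketched with NumPy's least-squares polynomial fitting. This is only an illustration: the sizes and prices below are made-up numbers, not data from the lecture, and the helper name `predict` is hypothetical.

```python
import numpy as np

# Hypothetical data (NOT from the lecture): house sizes in square feet
# and sale prices in thousands of dollars.
sizes = np.array([500.0, 800.0, 1000.0, 1250.0, 1500.0, 1750.0, 2000.0, 2500.0])
prices = np.array([120.0, 190.0, 225.0, 260.0, 290.0, 310.0, 325.0, 340.0])

# Rescale sizes to thousands of square feet to keep the fit well conditioned.
x = sizes / 1000.0

# Least-squares fit of a straight line (degree 1) and a quadratic (degree 2).
linear_coeffs = np.polyfit(x, prices, deg=1)
quadratic_coeffs = np.polyfit(x, prices, deg=2)

def predict(coeffs, size_sqft):
    """Evaluate a fitted polynomial at a house size given in square feet."""
    return float(np.polyval(coeffs, size_sqft / 1000.0))

friend_size = 750.0
linear_price = predict(linear_coeffs, friend_size)
quadratic_price = predict(quadratic_coeffs, friend_size)

print(f"Straight-line estimate: {linear_price:.1f}k")
print(f"Quadratic estimate:     {quadratic_price:.1f}k")
```

On the training data the quadratic fit never has a larger squared error than the straight line, because a line is a special case of a quadratic. Whether it predicts new houses better depends on the data.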

 

Supervised Learning

"Right answers given" - we give the algorithm a data set in which the so-called "right answers" are given.

That is, we give it a data set of houses in which the right price is known for every example

(the actual price that house sold for).

The task of the algorithm is to produce more of these right answers, such as the price of the new house that your friend may be trying to sell.

 

Regression Problem: Predict continuous valued output (price)

- Prices can be rounded to the nearest cent, so strictly speaking they are discrete values. However, we normally think of the price of a house as a real number, a scalar, continuous value. The term regression refers to the fact that we are trying to predict this kind of continuous-valued attribute.

 

Breast Cancer (malignant, benign)

- Look at medical records and predict whether a breast tumor is malignant or benign.

- Plot tumor size on the horizontal axis and malignancy on the vertical axis (1 for yes, 0 for no).

- What is the chance that a given tumor (the pink-colored one) is malignant versus benign?

Classification Problem: Discrete valued output (0 or 1)

- Sometimes the output can take more than two possible values.

- There can be more than one or two features (tumor size, age, clump thickness, uniformity of cell size, uniformity of cell shape, ...).

- You may even want to use an infinite number of features or attributes.
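
The question "what is the chance that a tumor is malignant?" can be sketched with a one-feature logistic regression trained by gradient descent. The lecture does not prescribe this method at this point, so treat it as one possible illustration. The tumor sizes and labels are made up, and the names `malignancy_probability` and `classify` are hypothetical.

```python
import numpy as np

# Hypothetical tumor sizes (arbitrary units) and labels: 0 = benign, 1 = malignant.
sizes = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
labels = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Standardize the single feature so plain gradient descent behaves well.
mean, std = sizes.mean(), sizes.std()
x = (sizes - mean) / std

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit weight w and bias b by gradient descent on the logistic (log) loss.
w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(1000):
    p = sigmoid(w * x + b)                      # predicted probabilities
    w -= learning_rate * np.mean((p - labels) * x)
    b -= learning_rate * np.mean(p - labels)

def malignancy_probability(size):
    """Estimated probability that a tumor of the given size is malignant."""
    return float(sigmoid(w * (size - mean) / std + b))

def classify(size):
    """Discrete output: 1 (malignant) if the probability is at least 0.5, else 0."""
    return int(malignancy_probability(size) >= 0.5)

for s in (2.0, 7.0):
    print(s, round(malignancy_probability(s), 3), classify(s))
```

The model outputs a probability, and the classification is obtained by thresholding it at 0.5. Adding more features (age, clump thickness, and so on) would turn `x` into a matrix and `w` into a vector.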

 

 

Support Vector Machine

- There is a neat mathematical trick that allows a computer to deal with an infinite number of features.
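
The notes do not name the trick. In the context of support vector machines it is usually taken to be the kernel trick, which is an assumption here. The sketch below shows the idea on a finite case: a degree-2 polynomial kernel gives the same number as an inner product of explicitly built features, without ever building them. The Gaussian kernel included alongside it corresponds to an infinite-dimensional feature map, which is why features never need to be enumerated. All function names are hypothetical.

```python
import numpy as np

def explicit_features(v):
    """Explicit degree-2 feature map for a 2-D input (3 features)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def polynomial_kernel(u, v):
    """Same inner product as above, computed without building the features."""
    return float(np.dot(u, v)) ** 2

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    diff = u - v
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

via_features = float(np.dot(explicit_features(a), explicit_features(b)))
via_kernel = polynomial_kernel(a, b)

print(via_features, via_kernel)   # the two values agree up to rounding
print(gaussian_kernel(a, b))
```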

 

Recap

- In supervised learning, for every example in our data set we are told the correct answer that we would have liked the algorithm to predict for that example, such as the price of the house or whether the tumor is malignant.

- Regression: predict a continuous-valued output.

- Classification: predict a discrete-valued output.

 

Quiz: You are running a company and you want to develop learning algorithms to address each of two problems.

1. You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

2. You would like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

 

Should you treat these as classification or as regression problems?

1. Treat both as classification problems

2. Treat 1 as a classification, 2 as a regression

3. Treat 1 as a regression, 2 as a classification

4. Treat both as regression problems

 

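Answer: 3

1 - The number of items sold is treated as a continuous real value, so this is a regression problem.

2 - The output is a discrete value (0: not hacked, 1: hacked), so this is a classification problem.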

 

Lecturer's Note

- In supervised learning, we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

- Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

 

Example 1:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
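
The switch from regression to classification in Example 1 amounts to changing the target from a number to a category. A minimal sketch, assuming made-up asking and sale prices and a hypothetical helper `sells_above_asking` (a sale exactly at the asking price is counted as "not more", which is an arbitrary choice):

```python
# Hypothetical listings: (asking price, actual sale price) in thousands of dollars.
listings = [(300.0, 315.0), (450.0, 430.0), (250.0, 250.0), (600.0, 640.0)]

# Regression target: the sale price itself, a continuous value.
regression_targets = [sale for _, sale in listings]

def sells_above_asking(asking, sale):
    """Classification target: 1 if the house sold for more than asking, else 0."""
    return 1 if sale > asking else 0

# Classification target: one of two discrete categories per house.
classification_targets = [sells_above_asking(a, s) for a, s in listings]

print(regression_targets)        # [315.0, 430.0, 250.0, 640.0]
print(classification_targets)    # [1, 0, 0, 1]
```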

Example 2:

(a) Regression - Given a picture of a person, we have to predict their age from the picture.

(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.