Page 1:

CSCE 633: Machine Learning

Lecture 1: Introduction

Texas A&M University

8-26-19

Page 2:

Welcome to CSCE 633!

• About this class
  • Who are we?
  • Syllabus
  • Text and other references

• Introduction to Machine Learning

Page 3:

Goals of this lecture

• Course Structure

• Machine Learning Common Tasks

• Terminology

• Learning Scenarios

• Mathematical Framework

• Generalization/Empirical Error

• Bayes Error

• Bias Variance Tradeoff

• No Free Lunch Theorem

Page 4:

Who are we?

• Instructor
  • Bobak Mortazavi
  • [email protected] - Please put [CSCE 633] in the subject line
  • Office Hours: W 10-11a, R 11-12p, 328A HRBB

• TA
  • Arash Pakbin
  • [email protected]
  • Office Hours: M 10:15-11:15a, W 4:30-5:30p, 320 HRBB

• Discussion Board on eCampus

Page 5:

Syllabus, Schedule, and eCampus

Page 6:

Text Book

Textbook for this course

Page 7:

Other useful references

Page 8:

Prerequisite Topics

The topics you need to know for this class from probability theory and matrix operations are:

• Linear Algebra Review (Appendix A of the book)

• Probability Review (Appendix C of the book)

Page 9:

Assignments

• Homework will be compiled in LaTeX - you will turn in PDFs

• Programming for homework will be in R or Python.

• For projects - if you need to use a different language, please explain why

• Project teams will be 2-3 people per team.

• Project proposals will be a 2 page written document

• Project report will be an 8 page IEEE conference-style report

Page 10:

Topics

• About this class

• Introduction to Machine Learning
  • What is machine learning?
  • Models and accuracy
  • Some general supervised learning

Page 11:

What is machine learning?

Page 12:

What is machine learning?

Big Data! 40 billion web pages, 100 hours of videos on YouTube every minute, genomes of 1000s of people.

Murphy defines machine learning as a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty (such as planning how to collect more data!).

Page 13:

Some cool examples!

Power P, Hobbs J, Ruiz H, Wei X, Lucey P. Mythbusting Set-Pieces in Soccer. MIT Sloan Sports Analytics Conference 2018

Page 14:

Some cool examples!

Power P, Hobbs J, Ruiz H, Wei X, Lucey P. Mythbusting Set-Pieces in Soccer. MIT Sloan Sports Analytics Conference 2018

Page 15:

Some cool examples!

Waymo.com/tech

Page 16:

Some cool examples!

Waymo.com/tech

Page 17:

Some cool examples!

Waymo.com/tech

Page 18:

Standard Machine Learning Tasks

Machine learning tasks mostly fall into the following categories (a brief code sketch follows the list):

• Classification - assigning a category to each item, e.g., text classification and speech recognition

• Regression - learning a real value for each item, e.g., stock value prediction

• Clustering - learning how to partition a set of items into homogeneous subsets, e.g., identifying communities within large groups of people

• Dimensionality Reduction - learning a transformation from the initial representation into a lower-dimensional representation, e.g., digital image preprocessing
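The following is a minimal sketch (not from the slides) contrasting two of these tasks, classification and regression, on made-up toy data; every name and number below is hypothetical.

```python
# Toy contrast of two of the tasks above; all data and values are made up.
import numpy as np

rng = np.random.default_rng(0)

# Classification: assign each 2-D point to the nearer of two class centroids.
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])               # class 0 and class 1
points = rng.normal(loc=centroids[[0, 0, 1, 1]], scale=0.5)  # two points per class
dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
print("predicted categories:", np.argmin(dists, axis=1))     # e.g. [0 0 1 1]

# Regression: learn a real value y ≈ a*x + b by least squares.
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)
a, b = np.polyfit(x, y, deg=1)
print("estimated slope and intercept:", a, b)                 # close to 2.0 and 1.0
```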

Page 19:

Machine Learning Terminology (buzzwords!)

The definitions and terminology commonly used in ML are as follows:

• Examples - instances of data for either learning or evaluation

• Features - the set of attributes associated with an example, often represented as a vector

• Labels - values/categories assigned to examples

• Hyperparameters - free parameters not determined in training, but rather specified as inputs

• Training sample - examples used in training

• Validation sample - examples used in parameter tuning

• Test sample - examples used for performance evaluation

Page 20:

Machine Learning Terminology (buzzwords!)

• Loss function - A function measuring the difference between a predicted label and the true label. This function is to be minimized in the optimization stage.

• Hypothesis set - A set of functions which map the feature vectors to the set of labels.

Page 21:

Learning Scenarios

Based on how the training data is available to the learner, several training scenarios can occur:

• Supervised Learning - the learner receives a set of labeled examples, making predictions for all unseen points. Most common scenario. Could be, among others, classification or regression.

• Unsupervised Learning - the training data is unlabeled and the learner makes predictions for unseen points. Examples are clustering and dimensionality reduction.

• Semi-supervised Learning - the training sample contains unlabeled AND labeled samples, making predictions for unseen samples. Common when labels are expensive to obtain.

Page 22:

Learning Scenarios

• On-line Learning - training and testing phases are intermixed. The data is made available to the learner over time. The objective is to minimize the cumulative loss over all rounds.

• Reinforcement Learning - training and testing phases are intermixed. The learner interacts with/affects the environment. Some actions of the learner lead to being rewarded. The objective is to maximize the rewards based on a course of actions. Exploration vs. exploitation dilemma.

• Active Learning - the learner interactively collects training examples, querying an oracle to request new labels. Used when labels are expensive to obtain.

Page 23:

Mathematical Framework of Learning

A learning framework can be mathematically laid out as:

• X : all possible instances (input space)

• Y: all possible labels/target values

• Concept c : X → Y: a mapping from X to Y
• Concept class: a set of concepts we wish to learn, denoted by C
• The learning problem would be:

  • The learner assumes a fixed set of concepts H, called the hypothesis set (not necessarily C)

  • Input samples S = (x_1, ..., x_m) with labels (c(x_1), ..., c(x_m)) are drawn i.i.d. according to distribution D, based on a target concept c ∈ C to be learned

  • The learning task is then to select a hypothesis h_S ∈ H having a small generalization error R(h).

Page 24:

Generalization Error vs Empirical Error

• Generalization Error: Given a hypothesis h ∈ H, a target concept c ∈ C, and an underlying distribution D, the generalization error of h is defined by:

R(h) = P_{x∼D}[h(x) ≠ c(x)] = E_{x∼D}[1_{h(x)≠c(x)}]   (1)

• The generalization error cannot be computed, which gives rise to:

• Empirical Error: Given a hypothesis h ∈ H, a target concept c ∈ C, and a sample S = (x_1, ..., x_m), the empirical error of h is defined by:

R_S(h) = (1/m) ∑_{i=1}^{m} 1_{h(x_i)≠c(x_i)}   (2)
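As a quick illustration of Eq. (2): the empirical error is simply the fraction of sample points on which the hypothesis disagrees with the target concept. The sketch below is an assumption-laden toy (a threshold concept, a uniform sample, and hypothetical function names), not material from the slides.

```python
# Minimal sketch of the empirical error R_S(h) from Eq. (2); names are hypothetical.
import numpy as np

def empirical_error(h, c, xs):
    """R_S(h) = (1/m) * sum_i 1[h(x_i) != c(x_i)] over the sample xs."""
    return np.mean([h(x) != c(x) for x in xs])

c = lambda x: int(x > 0.5)   # assumed target concept: label 1 above 0.5
h = lambda x: int(x > 0.6)   # learned hypothesis with a slightly wrong threshold
sample = np.random.default_rng(1).uniform(0, 1, size=1000)
print(empirical_error(h, c, sample))   # roughly 0.1, the mass of the interval [0.5, 0.6]
```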

Page 25:

Bayes Error

• The data distribution D is defined over X × Y, and the labeled sample S is drawn i.i.d. according to distribution D:

S = ((x_1, y_1), ..., (x_m, y_m))   (3)

• In a general scenario, the label for a specific input is not unique; it is a probabilistic function of the input.

• In such a stochastic scenario, there is a non-zero error for any hypothesis.

Page 26:

Bayes Error

• Bayes Error: Given a distribution D over X × Y, the Bayes error R* is defined as the infimum of the errors achieved by measurable functions h : X → Y:

R* = inf_h R(h)   (4)

• A hypothesis h with R(h) = R* is called the Bayes hypothesis or Bayes classifier.
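A hedged numerical illustration (the distribution below is made up): when labels are stochastic, even the Bayes classifier, which predicts the most probable label at each x, keeps a non-zero error.

```python
# Toy stochastic labels: P(Y = 1 | X = x) = 0.8 for x > 0 and 0.2 otherwise,
# so the Bayes classifier predicts the majority label and still errs 20% of the time.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
p1 = np.where(x > 0, 0.8, 0.2)            # P(Y = 1 | X = x)
y = rng.random(x.size) < p1               # labels drawn from that conditional law
bayes_pred = p1 > 0.5                     # h*(x) = argmax_y P(y | x)
print("estimated Bayes error:", np.mean(bayes_pred != y))   # ≈ 0.2
```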

Page 27:

Advertising Example

Assume we are trying to improve sales of a product through advertising in three different markets: TV, newspapers, and online/mobile. We define the advertising budget as:

• input variables x_i: consist of TV budget x_{i,1}, newspaper budget x_{i,2}, and mobile budget x_{i,3} (also known as predictors, independent variables, features, covariates, or just variables)

• output variable y_i: sales numbers (also known as the response variable, or dependent variable)

• Can generalize input predictors to x_{i,1}, x_{i,2}, · · · , x_{i,p} = x_i

Page 28:

Advertising Example

Assume we are trying to improve sales of a product through advertising in three different markets: TV, newspapers, and online/mobile. We define the advertising budget as:

• input variables x_i: consist of TV budget x_{i,1}, newspaper budget x_{i,2}, and mobile budget x_{i,3} (also known as predictors, independent variables, features, covariates, or just variables)

• output variable y_i: sales numbers (also known as the response variable, or dependent variable)

• Can generalize input predictors to x_{i,1}, x_{i,2}, · · · , x_{i,p} = x_i

• Model for the relationship is then: Y = f(X) + ε

• f is the systematic information and ε is the random error (a toy simulation of this model follows below)
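A toy simulation of the Y = f(X) + ε model above; the budgets, coefficients, and noise level are assumptions chosen only for illustration and do not come from the slides.

```python
# Simulate sales as a linear systematic effect of three budgets plus noise.
import numpy as np

rng = np.random.default_rng(3)
m = 200
X = rng.uniform(0, 100, size=(m, 3))        # columns: TV, newspaper, mobile budgets
beta_true = np.array([0.05, 0.01, 0.03])    # assumed systematic effect f
eps = rng.normal(scale=1.0, size=m)         # random error, independent of X
y = 2.0 + X @ beta_true + eps               # observed sales: Y = f(X) + eps
print(X.shape, y.shape)                     # (200, 3) (200,)
```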

Page 29:

Why estimate f ?

• Prediction

• Inference

Page 30:

Why estimate f ?

• Prediction - In many situations, x_i is available to us but y_i is not; we would like to estimate y_i as accurately as possible.

• Inference

Page 31:

Why estimate f ?

• Prediction - In many situations, x_i is available to us but y_i is not; we would like to estimate y_i as accurately as possible.

• Inference - We would like to understand how y_i is affected by changes in x_i.

Page 32:

Predicting f ?

Create an estimate f̂(x) that generates a prediction ŷ that estimates y as closely as possible by attempting to minimize the reducible error, accepting that there is some irreducible error generated by ε that is independent of x.

Why is there irreducible error? (hint: remember Bayes error!)

Page 33:

Estimating f

Assume we have an estimate f̂ and a set of predictors x, such that ŷ = f̂(x). Then we can calculate the expected value of the average error as:

E(y − ŷ)² = E[f(x) + ε − f̂(x)]² = [f(x) − f̂(x)]² + Var(ε)

The focus of this class is to learn techniques to estimate f that minimize the reducible error.
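A quick Monte Carlo check of this decomposition at a single fixed x, using assumed values for f(x), the estimate f̂(x), and the noise standard deviation:

```python
# Verify E(y - yhat)^2 = [f(x) - fhat(x)]^2 + Var(eps) numerically (toy values).
import numpy as np

rng = np.random.default_rng(4)
f_x, fhat_x, sigma = 5.0, 4.2, 1.5                    # assumed truth, estimate, noise sd
y = f_x + rng.normal(scale=sigma, size=1_000_000)     # draws of y = f(x) + eps
lhs = np.mean((y - fhat_x) ** 2)                      # E(y - yhat)^2
rhs = (f_x - fhat_x) ** 2 + sigma ** 2                # reducible + irreducible error
print(lhs, rhs)                                       # agree up to Monte Carlo error
```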

Page 34:

How do we estimate f

• Need: Training Data to learn from

Page 35:

How do we estimate f

• Need: Training Data to learn from

• Training samples S = ((x_1, y_1), ..., (x_m, y_m)), where x_i = (x_{i,1}, x_{i,2}, · · · , x_{i,p})^T for m samples and p dimensions.

• Notation may differ across books

• y can be:
  • categorical or nominal → the problem we solve is classification or pattern recognition
  • continuous → the problem we solve is regression
  • categorical with order (grades A, B, C, D, F) → the problem we solve is ordinal regression

Page 36:

Parametric Methods

• A two-step, model-based approach to estimate f.

• Step 1 - make an assumption about the form of f. For example, that f is linear:

  • f(X) = β_0 + β_1x_1 + β_2x_2 + · · · + β_px_p

  • Advantage - only have to estimate p + 1 coefficients to model f

• Step 2 - fit or train a model/procedure to estimate β

• The most common method for this is Ordinary Least Squares.
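A minimal sketch of Step 2 using ordinary least squares via NumPy's lstsq; the simulated data and true coefficients are illustrative assumptions. (lstsq computes the same minimizer as the closed-form normal-equations solution, just more stably.)

```python
# Fit the p + 1 coefficients of the assumed linear form by least squares.
import numpy as np

rng = np.random.default_rng(5)
m, p = 200, 3
X = rng.uniform(0, 100, size=(m, p))
y = 2.0 + X @ np.array([0.05, 0.01, 0.03]) + rng.normal(scale=1.0, size=m)

X1 = np.column_stack([np.ones(m), X])               # prepend a column of 1s for beta_0
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)   # minimizes ||X1 beta - y||^2
print(beta_hat)                                     # ≈ [2.0, 0.05, 0.01, 0.03]
```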

Page 37:

Parametric vs. Non-parametric

Page 38:

Flexibility vs. Interpretability

Page 39:

Model Accuracy

• Find some measure of error

• MSE = (1/m) ∑_{i=1}^{m} (y_i − f̂(x_i))²

• Note that this error is the empirical error

• Methods trained to reduce MSE on training data S - is that enough?

Page 40:

Model Accuracy

• Find some measure of error

• MSE = (1/m) ∑_{i=1}^{m} (y_i − f̂(x_i))²

• Note that this error is the empirical error

• Methods trained to reduce MSE on training data S - is that enough?

• Why do we care about previously seen data? Instead we care about unseen test data to understand how well the model predicts.

• So how do we select methods that reduce MSE on test data? (A toy comparison follows below.)
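One toy answer, with every choice (the sine target, noise level, sample sizes, and polynomial degrees) an assumption for illustration: evaluate candidate models on data they were not trained on, and watch training versus test MSE diverge.

```python
# Training MSE keeps falling with flexibility, but test MSE (fresh data) can rise.
import numpy as np

rng = np.random.default_rng(6)
f = lambda x: np.sin(2 * np.pi * x)                    # assumed true function
x_tr, x_te = rng.uniform(0, 1, 20), rng.uniform(0, 1, 200)
y_tr = f(x_tr) + rng.normal(scale=0.2, size=x_tr.size)
y_te = f(x_te) + rng.normal(scale=0.2, size=x_te.size)

for degree in (1, 3, 15):                              # increasing flexibility
    coeffs = np.polyfit(x_tr, y_tr, degree)
    train_mse = np.mean((y_tr - np.polyval(coeffs, x_tr)) ** 2)
    test_mse = np.mean((y_te - np.polyval(coeffs, x_te)) ** 2)
    print(degree, train_mse, test_mse)
```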

Page 41:

Bias Variance Tradeoff

• Variance - the amount f̂ would change with different training data

• Bias - the error introduced by approximating f with a simpler model

• E(y − f̂(x))² = Var(f̂(x)) + [Bias(f̂(x))]² + Var(ε)

• Error Rate = (1/m) ∑_{i=1}^{m} I(y_i ≠ ŷ_i)
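A rough simulation of the two terms above (the target function, noise level, and model degrees are assumptions): refit a very rigid model and a very flexible one on many fresh training sets, then look at how much their predictions at a fixed point move around (variance) versus systematically miss f (squared bias).

```python
# Estimate variance and squared bias of two estimators at a single point x0.
import numpy as np

rng = np.random.default_rng(7)
f = lambda x: np.sin(2 * np.pi * x)
x0 = 0.25                                   # f(x0) = 1.0

def predictions_at_x0(degree, n_repeats=500, n_train=30, noise=0.3):
    preds = []
    for _ in range(n_repeats):              # a fresh training set each time
        x = rng.uniform(0, 1, n_train)
        y = f(x) + rng.normal(scale=noise, size=n_train)
        preds.append(np.polyval(np.polyfit(x, y, degree), x0))
    return np.array(preds)

for degree in (0, 9):                        # constant model vs. flexible polynomial
    pred = predictions_at_x0(degree)
    print(degree, "variance:", pred.var(), "bias^2:", (pred.mean() - f(x0)) ** 2)
```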

Page 42:

Bias Variance Tradeoff

• Error Rate = (1/m) ∑_{i=1}^{m} I(y_i ≠ ŷ_i)

• Test error is minimized by understanding the conditional probability p(Y = j | X = x_0)

• Bayes Error Rate = 1 − E(max_j p(Y = j | X))

• K-Nearest Neighbor to solve this: p(Y = j | X = x_0) = (1/K) ∑_{i∈N_0} I(y_i = j)

• More on this later!

Page 43:

Key Challenge: Avoid Overfitting

Page 44:

U-Shape Tradeoff

Page 45:

K Nearest Neighbor

• p(y = c | x, D, K) = (1/K) ∑_{i∈N_K(x,D)} I(y_i = c)
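A plain-NumPy sketch of this rule (a toy, not library code): estimate p(y = c | x) as the fraction of the K nearest training points carrying label c, then predict the class with the largest fraction. The two-cluster data set and the function name knn_predict are made up for illustration.

```python
# k-nearest-neighbor class-probability estimate and prediction.
import numpy as np

def knn_predict(x_query, X_train, y_train, K=5):
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances to x_query
    neighbors = np.argsort(dists)[:K]                   # indices of the K closest points
    counts = np.bincount(y_train[neighbors])            # votes per class among neighbors
    return counts / K, int(np.argmax(counts))           # class fractions, predicted class

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(knn_predict(np.array([2.5, 2.5]), X, y, K=5))     # neighbors are mostly class 1
```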

Page 46:

Curse of Dimensionality

Page 47:

No Free Lunch Theorem

• "All models are wrong but some models are useful" - G. Box, 1987

• No single ML system works for everything

• Principal Component Analysis is one way to address the curse of dimensionality (see the sketch after this list)

• Machine learning is not magic: it can't get something out of nothing, but it can get more from less!
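On the PCA bullet above, a hedged minimal sketch (toy data, NumPy SVD): center the features and project onto the top principal components to get a lower-dimensional representation.

```python
# PCA via SVD: keep the directions of largest variance as the new coordinates.
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 10))                     # 100 examples, 10 features
Xc = X - X.mean(axis=0)                            # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal directions
Z = Xc @ Vt[:2].T                                  # coordinates in the top-2 component space
print(Z.shape)                                     # (100, 2): the reduced representation
```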

Page 48:

Takeaways and Next Time

• Course Structure

• What are machine learning tasks?

• What are some basics of learning?

• Can a learner always achieve zero prediction error?

• Next session: Model Selection, Testing methodologies

• Sources: Foundations of Machine Learning - Mohri et al., and An Introduction to Statistical Learning - James et al.
