
Machine learning - HT 2016

1. Introduction

Varun Kanade

University of Oxford
January 20, 2016


What is machine learning?


Machine learning and Artificial intelligence

What does intelligence entail?

- Reasoning, planning, representation, learning

- Courses: Intelligent Systems (MT), Knowledge Representation & Reasoning (HT)

- OpenAI Initiative: http://openai.com/


Outline

History of Machine Learning

This Class

Some Machine Learning Applications

Some Practical Concerns


History of Machine Learning

Statistics: Ronald Fisher

- Three types of iris: setosa, versicolour, virginica (1936)

- For each flower: sepal width (x1), sepal length (x2), petal width (x3), petal length (x4)


Visualize Iris Data: Setosa vs Versicolor

[Figure: histograms of sepal length (cm) for setosa vs versicolour]


Visualize Iris Data: Setosa vs Versicolor

[Figure: histograms of sepal width (cm) for setosa vs versicolour]


Visualize Iris Data: Setosa vs Versicolor

[Figure: scatter plot of sepal length (cm) vs sepal width (cm) for setosa vs versicolour]
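A scatter plot like this one can be reproduced directly from the iris data; the sketch below assumes scikit-learn and matplotlib are available and is illustrative rather than the original plotting code.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Columns 0 and 1 of the scikit-learn iris data are sepal length and sepal
# width (cm); classes 0 and 1 are setosa and versicolour.
iris = load_iris()
X, y = iris.data, iris.target
for label, name in [(0, "setosa"), (1, "versicolour")]:
    pts = X[y == label]
    plt.scatter(pts[:, 0], pts[:, 1], label=name)
plt.xlabel("Sepal length (cm)")
plt.ylabel("Sepal width (cm)")
plt.legend()
plt.show()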


History of Machine Learning

Statistics: Ronald Fisher

- Three types of iris: setosa, versicolour, virginica (1936)

- For each flower: sepal width (x1), sepal length (x2), petal width (x3), petal length (x4)

- Find X = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 that maximizes D²/S

- D = ∑_{i=1}^{4} w_i (E[x_i | setosa] − E[x_i | versicolour])

- µ_i = E[x_i]

- S = ∑_{i=1}^{4} ∑_{j=1}^{4} w_i w_j E[(x_i − µ_i)(x_j − µ_j)]

- We will see that this is basically linear regression
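As a rough numerical sketch of these quantities (not from the lecture): the snippet below uses the iris data shipped with scikit-learn, replaces the expectations E[·] with empirical means and a covariance over the pooled setosa/versicolour samples, and evaluates D²/S for an arbitrary illustrative weight vector w. Note that scikit-learn orders the features as sepal length, sepal width, petal length, petal width, which differs from the x1, ..., x4 indexing above.

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
setosa, versicolour = X[y == 0], X[y == 1]

def fisher_objective(w, A, B):
    """Evaluate D^2 / S for weights w and two classes A, B (rows = flowers)."""
    D = w @ (A.mean(axis=0) - B.mean(axis=0))            # difference of projected class means
    pooled = np.vstack([A, B])
    S = w @ np.cov(pooled, rowvar=False, bias=True) @ w  # variance of the projection X = w . x
    return D ** 2 / S

w = np.array([0.5, -0.5, 1.0, 1.0])   # arbitrary candidate weights, for illustration only
print(fisher_objective(w, setosa, versicolour))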


History of Machine Learning

Computer Science: Alan Turing

Turing Test: The Imitation Game

Learning Machines (Computing Machinery and Intelligence, Mind, 1950)

‘‘Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.’’

Quantitative computational considerations
- How much memory would be required?
- How much computational power would be required?


History of Machine Learning

Neuroscience: Frank Rosenblatt

- Perceptron: neurally-inspired

- Simple training (learning) algorithm

- Built using specialized hardware

[Diagram: perceptron with inputs 1, x1, x2, x3, x4, weights w0, w1, w2, w3, w4, and output sign(w0 + w1 x1 + · · · + w4 x4)]


Perceptron Training Algorithm

Setting

- Receive a sequence of points (x_t, y_t), where only x_t is observed at first

- After the prediction is made, y_t is revealed

- Start with w_0, some arbitrary starting weights for the perceptron

Algorithm

1. Suppose w_{t−1} are the weights after t − 1 steps

2. Predict ŷ_t = sign(w_{t−1} · x_t)

3. Update: if ŷ_t = y_t, do nothing; else set w_t = w_{t−1} − η(1 − 2y_t) x_t
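A minimal sketch of this update rule in code (assumptions: labels y_t ∈ {0, 1} so that 1 − 2y_t is ±1, sign(·) read as a 0/1 prediction, and the starting weights taken to be zero; the toy data at the end is purely illustrative):

import numpy as np

def perceptron(points, labels, eta=1.0, passes=10):
    """Apply the online perceptron updates, sweeping over the data `passes` times."""
    w = np.zeros(points.shape[1])                 # w_0: starting weights
    for _ in range(passes):
        for x_t, y_t in zip(points, labels):
            y_hat = 1 if w @ x_t >= 0 else 0      # predict before y_t is revealed
            if y_hat != y_t:                      # mistake: apply the update rule
                w = w - eta * (1 - 2 * y_t) * x_t
    return w

# Tiny linearly separable example; the leading 1 in each point plays the role of the bias term
X = np.array([[1.0, 2.0, 1.0], [1.0, 1.5, 2.0], [1.0, -1.0, -2.0], [1.0, -2.0, -1.5]])
y = np.array([1, 1, 0, 0])
print(perceptron(X, y))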


Perceptron Training Algorithm in Action

[Sequence of figures: a 2D dataset with the perceptron's decision boundary after successive updates; both axes range from −8 to 8]


What is machine learning?

Some Definitions

- Kevin Murphy: ‘‘. . . we define machine learning as a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty . . .’’

- Tom Mitchell: ‘‘A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.’’

- The same (or similar) programs work across a range of learning tasks (though not universally)


Machine Learning

- Intersection of computer science, statistics, neuroscience/biology, engineering, optimization, etc.

- Statistics
  - How much data is needed?
  - When can we be confident in our predictions?

- Computer Science
  - Design algorithms for automated pattern discovery. How fast do these run?
  - How much computational power is needed?


Outline

History of Machine Learning

This Class

Some Machine Learning Applications

Some Practical Concerns


About this course

- Pre-requisites: basic linear algebra, calculus, probability, algorithms, programming

- Mathematical foundations of ML; not computational or statistical learning theory

- Regression, support vector machines, neural networks, deep learning, clustering [video]

- Conceptual programming assignments; not scaling to real-world systems


About this course

- Discussion forum on Piazza (link on webpage)

- Classes in Weeks 3-7 (Mon, Wed, Fri; 6 groups)

- Practicals in Weeks 2-8 (Tue, Thu; 2 groups)

- Final examination over the Easter break

- Office hours: Tue 15:30-16:30 (449 Wolfson Building)


Outline

History of Machine Learning

This Class

Some Machine Learning Applications

Some Practical Concerns


Application: Boston Housing Dataset

Real attributes

- Crime rate per capita

- Non-retail business fraction

- Nitric oxide concentration

- Age of house

- Floor area

- Distance to city centre

Integer attributes

- Number of rooms

Categorical attributes

- On the Charles river?

- Index of highway access (1-5)

Predict house cost

Source: UCI repository


Application: Breast Cancer

Integer attributes

- Clump thickness

- Uniformity of cell size

- Uniformity of cell shape

- Marginal adhesion

- Single epithelial cell size

- Bare nuclei

- Bland chromatin

- Normal nucleoli

- Mitoses

Predict: Benign vs Malignant

Source: UCI repository


Application: Object Detection and Localization

- 200 basic-level categories

- Dataset contains over 400,000 images

- ImageNet competition (2010-15)


Application: Object Detection and Localization

Source: DeepLearning.net (top); Brain-Maps.com (bottom)


Application: Object Detection and Localization

Source: Zeiler and Fergus (2013)


Supervised Learning

- Training data has inputs (x) as well as outputs (y)

- Regression: the output is real-valued, e.g., housing data

- Classification: the output is a category
  - Binary classification: only two classes, e.g., cancer, spam
  - Multi-class classification: several classes, e.g., object detection


Unsupervised Learning : Grouping News Articles

- Group items into categories: sports, music, business, etc.

- Labels are not known

- The algorithm cannot know ‘‘label names’’


Unsupervised Learning : Genetic Data of European Populations

Source: Novembre et al., Nature (2008)


Active and Semi-Supervised Learning

Active Learning

- Data is unlabelled

- The learning algorithm can ask for a label (from a human)

Semi-supervised Learning

- Some data is labelled; a lot more is unlabelled

- Can using the two together help?


Anomaly Detection or One-class Classification

Examples

- Detect possible malfunctions at nuclear reactors

- Detect fraudulent transactions for credit cards

- Supervised learning vs anomaly detection

- Anomalous events are much rarer and possibly not related to each other


Recommendation Systems

Movie / User               Alice  Bob  Charlie  Dean  Eve
The Shawshank Redemption   7      9    9        5     2
The Godfather              3      ?    10       4     3
The Dark Knight            5      9    ?        6     ?
Pulp Fiction               ?      5    ?        ?     10
Schindler’s List           ?      6    ?        9     ?

- Netflix competition to predict user ratings (2008-09)

- Applications to all kinds of product recommendations

- No user will have used several products; take advantage of the large number of users
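To make the missing-rating prediction concrete, here is a sketch of low-rank matrix factorization fitted by stochastic gradient descent, one common family of methods for this task (not necessarily what Netflix competition entries used); the latent dimension, step size, and regularisation below are arbitrary illustrative choices.

import numpy as np

# Rows are movies, columns are the users Alice, Bob, Charlie, Dean, Eve; NaN marks a missing rating
ratings = np.array([
    [7, 9, 9, 5, 2],                    # The Shawshank Redemption
    [3, np.nan, 10, 4, 3],              # The Godfather
    [5, 9, np.nan, 6, np.nan],          # The Dark Knight
    [np.nan, 5, np.nan, np.nan, 10],    # Pulp Fiction
    [np.nan, 6, np.nan, 9, np.nan],     # Schindler's List
], dtype=float)

rng = np.random.default_rng(0)
k = 2                                                   # number of latent factors
U = 0.1 * rng.standard_normal((ratings.shape[0], k))    # movie factors
V = 0.1 * rng.standard_normal((ratings.shape[1], k))    # user factors
observed = [(i, j) for i in range(ratings.shape[0])
            for j in range(ratings.shape[1]) if not np.isnan(ratings[i, j])]

eta, lam = 0.01, 0.1                                    # step size and regularisation
for _ in range(20000):                                  # plain stochastic gradient descent
    i, j = observed[rng.integers(len(observed))]
    err = ratings[i, j] - U[i] @ V[j]
    U[i], V[j] = (U[i] + eta * (err * V[j] - lam * U[i]),
                  V[j] + eta * (err * U[i] - lam * V[j]))

print(np.round(U @ V.T, 1))   # predicted ratings for every (movie, user) pair, including the ?s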


Reinforcement Learning

- Autonomous helicopter flight; self-driving cars

- Cannot be programmed by hand

- Stochastic environment (hard to define precisely)

- Must take sequential decisions

- Can define reward functions

- Fun: playing Atari Breakout! [video]


Outline

History of Machine Learning

This Class

Some Machine Learning Applications

Some Practical Concerns


Cleaning up data

Spam Classification

- Look for words such as Nigeria, millions, Viagra, etc.

- Features such as the IP address and other metadata

- Whether the email is addressed by name

Getting Features

- Features are often hand-crafted by domain experts

- This class mainly assumes we already have features

- Feature learning using deep networks


Some pitfalls

Sample Email

‘‘To build a spam classifier, we look for words such as Nigeria, millions, etc.’’

Training vs Test Data

- Future data should look like past data

- Not true for spam classification


Cats vs Dogs


Next Class

Linear Regression

- Brush up your linear algebra and calculus!
