AI Lecture 11 & 12 - Machine Learning

Upload: anum-khawaja

Post on 03-Apr-2018


TRANSCRIPT

  • 7/28/2019 AI Lecture 11 & 12 - Machine Learning

Machine Learning: Lecture 11 & 12

Artificial Intelligence, Spring 2013


INTRODUCTION

What is Machine Learning?

The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience (T. Mitchell).

Principles, methods, and algorithms for learning and prediction on the basis of past experience.

In the broadest sense, any method that incorporates information from training samples in the design of a classifier employs learning.


Our tendency is to view learning only in the manner in which humans learn, i.e. incrementally over time. This may not be the case where ML algorithms are concerned.



    A simple decision model



An overly complex decision model. This may lead to worse classification than a simple model.



Maybe this model is an optimal trade-off between model complexity and performance on the training set.



A classification problem: predicting the grades of students taking this course.

Key steps:
1. Data (what past experience can we rely on?)
2. Assumptions (what can we assume about the students or the course?)
3. Representation (how do we summarize a student?)
4. Estimation (how do we construct a map from students to grades?)
5. Evaluation (how well are we predicting?)
6. Model Selection (perhaps we can do even better?)



1. Data: The data we have available may be:
- names and grades of students in past years' ML courses
- academic records of past and current students

  Student  ML  Course X  Course Y
  Peter    A   B         A          (training data)
  David    B   A         A          (training data)
  Jack     ?   C         A          (current data)
  Kate     ?   A         A          (current data)



2. Assumptions:

There are many assumptions we can make to facilitate predictions:
1. The course has remained roughly the same over the years
2. Each student performs independently of the others



3. Representation:

Academic records are rather diverse, so we might limit the summaries to a select few courses. For example, we can summarize the i-th student (say Peter) with a vector

X_i = [A C B]

where the grades may correspond to numerical values.
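As a sketch, the grade-vector summary could be encoded as follows; the particular mapping A=4 ... F=0 is an assumption, since the slide only says the grades "may correspond to numerical values":

```python
# Hypothetical grade-to-number encoding for a student's course summary.
# The exact mapping below is an illustrative assumption.
GRADE_VALUE = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def encode_student(grades):
    """Summarize a student as a numeric vector, e.g. ['A','C','B'] -> [4,2,3]."""
    return [GRADE_VALUE[g] for g in grades]

print(encode_student(["A", "C", "B"]))  # [4, 2, 3]
```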



3. Representation:

The available data in this representation is:

  Training data        Data for prediction
  Student  ML grade    Student  ML grade
  X1t      B           X1p      ?
  X2t      A           X2p      ?



4. Estimation

Given the training data

  Student  ML grade
  X1t      B
  X2t      A

we need to find a mapping from input vectors x to labels y encoding the grades for the ML course.



Possible solution (nearest neighbor classifier):

1. For any student x, find the closest student xi in the training set
2. Predict yi, the grade of that closest student
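The two steps of the nearest-neighbor rule can be sketched in a few lines of Python. The numeric grade encoding and the squared-distance measure are illustrative assumptions, not part of the original slides:

```python
# Minimal nearest-neighbor classifier over numeric grade vectors.
# Grade encoding and distance choice are illustrative assumptions.
GRADE_VALUE = {"A": 4, "B": 3, "C": 2}

def distance(x, y):
    # Squared Euclidean distance between two grade vectors.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest_neighbor_grade(x, training):
    """training: list of (grade_vector, ML_grade) pairs."""
    closest = min(training, key=lambda pair: distance(x, pair[0]))
    return closest[1]

# Training students as (Course X, Course Y) vectors -> ML grade:
training = [([GRADE_VALUE["B"], GRADE_VALUE["A"]], "A"),   # Peter
            ([GRADE_VALUE["A"], GRADE_VALUE["A"]], "B")]   # David

# Jack has C in Course X and A in Course Y; Peter is his nearest neighbor.
print(nearest_neighbor_grade([GRADE_VALUE["C"], GRADE_VALUE["A"]], training))  # A
```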



5. Evaluation

How can we tell how good our predictions are?
- we can wait till the end of this course...
- we can try to assess the accuracy based on the data we already have (the training data)

Possible solution:
- divide the training set further into training and test sets
- evaluate the classifier constructed on the basis of the training set on the test set
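A minimal sketch of this holdout evaluation; the majority-class "classifier" and the toy data are stand-ins for a real classifier and real records:

```python
import random

# Holdout evaluation sketch: split labeled data, fit on one part,
# measure accuracy on the held-out part.
def train_test_split(data, test_fraction=0.25, seed=0):
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

def majority_class(train):
    # Trivial "classifier": always predict the most common training label.
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

def accuracy(test, predicted_label):
    return sum(1 for _, y in test if y == predicted_label) / len(test)

# Toy labeled data: 20 (features, label) pairs.
data = [((i,), "pass" if i % 4 else "fail") for i in range(20)]
train, test = train_test_split(data)
print(round(accuracy(test, majority_class(train)), 2))
```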



6. Model Selection

We can refine:
- the estimation algorithm (e.g., using a classifier other than the nearest neighbor classifier)
- the representation (e.g., base the summaries on a different set of courses)
- the assumptions (e.g., perhaps students work in groups), etc.

We have to rely on the method of evaluating the accuracy of our predictions to select among the possible refinements.



Types of Machine Learning

Data can be:
- Symbolic or categorical (e.g. "high temperature")
- Numerical (e.g. 45 °C)

We will be primarily dealing with symbolic data.

Numerical data is primarily dealt with by artificial neural networks, which have evolved into a separate field.



From the available data we can:
- model the system which generated the data
- find interesting patterns in the data

We will be primarily concerned with rule-based modelling of the system from which the data was generated.

The search for interesting patterns is considered to be the domain of Data Mining.



A complete pattern recognition (or classification) system consists of several steps.

We will be primarily concerned with the development of classifier systems.



Supervised learning, where we get a set of training inputs and outputs. The correct output for the training samples is available.

Unsupervised learning, where we are interested in capturing the inherent organization in the data. No specific output values are supplied with the learning patterns.

Reinforcement learning, where no exact outputs are supplied, but there is a reward (reinforcement) for desirable behaviour.



Why Use Machine Learning?

First, there are problems for which there exist no human experts.

Example: in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules.


Second, there are problems where human experts exist, but where they are unable to explain their expertise.

This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs.




Third, there are problems where the phenomena are changing rapidly.

Example: people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. The rules and parameters governing these behaviors change frequently, so a hand-written computer program for prediction would need to be rewritten frequently.



Fourth, there are applications that need to be customized for each computer user separately.

Example: a program to filter unwanted electronic mail messages. Different users will need different filters.



VERSION SPACE

Concept Learning by Induction

Learning has been classified into several types.

Much of human learning involves acquiring general concepts from specific training examples (this is called inductive learning).


Example: the concept of "ball":
* red, round, small
* green, round, small
* red, round, medium

Complicated concepts: e.g., situations in which I should study more to pass the exam.



Each concept can be thought of as a Boolean-valued function whose value is true for some inputs and false for all the rest (e.g. a function defined over all animals, whose value is true for birds and false for all the other animals).

The problem of automatically inferring the general definition of some concept, given examples labeled as members or nonmembers of the concept, is called concept learning, or approximating (inferring) a Boolean-valued function from examples.



Target concept to be learnt: days on which Aldo enjoys his favorite water sport.

The training examples present are (the standard EnjoySport data; the three positive days reappear verbatim on a later slide):

  Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
  1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
  2        Sunny  Warm     High      Strong  Warm   Same      Yes
  3        Rainy  Cold     High      Strong  Warm   Change    No
  4        Sunny  Warm     High      Strong  Cool   Change    Yes



The training examples are described by the values of seven attributes.

The task is to learn to predict the value of the attribute EnjoySport for an arbitrary day, based on the values of its other attributes.



Concept Learning by Induction: Hypothesis Representation

The possible concepts are called hypotheses, and we need an appropriate representation for them.

Let a hypothesis be a conjunction of constraints on the attribute values.


If sky = sunny ∧ temp = warm ∧ humidity = ? ∧ wind = strong ∧ water = ? ∧ forecast = same
then EnjoySport = Yes
else EnjoySport = No

Alternatively, this can be written as: {sunny, warm, ?, strong, ?, same}



For each attribute, the hypothesis will have either:
- "?" : any value is acceptable
- a single value : only that value is acceptable
- "∅" : no value is acceptable



If some instance (example/observation) satisfies all the constraints of a hypothesis, then it is classified as positive (belonging to the concept).

The most general hypothesis is {?, ?, ?, ?, ?, ?}. It would classify every example as positive.

The most specific hypothesis is {∅, ∅, ∅, ∅, ∅, ∅}. It would classify every example as negative.
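Checking an instance against a hypothesis can be sketched as follows; here the string "0" stands in for the empty constraint ∅:

```python
# A hypothesis is a tuple of per-attribute constraints: "?" accepts any
# value, "0" (standing in for the empty constraint) accepts no value,
# and any other constraint must match the instance's value exactly.
def matches(hypothesis, instance):
    return all(h == "?" or (h != "0" and h == x)
               for h, x in zip(hypothesis, instance))

most_general  = ("?",) * 6
most_specific = ("0",) * 6
day = ("sunny", "warm", "normal", "strong", "warm", "same")

print(matches(most_general, day))    # True: classifies everything positive
print(matches(most_specific, day))   # False: classifies everything negative
print(matches(("sunny", "warm", "?", "strong", "?", "same"), day))  # True
```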



An alternate hypothesis representation could have been a disjunction of several conjunctions of constraints on the attribute values. Example:

{sunny, warm, normal, strong, warm, same} ∨ {sunny, warm, high, strong, warm, same} ∨ {sunny, warm, high, strong, cool, change}



Another alternate hypothesis representation could have been a conjunction of constraints on the attribute values, where each constraint may be a disjunction of values. Example:

{sunny, warm, normal ∨ high, strong, warm ∨ cool, same ∨ change}



Yet another alternate hypothesis representation could have incorporated negations. Example:

{sunny, warm, ¬(normal ∨ high), ?, ?, ?}



By selecting a hypothesis representation, the space of all hypotheses (that the program can ever represent and therefore can ever learn) is implicitly defined.

In our example, the instance space X contains 3 × 2 × 2 × 2 × 2 × 2 = 96 distinct instances.

There are 5 × 4 × 4 × 4 × 4 × 4 = 5120 syntactically distinct hypotheses. Since every hypothesis containing even one ∅ classifies every instance as negative, the number of semantically distinct hypotheses is 4 × 3 × 3 × 3 × 3 × 3 + 1 = 973.
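The counting argument can be verified directly. Sky has 3 possible values and the other five attributes have 2 each; every slot of a hypothesis additionally allows "?" and ∅:

```python
# Reproducing the slide's counting argument for the EnjoySport example.
domain_sizes = [3, 2, 2, 2, 2, 2]   # Sky has 3 values, the rest have 2

instances = 1
for d in domain_sizes:
    instances *= d                  # 3*2*2*2*2*2 distinct instances

syntactic = 1
for d in domain_sizes:
    syntactic *= d + 2              # each slot: d values, plus "?" and the empty constraint

semantic = 1
for d in domain_sizes:
    semantic *= d + 1               # d values plus "?"
semantic += 1                       # all hypotheses with an empty slot collapse into one

print(instances, syntactic, semantic)  # 96 5120 973
```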



Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces.



Concept Learning by Induction: Search in the Hypothesis Space

Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.

The goal of this search is to find the hypothesis that best fits the training examples.


Concept Learning by Induction: Basic Assumption

Once a hypothesis that best fits the training examples is found, we can use it to predict the class label of new examples. The basic assumption while using this hypothesis is:

Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.


Concept Learning by Induction: General-to-Specific Ordering

If we view learning as a search problem, then it is natural that our study of learning algorithms will examine different strategies for searching the hypothesis space.

Many algorithms for concept learning organize the search through the hypothesis space by relying on a general-to-specific ordering of hypotheses.


Example: Consider
h1 = {sunny, ?, ?, strong, ?, ?}
h2 = {sunny, ?, ?, ?, ?, ?}

Any instance classified positive by h1 will also be classified positive by h2 (because h2 imposes fewer constraints on the instance). Hence h2 is more general than h1, and h1 is more specific than h2.
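The more-general-than-or-equal-to relation between two conjunctive hypotheses can be sketched as a slot-by-slot check (ignoring empty constraints for simplicity):

```python
# h2 is more general than or equal to h1 if, for every attribute slot,
# h2's constraint is satisfied whenever h1's is: either h2's slot is "?"
# or it imposes exactly the same value as h1's slot.
def more_general_or_equal(h2, h1):
    return all(g == "?" or g == s for g, s in zip(h2, h1))

h1 = ("sunny", "?", "?", "strong", "?", "?")
h2 = ("sunny", "?", "?", "?", "?", "?")

print(more_general_or_equal(h2, h1))  # True: h2 drops the wind constraint
print(more_general_or_equal(h1, h2))  # False: h1 adds a constraint
```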



    Consider the three hypotheses h1, h2 and h3



Neither h1 nor h3 is more general than the other. h2 is more general than both h1 and h3.

Note that the more-general-than relationship is independent of the target concept. It depends only on which instances satisfy the two hypotheses, and not on the classification of those instances according to the target concept.



Find-S Algorithm

How do we find a hypothesis consistent with the observed training examples? (A hypothesis is consistent with the training examples if it correctly classifies those examples.)

One way is to begin with the most specific possible hypothesis, then generalize it each time it fails to cover a positive training example (i.e. classifies it as negative).

The algorithm based on this method is called Find-S.
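A sketch of Find-S in Python. The four EnjoySport training examples used here are the standard version of this dataset (an assumption, since the slides' table is not reproduced in this transcript):

```python
# Find-S: start from the most specific hypothesis and minimally
# generalize it on each positive example; negative examples are ignored.
def find_s(examples, n_attrs):
    h = ["0"] * n_attrs                 # "0" stands in for the empty constraint
    for instance, label in examples:
        if label != "yes":
            continue                    # Find-S ignores negative examples
        for i, value in enumerate(instance):
            if h[i] == "0":
                h[i] = value            # first positive example: copy its values
            elif h[i] != value:
                h[i] = "?"              # conflicting value: generalize to "?"
    return h

examples = [
    (("sunny", "warm", "normal", "strong", "warm", "same"),   "yes"),
    (("sunny", "warm", "high",   "strong", "warm", "same"),   "yes"),
    (("rainy", "cold", "high",   "strong", "warm", "change"), "no"),
    (("sunny", "warm", "high",   "strong", "cool", "change"), "yes"),
]

print(find_s(examples, 6))  # ['sunny', 'warm', '?', 'strong', '?', '?']
```

The result is the most specific conjunctive hypothesis consistent with all the positive examples.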


We say that a hypothesis covers a positive training example if it correctly classifies the example as positive.

A positive training example is an example of the concept to be learnt; similarly, a negative training example is not an example of the concept.


[Slides 50-51: statement of the Find-S algorithm and a diagram of its search through the hypothesis space — not reproduced in this transcript.]


The nodes shown in the diagram are the possible hypotheses allowed by our hypothesis representation scheme.

Note that our search is guided by the positive examples, and we consider only those hypotheses which are consistent with the positive training examples.

The search moves from hypothesis to hypothesis, from the most specific to progressively more general hypotheses.



At each step, the hypothesis is generalized only as far as necessary to cover the new positive example.

Therefore, at each stage the hypothesis is the most specific hypothesis consistent with the training examples observed up to this point. Hence the name Find-S.
