Tournament


Upload: twila

Posted on 14-Jan-2016



TRANSCRIPT

Page 1: Tournament

Tournament
• Not complete; processing will begin again tonight, 7:30 PM, until the wee hours; Friday, 8-5.

Extra Credit
• 5 points for passing screening, in tournament
• 5 points for top 60% (top 16)
• 10 points for top 8
• 10 points for top 4
• 10 points for top 1

Celebration next Tuesday: it’s a party!
• Be prepared to talk about the evaluation function you used
• What worked

Next class: tutorial on machine learning tools

Page 2: Tournament

Medical Application of Bayesian Networks:

Pathfinder

Page 3: Tournament

Pathfinder

Domain: hematopathology diagnosis
• Microscopic interpretation of lymph-node biopsies
• Given: 100s of histologic features appearing in lymph-node sections
• Goal: identify disease type, malignant or benign
• Difficult for physicians

Page 4: Tournament

Pathfinder System

Bayesian net implementation
Reasons about 60 malignant and benign diseases of the lymph node
Considers evidence about the status of up to 100 morphological features presenting in lymph-node tissue
Contains 105,000 subjectively derived probabilities


Page 6: Tournament

Commercialization

Intellipath
• Integrates with videodisc libraries of histopathology slides
• Pathologists working with the system make significantly more correct diagnoses than those working without
• Several hundred commercial systems in place worldwide

Page 7: Tournament


Sequential Diagnosis

Page 8: Tournament

Features

Structured into a set of 2-10 mutually exclusive values

Pseudofollicularity
• Absent, slight, moderate, prominent

Represent evidence provided by a feature as F1, F2, …, Fn

Page 9: Tournament

Value of Information

User enters findings from microscopic analysis of tissue
Probabilistic reasoner assigns a level of belief to different diagnoses
Value of information determines which tests to perform next
Full disease utility model making use of life-and-death decision making
• Cost of tests
• Cost of misdiagnoses
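Pathfinder's actual decision-theoretic model is far richer, but the "which test next?" idea can be illustrated with a toy value-of-information calculation: pick the test whose outcome is expected to reduce diagnostic uncertainty (entropy) the most. All diseases, tests, and probabilities below are invented for illustration.

```python
import math

# Hypothetical two-disease, two-test model (numbers invented, not from Pathfinder)
priors = {"malignant": 0.3, "benign": 0.7}

# P(test result | disease)
likelihoods = {
    "testA": {"malignant": {"pos": 0.9, "neg": 0.1},
              "benign":    {"pos": 0.2, "neg": 0.8}},
    "testB": {"malignant": {"pos": 0.6, "neg": 0.4},
              "benign":    {"pos": 0.5, "neg": 0.5}},
}

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(test, result):
    """Bayes' rule: P(disease | test result)."""
    joint = {d: priors[d] * likelihoods[test][d][result] for d in priors}
    z = sum(joint.values())
    return {d: p / z for d, p in joint.items()}

def expected_entropy(test):
    """Entropy of the diagnosis we expect to be left with after the test."""
    return sum(
        sum(priors[d] * likelihoods[test][d][r] for d in priors)
        * entropy(posterior(test, r))
        for r in ("pos", "neg")
    )

# Choose the test expected to leave the least remaining uncertainty
best = min(("testA", "testB"), key=expected_entropy)
print(best)  # -> testA
```

A full system would weigh this against the costs of tests and misdiagnoses, as the slide notes, rather than entropy alone.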


Page 12: Tournament

Group Discrimination Strategy

Select questions based on their ability to discriminate between disease classes
For a given differential diagnosis, select the most specific level of the hierarchy and select questions to discriminate among groups
Less efficient
• Larger number of questions asked


Page 15: Tournament


Other Bayesian Net Applications

Lumiere – Who knows what it is?

Page 16: Tournament

Other Bayesian Net Applications

Lumiere
• Single most widely distributed application of BNs
• Microsoft Office Assistant
• Infer a user’s goals and needs using evidence about user background, actions, and queries

VISTA
• Help NASA engineers in round-the-clock monitoring of each of the Space Shuttle orbiters’ subsystems
• Time-critical, high-impact
• Interpret telemetry and provide advice about likely failures
• Direct engineers to the best information
• In use for several years

Microsoft Pregnancy and Child Care
• What questions to ask next to diagnose the illness of a child

Page 17: Tournament

Machine Learning

Reading: Chapter 18

Page 18: Tournament


Machine Learning and AI

Improve task performance through observation, teaching

Acquire knowledge automatically for use in a task

Learning as a key component in intelligence

Page 19: Tournament


Inductive Learning

Input: example pairs x, f(x)

Output: a function h that approximates f

A good hypothesis, h, is a generalization or learned rule
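As a toy illustration of this setup (the target f(x) = x² and the 1-nearest-neighbor hypothesis are both assumptions for the example, not from the slides):

```python
def f(x):
    """Hidden target function the learner tries to approximate (assumed: x^2)."""
    return x * x

# Input: observed example pairs (x, f(x))
examples = [(x, f(x)) for x in [0, 1, 2, 3, 4, 5]]

def h(x_new):
    """Hypothesis: predict f(x_new) using the nearest observed x (1-NN)."""
    nearest_x, nearest_y = min(examples, key=lambda xy: abs(xy[0] - x_new))
    return nearest_y

# h generalizes to unseen inputs: for x = 2.9 the nearest example is x = 3
print(h(2.9))  # -> 9
```

Here h is a crude generalization of f: it is exact on the training points and approximate in between.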

Page 20: Tournament


How do systems learn?

Supervised

Unsupervised

Reinforcement

Page 21: Tournament


Three Types of Learning

Rule induction
• E.g., decision trees

Knowledge-based
• E.g., using a domain theory

Statistical
• E.g., naïve Bayes, nearest neighbor, support vector machines

Page 22: Tournament


Applications

Language/speech
• Machine translation
• Summarization
• Grammars

IR
• Text categorization, relevance feedback

Medical
• Assessment of illness severity

Vision
• Face recognition, digit recognition, outdoor scene recognition

Security
• Intrusion detection, network traffic, credit fraud

Social networks
• Email traffic

To think about: applications to systems, computer engineering, software?

Page 23: Tournament


Language Tasks

Text summarization
• Task: given a document, which sentences could serve as the summary?
• Training data: summary + document pairs
• Output: rules that extract sentences given an unseen document

Grammar induction
• Task: produce a tree representing syntactic structure given a sentence
• Training data: set of sentences annotated with parse trees
• Output: rules that generate a parse tree given an unseen sentence

Page 24: Tournament


IR Task

Text categorization
• http://www.yahoo.com
• Task: given a web page, is it news or not?
• Binary classification (yes, no)
• Or classify as one of business & economy, news & media, computer
• Training data: documents labeled with category
• Output: a yes/no response for a new document, or a category for a new document

Page 25: Tournament


Medical

Task: Does a patient have heart disease (on a scale from 1 to 4)?

Training data:
• Age, sex, cholesterol, chest pain location, chest pain type, resting blood pressure, smoker?, fasting blood sugar, etc.
• Characterization of heart disease (0, 1-4)

Output:
• Given a new patient, classification by disease

Page 26: Tournament


General Approach

Formulate task
Prior model (parameters, structure)
Obtain data
What representation should be used? (attribute/value pairs)
Annotate data
Learn/refine model with data (training)
Use model for classification or prediction on unseen data (testing)
Measure accuracy
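A minimal end-to-end sketch of this workflow, with an invented attribute/value toy dataset and a majority-class baseline standing in for a real learner:

```python
from collections import Counter

# Invented attribute/value data: (attributes, label) pairs
data = [
    ({"fever": "yes", "cough": "yes"}, "flu"),
    ({"fever": "yes", "cough": "no"},  "flu"),
    ({"fever": "no",  "cough": "yes"}, "cold"),
    ({"fever": "yes", "cough": "yes"}, "flu"),
    ({"fever": "yes", "cough": "no"},  "flu"),
    ({"fever": "no",  "cough": "no"},  "cold"),
]

train, test = data[:4], data[4:]         # learn on the training split only

# "Model": always predict the most common training label (a trivial baseline)
majority = Counter(label for _, label in train).most_common(1)[0][0]
model = lambda attributes: majority

# Measure accuracy on unseen (test) data
correct = sum(model(x) == y for x, y in test)
print(f"accuracy = {correct}/{len(test)}")  # accuracy = 1/2
```

Any real learner (decision tree, naïve Bayes, …) slots into the same train/test/measure loop in place of the baseline.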

Page 27: Tournament


Issues

Representation
• How to map from a representation in the domain to a representation used for learning?

Training data
• How can training data be acquired?

Amount of training data
• How well does the algorithm do as we vary the amount of data?

Which attributes influence learning most?
Does the learning algorithm provide insight into the generalizations made?

Page 28: Tournament


Classification Learning

Input: a set of attributes and values

Output: a discrete-valued function
• Learning a continuous-valued function is called regression

Binary or Boolean classification: category is either true or false

Page 29: Tournament


Learning Decision Trees

Each node tests the value of an input attribute

Branches from the node correspond to possible values of the attribute

Leaf nodes supply the values to be returned if that leaf is reached

Page 30: Tournament


Example

http://www.ics.uci.edu/~mlearn/MLSummary.html

Iris Plant Database: which of 3 classes is a given Iris plant?
• Iris Setosa
• Iris Versicolour
• Iris Virginica

Attributes
• Sepal length in cm
• Sepal width in cm
• Petal length in cm
• Petal width in cm

Page 31: Tournament


Summary statistics:

                Min  Max  Mean  SD    Class correlation
  sepal length: 4.3  7.9  5.84  0.83   0.7826
  sepal width:  2.0  4.4  3.05  0.43  -0.4194
  petal length: 1.0  6.9  3.76  1.76   0.9490 (high!)
  petal width:  0.1  2.5  1.20  0.76   0.9565 (high!)

Rules to learn
• If sepal length > 6 and sepal width > 3.8 and petal length < 2.5 and petal width < 1.5, then class = Iris Setosa
• If sepal length > 5 and sepal width > 3 and petal length > 5.5 and petal width > 2, then class = Iris Versicolour
• If sepal length < 5 and sepal width > 3 and petal length ≥ 2.5 and ≤ 5.5 and petal width ≥ 1.5 and ≤ 2, then class = Iris Virginica

Page 32: Tournament


Data

     S-length  S-width  P-length  P-width  Class
1    6.8       3        6.3       2.3      Versicolour
2    7         3.9      2.4       1.1      Setosa
3    2         3        2.6       1.7      Virginica
4    3         3.4      2.5       1.5      Virginica
5    5.5       3.6      6.8       2.4      Versicolour
6    7.7       4.1      1.2       1.4      Setosa
7    6.3       4.3      1.6       1.2      Setosa
8    1         3.7      2.8       2        Virginica
9    6         4.2      5.6       2.1      Versicolour

Page 33: Tournament


Data

     S-length  S-width  P-length  Class
1    6.8       3        6.3       Versicolour
2    7         3.9      2.4       Setosa
3    2         3        2.6       Virginica
4    3         3.4      2.5       Virginica
5    5.5       3.6      6.8       Versicolour
6    7.7       4.1      1.2       Setosa
7    6.3       4.3      1.6       Setosa
8    1         3.7      2.8       Virginica
9    6         4.2      5.6       Versicolour

Page 34: Tournament


Constructing the Decision Tree

Goal: find the smallest decision tree consistent with the examples

Find the attribute that best splits the examples
Form a tree with root = best attribute
For each value vi (or range) of the best attribute:
• Select those examples with best = vi
• Construct subtreei by recursively calling the decision-tree procedure with that subset of examples and all attributes except best
• Add a branch to the tree with label = vi and subtree = subtreei
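The recursive procedure above can be sketched compactly. The slide does not say how the "best" split is scored; information gain (as in ID3) is used here as one standard choice, and the weather-style dataset is invented for illustration.

```python
import math
from collections import Counter

def entropy(examples):
    """Shannon entropy of the class labels in a list of (attrs, label) pairs."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(examples, attr):
    """Entropy reduction from splitting the examples on attr."""
    total = len(examples)
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder

def build_tree(examples, attributes):
    labels = {y for _, y in examples}
    if len(labels) == 1:                     # all examples agree -> leaf
        return labels.pop()
    if not attributes:                       # no attributes left -> majority leaf
        return Counter(y for _, y in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a))
    tree = {}
    for value in {x[best] for x, _ in examples}:   # one branch per value vi
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree[(best, value)] = build_tree(subset, attributes - {best})
    return tree

examples = [
    ({"outlook": "sunny", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain",  "windy": "yes"}, "stay"),
    ({"outlook": "rain",  "windy": "no"},  "stay"),
]
tree = build_tree(examples, {"outlook", "windy"})
print(tree[("outlook", "sunny")], tree[("outlook", "rain")])  # play stay
```

Here "outlook" separates the labels perfectly (gain 1 bit) while "windy" gives no gain, so the tree tests outlook once and stops.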

Page 35: Tournament


Construct example decision tree

Page 36: Tournament


Issues

Representation
• How to map from a representation in the domain to a representation used for learning?

Training data
• How can training data be acquired?

Amount of training data
• How well does the algorithm do as we vary the amount of data?

Which attributes influence learning most?
Does the learning algorithm provide insight into the generalizations made?