


Page 1: General Information Course Id: COSC6342 Machine Learning Time: Tuesdays and Thursdays 2:30 PM – 4:00 PM Professor: Ricardo Vilalta (vilalta@cs.uh.edu)

General Information

Course Id: COSC6342 Machine Learning

Time: Tuesdays and Thursdays 2:30 PM – 4:00 PM

Professor: Ricardo Vilalta (vilalta@cs.uh.edu)

Office: PGH 573

Telephone: (713) 743-3614

Office Hours: Tuesdays, Thursdays 1:30 PM – 2:30 PM

Textbook

Textbook: “Machine Learning” by Tom Mitchell, 1st Edition, McGraw-Hill, 1997.

Additional Reading:

“Pattern Classification” by Duda, Hart, and Stork, 2nd Edition, Wiley-Interscience, 2000.

“Computer Systems that Learn” by Kulikowski and Weiss, 1st Edition, 1991.

Grading

Midterm Exams 30%
Homework 20%
Project 20%
Final Exam 30%

NOTE: PLAGIARISM IS NOT TOLERATED.

Homework

Homework will include mainly exercises from the textbook.

The project will be a report on some area in machine learning you find most interesting.

You can either report on some novel experiments after applying an algorithm on a database or attempt a theoretical analysis.

The report must include a short survey of related work with the corresponding list of references.

Dates to Remember

September 30: 1st Midterm Exam
November 23: 2nd Midterm Exam
November 25: No class (Thanksgiving Holiday)
December 2: Submit Project Report
December 9: Final Exam (2:00-5:00 PM)

How to Succeed in Class

• In case you miss a class, read the chapter corresponding to that class.
• Consult the professor during his office hours if you have questions.
• The exams will cover only the material covered in class, but it is important to read the textbook thoroughly.
• Assignments will prepare you well for the exam.
• Exams should not be a problem if you have been following the classes and reading the textbook.
• Familiarize yourself with the software, and decide early which aspect of machine learning you like the most.

What is Machine Learning?

• Where does machine learning fit in computer science?

• What is machine learning?

• Where can machine learning be applied?

• Should I care about machine learning at all?

Field of Study

[Diagram: Artificial Intelligence and its subfields: Search, Planning, Knowledge Representation, Machine Learning, Robotics. Topics shown under Machine Learning: Clustering, Classification, Genetic Algorithms, Reinforcement Learning]

Multidisciplinary Field

[Diagram: Machine Learning surrounded by related disciplines: Probability & Statistics, Computational Complexity Theory, Information Theory, Philosophy, Neurobiology, Artificial Intelligence]

What is Machine Learning?

• Where does machine learning fit in computer science?

• What is machine learning?

    • Definition

    • Design of a learning system

• Where can machine learning be applied?

• Should I care about machine learning at all?

Definition

Machine learning is the study of how to make computers learn; the goal is to make computers improve their performance through experience.

[Diagram: Computer (Learning Algorithm) with Experience E, Class of Tasks T, and Performance P]

Class of Tasks

[Diagram: Computer (Learning Algorithm) with Experience E, Class of Tasks T, and Performance P]

Class of Tasks

It is the kind of activity on which the computer will learn to improve its performance. Examples:

• Learning to play chess

• Recognizing images of handwritten words

• Diagnosing patients coming into the hospital

Settings for learning

1. Tasks are generated by a random process outside the learner
2. The learner can pose queries to a teacher
3. The learner explores its surroundings autonomously

Example: Learning to play chess

1. Learn from a specific sequence
2. Ask: what if the sequence is this?
3. Give me an amateur player and then an expert player.

Experience and Performance

[Diagram: Computer (Learning Algorithm) with Experience E, Class of Tasks T, and Performance P]

Experience and Performance

Experience: What has been recorded in the past.

Performance: A measure of the quality of the response or action.

Example: Handwriting recognition using Neural Networks

Experience: a database of handwritten images with their correct classification

Performance: Accuracy in classifications
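The performance measure named above, accuracy in classification, can be sketched in a few lines of Python. The function name `accuracy` and the digit labels are illustrative assumptions, not part of the course material.

```python
def accuracy(predicted, actual):
    """Fraction of examples whose predicted class matches the recorded class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical experience: recorded classes for five handwritten digits,
# next to the guesses of some classifier (made-up values).
actual = [3, 7, 1, 7, 9]
predicted = [3, 7, 7, 7, 9]
print(accuracy(predicted, actual))  # prints 0.8 (4 of 5 correct)
```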

What is Machine Learning?

• Where does machine learning fit in computer science?

• What is machine learning?

    • Definition

    • Design of a learning system

• Where can machine learning be applied?

• Should I care about machine learning at all?

Designing a Learning System

[Diagram: Computer (Learning Algorithm) with Experience E, Class of Tasks T, and Performance P]

Designing a Learning System

1. Define the knowledge to learn
2. Define the representation of the target knowledge
3. Define the learning mechanism

Example: Handwritten recognition using Neural Networks

1. A function to classify handwritten images
2. A linear combination of handwritten features
3. A linear classifier

The Knowledge To Learn

Supervised learning: A function to predict the class of new examples

Let X be the space of possible examples
Let Y be the space of possible classes
Learn F : X → Y

Example: In learning to play chess the following are possible interpretations:
X : the space of board configurations
Y : the space of legal moves
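As a rough illustration of "learn F : X → Y", the sketch below treats a learner as a procedure that takes labeled examples and returns a function from X to Y. The learner shown (`learn_majority`, a name invented here) is deliberately trivial: it ignores its input and always predicts the most frequent class. The board and move strings are hypothetical placeholders.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Tuple, TypeVar

X = TypeVar("X")                    # space of possible examples
Y = TypeVar("Y", bound=Hashable)    # space of possible classes


def learn_majority(examples: Iterable[Tuple[X, Y]]) -> Callable[[X], Y]:
    """Return F : X -> Y that always predicts the most frequent class seen."""
    counts = Counter(y for _, y in examples)
    if not counts:
        raise ValueError("need at least one labeled example")
    majority = counts.most_common(1)[0][0]
    return lambda x: majority


# Hypothetical chess experience: (board configuration, move played).
examples = [("board_a", "e4"), ("board_b", "d4"), ("board_c", "e4")]
F = learn_majority(examples)
print(F("board_never_seen"))  # prints e4
```

A useful learner would of course make F depend on x; the point here is only the shape of the problem: labeled pairs in, a function F out.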

The Representation of the Target Knowledge

Example: Diagnosing a patient coming into the hospital.

Features:
X1: Temperature
X2: Blood pressure
X3: Blood type
X4: Age
X5: Weight
Etc.

Given a new example X = < x1, x2, …, xn >

F(X) = w1x1 + w2x2 + w3x3 + … + wnxn

If F(X) > T predict heart disease, otherwise predict no heart disease.
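The linear rule above can be written out as a short Python sketch. The weights, feature values, and threshold below are made up for illustration and have no clinical meaning. A categorical feature such as blood type would need a numeric encoding before it could enter the sum, so it is left out here.

```python
def linear_score(weights, features):
    """F(X) = w1*x1 + w2*x2 + ... + wn*xn."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def predict(weights, features, threshold):
    """Predict 'heart disease' when F(X) > T, otherwise 'no heart disease'."""
    if linear_score(weights, features) > threshold:
        return "heart disease"
    return "no heart disease"


# Hypothetical patient: temperature, blood pressure, age (made-up values).
weights = [0.5, 0.02, 0.01]
patient = [38.0, 140.0, 60.0]
print(predict(weights, patient, threshold=20.0))  # prints heart disease
```

Learning, in this representation, amounts to choosing the weights w1 … wn (and possibly T) from the recorded examples.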

The Representation of the Target Knowledge

There are many possibilities:

• The class of functions is very expressive. You can represent almost any function, but to be effective the method needs lots of examples.

• The class of functions is very limited. You don’t need many examples, but the class may fail to contain the true target function.
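A toy sketch of this trade-off, under assumptions chosen purely for illustration: the true target is the XOR of two bits, the expressive class is a lookup table (it can represent any function but only knows what it has seen), and the limited class predicts from a single feature, possibly flipped. None of these names come from the slides.

```python
from itertools import product

ALL_INPUTS = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)


def target(x):
    """Assumed true target function: XOR of the two bits."""
    return x[0] ^ x[1]


def fit_lookup_table(examples, default=0):
    """Expressive class: memorize every example, guess `default` otherwise."""
    table = dict(examples)
    return lambda x: table.get(x, default)


def fit_single_feature(examples):
    """Limited class: predict one feature, optionally flipped."""
    candidates = [(i, flip) for i in (0, 1) for flip in (0, 1)]

    def errors(candidate):
        i, flip = candidate
        return sum((x[i] ^ flip) != y for x, y in examples)

    i, flip = min(candidates, key=errors)
    return lambda x: x[i] ^ flip


def accuracy(f):
    return sum(f(x) == target(x) for x in ALL_INPUTS) / len(ALL_INPUTS)


full = [(x, target(x)) for x in ALL_INPUTS]
partial = full[:2]  # only half of the input space observed

print(accuracy(fit_lookup_table(full)))     # 1.0: expressive, all examples seen
print(accuracy(fit_lookup_table(partial)))  # 0.75: expressive, too few examples
print(accuracy(fit_single_feature(full)))   # 0.5: class cannot contain XOR
```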

The Learning Mechanism 1

Machine learning algorithms abound:
• Decision Trees
• Rule-based systems
• Neural networks
• Nearest-neighbor
• Support-Vector Machines
• Bayesian Methods

Important characteristics of the learning mechanism:
• What is the class of functions
• How do you search over the class of functions

The Learning Mechanism 2

Example:

Look over the space of all possible decision trees. Prefer small trees to large trees.

[Figure: two decision trees, one labeled "Higher score" and one labeled "Lower score"]
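One way to express a preference for small trees is a score that rewards accuracy and penalizes size. The sketch below is an assumption made for illustration: the slides do not give a scoring formula, and the tuple representation of trees and the `size_penalty` parameter are invented here.

```python
def predict(tree, x):
    """A tree is either a class label (leaf) or a tuple
    (feature_index, subtree_if_0, subtree_if_1) over binary features."""
    while isinstance(tree, tuple):
        feature, if_zero, if_one = tree
        tree = if_one if x[feature] else if_zero
    return tree


def size(tree):
    """Number of nodes (internal nodes plus leaves)."""
    if not isinstance(tree, tuple):
        return 1
    _, if_zero, if_one = tree
    return 1 + size(if_zero) + size(if_one)


def score(tree, examples, size_penalty=0.01):
    """Assumed scoring rule: training accuracy minus a penalty per node."""
    correct = sum(predict(tree, x) == y for x, y in examples)
    return correct / len(examples) - size_penalty * size(tree)


# Toy data in which the class equals the first feature.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]

small = (0, 0, 1)                      # split on feature 0 only
large = (1, (0, 0, 1), (0, 0, 1))      # needless extra split on feature 1

best = max([small, large], key=lambda t: score(t, examples))
print(best == small)  # prints True: both fit the data, the smaller tree wins
```

Enumerating and scoring every possible tree is impractical for realistic data, so practical learners search this space heuristically.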

What is Machine Learning?

• Where does machine learning fit in computer science?

• What is machine learning?

• Where can machine learning be applied?

• Should I care about machine learning at all?

Application 1

Automatic car drive (ALVINN 1989)

Train a computer-controlled vehicle to steer correctly when driving on a variety of road types.

[Diagram: computer (learning algorithm) choosing among three classes]

class 1: steer to the left
class 2: steer to the right
class 3: continue straight

Application 1

Automatic Car Drive

Class of Tasks: Learning to drive on highways from stereo vision.

Knowledge: Images and steering commands recorded while observing a human driver.

Performance Module: Accuracy in classification

Application 2

Learning to classify astronomical structures.

[Figure labels: galaxy, stars, unknown]

Features:
o Color
o Size
o Mass
o Temperature
o Luminosity

Application 2

Classifying Astronomical Objects

Class of Tasks: Learning to classify new objects.

Knowledge: Database of images with correct classification.

Performance Module: Accuracy in classification
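As one possible illustration of classifying a new object from a database of correctly classified ones, the sketch below uses a 1-nearest-neighbor rule, one of the algorithm families listed earlier in the slides. The slides do not say which algorithm was used for this application; the two numeric features and all values are made up.

```python
import math


def nearest_neighbor(database, query):
    """Return the class of the stored example closest to `query`
    (Euclidean distance over the feature vector)."""
    if not database:
        raise ValueError("database must contain at least one example")
    _, label = min(database, key=lambda example: math.dist(example[0], query))
    return label


# Hypothetical database: (color index, apparent size) -> class.
database = [
    ((0.2, 1.0), "star"),
    ((0.3, 1.2), "star"),
    ((0.9, 8.0), "galaxy"),
    ((1.1, 9.5), "galaxy"),
]

print(nearest_neighbor(database, (1.0, 7.0)))   # prints galaxy
print(nearest_neighbor(database, (0.25, 1.1)))  # prints star
```

In practice the features would usually be rescaled first, so that a feature with a large numeric range does not dominate the distance.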

Other Applications

• Bio-Technology
    • Protein Folding Prediction
    • Micro-array gene expression

• Computer Systems Performance Prediction

• Banking Applications
    • Credit Applications
    • Fraud Detection

• Character Recognition (US Postal Service)

• Web Applications
    • Document Classification
    • Learning User Preferences

What is Machine Learning?

• Where does machine learning fit in computer science?

• What is machine learning?

• Where can machine learning be applied?

• Should I care about machine learning at all?

Should I care about Machine Learning at all?

Yes, you should!

Machine learning is becoming increasingly popular and has become a cornerstone in many industrial applications.

Machine learning provides algorithms for data mining, where the goal is to extract useful pieces of information (i.e., patterns) from large databases.

The computer industry is heading towards systems that will be able to adapt and heal themselves automatically.

The electronic game industry is now focusing on games where characters adapt and learn through time.

NASA is interested in robots able to adapt to any environment automatically.

Summary

Machine learning is the study of how to make computers learn.

A learning algorithm needs the following elements: class of tasks, performance metric, and body of experience.

Designing a learning algorithm requires defining the knowledge to learn, the representation of the target knowledge, and the learning mechanism.

Machine learning has many successful applications and is becoming increasingly important in science and industry.