1st Red ProTIC School (1er. Escuela Red ProTIC), Tandil, April 18-28, 2006
Introduction to Machine Learning
Alejandro Ceccatto
Instituto de Física Rosario CONICET-UNR
Bibliography
Machine Learning, Tom Mitchell (McGraw Hill, 1997)
Principal Component Analysis, Ian Jolliffe (Springer-Verlag, 2002)
An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cristianini and Shawe-Taylor (Cambridge University Press, 2000)
The Elements of Statistical Learning, Hastie-Tibshirani-Friedman (Springer, 2001)
Machine Learning
• The field of Machine Learning is concerned with the question of how to construct computer programs that automatically improve with experience
• The purpose of this course is to present key algorithms and theory that form the core of Machine Learning
Machine Learning
• Interdisciplinary nature of the material:
Statistics, Artificial Intelligence, Information Theory, etc.
• Basic question:
How to program computers to learn?
Machine Learning
Intelligent Data Analysis:
• Intelligent application of data analytic tools (Statistics)
• Application of “intelligent” data analytic tools (Machine Learning)
Modern world: Data-driven world (industrial, commercial, financial, scientific activities)
Why Machine Learning?
• Recent progress in algorithms and theory
• Growing flood of online data
• Computational power available
Why Machine Learning?
• Niches for Machine Learning:
– Data Mining: using historical data to improve decisions
    Medical records → medical knowledge
– Software applications we can’t program by hand
    Autonomous driving
    Speech recognition
– Self-customizing programs
    Newsreader that learns user interests
Why Machine Learning?
• Data Mining
– Data: Recorded facts
– Information: Set of patterns, or expectations, that underlie the data
– Data Mining: Extraction of implicit, previously unknown, and potentially useful information from data
– Machine Learning: Provides the technical basis of data mining
Why Machine Learning?
• Typical Data Mining Tasks
– Risk of Emergency Cesarean Section
Given
• 9714 patient records, each describing a pregnancy and birth
• Each patient record contains 215 features
Learn to predict:
• Classes of patients at high risk for emergency cesarean section
Why Machine Learning?
One of the learned rules:
IF No previous vaginal delivery, and Abnormal 2nd Trimester Ultrasound, and Malpresentation at admission
THEN Probability of Emergency C-Section is 0.6
Over training data: 26/41 = 0.63
Over test data: 12/20 = 0.60
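To make the rule's accuracy figures concrete, the following minimal Python sketch applies the rule to a list of patient records and reports the fraction of covered patients who had an emergency C-section. The `Patient` fields, the `rule_fires` and `rule_estimate` helpers, and the toy records are hypothetical, chosen only so that the toy set reproduces the 12/20 test-set count quoted above; they are not the original data or the method used to learn the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical field names; the real records contain 215 features.
    previous_vaginal_delivery: bool
    abnormal_2nd_trimester_ultrasound: bool
    malpresentation_at_admission: bool
    emergency_c_section: bool  # the outcome to be predicted


def rule_fires(p: Patient) -> bool:
    """True when all three conditions of the IF part hold."""
    return (
        not p.previous_vaginal_delivery
        and p.abnormal_2nd_trimester_ultrasound
        and p.malpresentation_at_admission
    )


def rule_estimate(records):
    """Return (positives, covered, fraction) over the records the rule covers."""
    covered = [p for p in records if rule_fires(p)]
    positives = sum(p.emergency_c_section for p in covered)
    fraction = positives / len(covered) if covered else float("nan")
    return positives, len(covered), fraction


# Synthetic toy "test set": 20 covered patients (12 positive, 8 negative)
# plus 5 patients the rule does not cover. Not real data.
toy_test_set = (
    [Patient(False, True, True, True)] * 12
    + [Patient(False, True, True, False)] * 8
    + [Patient(True, True, True, True)] * 5
)

positives, covered, fraction = rule_estimate(toy_test_set)
print(f"{positives}/{covered} = {fraction:.2f}")  # prints "12/20 = 0.60"
```

Running the same function on a separate training set and comparing the two fractions is the usual check that a learned rule generalizes rather than merely fitting the training records.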
Why Machine Learning?
– Credit Risk Analysis
Why Machine Learning?
– Customer Retention
Why Machine Learning?
– Problems Too Difficult to Program by Hand
Why Machine Learning?
– Software that Customizes to User
Where is This Headed?
Today: tip of the iceberg
• First-generation algorithms: neural nets, decision trees, regression, ...
• Applied to well-formatted databases
Tomorrow: enormous impact
• Learn across mixed-media data and multiple databases
• Learn by active experimentation
• Learn decisions rather than predictions
• Cumulative, life-long learning
Where is This Headed?
Autonomous entities?
“I'm sorry, Dave. I'm afraid I can't do that.” (HAL 9000 in 2001: A Space Odyssey, by Arthur C. Clarke)