DATA MINING AND MACHINE LEARNING
Lecture 1: Introduction to machine learning
Lecturer: Simone Scardapane
Academic Year 2016/2017
Table of contents
About the course
  Materials and table of contents

What is machine learning
  Top-down programming vs. machine learning
  Basic concepts
  Some bits of history
Organization
The course is organized in 52 hours, 2/3 theoretical and 1/3 practical. Slides and lab sessions will be self-contained, and will be provided along with a selection of further reading material at:
http://ispac.diet.uniroma1.it/scardapane/
Main reading book:
1. The Elements of Statistical Learning [Hastie, Tibshirani & Friedman], available online.
Additional books:
2. Introduction to Machine Learning [unpublished, Smola & Vishwanathan], available online.
3. Deep Learning [Goodfellow, Bengio & Courville], available in HTML form.
Lab sessions
4 lab sessions will be organized in the Python programming language. A basic knowledge of the language is required. In order to have a working scientific environment, it is recommended to install a scientific Python distribution such as Anaconda:
https://www.continuum.io/downloads
Tentative table of contents
- Introduction and basics of optimization [5 h].
- Linear models [3 h].
- Regularization and loss functions [2 h].
- Data preprocessing, model evaluation and fine-tuning [2 h].
- Neural networks and deep learning (Dr. Elisa Ricci) [5 h].
- Kernel methods [2 h].
- Ensemble learning [2 h].
- Clustering [3 h].
- Additional topics and seminars [4 h].
An XKCD joke
The alt-text reads: “In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they’d have the problem solved by the end of the summer. Half a century later, we’re still working on it.”
What is simple to program?
Figure 1: Taken from “Two big challenges in machine learning”, by Léon Bottou, ICML 2015.
What is simple to program? (2)
Navigating a labyrinth (or finding whether a path exists) is simple for a programmer to implement. The problem is well-defined, and there is a clear way to represent the data structures. By contrast, humans can find this task tedious and not obvious if the labyrinth is huge.
Recognizing the mouse (or the cheese) is extremely intuitive for a human, irrespective of the size of the image, but very hard to program on a computer. This is because there are countless possible configurations of pixels giving rise to the concepts of ‘mouse’/‘cheese’.
Some additional examples
Other situations where some ‘heuristic’ reasoning is needed to design a program:
1. Filtering an email into a spam folder: is it about the occurrence of some words? Which words? Should we care about sentence structure?
2. For a bank, deciding whether a client will default on their loan given their history and demographic details.
3. Classifying a patient as ill or not-ill given their medical records.
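As a taste of the alternative developed in the rest of the lecture, here is a minimal sketch of how a spam filter can be learned from labeled examples instead of hand-coded word rules. The toy emails and labels are invented for illustration, and scikit-learn is assumed to be available:

```python
# Minimal sketch: learning a spam filter from examples instead of
# hand-coding word heuristics (toy data invented for illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",       # spam
    "claim your free money",      # spam
    "meeting agenda for monday",  # ham
    "lecture notes attached",     # ham
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

# Turn each email into a vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# The classifier infers which words are indicative of spam from the examples.
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["free prize money"])))
```

Note that nobody had to decide which words matter: the word statistics are estimated from the examples themselves.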
An alternative approach
In all previous cases, we generally have a long history of “examples” in some database, such as photos of mice, spam emails, ill patients... As a matter of fact, it is reasonable to assume that a bank will likely make a decision based on past interactions with similar clients.
The motivating question for this course then becomes:
How can we ‘learn’ from such data?
Standard programming: a schema
[Diagram: the data and a program are given to the programming interface, which produces the output.]
What we would like to have
[Diagram: old data and new data are given; a learning program, inferred from the old data, processes the new data through the programming interface to produce the output.]
Characteristics of a learning algorithm
In principle, we would like some sort of ‘universal’ learning algorithm, which should be able to work irrespective of the data domain. From a theoretical point of view, the impossibility of having such an algorithm in the absence of any assumption is formalized in a set of ‘no-free-lunch’ theorems [1].
More practically, specific algorithms have vastly different trade-offs in terms of what type of data they can handle, their expressive power, computational cost, comprehensibility, and so on. This is why ML is an incredibly vast world with hundreds of tools at your disposal.
[1] Wolpert, D.H., 1996. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7), pp. 1341-1390.
A formal definition of ML
The following classical definition was provided by Tom Mitchell:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
— Machine Learning, 1997
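The definition can be made concrete in a short sketch (using scikit-learn’s bundled digits dataset; the train/test split and sample sizes are chosen arbitrarily for illustration): the task T is digit classification, the performance P is test accuracy, and the experience E is a growing set of training examples.

```python
# Illustration of Mitchell's definition: performance P (accuracy) at
# task T (classifying digits) improves with experience E (training examples).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

for n in [50, 200, 1000]:  # growing experience E
    clf = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(n, round(clf.score(X_test, y_test), 2))  # accuracy tends to grow with n
```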
Categorization of problems
The problems we described before belong to the subfield of supervised learning: learning a relation from a set of (labeled) examples, which are akin to a teacher signal. This will be the main topic of this course.
If we do not have an explicit label, we have the so-called unsupervised learning, which in itself contains a large set of possible problems: dimensionality reduction, clustering, 2D visualization, etc. These are mostly concerned with hypotheses and modeling with respect to the structure of the data.
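The two settings can be contrasted in a short sketch (toy 1-D data invented for illustration; scikit-learn assumed available): with labels we fit a predictive relation, without labels we can only hypothesize structure, such as clusters.

```python
# Sketch of the supervised / unsupervised distinction on toy 1-D data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

# Supervised: each example comes with a label y (the "teacher signal").
y = np.array([2.0, 4.0, 6.0, 20.0, 22.0, 24.0])  # here, exactly y = 2x
model = LinearRegression().fit(X, y)
print(model.predict([[4.0]]))  # approximately [8.]

# Unsupervised: no labels; we only hypothesize structure (here, 2 clusters).
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters)  # cluster ids are arbitrary; only the grouping matters
```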
Categorization of problems (2)
Reinforcement learning is a more advanced subfield of ML that considers the learning capabilities of an agent which can move in an unstructured environment and only receives a partial ‘reward’ signal at given instants (e.g., an agent learning to play tic-tac-toe).
Some problems do not perfectly fit in this standard categorization, most notably recommender and ranking systems. Recently, Yann LeCun (Director of AI Research @ Facebook) proposed adding predictive learning to this standard categorization, i.e., the capability of an agent of entirely predicting the state of the world (and its evolution) from data.
Practical synonyms of ML
In practice, all of the following terms can be considered akin to, or highly overlapping with, ML (this slide is open to many debates):
- Pattern recognition: sometimes this term refers to classification only, which is a specific problem in supervised learning.
- Data mining (more focus on the exploration of data). Data mining is sometimes referred to as ‘practical ML’.
- Predictive analytics (focus on predictive modeling).
- Knowledge discovery (common in the databases literature).
- Inferential statistics (as opposed to descriptive statistics).
The main elements in a ML program
Despite the variety of algorithms and approaches to ML, most methods can be understood as a varying combination of the following three items:
- Model: how we represent our knowledge (polynomials, trees, graphs, ...).
- Evaluation (performance measure P): how we evaluate the results of our learning model. Depending on the measure we choose, we can achieve extremely different results even with the same model formulation.
- Optimization: the algorithm we use to find a model that maximizes P. Many problems in ML are NP-hard, so we need efficient heuristic procedures to handle current big data problems.
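These three ingredients can be made explicit on a toy problem (all numbers invented for illustration): the model is a line y = wx, the evaluation is the mean squared error, and the optimization is plain gradient descent on w.

```python
# The three ingredients made explicit on a toy regression problem:
# model = a line y = w*x, evaluation = mean squared error,
# optimization = gradient descent on the single parameter w.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])  # roughly y = 2x

def mse(w):
    # Evaluation: how badly the model with parameter w fits the data.
    return np.mean((w * X - y) ** 2)

w = 0.0    # model parameter, initialized arbitrarily
lr = 0.01  # learning rate (step size)
for _ in range(200):
    # Optimization: follow the negative gradient of the evaluation measure.
    grad = np.mean(2 * (w * X - y) * X)
    w -= lr * grad

print(round(w, 2))  # close to 2.0
```

Changing any one of the three items (e.g., swapping the squared error for an absolute error, or the line for a polynomial) generally yields a different learning method.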
ML today
A completely arbitrary timeline
1952: First checkers program (Samuels)
1957: Perceptron (Rosenblatt)
1967: k-NN formalization
1969: Perceptrons [Book]
1979: Decision Trees
1980s: Expert systems
1986: Backpropagation (?)
1990s: SVMs (Vapnik & coll.)
1994: PAC theory
1998: Convolutional NN (LeCun)
2006: First ‘deep learning’ paper (Hinton)
2012: AlexNet
ML today is everywhere
Reason 1: data
The first main reason is the huge availability of data:
There is enough data in a day of tweets to possibly recreate the English language from scratch [Image source].
Reason 2: computing power
Figure 2: Evolution of computing power for deep learning applications [Image source].
Reason 3: software libraries
Today, there are many mature software libraries for learning, such as scikit-learn in Python. Some of them can be easily distributed over clusters (e.g., the MLlib module in Spark).
Additionally, there are new automatic differentiation tools making deep learning easily accessible, such as TensorFlow and Chainer. Models and algorithms built on these frameworks are commonly released as open source, increasing the speed of research even further.
Many ready-to-use services are also available over the web, known as ‘cognitive services’, launching the era of ML-as-a-service.
Crowd ML
Figure 3: Competition platforms such as Kaggle allow the user to compete in real-world (or very plausible) scenarios, and to explore strategies from other users.
An initial word of caution... Can you find ML?
Further readings
The following is a selection of reading material related to this lecture:
[1] Domingos, P., 2012. A few useful things to know about machine learning. Communications of the ACM, 55(10), pp. 78-87.
[2] Jordan, M.I. and Mitchell, T.M., 2015. Machine learning: Trends, perspectives, and prospects. Science, 349(6245), pp. 255-260.
[3] LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp. 436-444.