DATA MINING AND MACHINE LEARNING Lecture 1: Introduction to machine learning Lecturer: Simone Scardapane Academic Year 2016/2017

Table of contents

About the course
    Materials and table of contents

What is machine learning
    Top-down programming vs. machine learning
    Basic concepts
    Some bits of history

Organization

The course is organized in 52 hours, 2/3 theoretical and 1/3 practical. Slides and lab sessions will be self-contained, and will be provided along with a selection of further reading material at:

http://ispac.diet.uniroma1.it/scardapane/

Main reading book:

1. The Elements of Statistical Learning [Hastie, Tibshirani & Friedman], available online.

Additional books:

2. Introduction to Machine Learning [unpublished, Smola & Vishwanathan], available online.

3. Deep Learning [Goodfellow, Bengio & Courville], available in HTML form.

Lab sessions

Four lab sessions will be held in the Python programming language. A basic knowledge of the language is required. To have a working scientific environment, it is recommended to install a scientific Python distribution such as Anaconda:

https://www.continuum.io/downloads

Tentative table of contents

- Introduction and basics of optimization [5 h].

- Linear models [3 h].

- Regularization and loss functions [2 h].

- Data preprocessing, model evaluation and fine-tuning [2 h].

- Neural networks and deep learning (Dr. Elisa Ricci) [5 h].

- Kernel methods [2 h].

- Ensemble learning [2 h].

- Clustering [3 h].

- Additional topics and seminars [4 h].

An XKCD joke

The alt-text reads: “In the 60s, Marvin Minsky assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they’d have the problem solved by the end of the summer. Half a century later, we’re still working on it.”

What is simple to program?

Figure 1: Taken from “Two big challenges in machine learning”, by Leon Bottou, ICML 2015.

What is simple to program? (2)

Navigating a labyrinth (or finding whether a path exists) is simple for a programmer to implement. The problem is well-defined, and there is a clear way to represent the data structures. Humans, on the other hand, can find this task tedious and far from obvious if the labyrinth is huge.
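To make the first point concrete, here is a minimal sketch of a path-existence check using breadth-first search. The grid encoding (strings where '#' is a wall), the function name path_exists and the sample labyrinth are assumptions made for this illustration; they do not come from the lecture.

```python
from collections import deque

def path_exists(grid, start, goal):
    """Breadth-first search on a grid; '#' marks a wall, anything else is free."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    visited = {start}
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        # Try the four neighbouring cells (down, up, right, left).
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append((nr, nc))
    return False

# Hypothetical labyrinth: the goal is reachable only by going around the walls.
LABYRINTH = [
    "..#.",
    ".##.",
    "....",
]
print(path_exists(LABYRINTH, (0, 0), (0, 3)))
```

The whole problem fits in a queue and a set of visited cells, which is what makes it easy to program even when the labyrinth is large.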

Recognizing the mouse (or the cheese) is extremely intuitive for a human, irrespective of the size of the image, but very hard to program in a computer. This is because there are countless possible configurations of pixels giving rise to the concepts of a ‘mouse’ or ‘cheese’.

Some additional examples

Other situations where some ‘heuristic’ reasoning is needed to design a program:

1. Filtering an email into a spam folder: is it about the occurrence of some words? Which words? Should we care about sentence structure?

2. For a bank, deciding whether a client will default on their loan given their history and demographic details.

3. Classifying a patient as ill or not ill given their medical records.

An alternative approach

In all previous cases, we generally have a long history of “examples” in some database, such as photos of mice, spam emails, ill patients... Indeed, it is reasonable to assume that a bank will make a decision based on past interactions with similar clients.

The motivating question for this course then becomes:

How can we ‘learn’ from such data?
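As a rough illustration of what ‘learning’ from past examples can look like, the sketch below derives a spam rule from a handful of labeled emails instead of hard-coding a word list. The emails, the scoring scheme (spam count minus non-spam count per word) and the function names are all assumptions made up for this example; real spam filters are considerably more elaborate.

```python
from collections import Counter

# Hypothetical "old data": (email text, is_spam) pairs collected in the past.
OLD_EMAILS = [
    ("win a free prize now", True),
    ("free money click now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday with the team", False),
]

def learn_word_scores(examples):
    """Score each word by (occurrences in spam) - (occurrences in non-spam)."""
    scores = Counter()
    for text, is_spam in examples:
        for word in text.split():
            scores[word] += 1 if is_spam else -1
    return scores

def predict_spam(scores, text):
    """Flag an email as spam when its words lean towards the spam side."""
    # Words never seen in the old data contribute a score of 0.
    return sum(scores[word] for word in text.split()) > 0

scores = learn_word_scores(OLD_EMAILS)
print(predict_spam(scores, "claim your free prize"))      # new, unseen email
print(predict_spam(scores, "agenda for the team lunch"))  # new, unseen email
```

Nobody wrote down which words matter: the rule changes automatically if the old data changes.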

Standard programming: a schema

[Diagram with the elements: programming interface, data, program; labels: given, output.]

What we would like to have

[Diagram with the elements: programming interface, new data, learning program, old data; labels: given, output.]

Characteristics of a learning algorithm

In principle, we would like some sort of ‘universal’ learning algorithm, which should be able to work irrespective of the data domain. From a theoretical point of view, the impossibility of having such an algorithm in the absence of any assumption is formalized in a set of ‘no-free-lunch’ theorems [1].

More practically, specific algorithms have vastly different trade-offs in terms of what type of data they can handle, their expressive power, computational cost, comprehensibility, and so on. This is why ML is an incredibly vast world with hundreds of tools at your disposal.

[1] Wolpert, D.H., 1996. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7), pp. 1341-1390.

A formal definition of ML

The following classical definition was provided by Tom Mitchell:

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

— Machine Learning, 1997
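The three ingredients of the definition can be spelled out on a toy problem. In the sketch below, the task T is labeling integers as ‘large’ or not, the experience E is a growing list of labeled examples, and the performance measure P is accuracy on a fixed test set. The data, the ground-truth rule (x > 50) and the midpoint-threshold learner are assumptions chosen only to keep the example small and deterministic.

```python
# Task T: label an integer x as 1 when it is "large", 0 otherwise.
# Hypothetical ground truth, used only to score the learner: x > 50.
TEST_SET = [(x, int(x > 50)) for x in range(5, 100, 10)]

# Experience E: labeled examples, revealed to the learner one at a time.
EXPERIENCE = [(0, 0), (60, 1), (20, 0), (48, 0)]

def fit_threshold(examples):
    """Put the threshold halfway between the largest 0 and the smallest 1."""
    largest_zero = max(x for x, y in examples if y == 0)
    smallest_one = min(x for x, y in examples if y == 1)
    return (largest_zero + smallest_one) / 2

def accuracy(threshold, test_set):
    """Performance measure P: fraction of correctly labeled test points."""
    hits = sum(int(x > threshold) == y for x, y in test_set)
    return hits / len(test_set)

# Refit the learner on the first 2, 3 and 4 examples and record P each time.
accuracies = []
for n in range(2, len(EXPERIENCE) + 1):
    threshold = fit_threshold(EXPERIENCE[:n])
    accuracies.append(accuracy(threshold, TEST_SET))
print(accuracies)  # [0.8, 0.9, 1.0]
```

On this particular data the accuracy grows with every new example; in general, improvement with experience is what the definition asks for, not something any single learner guarantees at each step.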

Categorization of problems

The problems we described before belong to the subfield of supervised learning: learning a relation from a set of (labeled) examples, which are akin to a teacher signal. This will be the main topic of this course.

If we do not have an explicit label, we have the so-called unsupervised learning, which in itself contains a large set of possible problems: dimensionality reduction, clustering, 2D visualization, etc. These are mostly concerned with hypotheses and modeling with respect to the structure of the data.
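The difference between the two settings can be seen on the same toy data. In this sketch, the supervised routine uses the labels to compute one mean per class, while the unsupervised routine (k-means with k = 2) has to discover the two groups from the points alone. The data, the function names and the initialization from the two extreme points are assumptions for illustration only.

```python
POINTS = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
LABELS = [0, 0, 0, 1, 1, 1]  # available only in the supervised setting

def mean(values):
    return sum(values) / len(values)

def class_means(points, labels):
    """Supervised: use the labels to compute one mean per class."""
    return {
        c: mean([p for p, l in zip(points, labels) if l == c])
        for c in sorted(set(labels))
    }

def two_means(points, iterations=10):
    """Unsupervised: k-means with k = 2, started from the extreme points."""
    centers = [min(points), max(points)]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Keep the old center if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

print(class_means(POINTS, LABELS))  # uses the teacher signal
print(two_means(POINTS))            # uses only the structure of the data
```

Here both approaches end up with the same two centers because the groups are well separated; with overlapping groups, the clusters found without labels need not match the classes.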

Categorization of problems (2)

Reinforcement learning is a more advanced subfield of ML that considers the learning capabilities of an agent that can move in an unstructured environment and only receives a partial ‘reward’ signal at given instants (e.g., an agent learning to play tic-tac-toe).

Some problems do not perfectly fit in this standard categorization, most notably recommender and ranking systems. Recently, Yann LeCun (Director of AI Research @ Facebook) proposed to add predictive learning to this standard categorization, i.e., the capability of an agent to predict the entire state of the world (and its evolution) from data.

Practical synonyms of ML

In practice, all these terms can be considered akin to or highly overlapping with ML (this slide is open to many debates):

- Pattern recognition: sometimes this term refers to classification only, which is a specific problem in supervised learning.

- Data mining (more focus on exploration of data). Data mining is sometimes referred to as ‘practical ML’.

- Predictive analytics (focus on predictive modeling).

- Knowledge discovery (common in the databases literature).

- Inferential statistics (as opposed to descriptive statistics).

The main elements in a ML program

Despite the variety of algorithms and approaches to ML, most methods can be understood as a varying combination of the following three items:

- Model: how we represent our knowledge (polynomials, trees, graphs, ...).

- Evaluation (performance measure P): how we evaluate the results of our learning model. Depending on the measure we choose, we can achieve extremely different results even with the same model formulation.

- Optimization: the algorithm we use to find a model that maximizes P. Many problems in ML are NP-hard, so we need efficient heuristic procedures to handle current big data problems.
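The three items can be made explicit in a few lines of code. The sketch below uses a straight line as the model, the mean squared error as the evaluation, and plain gradient descent as the optimization. Minimizing an error plays the same role as maximizing P (take P as the negative error). The data, the learning rate and the number of steps are assumptions picked for this toy problem, not recommendations.

```python
# Hypothetical data generated by y = 2x + 1 (no noise).
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.0, 3.0, 5.0, 7.0, 9.0]

def model(w, b, x):
    """Model: a straight line with slope w and intercept b."""
    return w * x + b

def mean_squared_error(w, b):
    """Evaluation: average squared error on the data (lower is better)."""
    return sum((model(w, b, x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)

def gradient_descent(steps=2000, learning_rate=0.05):
    """Optimization: repeatedly step against the gradient of the error."""
    w, b = 0.0, 0.0
    n = len(XS)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (model(w, b, x) - y) * x for x, y in zip(XS, YS)) / n
        grad_b = sum(2 * (model(w, b, x) - y) for x, y in zip(XS, YS)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = gradient_descent()
print(round(w, 3), round(b, 3), mean_squared_error(w, b))
```

Swapping any one of the three pieces (a polynomial instead of a line, absolute error instead of squared error, a different optimizer) gives a different learning method while keeping the same overall structure.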

ML today

A completely arbitrary timeline

1952: First checkers program (Samuel)

1957: Perceptron (Rosenblatt)

1967: k-NN formalization

1969: Perceptrons [Book]

1979: Decision Trees

1980s: Expert systems

1986: Backpropagation (?)

1990s: SVMs (Vapnik & coll.)

1994: PAC theory

1998: Convolutional NN (LeCun)

2006: First ‘deep learning’ paper (Hinton)

2012: AlexNet

ML today is everywhere

Reason 1: data

The first main reason is the huge availability of data:

There is enough data in a day of tweets to possibly recreate the English language from scratch [Image source].

Reason 2: computing power

Figure 2: Evolution of computing power for deep learning applications [Image source].

Reason 3: software libraries

Today, there are many mature software libraries for learning, such as scikit-learn in Python. Some of them can be easily distributed over clusters (e.g., the MLlib module in Spark).

Additionally, there are new automatic differentiation tools that make deep learning much more accessible, such as TensorFlow and Chainer. Models and algorithms built on this software are commonly released as open source, increasing the speed of research even further.

Finally, many ready-to-use services are available over the web, also known as ‘cognitive services’, launching the era of ML-as-a-service.

Crowd ML

Figure 3: Competition platforms such as Kaggle allow users to compete in real-world (or very plausible) scenarios, and to explore strategies from other users.

An initial word of caution... Can you find ML?

Further readings

The following is a selection of reading material related to this lecture:

[1] Domingos, P., 2012. A few useful things to know about machine learning. Communications of the ACM, 55(10), pp. 78-87.

[2] Jordan, M.I. and Mitchell, T.M., 2015. Machine learning: Trends, perspectives, and prospects. Science, 349(6245), pp. 255-260.

[3] LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp. 436-444.