1 data mining & machine learning introduction intelligent systems lab. soongsil university...

86
1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin, Isabelle Guyon

Upload: georgia-mcdaniel

Post on 29-Dec-2015

222 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

1

Data Mining & Machine LearningIntroduction

Intelligent Systems Lab.

Soongsil University

Thanks to Raymond J. Mooney in the University of Texas at Austin, Isabelle Guyon

Page 2: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

2

Artificial Intelligence (AI): Research Areas

ArtificialIntelligence

Research

Rationalism (Logical)Empiricism (Statistical)Connectionism (Neural)Evolutionary (Genetic)Biological (Molecular)

Paradigm

Application

Intelligent AgentsInformation RetrievalElectronic CommerceData MiningBioinformaticsNatural Language Proc.Expert Systems

Learning AlgorithmsInference MechanismsKnowledge RepresentationIntelligent System Architecture

Page 3: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

3

Artificial Intelligence (AI): Paradigms

Symbolic AI Rule-Based Systems

Connectionist AI Neural Networks

Evolutionary AI Genetic Algorithms

Molecular AI: DNA Computing

Page 4: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

4

What is Machine Learning?

Learning

algorithm

TRAININGDATA Answer

Trained

machine

Query

Page 5: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

5

Definition of learning

ProgramLearning Program

LearnedProgram

Experience, E Task, T

Performance, P

Task

Performance

• Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E

Page 6: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

6

What is Learning?

• Herbert Simon: “Learning is any process by which a system improves performance from experience.”

Page 7: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

7

Machine Learning

• Supervised Learning– Estimate an unknown mapping from known input- output pairs

– Learn fw from training set D={(x,y)} s.t.

– Classification: y is discrete– Regression: y is continuous

• Unsupervised Learning– Only input values are provided

– Learn fw from D={(x)} s.t.

– Clustering

)()( xxw fyf

xxw )(f

Page 8: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

8

Why Machine Learning?

• Recent progress in algorithms and theory• Growing flood of online data• Computational power is available• Knowledge engineering bottleneck. Develop systems

that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task

• Budding industry

Page 9: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

9

Niches using machine learning

• Data mining from large databases.– Market basket analysis (e.g. diapers and beer)– Medical records → medical knowledge

• Software applications we can’t program by hand– Autonomous driving– Speech recognition

• Self customizing programs to individual users.– Spam mail filter– Personalized tutoring– Newsreader that learns user interests

Page 10: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

10

Trends leading to Data Flood

• More data is generated:– Bank, telecom, other

business transactions ...

– Scientific data: astronomy, biology, etc

– Web, text, and e-commerce

Page 11: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

11

Big Data Examples

• Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session – storage and analysis a big problem

• AT&T handles billions of calls per day– so much data, it cannot be all stored -- analysis has to

be done “on the fly”, on streaming data

Page 12: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

12

Largest databases in 2007

• Commercial databases:– AT&T: 312 TB– World Data Centre for Climate: 220 TB– YouTube: 45TB of videos – Amazon: 42 TB (250,000 full textbooks)– Central Intelligence Agency (CIA): ?

Page 13: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

13

Data Growth

In 2 years, the size of the largest database TRIPLED!

Page 14: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

14

Machine Learning / Data Mining Application areas

• Science– astronomy, bioinformatics, drug discovery, …

• Business– CRM (Customer Relationship management), fraud

detection, e-commerce, manufacturing, sports/entertainment, telecom, targeted marketing, health care, …

• Web: – search engines, advertising, web and text mining, …

• Government– surveillance, crime detection, profiling tax cheaters, …

Page 15: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

15

Data Mining for Customer Modeling

• Customer Tasks:– attrition prediction– targeted marketing:

• cross-sell, customer acquisition

– credit-risk– fraud detection

• Industries– banking, telecom, retail sales, …

Page 16: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

16

Customer Attrition: Case Study

• Situation: Attrition rate at for mobile phone customers is around 25-30% a year !

• With this in mind, what is our task?– Assume we have customer information for the

past N months.

Page 17: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

17

Customer Attrition: Case Study

Task:

• Predict who is likely to attrite next month.

• Estimate customer value and what is the cost-effective offer to be made to this customer.

Page 18: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

18

Customer Attrition Results

• Verizon Wireless built a customer data warehouse

• Identified potential attriters

• Developed multiple, regional models

• Targeted customers with high propensity to accept the offer

• Reduced attrition rate from over 2%/month to under 1.5%/month (huge impact, with >30 M subscribers)

Page 19: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

19

Assessing Credit Risk: Case Study

• Situation: Person applies for a loan

• Task: Should a bank approve the loan?

• Note: People who have the best credit don’t need the loans, and people with worst credit are not likely to repay. Bank’s best customers are in the middle

Page 20: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

20

Credit Risk - Results

• Banks develop credit models using variety of machine learning methods.

• Mortgage and credit card proliferation are the

results of being able to successfully predict if a person is likely to default on a loan

• Widely deployed in many countries

Page 21: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

21

Successful e-commerce – Case Study

• Task: Recommend other books (products) this person is likely to buy

• Amazon does clustering based on books bought:– customers who bought “Advances in Knowledge

Discovery and Data Mining”, also bought “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations”

• Recommendation program is quite successful

Page 22: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

22

Security and Fraud Detection - Case Study

• Credit Card Fraud Detection• Detection of Money laundering

– FAIS (US Treasury)

• Securities Fraud– NASDAQ KDD system

• Phone fraud– AT&T, Bell Atlantic, British Telecom/MCI

• Bio-terrorism detection at Salt Lake Olympics 2002

Page 23: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

23

Example Problem:Handwritten Digit Recognition

• Handcrafted rules will result in large no. of rules and Exceptions

• Better to have a machine that learns from a large training set

Wide variability of same numeral

Page 24: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

24

Chess Game

– Let IBM’s stock increase

by $18 billion at that year

In 1997, Deep Blue(IBM) beat Garry Kasparov( 러 ).

Page 25: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

25

Some Successful Applications ofMachine Learning

• Learning to drive an autonomous vehicle

– Train computer-controlled vehicles

to steer correctly

– Drive at 70 mph for 90 miles on public

highways

– Associate steering commands with

image sequence

– 1200 computer-generated images as

training examples

• Half-hour training An additional information from previous image indicating the darkness or lightness of the road

Page 26: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

26

Some Successful Applications ofMachine Learning

• Learning to recognize spoken words– Speech recognition/synthesis

– Natural language understanding/generation

– Machine translation

Page 27: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

27

Example 1: visual object categorization

• A classification problem: predict category y based on image x.• Little chance to “hand-craft” a solution, without learning.• Applications: robotics, HCI, web search (a real image Google..)

Page 28: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

28

Face Recognition - 1

Given multiple angles/ views of a person, learn to identify them.

Learn to distinguishmale from female faces.

Page 29: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

29

Face Recognition - 2

Learn to recongnize emotions, gestures

Li, Ye, Kambhametta, 2003

Page 30: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

30

Robot

• Sony AIBO robot – Available on June 1, 1999 – Weight: 1.6 KG – Adaptive learning and growth capabilities – Simulate emotion such as happiness and anger

Page 31: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

31

Robot

• Honda ASIMO (Advanced Step in Innovate MObility)

– Born on 31 October, 2001

– Height: 120 CM, Weight: 52 KG

http://blog.makezine.com/archive/2009/08/asimo_avoids_moving_obstacles.html?CMP=OTC-0D6B48984890

Page 32: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

32

Biomedical / Biometrics

• Medicine:– Screening– Diagnosis and prognosis– Drug discovery

• Security:– Face recognition– Signature / fingerprint – DNA fingerprinting

Page 33: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

33

Computer / Internet

• Computer interfaces:– Troubleshooting wizards – Handwriting and speech– Brain waves

• Internet– Spam filtering– Text categorization– Text translation– Recommendation

7

Page 34: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

34

Classification

• Assign object/event to one of a given finite set of categories.– Medical diagnosis– Credit card applications or transactions– Fraud detection in e-commerce– Worm detection in network packets– Spam filtering in email– Recommended articles in a newspaper– Recommended books, movies, music, or jokes– Financial investments– DNA sequences– Spoken words– Handwritten letters– Astronomical images

Page 35: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

35

Problem Solving / Planning / Control

• Performing actions in an environment in order to achieve a goal.– Solving calculus problems– Playing checkers, chess, or backgammon– Driving a car or a jeep– Flying a plane, helicopter, or rocket– Controlling an elevator– Controlling a character in a video game– Controlling a mobile robot

Page 36: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

36

Applications

inputs

training examples

10

102

103

104

105

Bioinformatics

Ecology

OCRHWR

MarketAnalysis

TextCategorization

Machine Vision

Syst

em d

iagn

osis

10 102 103 104 105

Page 37: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

37

Disciplines Related with Machine Learning

• Artificial intelligence– 기호 표현 학습 , 탐색문제 , 문제해결 , 기존지식의

활용

• Bayesian methods– 가설 확률계산의 기초 , naïve Bayes classifier,

unobserved 변수 값 추정

• Computational complexity theory– 계산 효율 , 학습 데이터의 크기 , 오류의 수 등의

측정에 필요한 이론적 기반

• Control theory– 이미 정의된 목적을 최적화하는 제어과정과 다음 상태

예측을 학습

Page 38: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

38

• Information theory– Entropy 와 Information Content 를 측정 , Minimum

Description Length, Optimal Code 와 Optimal Training 의 관계

• Philosophy– Occam’s Razor, 일반화의 타당성 분석

• Psychology and neurobiology– Neural network models

• Statistics– 가설의 정확도 추정시 발생하는 에러의 특성화 , 신뢰구간 ,

통계적 검증

Disciplines Related with Machine Learning (2)

Page 39: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

39

Definition of learning

ProgramLearning Program

LearnedProgram

Experience, E Task, T

Performance, P

Task

Performance

• Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E

Page 40: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

40

Example: checkers

Task T: Playing checkers.

Performance measure P: % of games won.

Training experience E: Practice games by playing against itself.

Page 41: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

41

Example: Recognizing handwritten letters

Task T: Recognizing and classifying handwritten words within images.

Performance measure P: % words correctly classified.

Training experience E: A database of handwritten words with given classifications.

Page 42: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

42

Example: Robot driving

Task T: Driving on public four-lane highway using vision sensors.

Performance measure P: Average distance traveled before an error (as judged by a human overseer).

Training experience E: A sequence of images and steering commands recorded while observing a human driver.

Page 43: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

43

Designing a learning system

Task T: Playing checkers.Performance measure P: % of games won.Training experience E: Practice games by playing against itself.

– What does this mean?– and what can we learn from it?

Page 44: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

44

Measuring Performance

• Classification Accuracy• Solution correctness• Solution quality (length, efficiency)• Speed of performance

Page 45: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

45

Designing a Learning System

1. Choose the training experience2. Choose exactly what is to be learned, i.e. the target

function.3. Choose how to represent the target function.4. Choose a learning algorithm to infer the target function

from the experience.

Environment/Experience

Learner

Knowledge

PerformanceElement

Train examples…

Test examples…

Page 46: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

46

Designing a Learning System

1. Choosing the Training Experience

• Key Attributes– Direct/indirect feedback 을 제공하는가 ?

• Direct feedback: checkers states and correct move• Indirect feedback: move sequences and final outcomes of various games

– Credit assignment problem

– Degree of controlling the sequence of training example• Learner 의 자율성 , 학습 정보를 얻을 때 teacher 의

도움을 받는 정도 ,

– Distribution of examples• Train examples 의 분포와 Test examples 의 분포의 문제• 시스템의 성능을 평가하는 테스트의 예제 분포를 잘

반영해야 함• 특수한 경우의 분포가 반영 (The Checkers World Champion

이 격은 특수한 경우의 분포가 반영 될까 ?)

Page 47: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

47

Training Experience

• Direct experience: Given sample input and output pairs for a useful target function.– Checker boards labeled with the correct move,

• e.g. extracted from record of expert play

• Indirect experience: Given feedback which is not direct I/O pairs for a useful target function.– Potentially arbitrary sequences of game moves and their

final game results.– Credit/Blame Assignment Problem: How to assign credit/

blame to individual moves given only indirect feedback?

Page 48: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

48

Source of Training Data

• Provided random examples outside of the learner’s control. ( 학습자 - 알고리즘 판단 밖의 사례를 무작위 추출 )– Negative examples available or only positive?

• Good training examples selected by a “benevolent teacher.” (Teacher 로 부터 )

• Learner can construct an arbitrary example and query an oracle for its label. ( 시스템 자체에서 예제를 구축 )

– Learner can design and run experiments directly in the environment without any human guidance.

( 사람의 도움없이 시스템이 새로운 문제틀을 제시함 )

– Learner can query an oracle about class of an unlabeled example in the environment. ( 불안전한 예제의 답을 질문 , 즉 (x, ___) 입력은 있으나 결과값이 없는 경우 )

Page 49: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

49

Training vs. Test Distribution

• Generally assume that the training and test examples are independently drawn from the same overall distribution of data.– IID: Independently and identically distributed

• If examples are not independent, requires collective classification.

(e.g. communication network, financial transaction network, social network 에 대한 관계 모델구축 시 )

• If test distribution is different, requires transfer learning. that is, achieving cumulative learning

Page 50: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

50

Page 51: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

Transfer learning

51

Page 52: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

52

참고 : Transfer learning

• Transfer learning is what happens when someone finds it much easier to learn to play chess having already learned to play checkers;

or to recognize tables having already learned to recognize chairs;

or to learn Spanish having already learned Italian.

• Achieving significant levels of transfer learning across tasks -- that is, achieving cumulative learning -- is perhaps the central problem facing machine learning.

Page 53: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

53

Designing a Learning System

1. Choose the training experience

2. Choose exactly what is to be learned, i.e. the target function.

3. Choose how to represent the target function.

4. Choose a learning algorithm to infer the target function from the experience.

Environment/Experience

Learner

Knowledge

PerformanceElement

Page 54: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

54

• 어떤 지식 ( 함수 ) 을 학습시키고 , 평가 시스템에 의하여 어떻게 이용 되어 질 것인가 ?

• 가정 : “ 장기게임에서 , 현재의 장기판에서 타당한 움직임들을 생성하는 함수가 있고 , 최고의 움직임을 선택한다” .

– Could learn a function: 1. ChooseMove : B →M( 최고의 움직임 )

Or

2. Evaluation function, V : B → R * 각 보드의 위치에 따라 얼마나 유리한가에 대한 점수를 부여함 . * V 는 각각의 움직임에 대한 결과의 점수에 따라 선택하는데 이용가능 하여 * 결국은 최고 높은 점수를 얻을 수 있는 움직임을 선택하게 한다 .

2. Choosing a Target Function

Designing a Learning System

Page 55: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

55

• A function that chooses the best move M for any B– ChooseMove : B →M– Difficult to learn

• It is useful to reduce the problem of improving performance P at task T, to the problem of learning some particular target function.

• An evaluation function that assigns a numerical score to any B– V : B → R

Designing a Learning System

2. Choosing the Target Function

Page 56: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

56

The start of the learning work

Instead of learning ChooseMove we establish a value function:

target function, V : B → R

that maps any legal board state in B to some real value in R.

어떤 Position 에서도 그 Position 이 초래하게 되는 Score 를 최대화 시키는 움직임을 선택하도록 한다 .

1. if b is a final board state that is won, then V (b) = 100.2. if b is a final board state that is lost, then V (b) = −100.3. if b is a final board state that is drawn, then V (b) = 0.4. if b is not a final board state, then V (b) =

Page 57: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

57

The start of the learning work

Instead of learning ChooseMove we establish a value function:

target function, V : B → R

that maps any legal board state in B to some real value in R..어떤 Position 에서도 그 Position 이 초래하게 되는 Score 를 최대화 시키는

움직임을 선택하도록 한다 .

1. if b is a final board state that is won, then V (b) = 100.2. if b is a final board state that is lost, then V (b) = −100.3. if b is a final board state that is drawn, then V (b) = 0.4. if b is not a final board state, then V (b) = V (b’), b’ : 성취될 수 있는 최고의 마지막 상태 (the best final board state) ( 단 상대방도 최적으로 수를 둔다고 가정 )

Unfortunately, this did not take us any further!

Page 58: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

58

Approximating V(b)

• Computing V(b) is intractable since it involves searching the complete exponential game tree.

• Therefore, this definition is said to be non-operational.

• An operational definition can be computed in reasonable (polynomial) time.

• Need to learn an operational approximation to the ideal evaluation function.

Page 59: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

59

Designing a Learning System

1. Choose the training experience

2. Choose exactly what is to be learned, i.e. the target function.

3. Choose how to represent the target function.

4. Choose a learning algorithm to infer the target function from the experience.

Environment/Experience

Learner

Knowledge

PerformanceElement

Page 60: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

60

3. Choosing a Representation for the Target Function

• Describing the function – Tables

– Rules

– Polynomial functions

– Neural nets

• Trade-off in choice– Expressive power( 함수의 표현 )

– Size of training data ( 예제의 크기 )

• 표현 ( 제약조건 ) 이 많을 수록 함수의 해의 결과는 더 좋아 진다 . 즉 더욱 좋은 근사값을 얻는 함수를 구할 수 있다 .

• 그러나 정확한 함수를 얻기 위해서 더 많은 예제 , 사이즈가 큰 예제가 필요하게 된다 .

Page 61: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

61

Approximate representation

w1 - w6: weights

Page 62: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

62

Linear Function for Representing V(b)

• Use a linear approximation of the evaluation function.

(b) = w0 + w1x1 + w2x2 + w3x3 +w4x4 + w5x5 + w6x6

w0 : an additive constant

^

V

Page 63: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

63

Designing a Learning System

1. Choose the training experience

2. Choose exactly what is to be learned, i.e. the target function.

3. Choose how to represent the target function.

4. Choose a learning algorithm to infer the target function from the experience.

Environment/Experience

Learner

Knowledge

PerformanceElement

Page 64: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

64

4. Choosing a Function Approximation Algorithm

A training example is represented as an ordered pair <b, Vtrain(b) >

b: board stateVtrain(b) : training value for b

Instance: “black has won the game <<x1=3, x2=0, x3=1, x4=0, x5=0, x6=0>, +100> (x2 = 0) indicates that white has no remaining pieces.

Estimating training values for intermediate board states

Vtrain(bi) ← ( bi+1 ) = (Successor(bi))

: current approximation to V, ( 즉 the learned function, hypothesis)

Successor(bi): the next board state, 즉 bi+1 state

* 현재의 b 보드상태에 대한 training value 은 ← 다음단계의 장기판 상태 (b+1) 의 가설의 함수값을 사용

^

V

^

V^

V

Page 65: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

65

Estimating Training Values

DESIGNING A LEARNING SYSTEM

Page 66: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

66

부연설명 : Temporal Difference Learning

• Estimate training values for intermediate (non-terminal) board positions by the estimated value of their successor in an actual game trace.

• where successor(b) is the next board position

where it is the program’s move in actual play.

• Values towards the end of the game are initially more accurate and continued training slowly “backs up” accurate values to earlier board positions.

))successor(()( bVbVtrain

Page 67: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

67

How to learn?

Page 68: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

68

How to learn?

Page 69: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

69

How to change the weights?

Page 70: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

70

How to change the weights?

Page 71: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

71

Obtaining Training Values

• Direct supervision may be available for the target function.

• With indirect feedback, training values can be estimated using temporal difference learning (used in reinforcement learning where supervision is delayed reward).

Page 72: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

72

Learning Algorithm

• Uses training values for the target function to induce a hypothesis definition that fits these examples and hopefully generalizes to unseen examples.

• In statistics, learning to approximate a continuous function is called regression.

• Attempts to minimize some measure of error (loss function) such as mean squared error:

Page 73: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

73

The LMS(Least Mean Square) weight update rule

• Due to mathematical reasoning, the following update rule is very sensible.

Page 74: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

74

LMS Discussion

• Intuitively, LMS executes the following rules:– 예제를 적용한 결과 (the output for an example ) 가 정확하다면 , 변화를 주지

않는다 . – 예제를 적용한 결과 가 너무 높게 나오면 , 해당 features 의 값에

비례하여 weight 값을 낮춘다 . 그러면 전반적인 예제의 결과도 줄어들게 된다 .

– 예제를 적용한 결과 가 너무 낮게 나오면 , 해당 features 의 값에 비례하여 weight 값을 높인다 . 그러면 전반적인 예제의 결과도 늘어들게 된다 .

• Under the proper weak assumptions, LMS can be proven to eventually converge to a set of weights that minimizes the mean squared error.

)(bV

)(bV

Page 75: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

75

Lessons Learned about Learning

• Learning 은 ? 선택된 target function 을 근사화 (approximation) 하기

위해 direct or indirect experience 을 사용한다 .

• Function approximation 이란 ?: a space of hypotheses 에서 training data 들에 가장

적합한 가설 (hypotheses) 을 찾아가는 Search 에 해당 됨

• Different learning methods assume different hypothesis

spaces (representation languages) and/or employ different search techniques.

Page 76: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

76

Various Function Representations

• Numerical functions– Linear regression– Neural networks– Support vector machines

• Symbolic functions– Decision trees– Rules in propositional logic– Rules in first-order predicate logic

• Instance-based functions– Nearest-neighbor– Case-based

• Probabilistic Graphical Models– Naïve Bayes– Bayesian networks– Hidden-Markov Models (HMMs)– Probabilistic Context Free Grammars (PCFGs)– Markov networks

Page 77: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

77

Various Search Algorithms

• Gradient descent– Perceptron– Backpropagation

• Dynamic Programming– HMM Learning– Probabilistic Context Free Grammars (PCFGs) Learning

• Divide and Conquer– Decision tree induction– Rule learning

• Evolutionary Computation– Genetic Algorithms (GAs)– Genetic Programming (GP)– Neuro-evolution

Page 78: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

78

Evaluation of Learning Systems

• Experimental– Conduct controlled cross-validation experiments to compare

various methods on a variety of benchmark datasets.– Gather data on their performance, e.g. test accuracy,

training-time, testing-time.– Analyze differences for statistical significance.

• Theoretical– Analyze algorithms mathematically and prove theorems

about their:• Computational complexity• Ability to fit training data• Sample complexity (number of training examples needed to

learn an accurate function)

Page 79: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

79

Core parts of the machine learning

Many machine learning systems can be usefully characterized in terms of these four generic modules.

(Game history)

(Initial game board)( )^

V

Page 80: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

80

Four Components of a Learning System (2)

• Generalizer– Input: training example – Output: hypothesis (estimate of the target function)– Generalizes from the specific training examples– Hypothesizes a general function

• Experiment generator– Input - current hypothesis– Output - a new problem– Picks new practice problem maximizing the learning

rate

Page 81: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

81

Four Components of a Learning System(1)

• Performance system

- Solve the given performance task

- Use the learned target function

- New problem → trace of its solution

• Critic

- Output a set of training examples of the target function

Page 82: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

82

History of Machine Learning

• 1950s– Samuel’s checker player– Selfridge’s Pandemonium

• 1960s: – Neural networks: Perceptron– Pattern recognition – Learning in the limit theory– Minsky and Papert prove limitations of Perceptron

• 1970s: – Symbolic concept induction– Winston’s arch learner– Expert systems and the knowledge acquisition bottleneck– Quinlan’s ID3– Michalski’s AQ and soybean diagnosis– Scientific discovery with BACON– Mathematical discovery with AM

Page 83: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

83

History of Machine Learning (cont.)

• 1980s:– Advanced decision tree and rule learning– Explanation-based Learning (EBL)– Learning and planning and problem solving– Utility problem– Analogy– Cognitive architectures– Resurgence of neural networks (connectionism, backpropagation)– Valiant’s PAC Learning Theory– Focus on experimental methodology

• 1990s– Data mining– Adaptive software agents and web applications– Text learning– Reinforcement learning (RL)– Inductive Logic Programming (ILP)– Ensembles: Bagging, Boosting, and Stacking– Bayes Net learning

Page 84: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

84

History of Machine Learning (cont.)

• 2000s– Support vector machines– Kernel methods– Graphical models– Statistical relational learning– Transfer learning– Sequence labeling– Collective classification and structured outputs– Computer Systems Applications

• Compilers• Debugging• Graphics• Security (intrusion, virus, and worm detection)

– E mail management– Personalized assistants that learn– Learning in robotics and vision

Page 85: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

85

Remind

• “Learning as search in a space of possible hypotheses”

• Learning methods are characterized by their search strategies and by the underlying structure of the search spaces.

Page 86: 1 Data Mining & Machine Learning Introduction Intelligent Systems Lab. Soongsil University Thanks to Raymond J. Mooney in the University of Texas at Austin,

86

Issues in Machine Learning

• 특정 학습예제에 대한 general target function 의 학습 가능한 알고리즘 개발

• 훈련용 data 는 얼마나 되어야 하는가 ?

• 학습에 대해 지식을 가지고 있다면 인간의 간여가 도움이 되는가 ?

• 각 훈련 단계별로 학습문제의 복잡도 (Complexity) 를 개선하는 문제

• Function approximation 을 위해 학습의 노력을 줄이는 것

• Target function 의 표현을 자동으로 개선 혹은 변경하는 것