TRANSCRIPT
Lec1: Introduction to Pattern Recognition 1
Introduction to
Pattern Recognition
Prof. Daniel Yeung
School of Computer Science and Engineering
South China University of Technology
Lecture 1: Pattern Recognition
A Cyber Security Example
Cyberinfrastructure Vulnerability
According to the 2012 Norton Study:
• Global cybercrime annual cost: US$114 billion
• More than one million victims a day
• Time lost due to cybercrime: an additional US$274 billion
• Cybercrime costs the world significantly more than the global black market in marijuana, cocaine, and heroin combined (US$288 billion)
• 69% of online adults have been a victim of cybercrime at least once
• Every second, 14 adults fall victim to cybercrime
• The number of cybercrimes doubled in 3 years
"Norton Study Calculates Cost of Global Cybercrime: US$114 Billion Annually", Symantec Corp, 2012
http://www.symantec.com/about/news/release/article.jsp?prid=20110907_02
Cyberinfrastructure Vulnerability
• The average cost of cybercrime in the five countries covered by the "2012 Cost of Cybercrime Study"
• Source: HP/Ponemon Institute
"Cybercrime Attacks Double in Three Years", Computer Fraud & Security, 2012
Cyberinfrastructure Vulnerability
• In May 2011, more than 77 million Sony PlayStation accounts were hacked
• The information of 24 million customers of an Amazon fashion e-retailer was hacked
• Q4 2009 to Q4 2010: an 8% increase in DDoS attacks against e-commerce companies
• Q4 2010 to Q4 2011: attacks up by 153%
• 94 billion spam messages sent daily, costing society US$20 billion
• Global Payments reported a credit card data leak in 2012, and its share price dropped 9% immediately
• Heartland Payment Systems paid more than US$110 million to Visa, MasterCard, American Express, and other companies to settle claims
"Sony warns of almost 25 million extra user detail theft", http://www.bbc.co.uk/news/technology-13256817
Cyberinfrastructure Vulnerability
• Steganography made news headlines
  - The U.S. charged 11 individuals with acting as unlawful agents of a foreign nation by using steganography to embed messages in more than 100 image files posted on public websites
  - Rumor has it that the 9/11 terrorists used stego media to communicate their attack plan
http://www.nij.gov/nij/topics/forensics/evidence/digital/analysis/steganography.htm
http://www.justice.gov/opa/pr/2010/June/10-nsd-753.html
Challenge of Cybersecurity
• Misuse Detection
  - Stealing information, unauthorized control
• Anomaly Detection
  - Denial-of-service attacks
• Scan Detection
  - Scans for weak points in the cyber-infrastructure
• Network Profiling
  - Attackers mimic normal network flow patterns
• Steganography
  - Hiding information in JPEG files on websites
"Data Mining and Machine Learning in Cybersecurity", CRC Press, 2011
Challenge of Cybersecurity
• Machine learning for cybersecurity
  - Misuse detection: classify abnormal behavior or command sequences
  - Anomaly detection: classify abnormal bursts of flow and increases in flow traffic
  - Scan detection: classify incoming requests as malicious or not
  - Network profiling: learn the patterns of network flow to facilitate misuse and anomaly detection, understand network patterns, and discover abnormal flows
  - Steganography: classify whether a JPEG image contains a hidden message or not
Cybersecurity Solutions
• Policy-Driven Approach
• Simulation Approach
• Data Mining Approach
• Machine Learning Approach
• Hybrid Approach
Challenge of Cybersecurity
• Big data
  - 2.5 quintillion (2.5×10^18) bytes of data created daily in 2012
  - The "3 V" phenomenon: volume (data size), velocity (data speed in and out), and variety (images, maps, Facebook, video, email, websites)
• Imbalance between attack and normal patterns
  - Among billions of TCP packets, only a few may be malicious
  - Attacks and abnormal behaviors are minorities compared to normal usage and packets
• Fast response
  - Required by the characteristics of the Internet and local area networks
• Robustness
  - With only training samples, it is not reasonable to expect correct decisions for all possible future attacks, which may be very different from the training samples
  - Robustness refers to the ability to correctly classify patterns similar to the training samples
Steganography
• Steganography methods hide information in carriers such as images, audio, or video files, so that no one except the sender and the recipient suspects the existence of the message
[Figure: original image + secret messages → stego image]
Steganography for Secured Communication
[Figure: the sender embeds the secret image: "Send a lovely cat to Mary"]
Steganography for Secured Communication
[Figure: the network admin inspects the traffic: "Looks good and nothing suspicious"]
Steganography for Secured Communication
[Figure: the recipient extracts the secret image: "Wow! What a lovely cat!"]
How to hide the cat?
Simple LSB
• Every pixel is represented by three [0, 255] values (RGB)
[Figure: zoomed-in pixels of the secret (cat) image, e.g. 12,20,30; 12,21,30; 15,21,31; … 250,230,248]
The first pixel (12, 20, 30) in binary: 00001100 00010100 00011110
How to hide the cat?
Simple LSB
• We do the same conversion for the cover image
[Figure: zoomed-in pixels of the cover image, e.g. 245,232,230; 245,233,230; 245,233,231; … 240,222,218]
The first two cover pixels in binary: 11110101 11101000 11100110 11110101 11101001 11100110
How to hide the cat?
Simple LSB
Secret bits (first pixel of the cat image): 00001100 00010100 00011110
The first six secret bits to embed: 0 0 0 0 1 1
Cover bytes: 11110101 11101000 11100110 11110101 11101001 11100110
Replace the least significant bit of each cover byte with one secret bit:
Stego bytes: 11110100 11101000 11100110 11110100 11101001 11100111
How to hide the cat?
Simple LSB
• Every pixel is still three [0, 255] values (RGB); the stego image looks virtually identical to the cover image
[Figure: zoomed-in pixels of the stego image, e.g. 244,232,230; 244,233,231; 245,233,231; … 240,222,218]
The first two stego pixels in binary: 11110100 11101000 11100110 11110100 11101001 11100111
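The LSB embedding walked through above fits in a few lines of Python. This is an illustrative toy, not a robust steganography tool, and the function names are our own:

```python
# Toy sketch of simple LSB embedding (illustrative only).
def embed(cover_bytes, secret_bits):
    """Replace the least significant bit of each cover byte with one secret bit."""
    out = list(cover_bytes)
    for i, bit in enumerate(secret_bits):
        out[i] = (out[i] & 0xFE) | bit   # clear the LSB, then set it to the secret bit
    return out

def extract(stego_bytes, n_bits):
    """Read the hidden bits back out of the LSBs."""
    return [b & 1 for b in stego_bytes[:n_bits]]

cover = [0b11110101, 0b11101000, 0b11100110,      # 245, 232, 230
         0b11110101, 0b11101001, 0b11100110]      # 245, 233, 230
secret = [0, 0, 0, 0, 1, 1]   # first six bits of the secret pixel (12, 20, 30)
stego = embed(cover, secret)
# stego == [244, 232, 230, 244, 233, 231], matching the slide
```

Running `extract(stego, 6)` recovers the six secret bits, which is all a receiver who knows the scheme needs to do.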
Steganography & Steganalysis
[Diagram: raw image → quantization → JPEG image → steganography → stego JPEG → Internet → steganalysis (feature extraction → classification) → result (stego or not)]
Steganalysis
• Train a classifier to decide whether an image contains a hidden message or not
• The basic idea is to identify changes in the statistics of transitional probabilities of different DCT coefficients in the JPEG file after compression
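As a loose illustration of such transition statistics, the sketch below builds a Markov transition-probability matrix over horizontally adjacent quantized coefficients clipped to a small range. This is our own simplification (the clipping threshold T is an assumption), not the exact feature set of any specific steganalysis method:

```python
# Sketch: transition-probability (Markov) statistics over adjacent quantized
# DCT coefficients (a simplified illustration; parameter T is an assumption).
def transition_matrix(coeffs, T=2):
    """P[i][j] = Pr(next = j | current = i), over horizontally adjacent values
    clipped to [-T, T]; returns a (2T+1) x (2T+1) matrix."""
    size = 2 * T + 1
    clip = lambda v: max(-T, min(T, v))
    counts = [[0] * size for _ in range(size)]
    for row in coeffs:
        for a, b in zip(row, row[1:]):
            counts[clip(a) + T][clip(b) + T] += 1
    # normalize each row; rows for values that never occur stay all-zero
    return [[c / max(1, sum(r)) for c in r] for r in counts]

block = [[3, 0, 0, -1],
         [0, 0, 1, 0],
         [-2, 0, 0, 0]]     # made-up quantized coefficients
P = transition_matrix(block)
```

Embedding tends to perturb these probabilities slightly, so the flattened matrix entries can serve as input features to a classifier.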
Robust Steganalysis
• Why are current ML methods not effective?
  - The performance of current steganalysis methods drops significantly when the training and testing images are different and/or compressed with different quantization tables
  - Differences in quantization tables are unavoidable, owing to the large variety of digital cameras and editing software
  - Differences between training and testing images are natural, since no training set covers all possibilities
• So, how to design a robust steganalysis method is a key issue for discovering stego images
Examples of standard and non-standard quantization tables at quality factor 75
• JPEG compression uses a Quantization Table (QT)
• There are 100 standard JPEG Quantization Tables (QTs)
• A QT is an 8×8 integer matrix

75s (standard):
8  6  5  8 12 20 26 31
6  6  7 10 13 29 30 28
7  7  8 12 20 29 35 28
7  9 11 15 26 44 40 31
9 11 19 28 34 55 52 39
12 18 28 32 41 52 57 46
25 32 39 44 52 61 60 51
36 46 48 49 56 50 52 50

75ns (non-standard):
8  6  5  8 12 19 26 31
6  5  7  9 13 29 29 27
6  7  7 11 19 28 35 28
6  9 11 15 25 43 40 30
9 10 18 27 34 54 51 39
11 18 27 31 40 51 57 45
24 31 39 43 51 61 59 50
36 46 48 49 56 49 51 50

75ns (non-standard):
8  5  4  8 11 20 25 30
5  6  7  9 13 28 29 27
6  6  8 11 20 29 35 27
7  9 11 14 26 44 40 30
8 11 19 28 34 54 52 39
12 17 27 31 41 51 56 45
25 31 38 44 52 60 60 51
36 46 48 48 55 49 52 49
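For context, the quantization step in which these tables are used divides each 8×8 block of DCT coefficients element-wise by the QT entries and rounds. A minimal sketch, with invented DCT coefficients (only the first row of a block is shown):

```python
# Sketch of JPEG quantization: coefficient / QT entry, rounded to an integer.
def quantize(dct_block, qt):
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(dct_block, qt)]

qt_row = [8, 6, 5, 8, 12, 20, 26, 31]        # first row of the standard 75s QT
dct_row = [160, -30, 12, 7, 0, -25, 13, 3]   # hypothetical DCT coefficients
quantized = quantize([dct_row], [qt_row])[0]
```

Two images compressed with different QTs thus end up with differently quantized coefficients, which is why steganalysis features computed from those coefficients shift between tables.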
Difference in steganalysis features of the SAME IMAGE
compressed by different QTs of different digital cameras
Sony DSC-H9 uses dynamic QTs
Robust to Quantization Table Changes
[Figure: one training image and testing images ranging from very similar, similar, and dissimilar to totally different]
Robust to Different Testing Images
[Figure: accuracies of Chen's method and LGEM on one training image and several testing images of varying similarity; Δx denotes the Manhattan distance from the training image]
The Case of
Salmon and Sea Bass
Pattern Recognition – An Example
Salmon / Sea Bass
• Real-life example
  - A fish-packing plant wants to automate the process of sorting incoming fish (salmon / sea bass) on a belt according to species
[Figure: sea bass and salmon]
PR: Salmon / Sea Bass
Process
• Steps in the sorting process:
Object (fish) → Sensing (camera) → Image → Preprocessing (isolate fish, reduce noise, …) → Refined image → Feature extraction (take measurements) → Features → Classification → Output class (salmon / sea bass)
PR: Salmon / Sea Bass
Process
• Sensing
  - Digitize the object into a format that can be handled by machines
• Preprocessing
  - Refine the data
  - What can cause problems during sensing?
    E.g. lighting conditions, position of the fish on the conveyor belt, camera noise, etc.
PR: Salmon / Sea Bass
Process
• Feature Extraction
  - What kind of information can distinguish one species of fish from the other?
    E.g. length, width, weight, number and shape of fins, tail shape, etc.
  - Experts may help
• Classification
  - Many classification techniques (classifiers) are available
  - Discussed in detail later
PR: Salmon / Sea Bass
Example of Feature Extraction
• Fisherman (the expert):
  - A salmon is usually shorter than a sea bass
• Length is chosen as a feature and used as a decision criterion
• But what is the decision threshold?
PR: Salmon / Sea Bass
Length as feature
[Figure: histograms of the length feature for the two types of fish in the training samples, with threshold l*]
• 15 is selected as the threshold
• Although sea bass are longer than salmon in general, there are many exceptions
• The experts may be wrong!
• How about other features? E.g. lightness
PR: Salmon / Sea Bass
Lightness as feature
[Figure: histograms of the lightness feature for the two types of fish in the training samples]
• 5.5 is selected as the threshold
• Using "lightness" as a feature is much better than using "length"
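Choosing such a threshold from training data can be sketched as a one-dimensional search minimizing training error. The lightness values below are invented for illustration, and "salmon below the threshold" is an assumed convention:

```python
# Sketch: pick the threshold t minimizing training error for one feature.
# Convention: classify x as salmon if x < t, else sea bass.
def best_threshold(salmon, seabass):
    candidates = sorted(set(salmon) | set(seabass))
    best_t, best_err = None, float('inf')
    for t in candidates:
        errors = sum(x >= t for x in salmon) + sum(x < t for x in seabass)
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err

salmon_lightness = [2.0, 3.1, 4.0, 4.8, 5.2]    # hypothetical measurements
seabass_lightness = [5.0, 6.1, 6.8, 7.5, 8.2]
t, err = best_threshold(salmon_lightness, seabass_lightness)
# With these made-up values the best threshold is 5.0, with one training error
```

Only thresholds at observed feature values need to be tried, since the error count can only change there.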
PR: Salmon / Sea Bass
Cost Consideration
• Besides accuracy, the "costs of different errors" should also be considered
• For example:
  - Case 1: the company's view
    Salmon is more expensive than sea bass; selling salmon at the price of sea bass is a loss
    "A salmon is classified as sea bass" → HIGH cost
    "A sea bass is classified as salmon" → LOW cost
  - Case 2: the customer's view
    Customers who buy salmon will be very upset if they get sea bass; customers who buy sea bass will not be upset if they get the more expensive salmon
    "A salmon is classified as sea bass" → LOW cost
    "A sea bass is classified as salmon" → HIGH cost
PR: Salmon / Sea Bass
Cost Consideration
• How would these cost considerations affect our decision?
[Figure: the lightness histograms with shifted thresholds. Case 1: the threshold shifts so that more sea bass are mistaken as salmon. Case 2: the threshold shifts so that more salmon are mistaken as sea bass.]
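The effect of unequal error costs on the threshold can be sketched by minimizing total cost instead of error count. The costs and data below are made up for illustration:

```python
# Sketch: threshold search minimizing total misclassification cost.
# Convention: classify x as salmon if x < t, else sea bass.
def best_threshold_with_costs(salmon, seabass, cost_salmon_as_bass, cost_bass_as_salmon):
    candidates = sorted(set(salmon) | set(seabass))
    best_t, best_cost = None, float('inf')
    for t in candidates:
        cost = (cost_salmon_as_bass * sum(x >= t for x in salmon)
                + cost_bass_as_salmon * sum(x < t for x in seabass))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

salmon = [2.0, 3.1, 4.0, 4.8, 5.2]    # hypothetical lightness values
seabass = [5.0, 6.1, 6.8, 7.5, 8.2]
# Case 1 (company): salmon-as-bass is the expensive error -> threshold moves up
t_company = best_threshold_with_costs(salmon, seabass, 5.0, 1.0)
# Case 2 (customer): bass-as-salmon is the expensive error -> threshold moves down
t_customer = best_threshold_with_costs(salmon, seabass, 1.0, 5.0)
```

With these made-up numbers the company's threshold lands above the customer's, i.e. more sea bass get called salmon in Case 1, matching the figure.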
PR: Salmon / Sea Bass
Multiple Features
• If using only ONE feature is not good enough, more features can be used
• Two features:
  - Lightness: x1
  - Width: x2
• A fish is then represented by a point (x1, x2) in a two-dimensional feature space
PR: Salmon / Sea Bass
Simple Classifier
[Figure: the two features (lightness and width) for sea bass and salmon in the training samples, with a straight-line decision boundary and one unseen fish marked "?"]
• A decision boundary can be drawn to divide the feature space into two regions (salmon / sea bass)
• Is it (a linear classifier) too simple?
• What is this unseen fish?
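A linear classifier over the two features is just a sign test on a weighted sum. The weights and bias below are invented for illustration, not fitted to any real data:

```python
# Sketch of a linear decision rule in the (lightness, width) feature space.
# The weights w and bias b are made-up values, not fitted parameters.
def classify(lightness, width, w=(1.0, -1.0), b=-2.0):
    score = w[0] * lightness + w[1] * width + b   # which side of the line?
    return 'sea bass' if score > 0 else 'salmon'

label_bright_narrow = classify(7.0, 3.0)   # falls on the sea-bass side
label_dark_wide = classify(2.0, 5.0)       # falls on the salmon side
```

The decision boundary is the line where the score equals zero; in practice w and b would be learned from the training samples.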
PR: Salmon / Sea Bass
Complex Classifier
• Will other classifiers be better than the linear classifier (a straight line)?
• A more complex classifier:
[Figure: a highly irregular decision boundary that winds around every training sample, with one unseen fish marked "?"]
  - It classifies the training samples perfectly
  - However, the ultimate objective is to classify unseen fish correctly
  - Can it generalize to unseen samples?
• What is this unseen fish?
PR: Salmon / Sea Bass
Appropriate Classifier
• Simple classifier
  - Performance on the training samples is not good
• Complex classifier
  - Cannot generalize to unseen samples
• Tradeoff between accuracy on the training samples and complexity
[Figure: a moderately curved decision boundary, with one unseen fish marked "?"]
• This looks more reasonable
  - Not too complex
  - Good at classifying the training samples
• What is this unseen fish?
Pattern Recognition Systems
[Diagram]
Classifying phase: Environment → Sensing → Preprocessing → Feature Extraction → Classification → Action / Decision
Learning phase: Training Samples → Learning → Model
Pattern Recognition Systems
• Sensing
  - Use of a transducer, e.g. a camera or a microphone
  - Depends on:
    Bandwidth
    Resolution
    Sensitivity
    Distortion of the transducer
    Cost
Pattern Recognition Systems
• Pre-Processing (Segmentation)
  - Patterns should be well separated and should not overlap, e.g.:
    Fish recognition: fish are often abutting or overlapping, and the system must determine where one fish ends and the next begins
    Speech recognition: clear boundaries between two consecutive words are difficult, because we only receive a sequence of waveforms
Pattern Recognition Systems
• Feature Extraction
  - The choice of features is vital to the success of a pattern recognition system
  - Problem- and domain-dependent; requires domain knowledge
  - Criteria: the recognition result should be
    Invariant to translation (the location of the fish on the conveyor belt is irrelevant)
Pattern Recognition Systems
    Invariant to rotation (rotation of the fish is irrelevant)
    Invariant to scale (the size of the fish is irrelevant; this is why the length of a fish is not a good feature)
    Invariant to occlusion (parts of the object hidden by other parts are irrelevant; the eye of a fish may not be captured by the camera when the fish is rotated; face recognition with and without sunglasses)
Pattern Recognition Systems
    Invariant to projection distortion (distortion caused by camera angle or distance is irrelevant)
    Invariant to rate (in speech recognition, the duration of a word is irrelevant; different people speak at different speeds)
    Invariant to deformation (particularly significant for handwritten character recognition; different people write the same word differently, and even the same person writes differently at different times)
Pattern Recognition Systems
• Example: fish recognition
  - The features should be invariant to rotation
  - However, for character recognition a good feature should NOT be invariant to rotation
[Figure: a rotated fish is still the same fish (=), but a rotated character is a different character (≠)]
• Example: classification of horses and dogs
  - Number of legs is not a good feature
  - Body color is not a good feature
  - Height may be a good feature
• Example: classification of species of dog
  - Body color is important
  - Height may or may not be a good feature
Pattern Recognition Systems
• Classification
  - Use the feature vector provided by the feature extractor to assign an object to a category
  - Two factors decide the degree of difficulty of the classification:
    Variability of the feature values within the same category (small within-class variation is preferred)
    Variability of the feature values across different categories (large between-class variation is preferred)
Pattern Recognition Systems
• Action / Decision (Post-Processing)
  - Exploit context: input-dependent information other than the target pattern itself, to improve performance
  - After classification, we can perform actions (with associated costs) based on the classification result
Pattern Recognition Systems
• We may also measure the performance of classification
  - Error rate: the percentage of patterns being incorrectly classified
  - Risk: different misclassifications may lead to different penalties
    Salmon is more expensive than sea bass, so there is a higher penalty for misclassifying salmon as sea bass
Pattern Recognition
Design Cycle
[Diagram: Start → Data Collection → Feature Selection → Model Selection → Training → Evaluation → End]
Pattern Recognition
Design Cycle
• Data Collection
  - Collect samples from the real environment
  - Separate the samples into two mutually exclusive sets:
    Training samples (for training)
    Testing samples (for evaluation)
Pattern Recognition
Design Cycle
• How do we decide whether a sample set is adequately large and representative?
  - The more the better?
    Maybe, depending on the quality of the samples collected. Time and cost could be constraints, e.g. medical data is very expensive to collect, and stock market data is time-dependent
    Not necessarily: sometimes too much data could be confusing (e.g. Internet traffic data); only representative samples are useful
  - Try to collect samples randomly, without bias
Pattern Recognition
Design Cycle
• Feature Selection
  - Depends on the characteristics of the problem domain
  - Prior information, e.g. experts' advice
  - Computational cost and feasibility: features should be simple to extract
Pattern Recognition
Design Cycle
  - Discriminative
    Similar values for patterns in the same class
    Different values for patterns in different classes
  - Invariant to transformations, e.g. translation, rotation, and scale
  - Robust to noise, e.g. occlusion, distortion, deformation, and variations in the environment
Pattern Recognition
Design Cycle
• Model Selection
  - Many different models, e.g. neural networks and decision trees (discussed in detail later)
  - Domain-dependent:
    A 2-class problem?
    Many features?
    Scattered data?
  - How close is it to the true model?
    Classification performance
    Complexity
Pattern Recognition
Design Cycle
• Training
  - "Knowledge" is learnt from the training samples: the parameters of the classifiers are determined
  - Only samples in the training set are used
  - Supervised learning: a teacher provides a class label for each pattern in the training set
  - Unsupervised learning: no teacher is available, so no class labels are provided; input patterns are grouped "naturally"
Pattern Recognition
Design Cycle
• Evaluation
  - Can the trained model generalize the knowledge from the training samples to future unseen samples?
  - Ultimate objective: performance on unseen samples (generalization ability)
    This cannot be calculated directly
Pattern Recognition
Design Cycle
• Measurable criteria
  - Performance on the training samples (training accuracy)
    Perfect performance on the training samples → over-fitting problem (slide 38)
  - Performance on the testing samples (testing accuracy)
    A better evaluation criterion than training accuracy, since the testing samples are not involved in the training process
    But good performance on the testing samples may still be very bad on unseen samples (no guarantee!)
    Repeating the experiments may help
Comparing Classifiers
• For a classification problem, given
  - Dataset D
  - Classifiers A and B
• How can we measure which classifier, A or B, is better for D?
Comparing Classifiers
• Method
  - Randomly separate D into a training set and a testing set
  - Use the training set to train A and B
  - Use the testing set to measure the performance of the trained A and B
  - Select the better-performing classifier
• Any problem with this proposed approach?
Comparing Classifiers
• Problem: the winner may just be lucky, performing better only on that specific testing set; there is no guarantee for different testing sets
• Two re-sampling techniques are introduced to reduce the bias of the testing set:
  - Independent run
  - Cross-validation
Comparing Classifiers
Independent Run
• A statistical method, also called bootstrap or jackknifing
• Repeat the experiment n times independently
• For i = 1, …, n (i is the run index):
  - Randomly separate D into Training Set_i and Testing Set_i
  - Use Training Set_i to train A_i and B_i
  - Use Testing Set_i to calculate the accuracy of the trained A_i and B_i
• Select the classifier with the higher mean (average) accuracy
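The independent-run procedure can be sketched as follows. The two toy "classifiers" and the data are stand-ins invented for illustration:

```python
import random

# Sketch of the independent-run comparison: n random train/test splits,
# then compare the mean test accuracies of two classifiers.
def independent_run(data, labels, train_a, train_b, n=10, train_frac=0.7):
    accs_a, accs_b = [], []
    idx = list(range(len(data)))
    for _ in range(n):
        random.shuffle(idx)                     # a fresh random split each run
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        for train_fn, accs in ((train_a, accs_a), (train_b, accs_b)):
            model = train_fn([data[i] for i in train], [labels[i] for i in train])
            accs.append(sum(model(data[i]) == labels[i] for i in test) / len(test))
    return sum(accs_a) / n, sum(accs_b) / n

def train_threshold(xs, ys):
    t = sum(xs) / len(xs)                       # threshold at the training mean
    return lambda x: int(x > t)

def train_majority(xs, ys):
    majority = max(set(ys), key=ys.count)       # always predict the majority class
    return lambda x: majority

data = [1, 2, 3, 4, 10, 11, 12, 13] * 3         # toy, well-separated classes
labels = [0, 0, 0, 0, 1, 1, 1, 1] * 3
random.seed(0)
mean_a, mean_b = independent_run(data, labels, train_threshold, train_majority)
```

Averaging over n splits reduces the chance that one classifier wins merely by drawing a lucky testing set.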
Comparing Classifiers
Cross-Validation
• m-fold cross-validation
  - Dataset D is randomly divided into m disjoint sets D_i of equal size n/m, where n is the number of samples in the dataset
  - The classifier is trained m times, each time with a different set D_i held out as the testing set
  - Select the classifier with the higher mean accuracy
[Figure: D randomly split into D1–D5; in each of the 5 rounds, one D_i serves as the testing set and the remaining four form the training set]
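m-fold cross-validation can be sketched in the same toy setting; the data and the threshold "classifier" are invented for illustration:

```python
import random

# Sketch of m-fold cross-validation for one classifier: split the data into
# m disjoint folds, hold out each fold once as the testing set.
def cross_validate(data, labels, train_fn, m=5, seed=0):
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::m] for i in range(m)]       # m disjoint folds
    accuracies = []
    for held_out in range(m):
        test = folds[held_out]
        train = [j for k, fold in enumerate(folds) if k != held_out for j in fold]
        model = train_fn([data[j] for j in train], [labels[j] for j in train])
        correct = sum(model(data[j]) == labels[j] for j in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / m                  # mean accuracy over the m folds

def train_threshold(xs, ys):
    t = sum(xs) / len(xs)                       # toy classifier: threshold at the mean
    return lambda x: int(x > t)

data = [1, 2, 3, 4, 10, 11, 12, 13] * 5
labels = [0, 0, 0, 0, 1, 1, 1, 1] * 5
mean_acc = cross_validate(data, labels, train_threshold, m=5)
```

Unlike independent runs, every sample is used for testing exactly once, so the m accuracy estimates together cover the whole dataset.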
Pattern Recognition
� Three types of learning
�Supervised Learning
� Lectures 01 - 07
�Unsupervised Learning
� Lecture 08
�Reinforcement Learning
� Not covered in this course
Pattern Recognition
Supervised Learning
• Needs a teacher
• The class label for each training sample is known
• Any mistake made by the model during training is known
• Examples: neural networks and decision trees
Pattern Recognition
Unsupervised Learning
• No teacher is available
• We don't know whether the model is correct or not
• Patterns are grouped "naturally"
• Example: clustering
Pattern Recognition
Reinforcement Learning
• Training examples are input-output pattern pairs, with evaluative output provided by a critic (a "lazy" teacher)
  - We just know that the answer is incorrect
  - But we do not know the correct answer
• Example: learning to play chess