Download - Chernodub Paris Deeplearning
Deep Learning for Face Recognition
Artem Chernodub, Yurii Paschenko
IMMSP NASU
Paris, 30 September 2015.
ZZWolf
𝑝 (𝑥|𝑦 )=𝑝 ( 𝑦|𝑥 )𝑝 (𝑥)
𝑝 (𝑦 )
Biological-inspired models
Neuroscience
Machine Learning
2 / 50
Biological Neural Networks
3 / 50
Artificial Neural Networks
Traditional (Shallow) Neural Networks
Deep Neural Networks
Deep Feedforward Neural Networks
Recurrent Neural Networks
4 / 50
Deep Neural Networks
Deep Learning = Learning of Representations (Features)
The traditional model of pattern recognition (since the late 50's):
fixed/engineered features + trainable classifierHand-
crafted Feature
Extractor
TrainableClassifier
Trainable Feature
Extractor
TrainableClassifier
End-to-end learning / Feature learning / Deep learning:
trainable features + trainable classifier
6 / 50
Deep Feedforward Neural Networks
• 2-stage training process: i) unsupervised pre-training; ii) fine tuning (vanishing gradients problem is beaten!).
• Number of hidden layers > 1 (usually 6-9).• 100K – 100M free parameters.• No (or less) feature preprocessing stage. 7 / 50
FLOPS comparison
https://ru.wikipedia.org/wiki/FLOPS
Type Name Flops Cost
Mobile Raspberry Pi 1st Gen, 700 Mhz
0,04 Gflops $35
Mobile Apple A8 1,4 Gflops $700 (in iPhone 6)
CPU Intel Core i7-4930K (Ivy Bridge), 3.7 GHz
140 Gflops $700
CPU Intel Core i7-5960X (Haswell), 3.0 GHz
350 Gflops $1300
GPU NVidia GTX 980 4612 Gflops (single precision),
144 Gflops (double precision)
$600 + cost of PC (~$1000)
GPU NVidia Tesla K80 8740 Gflops (single precision),
2910 Gflops (double precision)
$4500 + cost of PC (~1500)
8 / 50
Sparse Autoencoders
9 / 50
Dimensionality reduction
• Use a stacked RBM as deep auto-encoder
1. Train RBM with images as input & output
2. Limit one layer to few dimensions
Information has to pass through middle layer
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.10 /
50
Original
Deep RBN
PCA
Dimensionality reduction
Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 30)
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.11 /
50
How to use unsupervised pre-training stage / 1
12 / 50
How to use unsupervised pre-training stage / 2
13 / 50
How to use unsupervised pre-training stage / 3
14 / 50
How to use unsupervised pre-training stage / 4
15 / 50
Unlabeled data
Unlabeled data is readily available
Example: Images from the web
1. Download 10’000’000 images2. Train a 9-layer DNN3. Concepts are formed by DNN
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.16 /
50
Dimensionality reduction
PCA Deep RBN
804’414 Reuters news stories, reduction to 2 dimensions
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.17 /
50
Hierarchy of trained representations
Low-level feature
Middle-level
feature
Top-level feature
Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]
18 / 50
Face Recognition
Standard Face Recognition Scenario
Step 1. Face Detection
Step 2. Face Alignment
Step 3. Face Matching
20 / 50
Face Matching (Face Recognition)
A) Training the classifier B) Matching two face images
Classifier
Features
Answer21 /
50
Recognition of Face Attributes
• man;• white;• 45-50 years;• glasses – no;• moustache –
yes;• beard – yes.
22 / 50
Beauty Recognition
http://www.linkface.cn/face/beauty
Чернодуб А.Н., Пащенко Ю.А., Головченко К.А. Нейросетевая система определения привлекательности лица человека // «Нейроинформатика-2013», Москва, 21-25 января 2013, c. 254 — 259. 23 /
50
Face Matching
EigenFaces, Principal Component Analysis (PCA) for Face Matching, 1991
M. Turk, A. Pentland. Face Recognition using EigenFaces // Journal of cognitive neuroscience 3 (1), p. 71-86.
• Well-known method for dimensionality reduction.• Building features from the training data.• Impressive visualization of the basis vectors.
25 / 50
Local Binary Pattern Histograms (LBPH), 2006
• Fast & robust features for texture descriptions applied to face recognition.
• distance for comparison of histograms.• Matrix of importance regions (weighted .
T. Ahonen, A. Hadid. Face Description with Local Binary Patterns:Application to Face Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 12, p. 2037-2041.
26 / 50
Deep Face (Facebook), 2014
Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014.
Model # of parameters
Accuracy, %, LFW
Deep Face Net 128M 97.25
Human level N/A ~ 97.64
Training data: 4M facial images
27 / 50
Convolutional Neural Networks: Return of Jedi
Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks
Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook 28 /
50
DeepFace Training Framework
Step 1. Supervised training for identification
Step 2. Use produced features for face matching
Big Face database
~1-10M images,~ 1-5K persons
ConvolutionalNeuralNetwork
Feature
1-5Klabels
Feature 2
Feature 1 - Cosine metric- Euclid metric
- metric- Siamese networks
Similarity
29 / 50
Face Recognition Algorithm
Face localization
Normalization
Feature extraction
Comparing
Verification
metricFALSE
30 / 50
Databases for Deep Face Training
• Facebook’s Social Face Classification (SCF) dataset, 2014. 4.4M labeled faces, 4030 people with 800 to 1200 faces each. Private dataset.
• Visual Geometry Group Dataset, Oxford, 2015. 2622 people with 1000 faces each. Private dataset.
• CASIA WebFace Database, 2014. 10575 people, 500K faces. Public dataset.http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
31 / 50
CASIA
• Dataset: CASIA WebFace (10 575 subjects and 494 414 face images)
• Alignment: 2D• Architectures: single CNN + triplet
loss• Verification metric:– Cosine– Joint Bayes
D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. Technical report, arXiv:1411.7923, 2014.
32 / 50
CASIA architecture
• Input: gray image 100x100
• Feature vector size: 320
• Parameters: ~5M
33 / 50
CASIA replication
Training data Accuracy on LFW
Normalized 97,27 %
No alignment 94,15 %
Our 2D alignment 96,23 %
34 / 50
Error rate vs Database size
Identities
Faces Error rate
1.5K 150K 3.1%
9K 450K 1.35%
18K 1.2M 0.87%
J. Liu, Y. Deng, T. Bai, Zh. Wei, C. Huang. Baidu Targeting Ultimate Accuracy: Face Recognition via Deep Embedding, 2015 http://hdl.handle.net/1721.1/97999
35 / 50
Face Databases (benchmarks)
AT&T Database (Olivetti, ORL): 1992-1994
F. Samaria, A. Harter. Parameterisation of a Stochastic Model for Human Face Identification // Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, Sarasota FL, December 1994.
http://www.uk.research.att.com/
• Grayscale images, 92x112.
• 40 subjects, 10 images per subject.
• Different facial expressions, poses, glasses.
• Studio, single session.
37 / 50
FERET dataset, 1994-1996
P.J. Phillips, H. Moon, P. Rauss, S. A. Rizvi. The FERET September 1996 Database and Evaluation Procedure // First International Conference, AVBPA'97 Crans-Montana, Switzerland, March 12–14, 1997, pp. 395-402
http://www.nist.gov/srd/
• 2413 still facial images, representing 856 individuals, studio, two sessions.
• Different facial expressions, poses, light.
38 / 50
Labeled Faces in the Wild dataset, 2007
G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments // University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.http://vis-www.cs.umass.edu/lfw/
• Images with faces collected from the web.
• Pairs comparison, restricted mode.
• test: 10-fold cross-validation, 6000 face pairs.
39 / 50
Results for Face Matching, LWF (option “Labeled outside data”)
http://vis-www.cs.umass.edu/lfw/results.html
Method Accuracy
1 EigenFaces 60,20%
2 LBP-classic (3K features) 72,43%
3 High-dim LBP (100K features) 95,17%
4 DeepFace 97,25%
5 Human level 97,64%
6 DeepID3 99,53%
7 Baidu 99,77%
40 / 50
Face Detection
Viola-Jones Object Detector
• Very popular for Face Detection.• Different features except HAAR: LBP, SURF/SIFT, etc.• Available free in OpenCV library (http://opencv.org).• OpenCV implementation supports Windows, Linux,
Mac OS, iOS (not officially) and Android.• May be trained with MATLAB tool traincascade.
42 / 50
Images pyramid for Viola-Jones
43 / 50
FDDB: Face Detection Data Set and Benchmark
Vidit Jain and Erik Learned-Miller. FDDB: A Benchmark for Face Detection in Unconstrained Settings // Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst. 2010.
http://vis-www.cs.umass.edu/fddb
• 5171 faces in a set of 2845 LWF images.
• Special software tool for testing.
44 / 50
Viola-Jones Object Detector Classifier Structure
P. Viola, M. Jones. Rapid object detection using a boosted cascade of simple features // Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001.
45 / 50
FDDB: state-of-the-art
http://vis-www.cs.umass.edu/fddb/results.html46 /
50
Face Alignment
3D Face Alignment
Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014. 48 /
50
2D Face Alignment
D. Yi, Z. Lei, S. Liao, S.Z. Li. Learning Face Representation from Scratch // CoRR abs/1411.7923 (2014). [CASIA]
49 / 50
Some Computer Vision / Machine Learning tools• OpenCV, VLFeat – popular Computer Vision
libraries.• mexopencv – MATLAB interface for
OpenCV.• pythonXY – all-in-one Machine Learning
Package.• libSVM – reliable cross-platform Support
Vector Machine library.• NetLab – Shallow Neural Networks & other
ML models, from Christopher Bishop’s book.
• Caffe, Theano, Torch, matconvnet – frameworks for training Deep Neural Networks. 50 /
50