© alchemyapi ceng 501 deep learning week 2

CENG 501 Deep Learning

Week 2

Sinan Kalkan

Administrative Issues

• Registrations• Access to Moodle (https://odtuclass.metu.edu.tr/)• Project paper selection– https://docs.google.com/spreadsheets/d/1tzPHq_Vgu6gCwNyXJHGv

qeA6pgU67H0nKYjqkisWfKc/edit?usp=sharing– Deadline: 29 March + 1-2 weeks

Spring 2021 Sinan Kalkan 2

Deep learning and other approaches

3Fig.: I. Goodfellow

Previously on CENG501!

So what does Deep (Machine) Learning bring/provide?

• (Hierarchical) Compositionality� Cascade of non-linear transformations� Multiple layers of representations

• End-to-End Learning� Learning (goal-driven) representations� Learning to feature extraction

• Distributed Representations� No single neuron “encodes” everything� Groups of neurons work together

4Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Building A Complicated Function

Given a library of simple functions

Compose into a

complicate function

Idea 2: Compositions• Deep Learning

• Grammar models

• Scattering transforms…

f(x) = g1(g2(. . . (gn(x) . . .))

Slide Credit: Marc'Aurelio Ranzato, Yann LeCunSpring 2021Sinan Kalkan

Trainable Classifier

Low-LevelFeature

Mid-LevelFeature

High-LevelFeature

Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]

“car”

Deep Learning = Hierarchical Compositionality

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun 6Spring 2021Sinan Kalkan

Distributed Representations Toy Example• Can we interpret each dimension?

7Slide Credit: Moontae Lee Spring 2021Sinan Kalkan

It can make mistakes

• T. Xu, J. White, S. Kalkan, H. Gunes, "Investigating Bias and Fairness in Facial Expression Recognition", ECCV2020 Workshop: ChaLearn Looking at People workshop ECCV: Fair Face Recognition and Analysis, 2020.

• J. Cheong, S. Kalkan, H. Gunes, "The Hitchhiker’s Guide to Bias and Fairness in Facial Affective Signal Processing", IEEE Signal Processing Magazine.

8Joy Buolamwini & Timnit Gebru

Beery, Sara, Grant Van Horn, and Pietro Perona. "Recognition in terra incognita." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

• Introduction to machine learning

• Towards deep learning� History of deep learning� Biological neuron� Artificial neuron� Perceptron learning� Linear classification/regression� Non-linear classification/regression� Multi-layer perceptrons

A CRASH COURSE ONMACHINE LEARNING

Facial images in this part of the lecture are from the JAFFE database: https://figshare.com/articles/journal_contribution/jaffe_desc_pdf/5245003

Spring 2021 Sinan Kalkan

Sinan Kalkan

machine learningWhat is ?

Finding “hidden” structures in data “automatically”.

Input Label

Unhappy

Input Label

Sinan Kalkan

Finding “hidden” structures in data “automatically”.

With supervision è Supervised learning

Sinan Kalkan

𝐱 ∈ ℐ

Unhappy

𝐲 ∈ 𝒪

Sinan Kalkan

𝐱 ∈ ℐ

Unhappy

𝐲 ∈ 𝒪𝐲 = 𝑓(𝐱)

… … ……

𝑓 ?

Testing

Training

Machine learning pipeline

“Learn”Learned

Models or Classifiers

Data with label(label: happy, unhappy)

- Size- Texture- Color- Histogram of

oriented gradients- SIFT- Etc.

ExtractFeatures

Predict

Happy or Unhappy

Test input

GenerativeDiscriminativeGeneral Approaches

Find separating line (in general: hyperplane)

Fit a function to data(regression).

Learn a model for each class.

Supervised

Instance Label

unhappy

Unsupervised

General Approaches (cont’d)

Extract Features

Learn a model

Extract Features

Learn a model

e.g. SVM

e.g. k-meansclustering

General Approaches (cont’d)

Generative Discriminative

Supervised

Unsupervised

Neural Networks

Support Vector MachinesNeural Networks

K-Nearest NeighborsDecision TreesRandom Forest

Mixture Models K-means

Outline for the Machine Learning Part

Supervised Learning

Unsupervised Learning

Other Forms of Learning

General Issues in Learning

Evaluation

SUPERVISED LEARNING

Supervised Machine Learning

• Find the following mapping, given the training set 𝐱!, 𝐲! !"#$ :

𝐲 = 𝑓 𝐱

𝑓: 𝕏 → 𝕐.

• Targets can be for classification or regression

• Methods: – K Nearest Neighbors, Support Vector Machines,

Decision Trees, Random Forest, Neural Networks

Input Target Value

Target Class

Supervised Machine Learning MethodsK Nearest Neighbors

• Advantages:– No training– Simple

• Disadvantages:– Slow testing time (more efficient

versions exist)– Needs a lot of memory

Fig: http://cs231n.github.io/classification/Spring 2021 Sinan Kalkan 23

k shall be 3

Supervised Machine Learning MethodsSupport Vector Machines

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

wTx + b = 0

wTx + b < 0wTx + b > 0

Binary classification can be viewed as the task of separating classes in feature space:

𝐲 = 𝑓(𝐱) = sign(𝐰𝑻𝐱 + 𝑏)

• How many lines are there?• Which one is optimal?

25Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

• Distance from example xi to the separator is 𝑟 = 𝐰!𝐱"#$𝐰

• Examples closest to the hyperplane are support vectors. • Margin ρ of the separator is the distance between support vectors.

Maximum Margin Classification• Maximizing the margin is good

according to intuition.• Implies that only support vectors

matter; other training examples are ignorable.

What we know:• w . x+ + b = +1 • w . x- + b = -1 • w . (x+-x-) = 2

“Predict Class =

zonewx+b=1

wx+b=0

wx+b=-1

wwwxx 2)(=

ρ = Margin Width

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

• Then we can formulate the optimization problem:

which can be reformulated as:

or as the following in more compact form:

Find w and b that maximizes:

𝜌 = %𝐰 ,

such that: 𝑦𝑖(𝐰𝑻𝐱𝑖 + 𝑏) ≥ 1 for all (xi, yi), i=1..n

Find w and b that minimizes:

𝚽(𝐰) = 𝐰 2 = 𝐰𝑇𝐰,

such that: 𝑦𝑖(𝐰𝑻𝐱𝑖 + 𝑏) ≥ 1 for all (xi, yi), i=1..n

min𝐰∈ℝ(,)∈ℝ

𝐰 * −6!

𝛼! 𝑦𝑖(𝐰𝑻𝐱𝑖 + 𝑏) − 1

Limitation: Not robust to noisy data: • Solution: Allow some margin of

error for each sample point

Find w and b that minimizes:

𝚽(𝐰) = 𝐰 2 = 𝐰𝑇𝐰,

such that: 𝑦𝑖(𝐰𝑻𝐱𝑖 + 𝑏) ≥ (1 − 𝜖)) for all (xi, yi), i=1..n

and 𝜖# +⋯+ 𝜖$ < 𝐶. Spring 2021 Sinan Kalkan

Take C as …

Limitation: Not suitable for non-linearly separable data

• Solution: Expand the feature space using a “kernel”, a function calculating some similarity between samples as a dot product:

𝐾 𝐱) , 𝐱* = 𝜙 𝐱) ⋅ 𝜙(𝐱*)

• Example kernels:– Polynomial

𝐾 𝐱! , 𝐱" = 𝐱! ⋅ 𝐱" + 1#

– Radial-basis Function:

𝐾 𝐱! , 𝐱" = exp −𝛾6$

(𝑥!$−𝑥"$)%

Supervised Machine Learning MethodsDecision Trees

T. Mitchell, “Machine Learning”, McGraw Hill, 1997.

Supervised Machine Learning MethodsDecision Trees

• At each node 𝑚, we divide the space based on a simple rule:

𝑥! > 𝜃+• Following the nodes, we reach

leaves.• Classification: Leaves are

labels• Regression: Leaves are

numerical values (a function of local values)

Fig: Oya Celiktutan.Spring 2021 Sinan Kalkan 34

Supervised Machine Learning MethodsRandom Forest

• Instead of a single strong classifier, have many weak classifiers and combine their answers.

Fig: https://towardsdatascience.com/understanding-random-forest-58381e0602d2

Take 9 trees

UNSUPERVISED LEARNING

Fig: http://www.nlpca.org/

Unsupervised Machine Learning

• Learning without labels• Goal: Discover a better

representation of data• Uses:

– Dimensionality reduction– Data visualization– Preparation for supervised learning– Clustering

• Methods:– Principle/Independent Component

Analysis, k-Means/x-Means Clustering, Manifold learning methods, …

Unsupervised Machine Learning MethodsClustering with k-Means

k shall be 2

Unsupervised Machine Learning MethodsClustering with k-Means

Raw data

Problem:

- Need to know k beforehand.

Unsupervised Machine Learning MethodsDimensionality Reduction

Principle Component Analysis• Analyze variance in the

data• Find new set of axes

such that along the first axis, variance is the highest; along the second, variance is the second highest etc.

Fig: http://cs231n.github.io/neural-networks-2/

Fig: http://www.nlpca.org/

OTHER FORMS OF LEARNING

Self-supervised learningWeakly-supervised learningZero-shot/Few-show learningReinforcement LearningMeta-learningLife-long learning

Self-supervisedLearning

• Form supervision from data itself by predicting one portion of data from another portion

• An introductory read: https://lilianweng.github.io/lil-log/2019/11/10/self-supervised-learning.html

Spring 2021 Sinan Kalkan 43Fig.: Doersch et al., 2015

Weakly-supervised LearningE.g. weakly supervised object detection

There is at least one penguin in this photo

Few-shot/Zero-shot Learning

Fig: https://preparingforgre.com/item/42488

Reinforcement Learning

Fig: Richard Sutton

Meta Learning

• One learning module shaping/controlling another

F. I. Dogan, I. Bozcan, M. Celik, S. Kalkan, "CINet: A Learning Based Approach to Incremental Context Modeling in Robots",IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.

Lifelong Learning

I. Dogan, H. Celikkanat, S. Kalkan, "A Deep Incremental Boltzmann Machine for Modeling Context in Robots", International Conference on Robotics and Automation (ICRA), pp. 2411-2416, IEEE, 2018.

ISSUES IN MACHINE LEARNING

Model selection

𝑓 𝑥 = 𝑊𝑥

𝑓 𝑥 =4%

𝑊%𝑥%%

𝑓 𝑥 = 𝑊#𝑥 +W&𝑥&

𝑓 𝑥 = 𝑊𝑓#(𝑥)

𝑓 𝑥 = 𝑊#𝑓#(𝑥) +W&𝑓&(𝑥)&

Model ComplexityModels range in their flexibility to fit arbitrary data

complex model

unconstrained

large capacity mayallow it to memorizedata and fail tocapture regularities

simple model

constrained

small capacity mayprevent it from representing allstructure in data

low bias

high variance

high bias

low variance

Slide Credit: Michael Mozer51Spring 2021 Sinan Kalkan

Bias-Variance Dilemma

underfit overfit

Slide Credit: Michael Mozer52Spring 2021 Sinan Kalkan

EVALUATION

How to evaluate performance

• Hold-out method:

Training set Test set

Training set Test setValidation set

Cross-validation

• K-fold cross validation

Fig: https://scikit-learn.org/stable/modules/cross_validation.htmlSpring 2021 Sinan Kalkan 55

Cross-validation

Exhaustive validation:

– Leave-p-out (LPO) cross validation• One iteration: Use p samples as the validation set

and others as training set• Repeat this for all possible combinations of p

samples.

– Leave-one-out (LOO) is LPO with p=1

Cross-validation

• Group/subject-based validation: Determine the folds based on group/subject information

• Leave one subject out (LOSO) cross validation• Divide dataset into k subsets with respect to subject identifiers

Training and test sets contain the same subjects

Training and test sets do NOT contain the same subjects

Evaluation MetricsClassification

Credit: Oya CeliktutanSpring 2021 Sinan Kalkan 58

Evaluation MetricsClassification

• Always try to quantify the predictions that have not been made.

• Recall & F-measure are frequently used

Evaluation MetricsRegression

Training Vs. Test Set Error

Test Set

Training Set

Slide Credit: Michael Mozer 61Spring 2021 Sinan Kalkan

EXAMPLE

Facial Emotion Recognition

- Using SVM and MLP on Google Colab + sci-kit learn:https://colab.research.google.com/drive/1mZimbXBAJlqgl-04phHxXRYku6EvgUOw

© alchemyapi ceng 501 deep learning week 2

Documents

ceng.41-2013fall_lessonsheets_1-28 (1)

ceng 6101 project management

ceng summer 2020 undergraduate research program...

ccm alchemyapi and real-time aggregation

ceng 465 introduction to bioinformatics

guidance notes for ceng candidates

saad haj bakry, phd, ceng, fiee 1 security policy issues...

ceng 291 project

0303-0304 f04-04 ceng

· course title semester-ju-2006/07 haramaya university...

© alchemyapi ceng 501 deep learning week 8

i : ' lu ceng - padhle

ceng 394 introduction to human-computer interaction ceng 394...

ceng-5001 technical report · pdf filenb-108 dr. asnake...

ceng 585 paper presentation

ceng ebook3 v1

visual basic tutorial - ceng anadolu

general technical requirements for road works...prof....

ceng annual communication meeting with nrc · 2012. 12....

ceng 255 laboratory experiment #0 tutorial:...