
Few-Shot and Zero-Shot Learning

Xiaolong Wang

This Class

• Few-shot learning

• Meta-learning for few-shot learning

• Zero-shot learning

Few-shot Learning

The Problem

• Humans can learn a novel concept from a few samples

• Goal: let machine learning algorithms learn from a few samples

(Example image: a Saiga Antelope, a novel concept that humans can recognize from just a few samples.)

Introduction

• Issue: learning with insufficient data causes overfitting

• Intuition: humans can learn quickly because they have a lot of relevant experience

• Solution: transfer learning

• Base classes: classes with sufficient samples (the training set)

• Novel classes: classes with only a few samples

N-way K-shot Task

• N novel classes

• Support-set: N×K images (the training set)

• Query-set: images to classify, typically N×Q images

• Common evaluation protocol: tasks are sampled from the evaluation dataset

• “task” / “FSL task” denotes this setup by default (a sampling sketch follows below)
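To make the protocol concrete, here is a minimal sketch of how an N-way K-shot task could be sampled. The dataset layout (a dict mapping each class name to its list of images) is an assumption for illustration, not part of the lecture.

```python
import random

def sample_task(dataset, n_way=5, k_shot=1, q_query=15):
    """Sample one N-way K-shot task (episode).

    `dataset` is assumed to map class name -> list of images (illustrative).
    Returns (support, query) lists of (image, label) pairs.
    """
    classes = random.sample(list(dataset.keys()), n_way)  # pick N novel classes
    support, query = [], []
    for label, cls in enumerate(classes):
        images = random.sample(dataset[cls], k_shot + q_query)
        support += [(img, label) for img in images[:k_shot]]  # N*K training samples
        query += [(img, label) for img in images[k_shot:]]    # N*Q samples to classify
    return support, query
```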

Main Types of FSL Algorithms

• Transferring a standard classification model
  • Nearest neighbor / centroid
  • Fine-tuning

• Meta-learning
  • Metric-based
  • Optimization-based

Transferring Standard Classification Model

Baseline #1: Nearest centroid

1. Train a classifier for all base classes (1st stage)

2. Remove the last FC layer to get a feature encoder

3. In an FSL task: compute the mean feature for each class in the support set, then classify the query set by nearest neighbor (sketched below)

• The mean feature is the “prototype” of a class

• It can also be viewed as the estimated weights of an FC layer

• Distance: squared Euclidean or cosine similarity
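A minimal sketch of steps 2-3, assuming a pretrained `encoder` that maps a batch of images to feature vectors (all names and shapes are illustrative):

```python
import torch

def nearest_centroid_predict(encoder, support_x, support_y, query_x, n_way):
    """Classify query images by distance to per-class mean features (prototypes)."""
    with torch.no_grad():
        s_feat = encoder(support_x)   # (N*K, D) support features
        q_feat = encoder(query_x)     # (N*Q, D) query features
    # Prototype = mean feature of each class in the support set.
    protos = torch.stack([s_feat[support_y == c].mean(0) for c in range(n_way)])
    # Squared Euclidean distance; cosine similarity is the common alternative.
    dists = torch.cdist(q_feat, protos) ** 2  # (N*Q, N)
    return dists.argmin(dim=1)                # index of the nearest prototype
```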

Improving #1: Cosine Classifier

Use cosine similarity for both:

• Training (1st stage): replace the last FC layer with a cosine classifier

• Inference: cosine distance to the prototypes

The scale factor 𝜏 is a learnable parameter.

Gidaris et al. Dynamic Few-Shot Visual Learning without Forgetting. CVPR 2018
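Written out (a reconstruction consistent with the cited paper, not copied from the slide), the cosine classifier scores class $j$ as

$$s_j(x) = \tau \cdot \frac{w_j^{\top} f(x)}{\lVert w_j \rVert \, \lVert f(x) \rVert}$$

where $f(x)$ is the encoder feature, $w_j$ is the class weight (the prototype at inference time), and $\tau$ is the learnable scale.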

Results: ablation study on the validation set in the generalized FSL setting (focus on the Novel classes only).

Baseline #2: Fine-tuning

1. Train a classifier for all base classes (1st stage)

2. In an FSL task: fine-tune with the support set

Fine-tuning may cause overfitting.
Option: fine-tune only the last FC layer (sketched below).
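A minimal PyTorch sketch of the last-FC-layer option; `model` is assumed to be a torchvision-style network with a `.fc` head, and `support_loader` iterates over the support set (both are illustrative placeholders):

```python
import torch
import torch.nn as nn

def finetune_last_fc(model, support_loader, n_way, lr=0.01, steps=100):
    """Freeze the pretrained encoder and fit a fresh FC head on the support set."""
    for p in model.parameters():
        p.requires_grad = False                         # keep base-class features fixed
    model.fc = nn.Linear(model.fc.in_features, n_way)   # new head for the novel classes
    opt = torch.optim.SGD(model.fc.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for x, y in support_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```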

Improving #2: “Baseline++”

• Use the cosine classifier for both the 1st stage and fine-tuning

Chen et al. A Closer Look at Few-shot Classification. ICLR 2019

How do we get a good representation for FSL?
Idea: let the learning objective describe our goal, i.e., directly optimize towards the FSL tasks.

Meta-Learning for FSL

• Learn the model by optimizing towards FSL tasks sampled from the training set (base classes)

• A task: N-way K-shot (and Q-query), i.e., N×(K+Q) images

1. Sample a task (support-set + query-set) from the base classes
2. A model processes the support-set, then classifies the samples in the query-set
3. Compute the query-set classification loss (using the ground truths) and optimize towards it

(A training-loop sketch follows below.)

http://web.stanford.edu/class/cs330/
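A minimal sketch of this episodic training loop, reusing the `sample_task` helper from earlier; `meta_model` is an assumed placeholder whose forward pass takes the support set and query images and returns query logits (step 2 must be differentiable):

```python
import torch

def meta_train(meta_model, base_dataset, num_episodes=10_000, lr=1e-3):
    """Optimize the model directly towards sampled FSL tasks (steps 1-3)."""
    opt = torch.optim.Adam(meta_model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(num_episodes):
        support, query = sample_task(base_dataset)   # step 1
        query_x = torch.stack([x for x, _ in query])
        query_y = torch.tensor([y for _, y in query])
        logits = meta_model(support, query_x)        # step 2: differentiable
        loss = loss_fn(logits, query_y)              # step 3: query-set loss
        opt.zero_grad()
        loss.backward()
        opt.step()
```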

The key is Step 2: a differentiable algorithm.

• A differentiable algorithm ↔ a meta-learning method

• Metric-based: Get features of the support-set, classify the query-set by feature comparison

• Optimization-based: the model optimizes towards the support-set for a few steps, then classifies the query-set

Metric-Based Meta-Learning

Matching Network

• Get features of the support / query images

• Classify query images by nearest neighbor (cosine distance)

Vinyals et al. Matching Networks for One Shot Learning. NeurIPS 2016

Prototypical Network

• Get the mean feature for each class in the support set

• Classify query images to the nearest class center

Snell et al. Prototypical Networks for Few-shot Learning. NeurIPS 2017

Prototypical Network simplifies Matching Network. Differences:

1. Merge class features by averaging, instead of 1-to-1 matching

2. Squared Euclidean distance instead of cosine distance

(A loss sketch follows below.)
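A minimal sketch of one Prototypical Network training episode. Unlike the inference-only nearest-centroid sketch above, everything here stays differentiable so the encoder can be meta-trained; `encoder` and the tensors are assumed placeholders:

```python
import torch
import torch.nn.functional as F

def protonet_episode_loss(encoder, support_x, support_y, query_x, query_y, n_way):
    """Query-set cross-entropy over negative squared distances to prototypes."""
    s_feat = encoder(support_x)                 # (N*K, D), gradients flow through
    q_feat = encoder(query_x)                   # (N*Q, D)
    protos = torch.stack([s_feat[support_y == c].mean(0) for c in range(n_way)])
    logits = -torch.cdist(q_feat, protos) ** 2  # closer prototype -> larger logit
    return F.cross_entropy(logits, query_y)
```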

Relation Network

• Relation net g: a learnable comparison module (sketched below)

Sung et al. Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018
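A minimal sketch of the idea: concatenate a query feature with each class feature and let g output a relation score. The small MLP used as g here is my simplification (the paper uses a convolutional module); per the cited paper, g is trained by regressing the score to 1 for the matching class and 0 otherwise.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Learnable comparison g: (query, class) feature pairs -> scores in [0, 1]."""
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.g = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, q_feat, class_feats):
        # q_feat: (D,) one query; class_feats: (N, D) one feature per class.
        pairs = torch.cat([q_feat.expand(class_feats.size(0), -1), class_feats], dim=1)
        return self.g(pairs).squeeze(1)  # (N,) relation scores
```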

Results

Metric-based meta-learning:

Learning how to do matching.

Optimization-Based Meta-Learning

MAML

• Learn an initialization θ that works well for per-task fine-tuning

Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017

• The computation of the fine-tuning process is differentiable

• θ ← θ − β ∇_θ L(θ − α ∇_θ L(θ, S), Q)

• S: support-set

• Q: query-set

• This is a 2nd-order gradient update (sketched below)
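A minimal MAML sketch for a single task, using `torch.func.functional_call` to evaluate the model at the adapted parameters; `model`, `loss_fn`, and the task tensors are assumed placeholders:

```python
import torch
from torch.func import functional_call

def maml_task_loss(model, loss_fn, support, query, inner_lr=0.01):
    """Meta-objective for one task: query loss at support-adapted parameters."""
    params = dict(model.named_parameters())
    s_x, s_y = support
    q_x, q_y = query
    # Inner step: one gradient step on the support set S.
    s_loss = loss_fn(functional_call(model, params, (s_x,)), s_y)
    grads = torch.autograd.grad(s_loss, params.values(), create_graph=True)  # 2nd order
    adapted = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}
    # Outer objective: loss on the query set Q at theta' = theta - alpha * grad.
    return loss_fn(functional_call(model, adapted, (q_x,)), q_y)

# Outer update (theta <- theta - beta * gradient of the query loss):
#   loss = maml_task_loss(model, F.cross_entropy, (s_x, s_y), (q_x, q_y))
#   meta_opt.zero_grad(); loss.backward(); meta_opt.step()
```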

MAML

• Works only for small networks

• Very task-dependent: performance can vary a lot across tasks

• Can perform worse than simple fine-tuning on larger datasets and networks

Summary of Few-Shot Learning

• Few-Shot learning is an important problem

• Meta-learning makes the form of training and testing consistent (both are FSL tasks)

• Challenges:
  • Scalability of the meta-learning algorithms
  • More practical settings: generalized FSL, any-shot / higher-shot
  • Discrepancy between base classes and novel classes

Zero-Shot Learning

Word2vec Embeddings

Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. NeurIPS 2013

Skip-gram model
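As a reminder (reconstructed from the cited paper, not from the slide), the skip-gram model maximizes the average log-probability of the context words within a window $c$ around each center word:

$$\frac{1}{T}\sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)$$

Words that appear in similar contexts therefore get similar embeddings, which is the "implicit relation" between classes exploited below.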


DeViSE

Frome et al. DeViSE: A Deep Visual-Semantic Embedding Model. NeurIPS 2013
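In brief (a reconstruction from the cited paper): DeViSE trains a visual model to map an image into the word-embedding space of its label, using a hinge rank loss. With image feature $v(x)$, learned projection $M$, label embeddings $t_j$, and margin $m$:

$$L(x, y) = \sum_{j \ne y} \max\bigl(0,\; m - t_y^{\top} M v(x) + t_j^{\top} M v(x)\bigr)$$

At test time an image is classified by the nearest label embedding, so classes never seen in training can still be predicted from their word vectors.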


• Use the implicit relation between words with word embeddings

• How about using explicit relation in a knowledge graph?

Using Knowledge Graphs

• Never Ending Language Learning (NELL) Knowledge Graph

https://rtw.ml.cmu.edu

Zero-Shot Recognition

• Word embedding + knowledge graph

• Graph Convolutional Network (GCN)

• Training classes: x₁, x₂; testing class: x₃

(The GCN maps 300-d word embeddings to 2048-d classifier weights; a sketch follows below.)
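A minimal sketch of the idea, assuming a normalized adjacency matrix `adj` built from the knowledge graph (all names and sizes illustrative): GCN layers propagate the 300-d word embeddings over the graph and regress the 2048-d classifier weights of every class.

```python
import torch
import torch.nn as nn

class GCNWeightPredictor(nn.Module):
    """Propagate word embeddings over a knowledge graph to predict classifier weights."""
    def __init__(self, adj, in_dim=300, hidden=512, out_dim=2048):
        super().__init__()
        self.adj = adj                       # (C, C) normalized adjacency, C = #classes
        self.l1 = nn.Linear(in_dim, hidden)
        self.l2 = nn.Linear(hidden, out_dim)

    def forward(self, word_emb):             # word_emb: (C, 300)
        h = torch.relu(self.l1(self.adj @ word_emb))  # mix each node with its neighbors
        return self.l2(self.adj @ h)         # (C, 2048): one weight vector per class

# Training: regress the predicted weights of *seen* classes (x1, x2) onto the weights
# of a pretrained classifier; at test time, use the predicted weights of unseen x3.
```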


Results

(Evaluation at increasing scale: 2.5K, 8.8K, and 21K classes.)

This Class

• Few-shot learning

• Meta-learning for few-shot learning

• Zero-shot learning
