
Multi-Task Machine Learning
Wasi Ahmad

Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning


What is Multi-Task Learning?

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks.

- Wikipedia


What is Multi-Task Learning?

Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias.

- Rich Caruana, 1997


Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning


Motivation

● Learning multiple tasks jointly with the aim of mutual benefit
● Inductive transfer helps to improve a model by introducing inductive bias
  ○ Common form of inductive bias: L1 regularization
  ○ L1 regularization leads to a preference for sparse solutions
● Improves generalization on other tasks
  ○ Caused by the inductive bias provided by the auxiliary task


Web Pages Categorization

● Classify documents into categories
● The classification of each category is a task
● The tasks of predicting different categories may be latently related

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Collaborative Ordinal Regression

● The preference prediction of each user can be modeled using ordinal regression

● Some users have similar tastes and their predictions may also have similarities

● Simultaneously perform multiple predictions to make use of such similarity information

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

MTL for HIV Therapy Screening

● Hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms

● The samples available for each combination are limited
● For a patient, the prediction for one combination is a task
● Exploit the similarity information by simultaneously inferring multiple tasks

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Single Task Learning vs. Multi-Task Learning

Image courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning


Learning Methods


Source: Multi-Task Learning: Theory, Algorithms, and Applications SDM 2012

Key Question

What to Share? How to Share?


MTL Methods (based on what to share?)

● Feature-based MTL
  ○ Aims to learn common features among different tasks
● Parameter-based MTL
  ○ Learns model parameters that help learn the parameters of other tasks
● Instance-based MTL
  ○ Identifies useful data instances in one task for other tasks



MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach



Feature Learning Approach

● Why do we need to learn common feature representations?
  ○ The original features may not have enough expressive power
● Two sub-categories of the feature learning approach:
  ○ Feature transformation approach
  ○ Feature selection approach


Feature Learning Approach

● Feature transformation approach
  ○ The learned features are a linear or nonlinear transformation of the original feature representations
● Feature selection approach
  ○ Selects a subset of the original features as the learned representations
  ○ Eliminates useless features based on different criteria


Feature Transformation Approach

● Multi-task feedforward NN

A Survey on Multi-Task Learning, 2017
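The figure for this slide did not survive extraction. As a rough illustration of the architecture it depicts, here is a minimal PyTorch sketch (not from the slides; all layer sizes and names are illustrative assumptions): the hidden layers are shared across tasks, and each task keeps its own output head.

import torch
import torch.nn as nn

class MultiTaskFeedforward(nn.Module):
    # Shared hidden layers with one task-specific output head per task.
    def __init__(self, in_dim=64, hidden_dim=128, task_out_dims=(3, 5)):
        super().__init__()
        self.shared = nn.Sequential(          # layers shared by all tasks
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList(           # one output layer per task
            [nn.Linear(hidden_dim, d) for d in task_out_dims]
        )

    def forward(self, x):
        h = self.shared(x)                    # common feature representation
        return [head(h) for head in self.heads]

model = MultiTaskFeedforward()
task_outputs = model(torch.randn(8, 64))      # one prediction tensor per task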

Feature Transformation Approach

● Context-sensitive multi-task feedforward NN

Inductive transfer with context-sensitive neural networks, 2008

Feature Transformation Approach

● Regularization Framework
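The formula itself was lost in transcription; a common instantiation of this framework from the multi-task feature learning literature (a hedged reconstruction, up to notation) is:

\min_{A,\,U}\ \sum_{i=1}^{m} \sum_{j=1}^{n_i} L\big(y_j^i,\ \mathbf{a}_i^{\top} U^{\top} \mathbf{x}_j^i\big) \;+\; \lambda \,\lVert A \rVert_{2,1}^{2} \quad \text{s.t.}\quad U^{\top} U = I

Here the m tasks share the transformation U, the overall parameter matrix is W = UA, and \lVert A \rVert_{2,1} sums the \ell_2 norms of the rows of A.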

● The first term measures the empirical loss on the training sets of all the tasks
● The second term enforces the parameter matrix to be row-sparse
  ○ Equivalent to selecting features after transformation

A Survey on Multi-Task Learning, 2017

Feature Transformation Approach

● Regularization Framework

A Survey on Multi-Task Learning, 2017

Feature Selection Approach

● Regularization Framework
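The slide's formula is missing from the transcript; a standard form of this framework (a hedged reconstruction, up to notation) is:

\min_{W}\ \sum_{i=1}^{m} \sum_{j=1}^{n_i} L\big(y_j^i,\ \mathbf{w}_i^{\top} \mathbf{x}_j^i\big) \;+\; \lambda \,\lVert W \rVert_{2,1}, \qquad \lVert W \rVert_{2,1} = \sum_{k=1}^{d} \lVert \mathbf{w}^k \rVert_2

where \mathbf{w}^k is the k-th row of W (one row per feature); zeroing a row discards that feature for all tasks at once.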

● The regularizer on W enforces it to be row-sparse, which helps select the important features

A Survey on Multi-Task Learning, 2017

Feature Transformation vs. Selection

● Feature transformation fits the data better than the selection approach
● Feature transformation can generalize well
  ○ If there is no overfitting
● Feature selection has better interpretability
● Feature transformation is preferred
  ○ If an application needs better performance
● Feature selection is preferred
  ○ If the application needs decision support

A Survey on Multi-Task Learning, 2017

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Two MTL Methods for Deep Learning

● Hard parameter sharing
  ○ Generally applied by sharing the hidden layers among all tasks
  ○ Keeps several task-specific output layers
● Soft parameter sharing
  ○ Each task has its own model with its own parameters
  ○ The distance between the models' parameters is regularized to encourage them to be similar (see the sketch after this list)
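As a rough sketch of soft parameter sharing (not from the slides; the layer sizes and penalty weight are illustrative assumptions), the penalty below pulls the parameters of two task models toward each other wherever their shapes match:

import torch
import torch.nn as nn

# Two task-specific models; only the hidden layers have matching shapes.
net_a = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 3))
net_b = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 5))

def sharing_penalty(model_a, model_b):
    # Squared L2 distance between corresponding parameters of the two
    # models, skipping pairs whose shapes differ (the task-specific heads).
    return sum((pa - pb).pow(2).sum()
               for pa, pb in zip(model_a.parameters(), model_b.parameters())
               if pa.shape == pb.shape)

x = torch.randn(8, 64)
# Stand-in task losses; real code would use task-appropriate criteria.
loss = net_a(x).pow(2).mean() + net_b(x).pow(2).mean()
loss = loss + 1e-3 * sharing_penalty(net_a, net_b)
loss.backward()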


Two MTL Methods for Deep Learning

Hard parameter sharing

Soft parameter sharing

Cross-stitch Networks

Courtesy: http://ruder.io/multi-task/

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach



Low-Rank Approach

● Assumes the model parameters of different tasks share a low-rank subspace*.

● The objective function can be formulated as:
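The objective did not survive extraction. In the cited Ando and Zhang framework, each task's parameters decompose as w_t = u_t + \Theta^{\top} v_t with a shared low-dimensional subspace \Theta (\Theta\Theta^{\top} = I); a common convex surrogate for the low-rank assumption (an illustrative reconstruction, not necessarily the slide's exact equation) is trace-norm regularization:

\min_{W}\ \sum_{t=1}^{m} \sum_{j=1}^{n_t} L\big(y_j^t,\ \mathbf{w}_t^{\top} \mathbf{x}_j^t\big) \;+\; \lambda \,\lVert W \rVert_{*}

where \lVert W \rVert_{*} is the sum of the singular values of W.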

* A framework for learning predictive structures from multiple tasks and unlabeled data, 2005

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Task Clustering Approach

● First, cluster the tasks into groups
  ○ Learn a task transfer matrix
  ○ Minimize pairwise within-class distances
  ○ Maximize pairwise between-class distances
● Second, learn a classifier on the training data of the tasks in a cluster
  ○ A weighted nearest-neighbor classifier is proposed*

* Discovering Structure in Multiple Learning Tasks: The TC Algorithm, ICML 1996

Task Clustering Approach

● Learn task clusters under a regularization framework
  ○ Considering three orthogonal aspects
● Aspect 1: A global penalty measuring, on average, how large the parameters are:
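The formula is missing from the transcript; in the cited paper this mean penalty is (up to notation):

\Omega_{\text{mean}}(W) = n \,\lVert \bar{\mathbf{w}} \rVert^2, \qquad \bar{\mathbf{w}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{w}_i

with n the number of tasks.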

* Clustered Multi-Task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Learn task clusters under a regularization framework
  ○ Considering three orthogonal aspects

● Aspect 2: A measure of between-cluster variance to quantify the distance among different clusters.
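A hedged reconstruction of the between-cluster penalty from the cited paper, up to notation:

\Omega_{\text{between}}(W) = \sum_{c=1}^{r} n_c \,\lVert \bar{\mathbf{w}}_c - \bar{\mathbf{w}} \rVert^2

where r is the number of clusters, n_c the number of tasks in cluster c, and \bar{\mathbf{w}}_c the mean parameter vector of that cluster.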

* Clustered Multi-Task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Learn task clusters under a regularization framework
  ○ Considering three orthogonal aspects

● Aspect 3: A measure of within-cluster variance to quantify the compactness of task clusters.

● Final regularizer:
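Up to notation, the within-cluster penalty and the final combined regularizer from the cited paper are:

\Omega_{\text{within}}(W) = \sum_{c=1}^{r} \sum_{i \in J_c} \lVert \mathbf{w}_i - \bar{\mathbf{w}}_c \rVert^2

\Omega(W) = \varepsilon_m \,\Omega_{\text{mean}}(W) + \varepsilon_b \,\Omega_{\text{between}}(W) + \varepsilon_w \,\Omega_{\text{within}}(W)

where J_c indexes the tasks in cluster c and \varepsilon_m, \varepsilon_b, \varepsilon_w are nonnegative trade-off weights.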

* Clustered Multi-Task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Cluster tasks by identifying representative tasks
  ○ A subset of the given tasks

* Flexible clustered multi-task learning by learning representative tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Task Relation Learning Approach

● Two types of studies
  ○ Task relations are assumed to be known a priori
  ○ Task relations are learned automatically from data
● Type 1: Task relations are given
  ○ Similar task parameters are expected to be close
  ○ Utilize task similarities to design regularizers

A Survey on Multi-Task Learning, 2017

Task Relation Learning Approach

● Two types of studies
  ○ Task relations are assumed to be known a priori
  ○ Task relations are learned automatically from data
● Type 2: Learn task relations from data
  ○ Global learning model
    ■ Multi-task Gaussian process (defines a prior on the function values of the training data)
    ■ Keeps the task covariance matrix positive definite
  ○ Local learning model
    ■ E.g., a kNN classifier (learns a function as a weighted voting of neighbors)

A Survey on Multi-Task Learning, 2017
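As one concrete Type-2 formulation (an illustrative example in the spirit of the survey, not necessarily the one shown on the slide), multi-task relationship learning estimates a task covariance matrix \Omega jointly with the parameters:

\min_{W,\,\Omega}\ L(W) + \frac{\lambda_1}{2} \lVert W \rVert_F^2 + \frac{\lambda_2}{2} \operatorname{tr}\big(W \Omega^{-1} W^{\top}\big) \quad \text{s.t.}\quad \Omega \succeq 0,\ \operatorname{tr}(\Omega) = 1

The constraint keeps \Omega positive (semi)definite, echoing the requirement noted above for the Gaussian-process model.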

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Dirty Approach

● Assumption: the parameter matrix W can be decomposed into two component matrices, U and V

● Objective function can be defined as:

● g(U) and h(V) can be defined as*:
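The formulas were lost in transcription; in the cited dirty model the decomposition and penalties are, up to notation:

\min_{U,\,V}\ L(U + V) + \lambda_u \lVert U \rVert_{1,\infty} + \lambda_v \lVert V \rVert_1, \qquad \lVert U \rVert_{1,\infty} = \sum_{k} \max_{j} |u_{kj}|

so g(U) = \lambda_u \lVert U \rVert_{1,\infty} encourages rows (features) shared across all tasks, while h(V) = \lambda_v \lVert V \rVert_1 absorbs elementwise task-specific deviations.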

* A dirty model for multi-task learning, NIPS 2010

Dirty Approach

● g(U) and h(V) are defined in different ways in the following works:
  ○ Learning incoherent sparse and low-rank patterns from multiple tasks, SIGKDD 2010
  ○ Integrating low-rank and group-sparse structures for robust multi-task learning, SIGKDD 2011
  ○ Robust multi-task feature learning, SIGKDD 2012
  ○ Convex multi-task learning with flexible task clusters, ICML 2012

A Survey on Multi-Task Learning, 2017

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Multi-Level Approach

● An extension of the dirty approach
● Assumption: the parameter matrix W can be decomposed into h component matrices
● The multi-level approach has more expressive power than the dirty approach
● Represents task clusters as a tree and learns relations from the structure (a sketch follows below)
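A hedged sketch of the h-level decomposition described above, up to notation:

W = \sum_{l=1}^{h} W_l, \qquad \min_{W_1, \ldots, W_h}\ L\Big(\sum_{l=1}^{h} W_l\Big) + \sum_{l=1}^{h} g_l(W_l)

where each component W_l has its own regularizer g_l; taking h = 2 recovers the dirty approach.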

A Survey on Multi-Task Learning, 2017

Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning


Deep Relationship Networks

Courtesy: http://ruder.io/multi-task/

Fully Adaptive Feature Sharing

Courtesy: http://ruder.io/multi-task/

Cross-stitch Networks

Courtesy: http://ruder.io/multi-task/
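The figure did not survive extraction. A cross-stitch unit learns linear combinations of the activations of two task networks at corresponding layers; below is a minimal PyTorch sketch (the sizes and initialization are illustrative assumptions):

import torch
import torch.nn as nn

class CrossStitch(nn.Module):
    # Learned 2x2 mixing of activations from two task networks.
    def __init__(self):
        super().__init__()
        # Initialized near the identity so each task starts mostly unmixed.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

    def forward(self, xa, xb):
        ya = self.alpha[0, 0] * xa + self.alpha[0, 1] * xb
        yb = self.alpha[1, 0] * xa + self.alpha[1, 1] * xb
        return ya, yb

unit = CrossStitch()
ya, yb = unit(torch.randn(8, 128), torch.randn(8, 128))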

A Joint Many-Task Model


A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, EMNLP 2017

Sluice Networks

Courtesy: http://ruder.io/multi-task/

Multi-Task Sequence to Sequence Learning

Multi-Task Sequence to Sequence Learning, ICLR 2016

Multi-Task Learning for IR tasks


Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval, NAACL 2015

Multi-Task Domain Adaptation

Multi-Task Domain Adaptation for Sequence Tagging, Rep4NLP 2017

Adversarial Multi-Task Learning

Adversarial Multi-task Learning for Text Classification, ACL 2017

One Model to Learn Them All



Reference

● A Survey on Multi-Task Learning, 2017
● An Overview of Multi-Task Learning in Deep Neural Networks, 2017 (http://ruder.io/multi-task/)


Thank You

