Post on 11-Aug-2020
Multi-Task Machine Learning
Wasi Ahmad
Overview
● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning
What is Multi-Task Learning?
Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and
differences across tasks.
- Wikipedia
What is Multi-Task Learning?
Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training
signals of related tasks as an inductive bias.
- Rich Caruana, 1997
Motivation
● Learning multiple tasks jointly with the aim of mutual benefit
● Inductive transfer helps to improve a model by introducing inductive bias
○ Common form of inductive bias: L1 regularization
○ L1 regularization leads to a preference for sparse solutions
● Improves generalization on other tasks
○ Caused by the inductive bias provided by the auxiliary task
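As a concrete illustration of why L1 regularization prefers sparse solutions: the proximal operator of the L1 norm (soft-thresholding, used e.g. in proximal gradient methods) drives small coefficients exactly to zero. A minimal sketch, not from the slides:

```python
import numpy as np

def prox_l1(w, lam):
    # Soft-thresholding: the proximal operator of lam * ||w||_1.
    # Entries with |w_i| <= lam become exactly zero.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.05, -0.8, 0.3, -0.02, 1.5])
print(prox_l1(w, 0.1))  # the two small entries are zeroed out
```

An L2 penalty, by contrast, only shrinks coefficients toward zero without ever making them exactly zero.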
Web Page Categorization
● Classify documents into categories
● The classification of each category is a task
● The tasks of predicting different categories may be latently related
Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012
Collaborative Ordinal Regression
● The preference prediction of each user can be modeled using ordinal regression
● Some users have similar tastes and their predictions may also have similarities
● Simultaneously perform multiple predictions to exploit such similarity information
Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012
MTL for HIV Therapy Screening
● Hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms
● The samples available for each combination are limited
● For a patient, the prediction for one drug combination is a task
● Use the similarity information by simultaneously inferring multiple tasks
Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012
Single Task Learning vs. Multi-Task Learning
Image courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012
Learning Methods
Source: Multi-Task Learning: Theory, Algorithms, and Applications SDM 2012
Key Question
What to Share? How to Share?
MTL Methods (based on what to share?)
● Feature-based MTL
○ Aims to learn common features among different tasks
● Parameter-based MTL
○ Learns model parameters that help learn the parameters of other tasks
● Instance-based MTL
○ Identifies useful data instances in one task for other tasks
MTL Methods (based on how to share?)
● Feature-based MTL
○ Feature learning approach
○ Deep learning approach
● Parameter-based MTL
○ Low-rank approach
○ Task clustering approach
○ Task relation learning approach
○ Dirty approach
○ Multi-level approach
Feature Learning Approach
● Why do we need to learn common feature representations?
○ Original features may not have enough expressive power
● Two sub-categories of the feature learning approach
○ Feature transformation approach
○ Feature selection approach
Feature Learning Approach
● Feature transformation approach
○ The learned features are a linear or nonlinear transformation of the original feature representations
● Feature selection approach
○ Selects a subset of the original features as the learned representations
○ Eliminates useless features based on different criteria
Feature Transformation Approach
● Multi-task feedforward NN
A Survey on Multi-Task Learning, 2017
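The architecture on this slide — a feedforward network whose hidden layer is shared by all tasks, with one output head per task — can be sketched in NumPy. The dimensions and the tanh nonlinearity below are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared hidden layer: every task reuses the same transformation.
W_shared = rng.normal(size=(5, 8))            # input dim 5 -> hidden dim 8
# One task-specific linear head per task (3 tasks here).
heads = [rng.normal(size=(8, 1)) for _ in range(3)]

def forward(x):
    h = np.tanh(x @ W_shared)                 # shared representation
    return [h @ W_task for W_task in heads]   # one prediction per task

x = rng.normal(size=(4, 5))                   # a batch of 4 examples
outs = forward(x)
print([o.shape for o in outs])                # three (4, 1) outputs
```

During training, gradients from every task's loss flow into `W_shared`, which is what makes the learned hidden features common across tasks.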
Feature Transformation Approach
● Context-sensitive multi-task feedforward NN
Inductive transfer with context-sensitive neural networks, 2008
Feature Transformation Approach
● Regularization Framework
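The formula on this slide was an image and did not survive extraction. A standard instantiation of this framework, reconstructed from the multi-task feature learning model discussed in the cited survey (the notation here is my own), is:

```latex
\min_{A,\,U} \;\; \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i}
  L\!\left(y^{i}_{j},\; \mathbf{a}_i^{T} U^{T} \mathbf{x}^{i}_{j}\right)
  \;+\; \lambda \,\|A\|_{2,1}
```

Here U transforms the original features, A = (a_1, …, a_m) holds the per-task parameters in the transformed space, and the ℓ2,1 norm sums the ℓ2 norms of the rows of A, pushing whole rows to zero.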
● The first term measures the empirical loss on the training sets of all the tasks
● The second term enforces the parameter matrix to be row-sparse
○ Equivalent to selecting features after transformation
A Survey on Multi-Task Learning, 2017
Feature Selection Approach
● Regularization Framework
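The equation itself was an image and is missing. The usual formulation of multi-task feature selection with an ℓ2,1 regularizer, reconstructed from the cited survey (notation mine), is:

```latex
\min_{W} \;\; \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i}
  L\!\left(y^{i}_{j},\; \mathbf{w}_i^{T} \mathbf{x}^{i}_{j}\right)
  \;+\; \lambda \,\|W\|_{2,1},
\qquad
\|W\|_{2,1} = \sum_{k} \big\|\mathbf{w}^{k}\big\|_2
```

Row k of W collects feature k's weight across all tasks, so driving an entire row to zero discards that feature for every task simultaneously.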
● The regularizer on W enforces W to be row-sparse, which helps select important features
A Survey on Multi-Task Learning, 2017
Feature Transformation vs. Selection
● The feature transformation approach fits data better than the selection approach
● Feature transformation can generalize well
○ If there is no overfitting
● Feature selection has better interpretability
● Feature transformation is preferred
○ If an application needs better performance
● Feature selection is preferred
○ If the application needs decision support
A Survey on Multi-Task Learning, 2017
Two MTL Methods for Deep Learning
● Hard parameter sharing
○ Generally applied by sharing the hidden layers between all tasks
○ Keeps several task-specific output layers
● Soft parameter sharing
○ Each task has its own model with its own parameters
○ The distance between the models' parameters is regularized to encourage the parameters to be similar
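The soft-sharing regularizer can be illustrated with a small sketch (a minimal example under my own assumptions, here using the squared ℓ2 distance between two tasks' weight matrices; other distances are also used in practice):

```python
import numpy as np

def soft_sharing_penalty(W_a, W_b, lam=0.1):
    # Squared L2 distance between the two tasks' weight matrices.
    # Adding this term to the joint training loss nudges the two
    # parameter sets toward each other without forcing them to be equal.
    return lam * np.sum((W_a - W_b) ** 2)

penalty = soft_sharing_penalty(np.ones((3, 2)), np.zeros((3, 2)))
print(penalty)  # lam * 6
```

Hard sharing needs no such term: the shared layers are literally the same parameters, so only the task-specific heads differ.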
Two MTL Methods for Deep Learning
Hard parameter sharing
Soft parameter sharing
Cross-stitch Networks
Courtesy: http://ruder.io/multi-task/
Low-Rank Approach
● Assumes the model parameters of different tasks share a low-rank subspace*.
● The objective function can be formulated as:
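The objective was an image and is missing. In the cited framework (Ando & Zhang, 2005), each task's parameter vector is the sum of a task-private part and a part living in a shared low-dimensional subspace; reconstructed with my own notation:

```latex
\min_{\{\mathbf{u}_i,\mathbf{v}_i\},\,\Theta} \;
  \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i}
    L\!\left(y^{i}_{j},\; \mathbf{w}_i^{T}\mathbf{x}^{i}_{j}\right)
  + \alpha \sum_{i=1}^{m} \|\mathbf{u}_i\|_2^2,
\qquad
\mathbf{w}_i = \mathbf{u}_i + \Theta^{T}\mathbf{v}_i,
\quad \Theta\Theta^{T} = I
```

A common convex relaxation of the same idea instead penalizes the trace (nuclear) norm of the stacked parameter matrix W, which encourages W to be low-rank.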
* A framework for learning predictive structures from multiple tasks and unlabeled data, 2005
Task Clustering Approach
● First, cluster the tasks into groups
○ Learn a task transfer matrix
○ Minimize pairwise within-class distances
○ Maximize pairwise between-class distances
● Second, learn a classifier on the training data of the tasks in a cluster
○ A weighted nearest neighbor classifier is proposed*
* Discovering Structure in Multiple Learning Tasks: The TC Algorithm, ICML 1996
Task Clustering Approach
● Learn task clusters under a regularization framework
○ Considering three orthogonal aspects
● Aspect 1: A global penalty measuring on average how large the parameters are:
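The penalty itself was an image and is missing; in the cited NIPS 2008 formulation it is, as best I can reconstruct it (notation mine):

```latex
\Omega_{\text{mean}}(W) = m\,\|\bar{\mathbf{w}}\|_2^2,
\qquad
\bar{\mathbf{w}} = \frac{1}{m}\sum_{i=1}^{m} \mathbf{w}_i
```

where m is the number of tasks and w̄ is the mean of all task parameter vectors.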
* Clustered Multi-task Learning: A Convex Formulation, NIPS 2008
Task Clustering Approach
● Learn task clusters under a regularization framework
○ Considering three orthogonal aspects
● Aspect 2: A measure of between-cluster variance to quantify the distance among different clusters.
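The measure itself was an image and is missing; reconstructed from the cited NIPS 2008 formulation (notation mine):

```latex
\Omega_{\text{between}}(W) = \sum_{c=1}^{r} m_c\,\|\bar{\mathbf{w}}_c - \bar{\mathbf{w}}\|_2^2
```

where r is the number of clusters, m_c is the number of tasks in cluster c, w̄_c is the mean of the parameter vectors in cluster c, and w̄ is the global mean over all tasks.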
* Clustered Multi-task Learning: A Convex Formulation, NIPS 2008
Task Clustering Approach
● Learn task clusters under a regularization framework
○ Considering three orthogonal aspects
● Aspect 3: A measure of within-cluster variance to quantify the compactness of task clusters.
● Final regularizer:
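Both formulas were images and are missing; reconstructed from the cited NIPS 2008 formulation (notation mine):

```latex
\Omega_{\text{within}}(W) = \sum_{c=1}^{r} \sum_{i \in J_c} \|\mathbf{w}_i - \bar{\mathbf{w}}_c\|_2^2
```

where J_c indexes the tasks in cluster c. The final regularizer combines the three aspects with nonnegative trade-off weights:

```latex
\Omega(W) = \alpha\,\Omega_{\text{mean}}(W)
          + \beta\,\Omega_{\text{between}}(W)
          + \gamma\,\Omega_{\text{within}}(W)
```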
* Clustered Multi-task Learning: A Convex Formulation, NIPS 2008
Task Clustering Approach
● Cluster tasks by identifying representative tasks
○ A subset of the given tasks
* Flexible clustered multi-task learning by learning representative tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016
Task Relation Learning Approach
● Two types of studies
○ Task relations are assumed to be known as a priori information
○ Task relations are learned automatically from data
● Type 1: Task relations are given
○ Parameters of similar tasks are expected to be close
○ Utilize task similarities to design regularizers
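A typical similarity-based regularizer of this kind (a schematic example, not taken from the slides; s_ij denotes the given pairwise task similarity) is:

```latex
\min_{W} \; \sum_{i=1}^{m} \mathcal{L}_i(\mathbf{w}_i)
  \;+\; \frac{\lambda}{2} \sum_{i=1}^{m}\sum_{j=1}^{m}
        s_{ij}\,\|\mathbf{w}_i - \mathbf{w}_j\|_2^2
```

The larger the similarity s_ij, the more strongly the parameters of tasks i and j are pulled together.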
A Survey on Multi-Task Learning, 2017
Task Relation Learning Approach
● Two types of studies
○ Task relations are assumed to be known as a priori information
○ Task relations are learned automatically from data
● Type 2: Learn task relations from data
○ Global learning model
■ Multi-task Gaussian process (defined as a prior on the functional values for the training data)
■ Keeps the task covariance matrix positive definite
○ Local learning model
■ E.g., a kNN classifier (learning function as a weighted voting of neighbors)
A Survey on Multi-Task Learning, 2017
Dirty Approach
● Assumption: the parameter matrix W can be decomposed into two component matrices U and V
● Objective function can be defined as:
● g(U) and h(V) can be defined as*:
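Both formulas were images and are missing. Reconstructed from the cited NIPS 2010 dirty model, as best I recall it (notation mine):

```latex
\min_{U,\,V} \;\; \mathcal{L}(U + V) \;+\; \lambda_1\, g(U) \;+\; \lambda_2\, h(V)
```

with one common choice being g(U) = ‖U‖_{1,∞} (the sum over rows of the per-row maximum, which encourages a feature support shared across tasks) and h(V) = ‖V‖_{1,1} (elementwise ℓ1, which captures task-private sparsity).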
* A dirty model for multi-task learning, NIPS 2010
Dirty Approach
● g(U) and h(V) are defined in different ways in the following works
○ Learning incoherent sparse and low-rank patterns from multiple tasks, SIGKDD 2010
○ Integrating low-rank and group-sparse structures for robust multi-task learning, SIGKDD 2011
○ Robust multi-task feature learning, SIGKDD 2012
○ Convex multi-task learning with flexible task clusters, ICML 2012
A Survey on Multi-Task Learning, 2017
Multi-Level Approach
● An extension of the dirty approach
● Assumption: the parameter matrix W can be decomposed into h component matrices
● The multi-level approach has more expressive power than the dirty approach
● Represent task clusters as a tree and learn relations from its structure
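The decomposition can be written schematically as follows (a sketch of the general form described in the cited survey, with notation of my own; each component W_i receives its own regularizer g_i):

```latex
W = \sum_{i=1}^{h} W_i,
\qquad
\min_{\{W_i\}} \; \mathcal{L}\!\Big(\sum_{i=1}^{h} W_i\Big)
  + \sum_{i=1}^{h} \lambda_i\, g_i(W_i)
```

The dirty approach is the special case h = 2.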
A Survey on Multi-Task Learning, 2017
Deep Relationship Networks
Courtesy: http://ruder.io/multi-task/
Fully Adaptive Feature Sharing
Courtesy: http://ruder.io/multi-task/
Cross-stitch Networks
Courtesy: http://ruder.io/multi-task/
A Joint Many-Task Model
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, EMNLP 2017
Sluice Networks
Courtesy: http://ruder.io/multi-task/
Multi-Task Sequence to Sequence Learning
Multi-Task Sequence to Sequence Learning, ICLR 2016
Multi-Task Learning for IR Tasks
Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval, NAACL 2015
Multi-Task Domain Adaptation
Multi-Task Domain Adaptation for Sequence Tagging, Rep4NLP 2017
Adversarial Multi-Task Learning
Adversarial Multi-task Learning for Text Classification, ACL 2017
One Model to Learn Them All
Reference
● A Survey on Multi-Task Learning
● An Overview of Multi-Task Learning in Deep Neural Networks
Thank You