
Multi-Task Machine Learning
Wasi Ahmad

Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning


What is Multi-Task Learning?

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks.

- Wikipedia


What is Multi-Task Learning?

Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias.

- Rich Caruana, 1997


Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning


Motivation

● Learning multiple tasks jointly with the aim of mutual benefit
● Inductive transfer helps to improve a model by introducing inductive bias
  ○ Common form of inductive bias: L1 regularization
  ○ L1 regularization leads to a preference for sparse solutions
● Improves generalization on other tasks
  ○ Caused by the inductive bias provided by the auxiliary task


Web Pages Categorization

● Classify documents into categories
● The classification of each category is a task
● The tasks of predicting different categories may be latently related

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Collaborative Ordinal Regression

● The preference prediction of each user can be modeled using ordinal regression

● Some users have similar tastes and their predictions may also have similarities

● Simultaneously perform multiple predictions to make use of such similarity information

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

MTL for HIV Therapy Screening

● Hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms

● The samples available for each combination are limited
● For a patient, the prediction for one combination is a task
● Exploit the similarity information by simultaneously inferring multiple tasks

Courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Single Task Learning vs. Multi-Task Learning

Image courtesy: Multi-Task Learning: Theory, Algorithms, and Applications, SDM 2012

Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning


Learning Methods


Source: Multi-Task Learning: Theory, Algorithms, and Applications SDM 2012

Key Question

What to Share? How to Share?


MTL Methods (based on what to share?)

● Feature-based MTL
  ○ Aims to learn common features among different tasks
● Parameter-based MTL
  ○ Learns model parameters that help learn the parameters of other tasks
● Instance-based MTL
  ○ Identifies useful data instances in one task for other tasks



MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach



Feature Learning Approach

● Why do we need to learn common feature representations?
  ○ The original features may not have enough expressive power
● Two sub-categories of the feature learning approach:
  ○ Feature transformation approach
  ○ Feature selection approach


Feature Learning Approach

● Feature transformation approach
  ○ The learned features are a linear or nonlinear transformation of the original feature representations
● Feature selection approach
  ○ Selects a subset of the original features as the learned representations
  ○ Eliminates useless features based on different criteria


Feature Transformation Approach

● Multi-task feedforward NN

A Survey on Multi-Task Learning, 2017
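The figure for this slide did not survive extraction. As a rough illustration of the architecture it depicts, here is a minimal PyTorch sketch (not from the slides; all layer sizes and names are illustrative assumptions): the hidden layers are shared across tasks, and each task keeps its own output head.

import torch
import torch.nn as nn

class MultiTaskFeedforward(nn.Module):
    # Shared hidden layers with one task-specific output head per task.
    def __init__(self, in_dim=64, hidden_dim=128, task_out_dims=(3, 5)):
        super().__init__()
        self.shared = nn.Sequential(          # layers shared by all tasks
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList(           # one output layer per task
            [nn.Linear(hidden_dim, d) for d in task_out_dims]
        )

    def forward(self, x):
        h = self.shared(x)                    # common feature representation
        return [head(h) for head in self.heads]

model = MultiTaskFeedforward()
task_outputs = model(torch.randn(8, 64))      # one prediction tensor per task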

Feature Transformation Approach

● Context-sensitive multi-task feedforward NN

Inductive transfer with context-sensitive neural networks, 2008

Feature Transformation Approach

● Regularization Framework
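The formula itself was lost in transcription; a common instantiation of this framework from the multi-task feature learning literature (a hedged reconstruction, up to notation) is:

\min_{A,\,U}\ \sum_{i=1}^{m} \sum_{j=1}^{n_i} L\big(y_j^i,\ \mathbf{a}_i^{\top} U^{\top} \mathbf{x}_j^i\big) \;+\; \lambda \,\lVert A \rVert_{2,1}^{2} \quad \text{s.t.}\quad U^{\top} U = I

Here the m tasks share the transformation U, the overall parameter matrix is W = UA, and \lVert A \rVert_{2,1} sums the \ell_2 norms of the rows of A.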

● The first term measures the empirical loss on the training sets of all the tasks
● The second term enforces the parameter matrix to be row-sparse
  ○ Equivalent to selecting features after transformation

A Survey on Multi-Task Learning, 2017

Feature Transformation Approach

● Regularization Framework

A Survey on Multi-Task Learning, 2017

Feature Selection Approach

● Regularization Framework
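The slide's formula is missing from the transcript; a standard form of this framework (a hedged reconstruction, up to notation) is:

\min_{W}\ \sum_{i=1}^{m} \sum_{j=1}^{n_i} L\big(y_j^i,\ \mathbf{w}_i^{\top} \mathbf{x}_j^i\big) \;+\; \lambda \,\lVert W \rVert_{2,1}, \qquad \lVert W \rVert_{2,1} = \sum_{k=1}^{d} \lVert \mathbf{w}^k \rVert_2

where \mathbf{w}^k is the k-th row of W (one row per feature); zeroing a row discards that feature for all tasks at once.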

● The regularizer on W enforces it to be row-sparse, which helps select the important features

A Survey on Multi-Task Learning, 2017

Feature Transformation vs. Selection

● Feature transformation fits the data better than the selection approach
● Feature transformation can generalize well
  ○ If there is no overfitting
● Feature selection has better interpretability
● Feature transformation is preferred
  ○ If an application needs better performance
● Feature selection is preferred
  ○ If the application needs decision support

A Survey on Multi-Task Learning, 2017

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Two MTL Methods for Deep Learning

● Hard parameter sharing
  ○ Generally applied by sharing the hidden layers among all tasks
  ○ Keeps several task-specific output layers
● Soft parameter sharing
  ○ Each task has its own model with its own parameters
  ○ The distance between the models' parameters is regularized to encourage them to be similar (see the sketch after this list)
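As a rough sketch of soft parameter sharing (not from the slides; the layer sizes and penalty weight are illustrative assumptions), the penalty below pulls the parameters of two task models toward each other wherever their shapes match:

import torch
import torch.nn as nn

# Two task-specific models; only the hidden layers have matching shapes.
net_a = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 3))
net_b = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 5))

def sharing_penalty(model_a, model_b):
    # Squared L2 distance between corresponding parameters of the two
    # models, skipping pairs whose shapes differ (the task-specific heads).
    return sum((pa - pb).pow(2).sum()
               for pa, pb in zip(model_a.parameters(), model_b.parameters())
               if pa.shape == pb.shape)

x = torch.randn(8, 64)
# Stand-in task losses; real code would use task-appropriate criteria.
loss = net_a(x).pow(2).mean() + net_b(x).pow(2).mean()
loss = loss + 1e-3 * sharing_penalty(net_a, net_b)
loss.backward()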


Two MTL Methods for Deep Learning

Hard parameter sharing

Soft parameter sharing

Cross-stitch Networks

Courtesy: http://ruder.io/multi-task/

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach



Low-Rank Approach

● Assumes the model parameters of different tasks share a low-rank subspace*.

● The objective function can be formulated as:
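The objective did not survive extraction. In the cited Ando and Zhang framework, each task's parameters decompose as w_t = u_t + \Theta^{\top} v_t with a shared low-dimensional subspace \Theta (\Theta\Theta^{\top} = I); a common convex surrogate for the low-rank assumption (an illustrative reconstruction, not necessarily the slide's exact equation) is trace-norm regularization:

\min_{W}\ \sum_{t=1}^{m} \sum_{j=1}^{n_t} L\big(y_j^t,\ \mathbf{w}_t^{\top} \mathbf{x}_j^t\big) \;+\; \lambda \,\lVert W \rVert_{*}

where \lVert W \rVert_{*} is the sum of the singular values of W.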

* A framework for learning predictive structures from multiple tasks and unlabeled data, 2005

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Task Clustering Approach

● First, cluster the tasks into groups
  ○ Learn a task transfer matrix
  ○ Minimize pairwise within-class distances
  ○ Maximize pairwise between-class distances
● Second, learn a classifier on the training data of the tasks in a cluster
  ○ A weighted nearest-neighbor classifier is proposed*

* Discovering Structure in Multiple Learning Tasks: The TC Algorithm, ICML 1996

Task Clustering Approach

● Learn task clusters under a regularization framework
  ○ Considering three orthogonal aspects
● Aspect 1: A global penalty measuring, on average, how large the parameters are:
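The formula is missing from the transcript; in the cited paper this mean penalty is (up to notation):

\Omega_{\text{mean}}(W) = n \,\lVert \bar{\mathbf{w}} \rVert^2, \qquad \bar{\mathbf{w}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{w}_i

with n the number of tasks.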

* Clustered Multi-Task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Learn task clusters under a regularization framework
  ○ Considering three orthogonal aspects

● Aspect 2: A measure of between-cluster variance to quantify the distance among different clusters.
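A hedged reconstruction of the between-cluster penalty from the cited paper, up to notation:

\Omega_{\text{between}}(W) = \sum_{c=1}^{r} n_c \,\lVert \bar{\mathbf{w}}_c - \bar{\mathbf{w}} \rVert^2

where r is the number of clusters, n_c the number of tasks in cluster c, and \bar{\mathbf{w}}_c the mean parameter vector of that cluster.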

* Clustered Multi-Task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Learn task clusters under a regularization framework
  ○ Considering three orthogonal aspects

● Aspect 3: A measure of within-cluster variance to quantify the compactness of task clusters.

● Final regularizer:
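Up to notation, the within-cluster penalty and the final combined regularizer from the cited paper are:

\Omega_{\text{within}}(W) = \sum_{c=1}^{r} \sum_{i \in J_c} \lVert \mathbf{w}_i - \bar{\mathbf{w}}_c \rVert^2

\Omega(W) = \varepsilon_m \,\Omega_{\text{mean}}(W) + \varepsilon_b \,\Omega_{\text{between}}(W) + \varepsilon_w \,\Omega_{\text{within}}(W)

where J_c indexes the tasks in cluster c and \varepsilon_m, \varepsilon_b, \varepsilon_w are nonnegative trade-off weights.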

* Clustered Multi-Task Learning: A Convex Formulation, NIPS 2008

Task Clustering Approach

● Cluster tasks by identifying representative tasks
  ○ A subset of the given tasks

* Flexible clustered multi-task learning by learning representative tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Task Relation Learning Approach

● Two types of studies
  ○ Task relations are assumed to be known a priori
  ○ Task relations are learned automatically from data
● Type 1: Task relations are given
  ○ Similar task parameters are expected to be close
  ○ Utilize task similarities to design regularizers

A Survey on Multi-Task Learning, 2017

Task Relation Learning Approach

● Two types of studies
  ○ Task relations are assumed to be known a priori
  ○ Task relations are learned automatically from data
● Type 2: Learn task relations from data
  ○ Global learning model
    ■ Multi-task Gaussian process (defines a prior on the function values of the training data)
    ■ Keeps the task covariance matrix positive definite
  ○ Local learning model
    ■ E.g., a kNN classifier (learns a function as a weighted voting of neighbors)

A Survey on Multi-Task Learning, 2017
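As one concrete Type-2 formulation (an illustrative example in the spirit of the survey, not necessarily the one shown on the slide), multi-task relationship learning estimates a task covariance matrix \Omega jointly with the parameters:

\min_{W,\,\Omega}\ L(W) + \frac{\lambda_1}{2} \lVert W \rVert_F^2 + \frac{\lambda_2}{2} \operatorname{tr}\big(W \Omega^{-1} W^{\top}\big) \quad \text{s.t.}\quad \Omega \succeq 0,\ \operatorname{tr}(\Omega) = 1

The constraint keeps \Omega positive (semi)definite, echoing the requirement noted above for the Gaussian-process model.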

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Dirty Approach

● Assumption: the parameter matrix W can be decomposed into two component matrices, U and V

● Objective function can be defined as:

● g(U) and h(V) can be defined as*:
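The formulas were lost in transcription; in the cited dirty model the decomposition and penalties are, up to notation:

\min_{U,\,V}\ L(U + V) + \lambda_u \lVert U \rVert_{1,\infty} + \lambda_v \lVert V \rVert_1, \qquad \lVert U \rVert_{1,\infty} = \sum_{k} \max_{j} |u_{kj}|

so g(U) = \lambda_u \lVert U \rVert_{1,\infty} encourages rows (features) shared across all tasks, while h(V) = \lambda_v \lVert V \rVert_1 absorbs elementwise task-specific deviations.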

* A dirty model for multi-task learning, NIPS 2010

Dirty Approach

● g(U) and h(V) are defined in different ways in the following works:
  ○ Learning incoherent sparse and low-rank patterns from multiple tasks, SIGKDD 2010
  ○ Integrating low-rank and group-sparse structures for robust multi-task learning, SIGKDD 2011
  ○ Robust multi-task feature learning, SIGKDD 2012
  ○ Convex multi-task learning with flexible task clusters, ICML 2012

A Survey on Multi-Task Learning, 2017

MTL Methods (based on how to share?)

● Feature-based MTL
  ○ Feature learning approach
  ○ Deep learning approach
● Parameter-based MTL
  ○ Low-rank approach
  ○ Task clustering approach
  ○ Task relation learning approach
  ○ Dirty approach
  ○ Multi-level approach


Multi-Level Approach

● An extension of the dirty approach
● Assumption: the parameter matrix W can be decomposed into h component matrices
● The multi-level approach has more expressive power than the dirty approach
● Represents task clusters as a tree and learns relations from the structure (a sketch follows below)
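A hedged sketch of the h-level decomposition described above, up to notation:

W = \sum_{l=1}^{h} W_l, \qquad \min_{W_1, \ldots, W_h}\ L\Big(\sum_{l=1}^{h} W_l\Big) + \sum_{l=1}^{h} g_l(W_l)

where each component W_l has its own regularizer g_l; taking h = 2 recovers the dirty approach.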

A Survey on Multi-Task Learning, 2017

Overview

● What is Multi-Task Learning?
● Multi-Task Learning: Motivation
● Multi-Task Learning Methods
● Recent Works on MTL for Deep Learning


Deep Relationship Networks

Courtesy: http://ruder.io/multi-task/

Fully Adaptive Feature Sharing

Courtesy: http://ruder.io/multi-task/

Cross-stitch Networks

Courtesy: http://ruder.io/multi-task/
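The figure did not survive extraction. A cross-stitch unit learns linear combinations of the activations of two task networks at corresponding layers; below is a minimal PyTorch sketch (the sizes and initialization are illustrative assumptions):

import torch
import torch.nn as nn

class CrossStitch(nn.Module):
    # Learned 2x2 mixing of activations from two task networks.
    def __init__(self):
        super().__init__()
        # Initialized near the identity so each task starts mostly unmixed.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

    def forward(self, xa, xb):
        ya = self.alpha[0, 0] * xa + self.alpha[0, 1] * xb
        yb = self.alpha[1, 0] * xa + self.alpha[1, 1] * xb
        return ya, yb

unit = CrossStitch()
ya, yb = unit(torch.randn(8, 128), torch.randn(8, 128))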

A Joint Many-Task Model


A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks, EMNLP 2017

Sluice Networks

Courtesy: http://ruder.io/multi-task/

Multi-Task Sequence to Sequence Learning

Multi-Task Sequence to Sequence Learning, ICLR 2016

Multi-Task Learning for IR tasks


Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval, NAACL 2015

Multi-Task Domain Adaptation

Multi-Task Domain Adaptation for Sequence Tagging, Rep4NLP 2017

Adversarial Multi-Task Learning

Adversarial Multi-task Learning for Text Classification, ACL 2017

One Model to Learn Them All



Reference

● A Survey on Multi-Task Learning, 2017
● An Overview of Multi-Task Learning in Deep Neural Networks, 2017 (http://ruder.io/multi-task/)


Thank You

