
Page 1: Deep Transfer Mapping for Unsupervised Writer Adaptation (icfhr2018.org/SlidesPosters/Slides-Paper69.pdf)

Deep Transfer Mapping for Unsupervised Writer Adaptation

Hong-Ming Yang 1,2, Xu-Yao Zhang 1,2, Fei Yin 1,2, Jun Sun 4, Cheng-Lin Liu 1,2,3

1 NLPR, Institute of Automation, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences
3 CAS Center for Excellence in Brain Science and Intelligence Technology
4 Fujitsu Research & Development Center

Aug. 8, 2018

Page 2:

Outline

Introduction

Style Transfer Mapping

Motivation and the Proposed Method

Experiments and Analysis

Conclusions

Page 3:

Introduction


A main challenge for handwriting recognition: the large variability of distributions between the training data and different test data

– Different writing styles of different writers

– Different writing tools (e.g., different pens or electronic writing devices)

– Different writing environments (e.g., normal or emergency situations)

– …

(Figure: written characters of two writers.)

Page 4:

Introduction

• Domain adaptation: a form of transfer learning. Adapt the base classifier to each domain in the test dataset

• Recent methods: mainly based on deep learning

– Fine-tuning with target-domain data

– Learning domain-invariant representations (features)

– Projecting source- or target-domain data to align the distributions

(Diagram: training datasets from writers 1…N → training method → base classifier → adaptation to each writer.)

Page 5:

Style Transfer Mapping

[Xu-Yao Zhang et al., Writer Adaptation with Style Transfer Mapping, TPAMI'13]

Main idea: project the target-domain (test) data so that its distribution aligns with the source-domain distribution

Style transfer mapping (STM)

𝑝 π‘₯𝑠 β‰  𝑝 π‘₯𝑑

π‘₯ 𝑑 = 𝐴𝑑π‘₯𝑑 + 𝑏𝑑

𝑝 π‘₯𝑠 β‰ˆ 𝑝 π‘₯ 𝑑

Learning classifier on π‘₯𝑠 and apply π‘₯ 𝑑 to the base classifier

Learning the projection ($A_t$ and $b_t$)

$$\min_{A \in \mathbb{R}^{d \times d},\, b \in \mathbb{R}^{d}} \; \sum_{i=1}^{n} f_i \left\| A s_i + b - t_i \right\|_2^2 + \beta \left\| A - I \right\|_F^2 + \gamma \left\| b \right\|_2^2$$

Source points $s_i$: features in the target domain, i.e., $x_t$

Target points $t_i$: prototype (LVQ) or class mean (MQDF) for class $y_i$ ($y_i$ is the label of sample $s_i$)

Page 6:

Style Transfer Mapping

[Xu-Yao Zhang et al., Writer Adaptation with Style Transfer Mapping, TPAMI'13]

Solution: the objective is a convex quadratic program and has a closed-form solution:

𝐴 = π‘„π‘ƒβˆ’1, 𝑏 =1

𝑓 𝑑 βˆ’ 𝐴𝑠

𝑄 = 𝑓𝑖𝑑𝑖𝑠𝑖΀

𝑛

𝑖=1

βˆ’1

𝑓 𝑑 𝑠 Ξ€ + 𝛽𝐼 𝑃 = 𝑓𝑖𝑠𝑖𝑠𝑖

Ξ€

𝑛

𝑖=1

βˆ’1

𝑓 𝑠 𝑠 Ξ€ + 𝛽𝐼 𝑠 = 𝑓𝑖𝑠𝑖

𝑛

𝑖=1

𝑑 = 𝑓𝑖𝑑𝑖

𝑛

𝑖=1

𝑓 = 𝑓𝑖

𝑛

𝑖=1

+ 𝛾
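As a concrete reference, here is a minimal NumPy sketch of this closed form. The function name `stm_fit` and its argument layout are illustrative, not taken from the authors' code:

```python
import numpy as np

def stm_fit(s, t, f, beta=1.0, gamma=1.0):
    """Closed-form STM: minimize sum_i f_i ||A s_i + b - t_i||^2
    + beta ||A - I||_F^2 + gamma ||b||^2.

    s, t : (n, d) source/target points; f : (n,) per-sample confidences.
    Returns A of shape (d, d) and b of shape (d,).
    """
    d = s.shape[1]
    f_hat = f.sum() + gamma                # f_hat = sum_i f_i + gamma
    s_hat = (f[:, None] * s).sum(axis=0)   # s_hat = sum_i f_i s_i
    t_hat = (f[:, None] * t).sum(axis=0)   # t_hat = sum_i f_i t_i
    Q = (f[:, None] * t).T @ s - np.outer(t_hat, s_hat) / f_hat + beta * np.eye(d)
    P = (f[:, None] * s).T @ s - np.outer(s_hat, s_hat) / f_hat + beta * np.eye(d)
    A = Q @ np.linalg.inv(P)               # A = Q P^{-1}
    b = (t_hat - A @ s_hat) / f_hat        # b = (t_hat - A s_hat) / f_hat
    return A, b
```

Note that larger $\beta$ and $\gamma$ keep $(A, b)$ closer to the identity mapping, which is exactly what the regularization terms are for.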

Extend to convolutional neural networks (CNNs)

Main idea: perform adaptation on the deep features

𝑓 π‘₯ : 𝐢𝑁𝑁 π‘“π‘’π‘Žπ‘‘π‘’π‘Ÿπ‘’ 𝑒π‘₯π‘‘π‘Ÿπ‘Žπ‘π‘‘π‘œπ‘Ÿ

π‘₯𝑠 = 𝑓 π‘₯𝑠 , π‘₯𝑑 = 𝑓 π‘₯𝑑

Dealing with unsupervised adaptation

– Use pseudo labels predicted by the base classifier

– Iterative method: base classifier → pseudo labels → adaptation → better pseudo labels → adaptation → … (see the sketch below)

[Xu-Yao Zhang et al., Online and Offline Handwritten Chinese Character Recognition: A Comprehensive Study and New Benchmark, PR'17]
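A hedged sketch of this iteration, reusing `stm_fit` from the sketch above; `classify` (returning pseudo labels and confidences $f_i$) and `class_prototypes` (one target point per class) are hypothetical stand-ins for the base classifier's head and its class prototypes:

```python
def adapt_unsupervised(feats, classify, class_prototypes, n_iters=3,
                       beta=1.0, gamma=1.0):
    """feats: (n, d) deep features of the target writer's unlabeled samples."""
    mapped = feats
    for _ in range(n_iters):
        labels, conf = classify(mapped)            # pseudo labels + confidences
        targets = class_prototypes[labels]         # t_i for the predicted class y_i
        A, b = stm_fit(feats, targets, conf, beta, gamma)  # refit the mapping
        mapped = feats @ A.T + b                   # better features -> better labels
    return mapped
```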

Page 7:

Motivations & Methods

Motivations

Traditional adaptation methods with CNNs:

– Consider only the fully connected layers

– Perform adaptation on only one layer

Our goals:

– Adaptation on both the fully connected layers and the convolutional layers

– Adaptation on multiple (or all) layers of the base CNN

Adaptation method for fully connected layers

STM based on the deep features of the layer (unsupervised adaptation)

Adaptation methods for convolutional layers

– Use a linear transformation to project the target-domain data, aligning the data distributions

– Propose four variants of the linear transformation, based on different assumptions about the spatial relations in the feature maps

Page 8:

Motivations & Methods


– Output of a convolutional layer for an input $x_i$:

$$o_i = \left\{ t_{cjk} \right\}_{c=1,\, j=1,\, k=1}^{C,\, H,\, W}$$

$c, j, k$: indices of the feature map, and of the rows and columns within each feature map

Fully associate adaptation (FAA):

– Assumption: all positions $(c, j, k)$ are related to each other

– Method: expand $o_i$ into a long vector $v_i$ of dimension $CHW$, and learn a transformation $A \in \mathbb{R}^{CHW \times CHW}$, $b \in \mathbb{R}^{CHW}$ by STM

– $v_i' = A v_i + b$, i.e., $v_i'[j] = \sum_{k=1}^{CHW} A_{jk} v_i[k] + b_j$: each position $j$ in $v_i'$ is related to all positions in $v_i$
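A minimal FAA sketch, reusing `stm_fit` from Page 6; the target points $t_i$ are assumed here to be class-wise target points of the conv outputs matched to each sample's pseudo label (a detail the slide leaves to the paper):

```python
def faa_fit(o, t, f, beta=1.0, gamma=1.0):
    """o, t : (n, C, H, W) conv outputs and their target points; f : (n,)."""
    n = o.shape[0]
    v = o.reshape(n, -1)                   # expand o_i into a CHW-dim vector v_i
    u = t.reshape(n, -1)
    return stm_fit(v, u, f, beta, gamma)   # A: (CHW, CHW), b: (CHW,)
```

For a 64-channel 8×8 feature map, $A$ alone already has about 16.8M entries, which motivates the cheaper variants below.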

Page 9:

Motivations & Methods


Partly associate adaptation (PAA):

– Assumption: positions within the same feature map are related to each other, but different feature maps are mutually independent

– Method: expand each feature map into a vector of dimension $HW$, and learn a transformation $A_c \in \mathbb{R}^{HW \times HW}$, $b_c \in \mathbb{R}^{HW}$ for each feature map $c$ separately by STM

– The transformation $(A_c, b_c)$ models the relations among positions within a feature map; learning each $(A_c, b_c)$ separately preserves the independence between feature maps

Separate STM for each feature map
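A corresponding PAA sketch under the same assumptions, fitting one $(HW \times HW)$ STM per feature map with the `stm_fit` helper from Page 6:

```python
def paa_fit(o, t, f, beta=1.0, gamma=1.0):
    """Returns a list of C pairs (A_c, b_c), one per feature map."""
    n, C, H, W = o.shape
    return [stm_fit(o[:, c].reshape(n, -1),   # positions within feature map c
                    t[:, c].reshape(n, -1),
                    f, beta, gamma)
            for c in range(C)]
```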

Page 10:

Motivations & Methods


Weakly independent adaptation (WIA):

– Assumption: all positions $(c, j, k)$ in $o_i$ are independent of each other

– Learn a transformation $a, b \in \mathbb{R}$ for each position $(c_0, j_0, k_0)$ separately by STM

– $o_i'[c_0, j_0, k_0] = a \, o_i[c_0, j_0, k_0] + b$

$$\min_{a,\, b \in \mathbb{R}} \; \sum_{i=1}^{N_t} f_i \left( a \, o_i[c_0, j_0, k_0] + b - t_i[c_0, j_0, k_0] \right)^2 + \beta (a - 1)^2 + \gamma b^2$$
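Since $d = 1$ here, the STM closed form reduces to scalars; a sketch (illustrative names) that solves all $CHW$ per-position problems at once by broadcasting:

```python
import numpy as np

def wia_fit(o, t, f, beta=1.0, gamma=1.0):
    """o, t : (n, C, H, W); f : (n,). Returns a, b of shape (C, H, W)."""
    w = f[:, None, None, None]
    f_hat = f.sum() + gamma                    # scalar, shared by all positions
    s_hat = (w * o).sum(axis=0)                # per-position sum_i f_i s_i
    t_hat = (w * t).sum(axis=0)                # per-position sum_i f_i t_i
    Q = (w * t * o).sum(axis=0) - t_hat * s_hat / f_hat + beta
    P = (w * o * o).sum(axis=0) - s_hat * s_hat / f_hat + beta
    a = Q / P                                  # scalar analogue of A = Q P^{-1}
    b = (t_hat - a * s_hat) / f_hat
    return a, b
```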

Page 11:

Motivations & Methods


Strong independent adaptation (SIA):

– Assumption: all positions are independent of each other, and positions within the same feature map share the same linear transformation

– Learn a transformation $a, b \in \mathbb{R}$ for each feature map separately by STM

$$\min_{a,\, b \in \mathbb{R}} \; \sum_{i=1}^{N_t} \sum_{j=1}^{H} \sum_{k=1}^{W} f_i \left( a \, o_i[c_0, j, k] + b - t_i[c_0, j, k] \right)^2 + \beta (a - 1)^2 + \gamma b^2$$

– Similar to the per-channel affine (scale-and-shift) transformation in a batch normalization (BN) layer
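A matching SIA sketch: the same scalar closed form as WIA, but every spatial position of a feature map is pooled as a sample of one shared $(a_c, b_c)$:

```python
import numpy as np

def sia_fit(o, t, f, beta=1.0, gamma=1.0):
    """o, t : (n, C, H, W); f : (n,). Returns a, b of shape (C,)."""
    n, C, H, W = o.shape
    s = o.transpose(1, 0, 2, 3).reshape(C, -1)   # (C, n*H*W) samples per channel
    u = t.transpose(1, 0, 2, 3).reshape(C, -1)
    w = np.repeat(f, H * W)[None, :]             # weight f_i at every position of sample i
    f_hat = f.sum() * H * W + gamma              # total weight over all (i, j, k)
    s_hat = (w * s).sum(axis=1)
    t_hat = (w * u).sum(axis=1)
    Q = (w * u * s).sum(axis=1) - t_hat * s_hat / f_hat + beta
    P = (w * s * s).sum(axis=1) - s_hat * s_hat / f_hat + beta
    a = Q / P
    b = (t_hat - a * s_hat) / f_hat
    return a, b                                  # per-channel scale/shift, as in a BN affine
```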

Page 12:

Motivations & Methods


Analysis and comparison

– Complexity & Flexibility: FAA > PAA > WIA > SIA

– Computation & Memory efficiency: SIA > WIA > PAA > FAA

| Adaptation method | FAA | PAA | WIA | SIA |
|---|---|---|---|---|
| Assumption | All positions related | Positions within a feature map related | All positions independent | All positions independent, parameters shared per feature map |
| Feature dimension | $CHW$ | $HW$ | 1 | 1 |
| Matrix size | $CHW \times CHW$ | $HW \times HW$ | $1 \times 1$ | $1 \times 1$ |
| Number of transformations | 1 | $C$ | $CHW$ | $C$ |
| Total parameters | $CHW(CHW + 1)$ | $CHW(HW + 1)$ | $2CHW$ | $2C$ |

Page 13:

Motivations & Methods


Deep transfer mapping (DTM): perform adaptation on multiple layers in a deep manner

Algorithm

1. Select a group of layers $L$ on which to perform adaptation

2. From the bottom to the top layer in $L$, perform adaptation on the current layer with the proposed methods, keeping the other layers unchanged

3. After adapting each layer, insert a linear layer after it and set the weights and bias of that linear layer to the solved $A$ and $b$

Advantages of DTM

– More powerful and flexible in aligning the distributions between the domains

– Captures more comprehensive information and minimizes the discrepancy of the distributions at different abstraction levels
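A hypothetical sketch of the DTM loop; `forward_to`, `fit_layer`, and `insert_affine_after` are invented helper names (not the authors' API), where `fit_layer` would dispatch to STM for fully connected layers and FAA/PAA/WIA/SIA for convolutional layers:

```python
def deep_transfer_mapping(model, x_target, selected_layers, fit_layer):
    for layer in selected_layers:                 # bottom to top within L
        acts = model.forward_to(layer, x_target)  # target activations at this layer
        A, b = fit_layer(layer, acts)             # solve this layer's mapping
        model.insert_affine_after(layer, A, b)    # bake in (A, b); other layers unchanged
    return model
```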

Page 14:

Experiments & Analysis

Datasets

| Dataset | Source | #Samples (3,755 classes) | #Writers (domains) |
|---|---|---|---|
| Training set | CASIA OLHWDB 1.0–1.2 | 2,697,673 | 1,020 |
| Test set | On-ICDAR2013 competition set | 224,590 | 60 |
| Adaptation set | Unlabeled samples from each domain (writer) in the test set | – | – |

Online handwritten Chinese characters; the samples of each writer are stored in a single file and can be viewed as one domain.

Page 15:

Experiments & Analysis


Different adaptation methods for convolutional layers

Base classifier: 11 layers, 97.55% accuracy on the test set

The four adaptation methods are compared on the same convolutional layer (#8)

Page 16:

Experiments & Analysis

Adaptation property of different layers in CNN


– From the bottom to the top layers, the adaptation performance increases

– Bottom layers extract general features that are applicable across different domains, so the improvement after adaptation is small

– Top layers produce abstract features that are more domain-specific, so adaptation is more helpful for these layers

Adaptation method: WIA

Page 17:

Experiments & Analysis

Deep transfer mapping


– DTM can further boost the performance of the base classifier

– DTM still has limitations: the improvement becomes marginal when too many layers are adapted

Page 18:

Conclusions

• Unsupervised domain adaptation to alleviate writing-style variation, assuming each writer has a consistent style

• Four variants of adaptation methods for convolutional layers, based on different assumptions about the spatial relations in the outputs of convolutional layers

• The deep transfer mapping (DTM) method conducts adaptation on multiple (or all) layers of a CNN to better align the data distributions of different styles

• Remaining problems

– What is the best way to adapt deep neural networks?

– How to adapt when only a small sample is available for adaptation/testing (currently 3,755 samples per writer)?

– Theoretical modeling of within-writer and between-writer style variation

– Continuous adaptation of the classifier

Page 19:

Thanks for your attention!