Deep Learning for Character-based Information Extraction
Yanjun Qi (University of Virginia, USA), Sujatha Das G (Penn State University, USA),
Ronan Collobert (IDIAP), Jason Weston (Google Research NY)
Task: Target Applications

• Target task: automatically extract information about pre-specified types of events from a linear sequence of unit tokens
  – character-based,
  – no word boundaries,
  – no capitalization cues
• For example:
  – Chinese language NLP: Chinese-character based
  – Protein sequence tagging: amino-acid based
Task: Case Study (1)

§ Natural Language Processing on Chinese sequences
  – Word Segmentation (WS): basic task, separate contiguous characters into words (see the sketch below)
  – Part-of-Speech (POS) tagging: determine the part of speech of each word in the text
  – Named Entity Recognition (NER): determine person, organization, and location names in the text
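To make the per-character formulation concrete, here is a tiny illustrative sketch (not from the slides; the B/I labeling scheme and the English stand-in string are assumptions) of how word segmentation reduces to tagging each character as beginning (B) or continuing (I) a word:

```python
# Illustrative only: word segmentation cast as per-character B/I tagging.
chars  = list("ILOVEBEIJING")          # stand-in for a Chinese character sequence
labels = ["B", "B", "I", "I", "I", "B", "I", "I", "I", "I", "I", "I"]

def labels_to_words(chars, labels):
    """Recover word boundaries from per-character B (begin) / I (inside) tags."""
    words, current = [], ""
    for c, tag in zip(chars, labels):
        if tag == "B" and current:     # a new word starts here, close the previous one
            words.append(current)
            current = ""
        current += c
    words.append(current)
    return words

print(labels_to_words(chars, labels))  # ['I', 'LOVE', 'BEIJING']
```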
Task: Case Study (2): Protein Sequence → Structural Segments

§ Input X: primary sequence
  MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
§ Output Y: e.g. secondary structure (SS) segments: helices, strands, loops
Task: Context-Window Based Per-Character Tagging

X (character inputs): XNVLALDTSQRIRIGLRKGEDLFEISYTGEKKHAEILPV ...
Y (labels):           LBBBBBBLHHHBBBBBBBHHBBBBBBBBHLHHHHHHHHH ...

A context window around each character of interest forms the input for tagging that character.
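As a concrete illustration of this windowing step, here is a minimal sketch (not the authors' code; the function name, window size, and padding symbol are assumptions) that builds the fixed-size context window fed to the tagger at each position:

```python
# Minimal sketch: build a fixed-size context window around every character,
# padding the sequence borders with a special symbol.
def context_windows(sequence, window_size=5, pad="<PAD>"):
    """Return one window of characters per position in `sequence`."""
    assert window_size % 2 == 1, "window size must be odd"
    half = window_size // 2
    padded = [pad] * half + list(sequence) + [pad] * half
    return [padded[i:i + window_size] for i in range(len(sequence))]

# Example: each amino acid of a protein fragment gets its 5-character context.
for window in context_windows("MTYKL", window_size=5):
    print(window)
# ['<PAD>', '<PAD>', 'M', 'T', 'Y'], ['<PAD>', 'M', 'T', 'Y', 'K'], ...
```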
Method: Deep Learning to the Rescue

Previous approaches: task-specific, hand-crafted features fed into a shallow learning structure
Now: a task-independent "deep" structure, with simple features as input to a deep neural network (NN) architecture

Feature Engineering
ü Most time-consuming part of the development cycle
ü Often hand-crafted and task-dependent in practice

Feature Learning
ü Easily adaptable to new, similar tasks
ü Layer-wise feature representation learning
The deep structure, layer by layer:
§ Learn a feature representation for each character
§ Learn a representation for each segment (context window) around the current position
§ Learn a function that maps from this representation to the output class label

Training criterion: negative log-likelihood, minimized with stochastic gradient descent (SGD); a sketch of the full window-level tagger follows below.
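The following is a minimal PyTorch sketch in this spirit, not the authors' original implementation; the vocabulary size, embedding and hidden dimensions, and three-tag output are illustrative assumptions. A lookup table embeds each character of the window, the concatenated vectors form the segment representation, a linear layer scores the tags, and one SGD step minimizes the per-character negative log-likelihood.

```python
# Sketch only: window-level per-character tagger trained with NLL + SGD.
import torch
import torch.nn as nn

class WindowTagger(nn.Module):
    def __init__(self, n_chars=30, emb_dim=50, window=5, hidden=300, n_tags=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim)        # character -> vector (first layer)
        self.hidden = nn.Linear(window * emb_dim, hidden)  # segment (window) representation
        self.out = nn.Linear(hidden, n_tags)               # scores for each output tag

    def forward(self, windows):                 # windows: (batch, window) of character ids
        x = self.embed(windows).flatten(1)      # concatenate the window's character vectors
        return self.out(torch.tanh(self.hidden(x)))

model = WindowTagger()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()                 # softmax + negative log-likelihood per character

# One SGD step on a toy batch of (window, gold tag) pairs.
windows = torch.randint(0, 30, (8, 5))          # 8 context windows of 5 character ids each
tags = torch.randint(0, 3, (8,))                # gold tag of each centre character
loss = loss_fn(model(windows), tags)
loss.backward()
optimizer.step()
```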
Method: Character-to-Vector Representations Learning

ü The first layer in our deep structure
ü Idea: characters are embedded in a vector space
ü The embeddings are trained

How to train this embedding layer:
ü (1) Supervised: trained as a normal NN layer, using SGD, on the target task's training pairs
ü (2) Initialized with unsupervised "language model" (lm) pre-training: captures similarities among characters based on their contexts in unlabeled sequences, e.g. Chinese Wikipedia, the SwissProt protein sequence database (both options are sketched below)
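In code, the two options amount to either leaving the lookup table at its random initialization and learning it jointly with the tagger, or copying pre-trained character vectors into it before supervised training. A sketch building on the hypothetical WindowTagger above; the file name and tensor shape are assumptions:

```python
import torch

# Option (1): random initialization; the embedding layer is trained jointly
# with the rest of the tagger by SGD on the supervised task.
model = WindowTagger()

# Option (2): initialize the lookup table with character vectors pre-trained by an
# unsupervised language model, e.g. on Chinese Wikipedia or the SwissProt database.
pretrained = torch.load("char_lm_embeddings.pt")   # assumed (n_chars, emb_dim) tensor
with torch.no_grad():
    model.embed.weight.copy_(pretrained)
# The copied embeddings are then fine-tuned together with the whole network.
```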
Method: Modeling Spatial Dependency Among Characters

• A Viterbi algorithm captures the spatial dependencies between the output tags y_i
• i.e. optimize the whole sentence-level log-likelihood
• i.e. encourage valid paths of output tags
• Path score = output tag transition scores + deep network scores

e.g. a decoded tag path: y2 – y2 – y1 – y1 – y1 – y2 – y2 – y3 – y3 – y3 – y2 – y2
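A minimal sketch of this decoding step (not the authors' implementation; array names and shapes are assumptions): per-position scores from the network are combined with a learned tag-transition matrix, and Viterbi dynamic programming recovers the highest-scoring tag path.

```python
import numpy as np

def viterbi(net_scores, trans):
    """net_scores: (T, K) network score of each tag at each position;
    trans: (K, K) score of moving from tag i to tag j."""
    T, K = net_scores.shape
    delta = np.zeros((T, K))               # best path score ending in each tag
    back = np.zeros((T, K), dtype=int)     # backpointers
    delta[0] = net_scores[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans + net_scores[t][None, :]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):          # follow backpointers to recover the path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: 6 positions, 3 tags (y1, y2, y3).
print(viterbi(np.random.randn(6, 3), np.random.randn(3, 3)))
```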
Experiments: Data Sets
Experiments: Performance Comparison

Feature notation: c1 = character unigrams, c2 = character bigrams, lm = embeddings obtained with the deep language model, vit = Viterbi algorithm
Why is our method preferable?
§ No particular task-specific feature engineering
§ Robust and flexible
§ Easily adaptable to other character-based tagging tasks, e.g. Japanese NER