Deep Learning for NLP @Noah: progress, challenges and opportunities
Zhengdong Lu, Noah’s Ark Lab, Huawei
Noah’s Ark Lab
• Research Areas
  – Machine Learning
  – Data Mining
  – Speech and Language Processing
  – Information and Knowledge Management
  – Intelligent Systems
  – Human Computer Interaction
• Founded in 2012
• Based in Hong Kong & Shenzhen
• Connections with research institutes in mainland China
• Researchers and student interns from Tsinghua, PKU, ICT-CAS, etc.
• Collaboration with ..
Natural Language Processing (NLP)
Our goal is to build
• A system that can understand you, act on your instructions, answer your questions, and talk back to you. That involves technologies of:
  – Question answering
  – Machine translation
  – Natural language dialogue
  – Cross-modal communication
• It is almost A.I. It is cool. It is hard.
The Road Map of DL4NLP (my personal view)
Representation

Classification   Matching

Machine Translation   Natural Language Dialog

Generation
The Road Map of DL4NLP (my personal view)
Representation

Classification   Matching

Machine Translation   Natural Language Dialog

Generation

Reasoning

Symbolic A.I.

External Memory

Knowledge Representation
The DeepLearners @Noah
researchers
student interns (in the descending order of fidelity to their real look)
Several Threads of Work (1)
Deep Matching Models:
• Models addressing different aspects of objects and their matching
• Applied to retrieval-based dialog, image retrieval, and machine translation, with state-of-the-art performance

DeepMatch_topic
DeepMatch_tree
DeepMatch_CNN   M.T. application   Image search application
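The bullets above describe the common recipe behind the DeepMatch family: encode each of the two objects (e.g., a post and a candidate response) into a vector and score their compatibility. As a loose illustration only — the mean-pooling encoder and bilinear score here are toy stand-ins, not the published models:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(token_vecs):
    # Toy encoder: mean-pool token embeddings into one sentence vector
    # (a stand-in for the CNN/tree encoders in the DeepMatch family).
    return token_vecs.mean(axis=0)

def match_score(x_vecs, y_vecs, W):
    # Bilinear matching score: s(x, y) = enc(x)^T W enc(y).
    return float(encode(x_vecs) @ W @ encode(y_vecs))

d = 8
W = rng.standard_normal((d, d)) * 0.1    # learnable interaction matrix
query = rng.standard_normal((5, d))      # 5 tokens, d-dim embeddings
reply = rng.standard_normal((7, d))      # 7 tokens
print(match_score(query, reply, W))
```

In a retrieval-based dialog setting, such a score would be computed for every candidate response and the highest-scoring one returned.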
Several Threads of Work (2)
Convolutional architectures for text modeling
• “Guided” encoding of the source sentence in traditional M.T. (1 BLEU improvement over the BBN model)
• Sentence generation and even an end-to-end neural M.T. system (similar to LSTM on generation)
• Multi-resolution representation of sentences for short-text classification (state-of-the-art performance)

Encoding source sentence for machine translation (ACL-15)
Generative CNN (ACL-15)
Self-adaptive sentence model (IJCAI-15)
Several Threads of Work (3)
Neural Dialogue Models
• World’s first purely neural dialog model
• Models that can “understand” what you said and generate an appropriate response
• Better response quality than SMT-based and retrieval-based models
• Moving towards multi-turn and knowledge-powered dialogue

Neural Responding Machine
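At its core, the Neural Responding Machine is an encoder-decoder: the post is encoded into a context representation and the response is generated word by word. A minimal greedy-decoding sketch with toy vanilla-RNN cells — the weight names (`Wh`, `Wx`, `Wo`, `emb`) are illustrative, not the published parameterization:

```python
import numpy as np

def rnn_step(h, x, Wh, Wx):
    # One vanilla RNN step: new state from old state and current input.
    return np.tanh(Wh @ h + Wx @ x)

def encode_post(tokens, Wh, Wx, d):
    # Run the encoder RNN over the post's token vectors; the final
    # state serves as the context for generation.
    h = np.zeros(d)
    for x in tokens:
        h = rnn_step(h, x, Wh, Wx)
    return h

def decode_greedy(h, Wh, Wx, Wo, emb, max_len=5):
    # Greedy decoding: at each step emit the highest-scoring word,
    # then feed its embedding back in as the next input.
    out, x = [], np.zeros(emb.shape[1])
    for _ in range(max_len):
        h = rnn_step(h, x, Wh, Wx)
        w = int(np.argmax(Wo @ h))
        out.append(w)
        x = emb[w]
    return out
```

The actual model additionally uses attention-like mixing of local and global encodings when generating each word; the sketch keeps only the encode-then-generate skeleton.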
Several Threads of Work (4)
Memory-based Deep Architectures
• Flexible intermediate representation of sequences as content in a memory
• Novel deep architecture based on recursive read-write between multiple memories
• “Generalizes” the Neural Turing Machine for sequence-to-sequence modeling
• Performance comparable to Moses with a small parameter set on the zh-en task
Several Threads of Work (5)
Neural Reasoning System and Neural Symbolic A.I.
• Teach the neural network rules/grammar
• Purely neural net-based reasoning system that can infer over multiple supporting facts (state-of-the-art performance)

Rule Assimilation & Neural Reasoner
Some cool stuff here
Memory-based Deep Architectures for Machine Translation
{NN + Memory} for NLP
• External memory has recently been added to the loop of end-to-end neural learning, but the applications are still very much toy-scale:
  – Neural Turing Machine
  – Memory Network
  – Dynamic Memory Network
  – Neural Stack
  – Neural Transducer
• We are the first to apply it to neural machine translation, with remarkable improvement
Memory Read-Write as a Nonlinear Transformation
• Memory (with its content) as a more flexible way to represent sequences (e.g., NL sentences)
• A system that reads from R-memory and writes to W-memory defines a nonlinear transformation between the two representations
• Controller + read heads + write head (as in the Neural Turing Machine)
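A rough sketch of the read-write transformation just described, assuming an NTM-style content-addressed read head, a simple tanh controller, and a location-addressed write head. The concrete parameterization (`Wc`, `Wk`) is hypothetical, not the one used in the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def content_read(mem, key):
    # Content-addressed read head: attention weights over memory
    # cells, then a weighted sum of cell contents.
    w = softmax(mem @ key)
    return w @ mem

def transform(r_mem, n_out, Wc, Wk):
    # One memory-to-memory nonlinear transformation: a controller
    # repeatedly reads from R-memory and fills the W-memory cell by
    # cell (location-addressed write).
    d = r_mem.shape[1]
    state = np.zeros(d)
    w_mem = np.zeros((n_out, d))
    for t in range(n_out):
        r = content_read(r_mem, Wk @ state)               # read head
        state = np.tanh(Wc @ np.concatenate([state, r]))  # controller
        w_mem[t] = state                                  # write head
    return w_mem
```

The output `w_mem` is itself a memory, so the same operation can be applied again — which is exactly the stacking idea the next slides develop.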
Stacking them together …
• We can stack them together to form a deep architecture to do layer-by-layer transformation of sequences
Neural Transformation Machine (NTram)
• We can stack the transformations together to form a deep architecture for layer-by-layer transformation of sequences
• It takes a source-language sentence and gradually transforms it into a target-language sentence
“Types” of W-R transformations
• Different W-R combinations define different types of nonlinear transformation
• The stacked layers in a deep RNN are a special case
jointly by design (in specifying the strategy and the implementation details) and the later supervised learning (in tuning the parameters). We argue that the memory offers more representational flexibility for encoding sequences with complicated structures, and the transformation introduced above provides more modeling flexibility for sequence-to-sequence learning.

As the most “conventional” special case, if we use L-addressing for both reading and writing, we actually get the familiar structure of units found in an RNN with stacked layers [16]. Indeed, as illustrated in the figure right of the text, this read-write strategy will invoke a relatively local dependency based on the original spatial order in R-memory. It is not hard to show that we can recover some deep RNN models in [16] after stacking layers of read-write operations like this.

The C-addressing, however, be it for reading or writing, offers a means for major re-ordering of the cells, while H-addressing can add to it the spatial structure of the lower-layer memory. Memory with designed inner structure gives more representational flexibility for sequences than a fixed-length vector, especially when coupled with an appropriate reading strategy in composing the memory of the next layer in a deep architecture, as shown later. On the other hand, the learned memory-based representation is in general less universal than the fixed-length representation, since it typically needs a particular reading strategy to decode the information.

In this paper, we consider four types of transformations induced by combinations of the read and write addressing strategies, listed pictorially in Figure 2. Notice that 1) we only include one combination with C-addressing for writing, since it is computationally expensive to optimize when combined with C-addressing reading (see Section 3.2 for some analysis), and 2) for one particular read-write strategy there is still a fair amount of implementation detail to be specified, which is omitted due to the space limit. One can easily design different read/write strategies, for example a particular way of H-addressing for writing.
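The addressing strategies can be illustrated as follows: `l_addressing` is location-based (read cell t directly), and `c_addressing` is content-based (attention weights from key-cell similarity, as in the Neural Turing Machine). The `h_addressing` shown here simply interpolates the two — a labeled assumption for illustration only, since the paper defines its hybrid addressing differently in detail:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def l_addressing(n, t):
    # Location-based: read cell t directly (one-hot weights).
    w = np.zeros(n)
    w[t] = 1.0
    return w

def c_addressing(mem, key, beta=1.0):
    # Content-based: weights from cosine similarity between the key
    # and each memory cell, sharpened by beta.
    sims = mem @ key / (np.linalg.norm(mem, axis=1)
                        * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sims)

def h_addressing(mem, key, t, lam=0.5):
    # Hybrid (illustrative): mix content weights with location weights.
    return lam * c_addressing(mem, key) + (1 - lam) * l_addressing(len(mem), t)
```

All three return a weight vector over cells; a read is then just the weighted sum of cell contents.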
Figure 2: Examples of read-write strategies.
3 NTram: Stacking Them Together
We stack the transformations together to form a deep architecture for sequence-to-sequence learning (named NTram), in a way analogous to the layers in DNNs. The aim of NTram is to learn a representation of the sequence better suited to the task (e.g., machine translation) through layer-by-layer transformations. Just as in a DNN, we expect that stacking relatively simple transformations can greatly enhance the expressive power and the efficiency of NTram, especially in handling translation between languages with vastly different syntactic structures (e.g., Chinese and English).
Architectures in the NTram family
• Different architectures, with different combinations of nonlinear transformations
• RNNsearch as a special case
NTram as a neural machine translator
• Performance comparable to Moses with a small number of parameters (compared to Google’s brute-force version)
Slightly out-of-date
Neural Network-based Reasoning
Neural Reasoning
• The ambition is to build a neural reasoning system with the rigor of logic and the flexibility of human language
• Currently, we only have rather naïve models

End-to-End Memory Network from FB AI Research
Neural Reasoning (cont’d)
• Task: reasoning with multiple supporting facts
• Answers can be cast as classification problems with pre-determined classes
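Casting the answer as classification means: attend to the supporting facts with the question, pool the evidence, and score a fixed set of candidate answers. A toy single-hop sketch — the attend-then-classify structure is generic, not the specific Neural Reasoner architecture, and `W_cls` is a hypothetical classifier matrix:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def answer(fact_vecs, q_vec, W_cls):
    # Attend to the facts with the question, pool the evidence, and
    # classify the answer over a fixed set of candidate words.
    att = softmax(fact_vecs @ q_vec)   # which facts support the answer
    evidence = att @ fact_vecs         # weighted sum of fact vectors
    logits = W_cls @ (evidence + q_vec)
    return int(np.argmax(logits)), softmax(logits)
```

Multi-hop variants re-attend to the facts with the updated evidence before classifying, which is what "inferring over multiple supporting facts" requires.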
Neural Reasoning (cont’d)
Results
• Ours is way better than other neural models with end-to-end learning
• Actually, ours is even better than those with strong supervision, when there is enough training data
Future Steps
Ongoing and future research
• Rules and reasoning:
  – Teach the neural network rules
  – Neural reasoning with a “hybrid” mechanism
• Dialogue:
  – Multi-turn dialogue, dialogue with knowledge
  – Neural reinforcement learning for dialogue with a purpose
• Machine translation:
  – M.T. as a classification problem (with a CNN-based neural system)
  – Deep memory-based M.T. system
• Natural language understanding / semantic parsing:
  – Game-theoretic end-to-end learning for NL understanding
References
• Self-Adaptive Hierarchical Sentence Model. H. Zhao, Z. Lu, P. Poupart. IJCAI 2015.
• Syntax-based Deep Matching of Short Texts. M. Wang, Z. Lu, H. Li, Q. Liu. IJCAI 2015.
• A Deep Architecture for Matching Short Texts. Z. Lu, H. Li. NIPS 2013.
• Convolutional Neural Network Architectures for Matching Natural Language Sentences. B. Hu, Z. Lu, H. Li, Q. Chen. NIPS 2014.
• Multimodal Convolutional Neural Networks for Matching Image and Sentence. L. Ma, Z. Lu, L. Shang, H. Li. arXiv:1504.06063.
• Neural Transformation Machine: A New Architecture for Sequence-to-Sequence Learning. F. Meng, Z. Lu, H. Li, Q. Liu. arXiv:1504.06442.
• genCNN: A Convolutional Architecture for Word Sequence Prediction. M. Wang, Z. Lu, H. Li, W. Jiang, Q. Liu. ACL-IJCNLP 2015.
• Encoding Source Language with Convolutional Neural Network for Machine Translation. F. Meng, Z. Lu, M. Wang, H. Li, W. Jiang, Q. Liu. ACL-IJCNLP 2015.
• Context-Dependent Translation Selection Using Convolutional Neural Network. Z. Tu, B. Hu, Z. Lu, H. Li. ACL-IJCNLP 2015.
• Neural Responding Machine for Short-Text Conversation. L. Shang, Z. Lu, H. Li. ACL-IJCNLP 2015.
• Towards Neural Network-based Reasoning. B. Peng, Z. Lu, H. Li, K.-F. Wong. To appear.
Thanks. Questions?