Knowledge-driven Semantic Understanding
(YSSNLP 2019 slides: hainanumeeting.net/yssnlp2019/file/1.pdf)
Distributional semantics
• Target word = “stars”
• Collect the contextual words for “stars”
Distributional semantics
• Distributional word representation
• Distributional hypothesis: words that are used and occur in the same contexts tend to have similar meanings
• Implementations with distributed word embedding models
  • Word2vec
  • GloVe
Distributional semantics
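The distributional hypothesis above can be sketched directly: represent each word by the counts of the words around it, then compare words by cosine similarity. This is a minimal count-based illustration (the toy corpus and window size are made up here), not Word2vec or GloVe themselves, which learn dense vectors by prediction or matrix factorization.

```python
# Minimal sketch of distributional word representations: each word is
# represented by a sparse vector of co-occurrence counts in a +/-2 window.
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of its context words."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus, chosen so that "stars" and "sun" share contexts.
corpus = [
    "the bright stars shine at night".split(),
    "the bright sun shines at noon".split(),
    "stars twinkle in the dark night".split(),
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["stars"], vecs["sun"]))   # similar contexts -> high score
print(cosine(vecs["stars"], vecs["noon"]))  # dissimilar contexts -> low score
```

Because “stars” and “sun” occur between the same neighbors (“the bright … at …”), their count vectors overlap and the first similarity is larger than the second.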
A huge success of deep contextualized models
• ELMo
  • Char-level word encoding + 2 BiLSTM layers
  • Pretrained language models that can be fine-tuned for specific tasks
• Transformer
  • Pairwise interaction, called self-attention (with positional embedding)
  • Multi-head mechanism
  • Deep architecture (six layers in the original model)
• BERT
  • Built on top of the Transformer
  • Bidirectional context
  • Masked LM + next sentence prediction
  • Very deep
    • BERT-Base: 12 layers, 768 hidden units, 12 attention heads
    • BERT-Large: 24 layers, 1024 hidden units, 16 attention heads
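The “pairwise interaction” at the heart of the Transformer can be sketched as scaled dot-product self-attention. The sketch below is a single head in pure Python; real implementations use learned query/key/value projection matrices and batched tensor operations, while here Q = K = V = the input vectors for brevity.

```python
# Single-head scaled dot-product self-attention, pure-Python sketch.
import math

def softmax(row):
    m = max(row)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(x):
    """x: list of token vectors; returns attention-weighted output vectors."""
    d = len(x[0])
    out = []
    for q in x:
        # pairwise interaction: score every token against the query token
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # each output is a weighted sum of all token vectors
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(tokens))
```

The multi-head mechanism simply runs several such attentions in parallel on projected copies of the input and concatenates the results.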
Basic motivations
• Deeper architecture and larger context scope • Pretrained language models that can be fine-tuned
Connecting the dots
• Deep architecture
  • NLP researchers want to build very deep neural networks, as computer vision researchers do
• Contextualized models • Distributional semantics
• Word-by-word attention • Reasoning about entailment with neural attention
• Self-match • R-Net: Machine Reading Comprehension with Self-Matching Networks
The context scope can be even larger • Document-level information • Document Context Neural Machine Translation with Memory Networks
The context scope can be even larger • Document-level information • Improving the Transformer Translation Model with Document-Level Context
Structural knowledge
• Triplets
  • Form: (h, r, t)
  • Examples:
    • <YAO Ming, birthPlace, Shanghai>
    • <YAO Ming, gender, male>
• Embedding methods
  • E.g., TransE
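TransE scores a triple (h, r, t) by how closely the embedding of h translated by the embedding of r lands on the embedding of t, i.e. score = ||h + r − t||. The sketch below uses tiny hand-picked 2-d embeddings for illustration (in practice they are learned by minimizing this distance for true triples).

```python
# TransE scoring sketch: lower distance ||h + r - t|| = more plausible triple.
import math

def transe_score(h, r, t):
    """L2 distance between h + r and t."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy embeddings, hand-picked so the true triple scores best (not learned).
emb = {
    "YAO_Ming":   [0.9, 0.1],
    "Shanghai":   [1.0, 1.0],
    "Beijing":    [0.0, 1.0],
    "birthPlace": [0.1, 0.9],
}

good = transe_score(emb["YAO_Ming"], emb["birthPlace"], emb["Shanghai"])
bad  = transe_score(emb["YAO_Ming"], emb["birthPlace"], emb["Beijing"])
print(good, bad)  # the true triple <YAO Ming, birthPlace, Shanghai> scores lower
```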
Structural knowledge • Knowledge base (or KB-like) information • Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge
Structural knowledge • Linguistic information • Type-Aware Question Answering over Knowledge Base with Attention-Based Tree-Structured Neural Networks
Structural knowledge
• Logic rules
• Harnessing Deep Neural Networks with Logic Rules
  • First-order logic rules on two tasks: sentiment classification and NER
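One well-known rule from that line of work on sentiment is the “A-but-B” rule: in a sentence of the form “A but B”, the overall sentiment should agree with clause B. The sketch below is only an illustration of the rule itself; the paper actually injects such rules softly into a neural network via knowledge distillation, while here a toy word lexicon stands in for the base classifier.

```python
# Toy illustration of the first-order "A but B" sentiment rule.
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "boring", "slow"}

def lexicon_score(tokens):
    """Toy base classifier: (#positive - #negative) words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def sentiment_with_but_rule(sentence):
    tokens = sentence.lower().split()
    if "but" in tokens:
        # rule: the clause after "but" determines the overall label
        tokens = tokens[tokens.index("but") + 1:]
    return "positive" if lexicon_score(tokens) >= 0 else "negative"

# Whole-sentence lexicon counts are tied (good vs. boring); the rule breaks
# the tie by focusing on the clause after "but".
print(sentiment_with_but_rule("the acting is good but the plot is boring"))
# -> negative
```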
Structural knowledge
• Demographic attributes
• Mining Product Adopter Information from Online Reviews for Improving Product Recommendation
  • Based on an analysis of 13.9 million JD reviews, about 10.8% of reviews contain at least one adopter mention
  • We can even infer information about the buyer
    • Marital status
    • Age range
Structural knowledge • Demographic attributes • Adversarial Removal of Demographic Attributes from Text Data
Knowledge utilization
• Point #1: Enriching information for the NLP tasks
• The widely used procedure:
  • Knowledge retrieval → Knowledge contextualization → Knowledge utilization
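The three-step procedure can be sketched end to end with a toy triple store. Everything below (the triples, the ranking heuristic, the `[KB]` marker) is a hypothetical illustration of the retrieval → contextualization → utilization flow, not any particular paper's method.

```python
# Hypothetical sketch of the knowledge-utilization pipeline on a toy KB.
TRIPLES = [
    ("YAO_Ming", "birthPlace", "Shanghai"),
    ("YAO_Ming", "gender", "male"),
    ("Shanghai", "locatedIn", "China"),
]

def retrieve(entities):
    """Step 1, knowledge retrieval: pull triples mentioning an input entity."""
    return [t for t in TRIPLES if t[0] in entities or t[2] in entities]

def contextualize(triples, entities):
    """Step 2, knowledge contextualization: rank triples by how many
    input entities they touch (a stand-in for learned attention)."""
    return sorted(triples, key=lambda t: -((t[0] in entities) + (t[2] in entities)))

def utilize(text, triples):
    """Step 3, knowledge utilization: expose the knowledge to the downstream
    model, here simply by appending verbalized facts to the input text."""
    facts = ["%s %s %s" % t for t in triples]
    return text + " [KB] " + " ; ".join(facts)

entities = {"YAO_Ming"}
enriched = utilize("Where was YAO_Ming born?",
                   contextualize(retrieve(entities), entities))
print(enriched)
```

In real systems each step is learned: retrieval uses entity linking, contextualization uses attention over candidate facts, and utilization fuses knowledge vectors into the model's hidden states.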
Knowledge-powered conversational agents
Commonsense Knowledge Aware Conversation Generation with Graph Attention
Knowledge utilization
• Point #1: Enriching information for the NLP tasks
• Challenging problems:
  • How to identify suitable knowledge resources to use
  • How to learn knowledge representations that are useful for specific tasks
Knowledge utilization
• Point #2: Making models more explainable
• e-SNLI: Natural Language Inference with Natural Language Explanations
Knowledge utilization
• Point #2: Making models more explainable
• Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks
Knowledge utilization
• Point #2: Making models more explainable
• Challenging problems:
  • How to define explainability
  • How to balance explainability and effectiveness
Knowledge utilization
• Point #3: Knowledge can guide the model design
• Sentence Encoding with Tree-constrained Relation Networks
Knowledge utilization
• Point #3: Knowledge can guide the model design
• Taxonomy-Aware Multi-Hop Reasoning Networks for Sequential Recommendation
Knowledge utilization
• Point #3: Knowledge can guide the model design
• Challenging problems:
  • Given some kind of knowledge, what is the suitable model to integrate it?
  • Given an existing model, how to adapt it to fully utilize knowledge information?
Conclusion