Neural Name Matching: An Overview
Philip Blair, Senior Research Engineer
Agenda
● Why Name Matching is Hard
● How to Approach Name Matching?
● Non-Neural Approach
● Deep Learning Approach
● Bonus: Beyond Transliterations
● Q&A
Name Matching is a Hard Problem
● Script
● Language
● Order
How to Start?
Idea: What if we had a machine which could transliterate names?
We can then "ask" it how good a given transliteration is.
HMM-Based Name Matching
Step One: Modeling Sequences of Characters
[Diagram: the character sequence "J o h n", with the alternative characters "e", "a", "r", "t" shown beside it]
Step Two: Modeling Transliterations
ジ オ ョ ホ ン
J o h n
Given a sequence of characters in the source language...
...what is the probability of the corresponding sequence of characters in the target language?
This probability is our score!
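The two steps can be combined into a single score. Below is a minimal sketch, assuming a one-to-one alignment between source and target characters. Every probability in it is invented for illustration, and the names `TRANSITION`, `EMISSION`, and `hmm_log_score` are hypothetical, not part of any real system.

```python
import math

START = "<s>"  # hypothetical start-of-name marker

# Toy parameters, invented purely for illustration (not from a trained model).
# Step one: P(next source character | previous source character)
TRANSITION = {
    (START, "j"): 0.10,
    ("j", "o"): 0.40,
    ("o", "h"): 0.05,
    ("h", "n"): 0.20,
}
# Step two: P(target character | source character)
EMISSION = {
    ("j", "ジ"): 0.60,
    ("o", "オ"): 0.50,
    ("h", "ホ"): 0.30,
    ("n", "ン"): 0.90,
}


def hmm_log_score(source, target, floor=1e-6):
    """Log-probability of an aligned pair; unseen events get the `floor` value."""
    if len(source) != len(target):
        raise ValueError("this sketch assumes a one-to-one character alignment")
    log_p = 0.0
    prev = START
    for src_char, tgt_char in zip(source, target):
        log_p += math.log(TRANSITION.get((prev, src_char), floor))
        log_p += math.log(EMISSION.get((src_char, tgt_char), floor))
        prev = src_char
    return log_p


# Targets are given as lists with one entry per kana.
plausible = hmm_log_score("john", ["ジ", "オ", "ホ", "ン"])
scrambled = hmm_log_score("john", ["ン", "ホ", "オ", "ジ"])
print(plausible, scrambled)  # the plausible pairing gets the higher score
```

Working in log space avoids underflow when many small probabilities are multiplied together.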
Issues with HMM-Based Name Matching
ジ オ ョ
J o

| English Character(s) | Japanese Equivalent |
| --- | --- |
| o | オ |
| yo | ヨ |
| ji | ジ |
| jo | ジョ |

...but this represents just "o", not "o following a 'j'"
Problems with HMMs:
● Multi-character equivalents
● Morphological effects on pronunciation
○ Arabic
○ Similar: "photograph" vs "photography"
Common Thread: Missing Context!
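To make the missing-context problem concrete, here is a small sketch using the English-to-Japanese mappings from the table above. The single-character entry for "j" is an assumption read off the diagram, and greedy longest-match is used only as a crude illustration of context, not as the approach this talk proposes.

```python
# Mappings from the table above, plus a single-character entry for "j"
# (an assumption, based on the diagram pairing "J" with "ジ").
TABLE = {"o": "オ", "yo": "ヨ", "ji": "ジ", "jo": "ジョ", "j": "ジ"}


def per_character(name):
    """Context-free: map each character on its own."""
    return "".join(TABLE[ch] for ch in name)


def longest_match(name):
    """Crudely context-aware: prefer the longest table entry at each position."""
    out, i = [], 0
    max_len = max(len(key) for key in TABLE)
    while i < len(name):
        for size in range(min(max_len, len(name) - i), 0, -1):
            chunk = name[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += size
                break
        else:
            raise KeyError(f"no mapping for {name[i]!r}")
    return "".join(out)


print(per_character("jo"))  # "j" and "o" mapped independently
print(longest_match("jo"))  # "jo" mapped as one unit
```

The per-character version treats "o" the same everywhere, so it cannot produce the multi-character equivalent of "jo".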
Deep Learning for Human Language Technology (HLT)
"This is super awful" → Context-free statistical representation
"This is super awful" → Neural Language Model → Context-enriched statistical representation
Further Reading: https://allennlp.org/elmo
Starting Over
How Would You Transliterate a Name?
John Titor
ジョ ン ・ タイ ター
Enter Deep Learning
Step One: Learning to Transliterate
"Tupac" → English Name Reader → Japanese Name Generator → "トゥパック"
Step One: Learning to Transliterate
T u p a c
ト ゥ ー パ ッ ク
First we "read" the English name...
...then we generate the transliteration
Step Two: Running the Transliterator in Reverse to Score
"Tupac" → English Name Reader → Japanese Name Generator ← "トゥパック"
Score: 0.790
Step Two: Running the Transliterator in Reverse to Score
T u p a c
ト ゥ ー パ ッ ク
First we "read" the English name...
...then we pass in the Japanese name...
...to produce a score: 0.790
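Scoring with a generator amounts to asking how likely it would have been to produce the given Japanese name, one character at a time. The sketch below shows only that mechanism. `TOY_GENERATOR` is a hypothetical lookup table standing in for a trained reader/generator pair, and all of its probabilities are invented, so it does not reproduce the 0.790 on the slide.

```python
import math

EOS = "</s>"  # hypothetical end-of-name marker

# Hypothetical stand-in for a trained reader/generator pair: for a source name
# and the target characters produced so far, it gives P(next target character).
# All numbers are invented for illustration.
TOY_GENERATOR = {
    ("tupac", ""): {"ト": 0.90, "タ": 0.10},
    ("tupac", "ト"): {"ゥ": 0.95, "ー": 0.05},
    ("tupac", "トゥ"): {"パ": 0.98, "バ": 0.02},
    ("tupac", "トゥパ"): {"ッ": 0.97, "ク": 0.03},
    ("tupac", "トゥパッ"): {"ク": 0.99, "グ": 0.01},
    ("tupac", "トゥパック"): {EOS: 0.99, "ス": 0.01},
}


def next_char_probs(source, prefix):
    """Distribution over the next target character given the context so far."""
    return TOY_GENERATOR.get((source, prefix), {})


def transliteration_score(source, target_chars, floor=1e-9):
    """Probability that the generator emits exactly `target_chars`, then stops."""
    log_p = 0.0
    prefix = ""
    for ch in list(target_chars) + [EOS]:
        probs = next_char_probs(source, prefix)
        log_p += math.log(probs.get(ch, floor))
        prefix += ch  # feed the given character back in, not the model's own guess
    return math.exp(log_p)


# Targets are given as lists with one entry per kana.
match = transliteration_score("tupac", ["ト", "ゥ", "パ", "ッ", "ク"])
mismatch = transliteration_score("tupac", ["タ", "ゥ", "パ", "ッ", "ク"])
print(match, mismatch)
```

Because each step conditions on the whole source name and on every target character so far, the score reflects the context that the per-character model lacked.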
Broader HLT Applications
Key Idea: Read text and use a representation to produce data
● Many Manifestations
○ Learn to produce translated names (shown here)
○ Learn to answer questions (Amazon Alexa, Google Assistant, etc.)
● Similar idea from machine learning at large: Variational Autoencoders
○ Train a model to learn a "compressed" version of the input
○ Two compressed representations can be compared for similarity
● In general, deep neural networks help us model context
Beyond Transliterations
Text Embeddings
"king" - [0.1, 0.3, ...]
"queen" - [0.0, 0.4, ...]
"man" - [0.3, 0.2, ...]
"woman" - [0.2, 0.3, ...]
"king" - "queen" ≈ "man" - "woman"
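The relationship can be checked numerically. This sketch uses only the two components shown on the slide for each word (the remaining components are elided there), so it is a toy check on truncated vectors rather than a statement about real embeddings. The helper names are hypothetical.

```python
import math

# Only the first two components appear on the slide; the rest are elided,
# so this sketch works with the truncated two-dimensional vectors.
EMBEDDINGS = {
    "king": [0.1, 0.3],
    "queen": [0.0, 0.4],
    "man": [0.3, 0.2],
    "woman": [0.2, 0.3],
}


def subtract(a, b):
    """Component-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


royal_offset = subtract(EMBEDDINGS["king"], EMBEDDINGS["queen"])
plain_offset = subtract(EMBEDDINGS["man"], EMBEDDINGS["woman"])
print(cosine(royal_offset, plain_offset))  # close to 1.0 for these toy vectors
```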
Multilingual Text Embeddings
|  | "Company" | "会社" | "شركة" |
| --- | --- | --- | --- |
| "Company" | 1.0 | 0.69528 | 0.69764 |
| "会社" | 0.69528 | 1.0 | 0.51733 |
| "شركة" | 0.69764 | 0.51733 | 1.0 |
* In all language pairs, the translation is the closest word.
What's the point of this?
Semantic Name Matching
Nippon Telegraph and Telephone Corporation
日本電信電話株式会社 (Nippon Denshin Denwa Kabushiki Gaisha)
Virtually No Phonetic Relationship!
Bringing it all Together
● Each of the models shown here has its strengths
○ Traditional methods provide good performance and decent baseline results
○ Deep transliteration systems better handle context
○ Multilingual text embeddings enable semantic matching
● Successful systems incorporate all of the above into an ensemble approach
○ Pull from the strengths of each to deliver the optimal results
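The slides do not say how the ensemble combines its members. One simple possibility is a weighted average of the individual scores, sketched below. The model names, scores, and weights are all hypothetical placeholders.

```python
def ensemble_score(scores, weights):
    """Weighted average of per-model scores, each assumed to lie in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same models")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical per-model scores for one candidate name pair.
scores = {
    "traditional": 0.55,
    "neural_transliteration": 0.79,
    "semantic_embedding": 0.70,
}
# Hypothetical weights; in practice these would be tuned on held-out data.
weights = {
    "traditional": 1.0,
    "neural_transliteration": 2.0,
    "semantic_embedding": 1.0,
}

combined = ensemble_score(scores, weights)
print(combined)
```

A weighted average keeps the combined score in the same range as its inputs, which makes it easy to compare against a single match threshold.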