Page 1: Example-based Machine Translation based on Deeper NLP 1. Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan,

Example-based Machine Translation based on Deeper NLP

1. Graduate School of Information Science and Technology,

The University of Tokyo, Tokyo, Japan, 113-8656

2. Graduate School of Informatics,

Kyoto University, Kyoto, Japan, 606-8501

Toshiaki Nakazawa1, Kun Yu1, Sadao Kurohashi2

Page 2

Outline

Why EBMT?

Description of Kyoto-U EBMT System

Japanese Particular Processing

Pronoun Estimation

Japanese Flexible Matching

Result and Discussion

Conclusion and Future Work

Page 4

Why EBMT?

Pursuing deep NLP

- Improvement of fundamental analyses leads to improvement of MT

- Feedback from MT can be expected

EBMT setting is suitable in many cases

- Not a large corpus, but similar translation examples in a relatively close domain

- e.g. manual translation, patent translation, …

Page 5

Outline

Why EBMT?

Description of Kyoto-U EBMT System

Japanese Particular Processing

Pronoun Estimation

Japanese Flexible Matching

Result and Discussion

Conclusion and Future Work

Page 6

Kyoto-U System Overview

Input: 交差点に入る時私の信号は青でした。

(cross) (point) (enter) (when) (my) (signal) (blue) (was)

Translation Examples:

- 交差点で、突然飛び出して来たのです。 ⇔ came at me from the side at the intersection

- 家に入る時脱ぐ ⇔ to remove when entering a house

- 私のサイン ⇔ my signature

- 信号は青でした。 ⇔ The traffic light was green

Language Model

Output: My traffic light was green when entering the intersection.

[Figure: the input, the translation examples, and the output are drawn as dependency trees with word-by-word glosses.]

Page 7

Structure-based Alignment

- Step1: Dependency structure transformation

- Step2: Word/phrase correspondences detection

- Step3: Correspondences disambiguation

- Step4: Handling remaining words

- Step5: Registration to database

Page 8

Dependency Structure Transformation

J: 交差点で、突然あの車が飛び出して来たのです。

E: The car came at me from the side at the intersection.

J: JUMAN/KNP, E: Charniak's nlparser → Dependency tree

[Figure: dependency trees of the sentence pair. J nodes: 交差 / 点 で 、 / 突然 / あの車 が / 飛び出して 来た のです. E nodes: the car / came / at me / from the side / at the intersection.]

Step1

Page 9

Word Correspondence Detection

KENKYUSYA J-E, E-J dictionaries (300K entries)

Transliteration (person/place names, Katakana words)

Ex) 新宿 → shinjuku, sinjuku, synjucu, ... ⇔ shinjuku (similarity: 1.0)

[Figure: word correspondences drawn between the dependency trees of the example sentence pair.]

Step2
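
The transliteration example on this slide can be sketched roughly as follows. This is only an illustration under assumptions: the romanization candidates are hard-coded from the slide, the similarity measure is the ratio from Python's `difflib.SequenceMatcher` (the transcript does not say which measure the system uses), and the names `similarity` and `best_transliteration` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; identical strings (ignoring case) give 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_transliteration(candidates, english_word):
    """Return the (candidate, score) pair most similar to the English word."""
    scored = ((c, similarity(c, english_word)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


# Romanization candidates for 新宿, copied from the slide. How the system
# generates them is not described in the transcript, so they are hard-coded.
candidates = ["shinjuku", "sinjuku", "synjucu"]

best, score = best_transliteration(candidates, "Shinjuku")
print(best, score)
```

A correspondence found this way would be added alongside the dictionary-based correspondences before disambiguation.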

Page 10

Correspondence Disambiguation

Calculate correspondence score based on unambiguous alignment

Select correspondence with higher score

$$\mathrm{Score} = \sum_{\text{Unamb. Matches}} \left( \frac{1}{\mathit{dist}_J} + \frac{1}{\mathit{dist}_E} \right)$$

$\mathit{dist}_{J/E}$ = Distance to unambiguous correspondence in Japanese/English tree

Step3
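
A minimal sketch of this scoring step, assuming that the score of a candidate correspondence is the sum, over all unambiguous correspondences, of 1/dist_J + 1/dist_E, and that distance is counted in tree edges. The toy trees, the node names, and the helpers `tree_distance` and `correspondence_score` are hypothetical and only serve the illustration.

```python
def ancestors(parent, node):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, a, b):
    """Number of edges on the path between nodes a and b of one tree."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(parent, a))}
    for j, n in enumerate(ancestors(parent, b)):
        if n in depth_from_a:
            return depth_from_a[n] + j
    raise ValueError("nodes are not in the same tree")


def correspondence_score(candidate, unambiguous, parent_j, parent_e):
    """Sum of 1/dist_J + 1/dist_E over all unambiguous correspondences."""
    cand_j, cand_e = candidate
    total = 0.0
    for unamb_j, unamb_e in unambiguous:
        total += 1.0 / tree_distance(parent_j, cand_j, unamb_j)
        total += 1.0 / tree_distance(parent_e, cand_e, unamb_e)
    return total


# Toy dependency trees as child -> parent maps (root maps to None).
parent_j = {"j_root": None, "j1": "j_root", "j2": "j_root", "ja": "j1"}
parent_e = {"e_root": None, "e1": "e_root", "e2": "e_root",
            "ea": "e1", "eb": "e2"}

unambiguous = [("j1", "e1")]          # already fixed correspondence
candidates = [("ja", "ea"), ("ja", "eb")]  # "ja" could align to either node

scores = {c: correspondence_score(c, unambiguous, parent_j, parent_e)
          for c in candidates}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy case the candidate whose English node sits next to the unambiguous match wins, which is the intuition behind preferring the higher score.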

Page 11

Correspondence Disambiguation (cont.)

J: 日本で保険会社に対して保険請求の申し立てが可能ですよ

E: you will have to file an insurance claim with the insurance office in Japan

[Figure: dependency trees of the sentence pair. 保険 and "insurance" each occur twice; the scores shown are 1.5, 1.0 and 0.8.]

Step3

Page 12

Handling Remaining Words

Align root nodes if they remain unaligned

Merge Base NP nodes

Merge into ancestor nodes

[Figure: dependency trees of the example sentence pair (交差点で、突然あの車が飛び出して来たのです ⇔ the car came at me from the side at the intersection).]

Step4

Page 13

Registration to Database

Register each correspondence

Register a couple of correspondences

[Figure: aligned dependency trees of the example sentence pair (交差点で、突然あの車が飛び出して来たのです ⇔ the car came at me from the side at the intersection).]

Step5

Page 14

Translation

Translation example (TE) retrieval

- for all the sub-trees in the input

TE selection

- prefer larger examples

TE combination

- greedily from the root node
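
The selection and combination steps can be sketched as a greedy cover of the input tree. This is a simplified illustration, not the system's implementation: retrieval is assumed to be done already, each retrieved example is reduced to the set of input nodes it matches plus an English fragment, the tree shape and node names are guesses based on the glosses of the overview slide, the single-node `"was green"` example is invented to show the size preference, and the function `combine` is hypothetical. Ordering the fragments into a sentence is left out.

```python
def combine(root, children, matches):
    """Greedily cover the input tree from the root, preferring larger examples.

    matches: list of (frozenset of covered input nodes, English fragment).
    """
    covered = set()
    chosen = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in covered:
            continue
        # Examples that contain this node and do not overlap earlier choices.
        usable = [m for m in matches if node in m[0] and not (m[0] & covered)]
        if not usable:
            raise LookupError(f"no translation example covers {node!r}")
        best = max(usable, key=lambda m: len(m[0]))  # prefer larger examples
        chosen.append(best)
        covered |= best[0]
        # Continue with the children that the chosen example left uncovered.
        for n in sorted(best[0]):
            stack.extend(c for c in children.get(n, []) if c not in best[0])
    return chosen


# Input tree for 交差点に入る時私の信号は青でした。 (node ids are English glosses).
children = {
    "was_green": ["when", "signal"],
    "when": ["enter"],
    "enter": ["point"],
    "point": ["cross"],
    "signal": ["my"],
}

matches = [
    (frozenset({"signal", "was_green"}), "The traffic light was green"),
    (frozenset({"was_green"}), "was green"),  # invented smaller alternative
    (frozenset({"enter", "when"}), "when entering"),
    (frozenset({"cross", "point"}), "the intersection"),
    (frozenset({"my"}), "my"),
]

chosen = combine("was_green", children, matches)
fragments = [fragment for _, fragment in chosen]
print(fragments)
```

Because the root is processed first, the two-node example for 信号は青でした is taken in preference to the single-node alternative, and the remaining sub-trees are filled in from the other examples.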

Page 15

Combination Example

[Figure: dependency trees of the input 交差点に入る時私の信号は青でした。, of the translation examples (交差点で、突然飛び出して来たのです。 / 家に入る時脱ぐ / 私のサイン / 信号は青でした。), and of the combined English tree "my traffic light was green when entering the intersection", as on the Kyoto-U System Overview slide.]

Page 16

Combination Example (cont.)

[Figure: the same input, translation examples, and combined English tree as on the previous slide.]

Page 17

Outline

Why EBMT?

Description of Kyoto-U EBMT System

Japanese Particular Processing

Pronoun Estimation

Japanese Flexible Matching

Result and Discussion

Conclusion and Future Work

Page 18

Pronoun Estimation

Pronouns are often omitted in Japanese sentences

Omitted in TE:

- TE: 胃が痛いのです → I've a stomachache

- Input: 私は胃が痛いのです → I I've a stomachache ×

Omitted in Input:

- TE: これを日本に送ってください → Will you mail this to Japan?

- Input: 日本へ送ってください → Will you mail to Japan? ×△

Page 19

Pronoun Estimation (cont.)

Estimate omitted pronoun by modality and subject case

Omitted in TE:

- TE: (私は)胃が痛いのです → I've a stomachache

- Input: 私は胃が痛いのです → I've a stomachache ○

Omitted in Input:

- TE: これを日本に送ってください → Will you mail this to Japan?

- Input: (これを)日本へ送ってください → Will you mail this to Japan? ○

Page 20

Various Expressions in Japanese

Synonymous Relation

- Hiragana/Katakana/Kanji variations

りんご = リンゴ = 林檎 (apple)

- Variations of Katakana expressions

コンピュータ = コンピューター (computer)

- Synonymous words

登山 = 山登り (climbing mountain vs mountain climbing)

- Synonymous phrases

最寄りの (nearest) = 一番近い (most near)

Hypernym-Hyponym Relation

- 災難 ← 災害 (disaster) ← 地震 (earthquake), 台風 (typhoon)

Sources: Morphological Analyzer; Automatically Acquired from Japanese Dictionaries

Page 21

Japanese Flexible Matching
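
One way to picture flexible matching is a lookup that accepts a translation example word when it is identical, synonymous, or in a hypernym-hyponym relation with the input word. The sketch below uses only the pairs listed on the previous slide; the table layout, the function names, and the returned labels are assumptions, and the transcript does not say how the system weights the different kinds of match.

```python
# Synonym groups copied from the "Various Expressions in Japanese" slide.
SYNONYM_GROUPS = [
    {"りんご", "リンゴ", "林檎"},        # apple: hiragana / katakana / kanji
    {"コンピュータ", "コンピューター"},  # computer: katakana variants
    {"登山", "山登り"},                  # mountain climbing
    {"最寄りの", "一番近い"},            # nearest
]

# Hyponym -> hypernym links from the same slide.
HYPERNYM = {"地震": "災害", "台風": "災害", "災害": "災難"}


def is_synonym(a: str, b: str) -> bool:
    """True if both words belong to one synonym group."""
    return any(a in group and b in group for group in SYNONYM_GROUPS)


def is_hypernym(general: str, specific: str) -> bool:
    """True if `general` is reachable from `specific` via hypernym links."""
    node = HYPERNYM.get(specific)
    while node is not None:
        if node == general:
            return True
        node = HYPERNYM.get(node)
    return False


def match_type(input_word: str, example_word: str):
    """Classify how an input word matches a translation example word."""
    if input_word == example_word:
        return "exact"
    if is_synonym(input_word, example_word):
        return "synonym"
    if is_hypernym(input_word, example_word) or is_hypernym(example_word, input_word):
        return "hypernym-hyponym"
    return None


print(match_type("りんご", "林檎"))
print(match_type("地震", "災難"))
print(match_type("地震", "台風"))
```

Sibling hyponyms such as 地震 and 台風 are deliberately not matched here; whether the actual system treats them as matching is not stated in the slides.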

Page 22

IWSLT06 Evaluation Results

Open data track (JE)

Correct recognition translation & ASR output translation

                       BLEU              NIST
Correct recognition
  Dev1                 0.5087            9.6803
  Dev2                 0.4881            9.4918
  Dev3                 0.4468            9.1883
  Dev4                 0.1921            5.7880
  Test                 0.1655 (8th/14)   5.4325 (8th/14)
ASR output
  Dev4                 0.1590            5.0107
  Test                 0.1418 (9th/14)   4.8804 (10th/14)

Page 23

Results Discussion

Punctuation insertion failure caused parsing error

Dictionary robustness affected alignment accuracy

TE selection criterion failed when choosing among 'almost equal' examples

- e.g. Input: "買います" (buy a ticket)

TE: "買いません" (not buy a ticket)

Page 24

Conclusion and Future Work

We not only aim at the development of MT, but also tackle this task from the viewpoint of structural NLP.

Implement statistical method on alignment

Improve parsing accuracies (both J and E)

Improve Japanese flexible matching method

J-C and C-J MT Project with NICT