Proceedings of the 4th Workshop on Asian Translation, pages 1–54,Taipei, Taiwan, November 27, 2017. c©2017 AFNLP
Overview of the 4th Workshop on Asian Translation

Toshiaki Nakazawa
Japan Science and Technology Agency

Shohei Higashiyama and Chenchen Ding
National Institute of Information and Communications Technology
{shohei.higashiyama, chenchen.ding}@nict.go.jp

Hideya Mino and Isao Goto
NHK
{mino.h-gq, goto.i-es}@nhk.or.jp

Hideto Kazawa
Google

Yusuke Oda
Nara Institute of Science and Technology
[email protected]

Graham Neubig
Carnegie Mellon University

Sadao Kurohashi
Kyoto University
Abstract

This paper presents the results of the shared tasks from the 4th workshop on Asian translation (WAT2017), including J↔E and J↔C scientific paper translation subtasks, C↔J, K↔J and E↔J patent translation subtasks, H↔E mixed domain subtasks, J↔E newswire subtasks and J↔E recipe subtasks. For WAT2017, 12 institutions participated in the shared tasks. About 300 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.
1 Introduction

The Workshop on Asian Translation (WAT) is an open evaluation campaign focusing on Asian languages. Following the success of the previous workshops WAT2014 (Nakazawa et al., 2014), WAT2015 (Nakazawa et al., 2015) and WAT2016 (Nakazawa et al., 2016), WAT2017 brings together machine translation researchers and users to try, evaluate, share and discuss brand-new ideas of machine translation. We have been working toward practical use of machine translation among all Asian countries.
For the 4th WAT, we adopted new translation subtasks with an English-Japanese news corpus and an English-Japanese recipe corpus, in addition to the subtasks of WAT2016 1. Furthermore, we invited research papers on topics related to machine translation, especially for Asian languages. The submitted research papers were peer reviewed by three program committee members, and the committee accepted 4 papers, which focus on neural machine translation and on construction and evaluation of language resources. We also launched the small NMT task, which aims to build a small NMT system that keeps reasonable translation quality. There were, however, no submissions to the task this year.

1 This year we did not conduct the Indonesian-English newswire subtask, which was conducted at WAT2016, due to corpus license reasons.
WAT is a unique workshop on Asian-language translation with the following characteristics:
• Open innovation platform
Due to the fixed and open test data, we can repeatedly evaluate translation systems on the same dataset over the years. There is no deadline for submitting translation results with respect to automatic evaluation of translation quality, and WAT receives submissions at any time.
• Domain and language pairs
WAT is the world's first workshop that targets the scientific paper domain and the Chinese↔Japanese and Korean↔Japanese language pairs. In the future, we will add more Asian languages such as Vietnamese, Thai, Burmese and so on.
• Evaluation method
Evaluation is done both automatically and manually. For automatic evaluation, we use three metrics: BLEU, RIBES and AMFM. For human evaluation, we evaluate translation results by pairwise evaluation and JPO adequacy evaluation. The JPO adequacy evaluation is conducted for selected submissions according to the pairwise evaluation results.

Lang  Train      Dev    DevTest  Test
JE    3,008,500  1,790  1,784    1,812
JC    672,315    2,090  2,148    2,107

Table 1: Statistics for ASPEC.
2 Dataset

WAT2017 uses the Asian Scientific Paper Excerpt Corpus (ASPEC) 2, the JPO Patent Corpus (JPC) 3, the JIJI Corpus 4, the IIT Bombay English-Hindi Corpus (IITB Corpus) 5 and the Recipe Corpus 6 as its datasets.
2.1 ASPEC

ASPEC was constructed by the Japan Science and Technology Agency (JST) in collaboration with the National Institute of Information and Communications Technology (NICT). The corpus consists of a Japanese-English scientific paper abstract corpus (ASPEC-JE), which is used for the J↔E subtasks, and a Japanese-Chinese scientific paper excerpt corpus (ASPEC-JC), which is used for the J↔C subtasks. The statistics for each corpus are shown in Table 1.
2.1.1 ASPEC-JE

The training data for ASPEC-JE was constructed by NICT from approximately two million Japanese-English scientific paper abstracts owned by JST. The data is a comparable corpus, and sentence correspondences were found automatically using the method from (Utiyama and Isahara, 2007). Each sentence
2 http://lotus.kuee.kyoto-u.ac.jp/ASPEC/index.html
3 http://lotus.kuee.kyoto-u.ac.jp/WAT/patent/index.html
4 http://lotus.kuee.kyoto-u.ac.jp/WAT/jiji-corpus/index.html
5 http://www.cfilt.iitb.ac.in/iitb_parallel/index.html
6 http://lotus.kuee.kyoto-u.ac.jp/WAT/recipe-corpus/index.html
pair is accompanied by a similarity score calculated by that method and a field ID that indicates a scientific field. The correspondence between field IDs and field names, along with their frequencies and occurrence ratios in the training data, is described in the README file of ASPEC-JE.
The development, development-test and test data were extracted from parallel sentences in the Japanese-English paper abstracts, excluding the sentences in the training data. Each dataset consists of 400 documents and contains sentences from each field at the same rate. The document alignment was conducted automatically, and only documents with a 1-to-1 alignment are included. It is therefore possible to restore the original documents. The format is the same as that of the training data, except that there is no similarity score.
2.1.2 ASPEC-JC

ASPEC-JC is a parallel corpus consisting of Japanese scientific papers, which come from the literature database and electronic journal site J-STAGE of JST, and their translations into Chinese, obtained with permission from the necessary academic associations. Abstracts and paragraph units are selected from the body text so as to contain the highest overall vocabulary coverage.
The development, development-test and test data are extracted at random from documents containing single paragraphs across the entire corpus. Each set contains 400 paragraphs (documents). There are no documents sharing the same data across the training, development, development-test and test sets.
2.2 JPC

JPC was constructed by the Japan Patent Office (JPO). The corpus consists of a Chinese-Japanese patent description corpus (JPC-CJ), a Korean-Japanese patent description corpus (JPC-KJ) and an English-Japanese patent description corpus (JPC-EJ), covering the sections Chemistry, Electricity, Mechanical engineering, and Physics on the basis of the International Patent Classification (IPC). Each corpus is partitioned into training, development, development-test and test data. This corpus is used for the C↔J, K↔J and E↔J patent subtasks. The statistics for each corpus are shown
Lang  Train      Dev    DevTest  Test
CJ    1,000,000  2,000  2,000    2,000
KJ    1,000,000  2,000  2,000    2,000
EJ    1,000,000  2,000  2,000    2,000

Table 2: Statistics for JPC.
in Table 2.

The sentence pairs in each dataset were randomly extracted from the description parts of comparable patent documents, under the condition that the similarity score between the two sentences was greater than or equal to a threshold value of 0.05. The similarity scores were calculated by the method from (Utiyama and Isahara, 2007), as with ASPEC. Document pairs that were used to extract sentence pairs for one dataset were not used for the other datasets. Furthermore, the sentence pairs were extracted so as to be equal in number across the four sections. The maximum number of sentence pairs extracted from one document pair was limited to 60 for the training data and 20 for the development, development-test and test data.
The training data for JPC-CJ was made from sentence pairs of Chinese-Japanese patent documents published in 2012. For JPC-KJ and JPC-EJ, the training data was extracted from sentence pairs of Korean-Japanese and English-Japanese patent documents published in 2011 and 2012. The development, development-test and test data for JPC-CJ, JPC-KJ and JPC-EJ were respectively made from 100 patent documents published in 2013.
2.3 JIJI Corpus

JIJI Corpus was constructed by Jiji Press, Ltd. in collaboration with NICT. The corpus consists of news text from Jiji Press news in various categories, including politics, economy, nation, business, markets, sports and so on. The corpus is partitioned into training, development, development-test and test data, which consist of Japanese-English sentence pairs. The statistics for each corpus are shown in Table 3.
The sentence pairs in each dataset are identified in the same manner as for ASPEC,
Lang  Train    Dev    DevTest  Test
EJ    200,000  2,000  2,000    2,000

Table 3: Statistics for JIJI Corpus.
Lang  Train      Dev    Test   Mono
H     –          –      –      45,075,279
EH    1,492,827  520    2,507  –
JH    152,692    1,566  2,000  –

Table 4: Statistics for IITB Corpus. “Mono” indicates the monolingual Hindi corpus.
using the method from (Utiyama and Isahara,2007).
2.4 IITB Corpus

The IIT Bombay English-Hindi corpus contains an English-Hindi parallel corpus (IITB-EH) as well as a monolingual Hindi corpus collected from a variety of sources and corpora developed at the Center for Indian Language Technology, IIT Bombay, over the years. This corpus is used for the H↔E mixed domain subtasks. Furthermore, H↔J mixed domain subtasks were added as a pivot language task, with a parallel corpus created using publicly available corpora (IITB-JH) 7. Most sentence pairs in IITB-JH come from the Bible corpus. The statistics for each corpus are shown in Table 4.
2.5 Recipe Corpus

Recipe Corpus was constructed by Cookpad Inc. Each recipe consists of a title, ingredients, steps, a description and a history. Every text in the titles, ingredients and steps is a parallel sentence, while texts in the descriptions and histories are not always parallel sentences. Although all of the texts in the training set can be used for training, only the titles, ingredients and steps in the test set are used for evaluation. The statistics for each corpus are described in Table 5.
3 Baseline Systems

Human evaluations were conducted as pairwise comparisons between the translation results for a specific baseline system and the translation results for each participant's system.
7http://lotus.kuee.kyoto-u.ac.jp/WAT/Hindi-corpus/WAT2017-Ja-Hi.zip
Lang  TextType    Train    Dev    DevTest  Test
EJ    Title       14,779   500    500      500
      Ingredient  127,244  4,274  4,188    3,935
      Step        108,993  3,303  3,086    2,804

Table 5: Statistics for Recipe Corpus.
That is, the specific baseline system was the standard for human evaluation. A phrase-based statistical machine translation (SMT) system was adopted as the specific baseline system at WAT2017; it is the same system as that used from WAT2014 to WAT2016.
In addition to the results for the baseline phrase-based SMT system, we produced results for baseline systems that consisted of a hierarchical phrase-based SMT system, a string-to-tree syntax-based SMT system, a tree-to-string syntax-based SMT system, seven commercial rule-based machine translation (RBMT) systems, and two online translation systems. We also experimentally produced results for a baseline neural machine translation system using the implementation of (Vaswani et al., 2017). The SMT baseline systems consisted of publicly available software, and the procedures for building the systems and for translating using the systems were published on the WAT web page 8. We used Moses (Koehn et al., 2007; Hoang et al., 2009) as the implementation of the baseline SMT systems. The Berkeley parser (Petrov et al., 2006) was used to obtain syntactic annotations. The baseline systems are shown in Table 6.
The commercial RBMT systems and the online translation systems were operated by the organizers. We note that these RBMT companies and online translation companies did not themselves submit systems. Because our objective is not to compare commercial RBMT systems or online translation systems from companies that did not themselves participate, the system IDs of these systems are anonymized in this paper.
8http://lotus.kuee.kyoto-u.ac.jp/WAT/
4
[Table 6: Baseline Systems. The recoverable content lists the baseline systems and their types: SMT Phrase (Moses' phrase-based SMT), SMT Hiero (Moses' hierarchical phrase-based SMT), SMT S2T (Moses' string-to-tree syntax-based SMT with the Berkeley parser), SMT T2S (Moses' tree-to-string syntax-based SMT with the Berkeley parser), the commercial RBMT systems The Honyaku V15, ATLAS V14, PAT-Transer 2009, PC-Transer V13, J-Beijing 7, Hohrai 2011, J-Soul 9 and Korai 2011, the online systems Google translate and Bing translator, and AIAYN (Google's implementation of "Attention Is All You Need" NMT). The per-subtask check marks of the original table are not recoverable from this copy.]
3.1 Training Data

We used the following data for training the SMT baseline systems.
• Training data for the language model: Allof the target language sentences in theparallel corpus.
• Training data for the translation model: Sentences that were 40 words or less in length. (For the ASPEC Japanese–English training data, we only used train-1.txt, which consists of one million parallel sentence pairs with high similarity scores.)
• Development data for tuning: All of thedevelopment data.
3.2 Common Settings for Baseline SMT
We used the following tools for tokenization.
• Juman version 7.0 9 for Japanese segmentation.
• Stanford Word Segmenter version 2014-01-04 10 (Chinese Penn Treebank (CTB) model) for Chinese segmentation.
• The Moses toolkit for English and Indonesian tokenization.
• Mecab-ko 11 for Korean segmentation.
• Indic NLP Library 12 for Hindi segmentation.
To obtain word alignments, GIZA++ and grow-diag-final-and heuristics were used. We used 5-gram language models with modified Kneser-Ney smoothing, which were built using a tool in the Moses toolkit (Heafield et al., 2013).
3.3 Phrase-based SMT

We used the following Moses configuration for the phrase-based SMT system.
• distortion-limit
  – 20 for JE, EJ, JC, and CJ
  – 0 for JK, KJ, HE, and EH
  – 6 for IE and EI
• msd-bidirectional-fe lexicalized reordering
9 http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN
10 http://nlp.stanford.edu/software/segmenter.shtml
11 https://bitbucket.org/eunjeon/mecab-ko/
12 https://bitbucket.org/anoopk/indic_nlp_library
• Phrase score option: GoodTuring
The default values were used for the other sys-tem parameters.
3.4 Hierarchical Phrase-based SMT

We used the following Moses configuration for the hierarchical phrase-based SMT system.
• max-chart-span = 1000
• Phrase score option: GoodTuring
The default values were used for the other sys-tem parameters.
3.5 String-to-Tree Syntax-based SMT

We used the Berkeley parser to obtain target language syntax. We used the following Moses configuration for the string-to-tree syntax-based SMT system.
• max-chart-span = 1000
• Phrase score option: GoodTuring
• Phrase extraction options: MaxSpan = 1000, MinHoleSource = 1, and NonTermConsecSource.
The default values were used for the other sys-tem parameters.
3.6 Tree-to-String Syntax-based SMT

We used the Berkeley parser to obtain source language syntax. We used the following Moses configuration for the baseline tree-to-string syntax-based SMT system.
• max-chart-span = 1000
• Phrase score option: GoodTuring
• Phrase extraction options: MaxSpan = 1000, MinHoleSource = 1, MinWords = 0, NonTermConsecSource, and AllowOnlyUnalignedWords.
The default values were used for the other sys-tem parameters.
4 Automatic Evaluation

4.1 Procedure for Calculating Automatic Evaluation Scores

We evaluated translation results with three metrics: BLEU (Papineni et al., 2002), RIBES (Isozaki et al., 2010) and AMFM (Banchs et al., 2015). BLEU scores were calculated using multi-bleu.perl, which was distributed
with the Moses toolkit (Koehn et al., 2007). RIBES scores were calculated using RIBES.py version 1.02.4 13. AMFM scores were calculated using scripts created by the technical collaborators of WAT2017. All scores for each task were calculated using the corresponding references.
Before calculating the automatic evaluation scores, the translation results were tokenized with word segmentation tools for each language. For Japanese segmentation, we used three different tools: Juman version 7.0 (Kurohashi et al., 1994), KyTea 0.4.6 (Neubig et al., 2011) with the Full SVM model 14 and MeCab 0.996 (Kudo, 2005) with the IPA dictionary 2.7.0 15. For Chinese segmentation, we used two different tools: KyTea 0.4.6 with the Full SVM MSR model and the Stanford Word Segmenter (Tseng, 2005) version 2014-06-16 with the Chinese Penn Treebank (CTB) and Peking University (PKU) models 16. For Korean segmentation, we used mecab-ko 17. For English segmentation, we used tokenizer.perl 18 in the Moses toolkit. For Hindi segmentation, we used the Indic NLP Library 19. The detailed procedures for the automatic evaluation are shown on the WAT2017 evaluation web page 20.
4.2 Automatic Evaluation System

The participants submit translation results via an automatic evaluation system deployed on the WAT2017 web page, which automatically gives evaluation scores for the uploaded results. Figure 1 shows the submission interface for participants. The system requires participants to provide the following information when they upload translation results:
• Subtask:
Scientific papers subtask (J↔E, J↔C), Patents subtask (C↔J, K↔J, E↔J),
13 http://www.kecl.ntt.co.jp/icl/lirg/ribes/index.html
14 http://www.phontron.com/kytea/model.html
15 http://code.google.com/p/mecab/downloads/detail?name=mecab-ipadic-2.7.0-20070801.tar.gz
16 http://nlp.stanford.edu/software/segmenter.shtml
17 https://bitbucket.org/eunjeon/mecab-ko/
18 https://github.com/moses-smt/mosesdecoder/tree/RELEASE-2.1.1/scripts/tokenizer/tokenizer.perl
19 https://bitbucket.org/anoopk/indic_nlp_library
20 http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/index.html
Newswire subtask (J↔E), Mixed domain subtask (H↔E, H↔J) or Recipe subtask (J↔E);
• Method:
SMT, RBMT, SMT and RBMT, EBMT, NMT or Other;
• Use of other resources in addition to the provided data (ASPEC / JPC / IITB Corpus / JIJI Corpus / Recipe Corpus);
• Permission to publish automatic evaluation scores on the WAT2017 web page.
Although participants can confirm only the information that they filled in or uploaded, the server for the system stores all submitted information, including translation results and scores. Information about translation results that participants permit to be published is disclosed via the WAT2017 evaluation web page. Participants can also submit results for human evaluation using the same web interface. This automatic evaluation system will remain available even after WAT2017. Anybody can register an account for the system by following the procedures on the registration web page 21.
21http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2017/registration/index.html
Figure 1: The submission web page for participants.
5 Human Evaluation
In WAT2017, we conducted two kinds of human evaluation: pairwise evaluation and JPO adequacy evaluation.
5.1 Pairwise Evaluation

The pairwise evaluation is the same as last year's, except that we did not use crowdsourcing this year. Instead, we asked a professional translation company to conduct the pairwise evaluation. The cost of pairwise evaluation per sentence is almost the same as that of last year.
We randomly chose 400 sentences from the test set for the pairwise evaluation. We used the same sentences as last year for the continuing subtasks. Each submission is compared with the baseline translation (phrase-based SMT, described in Section 3) and given a Pairwise score.
5.1.1 Pairwise Evaluation of Sentences

We conducted pairwise evaluation on each of the 400 test sentences. The input sentence and two translations (the baseline and a submission) are shown to the annotators, and the annotators are asked to judge which of the translations is better, or whether they are of the same quality. The order of the two translations is random.
5.1.2 Voting

To guarantee the quality of the evaluations, each sentence is evaluated by 5 different annotators, and the final decision is made depending on the 5 judgements. We define each judgement ji (i = 1, ..., 5) as:

ji = 1 if better than the baseline; −1 if worse than the baseline; 0 if the quality is the same.

The final decision D is defined as follows, using S = Σi ji:

D = win if S ≥ 2; loss if S ≤ −2; tie otherwise.
5.1.3 Pairwise Score Calculation

Suppose that W is the number of wins compared to the baseline, L is the number of losses and T is the number of ties. The Pairwise score can be calculated by the following formula:

Pairwise = 100 × (W − L) / (W + L + T)

From the definition, the Pairwise score ranges between −100 and 100.
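The voting rule and the Pairwise score formula above can be sketched in code as follows; this is an illustrative reimplementation, not the organizers' actual evaluation script:

```python
def decide(judgements):
    """Aggregate five per-annotator judgements (+1 / -1 / 0) into a decision."""
    s = sum(judgements)          # S = sum of the j_i
    if s >= 2:
        return "win"
    if s <= -2:
        return "loss"
    return "tie"

def pairwise_score(decisions):
    """Pairwise = 100 * (W - L) / (W + L + T); ranges from -100 to 100."""
    w = decisions.count("win")
    l = decisions.count("loss")
    t = decisions.count("tie")
    return 100.0 * (w - l) / (w + l + t)

# Example: three evaluated sentences, five judgements each.
decisions = [decide(j) for j in [[1, 1, 0, 1, -1],    # S = 2  -> win
                                 [-1, -1, 0, -1, 1],  # S = -2 -> loss
                                 [1, -1, 0, 0, 0]]]   # S = 0  -> tie
print(decisions)                  # ['win', 'loss', 'tie']
print(pairwise_score(decisions))  # 0.0
```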
5.1.4 Confidence Interval Estimation

There are several ways to estimate a confidence interval. We chose bootstrap resampling (Koehn, 2004) to estimate the 95% confidence interval. The procedure is as follows:

1. randomly select 300 sentences from the 400 human evaluation sentences, and calculate the Pairwise score of the selected sentences
2. iterate the previous step 1000 times and get 1000 Pairwise scores

3. sort the 1000 scores and estimate the 95% confidence interval by discarding the top 25 scores and the bottom 25 scores
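The three-step procedure above is a standard bootstrap, and can be sketched as follows. This is an illustration rather than the official tooling; it assumes sampling with replacement, the usual reading of bootstrap resampling:

```python
import random

def pairwise_score(decisions):
    # Pairwise = 100 * (W - L) / (W + L + T)
    w = decisions.count("win")
    l = decisions.count("loss")
    return 100.0 * (w - l) / len(decisions)

def bootstrap_ci(decisions, n_sample=300, n_iter=1000, seed=0):
    """Estimate the 95% CI of the Pairwise score by bootstrap resampling."""
    rng = random.Random(seed)
    scores = sorted(
        pairwise_score(rng.choices(decisions, k=n_sample))
        for _ in range(n_iter)
    )
    cut = int(n_iter * 0.025)      # discard 25 of 1000 scores on each side
    return scores[cut], scores[-cut - 1]

# Example: 400 per-sentence decisions whose true Pairwise score is 30.0.
decisions = ["win"] * 220 + ["loss"] * 100 + ["tie"] * 80
lo, hi = bootstrap_ci(decisions)
print(round(lo, 1), round(hi, 1))
```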
5.2 JPO Adequacy Evaluation
The participants' systems that achieved the top 3 highest scores among the pairwise evaluation results of each subtask 22 were also evaluated with the JPO adequacy evaluation. The JPO adequacy evaluation was carried out by translation experts using a quality evaluation criterion for translated patent documents defined by the Japanese Patent Office (JPO). For each system, two annotators evaluated the test sentences to guarantee quality.
5.2.1 Evaluation of Sentences

The number of test sentences for the JPO adequacy evaluation is 200. The 200 test sentences were randomly selected from the 400 test sentences of the pairwise evaluation. Each test item includes the input sentence, the submitted system's translation and the reference translation.
22 The number of systems varies depending on the subtasks.
5  All important information is transmitted correctly. (100%)
4  Almost all important information is transmitted correctly. (80%–)
3  More than half of the important information is transmitted correctly. (50%–)
2  Some of the important information is transmitted correctly. (20%–)
1  Almost all important information is NOT transmitted correctly. (–20%)

Table 7: The JPO adequacy criterion.
5.2.2 Evaluation Criterion

Table 7 shows the JPO adequacy criterion, with grades from 5 to 1. The evaluation is performed subjectively. “Important information” represents the technical factors and their relationships. The degree of importance of each element is also considered in the evaluation. The percentages for each grade are rough indications of the degree to which the meaning of the source sentence is transmitted. The detailed criterion can be found in the JPO document (in Japanese) 23.
6 Participants List

Table 8 shows the list of participants in WAT2017. This includes not only Japanese organizations but also organizations from outside Japan. 12 teams submitted one or more translation results to the automatic evaluation server or for human evaluation.
23http://www.jpo.go.jp/shiryou/toushin/chousa/tokkyohonyaku_hyouka.htm
[Table 8: List of participants who submitted translation results to WAT2017 and their participation in each subtask. The recoverable content lists the teams and organizations: Kyoto-U (Cromieres et al., 2017), Kyoto University; TMU (Matsumura and Komachi, 2017), Tokyo Metropolitan University; EHR (Ehara, 2017), Ehara NLP Research Laboratory; NTT (Morishita et al., 2017), NTT Communication Science Laboratories; JAPIO (Kinoshita et al., 2017), Japan Patent Information Organization; NICT-2 (Imamura and Sumita, 2017), National Institute of Information and Communications Technology; XMUNLP (Wang et al., 2017), Xiamen University; UT-IIS (Neishi et al., 2017), The University of Tokyo; CUNI (Kocmi et al., 2017), Charles University, Institute of Formal and Applied Linguistics; IITB-MTG (Singh et al., 2017), Indian Institute of Technology Bombay; u-tkb (Long et al., 2017), University of Tsukuba; NAIST-NICT (Oda et al., 2017), NAIST/NICT. The per-subtask check marks of the original table are not recoverable from this copy.]
7 Evaluation Results

In this section, the evaluation results for WAT2017 are reported from several perspectives. Some of the results for both the automatic and human evaluations are also accessible at the WAT2017 website 24.
7.1 Official Evaluation Results

Figures 2, 3, 4 and 5 show the official evaluation results of the ASPEC subtasks, Figures 6, 7, 8, 9 and 10 show those of the JPC subtasks, Figures 11 and 12 show those of the IITBC subtasks, Figures 13 and 14 show those of the JIJI subtasks, and Figures 15, 16, 17, 18, 19 and 20 show those of the RECIPE subtasks. Each figure contains the automatic evaluation results (BLEU, RIBES, AM-FM), the pairwise evaluation results with confidence intervals, the correlation between the automatic evaluations and the pairwise evaluation, the JPO adequacy evaluation results and an evaluation summary of the top systems.
The detailed automatic evaluation results for all the submissions are shown in Appendix A. The detailed JPO adequacy evaluation results for the selected submissions are shown in Table 9. The weights for the weighted κ (Cohen, 1968) are defined as |Evaluation1 − Evaluation2|/4.
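The weighted κ with these linear weights can be sketched as follows; this is an illustrative reimplementation for the 5-point adequacy scale, not the organizers' script:

```python
from collections import Counter

def weighted_kappa(a, b, categories=(1, 2, 3, 4, 5)):
    """Cohen's weighted kappa with weights |i - j| / 4 on a 5-point scale."""
    n = len(a)
    pa = Counter(a)  # marginal counts for annotator A
    pb = Counter(b)  # marginal counts for annotator B
    # observed and chance-expected disagreement, weighted by |i - j| / 4
    d_obs = sum(abs(x - y) / 4 for x, y in zip(a, b)) / n
    d_exp = sum(abs(i - j) / 4 * pa[i] * pb[j]
                for i in categories for j in categories) / (n * n)
    return 1 - d_obs / d_exp

# Example: two annotators' adequacy grades for eight sentences.
a = [5, 4, 4, 3, 5, 2, 4, 5]
b = [5, 4, 3, 3, 4, 2, 4, 5]
print(round(weighted_kappa(a, b), 3))  # 0.771
```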
From the evaluation results, the followingcan be observed:
• The translation quality of this year is better than that of last year for all the subtasks.
• There is no big difference between the neural network based translation models according to the JPO adequacy evaluation results for the ASPEC subtasks.
7.2 Statistical Significance Testing of Pairwise Evaluation between Submissions
Tables 10, 11 and 12 show the results of statistical significance testing for the ASPEC subtasks, Tables 13, 14 and 15 show those for the JPC subtasks, Table 16 shows those for the IITBC subtasks, Table 17 shows those for the JIJI subtasks, and Tables 18, 19 and 20 show those for the RECIPE subtasks. ≫, ≫ and > mean that the system in
24http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/
the row is better than the system in the column at a significance level of p < 0.01, 0.05 and 0.1, respectively. Testing is also done by bootstrap resampling, as follows:
1. randomly select 300 sentences from the 400 pairwise evaluation sentences, and calculate the Pairwise scores on the selected sentences for both systems
2. iterate the previous step 1000 times and count the number of wins (W), losses (L) and ties (T)
3. calculate p = L / (W + L)
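The significance-testing procedure above can be sketched for two systems' per-sentence decisions as follows; again this is an illustration, not the official script, and it assumes paired resampling (the same resampled sentences are used for both systems):

```python
import random

def pairwise_score(decisions):
    # Pairwise = 100 * (W - L) / (W + L + T)
    return 100.0 * (decisions.count("win") - decisions.count("loss")) / len(decisions)

def significance(dec_a, dec_b, n_sample=300, n_iter=1000, seed=0):
    """Estimate p = L / (W + L): how often system A fails to beat system B."""
    rng = random.Random(seed)
    wins = losses = 0
    n = len(dec_a)
    for _ in range(n_iter):
        idx = rng.choices(range(n), k=n_sample)   # same sentences for both systems
        sa = pairwise_score([dec_a[i] for i in idx])
        sb = pairwise_score([dec_b[i] for i in idx])
        if sa > sb:
            wins += 1
        elif sb > sa:
            losses += 1
    if wins + losses == 0:        # the two systems never differed on any resample
        return 0.5
    return losses / (wins + losses)

dec_a = ["win"] * 240 + ["tie"] * 100 + ["loss"] * 60
dec_b = ["loss"] * 240 + ["tie"] * 100 + ["win"] * 60
print(significance(dec_a, dec_b))  # a value near 0 -> A significantly better
```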
Inter-annotator Agreement

To assess the reliability of agreement between the workers, we calculated the Fleiss' κ (Fleiss et al., 1971) values. The results are shown in Table 21. We can see that the κ values are larger for X → J translations than for J → X translations. This may be because the majority of the workers are Japanese, and evaluation of one's mother tongue is generally much easier than that of other languages.
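Fleiss' κ for the 5-annotator pairwise judgements (categories: better / same / worse) can be sketched as follows; this is an illustrative computation, not the organizers' script:

```python
def fleiss_kappa(ratings):
    """ratings: list of per-item dicts mapping category -> number of annotators."""
    n_items = len(ratings)
    n_raters = sum(ratings[0].values())            # annotators per item (5 here)
    categories = {c for item in ratings for c in item}
    # overall proportion of assignments to each category
    p = {c: sum(item.get(c, 0) for item in ratings) / (n_items * n_raters)
         for c in categories}
    # mean per-item agreement P_i = (sum n_ij^2 - n) / (n (n - 1))
    p_bar = sum(
        (sum(k * k for k in item.values()) - n_raters) / (n_raters * (n_raters - 1))
        for item in ratings
    ) / n_items
    p_e = sum(v * v for v in p.values())           # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# Example: four sentences, five annotators each.
items = [{"better": 4, "same": 1}, {"worse": 3, "same": 2},
         {"better": 2, "same": 2, "worse": 1}, {"same": 5}]
print(round(fleiss_kappa(items), 3))  # 0.274
```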
8 Submitted Data
The number of published automatic evaluation results for the 14 teams exceeded 300 before the start of WAT2017, and 67 translation results for pairwise evaluation were submitted by 12 teams. Furthermore, we selected several translation results from each subtask according to the pairwise evaluation scores and evaluated them with the JPO adequacy evaluation. We will organize all of the submitted data for human evaluation and make it public.
9 Conclusion and Future Perspective
This paper summarizes the shared tasks of WAT2017. We had 12 participants worldwide and collected a large number of submissions that are useful for improving current machine translation systems, through analyzing the submissions and identifying the issues.
For the next WAT workshop, we plan to change the baseline system from PBSMT to NMT because the pairwise scores are saturated for some of the subtasks. Also, we
are planning to do extrinsic evaluation of thetranslations.
Unfortunately, there were no participants in the small NMT task this year. We will brush up the task definition and invite participants for the next WAT.
Appendix A Submissions

Tables 22 to 41 summarize all the submissions listed on the automatic evaluation server at the time of the WAT2017 workshop (November 27, 2017). The OTHER column shows the use of resources such as parallel corpora, monolingual corpora and parallel dictionaries in addition to ASPEC, JPC, the IITB Corpus, the JIJI Corpus and the RECIPE Corpus.
Figure 2: Official evaluation results of ASPEC-JE.
Figure 3: Official evaluation results of ASPEC-EJ.
Figure 4: Official evaluation results of ASPEC-JC.
Figure 5: Official evaluation results of ASPEC-CJ.
Figure 6: Official evaluation results of JPC-JE.
Figure 7: Official evaluation results of JPC-EJ.
Figure 8: Official evaluation results of JPC-JC.
Figure 9: Official evaluation results of JPC-CJ.
Figure 10: Official evaluation results of JPC-KJ.
Figure 11: Official evaluation results of IITBC-HE.
Figure 12: Official evaluation results of IITBC-EH.
Figure 13: Official evaluation results of JIJI-JE.
Figure 14: Official evaluation results of JIJI-EJ.
Figure 15: Official evaluation results of RECIPE-TTL-JE.
Figure 16: Official evaluation results of RECIPE-TTL-EJ.
Figure 17: Official evaluation results of RECIPE-ING-JE.
Figure 18: Official evaluation results of RECIPE-ING-EJ.
Figure 19: Official evaluation results of RECIPE-STE-JE.
Figure 20: Official evaluation results of RECIPE-STE-EJ.
Subtask          SYSTEM ID   DATA ID  A avg  A var  B avg  B var  All avg  κ     wt. κ
ASPEC-JE         NTT         1681     4.15   0.58   4.13   0.52   4.14     0.29  0.41
                 AIAYN       1736     4.16   0.67   4.05   0.75   4.10     0.26  0.42
                 Kyoto-U     1717     4.11   0.69   4.09   0.54   4.10     0.26  0.40
                 2016 best   1246     3.76   0.68   4.01   0.67   3.89     0.21  0.31
ASPEC-EJ         NTT         1729     4.54   0.56   4.28   0.49   4.41     0.33  0.43
                 AIAYN       1737     4.38   0.83   4.21   0.76   4.30     0.36  0.52
                 NICT-2      1479     4.43   0.73   4.16   0.69   4.29     0.35  0.48
                 Kyoto-U     1731     4.37   0.84   4.15   0.74   4.26     0.39  0.54
                 NAIST-NICT  1507     4.36   0.69   4.06   0.57   4.21     0.26  0.36
                 2016 best   1172     3.97   0.76   4.07   0.85   4.02     0.35  0.49
ASPEC-JC         NICT-2      1483     4.25   0.73   3.71   0.98   3.98     0.10  0.18
                 Kyoto-U     1722     4.25   0.79   3.64   1.07   3.95     0.12  0.23
                 AIAYN       1738     4.26   0.69   3.54   1.03   3.90     0.17  0.27
                 2016 best   1071     4.00   1.09   3.76   1.14   3.88     0.20  0.36
ASPEC-CJ         NICT-2      1481     4.63   0.47   3.99   0.98   4.31     0.17  0.23
                 Kyoto-U     1720     4.62   0.56   3.97   0.94   4.30     0.16  0.22
                 AIAYN       1740     4.59   0.61   3.96   1.04   4.27     0.14  0.23
                 2016 best   1256     4.25   1.04   3.64   1.23   3.94     0.23  0.34
JPC-JE           JAPIO       1574     4.80   0.26   4.78   0.51   4.79     0.34  0.42
                 u-tkb       1472     4.24   1.26   4.08   2.27   4.16     0.43  0.64
                 CUNI        1666     4.12   1.49   3.99   2.35   4.05     0.40  0.63
                 2016 best   1149     4.09   0.80   4.51   0.58   4.30     0.25  0.39
JPC-EJ           JAPIO       1454     4.74   0.45   4.76   0.38   4.75     0.32  0.48
                 EHR         1407     4.64   0.61   4.61   0.65   4.63     0.42  0.60
                 u-tkb       1470     4.39   1.07   4.42   0.99   4.40     0.43  0.61
                 2016 best   1098     4.03   0.91   4.51   0.57   4.27     0.23  0.41
JPC-JC           u-tkb       1465     3.99   1.12   4.19   0.94   4.09     0.22  0.32
                 2016 best   1150     3.49   1.72   3.02   1.75   3.25     0.27  0.51
JPC-CJ           JAPIO       1484     4.41   0.68   4.51   0.64   4.46     0.26  0.34
                 EHR         1414     4.27   0.92   4.35   1.03   4.31     0.33  0.48
                 u-tkb       1468     3.84   1.16   4.04   1.36   3.94     0.23  0.43
                 2016 best   1200     3.61   1.89   3.27   1.76   3.44     0.26  0.52
JPC-KJ           JAPIO       1448     4.82   0.24   4.87   0.11   4.84     0.55  0.55
                 EHR         1417     4.76   0.30   4.86   0.23   4.81     0.35  0.47
                 2016 best   1209     4.58   0.32   4.66   0.30   4.62     0.33  0.36
IITBC-HE         XMUNLP      1511     3.43   1.64   3.60   1.74   3.51     0.22  0.45
                 IITB-MTG    1726     2.14   1.45   2.45   1.87   2.29     0.30  0.51
IITBC-EH         XMUNLP      1576     3.95   1.18   3.76   1.85   3.86     0.17  0.36
                 IITB-MTG    1725     2.78   1.74   2.58   1.87   2.68     0.15  0.38
                 2016 best   1032     3.20   1.33   3.53   1.19   3.36     0.10  0.16
JIJI-JE          Online A    1523     3.03   1.60   3.28   2.24   3.15     0.15  0.37
                 NTT         1599     1.87   1.25   2.23   1.69   2.05     0.26  0.46
                 XMUNLP      1442     1.91   1.26   2.19   1.56   2.05     0.24  0.44
JIJI-EJ          Online A    1518     3.31   1.92   3.78   2.06   3.54     0.23  0.50
                 NTT         1679     1.78   1.18   2.28   1.97   2.03     0.29  0.52
                 XMUNLP      1443     1.72   1.02   2.20   1.70   1.96     0.33  0.51
RECIPE-TTL-JE    XMUNLP      1637     3.90   1.98   3.62   1.57   3.76     0.30  0.56
                 Online A    1534     3.52   2.04   3.16   2.07   3.34     0.36  0.60
RECIPE-TTL-EJ    Online B    1533     4.56   0.55   3.84   2.02   4.20     0.25  0.35
                 XMUNLP      1636     4.54   0.62   3.62   2.40   4.08     0.26  0.34
RECIPE-ING-JE    XMUNLP      1635     4.68   0.65   4.58   0.85   4.63     0.47  0.67
                 Online A    1544     4.29   1.44   4.23   1.57   4.26     0.55  0.76
RECIPE-ING-EJ    XMUNLP      1634     4.71   0.43   4.43   1.03   4.57     0.40  0.53
                 Online A    1542     4.50   0.95   4.54   0.91   4.52     0.50  0.65
RECIPE-STE-JE    XMUNLP      1632     4.61   0.76   3.98   0.96   4.29     0.13  0.28
                 Online A    1551     3.34   1.54   2.69   1.21   3.01     0.14  0.36
RECIPE-STE-EJ    XMUNLP      1633     4.75   0.36   4.04   1.33   4.39     0.12  0.21
                 Online A    1549     4.18   0.42   3.16   1.52   3.67     0.11  0.17

Table 9: JPO adequacy evaluation results in detail. “A” and “B” denote Annotator A and Annotator B; avg = average, var = variance; wt. κ = weighted κ.
33
Columns, left to right: NTT (1681), ORGANIZER (1736), NTT (1616), Kyoto-U (1733), NICT-2 (1480), NICT-2 (1476), CUNI (1665), ORGANIZER (1333), TMU (1703), TMU (1695)
Kyoto-U (1717)     -  ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫
NTT (1681)            >  ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫
ORGANIZER (1736)         -  -  ≫  ≫  ≫  ≫  ≫  ≫
NTT (1616)                  -  ≫  ≫  ≫  ≫  ≫  ≫
Kyoto-U (1733)                 ≫  ≫  ≫  ≫  ≫  ≫
NICT-2 (1480)                     -  ≫  ≫  ≫  ≫
NICT-2 (1476)                        ≫  ≫  ≫  ≫
CUNI (1665)                             >  ≫  ≫
ORGANIZER (1333)                           -  ≫
TMU (1703)                                    ≫

Table 10: Statistical significance testing of the ASPEC-JE Pairwise scores.
Columns, left to right: NICT-2 (1479), ORGANIZER (1334), NTT (1684), NAIST-NICT (1507), ORGANIZER (1737), Kyoto-U (1731), UT-IIS (1710), NAIST-NICT (1506), NICT-2 (1475), TMU (1709), TMU (1704)
NTT (1729)           -  -  ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫
NICT-2 (1479)           -  ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫
ORGANIZER (1334)           -  ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫
NTT (1684)                    ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫
NAIST-NICT (1507)                -  -  >  ≫  ≫  ≫  ≫
ORGANIZER (1737)                    -  >  ≫  ≫  ≫  ≫
Kyoto-U (1731)                         >  ≫  ≫  ≫  ≫
UT-IIS (1710)                             ≫  ≫  ≫  ≫
NAIST-NICT (1506)                            -  ≫  ≫
NICT-2 (1475)                                   ≫  ≫
TMU (1709)                                         ≫

Table 11: Statistical significance testing of the ASPEC-EJ Pairwise scores.
ASPEC-JC — columns, left to right: Kyoto-U (1642), ORGANIZER (1738), NICT-2 (1483), NICT-2 (1478), ORGANIZER (1336), TMU (1743)
Kyoto-U (1722)     -  >  ≫  ≫  ≫  ≫
Kyoto-U (1642)        -  >  ≫  ≫  ≫
ORGANIZER (1738)         -  ≫  ≫  ≫
NICT-2 (1483)               >  ≫  ≫
NICT-2 (1478)                  ≫  ≫
ORGANIZER (1336)                  ≫

ASPEC-CJ — columns, left to right: Kyoto-U (1577), NICT-2 (1481), ORGANIZER (1740), NICT-2 (1477), ORGANIZER (1342)
Kyoto-U (1720)     ≫  ≫  ≫  ≫  ≫
Kyoto-U (1577)        -  -  -  ≫
NICT-2 (1481)            -  -  ≫
ORGANIZER (1740)            -  ≫
NICT-2 (1477)                  ≫

Table 12: Statistical significance testing of the ASPEC-JC (left) and ASPEC-CJ (right) Pairwise scores.
JPC-JE — columns, left to right: JAPIO (1574), JAPIO (1578), CUNI (1666), u-tkb (1472)
ORGANIZER (1338)   ≫  ≫  ≫  ≫
JAPIO (1574)          -  ≫  ≫
JAPIO (1578)             ≫  ≫
CUNI (1666)                 ≫

JPC-EJ — columns, left to right: EHR (1406), JAPIO (1454), u-tkb (1470), ORGANIZER (1339), JAPIO (1462)
EHR (1407)         -  ≫  ≫  ≫  ≫
EHR (1406)            -  ≫  ≫  ≫
JAPIO (1454)             ≫  ≫  ≫
u-tkb (1470)                -  ≫
ORGANIZER (1339)               ≫

Table 13: Statistical significance testing of the JPC-JE (left) and JPC-EJ (right) Pairwise scores.

JPC-JC — column: u-tkb (1465)
ORGANIZER (1340)   ≫

JPC-CJ — columns, left to right: EHR (1414), EHR (1408), JAPIO (1447), u-tkb (1468), ORGANIZER (1341)
JAPIO (1484)       ≫  ≫  ≫  ≫  ≫
EHR (1414)            -  ≫  ≫  ≫
EHR (1408)               ≫  ≫  ≫
JAPIO (1447)                ≫  ≫
u-tkb (1468)                   -

Table 14: Statistical significance testing of the JPC-JC (left) and JPC-CJ (right) Pairwise scores.
Columns, left to right: JAPIO (1450), EHR (1417), EHR (1416), ORGANIZER (1344)
JAPIO (1448)   -  ≫  ≫  ≫
JAPIO (1450)      ≫  ≫  ≫
EHR (1417)           ≫  ≫
EHR (1416)              ≫

Table 15: Statistical significance testing of the JPC-KJ Pairwise scores.

IITBC-HE: XMUNLP (1511)  ≫  IITB-MTG (1726)
IITBC-EH: XMUNLP (1576)  ≫  IITB-MTG (1725)

Table 16: Statistical significance testing of the IITBC-HE (left) and IITBC-EH (right) Pairwise scores.
JIJI-JE — columns, left to right: ORGANIZER (1526), NTT (1599), NTT (1677), XMUNLP (1442), ORGANIZER (1396), NICT-2 (1474), NICT-2 (1473), CUNI (1668)
ORGANIZER (1523)   ≫  ≫  ≫  ≫  ≫  ≫  ≫  ≫
ORGANIZER (1526)      ≫  ≫  ≫  ≫  ≫  ≫  ≫
NTT (1599)               ≫  ≫  ≫  ≫  ≫  ≫
NTT (1677)                  ≫  ≫  ≫  ≫  ≫
XMUNLP (1442)                  ≫  ≫  ≫  ≫
ORGANIZER (1396)                  >  ≫  ≫
NICT-2 (1474)                        ≫  ≫
NICT-2 (1473)                           ≫

JIJI-EJ — columns, left to right: ORGANIZER (1514), NTT (1679), NTT (1603), XMUNLP (1443), ORGANIZER (1395)
ORGANIZER (1518)   ≫  ≫  ≫  ≫  ≫
ORGANIZER (1514)      ≫  ≫  ≫  ≫
NTT (1679)               ≫  ≫  ≫
NTT (1603)                  -  ≫
XMUNLP (1443)                  -

Table 17: Statistical significance testing of the JIJI-JE (left) and JIJI-EJ (right) Pairwise scores.
RECIPE-TTL-JE — columns, left to right: XMUNLP (1637), ORGANIZER (1531)
ORGANIZER (1534)   -  ≫
XMUNLP (1637)         ≫

RECIPE-TTL-EJ — columns, left to right: ORGANIZER (1533), ORGANIZER (1528)
XMUNLP (1636)      ≫  ≫
ORGANIZER (1533)      ≫

Table 18: Statistical significance testing of the RECIPE-TTL-JE (left) and RECIPE-TTL-EJ (right) Pairwise scores.
RECIPE-ING-JE — columns, left to right: ORGANIZER (1544), ORGANIZER (1539)
XMUNLP (1635)      ≫  ≫
ORGANIZER (1544)      ≫

RECIPE-ING-EJ — columns, left to right: ORGANIZER (1542), ORGANIZER (1537)
XMUNLP (1634)      ≫  ≫
ORGANIZER (1542)      ≫

Table 19: Statistical significance testing of the RECIPE-ING-JE (left) and RECIPE-ING-EJ (right) Pairwise scores.
RECIPE-STE-JE — columns, left to right: ORGANIZER (1551), ORGANIZER (1548)
XMUNLP (1632)      ≫  ≫
ORGANIZER (1551)      ≫

RECIPE-STE-EJ — columns, left to right: ORGANIZER (1549), ORGANIZER (1546)
XMUNLP (1633)      ≫  ≫
ORGANIZER (1549)      ≫

Table 20: Statistical significance testing of the RECIPE-STE-JE (left) and RECIPE-STE-EJ (right) Pairwise scores.
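The pairwise significance results in the tables above are computed with bootstrap resampling over sentence-level judgments (Koehn, 2004). As a reference only — this is a minimal sketch of the general technique, not the organizers' actual evaluation script, and `paired_bootstrap` and its parameters are illustrative names:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A's total
    score beats system B's, given per-sentence scores for both
    systems on the same test set (Koehn 2004-style test)."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # resample sentence indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples
```

A win fraction above 0.99 would correspond to the "≫" (p < 0.01) level and above 0.95 to ">" (p < 0.05), under the usual one-sided reading of this test.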
ASPEC-JE: Online D (1333) 0.230; AIAYN (1736) 0.204; Kyoto-U (1717) 0.217; Kyoto-U (1733) 0.204; TMU (1695) 0.188; TMU (1703) 0.191; NTT (1616) 0.201; NTT (1681) 0.173; NICT-2 (1476) 0.274; NICT-2 (1480) 0.257; CUNI (1665) 0.241; ave. 0.216
ASPEC-EJ: Online A (1334) 0.290; AIAYN (1737) 0.338; Kyoto-U (1731) 0.321; TMU (1704) 0.269; TMU (1709) 0.260; NTT (1684) 0.353; NTT (1729) 0.341; NICT-2 (1475) 0.315; NICT-2 (1479) 0.395; UT-IIS (1710) 0.305; NAIST-NICT (1506) 0.301; NAIST-NICT (1507) 0.339; ave. 0.319
ASPEC-JC: Online D (1336) 0.189; AIAYN (1738) 0.183; Kyoto-U (1642) 0.159; Kyoto-U (1722) 0.128; TMU (1743) 0.171; NICT-2 (1478) 0.222; NICT-2 (1483) 0.194; ave. 0.178
ASPEC-CJ: Online A (1342) 0.215; AIAYN (1740) 0.310; Kyoto-U (1577) 0.284; Kyoto-U (1720) 0.254; NICT-2 (1477) 0.191; NICT-2 (1481) 0.279; ave. 0.255
JPC-JE: Online A (1338) 0.424; JAPIO (1574) 0.280; JAPIO (1578) 0.296; CUNI (1666) 0.249; u-tkb (1472) 0.380; ave. 0.326
JPC-EJ: Online A (1339) 0.410; EHR (1406) 0.364; EHR (1407) 0.385; JAPIO (1454) 0.409; JAPIO (1462) 0.280; u-tkb (1470) 0.349; ave. 0.366
JPC-JC: Online A (1340) 0.185; u-tkb (1465) 0.176; ave. 0.180
JPC-CJ: Online A (1341) 0.194; EHR (1408) 0.201; EHR (1414) 0.170; JAPIO (1447) 0.257; JAPIO (1484) 0.247; u-tkb (1468) 0.172; ave. 0.207
JPC-KJ: Online A (1344) 0.257; EHR (1416) 0.413; EHR (1417) 0.459; JAPIO (1448) 0.224; JAPIO (1450) 0.235; ave. 0.317
IITBC-HE: XMUNLP (1511) 0.376; IITB-MTG (1726) 0.626; ave. 0.501
IITBC-EH: XMUNLP (1576) 0.269; IITB-MTG (1725) 0.371; ave. 0.320
JIJI-JE: Hiero (1396) 0.117; Online A (1523) 0.035; RBMT B (1526) 0.004; NTT (1599) 0.095; NTT (1677) 0.077; NICT-2 (1473) 0.078; NICT-2 (1474) 0.064; XMUNLP (1442) 0.070; CUNI (1668) 0.060; ave. 0.067
JIJI-EJ: Hiero (1395) 0.104; RBMT A (1514) 0.167; Online A (1518) 0.179; NTT (1603) 0.189; NTT (1679) 0.155; XMUNLP (1443) 0.151; ave. 0.157
RECIPE-TTL-JE: RBMT B (1531) 0.305; Online A (1534) 0.333; XMUNLP (1637) 0.366; ave. 0.334
RECIPE-TTL-EJ: RBMT B (1528) 0.340; Online B (1533) 0.356; XMUNLP (1636) 0.341; ave. 0.345
RECIPE-STE-JE: RBMT B (1548) 0.290; Online A (1551) 0.289; XMUNLP (1632) 0.261; ave. 0.280
RECIPE-STE-EJ: RBMT B (1546) 0.108; Online A (1549) 0.138; XMUNLP (1633) 0.162; ave. 0.136
RECIPE-ING-JE: RBMT B (1539) 0.537; Online B (1544) 0.551; XMUNLP (1635) 0.614; ave. 0.567
RECIPE-ING-EJ: RBMT B (1537) 0.665; Online B (1542) 0.515; XMUNLP (1634) 0.618; ave. 0.599

Table 21: The Fleiss' kappa values for the pairwise evaluation results.
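Table 21 reports Fleiss' kappa (Fleiss et al., 1971) over the pairwise human judgments. For reference, a minimal sketch of the statistic itself — this is the textbook computation, not the organizers' actual tooling, and the function name and input layout are illustrative:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a rating matrix.

    counts[i][j] = number of raters who assigned item i to
    category j; every item must be rated by the same number
    of raters n."""
    N = len(counts)            # number of items
    n = sum(counts[0])         # raters per item
    k = len(counts[0])         # number of categories
    # mean per-item observed agreement
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # chance agreement from marginal category proportions
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(x * x for x in p)
    return (P_bar - P_e) / (1 - P_e)
```

Perfect agreement among raters yields kappa = 1, and kappa = 0 indicates agreement no better than chance; the low JIJI values in Table 21 thus reflect near-chance rater agreement on that subtask.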
SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU   RIBES     AMFM      Pair
SMT Hiero           2     SMT     NO               18.72  0.651066  0.588880  +7.75
SMT Phrase          6     SMT     NO               18.45  0.645137  0.590950  —
SMT S2T             9     SMT     NO               20.36  0.678253  0.593410  +25.50
Online D (2014)     35    Other   YES              15.08  0.643588  0.564170  +13.75
RBMT E              76    Other   YES              14.82  0.663851  0.561620  —
RBMT F              79    Other   YES              13.86  0.661387  0.556840  —
Online C (2014)     87    Other   YES              10.64  0.624827  0.466480  —
RBMT D (2014)       96    Other   YES              15.29  0.683378  0.551690  +23.00
Online D (2015)     775   Other   YES              16.85  0.676609  0.562270  +0.25
SMT S2T             877   SMT     NO               20.36  0.678253  0.593410  +7.00
RBMT D (2015)       887   Other   YES              15.29  0.683378  0.551690  +16.75
Online C (2015)     892   Other   YES              10.29  0.622564  0.453370  —
Online D (2016)     1042  Other   YES              16.91  0.677412  0.564270  +28.00
Online D (2016/11)  1333  NMT     YES              22.04  0.733483  0.584390  +63.00
AIAYN               1736  NMT     NO               28.06  0.767577  0.595580  +75.25
Kyoto-U 1           1717  NMT     NO               27.53  0.761403  0.585540  +77.75
Kyoto-U 2           1733  NMT     NO               27.66  0.765464  0.591160  +74.50
TMU 1               1695  NMT     NO               21.00  0.725284  0.585710  +56.75
TMU 2               1703  NMT     NO               23.03  0.741175  0.595260  +61.00
NTT 1               1616  NMT     NO               27.43  0.764831  0.597620  +75.00
NTT 2               1681  NMT     NO               28.36  0.768880  0.597860  +77.25
NICT-2 1            1476  NMT     NO               24.79  0.747335  0.574810  +68.75
NICT-2 2            1480  NMT     NO               26.76  0.741329  0.578150  +69.75
CUNI 1              1665  NMT     NO               23.43  0.741699  0.583780  +66.00

Table 22: ASPEC-JE submissions
BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
SMT Phrase          5     SMT     NO               27.48 / 29.80 / 28.27  0.683735 / 0.691926 / 0.695390  0.736380  —
SMT T2S             12    SMT     NO               31.05 / 33.44 / 32.10  0.748883 / 0.758031 / 0.760516  0.744370  +34.25
Online A (2014)     34    Other   YES              19.66 / 21.63 / 20.17  0.718019 / 0.723486 / 0.725848  0.695420  +42.50
RBMT B (2014)       66    Other   YES              13.18 / 14.85 / 13.48  0.671958 / 0.680748 / 0.682683  0.622930  +0.75
RBMT A              68    Other   YES              12.86 / 14.43 / 13.16  0.670167 / 0.676464 / 0.678934  0.626940  —
Online B (2014)     91    Other   YES              17.04 / 18.67 / 17.36  0.687797 / 0.693390 / 0.698126  0.643070  —
RBMT C              95    Other   YES              12.19 / 13.32 / 12.14  0.668372 / 0.672645 / 0.676018  0.594380  —
SMT Hiero           367   SMT     NO               30.19 / 32.56 / 30.94  0.734705 / 0.746978 / 0.747722  0.743900  +31.50
Online A (2015)     774   Other   YES              18.22 / 19.77 / 18.46  0.705882 / 0.713960 / 0.718150  0.677200  +34.25
SMT T2S             875   SMT     NO               31.05 / 33.44 / 32.10  0.748883 / 0.758031 / 0.760516  0.744370  +30.00
RBMT B (2015)       883   Other   YES              13.18 / 14.85 / 13.48  0.671958 / 0.680748 / 0.682683  0.622930  +9.75
Online B (2015)     889   Other   YES              17.80 / 19.52 / 18.11  0.693359 / 0.701966 / 0.703859  0.646160  —
Online A (2016)     1041  Other   YES              18.28 / 19.81 / 18.51  0.706639 / 0.715222 / 0.718559  0.677020  +49.75
Online A (2016/11)  1334  NMT     YES              26.19 / 28.22 / 26.68  0.776787 / 0.780217 / 0.782674  0.727040  +74.00
AIAYN               1737  NMT     NO               40.79 / 42.55 / 41.50  0.844896 / 0.847559 / 0.851471  0.768630  +69.75

Table 23: ASPEC-EJ submissions (Organizer)
BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM        ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
Kyoto-U 1     1731  NMT     NO               38.72 / 40.65 / 39.37  0.832472 / 0.835870 / 0.839646  0.754220  +69.75
TMU 1         1704  NMT     NO               32.65 / 35.05 / 33.72  0.802262 / 0.809649 / 0.811057  0.740620  +50.75
TMU 2         1709  NMT     NO               34.05 / 36.69 / 35.32  0.812926 / 0.818443 / 0.821563  0.744890  +56.50
NTT 1         1684  NMT     NO               39.80 / 42.27 / 40.47  0.835806 / 0.839981 / 0.844326  0.757740  +72.25
NTT 2         1729  NMT     NO               40.32 / 42.80 / 40.95  0.838594 / 0.841769 / 0.846486  0.762170  +75.75
NICT-2 1      1475  NMT     NO               36.85 / 38.94 / 37.87  0.826791 / 0.834448 / 0.835255  0.759570  +62.00
NICT-2 2      1479  NMT     NO               40.17 / 42.25 / 41.17  0.842206 / 0.848170 / 0.849929  0.765580  +74.75
UT-IIS 1      1710  NMT     NO               36.26 / 38.93 / 37.06  0.827891 / 0.832054 / 0.836394  0.746910  +68.00
NAIST-NICT 1  1506  NMT     NO               36.47 / 38.54 / 37.30  0.821989 / 0.827225 / 0.830116  0.763310  +63.50
NAIST-NICT 2  1507  NMT     NO               38.25 / 40.29 / 39.05  0.834492 / 0.839321 / 0.842337  0.770480  +70.00

Table 24: ASPEC-EJ submissions (Participants)
BLEU and RIBES are reported for kytea / stanford(ctb) / stanford(pku) segmentations; AMFM is identical across the three.

SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU (ky/ctb/pku)      RIBES (ky/ctb/pku)              AMFM      Pair
SMT Hiero           3     SMT     NO               27.71 / 27.70 / 27.35  0.809128 / 0.809561 / 0.811394  0.745100  +3.75
SMT Phrase          7     SMT     NO               27.96 / 28.01 / 27.68  0.788961 / 0.790263 / 0.790937  0.749450  —
SMT S2T             10    SMT     NO               28.65 / 28.65 / 28.35  0.807606 / 0.809457 / 0.808417  0.755230  +14.00
Online D (2014)     37    Other   YES              9.37 / 8.93 / 8.84     0.606905 / 0.606328 / 0.604149  0.625430  -14.50
Online C (2014)     216   Other   YES              7.26 / 7.01 / 6.72     0.612808 / 0.613075 / 0.611563  0.587820  —
RBMT B (2014)       243   RBMT    NO               17.86 / 17.75 / 17.49  0.744818 / 0.745885 / 0.743794  0.667960  -20.00
RBMT C              244   RBMT    NO               9.62 / 9.96 / 9.59     0.642278 / 0.648758 / 0.645385  0.594900  —
Online D (2015)     777   Other   YES              10.73 / 10.33 / 10.08  0.660484 / 0.660847 / 0.660482  0.634090  -14.75
SMT S2T             881   SMT     NO               28.65 / 28.65 / 28.35  0.807606 / 0.809457 / 0.808417  0.755230  +7.75
RBMT B (2015)       886   Other   YES              17.86 / 17.75 / 17.49  0.744818 / 0.745885 / 0.743794  0.667960  -11.00
Online C (2015)     891   Other   YES              7.44 / 7.05 / 6.75     0.611964 / 0.615048 / 0.612158  0.566060  —
Online D (2016)     1045  Other   YES              11.16 / 10.72 / 10.54  0.665185 / 0.667382 / 0.666953  0.639440  -26.00
Online D (2016/11)  1336  NMT     YES              15.94 / 15.68 / 15.38  0.728453 / 0.728270 / 0.728284  0.673730  +17.75
AIAYN               1738  NMT     NO               34.97 / 34.96 / 34.72  0.850199 / 0.850052 / 0.848394  0.787250  +70.50
Kyoto-U 1           1642  NMT     NO               35.67 / 35.30 / 35.40  0.849464 / 0.848107 / 0.848318  0.779400  +71.50
Kyoto-U 2           1722  NMT     NO               35.31 / 35.37 / 35.06  0.850103 / 0.849168 / 0.847879  0.785420  +72.50
TMU 1               1743  NMT     NO               22.92 / 22.86 / 22.74  0.798681 / 0.798736 / 0.797969  0.700030  +4.25
NICT-2 1            1478  NMT     NO               33.72 / 33.64 / 33.60  0.847223 / 0.846578 / 0.846158  0.779870  +67.25
NICT-2 2            1483  NMT     NO               35.23 / 35.23 / 35.14  0.852084 / 0.851893 / 0.851548  0.785820  +69.50

Table 25: ASPEC-JC submissions
BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
SMT Hiero           4     SMT     NO               35.43 / 35.91 / 35.64  0.810406 / 0.798726 / 0.807665  0.750950  +4.75
SMT Phrase          8     SMT     NO               34.65 / 35.16 / 34.77  0.772498 / 0.766384 / 0.771005  0.753010  —
SMT T2S             13    SMT     NO               36.52 / 37.07 / 36.64  0.825292 / 0.820490 / 0.825025  0.754870  +16.00
Online A (2014)     36    Other   YES              11.63 / 13.21 / 11.87  0.595925 / 0.598172 / 0.598573  0.658060  -21.75
Online B (2014)     215   Other   YES              10.48 / 11.26 / 10.47  0.600733 / 0.596006 / 0.600706  0.636930  —
RBMT A (2014)       239   RBMT    NO               9.37 / 9.87 / 9.35     0.666277 / 0.652402 / 0.661730  0.626070  -37.75
RBMT D              242   RBMT    NO               8.39 / 8.70 / 8.30     0.641189 / 0.626400 / 0.633319  0.586790  —
Online A (2015)     776   Other   YES              11.53 / 12.82 / 11.68  0.588285 / 0.590393 / 0.592887  0.649860  -19.00
SMT T2S             879   SMT     NO               36.52 / 37.07 / 36.64  0.825292 / 0.820490 / 0.825025  0.754870  +17.25
RBMT A (2015)       885   Other   YES              9.37 / 9.87 / 9.35     0.666277 / 0.652402 / 0.661730  0.626070  -28.00
Online B (2015)     890   Other   YES              10.41 / 11.03 / 10.36  0.597355 / 0.592841 / 0.597298  0.628290  —
Online A (2016)     1043  Other   YES              11.56 / 12.87 / 11.69  0.589802 / 0.589397 / 0.593361  0.659540  -51.25
Online A (2016/11)  1342  NMT     YES              18.75 / 20.64 / 19.04  0.719022 / 0.717173 / 0.720095  0.692820  +22.50
AIAYN               1740  NMT     NO               46.87 / 47.30 / 47.00  0.880815 / 0.875511 / 0.880368  0.798110  +78.50
Kyoto-U 1           1577  NMT     NO               48.43 / 48.84 / 48.51  0.883457 / 0.878964 / 0.884137  0.799520  +79.50
Kyoto-U 2           1720  NMT     NO               48.34 / 48.76 / 48.40  0.884210 / 0.880069 / 0.884745  0.799840  +82.75
NICT-2 1            1477  NMT     NO               44.26 / 44.90 / 44.50  0.871438 / 0.868359 / 0.871736  0.788940  +78.00
NICT-2 2            1481  NMT     NO               46.84 / 47.51 / 47.27  0.882356 / 0.878580 / 0.882195  0.799680  +79.00

Table 26: ASPEC-CJ submissions
SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU   RIBES     AMFM      Pair
SMT Phrase          977   SMT     NO               30.80  0.730056  0.664830  —
SMT Hiero           979   SMT     NO               32.23  0.763030  0.672500  +8.75
SMT S2T             980   SMT     NO               34.40  0.793483  0.672760  +23.00
Online A (2016)     1035  Other   YES              35.77  0.803661  0.673950  +32.25
Online B (2016)     1051  Other   YES              16.00  0.688004  0.486450  —
RBMT C (2016)       1088  Other   YES              21.00  0.755017  0.519210  —
RBMT A (2016)       1090  Other   YES              21.57  0.750381  0.521230  +23.75
RBMT B (2016)       1095  Other   YES              18.38  0.710992  0.518110  —
Online A (2016/11)  1338  NMT     YES              49.35  0.878342  0.722590  +71.50
JAPIO 1             1574  NMT     YES              49.00  0.878298  0.724710  +68.50
JAPIO 2             1578  NMT     YES              48.08  0.873093  0.715560  +67.00
CUNI 1              1666  SMT     NO               38.29  0.837425  0.681520  +58.00
u-tkb 1             1472  NMT     NO               37.31  0.841136  0.697290  +51.50

Table 27: JPC-JE submissions
BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
SMT Phrase          973   SMT     NO               32.36 / 34.26 / 32.52  0.728539 / 0.728281 / 0.729077  0.711900  —
SMT Hiero           974   SMT     NO               34.57 / 36.61 / 34.79  0.777759 / 0.778657 / 0.779049  0.715300  +21.00
SMT T2S             975   SMT     NO               35.60 / 37.65 / 35.82  0.797353 / 0.796783 / 0.798025  0.717030  +30.75
Online A (2016)     1036  Other   YES              36.88 / 37.89 / 36.83  0.798168 / 0.792471 / 0.796308  0.719110  +20.00
Online B (2016)     1073  Other   YES              21.57 / 22.62 / 21.65  0.743083 / 0.735203 / 0.740962  0.659950  —
RBMT D (2016)       1085  Other   YES              23.02 / 24.90 / 23.45  0.761224 / 0.757341 / 0.760325  0.647730  —
RBMT F (2016)       1086  Other   YES              26.64 / 28.48 / 26.84  0.773673 / 0.769244 / 0.773344  0.675470  +12.75
RBMT E (2016)       1087  Other   YES              21.35 / 23.17 / 21.53  0.743484 / 0.741985 / 0.742300  0.646930  —
Online A (2016/11)  1339  NMT     YES              50.60 / 51.65 / 50.83  0.879382 / 0.877336 / 0.878316  0.770480  +48.50
EHR 1               1406  NMT     NO               44.44 / 45.59 / 44.15  0.860998 / 0.858466 / 0.860659  0.747050  +58.25
EHR 2               1407  NMT     NO               44.63 / 45.94 / 44.53  0.866722 / 0.864256 / 0.866205  0.747770  +60.00
JAPIO 1             1454  NMT     YES              50.27 / 51.23 / 50.17  0.886403 / 0.883481 / 0.885747  0.776790  +56.25
JAPIO 2             1462  SMT     YES              51.79 / 52.23 / 51.75  0.864038 / 0.861596 / 0.862200  0.781150  +41.00
u-tkb 1             1470  NMT     NO               38.91 / 41.12 / 39.11  0.845815 / 0.846888 / 0.845551  0.734010  +49.50

Table 28: JPC-EJ submissions
BLEU and RIBES are reported for kytea / stanford(ctb) / stanford(pku) segmentations; AMFM is identical across the three.

SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU (ky/ctb/pku)      RIBES (ky/ctb/pku)              AMFM      Pair
SMT Phrase          966   SMT     NO               30.60 / 32.03 / 31.25  0.787321 / 0.797888 / 0.794388  0.710940  —
SMT Hiero           967   SMT     NO               30.26 / 31.57 / 30.91  0.788415 / 0.799118 / 0.796685  0.718360  +4.75
SMT S2T             968   SMT     NO               31.05 / 32.35 / 31.70  0.793846 / 0.802805 / 0.800848  0.720030  +4.25
Online A (2016)     1038  Other   YES              23.02 / 23.57 / 23.29  0.754241 / 0.760672 / 0.760148  0.702350  -23.00
Online B (2016)     1069  Other   YES              9.42 / 9.59 / 8.79     0.642026 / 0.651070 / 0.643520  0.527180  —
RBMT C (2016)       1118  Other   YES              12.35 / 13.72 / 13.17  0.688240 / 0.708681 / 0.700210  0.475430  -41.25
Online A (2016/11)  1340  NMT     YES              33.04 / 33.92 / 33.34  0.824829 / 0.829122 / 0.829067  0.735470  +32.50
u-tkb 1             1465  NMT     NO               31.80 / 33.19 / 32.83  0.819791 / 0.826055 / 0.825025  0.706720  +21.75

Table 29: JPC-JC submissions
BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
SMT Hiero           430   SMT     NO               39.22 / 39.52 / 39.14  0.806058 / 0.802059 / 0.804523  0.729370  —
SMT Phrase          431   SMT     NO               38.34 / 38.51 / 38.22  0.782019 / 0.778921 / 0.781456  0.723110  —
SMT T2S             432   SMT     NO               39.39 / 39.90 / 39.39  0.814919 / 0.811350 / 0.813595  0.725920  +20.75
Online A (2015)     647   Other   YES              26.80 / 27.81 / 26.89  0.712242 / 0.707264 / 0.711273  0.693840  -7.00
Online B (2015)     648   Other   YES              12.33 / 12.72 / 12.44  0.648996 / 0.641255 / 0.648742  0.588380  —
RBMT A (2015)       759   RBMT    NO               10.49 / 10.72 / 10.35  0.674060 / 0.664098 / 0.667349  0.557130  -39.25
RBMT B              760   RBMT    NO               7.94 / 8.07 / 7.73     0.596200 / 0.581837 / 0.586941  0.502100  —
Online A (2016)     1040  Other   YES              26.99 / 27.91 / 27.02  0.707739 / 0.702718 / 0.706707  0.693720  -19.75
Online A (2016/11)  1341  NMT     YES              42.66 / 43.76 / 42.95  0.845858 / 0.844918 / 0.845794  0.747240  +54.25
EHR 1               1408  NMT     NO               47.08 / 47.44 / 46.83  0.859070 / 0.856376 / 0.858888  0.756350  +68.25
EHR 2               1414  NMT     NO               46.52 / 47.17 / 46.35  0.859619 / 0.856784 / 0.858353  0.761370  +69.75
JAPIO 1             1447  SMT     YES              50.52 / 51.25 / 50.57  0.847793 / 0.843774 / 0.846081  0.774660  +60.50
JAPIO 2             1484  NMT     YES              50.06 / 50.51 / 50.00  0.875398 / 0.873390 / 0.874822  0.779420  +80.25
u-tkb 1             1468  NMT     NO               38.79 / 40.47 / 38.99  0.832144 / 0.833610 / 0.831209  0.729580  +55.50

Table 30: JPC-CJ submissions
BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM              ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
SMT Phrase          438   SMT     NO               69.22 / 70.36 / 69.73  0.941302 / 0.939729 / 0.940756  0.856220  —
SMT Hiero           439   SMT     NO               67.41 / 68.65 / 68.00  0.937162 / 0.935903 / 0.936570  0.850560  +2.75
Online B (2015)     651   Other   YES              36.41 / 38.72 / 37.01  0.851745 / 0.852263 / 0.851945  0.728750  —
Online A (2015)     652   Other   YES              55.05 / 56.84 / 55.46  0.909152 / 0.909385 / 0.908838  0.800460  +38.75
RBMT A (2015)       653   Other   YES              42.00 / 43.97 / 42.45  0.876396 / 0.873734 / 0.875146  0.712020  -7.25
RBMT B              654   Other   YES              34.74 / 37.51 / 35.54  0.845712 / 0.849014 / 0.846228  0.643150  —
Online A (2015)     963   Other   YES              55.05 / 56.84 / 55.46  0.909152 / 0.909385 / 0.908838  0.800610  —
RBMT A (2015)       964   Other   YES              42.00 / 43.97 / 42.45  0.876396 / 0.873734 / 0.875146  0.712700  —
Online A (2016)     1039  Other   YES              54.78 / 56.68 / 55.14  0.907320 / 0.907652 / 0.906743  0.798750  +8.00
Online A (2016/11)  1344  NMT     NO               44.42 / 45.14 / 44.72  0.857642 / 0.854158 / 0.857083  0.783850  -55.75
EHR 1               1416  NMT     NO               71.52 / 72.34 / 71.82  0.944516 / 0.942940 / 0.944219  0.866060  +6.25
EHR 2               1417  NMT     NO               71.36 / 72.26 / 71.65  0.946126 / 0.944812 / 0.945888  0.871110  +11.25
JAPIO 1             1448  SMT     YES              73.00 / 73.71 / 73.23  0.946880 / 0.945754 / 0.946645  0.872510  +48.75
JAPIO 2             1450  SMT     YES              73.00 / 73.73 / 73.25  0.946985 / 0.945841 / 0.946745  0.873200  +48.50

Table 31: JPC-KJ submissions
SYSTEM           ID    METHOD  OTHER RESOURCES  BLEU   RIBES     AMFM      Pair
Online A (2016)  1031  Other   YES              21.37  0.714537  0.621100  +44.75
Online B (2016)  1048  Other   YES              15.58  0.683214  0.590520  +14.00
SMT Phrase       1054  SMT     NO               10.32  0.638090  0.574850  0.00
XMUNLP 1         1511  NMT     NO               22.44  0.750921  0.629530  +68.25
IITB-MTG 1       1726  NMT     NO               11.55  0.682902  0.557040  +21.00

Table 32: IITB-HE submissions

SYSTEM           ID    METHOD  OTHER RESOURCES  BLEU   RIBES     AMFM      Pair
Online A (2016)  1032  Other   YES              18.72  0.716788  0.670660  +57.25
Online B (2016)  1047  Other   YES              16.97  0.691298  0.668450  +42.50
SMT Phrase       1252  SMT     NO               10.79  0.651166  0.660860  —
XMUNLP 1         1576  NMT     NO               21.39  0.749660  0.688770  +64.50
IITB-MTG 1       1725  NMT     NO               12.23  0.688606  0.624780  +28.75

Table 33: IITB-EH submissions
SYSTEM      ID    METHOD  OTHER RESOURCES  BLEU   RIBES     AMFM      Pair
SMT Phrase  1394  SMT     NO               15.11  0.554550  0.475740  —
SMT Hiero   1396  SMT     NO               15.67  0.558225  0.470610  +10.25
SMT S2T     1398  SMT     NO               14.54  0.556728  0.477170  —
ONLINE-A 1  1523  NMT     NO               8.19   0.529844  0.450850  +70.00
RBMT-A      1525  RBMT    NO               4.36   0.472312  0.391050  —
RBMT-B      1526  RBMT    NO               4.67   0.475760  0.385600  +51.75
NTT 1       1599  NMT     NO               19.44  0.638841  0.476200  +32.00
NTT 2       1677  NMT     NO               20.90  0.648931  0.474360  +26.75
NICT-2 1    1473  NMT     NO               16.52  0.642379  0.459000  +0.25
NICT-2 2    1474  NMT     NO               18.19  0.632638  0.453420  +7.25
XMUNLP 1    1442  NMT     NO               17.95  0.637059  0.465710  +20.75
CUNI 1      1668  SMT     NO               10.67  0.564797  0.432700  -24.00

Table 34: JIJI-JE submissions

BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM      ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
SMT Phrase  1393  SMT     NO               15.77 / 16.65 / 15.76  0.580284 / 0.584892 / 0.585437  0.545240  —
SMT Hiero   1395  SMT     NO               16.22 / 16.95 / 16.22  0.594923 / 0.601505 / 0.602516  0.550260  +10.25
SMT T2S     1397  SMT     NO               14.95 / 15.38 / 14.79  0.594072 / 0.597791 / 0.599530  0.530370  —
RBMT-A      1514  RBMT    NO               5.31 / 6.68 / 5.69     0.505227 / 0.515050 / 0.513580  0.473940  +31.25
RBMT-B      1515  RBMT    NO               4.72 / 5.98 / 4.97     0.518416 / 0.531603 / 0.532079  0.487320  —
ONLINE-A 1  1518  NMT     NO               11.29 / 13.12 / 11.84  0.597473 / 0.605532 / 0.603374  0.533120  +69.75
NTT 1       1603  NMT     NO               19.13 / 20.47 / 19.41  0.668517 / 0.670920 / 0.676594  0.536970  +14.50
NTT 2       1679  NMT     NO               20.37 / 21.82 / 20.68  0.680598 / 0.684048 / 0.688863  0.537800  +17.75
XMUNLP 1    1443  NMT     NO               19.61 / 20.72 / 20.14  0.684120 / 0.688497 / 0.691056  0.546360  +11.75

Table 35: JIJI-EJ submissions
SYSTEM      ID    METHOD  OTHER RESOURCES  BLEU   RIBES     AMFM      Pair
RBMT-A      1538  RBMT    NO               11.14  0.484800  0.613950  —
RBMT-B      1539  RBMT    NO               12.52  0.520800  0.607190  -24.75
ONLINE-B 1  1544  NMT     NO               20.33  0.563419  0.656630  -15.50
SMT Phrase  1571  SMT     NO               44.42  0.830105  0.859040  —
XMUNLP 1    1635  NMT     NO               46.98  0.831261  0.854970  +3.50

Table 36: RECIPE-ING-JE submissions

BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM      ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
RBMT-A      1536  RBMT    NO               4.52 / 4.74 / 4.29     0.365913 / 0.371584 / 0.355798  0.426550  —
RBMT-B      1537  RBMT    NO               5.44 / 4.95 / 5.07     0.385269 / 0.363344 / 0.372793  0.445370  -48.50
ONLINE-B 1  1542  NMT     NO               15.57 / 14.89 / 14.85  0.548026 / 0.544581 / 0.537147  0.618010  -24.25
SMT Phrase  1570  SMT     NO               31.39 / 30.61 / 29.60  0.749305 / 0.740283 / 0.741249  0.775770  —
XMUNLP 1    1634  NMT     NO               34.88 / 34.26 / 33.19  0.747521 / 0.742770 / 0.739909  0.778530  -3.75

Table 37: RECIPE-ING-EJ submissions
SYSTEM      ID    METHOD  OTHER RESOURCES  BLEU   RIBES     AMFM      Pair
RBMT-A      1547  RBMT    NO               5.37   0.546642  0.315930  —
RBMT-B      1548  RBMT    NO               5.82   0.565086  0.268580  -60.25
ONLINE-A 1  1551  NMT     NO               11.04  0.670484  0.415880  +2.75
SMT Phrase  1569  SMT     NO               22.84  0.705506  0.595290  —
XMUNLP 1    1632  NMT     NO               28.03  0.784235  0.598050  +40.50

Table 38: RECIPE-STE-JE submissions

BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM      ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
RBMT-A      1545  RBMT    NO               3.06 / 4.55 / 3.43     0.518467 / 0.533326 / 0.521713  0.368710  —
RBMT-B      1546  RBMT    NO               3.30 / 5.19 / 4.00     0.525116 / 0.532919 / 0.529113  0.441180  -65.50
ONLINE-A 1  1549  NMT     NO               8.14 / 11.23 / 8.97    0.623865 / 0.635888 / 0.629012  0.526680  -29.25
SMT Phrase  1568  SMT     NO               17.60 / 21.43 / 18.53  0.694179 / 0.698499 / 0.695400  0.626610  —
XMUNLP 1    1633  NMT     NO               22.55 / 26.87 / 24.00  0.776539 / 0.776469 / 0.775689  0.645050  +45.50

Table 39: RECIPE-STE-EJ submissions
SYSTEM      ID    METHOD  OTHER RESOURCES  BLEU   RIBES     AMFM      Pair
RBMT-A      1530  RBMT    NO               0.53   0.086378  0.433380  —
RBMT-B      1531  RBMT    NO               0.56   0.100899  0.445400  -25.75
ONLINE-A 1  1534  NMT     NO               2.19   0.199338  0.509470  +10.25
SMT Phrase  1567  SMT     NO               9.72   0.451707  0.571230  —
XMUNLP 1    1637  NMT     NO               15.57  0.526993  0.542690  +10.25

Table 40: RECIPE-TTL-JE submissions

BLEU and RIBES are reported for juman / kytea / mecab segmentations; AMFM is identical across the three.

SYSTEM      ID    METHOD  OTHER RESOURCES  BLEU (j/k/m)           RIBES (j/k/m)                   AMFM      Pair
RBMT-A      1527  RBMT    NO               0.00 / 0.00 / 0.00     0.128134 / 0.137884 / 0.122371  0.187540  —
RBMT-B      1528  RBMT    NO               2.57 / 2.30 / 2.08     0.299133 / 0.300578 / 0.270482  0.402830  -50.25
ONLINE-B 1  1533  NMT     NO               16.16 / 15.85 / 15.40  0.573771 / 0.559142 / 0.532130  0.590440  +3.75
SMT Phrase  1566  SMT     NO               17.16 / 16.23 / 16.57  0.600503 / 0.576617 / 0.548811  0.571650  —
XMUNLP 1    1636  NMT     NO               19.41 / 18.87 / 18.78  0.592087 / 0.573466 / 0.558997  0.584980  +23.75

Table 41: RECIPE-TTL-EJ submissions
ReferencesRafael E. Banchs, Luis F. D’Haro, and Haizhou
Li. 2015. Adequacy-fluency metrics: Evaluat-ing mt in the continuous space model frame-work. IEEE/ACM Trans. Audio, Speech andLang. Proc., 23(3):472–482.
Jacob Cohen. 1968. Weighted kappa: Nominalscale agreement with provision for scaled dis-agreement or partial credit. Psychological Bul-letin, 70(4):213 – 220.
Fabien Cromieres, Raj Dabre, Toshiaki Nakazawa,and Sadao Kurohashi. 2017. Kyoto universityparticipation to wat 2017. In Proceedings of the4th Workshop on Asian Translation (WAT2017),pages 146–153, Taipei, Taiwan. Asian Federa-tion of Natural Language Processing.
Terumasa Ehara. 2017. Smt reranked nmt. In Pro-ceedings of the 4th Workshop on Asian Trans-lation (WAT2017), pages 119–126, Taipei, Tai-wan. Asian Federation of Natural Language Pro-cessing.
J.L. Fleiss et al. 1971. Measuring nominal scaleagreement among many raters. PsychologicalBulletin, 76(5):378–382.
Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H.Clark, and Philipp Koehn. 2013. Scalable mod-ified kneser-ney language model estimation. InProceedings of the 51st Annual Meeting of theAssociation for Computational Linguistics (Vol-ume 2: Short Papers), pages 690–696, Sofia,Bulgaria. Association for Computational Lin-guistics.
Hieu Hoang, Philipp Koehn, and Adam Lopez.2009. A unified framework for phrase-based,hierarchical, and syntax-based statistical ma-chine translation. In Proceedings of the Inter-national Workshop on Spoken Language Trans-lation, pages 152–159.
Kenji Imamura and Eiichiro Sumita. 2017. En-semble and reranking: Using multiple models inthe nict-2 neural machine translation system atwat2017. In Proceedings of the 4th Workshopon Asian Translation (WAT2017), pages 127–134, Taipei, Taiwan. Asian Federation of Natu-ral Language Processing.
Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Kat-suhito Sudoh, and Hajime Tsukada. 2010. Au-tomatic evaluation of translation quality for dis-tant language pairs. In Proceedings of the 2010Conference on Empirical Methods in NaturalLanguage Processing, EMNLP ’10, pages 944–952, Stroudsburg, PA, USA. Association forComputational Linguistics.
Satoshi Kinoshita, Tadaaki Oshio, and TomoharuMitsuhashi. 2017. Comparison of smt and nmttrained with large patent corpora: Japio at
wat2017. In Proceedings of the 4th Workshopon Asian Translation (WAT2017), pages 140–145, Taipei, Taiwan. Asian Federation of Natu-ral Language Processing.
Tom Kocmi, Dušan Variš, and Ondřej Bojar. 2017.Cuni nmt system for wat 2017 translation tasks.In Proceedings of the 4th Workshop on AsianTranslation (WAT2017), pages 154–159, Taipei,Taiwan. Asian Federation of Natural LanguageProcessing.
Philipp Koehn. 2004. Statistical significance testsfor machine translation evaluation. In Proceed-ings of EMNLP 2004, pages 388–395, Barcelona,Spain. Association for Computational Linguis-tics.
Philipp Koehn, Hieu Hoang, Alexandra Birch,Chris Callison-Burch, Marcello Federico, NicolaBertoldi, Brooke Cowan, Wade Shen, ChristineMoran, Richard Zens, Chris Dyer, Ondrej Bojar,Alexandra Constantin, and Evan Herbst. 2007.Moses: Open source toolkit for statistical ma-chine translation. In Annual Meeting of the As-sociation for Computational Linguistics (ACL),demonstration session.
T. Kudo. 2005. MeCab: Yet another part-of-speech and morphological analyzer. http://mecab.sourceforge.net/.
Sadao Kurohashi, Toshihisa Nakamura, Yuji Matsumoto, and Makoto Nagao. 1994. Improvements of Japanese morphological analyzer JUMAN. In Proceedings of The International Workshop on Sharable Natural Language, pages 22–28.
Zi Long, Ryuichiro Kimura, Takehito Utsuro, Tomoharu Mitsuhashi, and Mikio Yamamoto. 2017. Patent NMT integrated with large vocabulary phrase translation by SMT at WAT 2017. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 110–118, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Yukio Matsumura and Mamoru Komachi. 2017. Tokyo Metropolitan University neural machine translation system for WAT 2017. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 160–166, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Makoto Morishita, Jun Suzuki, and Masaaki Nagata. 2017. NTT neural machine translation systems at WAT 2017. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 89–94, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Toshiaki Nakazawa, Chenchen Ding, Hideya Mino, Isao Goto, Graham Neubig, and Sadao Kurohashi. 2016. Overview of the 3rd Workshop on Asian Translation. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), pages 1–46, Osaka, Japan. The COLING 2016 Organizing Committee.
Toshiaki Nakazawa, Hideya Mino, Isao Goto, Sadao Kurohashi, and Eiichiro Sumita. 2014. Overview of the 1st Workshop on Asian Translation. In Proceedings of the 1st Workshop on Asian Translation (WAT2014), pages 1–19, Tokyo, Japan.
Toshiaki Nakazawa, Hideya Mino, Isao Goto, Graham Neubig, Sadao Kurohashi, and Eiichiro Sumita. 2015. Overview of the 2nd Workshop on Asian Translation. In Proceedings of the 2nd Workshop on Asian Translation (WAT2015), pages 1–28, Kyoto, Japan.
Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga, and Masashi Toyoda. 2017. A bag of useful tricks for practical neural machine translation: Embedding layer initialization and large batch size. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 99–109, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Graham Neubig, Yosuke Nakata, and Shinsuke Mori. 2011. Pointwise prediction for robust, adaptable Japanese morphological analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, HLT '11, pages 529–533, Stroudsburg, PA, USA. Association for Computational Linguistics.
Yusuke Oda, Katsuhito Sudoh, Satoshi Nakamura, Masao Utiyama, and Eiichiro Sumita. 2017. A simple and strong baseline: NAIST-NICT neural machine translation system for WAT2017 English-Japanese translation task. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 135–139, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In ACL, pages 311–318.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 433–440, Sydney, Australia. Association for Computational Linguistics.
Sandhya Singh, Ritesh Panjwani, Anoop Kunchukuttan, and Pushpak Bhattacharyya. 2017. Comparing recurrent and convolutional architectures for English-Hindi neural machine translation. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 167–170, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Huihsin Tseng. 2005. A conditional random field word segmenter. In Fourth SIGHAN Workshop on Chinese Language Processing.
Masao Utiyama and Hitoshi Isahara. 2007. A Japanese-English patent parallel corpus. In MT Summit XI, pages 475–482.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. CoRR, abs/1706.03762.
Boli Wang, Zhixing Tan, Jinming Hu, Yidong Chen, and Xiaodong Shi. 2017. XMU neural machine translation systems for WAT 2017. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 95–98, Taipei, Taiwan. Asian Federation of Natural Language Processing.