Phrase tagset mapping for French and English treebanks and its application in machine translation...
TRANSCRIPT
Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation
25th International Conference, GSCL 2013
Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu
September 25th-27th, 2013, Darmstadt, Germany
Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory
Department of Computer and Information Science
University of Macau
Contents
● Background of language Treebank
● Motivation
● Designed phrase tagset mapping
● Application in MT evaluation
1. Manual evaluations
2. Traditional automatic MT evaluation methods
3. Designed unsupervised MT evaluation
4. Evaluating the evaluation method
5. Experiments
6. Open source code
● Discussion
● Further information
1. Background of language Treebank
• To promote the development of syntactic analysis
• Many language treebanks are developed
– English Penn Treebank (Marcus et al., 1993; Mitchell et al., 1994)
– German Negra Treebank (Skut et al., 1997)
– French Treebank (Abeillé et al., 2003)
– Chinese Sinica Treebank (Chen et al., 2003)
– Etc.
• Problems
– Different treebanks use their own syntactic tagsets
– The number of tags ranges from tens (e.g. English Penn Treebank) to hundreds (e.g. Chinese Sinica Treebank)
– Inconvenient when undertaking multilingual or cross-lingual research
2. Motivation
• To bridge the gap between these treebanks and facilitate future research
– E.g. the unsupervised induction of syntactic structure
• Petrov et al. (2012) develop a universal POS tagset
• How about the phrase level tags?
• The disaccord problem in the phrase level tags remains unsolved
– Let's try to solve it
3. Designed phrase tagset mapping
• Tentative design of phrase tagset mapping
– On English Penn Treebank I, II & French Treebank
• 9 universal phrasal categories covering
– 14 phrase tags in English Penn Treebank I
– 26 phrase tags in English Penn Treebank II
– 14 phrase tags in French Treebank
Table 1: phrase tagset mapping for French and English treebanks
• Universal phrasal categories: NP (noun phrase), VP (verb phrase), AJP (adjective phrase), AVP (adverbial phrase), PP (prepositional phrase), S (sub-/sentence), CONJP (conjunction phrase), COP (coordinated phrase), X (other phrases or unknown)
• NP covering
– French tags: NP
– English tags: NP, NAC (the scope of certain prenominal modifiers within an NP), NX (within certain complex NPs to mark the head of NP), WHNP (wh-noun phrase), QP (quantifier phrase)
• VP covering
– French tags: VN (verbal nucleus), VP (infinitives and nonfinite clauses)
– English tags: VP (verb phrase)
• AJP covering
– French tags: AP (adjectival phrase)
– English tags: ADJP (adjective phrase), WHADJP (wh-adjective phrase)
• AVP covering
– French tags: AdP (adverbial phrases)
– English tags: ADVP (adverb phrase), WHADVP (wh-adverb phrase), PRT (particle)
• PP covering
– French tags: PP
– English tags: PP, WHPP (wh-prepositional phrase)
• S covering
– French tags: SENT (sentence), S (finite clause)
– English tags: S (simple declarative clause), SBAR (clause introduced by a subordinating conjunction), SBARQ (direct question introduced by a wh-phrase), SINV (declarative sentence with subject-aux inversion), SQ (sub-constituent of SBARQ), PRN (parenthetical), FRAG (fragment), RRC (reduced relative clause)
• CONJP covering
– French tags: N/A
– English tags: CONJP
• COP covering
– French tags: COORD (coordinated phrase)
– English tags: UCP (coordinated phrases belonging to different categories)
• X covering
– French tags: unknown
– English tags: X (unknown or uncertain), INTJ (interjection), LST (list marker)
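The mapping listed on the slides above can be written down as a lookup table. The sketch below transcribes only the tags named on the slides; Table 1 is not reproduced here and may list further French tags. The dictionary and function names are illustrative assumptions, not taken from the authors' tool, and WHADVP follows the Penn Treebank spelling.

```python
# Universal category -> treebank tags it covers, transcribed from the slides.
# Names below (UNIVERSAL_TO_ENGLISH, to_universal, ...) are hypothetical.
UNIVERSAL_TO_ENGLISH = {
    "NP": ["NP", "NAC", "NX", "WHNP", "QP"],
    "VP": ["VP"],
    "AJP": ["ADJP", "WHADJP"],
    "AVP": ["ADVP", "WHADVP", "PRT"],
    "PP": ["PP", "WHPP"],
    "S": ["S", "SBAR", "SBARQ", "SINV", "SQ", "PRN", "FRAG", "RRC"],
    "CONJP": ["CONJP"],
    "COP": ["UCP"],
    "X": ["X", "INTJ", "LST"],
}

UNIVERSAL_TO_FRENCH = {
    "NP": ["NP"],
    "VP": ["VN", "VP"],
    "AJP": ["AP"],
    "AVP": ["AdP"],
    "PP": ["PP"],
    "S": ["SENT", "S"],
    "CONJP": [],          # no French counterpart on the slides
    "COP": ["COORD"],
    "X": [],              # French unknown tags fall through to X below
}


def _invert(table):
    """Turn {universal: [tags]} into {tag: universal}."""
    return {tag: universal for universal, tags in table.items() for tag in tags}


_LOOKUP = {"en": _invert(UNIVERSAL_TO_ENGLISH), "fr": _invert(UNIVERSAL_TO_FRENCH)}


def to_universal(tag, language):
    """Map a treebank phrase tag to a universal category; unlisted tags become X."""
    return _LOOKUP[language].get(tag, "X")


print(to_universal("WHNP", "en"), to_universal("VN", "fr"))
```

Sending unlisted tags to X mirrors the "other phrases or unknown" category, so the lookup never fails on a tag the slides do not mention.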
4. Application in Machine Translation evaluation
4.1 Manual evaluations
• Rapid development of Machine Translation
– MT began as early as the 1950s (Weaver, 1955)
– Big progress since the 1990s due to the development of computers (storage capacity and computational power) and the enlarged bilingual corpora (Marino et al., 2006)
• Difficulties of MT evaluation
– language variability results in no single correct translation
– natural languages are highly ambiguous, and different languages do not always express the same content in the same way (Arnold, 2003)
• Traditional manual evaluation criteria:
– intelligibility (measuring how understandable the sentence is)
– fidelity (measuring how much information the translated sentence retains as compared to the original) by the Automatic Language Processing Advisory Committee (ALPAC) around 1966 (Carroll, 1966)
– adequacy (similar to fidelity), fluency (whether the sentence is well-formed and fluent) and comprehension (improved intelligibility) by the Defense Advanced Research Projects Agency (DARPA) of the US (White et al., 1994)
• Problems of manual evaluations:
– Time-consuming
– Expensive
– Unrepeatable
– Low agreement (Callison-Burch et al., 2011)
4.2 Traditional automatic MT evaluations
• Measuring the similarity of automatic translation and reference translation
– Automatic translation (or hypothesis translation, target translation): by automatic MT system
– Reference translation: by professional translators
– Source language and source document: not used
• Traditional automatic evaluation:
– BLEU: n-gram precisions (Papineni et al., 2002)
– TER: edit distances (Snover et al., 2006)
– METEOR: precision and recall (Banerjee and Lavie, 2005)
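To make "n-gram precision" concrete, here is a minimal sketch of the clipped n-gram precision that BLEU builds on, for one hypothesis and one reference. It is not the full BLEU score (no brevity penalty, no averaging over n, no multiple references), and the function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, reference, n):
    """Fraction of hypothesis n-grams also found in the reference.

    Each n-gram is credited at most as many times as it occurs in the
    reference, so repeating a correct word cannot inflate the score.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


hyp = "the the the cat".split()
ref = "the cat sat".split()
print(clipped_precision(hyp, ref, 1))  # 0.5
```

Note that only the hypothesis and the reference enter the computation; the source sentence is never consulted, which is the limitation the unsupervised method in section 4.3 targets.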
4.3 Designed unsupervised MT evaluation
• Problems in supervised MT evaluation
– Reference translations are expensive
– Reference translations are not available in some cases
• Could we get rid of the reference translation?
– Unsupervised MT evaluation method
– Extract information from source and target language
– How to use the designed universal phrase tagset?
• Assume that the translated sentence should have a similar set of phrase categories to the source sentence.
– This design is inspired by the synonymous relation between source and target sentence.
• Two sentences that have a similar set of phrases may talk about different things.
– However, this evaluation approach is not designed for general circumstances
– Assume that the target sentences are indeed translations of the sentences in the source document
• First, we parse the source and target languages respectively
• Then we extract the phrase set from the source and target sentences
• Third, we convert the phrases into the developed universal phrase categories
• Last, we measure the similarity of source and target language on the universal phrase sequences
Figure 1: the parsed French and English sentence
Figure 2: convert the extracted phrase into universal phrase tags
The level of extracted phrase tags: just the upper level of POS tags, bottom-up
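One reading of "just the upper level of POS tags, bottom-up" is: for every word, take the phrase label sitting directly above its POS tag. The sketch below does this for a Penn-style bracketed parse. The tiny parser, the function names and the example sentence are assumptions for illustration; the slides do not show the authors' actual extraction code.

```python
def parse_brackets(text):
    """Parse a Penn-style bracketed string into nested [label, child, ...] lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        node = [tokens[pos]]          # constituent or POS label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())   # nested constituent
            else:
                node.append(tokens[pos])  # a word
                pos += 1
        pos += 1                      # consume ")"
        return node

    return read()


def lowest_phrase_tags(node, parent_label=None):
    """For each word, left to right, return the label directly above its POS tag."""
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        # Pre-terminal such as (NN cat): report the enclosing phrase label.
        return [parent_label]
    tags = []
    for child in children:
        if isinstance(child, list):
            tags.extend(lowest_phrase_tags(child, label))
    return tags


tree = parse_brackets(
    "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)
print(lowest_phrase_tags(tree))  # ['NP', 'NP', 'VP', 'PP', 'NP', 'NP']
```

The resulting tag sequence would then be converted to universal categories before the source and target sides are compared.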
• What is the similarity metric we employed?
• Designed similarity metric: HPPR
– N1 gram position order difference penalty
– Weighted N2 gram precision
– Weighted N3 gram recall
– Weighted geometric mean in n-gram precision & recall
– Weighted harmonic mean to combine sub-factors
– The parameters are tunable according to different language pairs
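The slides carrying the HPPR formulas did not survive in this transcript, so the sketch below shows only the generic weighted harmonic mean that the last combination step refers to. The three sub-factor scores and the weights in the example are placeholders, not the authors' tuned values, and the function name is illustrative.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w_i) / sum(w_i / x_i).

    Returns 0.0 if any value is zero or negative, since the harmonic mean
    is dominated by the smallest component.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if sum(weights) <= 0:
        raise ValueError("weights must sum to a positive number")
    if any(v <= 0 for v in values):
        return 0.0
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


# Placeholder scores for a position penalty, a precision and a recall factor.
score = weighted_harmonic_mean([0.8, 0.5, 0.6], [1.0, 2.0, 1.0])
print(round(score, 4))
```

A harmonic mean punishes a low sub-factor more than an arithmetic mean would, so a hypothesis cannot score well by doing well on only one of position, precision or recall.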
Figure 3: N1 gram tag alignment algorithm
Figure 5: bigram chunk matching example
4.4 Evaluating the evaluation method
4.5 Experiments
• Corpus from WMT
– Workshop of statistical machine translation
– SIGMT, ACL's special interest group of machine translation
• Training data (WMT11), tune the parameters
– 3,003 sentences for each document
– 18 automatic French-to-English MT systems
• Testing data (WMT12)
– 3,003 sentences for each document
– 15 automatic French-to-English MT systems
• Training, tune the parameters
– N1, N2 and N3 are tuned as 2, 3 and 3 because the 4-gram chunk match usually results in a score of 0.
– Tuned values of factor weights are shown in Table 2
Table 2: tuned parameter values
• Comparisons with:
– BLEU, measure the closeness of the hypothesis and reference translations, n-gram precision
– TER, measure the editing distance of hypothesis to reference translations
Table 3: training (development) scores on WMT11 corpus
Table 4: testing scores on WMT12 corpus
Table 5: correlation score intro (Cohen, 1988)
● The experiment results on the development and testing corpora show that HPPR, without using reference translations, has yielded promising correlation scores (0.63 and 0.59 respectively).
● There is still potential to improve the performance of all three metrics, even though correlation scores higher than 0.5 are already considered strong, as shown in Table 5.
4.6 Open source code
• Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation
– Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu. GSCL 2013, Darmstadt, Germany. LNCS Vol. 8105, pp. 119-131, Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch.
• Open source tool for phrase tagset mapping and HPPR similarity measuring algorithms: https://github.com/aaronlifenghan/aaron-project-hppr
5. Discussion
• To facilitate future multilingual or cross-lingual research, this paper designs a phrase tagset mapping between the French Treebank and the English Penn Treebank using 9 phrase categories.
• One of the potential applications of the designed universal phrase tagset is shown in the unsupervised MT evaluation task in the experiment section.
• There are still some limitations in this work to be addressed in the future.
– The designed universal phrase categories may not be able to cover all the phrase tags of other language treebanks, so this tagset could be expanded when necessary.
– The designed HPPR formula contains the n-gram factors of position difference, precision and recall, which may not be sufficient or suitable for some of the other language pairs, so different measuring factors should be added or switched when facing new tasks.
• The designed models are closely related to similarity measuring. We have employed them in MT evaluation; this work may be further developed for other areas:
– information retrieval
– question and answering
– searching
– text analysis
– etc.
6. Further information
• Ongoing and further works:
– The combination of translation and evaluation, tuning the translation model using evaluation metrics
– Evaluation models from the perspective of semantics
– The further explorations of unsupervised evaluation models, extracting other features from source and target languages
• Aaron open source tools: https://github.com/aaronlifenghan
• Aaron network Home: http://www.linkedin.com/in/aaronhan
Aaron L.-F. Han
email: hanlifengaaron AT gmail DOT com
Q and A