Transcript
Page 1: Cross‐Language IRir.cis.udel.edu/~carteret/CISC689/slides/lecture23.pdf · • Spanish sentence: “El objevo era claro: detener a la mujer y enviarla de regreso a México pues

5/24/09

Cross-Language IR

CISC 489/689-010, Lecture #23
Monday, May 11th

Ben Carterette

Cross-Language IR

•  User submits a query in one language, gets results in a different language
•  Documents are semi-structured and heterogeneous (as almost all data in IR), and also in multiple languages
•  Information may only be available in documents written in one of the languages
•  Highly useful to the intelligence community

Approaches to CLIR

•  Translate the documents into the users’ language, and let the users submit queries in their own language
•  Translate the users’ queries into the target language(s) and use the translated query for retrieval
•  Translate both queries and documents to an “intermediate” language

Automatic Translation

•  What are some approaches to automatic translation?
   –  Language-to-language dictionaries
•  Languages do not translate precisely
   –  One word with several meanings in one language might translate to several different words in the other
   –  Many words with the same meaning might all translate to a single word
   –  A word in one language might only be expressible as a phrase in another (or vice versa)
   –  etc.

Example

•  Translations of “bank”:
   –  Orilla (river bank)
   –  Terraplen (bank of earth)
   –  Banco (bank of clouds)
   –  Bateria (bank of lights)
   –  Banco (financial institution)
   –  Banca (casino bank)
•  Translations of “fraud”:
   –  Impostor (fraudulent person)
   –  Fraude (deception)
•  How would a dictionary-based system know which pair of translations to use?

•  English queries to retrieve Spanish documents
•  System works by translating query to Spanish
•  Query: “bank fraud”
•  Possibly correct translation:
   –  Fraude bancario
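
To make the ambiguity concrete, here is a minimal Python sketch that enumerates every word-by-word translation of “bank fraud” from the dictionary entries listed in this example. The `DICTIONARY` table and the `candidate_translations` helper are illustrative names only, and the two senses that both map to “banco” are collapsed into one entry.

```python
from itertools import product

# Toy English-to-Spanish dictionary built from the translations in the example.
# The duplicate "banco" senses are merged (an assumption made for brevity).
DICTIONARY = {
    "bank": ["orilla", "terraplen", "banco", "bateria", "banca"],
    "fraud": ["impostor", "fraude"],
}


def candidate_translations(query):
    """Return every word-by-word translation of a whitespace-separated query."""
    options = [DICTIONARY[word] for word in query.split()]
    return [" ".join(choice) for choice in product(*options)]


candidates = candidate_translations("bank fraud")
print(len(candidates))
for candidate in candidates:
    print(candidate)
```

The enumeration yields ten candidates (five choices for “bank” times two for “fraud”), and none of them is “fraude bancario”: a word-by-word lookup cannot reorder the words or produce the adjective form, which is part of why a dictionary alone struggles here.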

Statistical Approach

•  Instead of trying to translate directly, apply statistical methods
•  Learn “translation probabilities” P(f|e): the probability of translating string e in language E to string f in language F
•  E.g.:
   –  P(orilla fraude | bank fraud), P(orilla impostor | bank fraud), P(banco fraude | bank fraud), …

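Cross-Language Language Model

•  Recall query-likelihood language model:

$$
P(Q \mid D) = \prod_{q \in Q} P(q \mid D) = \prod_{q \in Q} \left( (1 - \alpha_D) \frac{tf_{q,D}}{|D|} + \alpha_D \frac{ctf_q}{|C|} \right)
$$

•  Let’s adapt this to cross-language retrieval using statistical translation

$$
\begin{aligned}
P(Q_f \mid D_e) &= \prod_{q_f \in Q_f} P(q_f \mid D_e) \\
&= \prod_{q_f \in Q_f} \sum_{t_e \in E} P(q_f \mid t_e) P(t_e \mid D_e) \\
&= \prod_{q_f \in Q_f} \sum_{t_e \in E} P(q_f \mid t_e) \left( (1 - \alpha_{D_e}) \frac{tf_{t_e,D_e}}{|D_e|} + \alpha_{D_e} \frac{ctf_{t_e}}{|C_e|} \right)
\end{aligned}
$$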
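
The cross-language query likelihood can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's implementation: a fixed smoothing weight `alpha` stands in for the document-dependent smoothing parameter, the `translation` table (mapping each query word q_f to a dictionary of P(q_f | t_e) values keyed by t_e) is hand-made, and the two toy documents are invented.

```python
import math
from collections import Counter


def cross_language_score(query_f, doc_e, collection_e, translation, alpha=0.5):
    """Return log P(Q_f | D_e) for a query in language F and a document in language E.

    translation[q_f][t_e] holds P(q_f | t_e); alpha is a fixed smoothing
    weight (an assumption; the slides use a document-dependent parameter).
    """
    tf = Counter(doc_e)          # term frequencies in the document
    ctf = Counter(collection_e)  # term frequencies in the whole collection
    log_p = 0.0
    for q_f in query_f:
        p_q = 0.0
        for t_e, p_trans in translation.get(q_f, {}).items():
            # Smoothed document model P(t_e | D_e).
            p_doc = ((1 - alpha) * tf[t_e] / len(doc_e)
                     + alpha * ctf[t_e] / len(collection_e))
            p_q += p_trans * p_doc
        if p_q == 0.0:
            return float("-inf")
        log_p += math.log(p_q)
    return log_p


# Toy data: Spanish query, English documents.
d1 = ["bank", "fraud", "charges"]
d2 = ["river", "bank", "erosion"]
collection = d1 + d2
translation = {"fraude": {"fraud": 0.9}, "banco": {"bank": 0.5}}
query = ["fraude", "banco"]

score_d1 = cross_language_score(query, d1, collection, translation)
score_d2 = cross_language_score(query, d2, collection, translation)
print(score_d1, score_d2)
```

With these toy numbers the document about bank fraud scores higher than the one about river banks, because only the first contains a term that translates to “fraude”; the second still receives a nonzero score through collection smoothing.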

Translation Model

•  What is P(q_f | t_e)?
•  The translation model: probability of translating word t_e in language E to word q_f in language F
•  Where does it come from?
   –  Maybe a dictionary approach: every possible translation of t_e has equal probability
   –  e.g. P(orilla | bank) = P(banco | bank) = P(banca | bank) = …
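
The equal-probability dictionary model fits in a few lines of Python. The function name and the toy dictionary are assumptions for illustration; senses that map to the same target word (such as the two “banco” entries) are merged before the probability mass is split, which is one possible design choice among several.

```python
def uniform_translation_model(dictionary):
    """Map each source word t_e to {q_f: P(q_f | t_e)} with equal probabilities."""
    model = {}
    for t_e, translations in dictionary.items():
        unique = sorted(set(translations))  # merge duplicate target words
        model[t_e] = {q_f: 1.0 / len(unique) for q_f in unique}
    return model


model = uniform_translation_model(
    {"bank": ["orilla", "terraplen", "banco", "bateria", "banco", "banca"]}
)
print(model["bank"])
```

Here each of the five distinct translations of “bank” receives probability 0.2, regardless of how common each sense actually is.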

Statistical Translation Model

•  An alternative approach: parallel corpora

Statistical Translation with Parallel Corpora

•  Parallel corpora consist of documents in two or more languages that are known to be translations of one another
•  The parallel corpora are aligned: string e and string f are marked as translations of each other
•  We can use these alignments to estimate a translation model

Translation Model

•  To estimate P(q_f | t_e), count the number of aligned string pairs (e, f) such that t_e is a word in e and q_f is a word in f
•  Divide by the total number of strings in language E that contain t_e

$$
P(q_f \mid t_e) = \frac{|\{(e, f) \mid t_e \in e \text{ and } q_f \in f\}|}{|\{e \mid t_e \in e\}|}
$$
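
The counting estimate can be sketched in Python as below. The three aligned pairs are invented toy data, the function name is hypothetical, and whitespace tokenization is an assumption; each English string is assumed to be aligned with exactly one Spanish string.

```python
from collections import defaultdict


def estimate_translation_model(aligned_pairs):
    """Estimate P(q_f | t_e) by co-occurrence counts over aligned string pairs.

    Returns a dict keyed by (q_f, t_e).
    """
    pair_counts = defaultdict(int)    # pairs (e, f) with t_e in e and q_f in f
    source_counts = defaultdict(int)  # strings e that contain t_e
    for e, f in aligned_pairs:
        e_words, f_words = set(e.split()), set(f.split())
        for t_e in e_words:
            source_counts[t_e] += 1
            for q_f in f_words:
                pair_counts[(q_f, t_e)] += 1
    return {
        (q_f, t_e): count / source_counts[t_e]
        for (q_f, t_e), count in pair_counts.items()
    }


pairs = [
    ("the bank", "el banco"),
    ("the fraud", "el fraude"),
    ("bank fraud", "fraude bancario"),
]
probs = estimate_translation_model(pairs)
print(probs[("fraude", "fraud")], probs[("banco", "bank")])
```

Under this counting scheme the values for a fixed t_e need not sum to one, since every word of f is credited to every word of e; in the toy data “the” gets probability 1.0 for “el” and 0.5 each for “banco” and “fraude”.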

Simple Alignment Example

•  English sentence: “The objective was clear: arrest and extradite to Mexico the woman against whom they had charged for fraud to a recognized banking institution.”
•  Spanish sentence: “El objetivo era claro: detener a la mujer y enviarla de regreso a México pues habían cargos en su contra por fraude a una reconocida institución bancaria.”
•  Every pair of words in these two sentences will have some translation probability
•  Over many sentences, the highest probabilities will be the pairs of words that are most closely related

Alignments

•  Alignments can be much more detailed

Images from Brown et al., “The Mathematics of Statistical Machine Translation”

Parallel Corpora

•  Where do we get parallel corpora?
   –  Find documents that we know to be translations
   –  Canadian Hansard: transcripts of Canadian parliamentary debates in both English and French
   –  European Union law in 22 languages
•  Anything that’s not law-related?
   –  Wikipedia articles in different languages, though these are not necessarily translations

CLIR Experiments

•  CLIR track ran at TREC from 1998 through 2002
•  Languages used include English, German, French, Italian, Chinese, and Arabic
•  Other issues in CLIR:
   –  Segmentation, stemming, stopping, and phrases require different approaches in different languages
   –  I am going to focus on the high-level problem

CLIR Experiments

•  In 2001 and 2002, the main CLIR task was English queries to retrieve Arabic documents
•  Documents: 383,872 news articles from Agence France Presse from 1994-2000
•  Information needs: 25 queries, descriptions, and narratives in English by native Arabic speakers
   –  Translated into Arabic and French as well
•  Participating sites could do CLIR (English to Arabic or French to Arabic) or normal IR (Arabic to Arabic)

Example Topic

<num> Number: AR26
<title> مجلس المقاومة الوطني الكردستاني
<desc> Description:
كيف ينظر مجلس المقاومة الوطنية الى الإستقلال المحتمل للاكراد؟
<narr> Narrative:
الموضوع يتضمن نصوص متعلقة بتحركات مجلس المقاومة الوطنية، مقالات تتحدث عن قيادة اوجلان ضمن جهود الاكراد للاستقلال.

<num> Number: AR26
<title> Kurdistan Independence
<desc> Description:
How does the National Council of Resistance relate to the potential independence of Kurdistan?
<narr> Narrative:
Articles reporting activities of the National Council of Resistance are considered on topic. Articles discussing Ocalan's leadership within the context of the Kurdish efforts toward independence are also considered on topic.

Example Document

Results

•  BBN, UMass, IBM used statistical models
•  UMass performance on cross-language is roughly equal to performance on monolingual!

Plot panels: Monolingual (Arabic to Arabic); Cross-lingual (English/French to Arabic)

Plots from Oard & Gey, “The TREC-2002 Arabic/English CLIR Track”

Analysis

•  The translation model is imperfect
   –  It assigns probabilities to almost every pair of words
   –  There are many errors in translation
•  So how could cross-lingual be almost as good as monolingual?
•  Hypotheses:
   –  Translation process disambiguates some terms
   –  Translation process smooths query models

IR as Statistical Translation

•  What if we view IR as a translation process?
   –  User inputs query in English, system does “cross-language” retrieval from user-English to system-English
   –  This may account for users not using the right keywords in their queries
•  There is no natural translation model, so one must be simulated
•  Berger & Lafferty, SIGIR 1999

IR Translation Model

•  Generate a translation model by aligning simulated queries to relevant documents

Results

Translation models compared to tf-idf

LM coincides with Model 0

Conclusion: statistical translation works at least as well as tf-idf or LM

Translation for Multimedia Retrieval

•  English-Arabic CLIR works
•  English-English CLIR works
•  What about English-multimedia CLIR?
•  “Translate” an image into words to enable retrieval of images by text queries
•  Translation model: P(w|I) is the probability of “translating” image I to word w

Image Translation Model

•  Estimating P(w|I) requires two things:
   –  A feature-based representation of the image
   –  A set of words that “align” with the image
•  Use image segmentation and clustering to form a representation of images
•  Use image captions to align words to image

Image Representation: “Blobs”

From Jeon et al., “Automatic Image Annotation and Retrieval Using Cross-Media Relevance Models”

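Cross-Media Relevance Model

•  Retrieval is by query-likelihood P(Q|I)

$$
P(Q \mid I) = \prod_{q \in Q} P(q \mid I) \approx \prod_{q \in Q} P(q \mid b_1, \ldots, b_m) \propto \prod_{q \in Q} \sum_{J \in C} P(q \mid J) P(J) \prod_{i=1}^{m} P(b_i \mid J)
$$

C is the collection of images, J is an image in C, and b_1 … b_m are “blobs”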
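
A rough Python sketch of this scoring rule follows. It assumes the per-image probabilities P(q|J), P(J), and P(b|J) are supplied directly as toy numbers rather than estimated (with smoothing) from annotated images; the dictionary layout, the blob identifiers, and the function name are all hypothetical.

```python
import math


def cmrm_query_score(query, blobs, training_images):
    """Return a score proportional to P(Q | I) for an image described by `blobs`."""
    score = 1.0
    for q in query:
        total = 0.0
        for J in training_images:
            # Product over the query image's blobs of P(b_i | J).
            p_blobs = math.prod(J["p_blob"].get(b, 0.0) for b in blobs)
            total += J["p_word"].get(q, 0.0) * J["prior"] * p_blobs
        score *= total
    return score


# Two toy annotated images with made-up probabilities.
TRAINING = [
    {"prior": 0.5,
     "p_word": {"tiger": 0.6, "grass": 0.4},
     "p_blob": {"b1": 0.7, "b2": 0.3}},
    {"prior": 0.5,
     "p_word": {"sky": 0.8, "grass": 0.2},
     "p_blob": {"b2": 0.5, "b3": 0.5}},
]

score_a = cmrm_query_score(["tiger"], ["b1", "b2"], TRAINING)
score_b = cmrm_query_score(["tiger"], ["b2", "b3"], TRAINING)
print(score_a, score_b)
```

The image whose blobs resemble those of the tiger-captioned training image gets a positive score for the query “tiger”, while the other scores zero because no unsmoothed probability mass connects it to that word.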

Example Results

From Jeon et al., “Automatic Image Annotation and Retrieval Using Cross-Media Relevance Models”

Machine Translation

•  Machine translation (MT) is a problem in NLP/computational linguistics
•  The goal is to automatically translate text in one language to another
•  Different from CLIR with a query translation model in that the CLIR model does not require a “coherent” translation of the query
   –  CLIR essentially uses every possible translation
•  Machine translation should provide a single “good” translation that is human-readable

Statistical MT

•  Though MT and CLIR are different problems, the statistical approaches are very similar
•  IBM developed several statistical models for MT
   –  “A statistical approach to machine translation”, Brown et al. 1990
   –  CLIR models based on IBM’s models

IBM Models

•  Basic idea: to translate a sentence f in language F to a sentence e in language E, estimate P(e|f) using Bayes’ Rule

$$
P(e \mid f) = \frac{P(f \mid e) P(e)}{P(f)}
$$

•  The “right” translation is the one with highest probability

$$
\hat{e} = \arg\max_e P(f \mid e) P(e)
$$
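
The argmax can be illustrated with a small Python sketch over an explicit candidate list. The probabilities below are made up purely for illustration, and `best_translation` is a hypothetical helper; real systems search a far larger space of candidate sentences.

```python
def best_translation(f, candidates, p_f_given_e, p_e):
    """Return the candidate e that maximizes P(f | e) * P(e)."""
    return max(
        candidates,
        key=lambda e: p_f_given_e.get((f, e), 0.0) * p_e.get(e, 0.0),
    )


# Made-up translation-model and language-model probabilities.
p_f_given_e = {
    ("fraude bancario", "bank fraud"): 0.4,
    ("fraude bancario", "fraud bank"): 0.4,
    ("fraude bancario", "shore fraud"): 0.05,
}
p_e = {"bank fraud": 0.01, "fraud bank": 0.0001, "shore fraud": 0.001}

best = best_translation("fraude bancario", list(p_e), p_f_given_e, p_e)
print(best)
```

P(f) is the same for every candidate, so it can be dropped from the maximization. In this toy setup the translation model cannot tell “bank fraud” from “fraud bank”, and the language model P(e) is what favors the fluent word order.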

IBM Models

•  The key is estimating P(f|e)
•  Brown et al. presented five different models
   –  Increasingly complicated, and they require a lot of training data in the form of parallel aligned corpora
•  Google machine translation is based on alignment and IBM models, but also based on very large amounts of unaligned data

Google Machine Translation

Google’s translation of the Spanish Wikipedia page for Spain (http://es.wikipedia.org/wiki/Espana)

