POLYGLOT: Multilingual Semantic Role Labeling with Unified Labels

Posted on 08-Apr-2017


English Proposition Bank Frames

Generating Training Data for Multilingual SRL

Semantic Parsing of 9 Languages

English, Chinese, Spanish

buy.01
Roles:
A0: buyer (agent)
A1: thing bought (theme)
A2: seller (source)
A3: price paid (asset)
A4: benefactive (beneficiary)

German, Japanese, Russian

Training data generation pipeline
• Optional: manual aliasing of target-language (TL) verbs to English frames
• Filtered annotation projection (Akbik et al., 2015)
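Annotation projection can be pictured as copying each English predicate-argument label across word alignments onto the parallel target-language sentence, then discarding projections that look unreliable. The following is a minimal Python sketch, assuming that only one-to-one alignments are trustworthy. The two filters, `SrcLabel` and `project` are hypothetical simplifications, not the filters described in Akbik et al. (2015).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SrcLabel:
    """One English predicate-argument structure produced by source-language SRL."""
    predicate: int             # source token index of the predicate
    frame_id: str              # English frame, e.g. "buy.01"
    arguments: dict[str, int]  # role -> source head token index


def project(label: SrcLabel, alignment: set[tuple[int, int]],
            tl_verbs: set[int]) -> dict | None:
    """Project one English label onto the aligned target-language (TL) sentence.

    alignment: word alignment as (source_index, target_index) pairs.
    tl_verbs: indices of TL tokens tagged as verbs.
    Returns None when one of the (hypothetical) filters rejects the projection.
    """
    src_to_tgt: dict[int, list[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, []).append(t)

    # Hypothetical filter 1: the predicate must align to exactly one TL verb.
    pred_targets = src_to_tgt.get(label.predicate, [])
    if len(pred_targets) != 1 or pred_targets[0] not in tl_verbs:
        return None

    projected: dict[str, int] = {}
    for role, head in label.arguments.items():
        # Hypothetical filter 2: every argument head must align to exactly one TL token.
        heads = src_to_tgt.get(head, [])
        if len(heads) != 1:
            return None
        projected[role] = heads[0]

    # The English frame id is copied unchanged: it serves as the unified label.
    return {"predicate": pred_targets[0], "frame_id": label.frame_id,
            "arguments": projected}


# "John bought a car" -> "Johann kaufte ein Auto", aligned word by word.
label = SrcLabel(predicate=1, frame_id="buy.01", arguments={"A0": 0, "A1": 3})
alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
print(project(label, alignment, tl_verbs={1}))
# → {'predicate': 1, 'frame_id': 'buy.01', 'arguments': {'A0': 0, 'A1': 3}}
```

Rejecting the whole projection when any argument head is unaligned trades recall for precision; a real pipeline would tune such filters on held-out data.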

like.01
Roles:
A0: liker (experiencer)
A1: object of affection (theme)

give.01
Roles:
A0: giver (agent)
A1: thing given (theme)
A2: entity given to (recipient)

sell.01
Roles:
A0: seller (agent)
A1: thing sold (theme)
A2: buyer (recipient)
A3: price paid
A4: benefactive

Challenges and open questions
• Source-language SRL errors
• Coverage: do appropriate English frames exist for all TL verbs? For example, French pouvoir (to be able to) and German sollen (to be supposed to)

• Crowdsourced data curation (Akbik et al., 2016)
• Design of crowdsourcing task

Alan Akbik and Yunyao Li

IBM Research - Almaden

POLYGLOT

Multilingual Semantic Role Labeling with Unified Labels

Idea: Use English Proposition Bank Frames and Roles as universal semantic labels

annehmen.01 (accept)
Roles:
A0: acceptor (agent)
A1: thing accepted (theme)
A2: accepted-from (source)
A3: attribute (attribute)

annehmen.02 (assume)
Roles:
A0: thinker (agent)
A1: thought (theme)
A2: attributive (source)

Predicate: annehmen
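A target-language predicate such as annehmen can have several frame senses, each carrying an English-style role inventory, while some TL verbs may have no suitable frame at all (the coverage question raised under the challenges). The following is a minimal Python sketch, assuming a hypothetical in-memory lexicon `TL_FRAMES` keyed by verb lemma. `candidate_frames` and `coverage` are illustrative names, not part of any released resource.

```python
from __future__ import annotations

# Hypothetical German frame lexicon keyed by verb lemma; role descriptions are
# copied from the annehmen.01 and annehmen.02 frames above.
TL_FRAMES: dict[str, dict[str, dict[str, str]]] = {
    "annehmen": {
        "annehmen.01": {"A0": "acceptor (agent)", "A1": "thing accepted (theme)",
                        "A2": "accepted-from (source)", "A3": "attribute (attribute)"},
        "annehmen.02": {"A0": "thinker (agent)", "A1": "thought (theme)",
                        "A2": "attributive (source)"},
    },
}


def candidate_frames(lemma: str) -> list[str]:
    """Frame senses a labeler has to choose between for a TL predicate."""
    return sorted(TL_FRAMES.get(lemma, {}))


def coverage(lemmas: list[str]) -> tuple[float, list[str]]:
    """Share of TL verb lemmas with at least one frame, plus the uncovered lemmas."""
    if not lemmas:
        return 0.0, []
    missing = sorted({lemma for lemma in lemmas if lemma not in TL_FRAMES})
    covered = sum(1 for lemma in lemmas if lemma in TL_FRAMES)
    return covered / len(lemmas), missing


print(candidate_frames("annehmen"))        # → ['annehmen.01', 'annehmen.02']
print(coverage(["annehmen", "sollen"]))    # → (0.5, ['sollen'])
```

Returning the uncovered lemmas alongside the ratio makes it easy to hand those verbs to manual aliasing instead of only reporting a number.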

Example Target Language Frame

Annotation Projection

English, German, Chinese, French, Japanese, Spanish, Russian, Hindi and Arabic

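[Figure: Annotation projection. The English (EN) side of a parallel corpus carries semantic labels (predicates + roles); annotation projection transfers them to the unlabeled target-language (TL) side, yielding projected TL semantic labels.]

Future work: Crowdsourced and expert data curation

[Figure: Curation pipeline. Projected TL semantic labels are the input to crowdsourced data curation. If the crowd agrees, the labels become curated, final TL semantic labels; if not, the labels the crowd cannot curate go to expert data curation.]

[Figure label: Multilingual aliases]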
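The crowdsourced and expert curation flow can be sketched as a routing decision per projected label. This minimal Python sketch assumes that each crowd worker picks a label for the same projected predicate and that "the crowd agrees" means the most common answer reaches a fixed share of the votes. The 0.8 threshold and the `route` function are hypothetical, not the agreement criterion of Akbik et al. (2016).

```python
from __future__ import annotations

from collections import Counter


def route(crowd_votes: list[str], min_agreement: float = 0.8) -> tuple[str, str | None]:
    """Decide where one projected label goes after crowdsourced curation.

    crowd_votes: each worker's chosen label for the same projected predicate,
    e.g. a frame id such as "annehmen.01".
    Returns ("final", label) if enough workers agree, else ("expert", None).
    """
    if not crowd_votes:
        return "expert", None
    label, count = Counter(crowd_votes).most_common(1)[0]
    if count / len(crowd_votes) >= min_agreement:
        return "final", label   # crowd agrees: curated, final label
    return "expert", None       # crowd cannot curate: expert data curation


print(route(["annehmen.01"] * 4 + ["annehmen.02"]))          # → ('final', 'annehmen.01')
print(route(["annehmen.01", "annehmen.02", "annehmen.01"]))  # → ('expert', None)
```

Raising `min_agreement` sends more items to experts; lowering it accepts more crowd labels at the risk of noisier final data.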

Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling. Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan and Huaiyu Zhu. ACL 2015.

Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages. Alan Akbik and Yunyao Li. EMNLP 2016.

Evaluation
