Page 1: Towards an Integrated Architecture for Composite Language Services and Multiple Linguistic Processing Components

LREC 2010, 17-23 May, Malta

Arif Bramantoro and Toru Ishida, Department of Social Informatics, Kyoto University, Japan

Ulrich Schäfer, Language Technology Lab, DFKI, Germany

Page 2: Presentation Outline

• Introduction
• Language Grid
  – Workflow in Language Grid
• Heart of Gold
  – Processing Flow in Heart of Gold
• Combination
• Pipelining Support Service
• Conclusion

Page 3: Introduction

Page 4: Introduction

• Many natural language processing (NLP) architectures exist
• Each NLP architecture has its own characteristics
  – Language Grid (NICT, Japan): service-oriented architecture
  – Heart of Gold (DFKI, Germany): function-oriented architecture
• To increase the number of language services in Language Grid
  – Why not integrate NLP architectures instead of integrating NLP tools?

Page 5: Introduction (2) – A Motivation for Combination

• A challenging issue: each architecture has its own way of handling multi-step processing
  – Language Grid: workflow for composite services
  – Heart of Gold: processing flow for multiple linguistic processing components
• Access management functionality is available only in Language Grid

Page 6: Language Grid

Page 7: Language Grid (2)

• A new service-oriented multilingual infrastructure on the Internet to support intercultural activities
• Language resources with complicated intellectual property rights can be wrapped and shared
• Linked Service Grids:
  – Language Grid in Japan
  – Language Grid in Thailand
  – Agricultural Service Grid
  – Education Service Grid
  – etc.

Page 8: Language Grid

[Figure: Language Grid architecture. Labels: Application System; Service Grid Server Software; Service Manager; Grid Composer; Service Invoker; Other Service Grid; BPEL Composite Service Engine; Script Composite Service Engine; Java Composite Service Engine; Java Atomic Service Engine; Service Resource (three instances); Native Program; Network Program; Resource Database.]

Page 9: Workflow in Language Grid

• Sample methods of workflow for composite services
  – Business Process Execution Language (BPEL)
  – Script
  – Java, etc.
• Additional technique for composite services: constraint satisfaction
• X = {X1, …, Xn} is a set of abstract web services
• D = {D1, …, Dn}
  – Di = {si1, …, sik}, where sij is a concrete web service of the corresponding Xi
• C = {C1, …, Cp} is a set of constraints

Page 10: Workflow in Language Grid (2)

• X = {X1, X2, X3, X4, X5}
  – X1: Morphological analyzer service
  – X2: ja-en translation service
  – X3: en-id translation service
  – X4: Community dictionary service
  – X5: Term replacement service
• D = {D1, D2, D3, D4, D5}
  – D1: {MeCab at NTT, ICTCLAS, KLT at Kookmin University, TreeTagger at IMS Stuttgart}
  – D2: {JServer at NICT, WEB-Transer at Kyoto-U, Google Translation, Translution}
  – D3: {ToggleText}
  – D4: {Life Science Dictionary, Natural Disasters Dictionary, Kyoto Tourism Dictionary}
  – D5: {TermRepl service}
• C = {C1, C2, C3}
  – C1: For multi-hop translation, X2.OUT = X3.IN
  – C2: For specialized translation service with dictionary, serverLocation(X2) = serverLocation(X4)
  – C3: For morphological analysis, partialAnalyzedResult(X1.OUT) ∈ X2.IN

[Figure: composite workflow. Labels: Japanese Morphological Analysis Service; Community Dictionary Service; ja->en Translation Service; Term Replacement Service; en->id Translation Service.]
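The selection of concrete services under constraints C1 and C2 can be sketched as a small constraint satisfaction problem. This is only an illustration: it uses brute-force enumeration rather than whatever solver Language Grid applies, the field names (`src`, `tgt`, `server`) are hypothetical, and the server locations of the two dictionaries are invented so that C2 has something to filter.

```python
from itertools import product

# Candidate concrete services per abstract service (subset of the slide's domains).
# Language pairs and dictionary server locations are assumptions for illustration.
DOMAINS = {
    "X2": [
        {"name": "JServer", "src": "ja", "tgt": "en", "server": "NICT"},
        {"name": "WEB-Transer", "src": "ja", "tgt": "en", "server": "Kyoto-U"},
    ],
    "X3": [
        {"name": "ToggleText", "src": "en", "tgt": "id", "server": "external"},
    ],
    "X4": [
        {"name": "Life Science Dictionary", "server": "Kyoto-U"},
        {"name": "Kyoto Tourism Dictionary", "server": "NICT"},
    ],
}


def c1(assignment):
    """C1: multi-hop translation, X2.OUT = X3.IN."""
    return assignment["X2"]["tgt"] == assignment["X3"]["src"]


def c2(assignment):
    """C2: serverLocation(X2) = serverLocation(X4)."""
    return assignment["X2"]["server"] == assignment["X4"]["server"]


CONSTRAINTS = [c1, c2]


def solve(domains, constraints):
    """Enumerate every combination and keep those satisfying all constraints."""
    variables = list(domains)
    solutions = []
    for combo in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, combo))
        if all(check(assignment) for check in constraints):
            solutions.append({v: s["name"] for v, s in assignment.items()})
    return solutions


if __name__ == "__main__":
    for solution in solve(DOMAINS, CONSTRAINTS):
        print(solution)
```

Enumeration is fine for domains this small; a larger service registry would call for backtracking with pruning instead.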

Page 11: Heart of Gold

Page 12: Heart of Gold

• Function-oriented middleware architecture for integrating deep and shallow natural language processing (NLP) components

[Figure: Heart of Gold architecture. Labels: Application; queries / results; XML-RPC / Java API; Heart of Gold Middleware; MoCoMan; Modules; External NLP Components; Computed annotation; External persistent annotation database.]

Page 13: Heart of Gold – Deep NLP

• Key feature of Heart of Gold, unavailable in Language Grid
• Tries to apply as much linguistic knowledge as possible
• Linguistic knowledge is declaratively encoded
  – Tom gave his son a toy: past(give(Tom, his son, toy))
• Syntactic variants: 'A toy was given by Tom to his son' or 'Tom gave his son the toy'

Page 14: Processing Flow in Heart of Gold

• 3 methods of processing flow for multiple NLP components
  – Varying depth of modules
  – Varying additional input & output annotation
  – Using SDL (System Description Language; Krieger, 2003)
• + (sequence)
  – one component starts after the previous component has finished, taking its output as its own input
• | (parallelism)
  – multiple components are executed in parallel in separate Java threads
• ∗ (unrestricted iteration)
  – a component is executed in a loop until its output remains unchanged
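The three SDL operators can be mimicked with ordinary function combinators. The sketch below is in Python rather than SDL or Java, and the names `sequence`, `parallel`, and `iterate` are made up for illustration. Two choices are assumptions rather than SDL behaviour: parallel results are returned as a plain list (the slides do not say how SDL merges them), and `max_rounds` is a safety guard that the "unrestricted" iteration operator does not have.

```python
from concurrent.futures import ThreadPoolExecutor


def sequence(*components):
    """'+' : run components one after another, feeding each output to the next."""
    def run(data):
        for component in components:
            data = component(data)
        return data
    return run


def parallel(*components):
    """'|' : run components on the same input in separate threads."""
    def run(data):
        with ThreadPoolExecutor(max_workers=len(components)) as pool:
            futures = [pool.submit(component, data) for component in components]
            # Results are collected in the order the components were given.
            return [future.result() for future in futures]
    return run


def iterate(component, max_rounds=100):
    """'*' : rerun a component until its output no longer changes."""
    def run(data):
        for _ in range(max_rounds):
            result = component(data)
            if result == data:
                return result
            data = result
        raise RuntimeError("no fixpoint reached within max_rounds")
    return run


if __name__ == "__main__":
    squeeze_spaces = iterate(lambda text: text.replace("  ", " "))
    pipeline = sequence(str.strip, squeeze_spaces, parallel(len, str.upper))
    print(pipeline("  deep   and shallow  "))
```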

Page 15: Processing Flow in Heart of Gold (2)

chunkiermrs = ( sprout_rmrs_morph + xslt_pos_filter + sprout_rmrs_lex + (* xslt_nodeid_cat + sprout_rmrs_phrase ) + xslt_fs2rmrsxml )

sprout_rmrs_morph = SproutModulesTextDom("rmrs-morph.cfg")
xslt_pos_filter = XsltModulesDomDom("posfilter.xsl", "aid", "Chunkie")
sprout_rmrs_lex = SproutModulesDomDom("rmrs-lex.cfg")
xslt_nodeid_cat = XsltModulesDomDom("nodeinfo.xsl", "aid", "Chunkie")
sprout_rmrs_phrase = SproutModulesDomDom("rmrs-phrase.cfg")
xslt_fs2rmrsxml = XsltModulesDomDom("fs2rmrsxml.xsl")

[Figure: SProUT-XSLT cascaded language components. An input sentence passes through SProUT rmrs_morph, XSLT pos_filter, SProUT rmrs_lex, XSLT nodeid_cat, SProUT rmrs_phrase, and XSLT fs2rmrsxml to produce the RMRS result.]

Page 16: Combination

Page 17: Combining Two Architectures

• Wrapping Heart of Gold as an atomic service in the language resource layer of Language Grid
  – Service input: language identifier, text to be analyzed, depth of analysis
  – Service output: XML string

[Figure: Language Grid and Heart of Gold combined. Site labels: Kyoto University; NICT Japan; DFKI Germany. Language Grid layers: Intercultural Collaboration Tools; Language Services (specialized translation, multi-hop translation, …); Language Resources (machine translations, morphological analyzers, dictionaries, …); P2P Grid Infrastructure. Heart of Gold appears among the language resources as a Wrapped Web Service. The Heart of Gold Middleware exchanges queries and results with External NLP Components 1 … n over XML-RPC.]
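The service signature described on this slide (language identifier, text, and depth in; XML string out) can be sketched as a thin facade. Everything here is hypothetical: `HeartOfGoldService`, the `backend` callable, and `fake_backend` are invented names, depth is assumed to be a non-negative integer, and the real wrapper would forward the request to Heart of Gold (for example over its XML-RPC interface) instead of building XML locally.

```python
from dataclasses import dataclass
from typing import Callable
import xml.etree.ElementTree as ET


@dataclass
class HeartOfGoldService:
    """Atomic-service facade: (language, text, depth) in, XML string out."""

    # Hypothetical transport; in practice this might be an XML-RPC call.
    backend: Callable[[str, str, int], str]

    def analyze(self, language: str, text: str, depth: int) -> str:
        if not text.strip():
            raise ValueError("text must not be empty")
        if depth < 0:
            raise ValueError("depth must be non-negative")
        xml_string = self.backend(language, text, depth)
        # Raises xml.etree.ElementTree.ParseError if the reply is not well-formed XML.
        ET.fromstring(xml_string)
        return xml_string


def fake_backend(language: str, text: str, depth: int) -> str:
    """Stand-in for the middleware so the sketch runs without a server."""
    root = ET.Element("analysis", {"lang": language, "depth": str(depth)})
    root.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    service = HeartOfGoldService(backend=fake_backend)
    print(service.analyze("en", "Tom gave his son a toy", 1))
```

Injecting the backend keeps the input/output contract testable independently of how the middleware is reached.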

Page 18: Combining Two Architectures (2)

• What about composite services?
  – Unable to run the composite service from the language resource layer
  – Workflow & processing flow are different
  – Should move to the upper layer: language service layer
• Solution
  – Use processing flow in Language Grid
  – Use workflow in Heart of Gold
  – Create pipelining service

Page 19: Combination of Two Flows (1)

[Figure: an en -> ja translation workflow in two panels, a) Before Combination (Language Grid) and b) After Combination (Language Grid + Heart of Gold). Both panels contain ChaSen, Science Dictionary Service, J-Server en -> ja Translation Service, and Term Replacement. Other labels: TreeTagger; Heart of Gold (SProUT); Tourism Dictionary Service; Processing flow.]

Example texts shown in the figure:
• I visited the temple of the golden pavilion at Kyoto
• I visited The Temple of the Golden Pavilion at Kyoto
• <FS type="ne-location"> the temple of the golden pavilion at Kyoto </FS> (labelled "Processing flow")
• The Temple of the Golden Pavilion = −
• The Temple of the Golden Pavilion = Kinkakuji
• Watashi ha kyoto de gooruden tenjikan no jiin wo houmonshita
• Watashi ha kyoto de Kinkakuji wo houmonshita
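One plausible reading of this figure is that a multiword term found in a community dictionary is masked before machine translation and restored afterwards, so the translator cannot render it word by word. The sketch below illustrates only that masking idea. The translator is a canned lookup table built from the sentences on the slide, not a real service; the `TERM0` placeholder scheme and all function names are assumptions; and how the Heart of Gold processing flow output feeds the dictionary lookup is not recoverable from the transcript, so it is left out.

```python
import re

# Dictionary entry taken from the slide; the data structure is an assumption.
TOURISM_DICTIONARY = {"the temple of the golden pavilion": "Kinkakuji"}

# Canned stand-in for an en -> ja machine translation service (not a real API).
CANNED_TRANSLATIONS = {
    "I visited the temple of the golden pavilion at Kyoto":
        "Watashi ha kyoto de gooruden tenjikan no jiin wo houmonshita",
    "I visited TERM0 at Kyoto":
        "Watashi ha kyoto de TERM0 wo houmonshita",
}


def stub_translate(sentence):
    """Look up a canned translation; raises KeyError for unknown input."""
    return CANNED_TRANSLATIONS[sentence]


def mask_terms(sentence, dictionary):
    """Swap each dictionary term for a placeholder code; return text and code table."""
    codes = {}
    for index, (term, target) in enumerate(dictionary.items()):
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(sentence):
            code = f"TERM{index}"
            sentence = pattern.sub(code, sentence)
            codes[code] = target
    return sentence, codes


def restore_terms(translated, codes):
    """Replace placeholder codes with the dictionary's target-language terms."""
    for code, target in codes.items():
        translated = translated.replace(code, target)
    return translated


def translate_with_dictionary(sentence, dictionary):
    """Mask known terms, translate the masked sentence, then restore the terms."""
    masked, codes = mask_terms(sentence, dictionary)
    return restore_terms(stub_translate(masked), codes)


if __name__ == "__main__":
    source = "I visited the temple of the golden pavilion at Kyoto"
    print(stub_translate(source))
    print(translate_with_dictionary(source, TOURISM_DICTIONARY))
```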

Page 20: Combination of Two Flows (2)

• Utilizing Service as a Software
  – Wrap a language service containing a workflow as a Heart of Gold component
  – Useful for NLP components with limited language support (e.g., ChunkieRMRS is only available for German & English)

[Figure: Japanese input processed by Heart of Gold components. Labels: input sentence in Japanese; XML Converter (two instances); Specialized ja-en translation service (workflow); ChunkieRMRS; Specialized en-ja translation service (workflow); merge; output RMRS in Japanese; Heart of Gold components.]

Page 21: Pipelining Support Service

Page 22: Supporting Service for Pipelining NLP

[Figure: Processing Flow & Workflow Integrator Service. Labels: Workflow Analyzer; Processing Flow Analyzer; SDL Writer; Extended Workflow Repository in Constraint Optimization; Language Service Information Repository (WSDL, QoS Profile); Language Component Information Repository (Class, Depth, Input-Output); Service Profile; Component Information; Set of Workflows; New Workflows + SDL.]

• A service to orchestrate a new workflow containing a processing flow (SDL)
  – by analyzing the current workflow and processing flow
  – useful for pipelining NLP
• Can run offline or online on user request
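As a rough illustration of what an SDL writer might do with the component information (class, depth, input-output), the sketch below checks that adjacent components have matching input and output types and then emits a sequential SDL expression. The `Component` record, the `write_sdl` function, the depth values, and the reading of class-name suffixes such as `TextDom` as input/output types are all assumptions, not the authors' design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str         # SDL variable name
    class_name: str   # module class, as in the repository's "Class" field
    depth: int        # analysis depth; the values used below are invented
    input_type: str   # assumed from the class-name suffix, e.g. Text or Dom
    output_type: str


def write_sdl(system_name, components):
    """Emit 'name = ( a + b + ... )' after checking that adjacent I/O types match."""
    if not components:
        raise ValueError("at least one component is required")
    for left, right in zip(components, components[1:]):
        if left.output_type != right.input_type:
            raise ValueError(
                f"{left.name} outputs {left.output_type}, "
                f"but {right.name} expects {right.input_type}"
            )
    return f"{system_name} = ( " + " + ".join(c.name for c in components) + " )"


if __name__ == "__main__":
    morph = Component("sprout_rmrs_morph", "SproutModulesTextDom", 10, "Text", "Dom")
    pos_filter = Component("xslt_pos_filter", "XsltModulesDomDom", 20, "Dom", "Dom")
    print(write_sdl("chunk", [morph, pos_filter]))
```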

Page 23: Conclusion

Page 24: Conclusion

• Composite language services & language components can be integrated
  – by utilizing their processing flow & workflow
• Additional pipelining support service to modify the existing workflow
• Language service is a good way to combine human and machine language processing
• Flexibility for high-speed pipelines: BPEL, Script, etc.
• Possible intra-server workflow from the integration

Contribution

Lesson Learned

Page 25: Q & A

Thank you for listening