LREC 2010, 17-23 May, Malta
Towards an Integrated Architecture for Composite Language Services and Multiple Linguistic Processing Components
Arif Bramantoro and Toru Ishida, Department of Social Informatics, Kyoto University, Japan
Ulrich Schäfer, Language Technology Lab, DFKI, Germany
Presentation Outline
• Introduction
• Language Grid
– Workflow in Language Grid
• Heart of Gold
– Processing Flow in Heart of Gold
• Combination
• Pipelining Support Service
• Conclusion
Introduction
• There are many natural language processing (NLP) architectures
• Each NLP architecture has its own characteristics
– Language Grid (NICT, Japan): Service-oriented architecture
– Heart of Gold (DFKI, Germany): Functional-oriented architecture
• Goal: increase the number of language services in Language Grid
– Why not integrate NLP architectures instead of integrating individual NLP tools?
Introduction (2) – A Motivation for Combination
• A challenging issue: each architecture has its own way of handling multiple processing steps
– Language Grid: Workflow for composite services
– Heart of Gold: Processing flow for multiple linguistic processing components
• Access management functionality is available only in Language Grid
Language Grid
Language Grid (2)
• A new service-oriented multilingual infrastructure on the Internet to support intercultural activities
• Language resources with complicated intellectual property can be wrapped and shared
• Linked Service Grids:
– Language Grid in Japan
– Language Grid in Thailand
– Agricultural Service Grid
– Education Service Grid
– etc.
Language Grid
[Figure: Language Grid architecture. Labels: Application System; Service Grid Server Software; Service Invoker; Service Manager; Grid Composer; Other Service Grid; BPEL Composite Service Engine; Script Composite Service Engine; Java Composite Service Engine; Java Atomic Service Engine; Service Resources; Native Program; Network Program; Resource Database]
Workflow in Language Grid
• Sample methods of workflow for composite services
– Business Process Execution Language (BPEL)
– Script
– Java, etc.
• Additional technique for composite services: Constraint Satisfaction
• X = {X1,…,Xn} is a set of abstract web services
• D = {D1,…,Dn} is the set of domains
– Di = {si1,...,sik}, where sij is a concrete web service of the corresponding Xi
• C = {C1,…,Cp} is a set of constraints
Workflow in Language Grid (2)
• X = {X1, X2, X3, X4, X5}
– X1: Morphological analyzer service
– X2: ja-en translation service
– X3: en-id translation service
– X4: Community dictionary service
– X5: Term replacement service
• D = {D1, D2, D3, D4, D5}
– D1: {mecab at NTT, ICTCLAS, KLT at Kookmin University, TreeTagger at IMS Stuttgart}
– D2: {JServer at NICT, WEB-Transer at Kyoto-U, Google Translation, Translution}
– D3: {ToggleText}
– D4: {Life Science Dictionary, Natural Disasters Dictionary, Kyoto Tourism Dictionary}
– D5: {TermRepl service}
• C = {C1, C2, C3}
– C1: For multi-hop translation, X2.OUT = X3.IN
– C2: For specialized translation service with dictionary, serverLocation(X2) = serverLocation(X4)
– C3: For morphological analysis, partialAnalyzedResult(X1.OUT) ∈ X2.IN
[Figure: workflow composed of the Japanese Morphological Analysis Service, Community Dictionary Service, ja->en Translation Service, Term Replacement Service, and en->id Translation Service]
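The constraint-satisfaction formulation (X, D, C) can be illustrated with a brute-force search over the domains. This is only a minimal sketch: the server labels ("server-A", "server-B"), the language tags, and the reduced domains are invented for illustration and do not describe the real services, C1 is read here as a language match between X2 and X3, and C3 is left out.

```python
from itertools import product

# Hypothetical metadata: server labels and language tags are assumptions.
SERVICES = {
    "mecab": {"server": "server-A"},
    "JServer": {"server": "server-A", "src": "ja", "tgt": "en"},
    "WEB-Transer": {"server": "server-B", "src": "ja", "tgt": "en"},
    "ToggleText": {"server": "server-C", "src": "en", "tgt": "id"},
    "Life Science Dictionary": {"server": "server-A"},
    "Kyoto Tourism Dictionary": {"server": "server-B"},
    "TermRepl service": {"server": "server-B"},
}

# D: candidate concrete services for each abstract service (subset of the slide).
DOMAINS = {
    "X1": ["mecab"],
    "X2": ["JServer", "WEB-Transer"],
    "X3": ["ToggleText"],
    "X4": ["Life Science Dictionary", "Kyoto Tourism Dictionary"],
    "X5": ["TermRepl service"],
}

def c1(a):
    # Multi-hop translation: what X2 produces must be what X3 accepts.
    return SERVICES[a["X2"]]["tgt"] == SERVICES[a["X3"]]["src"]

def c2(a):
    # Specialized translation: translator and dictionary share a server.
    return SERVICES[a["X2"]]["server"] == SERVICES[a["X4"]]["server"]

def solve(domains, constraints):
    """Yield every assignment of concrete services satisfying all constraints."""
    names = list(domains)
    for combo in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, combo))
        if all(check(assignment) for check in constraints):
            yield assignment

solutions = list(solve(DOMAINS, [c1, c2]))
for s in solutions:
    print(s["X2"], "+", s["X4"])
```

Exhaustive enumeration is fine for domains this small; a larger service registry would likely need a proper constraint solver with pruning.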
Heart of Gold
• Functional oriented middleware architecture for integrating deep and shallow Natural Language Processing (NLP) components
[Figure: Heart of Gold architecture. Labels: Application; Heart of Gold Middleware; queries, results; External NLP Components; XML-RPC / Java API; MoCoMan; Computed annotation; Modules; External persistent annotation database]
Heart of Gold – Deep NLP
• Key feature of Heart of Gold, unavailable in Language Grid
• Tries to apply as much linguistic knowledge as possible
• Linguistic knowledge is declaratively encoded
– Tom gave his son a toy: past(give(Tom, his son, toy))
• Syntactic variants: ‘A toy was given by Tom to his son’ or ‘Tom gave his son the toy’
Processing Flow in Heart of Gold
• 3 methods of processing flow for multiple NLP components
– Varying depth of modules
– Varying additional input & output annotation
– Using SDL (System Description Language; Krieger, 2003)
• + (sequence)
– one component starts after the previous component has finished, taking its output as its own input
• | (parallelism)
– multiple components are executed in parallel in separate Java threads
• ∗ (unrestricted iteration)
– a component is executed in a loop until its output remains unchanged
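The three operators can be mimicked with plain function combinators. The sketch below is not SDL and not the Heart of Gold implementation: `seq`, `par`, and `fix` are hypothetical names, returning a list from the parallel operator is an assumption (the slides do not say how parallel outputs are combined), and the `max_rounds` cap is a safety limit added here.

```python
from concurrent.futures import ThreadPoolExecutor

def seq(*components):
    """'+': each component starts after the previous one, taking its output."""
    def run(data):
        for component in components:
            data = component(data)
        return data
    return run

def par(*components):
    """'|': run all components on the same input in separate threads."""
    def run(data):
        with ThreadPoolExecutor(max_workers=len(components)) as pool:
            futures = [pool.submit(component, data) for component in components]
            return [future.result() for future in futures]
    return run

def fix(component, max_rounds=100):
    """'*': repeat a component until its output no longer changes."""
    def run(data):
        for _ in range(max_rounds):
            result = component(data)
            if result == data:
                return result
            data = result
        raise RuntimeError("output did not stabilise")
    return run

# Toy components standing in for NLP modules.
def collapse_spaces(text):
    return text.replace("  ", " ")

pipeline = seq(str.strip, fix(collapse_spaces), par(str.upper, len))
result = pipeline("  deep    and  shallow ")
print(result)  # ['DEEP AND SHALLOW', 16]
```

The fixpoint loop shows why iteration is useful for cascaded grammars: the same step is reapplied until nothing new is produced.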
Processing Flow in Heart of Gold (2)
chunkiermrs = ( sprout_rmrs_morph + xslt_pos_filter + sprout_rmrs_lex + (* xslt_nodeid_cat + sprout_rmrs_phrase ) + xslt_fs2rmrsxml)
sprout_rmrs_morph = SproutModulesTextDom("rmrs-morph.cfg")
xslt_pos_filter = XsltModulesDomDom("posfilter.xsl", "aid", "Chunkie")
sprout_rmrs_lex = SproutModulesDomDom("rmrs-lex.cfg")
xslt_nodeid_cat = XsltModulesDomDom("nodeinfo.xsl", "aid", "Chunkie")
sprout_rmrs_phrase = SproutModulesDomDom("rmrs-phrase.cfg")
xslt_fs2rmrsxml = XsltModulesDomDom("fs2rmrsxml.xsl")
[Figure: SProUT-XSLT cascaded language components. Labels: input sentence; SProUT rmrs_morph; XSLT pos_filter; SProUT rmrs_lex; XSLT nodeid_cat; SProUT rmrs_phrase; XSLT fs2rmrsxml; RMRS result]
Combination
Combining Two Architectures
• Wrapping Heart of Gold as an atomic service in the language resource layer of Language Grid
– Service Input: language identifier, text to be analyzed, depth of analysis
– Service Output: XML string
[Figure: Heart of Gold wrapped as a web service in Language Grid. Labels: Intercultural Collaboration Tools; Language Services (specialized translation, multi-hop translation, …); Language Resources (machine translations, morphological analyzers, dictionaries, …); P2P Grid Infrastructure; Heart of Gold Wrapped Web Service; Heart of Gold Middleware; queries, results; XML-RPC; External NLP Component 1 … External NLP Component n; sites: Kyoto University, DFKI Germany, NICT Japan]
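The atomic service interface described above (language identifier, text, and depth in; XML string out) can be sketched as a thin wrapper around a backend call. Everything here is illustrative: `make_atomic_service` and `fake_backend` are hypothetical names, the backend is a stand-in because the slides do not show the actual Heart of Gold XML-RPC methods, and the default supported languages are an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# A backend takes (language, text, depth) and returns an XML string.
Backend = Callable[[str, str, int], str]

def make_atomic_service(backend: Backend, supported=("de", "en")):
    """Wrap a backend so it validates its input and returns well-formed XML."""
    def analyze(language: str, text: str, depth: int) -> str:
        if language not in supported:
            raise ValueError(f"unsupported language: {language}")
        if not text.strip():
            raise ValueError("text must not be empty")
        xml = backend(language, text, depth)
        ET.fromstring(xml)  # raises ParseError if the result is not well-formed
        return xml
    return analyze

def fake_backend(language: str, text: str, depth: int) -> str:
    # Stand-in for the real middleware call; it only echoes its input as XML.
    root = ET.Element("analysis", lang=language, depth=str(depth))
    root.text = text
    return ET.tostring(root, encoding="unicode")

service = make_atomic_service(fake_backend)
xml_out = service("en", "Tom gave his son a toy", 1)
print(xml_out)
```

Injecting the backend keeps the wrapper testable without a running middleware instance; a real deployment would replace `fake_backend` with the actual remote call.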
Combining Two Architectures (2)
• What about composite services?
– Unable to run the composite service from the language resource layer
– Workflow & processing flow are different
– Should move to an upper layer: the language service layer
• Solution
– Use processing flow in Language Grid
– Use workflow in Heart of Gold
– Create pipelining service
Combination of Two Flows (1)
[Figure: en -> ja translation workflow, a) Before Combination (Language Grid) and b) After Combination (Language Grid + Heart of Gold). Service labels: TreeTagger; Heart of Gold (SProUT); ChaSen; Science Dictionary Service; Tourism Dictionary Service; J-Server en -> ja Translation Service; Term Replacement]
Example shown in the figure:
– Input: I visited the temple of the golden pavilion at Kyoto
– Dictionary lookup without a match (The Temple of the Golden Pavilion = −): Watashi ha kyoto de gooruden tenjikan no jiin wo houmonshita
– Processing flow: <FS type="ne-location"> the temple of the golden pavilion at Kyoto </FS>, giving: I visited The Temple of the Golden Pavilion at Kyoto
– Dictionary lookup with a match (The Temple of the Golden Pavilion = Kinkakuji): Watashi ha kyoto de Kinkakuji wo houmonshita
Combination of Two Flows (2)
• Utilizing Service as a Software
– Wrap a language service containing a workflow as a Heart of Gold component
– Useful for NLP components with limited language support (e.g., ChunkieRMRS is only available for German & English)
[Figure: language services wrapped as Heart of Gold components. Labels: input sentence in Japanese; Specialized ja-en translation service (workflow); XML Converter; ChunkieRMRS; XML Converter; Specialized en-ja translation service (workflow); output RMRS merge in Japanese]
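The idea of wrapping translation workflows around a component with limited language support can be sketched as follows. This is a toy under stated assumptions: the dictionaries stand in for the specialized translation services, `toy_analyze` stands in for an English-only analyzer, `wrap_for_unsupported_language` is a hypothetical name, and neither the XML conversion nor the RMRS merge step is modelled.

```python
def wrap_for_unsupported_language(translate_in, analyze, translate_out):
    """Build a component that analyzes text in a language the analyzer lacks.

    translate_in:  source-language sentence -> pivot-language sentence
    analyze:       pivot-language sentence  -> list of (token, tag) pairs
    translate_out: pivot-language token     -> source-language token
    """
    def component(sentence):
        pivot = translate_in(sentence)
        analysis = analyze(pivot)
        return [(translate_out(token), tag) for token, tag in analysis]
    return component

# Toy stand-ins; the vocabulary and tags are invented for illustration.
JA_EN = {"inu ga hashiru": "dog runs"}
EN_JA = {"dog": "inu", "runs": "hashiru"}

def toy_analyze(sentence):
    # Pretend English-only analyzer: tags "dog" as a noun, anything else as a verb.
    return [(tok, "N" if tok == "dog" else "V") for tok in sentence.split()]

japanese_component = wrap_for_unsupported_language(
    JA_EN.__getitem__, toy_analyze, EN_JA.__getitem__
)
analysis = japanese_component("inu ga hashiru")
print(analysis)  # [('inu', 'N'), ('hashiru', 'V')]
```

The analysis quality in such a setup depends on the translation steps, which is one reason the slide places specialized translation services on both sides.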
Pipelining Support Service
Supporting Service for Pipelining NLP
[Figure: Processing Flow & Workflow Integrator Service. Labels: Extended Workflow Repository in Constraint Optimization; Language Service Information Repository (WSDL, QoS Profile); Language Component Information Repository (Class, Depth, Input-Output); Workflow Analyzer; Processing Flow Analyzer; SDL Writer; Component Information; Service Profile; Set of Workflows; New Workflows + SDL]
• A service to orchestrate a new workflow containing processing flow (SDL)
– by analyzing current workflow and processing flow
– useful for pipelining NLP
• Can be offline or online with user request
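One piece of such a service, the SDL writer, might turn component information into an SDL description. The sketch below is a guess at that step, not the authors' design: `ComponentInfo` and `write_sdl` are hypothetical names, the input and output types are assumptions loosely based on module class names such as `SproutModulesTextDom`, and only the sequence operator is emitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComponentInfo:
    name: str          # identifier used in the SDL expression
    module_class: str  # module class named in the definition line
    config: str        # configuration file passed to the module
    input_type: str    # assumed type label, e.g. "Text" or "Dom"
    output_type: str

def write_sdl(flow_name, components):
    """Emit a sequential SDL description after checking type compatibility."""
    for prev, nxt in zip(components, components[1:]):
        if prev.output_type != nxt.input_type:
            raise ValueError(f"{prev.name} -> {nxt.name}: type mismatch")
    chain = " + ".join(c.name for c in components)
    lines = [f"{flow_name} = ( {chain} )"]
    lines += [f'{c.name} = {c.module_class}("{c.config}")' for c in components]
    return "\n".join(lines)

components = [
    ComponentInfo("sprout_rmrs_morph", "SproutModulesTextDom", "rmrs-morph.cfg", "Text", "Dom"),
    ComponentInfo("sprout_rmrs_lex", "SproutModulesDomDom", "rmrs-lex.cfg", "Dom", "Dom"),
]
sdl_text = write_sdl("demo_flow", components)
print(sdl_text)
```

Checking input and output types before writing the description catches incompatible pipelines early, which matters if the service is meant to run online on user request.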
Conclusion
• Composite language services & language components can be integrated
– by utilizing their processing flow & workflow
• Additional pipelining support service to modify the existing workflow
• Language service is a good way to combine human and machine language processing
• Flexibility for high-speed pipeline: BPEL, Script, etc.
• Possible intra-server workflow from the integration
Contribution
Lesson Learned
Q & A
Thank you for listening