multilingual cataloguing of product information of specific domains: case mkbeem system aarno...
Post on 30-Dec-2015
216 Views
Preview:
TRANSCRIPT
Multilingual Cataloguing of Product Information of Specific Domains: Case Mkbeem System
Aarno Lehtola, Jarno Tenni and Tuula Käpylä
VTT Information Technology
Contents: MotivationMkbeem in a nutshellMultilingual Cataloguing ToolMeaning extractionExperiences of test usersFutureDEMO
VTT TIETOTEKNIIKKA
Online Language Challenges for eCommerce
Ref: Global Reachhttp://www.glreach.com/
(end of year) 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005upper
limit (M)
English 40.0 45.0 72.0 148 192 231 245 270 295 320 550Total Non-English:
10.0 16.0 45.3 109 211 307 663 540 680 820 5850
TOTAL: 50.0 70.0 117.0 245 391 522 435 792 956 1120 6400
Native English speakers comprise lessthan 9 % of the world population.
"If I'm selling to you, I speak your language. If I'm buying, dann müssen Sie Deutsch sprechen". (Willy Brandt)
VTT TIETOTEKNIIKKA
An Answer: MKBEEM and Multilingual eCommerce Mediation
CustomerCP/SP User
MKBEEMMediation System
CP/SP eCom
Service
MonolingualCP/SP
• EC FP 5 IST/HLT project in 2000-2002, budget 4,9 M€
• Goal: Develop intelligent knowledge-based key components (HLP & KRR) for applications in multilingual eCommerce
• Language adaptation via automatic HL translation and interpretation
• Natural dialogues combining HL & navigation
• Harmonised ontologies enabling localised views to products and trading contracts
Customerlanguage information retrieval &trading
Multilingualcataloguing:write once, publish many
Transactions with contract adaptation
• Generic solutions proved by trials in Finnish, French and English in the domains of travel and mail-order sales
• More information: www.mkbeem.com
VTT TIETOTEKNIIKKA
Generic Architecture of Mkbeem
Content/Ser vice Provider
CP Agent
Rational Agent
DomainOntology
Server
User Interface
Customer
CP E-Commerce platform
CP Interface
Content/Ser vice Provider
CP Agent
CP Agent
UserAgent
CP Information
System
Manager Interface
MKBEEM System
Manager
TradingOntology
Server
Manager Agent
HumanLanguage Processing
Server
VTT TIETOTEKNIIKKA
Colour OntologyColour Ontology
14 products found:1. Beige winterjacket of
wool2. Ochre quilted jacket of
cotton...
Any further requirements?
14 products found:1. Beige winterjacket of
wool2. Ochre quilted jacket of
cotton...
Any further requirements?
Extracting Product PropertiesExtracting Product Properties
"Toppatakki. Muhkea malli, olkapäissä vahvikkeet.
Painonapeilla kiinnitetty huppu, jossa joustava nyöri. Vetoketjun alla suojalista. Kaksi kannellista taskua...
"Toppatakki. Muhkea malli, olkapäissä vahvikkeet.
Painonapeilla kiinnitetty huppu, jossa joustava nyöri. Vetoketjun alla suojalista. Kaksi kannellista taskua...
Meaning extraction
Machine translation
Dialogue processing
...
Meaning extraction
Machine translation
Dialogue processing
...
User Information Request Proc.
User Information Request Proc.
A brown jacket made of natural material
A brown jacket made of natural material
One with a hoodOne with a hood
"Toppatakki. Muhkea malli..." "Quilted jacket. Puffy model with reinforcements on the shoulder..."
jacket(X,quilted_jacket), model(X,puffy), part(X,Y,sleeves), property(Y,Z,reinforcement)...
"Toppatakki. Muhkea malli..." "Quilted jacket. Puffy model with reinforcements on the shoulder..."
jacket(X,quilted_jacket), model(X,puffy), part(X,Y,sleeves), property(Y,Z,reinforcement)...
Product ModelProduct Model
Multilingual Product Data
Material OntologyMaterial Ontology
Mkbeem: Bridging Languages via Language Neutral Ontologies
Ontological Formula in CARIN:(c_colour)(X),
(r_name)(X,brown),(c_product)(Y),
(r_name)(Y,jacket),(c_material)(Z),
(r_name)(Z,nat_mat).
Ontological Formula in CARIN:(c_colour)(X),
(r_name)(X,brown),(c_product)(Y),
(r_name)(Y,jacket),(c_material)(Z),
(r_name)(Z,nat_mat).
VTT TIETOTEKNIIKKA
Mkbeem: Multilingual Cataloguing Tool
• Starting point:
• The new product belongs to the supported product domains
• Available a textual product description in one of the supported languages and a photograph
• Basic functionalities:
• Text checking
• Property extraction
• Product Categorisation
• Machine Translation
• NL Query Processing
• Technical key challenge:
• Formalising relationship of ontologies and HL and
• Extracting meaning of input HL texts with respect to provided ontologies into the form of Ontological Formulas
VTT TIETOTEKNIIKKA
Meaning Extraction: Example in Clothing Domain
Long skirt with cargo pockets
Jupe longue avec des poches battle-dress
Pitkä hame, jossa reisitaskut
(c_MKBEEM:81007:clothingProduct)(H6641),
(r_name)(H6641,H6989),
(c_MKBEEM:83383:property)(H6552),
(r_name)(H6552,H6889),
(c_MKBEEM:81011:part)(H6730),
(r_name)(H6730,H7295),
(l_dependency)(H6989,adjAttr,H6889),
(l_dependency)(H6989,prepAttr,H7295),
(l_constituent)(H6889,0,long,[en,long,adj,nom,sg,property]),
(l_constituent)(H6989,1,skirt,[en,skirt,noun,nom,sg,product]),
(l_constituent)(H7295,4,cargo#pockets,
[en,cargo#pocket,noun,nom,pl,prodpart])
Concept Bindings
LinguisticDependencies &Lexical info
VTT TIETOTEKNIIKKA
Linguistic
Services
Linguistic Ontology
Webtran MT System
VTT’s implementation of HLP Services in Mkbeem
extractMeaning
HL
string
On
tolo
gic
Fo
rmu
laALEs for MT
(965 btwFinnish, French
English)
Cone: Onto S/W Inference
DomainLexica
(Finnish/~4500French/~1700English/~1500Fi->Fr/~1300Fi->En/~1500)
Meaning Extractor
Verification
Unifier: Text Correction S/W
Webtran Dependency Parser
checkText
HL
str ing
OK
or
corr
ecti
onH
L str in
g
Tra
nsl
ated
str
ing
translateText
Product ModelColour Ontology
Material OntologyAltogether:
307 concepts1050 attributes
150 ALEs embedded
Concept Bindings
Functions:
KBs:
VTT TIETOTEKNIIKKA
Augmented Lexical Entries
• Augmented Lexical Entries (ALE) rules (see MT Summit 99): • Bilingual or multilingual non-directed entries representing phrase and
sentence structures and possibly their translation relations.• Both surface form entries and generalised rules• Possible to declare multidirectional entries• Declarative and intuitive formalism - to be used by translators• Uniform way of representing phenomena on different levels of language • Designed to be suitable for automated or machine supported language
modelling (see SMC 99 paper on learning translation grammars)• Can be viewed as a forest of partial dependency parse trees• Near relationship obtainable to the corresponding conceptual structures
(concept bindings to ontologies)
• Lexicon • All the allowed words• Monolingual and bilingual entries
VTT TIETOTEKNIIKKA
Meaning Extraction: A Product Ontology with ALEs Embedded
VTT TIETOTEKNIIKKA
Syntax of ALEs
augmented_lexical_entry ::= [ entry_name pattern.. opt_message opt_repair ]entry_name ::= name . number_indexname ::= hierarchical_name_w_dots_betw_partspattern ::= [ opt_language_id constituent_def.. ] opt_message ::= | [ message string_w_opt_binding ]opt_repair ::= | [ repair string_w_opt_binding ]constituent_def ::= constituent_def*constituent_def ::= constituent_def.. constituent_def ::= < constituent_def.. > constituent_def ::= opt_regent_mark opt_lexeme opt_binding opt_feature_constraint opt_language_id ::= | ISO_std_lang_identifier | ~ ISO_std_lang_identifierISO_std_lang_identifier ::= ee | en | fi | fr | se | opt_regent_mark ::= | ^ opt_lexeme ::= | lexeme | tag | nameopt_binding ::= | binding opt_feature_constraint ::= | { feature.. } binding ::= ( variable_name ) | (^) feature ::= feature_value | property_type binding
VTT TIETOTEKNIIKKA
Examples of ALEs - 1/3
• Basic word correspondence definition:
• Specific idiom correspondence:
• Generalised ALE, e.g. "shirt of 100% cotton”
[footwear.word.27[se ^allväderskänga][fi ^jokasäänkenkä]
[en all weather ^shoe]]
[price.tax.4[se inkl. ^moms tag_price(X)][fi sis. ^alv tag_price(X)] [en incl. ^VAT tag_price(X)]]
[cloth.material.composition
[fi ^(A){clothProd} tag_percentage(X)
(B){textileMaterial ptv}]
[fr ^(A){clothProd} en tag_percentage(X)
(B){textileMaterial}]
[en ^(A){clothProd} of tag_percentage(X)
(B){textileMaterial}]
[se ^(A){clothProd} av tag_percentage(X)
(B){textileMaterial}]
VTT TIETOTEKNIIKKA
Examples of ALEs - 2/3
• Semantical and grammatical restrictions,
e.g. agreement in “miellyttävä pusero”
or “miellyttävää puseroa”
(“comfortable blouse”)
• An iterative phrase, obs! tree flattening:
[cloth.property.2
[se property.expr{clothProp}
^(B){cloth}]
[fi property.expr{clothProp}
^(B){cloth}]
[en property.expr{clothProp}
^(B){cloth)] ]
[cloth.property.1
[se (A){adj clothProp gender(B) number(B)}
^(B){noun clothProd}]
[fi (A){adj clothProp case(B) number(B)}
^(B){noun clothProd}]
[en (A){adj clothProp}
^(B){noun clothProd}] ]
[property.expr.1
[se (A){adj prop gender(^) number(^)} ]
[fi (A){adj prop number(^) case(^)} ]
[en (A){adj prop} ] ]
[property.expr.2
[property.expr.2 tag_comma property.expr.3]]
[property.expr.3
[property.expr.1 {conjAND} property.expr.1]]
VTT TIETOTEKNIIKKA
• Negative Instances - Correction ALEs
[correct.ellos.3[~se kardborrstängning(A)][~se kardborreförslutning(A)][~se kardborrknäppning(A)][~se kardborreknäppning(A)][message Use the correct synonym
“kardborrestängning” instead of word(A)]
[repair kardborrestängning(A)]
Examples of ALEs - 3/3
VTT TIETOTEKNIIKKA
Inference of CARIN formulas Syntactico-semantic analysis
LexicalAnalysis
Set of tokens
DependenceAnalysis
Set of syntacticdependence trees
Set of approved lexical-semantic graphs with
concepts identified
Subset of refined semanticgraphs
Refining semanticsin particular themes,e.g. colors, materials,
distances
Set of CARIN formulas
Syntactictranslationinto CARIN
Input phrase
ConceptMatching
& Verification
Meaning Extraction Process
VTT TIETOTEKNIIKKA
Inference of CARIN formulas
Syntactictranslationinto CARIN
Syntactico-semantic analysis
Set of CARIN formulasInput phrase
LexicalAnalysis
Set of tokens
DependenceAnalysis
Set of syntacticdependence trees
Subset of refined semanticgraphs
Refining semanticsin particular themes,
e.g. colours, materials,distances
ConceptMatching
& Verification
musta hame, jossa halkio ja taskutune jupe noire avec fente et pochesa black skirt with split and pockets
(c_MKBEEM:81098:colour)(H1017),(r_name)(H1017,H641),(c_MKBEEM:84731:clothingProduct)(H984),(r_name)(H984,H684),(c_MKBEEM:81011:part)(H951),(r_name)(H951,H813),(c_MKBEEM:81011:part)(H918),(r_name)(H918,H899),(l_dependency)(H684,adjAttr,H641),(l_dependency)(H684,prepAttr,H813),(l_dependency)(H684,prepAttr,H899),(l_constituent)(H641,0,musta, [fi,colour,musta,adj,nom,sg]),(l_constituent)(H684,1,hame, [fi,product,hame,noun,nom,sg]),(l_constituent)(H727,2,tag_comma, [fi]),(l_constituent)(H770,3,jossa, [fi,jossa,pron,ine,sg]),(l_constituent)(H813,4,halkio, [fi,prodpart,halkio,noun,nom,sg]),(l_constituent)(H856,5,ja,[fi,conj,ja,coord_c]),(l_constituent)(H899,6,taskut, [fi,prodpart,taskut,noun,nom,pl])
Set of approved lexical-semantic graphs with
concepts identified
(c_product)(H1606), (r_name)(H1606,skirt), (c_colour)(H1573), (r_name)(H1573,black), (c_part)(H1540), (r_name)(H1540,split), (c_part)(H1506), (r_name)(H1506,pocket)
Meaning Extraction Process Example
VTT TIETOTEKNIIKKA
Cataloguing Tool Testing by End-Users
• Goals:• Proof of concept (“Swiss army knife of a cataloguer”)• Usability in real working environment
• Ellos' test group consisted of 8 persons (translators, cataloguers and call-centre workers):
• familiar with Internet: 5 yes, 1 almost yes, 2 yes at home• languages used: 8 Finnish, 6 English, 4 Swedish, 1 French• familiar w. catalogue maintenance: 6 yes, 2 no
• Schedule:• Short training and preliminary interviews on August 30, 2002• Interviews of experiences and summary of the results ready by
October 14, 2002
VTT TIETOTEKNIIKKA
... Trial experiences of the Ellos test group
• Cataloguing tool considered to be useful:
• cataloguing process as a whole was seen as an easy and efficient way of producing and classifying product information
• each of the main features was considered good
• very important:semi-automatic translation into target languages
• property extraction and inference with colours and materials seen as important in bringing value-adding services to customers
• helps in producing consistent and uniform information
• can make the working process faster and reduce the amount of manual, repeated routine procedures
• KB management tools considered suitable to their task
• Reported difficulty:
• occasionally long response times => boring of the user + e.g. repeating queries
• e.g. "hourglass" or provision of partial results could bring quick help
• will be eventually solved by continued product development
VTT TIETOTEKNIIKKA
MT Part (Webtran) in Production Use at Ellos since 2000
Language technology solutions are necessary to embed into business processes and IT infrastructure
Catalogueauthor
SourceDB
MacQuarkXPress
Cataloguer
MacQuarkXPress
LocalisedDB
WebtranMachine Translation
Software
AutomaticSw -> Fi
Translation
LanguageModeller PC Server
Swedish Finnish
Ellos Sweden Ellos Finland
About 2000 translated catalogue pages and 10000-15000 product descriptions per year
Benchmark by CSC Inc. reports over 30% time savings after one year of use
VTT TIETOTEKNIIKKA
• Marginal cost of adding a new domain or a new language is reasonable with respect to the added-value gained
• Based on experiences from modelling vacation cottage domain to the system (fi,fr,en) we have estimated that introducing a comparable new domain would require:
• semantic-lexicon: 2 man-months
• translation and meaning extraction rules: 1 man-month
• product models: 2-4 man-weeks
• We also estimate that adding a language to a pre-existing domain would need:
• semantic-lexicon: 1-2 man-month
• translation and meaning extraction rules: 2-4 man-week
• product models: 1 man-week
Work Needed for Adding Domain and Languages
VTT TIETOTEKNIIKKA
Future Development Recommendations
• Further development of could focus on the following issues:
• information request processing dialogues:
• question answering capabilities (e.g. qualitative questions about the goods selection)
• proper way of handling null queries (e.g. graceful relaxation of the search constraints based on the ontology models and the actual goods selection)
• new languages to the system: Russian, Norwegian, Estonian, German ...
• user-friendlier ways for the acquisition and maintenance of language models and product models (knowledge acquisition bottleneck): machine learning
• special requirements of mobile terminals (e.g. automatic text abstraction)
VTT TIETOTEKNIIKKA
DEMO
top related