
Page 1:

N. Calzolari [FLaReNet], NEERI Workshop, Helsinki, September 2009

eContentplus

Standards: strength and limitations … LMF

Nicoletta Calzolari
glottolo@ilc.cnr.it

FLaReNet: Fostering Language Resources Network
http://www.flarenet.eu

Page 2:

Historical notes

After the “Grosseto Workshop” (1985): a turning point

In Europe the so-called X-LEX projects: ACQUILEX, MULTILEX, GENELEX

and other lexical and text annotation/representation projects: NERC, ET-7, ET-10, DELIS

that saw the participation of many EU groups, linked by sharing similar approaches and visions

EAGLES, ISLE

Start: Zampolli breakfast meeting

EAGLES acronym … by Cencioni

Page 3:

EAGLES was launched in ‘93

Key issues: Do conditions exist for standardisation effort?

Reusability as key concept (true also today):
  To avoid duplication of efforts, costs, etc.
  To allow synergies, integration, exchange of data, ...
  To provide a model for new data creation & acquisition

Decide on “feasible” areas & state priorities (this is changing over time)

The feasibility of formulating consensual standards as a strong sign of maturity in the field: we can’t propose standards if there are not enough results on which to base them

Page 4:

Some standard-related projects & initiatives

Defining standards/best practice:
  TEI: creating standards for text annotation
  NERC: creating the basis for bottom-up empirical harmonisation, based on extensive best-practice analysis
  EAGLES: introducing a methodological model for standard work
  ISLE: extending in topics & communities
  LIRICS: preparing for international standards
  ISO/TC 37/SC 4/WG 4: going to international standards: LMF … & many others
  NEDO: porting to Asian languages
  MultilingualWeb: new Thematic Network for relation with W3C

Page 5:

Some standard-related projects & initiatives (cont.)

Using standards/best practice:

MULTEXT & MULTEXT-EAST: applying to lexicons & text annotation, with EAGLES-compliant specs

PAROLE-SIMPLE lexicons: morphology, syntax & semantics: operational specs & constraints between lexical descriptors (12 languages)

EuroWordNets: a de-facto best-practice

BOOTStrep: terminologies in Bio-domain: BioLexicon

KYOTO: in the environment domain

PANACEA: in a platform for LR acquisition

Page 6:

Some standard-related projects & initiatives (cont.)

Promoting standards/best practice:

INTERA: for a EU repository of language data

ENABLER: to link EU & national initiatives

ELRA: the EU LR association

LanguageGrid: Japanese infrastructure for LR services

CLARIN: LR standards for the Humanities & Social Sciences

FLaReNet: LR standards for Human Language Technologies

T4ME NoE: for an Open Resource Infrastructure

Page 7:

Main Results in Lexicon & Corpus WGs

First Phase (www.ilc.pi.cnr.it/EAGLES96/home.html)

Standard for morphosyntactic encoding of lexical entries, in a multi-layered structure, with applications for all the EU languages

Standard for subcategorisation in the lexicon: a set of standardised basic notions using a frame-based structure

Proposal for a basic set of notions in lexical semantics: focus on requirements of Information Systems and MT

Corpus Encoding Standard (CES) from TEI

Standard for morphosyntactic annotation of corpora, to ensure compatibility/interchangeability of concrete annotation schemata

Preliminary recommendations for syntactic annotation of corpora

Dialogue annotation, for integration of written and spoken annotation

Page 8:

Content vs. Format/Representation

Work on lexical description deals with two aspects:
  Linguistic description of lexical items (content)
  Formal representation of lexical descriptions (format)

EAGLES concentrated on linguistic content, not disregarding the formal representation of the proposal

TEI more on format/representation issues

In LMF: on the abstract meta-model

Page 9:

Flexibility in the Recommendations, e.g. Morphosyntax

Level   Information Type                           Recommendation
L-0     Part-of-Speech                             Obligatory
L-1     Morphosyntactic agreement features         Recommended
L-2     Language-specific (or refined) features    Optional
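The three-level scheme above can be read as a validation policy: a missing L-0 value is an error, a missing L-1 value is only a warning, and L-2 is never required. The following is a minimal sketch of that reading, not part of the EAGLES recommendations themselves; the class `MorphosyntacticTag`, the function `check` and the feature names used in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MorphosyntacticTag:
    """Hypothetical container for one morphosyntactic description."""
    pos: Optional[str] = None                                        # L-0: obligatory
    agreement: Dict[str, str] = field(default_factory=dict)          # L-1: recommended
    language_specific: Dict[str, str] = field(default_factory=dict)  # L-2: optional


def check(tag: MorphosyntacticTag) -> Tuple[bool, List[str]]:
    """Return (is_valid, messages) following the three recommendation levels."""
    if not tag.pos:
        # L-0 is obligatory: without a part of speech the tag is rejected.
        return False, ["L-0 part-of-speech is obligatory"]
    messages: List[str] = []
    if not tag.agreement:
        # L-1 is recommended: its absence is reported but does not invalidate the tag.
        messages.append("L-1 agreement features are recommended")
    # L-2 is optional: its absence is never reported.
    return True, messages
```

For example, `check(MorphosyntacticTag(pos="noun"))` accepts the tag but reports the missing L-1 features, while `check(MorphosyntacticTag())` rejects it.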

Page 10:

MERITS: Strengths (from EAGLES-ISLE)

Standardisation as a necessary component of any strategic programme to create a coherent market

Leading industrials & academics participated (> 150 EU groups)

Bottom-up community created standards

To avoid wasting time reinventing basic/consolidated knowledge
  May be true also for many “humanities” users, not interested in debates on specific lexical approaches

Work otherwise duplicated among many projects, done just once in a collaborative manner (overall cost-effectiveness)

Allows the field to be more competitive:
  Concentrate efforts on innovative areas
  Engage in new/advanced technology

Page 11:

Why Standards for Language Resources? (from EAGLES-ISLE)

To ensure:

interoperability of systems (& data), through compatible interfaces

reusability and integrability of components

training based on consensual technical specifications and models (“gold standards”)

evaluation & validation based on agreed criteria

transition from prototypes to HLT products

important for workflows

essential for a LR Infrastructure

for evaluation campaigns

Page 12:

The applications: requirements for systems & enabling technologies

Machine Translation
Information Extraction
Information Retrieval
Summarisation
Natural Language Generation
Word Clustering
Multiword Recognition + Extraction
Word Sense Disambiguation
Proper Noun Recognition
Parsing
Coreference
…

For HLT, knowledge of applications’ requirements is essential

Page 13:

The Multilingual ISLE Lexical Entry (MILE)

General methodological principles (from EAGLES)

Basic requirements for the design of the MILE:
  Discover and list the (maximal) set of basic notions needed to describe the MILE (up to which level is standardisation feasible?)
  Granularity
  The leading principle: the edited union of existing lexicons/models (redundancy is not a problem)
  Modular & layered
  Allow for underspecification (& hierarchical structure)

Page 14:

MILE: Modularity. The building-block model

[Figure: Lexical Objects (syntactic frame, phrase, slot, Synfeature, Semfeature) linked to Lexical entry 1, Lexical entry 2 and Lexical entry 3]

Independent, but interlinked, modules make it possible to express different dimensions of lexical entries

Page 15:

MILE Lexical Classes & Lexical Objects vs ISO LMF

Lexical Classes as the main building blocks of the lexical architecture

Building blocks allow two kinds of reusability:
  intra-lexicon reusability (within the same lexicon)
  inter-lexicon reusability (among different lexicons)

Define an ontology of lexical objects:
  represent lexical notions such as semantic unit, syntactic feature, syntactic frame, semantic predicate, semantic relation, synset, etc.
  specify the relevant attributes
  define the relations with other classes
  hierarchically structured

Done in LMF

To be done … (in ISOCat?)
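The two kinds of reusability can be illustrated with a small sketch in which a lexical object is defined once and referenced, rather than copied, by several entries. This is only an illustration under assumed names: `SyntacticFrame`, `LexicalEntry`, `entries_sharing` and the sample lemmas are hypothetical and do not come from the MILE or LMF specifications.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SyntacticFrame:
    """A shared lexical object: defined once, referenced by many entries."""
    frame_id: str
    slots: Tuple[str, ...]


@dataclass
class LexicalEntry:
    lemma: str
    frame: SyntacticFrame  # a reference to the shared object, not a copy


# One building block, created a single time.
transitive = SyntacticFrame("transitive", ("subject", "object"))

# Intra-lexicon reusability: two entries of the same lexicon point to it.
english = [LexicalEntry("eat", transitive), LexicalEntry("read", transitive)]

# Inter-lexicon reusability: an entry of another lexicon points to the same object.
italian = [LexicalEntry("mangiare", transitive)]


def entries_sharing(frame: SyntacticFrame, *lexicons: List[LexicalEntry]) -> List[str]:
    """List the lemmas, across all given lexicons, that reference this exact object."""
    return [entry.lemma for lexicon in lexicons for entry in lexicon if entry.frame is frame]
```

Calling `entries_sharing(transitive, english, italian)` collects the lemmas from both lexicons that point to the one shared frame.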

Page 16:

The MILE Data Categories: user-adaptability and extensibility

[Figure: data categories HUMAN, ARTIFACT, EVENT, ANIMAL, GROUP, AGE, MAMMAL shown as instances (instance_of) of MLC:SemanticFeature, divided into Core and UserDefined]

OK in ISOCat

Page 17:

MILE Lexical Data Category Registry: a library of pre-instantiated objects

Enables modular specification of lexical entities:
  eliminate redundancy
  identify lexical entries or sub-entries with shared properties
  create ready-to-use packages that can be combined in different ways

Can be used “off the shelf” or as a departure point for the definition of new or modified categories

ISOCat

ISO Profiles

Page 18:

ISO - LMF: Lexical Markup Framework

Designed to accommodate many models of lexical representation

Its pros:
  Meta-model: abstract high-level specification (ISO 24613)
  Data Category Registry: low-level specifications (ISO 12620)
  Not a monolithic model, rather a modular framework
  LMF library provides the hierarchy of lexical objects (with structural relations among them)
  Data Category Registry provides a library of descriptors to encode linguistic information associated to lexical objects (N.B. Data Categories can be also user-defined)
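The division of labour described above, a structural hierarchy of lexical objects on one side and a registry of descriptors on the other, can be sketched roughly as follows. This is a toy illustration, not the ISO 24613 or ISO 12620 API: the classes `DataCategoryRegistry` and `LexicalObject`, their methods, and the category names and values are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


class DataCategoryRegistry:
    """Toy registry: maps each data category name to its permitted values."""

    def __init__(self, core: Dict[str, Iterable[str]]) -> None:
        self._values: Dict[str, Set[str]] = {name: set(vals) for name, vals in core.items()}

    def define(self, name: str, values: Iterable[str]) -> None:
        """Add a user-defined category, or extend an existing one."""
        self._values.setdefault(name, set()).update(values)

    def is_valid(self, name: str, value: str) -> bool:
        return value in self._values.get(name, set())


class LexicalObject:
    """Toy structural node: carries the hierarchy, delegates descriptors to the registry."""

    def __init__(self, kind: str, registry: DataCategoryRegistry) -> None:
        self.kind = kind
        self.registry = registry
        self.features: Dict[str, str] = {}
        self.children: List["LexicalObject"] = []

    def set(self, name: str, value: str) -> None:
        # Only descriptors known to the registry may decorate a lexical object.
        if not self.registry.is_valid(name, value):
            raise ValueError(f"{name}={value} is not a registered data category value")
        self.features[name] = value

    def add(self, child: "LexicalObject") -> "LexicalObject":
        self.children.append(child)
        return child


registry = DataCategoryRegistry({"partOfSpeech": {"noun", "verb"}})
entry = LexicalObject("LexicalEntry", registry)
entry.set("partOfSpeech", "verb")

sense = entry.add(LexicalObject("Sense", registry))
registry.define("domain", {"chemistry"})  # user-defined data category
sense.set("domain", "chemistry")
```

The design choice illustrated here is that the structure stays fixed while the vocabulary of descriptors can grow, which is one way to read the remark that Data Categories can also be user-defined.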

Page 19:

ISO LMF

Core Package: structural skeleton, with the basic hierarchy of information in a lexical entry

+ various extensions: Morphology, NLP Multilingual notations, NLP MWE pattern, NLP Paradigm class, NLP Semantic, MRD, NLP Syntax, Constraint Expression

Modular framework

LMF specs comply with UML modelling principles

an XML DTD allows implementation

Builds on EAGLES/ISLE

The field is mature

[Figure labels: NEDO Asian Lang.; NICT Language-Grid Service Ontology; ICT KYOTO; LIRICS; LexInfo; New initiatives…]

Page 20:

Mapping experiment

Entries from major existing lexicons mapped to LMF:
  To prove that the model is able to represent many best practices
  To test the expressive potentialities, the adequacy of architectural model & linguistic objects

Major best practices:
  OLIF
  PAROLE/SIMPLE
  LC-Star (Speech Lexicon)
  WordNet - EuroWordNet
  FrameNet
  BDef: formal database of lexicographic definitions derived from Explanatory Dictionary of Contemporary French

from Monica Monachini

Page 21:

BioLexicon: SIMPLE model & ISO-LMF standard

BL

A unique large-scale computational lexicon in the biomedical domain in terms of coverage & typology of information

Populated with info from available biomedical resources

Semi-automatically populated from corpora: population toolkit available

Including both domain-specific & general language words

Rich linguistic information ranging over different levels of linguistic description

Conformant to international lexical representation standards

Designed to meet bio-Text Mining requirements

from Monica Monachini

Page 22:

Sense Representation

  <Sense rdf:ID="activate_2">
    <belongsToSynset rdf:resource="#activate"/>
    <hasSemanticRelation rdf:resource="#is_a_1"/>
    <hasSemanticRelation rdf:resource="#has_as_part_1"/>
    <hasSemanticRelation rdf:resource="#object_of_the_activity_1"/>
    <hasSemanticFeature rdf:resource="#SF_chemistry"/>
    <hasSemanticFeature rdf:resource="#SF_process"/>
  </Sense>

[Figure: Sense (activate_2) linked to Synset (activate), PredicativeRepresentation, SemanticFeature (SF_chemistry, SF_process), Collocation and SemanticRelation (is_a: [SenseID]; Typical_of: [SenseID] S_protein)]
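As a rough illustration of how such a Sense description could be read programmatically, the sketch below parses the fragment with Python's standard `xml.etree.ElementTree`. It assumes the fragment is wrapped in an `rdf:RDF` root that declares the standard RDF namespace (the slide shows only the inner element) and that the resource references contain no spaces; the function name `read_sense` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard RDF syntax namespace; the wrapper element is an assumption, since
# the slide shows only the <Sense> fragment without namespace declarations.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}">
  <Sense rdf:ID="activate_2">
    <belongsToSynset rdf:resource="#activate"/>
    <hasSemanticRelation rdf:resource="#is_a_1"/>
    <hasSemanticRelation rdf:resource="#has_as_part_1"/>
    <hasSemanticRelation rdf:resource="#object_of_the_activity_1"/>
    <hasSemanticFeature rdf:resource="#SF_chemistry"/>
    <hasSemanticFeature rdf:resource="#SF_process"/>
  </Sense>
</rdf:RDF>"""


def read_sense(xml_text):
    """Return the sense ID and a dict mapping each property to its target IDs."""
    root = ET.fromstring(xml_text)
    sense = root.find("Sense")
    links = {}
    for child in sense:
        # Drop the leading '#' of the local reference to get the bare identifier.
        target = child.get(f"{{{RDF}}}resource").lstrip("#")
        links.setdefault(child.tag, []).append(target)
    return sense.get(f"{{{RDF}}}ID"), links


sense_id, links = read_sense(DOC)
```

Here `sense_id` is the identifier of the sense and `links` groups the synset, semantic relations and semantic features by property name.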

Page 23:

KYOTO SYSTEM

[Figure: KYOTO system diagram. Labels: Source Documents; Linear MAF/SYNAF; Linear SEMAF; Linear/Generic FACTAF; Generic TMF; Term extraction (Tybot); Semantic annotation; Fact extraction (Kybot); Domain editing (Wikyoto); Wordnet; Domain Wordnet; LMF API; Ontology; Domain ontology; OWL API; Concept User; Fact User]

from Piek Vossen

Page 24:

A common representation format: WordNet - LMF

[Figure: UML class diagram of WordNet-LMF. Classes: LexicalResource, GlobalInformation, Lexicon, LexicalEntry, Lemma, Sense, Synset, Definition, Statement, SynsetRelations, SynsetRelation, MonolingualExternalRefs, MonolingualExternalRef, SenseAxes, SenseAxis, InterlingualExternalRefs, InterlingualExternalRef, Meta. Associations carry multiplicities 0..1, 1..1, 0..* and 1..*]

Data Categories

from Monica Monachini

Page 25:

Centralized WordNet DC Registry

A list of 85 semantic relations as a result of a mapping of the KYOTO WordNet grid

Inter-WN

Intra-WN

from Monica Monachini

Page 26:

WordNet-LMF multilingual level - Cross-lingual relations

  <!ELEMENT SenseAxes (SenseAxis+)>
  <!ELEMENT SenseAxis (Meta?, Target+, InterlingualExternalRefs?)>
  <!ATTLIST SenseAxis
      id ID #REQUIRED
      relType CDATA #REQUIRED>
  <!ELEMENT Target EMPTY>
  <!ATTLIST Target
      ID CDATA #REQUIRED>
  <!ELEMENT InterlingualExternalRefs (InterlingualExternalRef+)>
  <!ELEMENT InterlingualExternalRef (Meta?)>
  <!ATTLIST InterlingualExternalRef
      externalSystem CDATA #REQUIRED
      externalReference CDATA #REQUIRED
      relType (at|plus|equal) #IMPLIED>

Example synsets:
  SWN <fuego_3, llama_1> 09686541-n
  IWN <fuoco_1, fiamma_1> 00001251-n
  WN3.0 <fire_1 flame_1 flaming_1> 13480848-n

Callouts:
  groups monolingual synsets corresponding to each other and sharing the same relations to English
  link to ontology/(ies)
  specifies the type of correspondence

from Monica Monachini
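To make the DTD fragment more concrete, the sketch below builds one SenseAxis element that groups three synsets, using Python's standard `xml.etree.ElementTree`. It follows the element and attribute names of the DTD, but the function `build_sense_axis`, the axis identifier, the `relType` value `eq_synonym`, the format of the Target IDs and the external reference (`SUMO`, `Fire`) are illustrative assumptions, not values taken from the slides.

```python
import xml.etree.ElementTree as ET

ALLOWED_REF_TYPES = ("at", "plus", "equal")  # enumerated in the DTD for relType


def build_sense_axis(axis_id, rel_type, target_ids, external_refs=()):
    """Build a SenseAxis element: Target+ followed by an optional InterlingualExternalRefs."""
    if not target_ids:
        raise ValueError("SenseAxis requires at least one Target (Target+)")
    axis = ET.Element("SenseAxis", {"id": axis_id, "relType": rel_type})
    for target_id in target_ids:
        ET.SubElement(axis, "Target", {"ID": target_id})
    if external_refs:
        refs = ET.SubElement(axis, "InterlingualExternalRefs")
        for system, reference, ref_type in external_refs:
            attrs = {"externalSystem": system, "externalReference": reference}
            if ref_type is not None:  # relType is #IMPLIED, so it may be omitted
                if ref_type not in ALLOWED_REF_TYPES:
                    raise ValueError(f"relType must be one of {ALLOWED_REF_TYPES}")
                attrs["relType"] = ref_type
            ET.SubElement(refs, "InterlingualExternalRef", attrs)
    return axis


# Hypothetical identifiers: the ID format and the ontology reference are assumptions.
axis = build_sense_axis(
    "sa_fire",
    "eq_synonym",
    ["SWN-09686541-n", "IWN-00001251-n", "WN30-13480848-n"],
    [("SUMO", "Fire", "equal")],
)
print(ET.tostring(axis, encoding="unicode"))
```

The optional Meta child is left out of the sketch because its content model is not shown in the fragment.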

Page 27:

LexInfo & Previous Models

LingInfo: modeling morphosyntactic decomposition of (complex) terms [Buitelaar et al. 2006]

LexOnto: capturing syntactic behaviour and syntax-semantics links [Cimiano et al. 2007]

Lexical Markup Framework (LMF): ISO standardised model for representing machine readable lexica (agnostic about connection with ontology) [Francopoulo et al. 2007]

LexInfo: building on LMF as a core, develop a model which “subsumes” LingInfo and LexOnto for flexibly associating linguistic information to ontologies [Buitelaar, Cimiano, Haase, Sintek 2009]

From Paul Buitelaar

Page 28:

LexInfo: Lexical Entry Sub-Categorization Frames

From Paul Buitelaar

Page 29:

Beyond MILE: future work

MILE Lexical Model oriented towards an Open Distributed Lexical Infrastructure

Lexical Information Servers for multiple access to lexical information repositories

Enhance user-adaptivity, resource sharing, cooperative creation of LR & LT

Develop integration and interchange tools

Page 30:

Some steps for a “new generation” of LRs

From huge efforts in building static, large-scale, general-purpose LRs

To dynamic LRs rapidly built on-demand, tailored to specific user needs

From closed, locally developed and centralized resources

To LRs residing over distributed places, accessible on the web, choreographed by agents acting over them

From Language Resources

To Language Services

BUT

• Need of tools to make this vision operational & concrete

Interoperability

Page 31:

Lexical WEB & Content Interoperability

As a critical step for semantic mark-up in the SemWeb

[Figure: lexicons (ComLex, NomLex, SIMPLE, WordNets, FrameNet, Lex_x, Lex_y, BioLexicon, SIMPLE-WEB, Global WordNet GRID) interlinked through LMF, with intelligent agents]

Standards for Interoperability: Enough??

Page 32:

A new paradigm of R&D in LRs & LT: Distributed Language Services

Open & distributed infrastructures for LRs & LT

Adopting the paradigm of accumulation of knowledge so successful in more mature disciplines, based on sharing LRs & tools

Ability to build on previous achievements, results accessible to various systems, allowing effective cooperation of many groups on common tasks

Exchange and integrate information across repositories

Create new resources on the basis of existing ones

Compose new services on demand…

A new scenario implying:
  content interoperability standards
  development of architectures enabling accessibility
  supra-national cooperation

Page 33:

A few issues for discussion: “content”, guidelines, tools, priorities, ...

For Semantic Web and “content” interoperability: is the field ‘mature’ enough to converge also for the semantic/conceptual level (e.g. to automatically establish links among different languages)?

For the standards to have impact, ensure their usability & gain industry support, focusing on requirements of industrial applications

To have Guidelines which are a “usable product” (to assist in creation or adaptation of lexicons, to share resources, …)

Facilitate acceptance of the standards by providing an open-source reference implementation platform & tools, related web services and test suites

Relation with Spoken language community

Define further steps necessary to converge on common priorities

Page 34:

Limits observed & needs of further work

For usability & operability of LMF: Data Categories (DC) & others:
  From Japanese NEDO: DC not defined in LMF & LMF non operational
  Asian, African DCs
  Need of DC organised in profiles (easy to use): IsoCat & Profiles
  Need of an ontology of DCs with structure/dependencies, and constraints
  Otherwise the model remains too abstract, and doesn’t say anything on how to implement concretely the different layers
  Link with Ontologies: relations Lexicons-Ontologies
  Need of easy, user-friendly guidelines
  Need of tools to make it operational, also for creating standard-compliant resources: more important than the model!

More dissemination, also with industry
  Linguists may be (rightly for certain purposes) not interested
  Younger colleagues not aware of the past work on standards

Need of operational definitions of interoperability

Need of stimuli also from EC to produce standard-compliant resources (unless differently motivated)

Page 35:

Strengths

Good set of methodological principles: Granularity of basic notions, …

Many languages already compliant with EAGLES morpho-syntax, etc.

Many projects today using LMF

Unified Lexicon experiment between Speechdat & Parole, at ELRA (possible because EAGLES compliant)

Web-services to access LRs based on standards

Web-based platforms for LR integration

An open infrastructure of LRT needs standards

New topics being constantly added: Time, Space, …

Page 36:

Future requirements & planning

To make LMF usable and operational:
  LMF User Guidelines with examples
  Mapping of commonly used lexicons into LMF
  Data categories for LMF lexicons
  Tool related to LMF, with particular reference to the Lexus tool

Need to address another layer:
  The ontological layer in a lexicon
  How lexicons and ontologies are linked and information mapped from each other

An open space in a wiki environment:
  to store guidelines, examples
  to allow broad discussion on these topics
  to ease dissemination of LMF

Page 37:

FLaReNet Mission: structure the area of LR & LT of the future

Worldwide Forum for LRs & LTs
  Consolidate methods, approaches, common practices, architectures
  Integrate so far partial solutions into broader infrastructures

A “roadmap”: a plan of coherent actions as input to policy development
  For the EU, national organisations & industry
  As a model for the LRs/LTs of the next years
  Strengthening the language product market, e.g. for new products & innovative services
  Identifying areas where consensus is achieved/emerging vs. areas where more discussion & testing is required
  Indicating priorities

221 Individual Subscribers, 81 Institutional Members from 31 countries

Page 38:

Some results from FLaReNet Vienna Forum: Interoperability Session

Promote knowledge of standards in the community

Define specifications for tools supporting standards

Support workshops/tutorials on how to use standards

Start focusing on standards for more consensual areas & develop for these a toolkit that can be used off-the-shelf, so that we can move on to tackling the larger problems

Identify “best practices” in standards wrt usability, usefulness, viability, outreach etc.

Adopt a model for tool & resource development based on open & collaborative development, where the community as a whole contributes components, modules, etc. to a common framework

Page 39:

Some results from FLaReNet Vienna Forum: International Cooperation

Standards & Interoperability: topics for cooperation
  A metadata catalogue should involve every party
  Common repositories for LRT universally & easily accessible
  Try to connect ongoing work done by many groups
  A shared repository of data formats, annotations (where to find the most frequently used and preferred schemes): major help to achieve standardisation

For a new world-wide language infrastructure
  Create the means to plug together different LR & LT, in a web-based resource and technology grid
  Access to LRT is critical: involves, and has impact on, all the community
  With the possibility to easily create new workflows
  Create conditions to easily share and re-use technologies, to have more open (source) tools available for use also to under-funded groups

Page 40:

Special Highlight: Contribute to building the LREC2010 Map!

Time is ripe to launch an important initiative, the LREC2010 Map of Language Resources, Technologies and Evaluation.

The Map will be a collective enterprise of the LREC community, as a first step towards the creation of a very broad, community-built, Open Resource Infrastructure.

First in a series, it will become an essential instrument to monitor the field and to identify shifts in the production, use and evaluation of LRs and LTs over the years.

When submitting a paper (< 900!), from the START page fill in a very simple template to provide essential information about resources (in a broad sense, also technologies, standards, evaluation kits.) either used for the work described or a new result of your research

The Map will be disclosed at LREC, where some event(s) will be organised around this initiative

FLaReNet & the ORI (Open Resource Infrastructure) … at LREC

Page 41:

Join FLaReNet!

We invite all interested players in the field to express their interest in becoming part of the Network

How to join? To be part of the FLaReNet Network, fill in the form available on the project website (http://www.flarenet.eu)