Post on 24-Feb-2016

Page 1: The current state of Metadata - as far as we understand it -

Peter Wittenburg
The Language Archive - Max Planck Institute
CLARIN Research Infrastructure
Nijmegen, The Netherlands

Page 2: Old Concept

• "metadata" is, of course, an old concept
• library cards were introduced to cope with mass and anonymity
• not surprising that library people started thinking about how to describe all kinds of web-accessible resources
• DC (Dublin Core) and qualified DC were the results
• however, the research world is different - it is not just about search
• therefore solutions were developed in many domains
• 2 years ago CLARIN revised its 15-year-old set & framework

Page 3: Big Ideas

• of course, managing increasing amounts of data
• of course, finding valuable data in the growing haystacks
• but also
  • machine usage of metadata
    • automatic profile matching
    • research statistics - virtual sub-collection building
    • etc.
  • multilinguality in a multilingual European society
  • interdisciplinary research (biodiversity people should find information in linguistic archives, etc.)
  • linking with contextual information
  • document lifecycle management (provenance)

Page 4: Big Change

• until now researchers informed each other
  • culture of personal exchange
• claim: this will only work partially in the future
  • we will have distributed centers storing lots of data (national and discipline dimensions)
  • depositors upload their data into these centers
  • we will have an anonymous landscape of data & tools, all offered as services
• what do we have to find things:
  • proper metadata descriptions
  • social tagging by virtual organizations
  • content to operate on by "smart" data mining

Page 5: Big Question

• are we ready to meet these wishes and changes?
  • probably not
• some major issues
  • quality
  • interoperability
  • registry and reference stability
  • functional
  • multilingual
  • scalability
  • IT principles

Page 6: Quality Issue

• lack of quality in descriptions
  • not all elements filled in (researchers are lazy, lack of tool support)
  • often not schema based (XLS), thus inconsistent
• lack of agreed and standardized vocabularies
  • ISO 639-3 - about 6000 language codes
  • what about subject classification schemes?
  • what about institution names?
  • thus many errors and inconsistencies
  • ontologies are expensive to maintain
• misinterpretations/misuse of element semantics
• etc.
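The quality problems listed above (unfilled elements, free-text values where a standardized vocabulary is expected) are the kind of thing a small automated check can flag at deposit time. The following is only a minimal sketch under stated assumptions: the element names in `REQUIRED_ELEMENTS` and the record layout are hypothetical, and the language check tests only the three-lowercase-letter shape of ISO 639-3 identifiers, not whether a code is actually assigned.

```python
import re

# Hypothetical required elements; a real schema would define its own set.
REQUIRED_ELEMENTS = ("title", "language", "institution")

# ISO 639-3 identifiers are three lowercase letters. This pattern checks the
# shape only; it does not look the code up in the official code table.
ISO_639_3_SHAPE = re.compile(r"^[a-z]{3}$")


def check_record(record: dict) -> list:
    """Return a list of human-readable quality problems for one record."""
    problems = []
    # Report every required element that is absent or blank.
    for element in REQUIRED_ELEMENTS:
        value = record.get(element)
        if value is None or not str(value).strip():
            problems.append(f"missing element: {element}")
    # Flag language values that are present but not code-shaped.
    language = str(record.get("language") or "").strip()
    if language and not ISO_639_3_SHAPE.match(language):
        problems.append(f"language is not an ISO 639-3 style code: {language!r}")
    return problems


if __name__ == "__main__":
    good = {"title": "Field recordings", "language": "nld", "institution": "MPI"}
    bad = {"title": "", "language": "Dutch"}
    print(check_record(good))
    print(check_record(bad))
```

A check like this addresses only the mechanical part of the problem; misuse of element semantics still needs clear definitions and human review.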

Page 7: Interoperability Issue

• hampered by different approaches (closed DB, no modularity, embedded ontologies)
• structural difficulties, up to context dependency
• difficult semantic mapping
  • different description dimensions
  • bad element definitions
  • bad vocabulary definitions
• only little support for OAI-PMH
• reliance on DC semantics - but useless for research, etc.
• often "hardwired" mappings
• lack of a flexible framework to create/share/use relations
• little is standardized - what does that mean for lifetime?
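One way to picture the alternative to "hardwired" mappings: if every schema declares which registered concept each of its elements refers to, a mapping between any two schemas can be derived instead of hand-coded. The sketch below assumes this setup. The two schema tables and the `concept:` identifiers are made-up placeholders, not entries from a real registry, and one-to-one concept matches are a simplification.

```python
# Hypothetical element-to-concept tables for two metadata schemas. The concept
# identifiers are illustrative placeholders, not real registry entries.
SCHEMA_A = {
    "Title": "concept:title",
    "Lang": "concept:language",
    "Creator": "concept:creator",
}
SCHEMA_B = {
    "name": "concept:title",
    "languageCode": "concept:language",
    "genre": "concept:genre",
}


def derive_mapping(source: dict, target: dict) -> dict:
    """Map source elements to target elements that point at the same concept."""
    # Invert the target table so elements can be looked up by concept.
    by_concept = {concept: element for element, concept in target.items()}
    return {
        element: by_concept[concept]
        for element, concept in source.items()
        if concept in by_concept
    }


def translate(record: dict, mapping: dict) -> dict:
    """Rename the fields of a record; fields without a mapping are dropped."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


if __name__ == "__main__":
    mapping = derive_mapping(SCHEMA_A, SCHEMA_B)
    print(mapping)
    print(translate({"Title": "Field recordings", "Creator": "N.N."}, mapping))
```

The design point is that adding a third schema requires only its own concept table, not a new pairwise mapping to every existing schema.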

Page 8: Registry and Reference Stability Issue

• flexibility only when we separate things
  • define & register all concepts in open registries (we are using ISO 12620 - ISOcat)
  • define & register all components/profiles (we are using the CLARIN registry)
  • register all mappings (nothing yet)
• but if we do this we need to refer
• are our references stable?
  • some are using Cool URIs - are they just URLs?
  • some are using explicit Handles - are they maintained?
  • who takes care? (we are using EPIC - European PID Consortium)
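To make the difference between a stable reference and a plain URL concrete: a Handle is a `prefix/suffix` identifier that is stored in the metadata, while the location it resolves to can change behind it. The sketch below turns a Handle into a resolver URL through the public proxy at `hdl.handle.net`. The example Handle is a made-up placeholder, and the function performs no network access, so it does not show whether a Handle is actually registered or maintained.

```python
from urllib.parse import quote

# Public proxy of the Handle System; the Handle itself stays location-independent.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle: str) -> str:
    """Build a resolver URL from a 'prefix/suffix' Handle (optional 'hdl:' scheme)."""
    handle = handle.strip()
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    # The first slash separates the naming-authority prefix from the local suffix.
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix Handle: {handle!r}")
    return HANDLE_PROXY + quote(prefix, safe=".") + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    # Hypothetical Handle used purely for illustration.
    print(handle_to_url("hdl:12345/example-record-1"))
```

Storing the Handle rather than the resolved URL in a record is what keeps the reference usable when the hosting center reorganizes its servers, provided someone keeps the Handle record up to date.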

Page 9: Functional Issue

• do we address new functional requirements?
  • what about provenance information - is it automatically generated?
  • what about versions - are they visible?
  • what about LTP (long-term preservation) information?
  • what about formal access information?
  • do we know what is needed for the web services scenario? (profile matching, deployment information, etc.)
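As an illustration of what "automatically generated" provenance could mean in practice, the sketch below builds a provenance entry from the deposited bytes themselves: a checksum, a UTC timestamp, and a link to the previous version's checksum so that versions stay visible. The field names (`action`, `agent`, `previous_sha256`) are assumptions made for this example and do not come from any particular metadata standard.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def provenance_entry(
    content: bytes, action: str, agent: str, previous: Optional[dict] = None
) -> dict:
    """Create one provenance entry without asking the depositor to fill anything in."""
    return {
        "action": action,  # e.g. "deposit" or "revise" (hypothetical vocabulary)
        "agent": agent,  # who or what performed the action
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        # Linking to the previous checksum makes the version chain explicit.
        "previous_sha256": previous["sha256"] if previous else None,
    }


if __name__ == "__main__":
    v1 = provenance_entry(b"first version", "deposit", "upload-tool")
    v2 = provenance_entry(b"second version", "revise", "upload-tool", previous=v1)
    print(v1["sha256"])
    print(v2["previous_sha256"] == v1["sha256"])
```

Because every value is computed by the tool, this kind of record avoids the "not all elements filled in" problem for the provenance part of a description.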

Page 10: Multilingual Issue

• what does it really include?
  • localizing all software
  • multilingual definitions of all concepts, elements and vocabulary terms (no translations of proper names, of course - or?)
• or do we simply rely on some lingua franca?
• answer is probably discipline dependent
• how much is (or should be) the public involved?
• whatever we do, it is a lot of work
• CLARIN: ISOcat covers almost all major EU languages

Page 11: Scalability Issue

• are our solutions scalable?
  • in EUROPEANA: millions of metadata records
  • in CLARIN: about 270,000
• how to structure the offer?
• how to present this to naive users?
• do we share the same granularity (metadata at collection and/or resource level)?
• can we deal with aggregations in the same way?
• can we apply semantic web technology?
  • automatic mapping
  • automatic quality improvement

Page 12: IT Principles

• we need to disseminate the message of some basic IT principles
  • define and register your semantics
  • specify and register your syntax
  • use a stable reference scheme
  • in some areas, separate definitions and relations
• get things standardized or use standards such as
  • XML, some schema language
  • ISO 12620, etc.
  • URI, Handles

Page 13: What can we do?

• listen to each other first
• increase awareness about metadata and basic principles
• see how we can create an interoperable landscape
  • harmonizing approaches
  • harmonizing along major issues
  • making things explicit and scalable
• look for proper interdisciplinary solutions

Page 14: The current state of Metadata - as far as we understand it -

Üm nicht to end in Babylonish scenario nous avons still algo time om sistemas te improve.

Thanks for your attention.

moving towards an ideal e-Science domain