Metadata: why and how for social science Louise Corti Online Resources Day 15 November 2005, London


Page 1: Metadata: why and how for social science Louise Corti Online Resources Day 15 November 2005, London

Metadata: why and how for social science

Louise Corti

Online Resources Day

15 November 2005, London

Page 2:

What Do Social Researchers Want?

• Discover available datasets (globally, not just in their own country) and related research literature

• Understand in detail the origin, methodology and structure of datasets (social sciences datasets are modest in size but big in complexity)

• Compare and Link data from different sources

• Model the social phenomena underlying the data

• Publish their findings with all the supporting evidence (no ‘iceberg’ publishing) and Reproduce published results

• Connect to other experts and Share informal comments and advice

• Enforce confidentiality and intellectual property rights while maintaining accuracy and access to data sources

• … and more

Page 3:

How?

• through rich and systematic description – through a language that humans and computers can both understand

• using commonly agreed or mappable vocabularies and standards

• which must be flexible and adaptable

• metadata

Page 4:

What are metadata?

Metadata are structured data which describe the characteristics of an object or resource. They have much in common with the cataloguing that takes place in libraries, museums and archives. The term "meta" derives from the Greek word denoting a nature of a higher order or more fundamental kind.

A metadata record typically consists of a number of pre-defined elements representing specific attributes of a resource, and each element can have one or more values.

Page 5:

Grasshopper

Page 6:

Metadata schema

Element name: Value
• Title: Web UKDA Catalogue
• Creator: Louise Corti
• Publisher: UK Data Archive
• Identifier: http://www.data-archive.ac.uk/
• Format: Text/html
• Relation: Data Archive Web site

Each metadata schema will usually have the following characteristics:

• a limited number of elements
• the name of each element
• the meaning of each element
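As a minimal sketch of the idea, the example record from this slide can be held as a mapping from element name to a list of values, and checked against a schema that fixes the element names and their meanings. The element definitions in `SCHEMA` and the `validate` helper are illustrative assumptions, not part of any published standard.

```python
# Hypothetical sketch: a schema is a limited set of named elements,
# each with a stated meaning. The definitions below are assumed wording.
SCHEMA = {
    "Title": "Name given to the resource",
    "Creator": "Person or body responsible for the content",
    "Publisher": "Body making the resource available",
    "Identifier": "Unambiguous reference to the resource",
    "Format": "File format or media type",
    "Relation": "Reference to a related resource",
}

# The record from the slide: each element can carry one or more values.
record = {
    "Title": ["Web UKDA Catalogue"],
    "Creator": ["Louise Corti"],
    "Publisher": ["UK Data Archive"],
    "Identifier": ["http://www.data-archive.ac.uk/"],
    "Format": ["Text/html"],
    "Relation": ["Data Archive Web site"],
}


def validate(record, schema):
    """Return a list of problems: unknown elements or elements without values."""
    problems = []
    for element, values in record.items():
        if element not in schema:
            problems.append(f"unknown element: {element}")
        elif not values:
            problems.append(f"no value for element: {element}")
    return problems


if __name__ == "__main__":
    print(validate(record, SCHEMA))
```

Keeping values in lists, even when there is only one, reflects the point that an element may be repeated.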

Page 7:

International standards for metadata schema

• to ensure that every element of information pertaining to the lifecycle of an object (collection) can be captured:

– creation, appraisal, accessioning, conservation, preservation, availability and access

• must be dynamic and open to amendment

• aim for consistent, appropriate and self-explanatory description

• facilitate the retrieval and exchange of information

• enable the sharing of authority data

• enable the integration of descriptions from different locations into a unified information system

Page 8:

Common metadata schemas

Dublin Core: minimum number of elements required to facilitate the discovery of document-like objects in a networked environment (e.g. the Internet). Currently 15:

– Content: Title, Subject, Description, Source, Language, Relation, Coverage
– Intellectual Property: Author/Creator, Publisher, Contributor, Rights
– Electronic/Physical Manifestation: Date, Type, Format, Identifier

ISAD(G): General International Standard Archival Description

e-GIF: e-Government Interoperability Framework

OAIS: Open Archival Information System Reference Model

OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting

Page 9:

No shortage of statistical metadata standards

• The Common Warehouse Metamodel (CWM) from OMG – data warehousing and business intelligence

• ISO 11179 – data elements in a metadata repository

• SDMX – multidimensional data and time-series

• IQML, AskXML and Triple-S – questionnaire data

• The Data Documentation Initiative (DDI) – a general metadata standard for statistical data (micro as well as aggregated)

• And many other related standards

e-Social Science requires more than simple “data” metadata:
– Thesauri, Classifications

Page 10:

Encoding schemes

HTML (HyperText Markup Language in Web pages, version 3.2 or 4.0)

SGML (Standard Generalised Markup Language)

XML (eXtensible Markup Language)

RDF (Resource Description Framework)

MARC (MAchine Readable Cataloging)

MIME (Multipurpose Internet Mail Extensions)

Z39.50 (protocol for distributed information retrieval)

LDAP (Lightweight Directory Access Protocol)

Page 11:

Example of deploying metadata for a simple web resource

• embedding the metadata in a Web page by the creator using META tags in the HTML coding of the page

• as a separate document (eg XML) linked to a web resource it describes

• in a database linked to the web resource. The records may either have been directly created within the database or extracted from another source, such as Web pages

• but what about complex social science data?
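The first option in the list above, embedding metadata in META tags, can be sketched as follows. The `meta_tags` helper is hypothetical; the `DC.` prefix with a lower-case element name follows the common convention for Dublin Core in HTML, and the record values are taken from the earlier schema example.

```python
from html import escape


def meta_tags(record, prefix="DC"):
    """Render one <meta> tag per element value (hypothetical helper)."""
    lines = []
    for element, values in record.items():
        for value in values:
            name = f"{prefix}.{element.lower()}"
            # escape() protects quotes and ampersands inside attribute values
            lines.append(f'<meta name="{escape(name)}" content="{escape(value)}">')
    return "\n".join(lines)


if __name__ == "__main__":
    record = {
        "Title": ["Web UKDA Catalogue"],
        "Creator": ["Louise Corti"],
        "Publisher": ["UK Data Archive"],
    }
    # The output would be pasted into the <head> of the page it describes.
    print(meta_tags(record))
```

This works for a simple Web page because the description is flat: a handful of elements, each with a short text value.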

Page 12:

Stepping back: The Standard Study Description

• devised in 1970s to describe academically created sociological/political science datasets

• recommended key bibliographic elements

• informally ‘adopted’ by CESSDA in 1980s

• often adapted to suit local needs

Page 13:

The Standard Study Description

recommended elements:

• subject category
• title
• depositor
• principal investigator
• abstract and main topics
• kind of data
• dimensions of dataset
• universe sampled
• sampling procedures
• method of data collection
• dates of coverage, fieldwork and deposit
• availability and access conditions
• references to reports and related datasets

Controlled vocabulary

• adopted for some elements – e.g. sampling, kind of data

• subject and geographical key words from broad social science Thesaurus (HASSET)

Page 14:

The first step towards interoperability

• driven by the need to search across European Data Archive holdings

• development of a core element set for the Integrated Data Catalogue (IDC)

• catalogue records marked with standard tags for inclusion into WAIS indexes (Wide Area Information Servers)

• enabled multi-site searching via WAIS protocol

• simplistic, and excluded links to additional metadata, documentation, thesaurus help, and browsing

Page 15:

• the DDI is widely adopted by social sciences data archives all over the world that provide many of the datasets used by social scientists for secondary analysis

• initiated and organised by the Inter-university Consortium for Political and Social Research (USA) in 1995 to create a metadata standard for the social science community

• members coming from social science data archives and libraries in USA, Canada and Europe and from major producers of statistical data

• first in SGML then in XML

• DDI 1.0 published in 2000. Currently at version 2. Version 3 is being designed and it is scheduled for 2006

Page 16:

The Structure of a DDI Codebook

• Document Description
– Description of the codebook document itself (author, sources, etc.)

• Study Description
– Information about the entire study or data collection (content, collection methods, processing, sources, access conditions etc.)

• File Description
– Description of each single file of the data collection (formats, dimensions, processing information, etc.)

• Data Description
– Description of each single variable in a datafile (format, variable and value labels, definitions, question texts, imputations etc.)

• Other Study-related Materials
– References to reports and publications and other machine readable documentation
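The five sections above can be sketched as a skeleton XML document. This is a heavily simplified illustration, not a schema-valid DDI instance: the element names (`codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat`) follow the DDI codebook tag names as best recalled, and the study title, file name and variable are invented for the example.

```python
import xml.etree.ElementTree as ET


def add_title(parent, title):
    """Attach a citation/titlStmt/titl chain holding a title."""
    citation = ET.SubElement(parent, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title


def build_codebook():
    """Build a skeleton codebook with the five top-level sections."""
    root = ET.Element("codeBook")

    # 1. Document description: the codebook document itself
    add_title(ET.SubElement(root, "docDscr"), "Codebook for Example Survey")

    # 2. Study description: the study or data collection as a whole
    add_title(ET.SubElement(root, "stdyDscr"), "Example Survey")  # hypothetical study

    # 3. File description: one entry per data file
    file_txt = ET.SubElement(ET.SubElement(root, "fileDscr"), "fileTxt")
    ET.SubElement(file_txt, "fileName").text = "example.dat"  # hypothetical file

    # 4. Data description: one <var> per variable, with labels and categories
    var = ET.SubElement(ET.SubElement(root, "dataDscr"), "var", name="SEX")
    ET.SubElement(var, "labl").text = "Sex of respondent"
    category = ET.SubElement(var, "catgry")
    ET.SubElement(category, "catValu").text = "1"
    ET.SubElement(category, "labl").text = "Male"

    # 5. Other study-related materials: reports, publications, documentation
    ET.SubElement(ET.SubElement(root, "otherMat"), "labl").text = "Technical report"

    return root


if __name__ == "__main__":
    root = build_codebook()
    ET.indent(root)  # pretty-printing; requires Python 3.9 or later
    print(ET.tostring(root, encoding="unicode"))
```

Because every codebook shares the same section order and tag names, software can locate the variable descriptions of any study without knowing anything else about it.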

Page 17:

Data description - variables

000001 1 1 44 123 9 5 4 5
000002 1 3 47 003 1 3 3 3
000003 2 5 43 155 1 1 2 3
000004 1 3 36 012 2 5 5 5
000005 9 4 24 207 9 1 4 5

Column labels on the slide: Case Number, Sex, Age, Country, Occupation, Question Responses
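A short sketch of why the data description matters: without it the rows above are just digits. The code below pairs each column with a variable name and decodes one coded variable. The assignment of labels to columns and the value labels for sex are assumptions made for illustration; the slide text does not state them.

```python
ROWS = """\
000001 1 1 44 123 9 5 4 5
000002 1 3 47 003 1 3 3 3
000003 2 5 43 155 1 1 2 3
000004 1 3 36 012 2 5 5 5
000005 9 4 24 207 9 1 4 5
""".splitlines()

# ASSUMED column order; the original slide layout is not recoverable.
VARIABLES = ["case_number", "sex", "country", "age", "occupation",
             "q1", "q2", "q3", "q4"]

# HYPOTHETICAL value labels of the kind a data description would supply.
SEX_LABELS = {"1": "Male", "2": "Female", "9": "Not answered"}


def decode(row):
    """Turn one raw row into a labelled record using the variable metadata."""
    record = dict(zip(VARIABLES, row.split()))
    record["sex"] = SEX_LABELS.get(record["sex"], "Unknown code")
    record["age"] = int(record["age"])
    return record


if __name__ == "__main__":
    for row in ROWS:
        print(decode(row))
```

Codes such as occupation are left as strings so that leading zeros (for example `003`) are preserved.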

Page 18:

DDI in XML

Page 19:

Understanding Statistical Metadata

Different approaches to understanding:

• what is it for?
– statistical metadata has no value in itself, it is just a means to an end. Its progress should be measured by the extent to which it facilitates social research

• what is it like?
– Anything familiar we can relate it to? A form of communication might be a good choice

Page 20:

Benefits

• interoperability
– homogeneous exchangeable documents

• richer content
– comprehensive set of elements providing the potential data analyst with broader knowledge

• single document - multiple purposes
– repurposed for different needs and applications – preservation, discovery, and dissemination

• on-line subsetting and analysis
– standard uniform structure and content for variables ensures easy import into on-line analysis systems

• precision in searching
– field-specific searches across documents are enabled

• and more …
– human-readable and computer actionable
– essential foundation for E-science and the Grid

Page 21:
Page 22:
Page 23:
Page 24:
Page 25:

EU MADIERA Portal

Meta(data) Browsing

Search

Multilingual Browsing

Page 26:

Summary - the DDI

• The DDI can serve as the foundation for content, distribution, use and preservation of data collections in the social and behavioural sciences, across institutions, countries, and disciplines

• with cooperation from both data producers and statistical software manufacturers, the DDI specification can readily become the basis for the entire research process, from generation of a data collection instrument to production of research articles

• serves the social science community well with a specification that produces quality metadata with multiple purposes. It fully documents the details of datasets, it is user friendly and accessible, it integrates into the infrastructure of the Web and it supports automatic generation of statistical software system files.

• the widespread adoption of the DDI will vastly improve access to a range of varied datasets. Expanded use will greatly enhance comparative research; the ability to harmonize datasets over time and geography will lead to significant improvement in our understanding of societies

Page 27:

The future

Statistical metadata is here and it is already changing the way people locate and make sense of data, but it does not yet support most use cases of interest to social scientists. What we will need to move forward is:

• Grammar, a standard Semantic infrastructure (e.g. as provided by the Semantic Web):
– semantic extensibility
– ability to integrate (merging and overriding) descriptions from different sources

• large Vocabulary, by integrating different flavours of metadata:
– unique identifiers for data and research literature
– statistical data metadata (full life cycle)
– Ontologies, Thesauri and Classifications (and mappings among them)
– statistical processing metadata
– “Secondary metadata”: annotations, quality assessment, links to research literature
– experts metadata (FOAF)

Page 28:

Not Even Half Way There ..

Diagram labels: DDI Standard, TEI for QD, RDF, Semantic Web, Nesstar – Data Web, ELSST, Integrated Data Catalogue, USI, Cooperative Markup, Annotations, Comparable variables, Unified Authentication, Mappings, References, Extraction, Grid

Future developments:

• Progress in metadata and technical standardisation

• Latent knowledge capture and extraction

Page 29:

Qualitative data and the DDI

• in October 2001 ESDS Qualidata formally adopted the DDI to describe data

• in 2000, began to explore standards for archiving, and web representation of qualitative data

• expertise from the text processing/arts and humanities communities - TEI

• ESDS Qualidata Online shows the basic potential of what can be achieved by a common standard

• need to catch up with the statistical community!

• working model that will be presented today