2014.12 - let's disco - 2 (eddi 2014)

Upload: thomas-bosch

Post on 09-Jul-2015

DESCRIPTION

Let's Disco

TRANSCRIPT

Page 1: 2014.12 - Let's Disco - 2 (EDDI 2014)
Page 2: 2014.12 - Let's Disco - 2 (EDDI 2014)

Controlled Vocabularies

Page 3: 2014.12 - Let's Disco - 2 (EDDI 2014)

Controlled Vocabularies

• Existing DDI-CVs are available in RDF

– Represented in SKOS format

– Each CV is a skos:ConceptScheme

– Each CV entry is a skos:Concept

– Versioning is considered

• Available at https://github.com/linked-statistics/DDI-controlled-vocabularies

• Next step: Review by DDI-CV Working Group

Page 4: 2014.12 - Let's Disco - 2 (EDDI 2014)

[Diagram: SummaryStatisticsType_1.0# is a skos:ConceptScheme. ArithmeticMean, Variance and StandardDeviation are each a skos:Concept. The scheme links to each of the three concepts via skos:hasTopConcept.]
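
The structure in this diagram could be written in Turtle roughly as follows. This is a minimal sketch, not the published vocabulary: the scheme IRI follows the "#CodeList" pattern and the "SummaryStatisticType" spelling used in the concept example on the next slide, and the IRIs for Variance and StandardDeviation are assumed by analogy with ArithmeticMean.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix cv:   <http://rdf-vocabulary.ddialliance.org/DDICV/SummaryStatisticType_1.0#> .

# The controlled vocabulary itself is a concept scheme.
cv:CodeList a skos:ConceptScheme ;
    skos:hasTopConcept cv:ArithmeticMean ,
                       cv:Variance ,            # assumed IRI
                       cv:StandardDeviation .   # assumed IRI

# Each vocabulary entry is a concept that points back to its scheme.
cv:ArithmeticMean    a skos:Concept ; skos:inScheme cv:CodeList .
cv:Variance          a skos:Concept ; skos:inScheme cv:CodeList .
cv:StandardDeviation a skos:Concept ; skos:inScheme cv:CodeList .
```

Stating both skos:hasTopConcept and skos:inScheme is redundant but lets a consumer navigate in either direction without inference.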

Page 5: 2014.12 - Let's Disco - 2 (EDDI 2014)

<http://rdf-vocabulary.ddialliance.org/DDICV/SummaryStatisticType_1.0#ArithmeticMean>
    a skos:Concept ;
    skos:definition "Mathematical average of a set of values. The mean is calculated by adding up two or more values and dividing the total by their number. In social/political science, it is usually the sum of the measurements divided by the number of subjects, or cases."@en ;
    skos:inScheme <http://rdf-vocabulary.ddialliance.org/DDICV/SummaryStatisticType_1.0#CodeList> ;
    skos:notation "ArithmeticMean" ;
    skos:prefLabel "Arithmetic mean (X)"@en .

Page 6: 2014.12 - Let's Disco - 2 (EDDI 2014)

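[Diagram: SummaryStatisticsType#, SummaryStatisticsType_1.0# and SummaryStatisticsType_2.0# are each a skos:ConceptScheme. SummaryStatisticsType# links to SummaryStatisticsType_1.0# and to SummaryStatisticsType_2.0# via dcterms:hasVersion.]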

Page 7: 2014.12 - Let's Disco - 2 (EDDI 2014)

Versioning

<http://rdf-vocabulary.ddialliance.org/DDICV/SummaryStatisticType#>
    a skos:ConceptScheme ;
    dcterms:title "Base Scheme of Summary Statistic Type"@en ;
    dcterms:description "Specifies the type of summary statistic. Summary statistics are a single number representation of the characteristics of a set of values."@en ;
    owl:versionInfo "1.0" ;
    dcterms:hasVersion
        <http://rdf-vocabulary.ddialliance.org/DDICV/SummaryStatisticType_1.0#> ,
        <http://rdf-vocabulary.ddialliance.org/DDICV/SummaryStatisticType_2.0#> .

Page 8: 2014.12 - Let's Disco - 2 (EDDI 2014)

Variables

Page 9: 2014.12 - Let's Disco - 2 (EDDI 2014)

Relationships to other Vocabularies

Page 10: 2014.12 - Let's Disco - 2 (EDDI 2014)

Relationships to other vocabularies

• Data Cube

– For representing multidimensional aggregate data

• DCAT

– For representing collections (catalogs) of research datasets

– For providing additional information about physical aspects (file size, file formats) of research data files

• PROV-O

– For representing detailed provenance information, e.g. generation and aggregation of data, versioning information, etc.
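
To make the DCAT point about physical aspects concrete, here is a minimal sketch of a dataset with one file-level distribution. The ex: namespace, the resource names, the title and the values are all hypothetical. The byte size is typed as xsd:decimal here, but the expected datatype differs between DCAT versions, so check the version you target.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

# The dataset is the abstract unit listed in a catalog.
ex:SurveyDataSet a dcat:Dataset ;
    dcterms:title "Example survey data"@en ;
    dcat:distribution ex:SurveyDataFile .

# Physical aspects of the data file are described on the distribution.
ex:SurveyDataFile a dcat:Distribution ;
    dcat:byteSize "5120"^^xsd:decimal ;   # file size in bytes (made-up value)
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv> .
```

Keeping file size and format on the distribution rather than on the dataset lets one dataset be offered in several formats without duplicating its descriptive metadata.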

Page 11: 2014.12 - Let's Disco - 2 (EDDI 2014)

[Diagram: MicrodataDataSet_1 is a prov:Entity and a disco:LogicalDataSet. AggregatedDataSet_1 is a prov:Entity and a qb:DataSet. AggregatedDataSet_1 links to MicrodataDataSet_1 via prov:wasDerivedFrom.]

Page 12: 2014.12 - Let's Disco - 2 (EDDI 2014)

Simple Case

ddi:AggregatedDataSet_1 a prov:Entity ;
    prov:wasDerivedFrom ddi:MicrodataDataSet_1 .

ddi:MicrodataDataSet_1 a prov:Entity .

Page 13: 2014.12 - Let's Disco - 2 (EDDI 2014)

Complex Case

ddi:AggregatedDataSet_2 a prov:Entity ;
    prov:wasDerivedFrom ddi:MicrodataDataSet_2 ;
    prov:wasGeneratedBy ddi:AggregationActivity ;
    prov:qualifiedDerivation [
        a prov:Derivation ;
        prov:entity ddi:MicrodataDataSet_2 ;
        prov:hadActivity ddi:AggregationActivity
    ] .

ddi:AggregationActivity a prov:Activity .

ddi:MicrodataDataSet_2 a prov:Entity .

Page 14: 2014.12 - Let's Disco - 2 (EDDI 2014)

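[Diagram: DataCatalog_1 is a dcat:Catalog. EuropeanStudy_1 is a disco:Study and a dcat:CatalogRecord. EuropeanDataSet_1 is a disco:LogicalDataSet and a dcat:Dataset. DataCatalog_1 links to EuropeanStudy_1 via dcat:record and to EuropeanDataSet_1 via dcat:dataset.]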

Page 15: 2014.12 - Let's Disco - 2 (EDDI 2014)

ddi:DataCatalog_1 a dcat:Catalog ;
    dcat:record ddi:EuropeanStudy_1 ;
    dcat:dataset ddi:EuropeanDataSet_1 .

ddi:EuropeanStudy_1 a dcat:CatalogRecord, disco:Study ;
    disco:product ddi:EuropeanDataSet_1 .

ddi:EuropeanDataSet_1 a dcat:Dataset, disco:LogicalDataSet ;
    dcat:theme ddi:topics/WellBeing ;
    dcat:theme ddi:topics/PoliticalAttitudes ;
    dcat:keyword "Europe"@en ;
    dcat:keyword "Politics"@en .

Page 16: 2014.12 - Let's Disco - 2 (EDDI 2014)
Page 17: 2014.12 - Let's Disco - 2 (EDDI 2014)

ddi:DataCatalog_2 a dcat:Catalog ;
    dcat:record ddi:EuropeanStudy_2 ;
    dcat:record ddi:AggregatedEuropeanData_2 ;
    dcat:dataset ddi:EuropeanDataSet_2 ;
    dcat:dataset ddi:AggregatedEuropeanDataSet_2 .

ddi:EuropeanStudy_2 a dcat:CatalogRecord, disco:Study ;
    disco:product ddi:EuropeanDataSet_2 .

ddi:AggregatedEuropeanData_2 a dcat:CatalogRecord ;
    foaf:primaryTopic ddi:AggregatedEuropeanDataSet_2 .

ddi:EuropeanDataSet_2 a dcat:Dataset, disco:LogicalDataSet .

ddi:AggregatedEuropeanDataSet_2 a dcat:Dataset, qb:DataSet ;
    prov:wasDerivedFrom ddi:EuropeanStudy_2 .

Page 18: 2014.12 - Let's Disco - 2 (EDDI 2014)

PHDD

Page 19: 2014.12 - Let's Disco - 2 (EDDI 2014)
Page 20: 2014.12 - Let's Disco - 2 (EDDI 2014)

Mapping DDI-XML to Disco

Page 21: 2014.12 - Let's Disco - 2 (EDDI 2014)

Mapping DDI-XML to Disco

• Mappings only between Disco and DDI 3.1 of DDI-L in order to avoid inconsistencies

– Existing mapping documents between DDI 3.1 and other DDI versions (like DDI 3.2 and DDI 2.1) can be reused

• Availability

– Google Doc with mapping tables as basis for automatic generation

– Turtle file containing all mappings

– Mapping tables in HTML specification of Disco

• Mapping is still ongoing work

Page 23: 2014.12 - Let's Disco - 2 (EDDI 2014)

Bidirectional Mappings

• Only between Disco and DDI-L

– DDI-L ⤑ Disco: straightforward mapping for all items used in Disco

– Disco ⤑ DDI-L: straightforward mapping for all items in the disco namespace

• Only a standard XPath expression is defined as the mapping

• Context:

– Items from other vocabularies that are used in Disco need a context; only then is there a clear mapping path.

– Context information is necessary for mappings, e.g., skos:notation can be mapped to variable labels and to codes.

– Context information is either a SPARQL query or an informal description as a plain literal.

Page 24: 2014.12 - Let's Disco - 2 (EDDI 2014)

Mapping Representation

• A mapping ontology containing all mapping triples is available

• Generated automatically from the official mapping document

Page 25: 2014.12 - Let's Disco - 2 (EDDI 2014)

Mapping Representation

skos:notation a rdfs:Class, owl:Class ;
    disco:mapping [
        a disco:Mapping ;
        disco:ddi-L-Xpath "//l:Variable/l:VariableName" ;
        disco:ddi-L-Documentation "http://www.ddialliance.org/Specification/DDI-Lifecycle/3.1/XMLSchema/FieldLevelDocumentation/logicalproduct_xsd/elements/Variable.html" ;
        disco:context "skos:notation represents variable label" ;
        disco:context "SELECT ?notation WHERE { ?notation rdfs:domain ?variable. ?variable a disco:Variable. }"
    ] .
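
The slides note that skos:notation can be mapped to codes as well as to variable labels. A second mapping entry for the code case might look like the sketch below. The property names (disco:mapping, disco:ddi-L-Xpath, disco:context) are reused from the example above. The XPath expression, the wording of the informal context and the SPARQL query are assumptions made for illustration and are not taken from the official mapping document.

```turtle
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix disco: <http://rdf-vocabulary.ddialliance.org/discovery#> .

skos:notation disco:mapping [
    a disco:Mapping ;
    # Assumed XPath for a code value in DDI-L 3.1; verify against the schema.
    disco:ddi-L-Xpath "//l:Code/l:Value" ;
    # Informal context as a plain literal (hypothetical wording).
    disco:context "skos:notation represents the code value" ;
    # Formal context as a SPARQL query (hypothetical): the notation
    # belongs to a concept, which is how codes are modelled here.
    disco:context """SELECT ?notation WHERE {
        ?code a skos:Concept ;
              skos:notation ?notation .
    }"""
] .
```

Giving each context its own mapping node keeps the variable-label path and the code path separate, so a converter can pick the XPath whose context matches the resource at hand.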

Page 26: 2014.12 - Let's Disco - 2 (EDDI 2014)

DDI 4

Page 27: 2014.12 - Let's Disco - 2 (EDDI 2014)

Let's Disco Now!

Page 28: 2014.12 - Let's Disco - 2 (EDDI 2014)
Page 29: 2014.12 - Let's Disco - 2 (EDDI 2014)

Acknowledgements

26 experts from the statistical community and the Linked Data community, from 12 different countries, contributed to this work. They participated in the events listed below.

• 1st workshop on 'Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web' at Schloss Dagstuhl - Leibniz Center for Informatics, Germany in September 2011

• Working meeting in the course of the 3rd Annual European DDI Users Group Meeting (EDDI11) in Gothenburg, Sweden in December 2011

• 2nd workshop on 'Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web' at Schloss Dagstuhl - Leibniz Center for Informatics, Germany in October 2012

• Working meeting at GESIS - Leibniz Institute for the Social Sciences in Mannheim, Germany in February 2013