Towards Easy Matching Between Statistical Linked Data: Dimension Patterns
Hideto Sato and Wen Wen
2013/10/22 1
First International Workshop on Semantic Statistics (SemStats 2013)
22 October 2013, Sydney
Introduction
• For matching statistical data from different sources, upper concepts and schema-level links are important.
• Three problems:
  (1) Only a small number of upper concepts are available.
  (2) Certain patterns of dimension description prevent some schema-level links from being stated.
  (3) The use of external codes is hard to detect at the schema level.
• This paper focuses on (2) and (3) and proposes patterns of dimension description to address them.
Trial Matching
• Italian Immigration Statistics ⇒ the number of immigrants to Italy by birth country by year
• World Bank Statistics ⇒ the total population by country by year
• Integrated Statistics ⇒ the percentage of immigrants to Italy by country by year
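Computing the integrated statistic is a join on (country, year) followed by a division. A minimal Python sketch, assuming both sources have already been reduced to dictionaries keyed by the same country code and year; the codes and counts below are hypothetical placeholders, not ISTAT or World Bank figures. The sketch simply assumes the two code sets already agree, and establishing that agreement is the matching problem this talk addresses.

```python
# Hypothetical inputs keyed by (country code, year). The counts are
# illustrative placeholders, not real ISTAT or World Bank figures.
immigrants_to_italy = {("AL", 2010): 500, ("MA", 2010): 300, ("RO", 2010): 800}
total_population = {("AL", 2010): 100_000, ("MA", 2010): 75_000}


def immigrant_percentage(immigrants, population):
    """Join two series on (country, year); return immigrants as % of population.

    Keys missing from the population series are dropped: without a shared
    country identifier the observations cannot be matched.
    """
    result = {}
    for key, count in immigrants.items():
        total = population.get(key)
        if total:  # skip missing (None) or zero population
            result[key] = 100.0 * count / total
    return result


print(immigrant_percentage(immigrants_to_italy, total_population))
# {('AL', 2010): 0.5, ('MA', 2010): 0.4}
```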
[Figure: schema- and instance-level descriptions of the two datasets. Italian Immigration Statistics: istat:dataset-DCIS_POPSTRCIT (qb:DataSet) with structure istat:dsd-DCIS_POPSTRCIT, dimension istat:dimension-paesi, code list istat:code-paesi (top concept istat:code-paesi-al) and range istat:code-range-paesi. World Bank Statistics: dataset:world-development-indicators (qb:DataSet) with dimension sdmx-dimension:refArea or sdmx-dimension:visArea and code list classification:country (top concept classification:country/AL). Both AL codes are linked to http://sws.geonames.org/783754/ (skos:exactMatch, owl:sameAs). Candidate upper resources country-dimension (rdfs:subPropertyOf) and country-code-class (rdfs:subClassOf, with sdmx-code:Area) sit above both datasets.]
(1) What role does the dimension play?
• Place of residence
• Place of birth
(2) What type of code does the dimension use?
• Countries
• Domestic administrative areas
• River basins, and so on
(3) What common codes are available, preferably described at the schema level?
• Geonames
• DBpedia
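Common external codes make value-level matching mechanical: if a local code in each dataset is linked to the same Geonames URI, the two local codes can be treated as denoting the same area. A minimal Python sketch over plain (subject, predicate, object) tuples; the two triples follow our reading of the slide diagram (both AL codes linked to http://sws.geonames.org/783754/), and the function names are hypothetical.

```python
GEONAMES = "http://sws.geonames.org/"
LINK_PREDICATES = {"skos:exactMatch", "owl:sameAs"}

# Our reading of the slide diagram: each dataset's code for AL points to the
# same Geonames resource, one via skos:exactMatch, the other via owl:sameAs.
triples = {
    ("istat:code-paesi-al", "skos:exactMatch", GEONAMES + "783754/"),
    ("classification:country/AL", "owl:sameAs", GEONAMES + "783754/"),
}


def external_code(code, triples):
    """Return the Geonames URI a local code is linked to, or None."""
    for s, p, o in triples:
        if s == code and p in LINK_PREDICATES and o.startswith(GEONAMES):
            return o
    return None


def match_codes(codes_a, codes_b, triples):
    """Pair local codes from two datasets that share a Geonames URI."""
    by_external = {}
    for code in codes_b:
        ext = external_code(code, triples)
        if ext is not None:
            by_external[ext] = code
    pairs = []
    for code in codes_a:
        ext = external_code(code, triples)
        if ext in by_external:
            pairs.append((code, by_external[ext]))
    return pairs


print(match_codes(["istat:code-paesi-al"], ["classification:country/AL"], triples))
# [('istat:code-paesi-al', 'classification:country/AL')]
```

The lookup has to scan instance-level links, which is why the question asks for common codes to be visible at the schema level as well.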
Matching Data from Different Sources
The following questions are important for each dimension. For an area dimension:
• Dimension properties: What role does the dimension play? (place of birth, place of residence)
• Code class (range of the dimension): What type of code does the dimension use? (countries, domestic administrative areas, river basins)
• Code values: What common codes are available? (Geonames, DBpedia)
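One way to apply the three questions is to record the answers as a per-dimension profile and compare two dimensions question by question. A minimal sketch; the DimensionProfile class and the answers assigned to the trial-matching datasets are hypothetical illustrations, not part of QB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionProfile:
    """Hypothetical record of the three answers for one area dimension."""
    role: str                # (1) e.g. "place of birth", "place of residence"
    code_type: str           # (2) e.g. "country", "river basin"
    common_codes: frozenset  # (3) e.g. frozenset({"Geonames", "DBpedia"})


def compare(a, b):
    """Report, question by question, how two dimensions relate."""
    return {
        "same_role": a.role == b.role,
        "same_code_type": a.code_type == b.code_type,
        "shared_codes": sorted(a.common_codes & b.common_codes),
    }


# Assumed answers for the trial matching: immigrants by birth country
# versus total population by (residence) country.
birth_country = DimensionProfile("place of birth", "country", frozenset({"Geonames"}))
country = DimensionProfile("place of residence", "country", frozenset({"Geonames", "DBpedia"}))
print(compare(birth_country, country))
# {'same_role': False, 'same_code_type': True, 'shared_codes': ['Geonames']}
```

The comparison returns all three answers rather than a single yes/no: in the trial matching the roles differ, yet the code values can still be joined, and keeping the role visible prevents misreading the result.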
Matching Data from Different Sources (continued)
Of these questions, those for dimension properties and code classes are addressed by upper concepts, and the one for code values by a schema-level description.
QB and Upper Concepts
QB (the RDF Data Cube Vocabulary) provides a bridge to upper concepts by referring to the SDMX-RDF vocabulary.
Upper Concepts and SDMX-RDF
Upper concept ⇒ upper resource in SDMX-RDF:
• Dimension property
  – Place of birth ⇒ sdmx-dimension:visArea
  – Place of residence ⇒ sdmx-dimension:refArea
• Code class (range of dimension)
  – Area ⇒ sdmx-code:Area
  – Country ⇒ (not defined)
  – Domestic area ⇒ (not defined)
  – River basin ⇒ (not defined)
(sdmx-dimension:visArea has been removed from the current version of SDMX-RDF.)
Dimension Description in QB
[Figure: the local dimension property eg:refArea is rdfs:subPropertyOf the upper abstract dimension property sdmx-dimension:refArea; its rdfs:range is the local code class eg:UnitaryAuthority, which is rdfs:subClassOf the upper abstract code class sdmx-code:Area; its qb:codeList is the local code list eg:areaCodeList, whose skos:hasTopConcept | qb:hierarchyRoot is the local code eg:cardiff_00pt (rdf:type eg:UnitaryAuthority). The data structure definition refers to eg:refArea via qb:dimension.]
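Because each local resource points to an upper resource, a consumer can discover that two differently named local dimensions play the same role by following rdfs:subPropertyOf (and likewise rdfs:subClassOf for code classes). A minimal Python sketch; the triples encode our reading of the diagram, and other:residenceArea is a hypothetical dimension property from a second dataset.

```python
# Our reading of the "Dimension Description in QB" diagram, as
# (subject, predicate, object) tuples with prefixed names.
triples = {
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:refArea", "qb:codeList", "eg:areaCodeList"),
    ("eg:UnitaryAuthority", "rdfs:subClassOf", "sdmx-code:Area"),
    ("eg:areaCodeList", "skos:hasTopConcept", "eg:cardiff_00pt"),
    ("eg:cardiff_00pt", "rdf:type", "eg:UnitaryAuthority"),
    # Hypothetical dimension property from a second dataset.
    ("other:residenceArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
}


def ancestors(node, predicate, triples):
    """Everything reachable from node by repeatedly following predicate."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == predicate and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def shared_upper_properties(a, b, triples):
    """Upper dimension properties that two local properties both specialize."""
    return (ancestors(a, "rdfs:subPropertyOf", triples)
            & ancestors(b, "rdfs:subPropertyOf", triples))


print(shared_upper_properties("eg:refArea", "other:residenceArea", triples))
# {'sdmx-dimension:refArea'}
print(ancestors("eg:UnitaryAuthority", "rdfs:subClassOf", triples))
# {'sdmx-code:Area'}
```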
Anti-Patterns
• Two anti-patterns prevent schema-level links from being described properly:
  – Direct use of an abstract upper resource
  – Direct use of an external code class
Anti-Pattern: Direct Use of an Upper Resource
[Figure: the data structure definition points via qb:dimension directly to the upper abstract dimension property sdmx-dimension:refArea, with no local dimension property. The local code list eg:areaCodeList (root code eg:cardiff_00pt) and the local code class eg:UnitaryAuthority (rdfs:subClassOf sdmx-code:Area) are still present, but the place where the qb:codeList and rdfs:range links should attach is marked "?".]
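This anti-pattern can be spotted at the schema level: the DSD's dimension is itself an abstract upper property, so there is no local property on which to state rdfs:range or qb:codeList. A minimal detection sketch; treating only the two SDMX-RDF area properties named in this talk as upper resources is an assumption, and eg:dsd is a hypothetical DSD.

```python
# Assumption: only the two SDMX-RDF area properties named in this talk are
# treated as upper resources (visArea has since been removed from SDMX-RDF).
UPPER_PROPERTIES = {"sdmx-dimension:refArea", "sdmx-dimension:visArea"}


def direct_upper_dimensions(triples):
    """Dimensions that are upper resources used directly in a DSD.

    For brevity qb:dimension is attached straight to the DSD here; in QB it
    sits on a qb:component node.
    """
    return sorted(o for s, p, o in triples
                  if p == "qb:dimension" and o in UPPER_PROPERTIES)


anti_pattern = {("eg:dsd", "qb:dimension", "sdmx-dimension:refArea")}
local_pattern = {
    ("eg:dsd", "qb:dimension", "eg:refArea"),
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
}
print(direct_upper_dimensions(anti_pattern))   # ['sdmx-dimension:refArea']
print(direct_upper_dimensions(local_pattern))  # []
```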
The Pattern for Using a Local Code Class
[Figure: the local dimension property eg:refArea is rdfs:subPropertyOf sdmx-dimension:refArea and has rdfs:range eg:UnitaryAuthority (local code class), which is rdfs:subClassOf sdmx-code:Area; its qb:codeList eg:areaCodeList has the root code eg:cardiff_00pt (skos:hasTopConcept | qb:hierarchyRoot; rdf:type eg:UnitaryAuthority). The data structure definition refers to eg:refArea via qb:dimension.]
Anti-Pattern: Direct Use of an External Code Class
[Figure: the local dimension property eg:refArea is rdfs:subPropertyOf sdmx-dimension:refArea, but its rdfs:range is the external code class <http://www.geonames.org/ontology#Feature>, and the code list eg:areaCodeList has the external code <http://sws.geonames.org/2653822/> as qb:hierarchyRoot. The rdfs:subClassOf link to sdmx-code:Area is marked "?", since it would have to be stated on an external class.]
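Here the dimension's rdfs:range is a class the publisher does not own, so there is no local class on which to state rdfs:subClassOf sdmx-code:Area. A minimal detection sketch, assuming "external" simply means "outside the publisher's own namespace"; LOCAL_PREFIXES and eg:dsd are hypothetical.

```python
GN_FEATURE = "http://www.geonames.org/ontology#Feature"
LOCAL_PREFIXES = ("eg:",)  # hypothetical: the publisher's own namespace(s)


def external_ranges(triples):
    """Map each DSD dimension to its rdfs:range when that range is external."""
    dimensions = {o for s, p, o in triples if p == "qb:dimension"}
    return {
        s: o
        for s, p, o in triples
        if p == "rdfs:range" and s in dimensions
        and not o.startswith(LOCAL_PREFIXES)
    }


triples = {
    ("eg:dsd", "qb:dimension", "eg:refArea"),
    ("eg:refArea", "rdfs:subPropertyOf", "sdmx-dimension:refArea"),
    ("eg:refArea", "rdfs:range", GN_FEATURE),
}
print(external_ranges(triples))
# {'eg:refArea': 'http://www.geonames.org/ontology#Feature'}
```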
The Pattern for Using an External Code Class
[Figure: the local dimension property eg:refArea is rdfs:subPropertyOf sdmx-dimension:refArea and has rdfs:range eg:UnitaryAuthority, a local code class adapter that is rdfs:subClassOf sdmx-code:Area and owl:equivalentClass <http://www.geonames.org/ontology#Feature>. The code list eg:areaCodeList has the external code <http://sws.geonames.org/2653822/> as qb:hierarchyRoot; the data structure definition refers to eg:refArea via qb:dimension.]
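The adapter restores a local class on which both links can be stated: rdfs:subClassOf toward the upper class and owl:equivalentClass toward the external one. A minimal Python sketch that builds those triples and recovers the external class from the dimension; the helper names are hypothetical, and only the symmetry of owl:equivalentClass is taken into account (no further OWL reasoning).

```python
GN_FEATURE = "http://www.geonames.org/ontology#Feature"


def adapter_triples(dimension, adapter, upper_class, external_class):
    """Triples for a local adapter class standing in for an external class."""
    return {
        (dimension, "rdfs:range", adapter),
        (adapter, "rdfs:subClassOf", upper_class),         # link up, to SDMX-RDF
        (adapter, "owl:equivalentClass", external_class),  # link out, to Geonames
    }


def adapted_external_classes(dimension, triples):
    """External classes equivalent to the dimension's range class."""
    ranges = {o for s, p, o in triples if s == dimension and p == "rdfs:range"}
    found = set()
    for s, p, o in triples:
        if p == "owl:equivalentClass":
            # owl:equivalentClass is symmetric, so accept either direction.
            if s in ranges:
                found.add(o)
            elif o in ranges:
                found.add(s)
    return found


triples = adapter_triples("eg:refArea", "eg:UnitaryAuthority", "sdmx-code:Area", GN_FEATURE)
print(adapted_external_classes("eg:refArea", triples))
# {'http://www.geonames.org/ontology#Feature'}
```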
Alternate Code Class
When both local and external code classes are used, it is difficult to find out whether an external code class is employed.
We need a schema-level description for an alternate code class.
Using Local and External Code Classes
[Figure: the local pattern (eg:refArea with rdfs:range eg:UnitaryAuthority and qb:codeList eg:areaCodeList, root code eg:cardiff_00pt) alongside the external code <http://sws.geonames.org/2653822/> of rdf:type <http://www.geonames.org/ontology#Feature>. The local code is linked to the external code by skos:exactMatch | owl:sameAs, but no schema-level link (marked "?") connects the local code class to the external code class.]
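Without a schema-level statement, learning that an external code class is in play means walking the code list, following each code's skos:exactMatch or owl:sameAs link, and looking up the rdf:type of the target. A minimal sketch of that instance-level scan; the type triple for the Geonames resource is included locally for illustration, whereas in practice it would have to be retrieved from Geonames.

```python
GN_FEATURE = "http://www.geonames.org/ontology#Feature"
EXTERNAL_CODE = "http://sws.geonames.org/2653822/"
LINKS = ("skos:exactMatch", "owl:sameAs")
ROOTS = ("skos:hasTopConcept", "qb:hierarchyRoot")

triples = {
    ("eg:areaCodeList", "skos:hasTopConcept", "eg:cardiff_00pt"),
    ("eg:cardiff_00pt", "rdf:type", "eg:UnitaryAuthority"),
    ("eg:cardiff_00pt", "skos:exactMatch", EXTERNAL_CODE),
    # In practice this triple lives at Geonames and must be fetched.
    (EXTERNAL_CODE, "rdf:type", GN_FEATURE),
}


def external_classes_by_scan(code_list, triples):
    """Instance-level discovery: code list -> codes -> linked URIs -> their types.

    Simplified: only the code list's root codes are visited.
    """
    codes = {o for s, p, o in triples if s == code_list and p in ROOTS}
    linked = {o for s, p, o in triples if s in codes and p in LINKS}
    return {o for s, p, o in triples if s in linked and p == "rdf:type"}


print(external_classes_by_scan("eg:areaCodeList", triples))
# {'http://www.geonames.org/ontology#Feature'}
```

The cost grows with the number of codes and requires remote lookups, which is the difficulty the alternate code class slide describes.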
Proposal of an Additional Link (ext:altClass)
[Figure: the same local and external resources, with the local code class eg:UnitaryAuthority linked to the external code class <http://www.geonames.org/ontology#Feature> by ext:altClass, alongside the instance-level skos:exactMatch | owl:sameAs link between eg:cardiff_00pt and <http://sws.geonames.org/2653822/>.]
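With the proposed ext:altClass link, the same question becomes a single schema-level lookup from the dimension's range class, with no need to visit individual codes. A minimal sketch; only the prefixed name ext:altClass comes from the proposal, its namespace URI is not given here and is left unexpanded, and the function name is hypothetical.

```python
GN_FEATURE = "http://www.geonames.org/ontology#Feature"

triples = {
    ("eg:refArea", "rdfs:range", "eg:UnitaryAuthority"),
    ("eg:UnitaryAuthority", "ext:altClass", GN_FEATURE),  # proposed link
}


def alternate_classes(dimension, triples):
    """Alternate (external) code classes declared on the dimension's range."""
    ranges = {o for s, p, o in triples if s == dimension and p == "rdfs:range"}
    return {o for s, p, o in triples if s in ranges and p == "ext:altClass"}


print(alternate_classes("eg:refArea", triples))
# {'http://www.geonames.org/ontology#Feature'}
```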
From Our Survey
Number of DSDs (data structure definitions), out of 12, found in the endpoints listed at http://www.w3.org/2011/gld/wiki/Data_Cube_Implementations:
• Direct use of an upper resource: area dimension 3/12, time dimension 3/12
• Direct use of an external code class: area dimension 2/12, time dimension 8/12
• Use of alternate code classes: area dimension 10/12, time dimension 1/12
Conclusion
• We introduced dimension patterns for describing schema-level links, including references to upper resources and alternate class links.
• These patterns allow QB's descriptive power to be used to its full extent.
• However, only a few upper resources are available now. Therefore, the parts of the patterns concerning upper concepts are preparatory for the future.
• We think that enriching the upper resources suitable for statistical data is an urgent task.