Reducing Metadata Objects
Dan Gillman
November 14, 2014
Focus
Metadata describing data
Conforming to a standard may imply
– Creating too many objects
– Lack of meaningful roles
– Generating nightmares for
• Discovery
• Efficiency
• Management
• Semantic interoperability
Focus
Can this be helped?
Problems
– Illustrated by ISO/IEC 11179
Potential solution
– Incorporated into DDI-4
Preliminaries
Metadata definition
– Data used to describe some objects
Metadata are data first
– No data are always metadata; "metadata" is a relative concept
– The descriptive relationship is key
Preliminaries
Re-use
– Power of metadata management
– Write once, link many
– Similar to normalizing database schemas
– Allows for
• Sharing meanings
• Comparison
• Targeted search
• Efficient storage / retrieval
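To make "write once, link many" concrete, here is a minimal Python sketch, assuming a toy in-memory registry. The class names, the `registry` dict, and the `meaning` helper are hypothetical and are not part of ISO/IEC 11179 or DDI. Two data elements store only the identifier of one shared value domain, the way normalized tables share a foreign key instead of copying rows.

```python
from dataclasses import dataclass

# Hypothetical names for illustration; not an API of ISO/IEC 11179 or DDI.


@dataclass(frozen=True)
class ValueDomain:
    vd_id: str
    codes: dict[str, str]  # code -> meaning


@dataclass(frozen=True)
class DataElement:
    de_id: str
    name: str
    vd_id: str  # a link to a shared value domain, not a copy of it


# "Write once": the value domain is stored a single time.
registry = {"VD1": ValueDomain("VD1", {"m": "male", "f": "female"})}

# "Link many": several data elements point at the same value domain id,
# much as rows in normalized tables share a foreign key.
elements = [
    DataElement("DE1", "sex of patient", "VD1"),
    DataElement("DE2", "sex of person", "VD1"),
]


def meaning(de: DataElement, code: str) -> str:
    """Resolve a code through the data element's linked value domain."""
    return registry[de.vd_id].codes[code]


print(meaning(elements[1], "f"))  # female
print(len(registry), len(elements))  # 1 2
```

Because the meanings live in one place, a search for every element coded with VD1 is a simple lookup on the shared identifier.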
Preliminaries
Problem
– Dependencies
– Many-to-one relationships
Let B’ be a new version of B
But A can’t be related to both
– So a new version of A is needed as well
[Diagram: A associated with B (multiplicities 1 and 0..*); A associated with B’ (multiplicities 1 and 0..*)]
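The dependency problem can be sketched in Python, assuming registered objects are immutable once published (an assumption made here for illustration, not a quote from either standard). The classes `A` and `B` are hypothetical stand-ins for the boxes in the diagram: each `A` holds exactly one `B`, so versioning `B` forces a new `A`.

```python
from dataclasses import dataclass, replace

# Hypothetical classes mirroring the diagram; published objects are
# assumed immutable (frozen) for this sketch.


@dataclass(frozen=True)
class B:
    name: str
    version: int


@dataclass(frozen=True)
class A:
    name: str
    version: int
    b: B  # exactly one B per A; many A objects may share the same B


b = B("B", 1)
a = A("A", 1, b)

# Publish B', a new version of B; the published B is left untouched.
b_prime = replace(b, version=2)

# A holds exactly one B, so it cannot refer to both B and B'.
# Using B' therefore means publishing a new version of A as well.
a_prime = replace(a, version=2, b=b_prime)

print(a.b.version, a_prime.b.version)  # 1 2
```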
ISO/IEC 11179
About: description of data
Title: Metadata registries
Mechanism: organize semantics
6-part standard
– Framework (1)
– Classification (2)
– Metamodel (3)
– Definitions (4)
– Naming (5)
– Registration (6)
ISO/IEC 11179
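Basic model
[Diagram: basic ISO/IEC 11179 model. Conceptual level: Data Element Concept, Conceptual Domain. Representational level: Data Element, Value Domain. Each Data Element has one Data Element Concept and one Value Domain; each Data Element Concept and each Value Domain has one Conceptual Domain; each of these may be shared by 0..* objects.]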
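A minimal Python sketch of the basic model follows, assuming frozen dataclasses as a stand-in for registry records. The class and field names are hypothetical and only mirror the diagram labels; the standard's metamodel carries many more attributes. Each association with multiplicity 1 becomes a single required reference.

```python
from dataclasses import dataclass

# Hypothetical record types mirroring the diagram; not the normative metamodel.


@dataclass(frozen=True)
class ConceptualDomain:
    meanings: frozenset[str]


@dataclass(frozen=True)
class ValueDomain:
    permissible_values: tuple[tuple[str, str], ...]  # (code, meaning) pairs
    conceptual_domain: ConceptualDomain  # each VD represents one CD


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str
    property_name: str
    conceptual_domain: ConceptualDomain  # each DEC has one CD


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept  # exactly one DEC (conceptual level)
    value_domain: ValueDomain  # exactly one VD (representational level)


sex = ConceptualDomain(frozenset({"male", "female"}))
vd1 = ValueDomain((("m", "male"), ("f", "female")), sex)
dec = DataElementConcept("patient", "sex", sex)
de = DataElement(dec, vd1)

print(de.concept.object_class, de.concept.property_name)  # patient sex
```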
ISO/IEC 11179
New Object Class or Property
– Implies new Data Element Concept
• Implies new Data Element
Change in Permissible Values
– Implies new Value Domain
• Implies new Data Element
Similarly for a change in Value Meanings
– Implies new Permissible Values
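The cascade can be expressed as a walk over "refers to" links: when an object changes, everything that refers to it, directly or transitively, needs a new version. This is a sketch under the assumption that the dependencies are exactly those named on this slide; the `DEPENDS_ON` table and `must_version` function are hypothetical.

```python
from collections import deque

# Hypothetical "refers to" edges read from the slide: X -> things X refers to.
DEPENDS_ON = {
    "DataElement": {"DataElementConcept", "ValueDomain"},
    "DataElementConcept": {"ObjectClass", "Property"},
    "ValueDomain": {"PermissibleValue"},
    "PermissibleValue": {"ValueMeaning"},
}


def must_version(changed: str) -> list[str]:
    """Return the object kinds that need new versions, breadth-first."""
    # Invert the edges: for each kind, which kinds refer to it?
    dependents: dict[str, set[str]] = {}
    for obj, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(obj)
    seen: list[str] = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in sorted(dependents.get(current, ())):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


print(must_version("Property"))  # ['DataElementConcept', 'DataElement']
print(must_version("ValueMeaning"))  # ['PermissibleValue', 'ValueDomain', 'DataElement']
```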
Problems
11179
– One kind of data element
• No abstract vs. application distinction
– One kind of value domain
• Processing codes not separated
Processing steps
– Sentinel values
• Missing, etc.
• Software and application dependent
Problems
Dimensional data
Tables
– Many cells
– Each cell its own data element?
• No means to differentiate cells
Time series
– Similar problem
Data Documentation Initiative (DDI)
Social science data libraries and archives
Since 1995
– Consortium-based since 2005
DDI Alliance
– University of Michigan
DDI
Two development threads
Codebook
– From earlier work
– Latest version 2.5
Lifecycle
– Includes processing
– Latest version 3.2
Both rendered in XML Schema
– Complex to read and use
DDI
Modernization (DDI-4)
– Upgrade for Lifecycle
– Rendered in UML
– Built in sections
– Following the Generic Statistical Information Model (GSIM)
• Built under the UNECE Statistical Division
• DDI is a profile (ISO/IEC TR 10000-1)
DDI Variables
Differs from 11179
Data element types
– Conceptual
• No object class
• Only has a Conceptual Domain
– Represented
• Inherits from Conceptual Variable
• Has an object class (called Unit Type), e.g., people, establishments
• Has a Value Domain: substantive (subject-matter related)
DDI Variables
– Instance
• Inherits from Represented Variable
• Has a Universe: a specialized object class, e.g., patients, hospitals
• Has a second Value Domain: sentinel (processing related)
No DEC: it is implied
Specificity cascade
– For the 11179 Property (DDI Variable)
– For the 11179 Object Class (DDI Unit Type)
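One way to picture the specificity cascade is as plain class inheritance. This Python sketch assumes the field names shown (`unit_type`, `substantive_vd`, `universe`, `sentinel_vd` are hypothetical); DDI-4 itself is specified in UML, and this is not an official binding of it.

```python
from dataclasses import dataclass

# Hypothetical Python rendering of the DDI variable cascade; field names
# are illustrative, not taken from the DDI-4 model.


@dataclass
class ConceptualVariable:
    name: str
    conceptual_domain: frozenset[str]  # meanings only; no object class


@dataclass
class RepresentedVariable(ConceptualVariable):
    unit_type: str  # plays the role of an 11179 object class
    substantive_vd: dict[str, str]  # subject-matter codes: code -> meaning


@dataclass
class InstanceVariable(RepresentedVariable):
    universe: str  # a specialization of the unit type
    sentinel_vd: dict[str, str]  # processing codes, e.g. missing values


iv = InstanceVariable(
    name="sex",
    conceptual_domain=frozenset({"male", "female"}),
    unit_type="Person",
    substantive_vd={"m": "male", "f": "female"},
    universe="Patient",
    sentinel_vd={".D": "Don't know", ".R": "Refused"},
)

# The instance variable is also a represented and a conceptual variable.
print(isinstance(iv, RepresentedVariable), isinstance(iv, ConceptualVariable))  # True True
```

Each level adds only what is new (unit type and substantive codes, then universe and sentinel codes), which is where the reduction in objects comes from.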
DDI Variables
Value Domain growth
– Due to changing codes
– 11179
• Substantive × Sentinel
– DDI
• Substantive + Sentinel
Data Element growth
– About the same
– DDI is much more specific
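The growth rule can be checked with a little arithmetic. Assuming s substantive codings and k application-specific sentinel sets, 11179 needs one combined value domain per pairing (s × k), while DDI stores each once (s + k). The function names below are hypothetical.

```python
# Hypothetical helpers for the value domain growth rule.


def value_domains_11179(substantive: int, sentinel_sets: int) -> int:
    # Sentinel codes live inside the value domain, so every
    # (substantive coding, sentinel set) pairing is its own VD: a product.
    return substantive * sentinel_sets


def value_domains_ddi(substantive: int, sentinel_sets: int) -> int:
    # Substantive and sentinel VDs are separate objects combined by an
    # instance variable, so each is stored once: a sum.
    return substantive + sentinel_sets


for k in range(1, 5):
    print(k, value_domains_11179(2, k), value_domains_ddi(2, k))
```

With s = 2 and k = 3 this gives 6 and 5, matching the example slides (in the 11179 case VD1 and VD2 are also kept, for 8 in total).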
DDI Variables
[Diagram: DDI variable cascade. Conceptual (with Conceptual Domain) → Represented (with Substantive Value Domain) → Instance (with Sentinel Value Domain)]
Example
DDI: sex of a patient
Conceptual variable (CV) = sex
– CD = {male, female}
Represented variables (RV1 and RV2)
– Inherit from CV
– Unit type = Person
– VD1 = {<m, male>, <f, female>} for RV1
– VD2 = {<0, male>, <1, female>} for RV2
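The represented variables can be sketched as two codings that share one conceptual variable. This Python sketch assumes plain dicts for the objects, and the `decode` helper is hypothetical. It resolves a stored code through the variable's own value domain and checks that the meaning belongs to the inherited conceptual domain.

```python
# Hypothetical dict-based objects for the example; not a DDI API.

# Conceptual variable: a name and a conceptual domain, no unit type.
CV = {"name": "sex", "cd": {"male", "female"}}

# Two represented variables inherit from the CV and differ only in coding.
RV1 = {"parent": CV, "unit_type": "Person", "vd": {"m": "male", "f": "female"}}
RV2 = {"parent": CV, "unit_type": "Person", "vd": {0: "male", 1: "female"}}


def decode(rv: dict, code: object) -> str:
    """Map a stored code to its meaning through the variable's value domain."""
    meaning = rv["vd"][code]
    # Every meaning must come from the inherited conceptual domain.
    if meaning not in rv["parent"]["cd"]:
        raise ValueError(f"{meaning!r} is not in the conceptual domain")
    return meaning


print(decode(RV1, "f"), decode(RV2, 1))  # female female
```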
Example
DDI: for 3 applications (SAS, SPSS, Excel)
– Sentinel CD = {Don’t know, Refused}
– Universe = Patient (specialization of Person)
Instance variables (IV) for SAS
– Two: inherit from RV1 or RV2
– SenVD = {<.D, Don’t know>, <.R, Refused>}
Example
DDI: instance variables (IV) for SPSS
– Two: inherit from RV1 or RV2
– SenVD = {<-998, Don’t know>, <-999, Refused>}
Instance variables (IV) for Excel
– Two: inherit from RV1 or RV2
– User-defined sentinel codes
– SenVD = {<_d, Don’t know>, <_r, Refused>}
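Decoding a value from an instance variable means consulting two value domains: the application's sentinel codes and the inherited substantive codes. This is a minimal Python sketch under stated assumptions: all codes are modeled as plain Python values (real SAS special missing values such as .D are not strings), and the table names and `decode` function are hypothetical.

```python
# Hypothetical tables and helper for illustration; not a DDI API.
# Codes are plain Python values; real SAS special missing values
# such as .D are not strings.
SUBSTANTIVE = {
    "RV1": {"m": "male", "f": "female"},
    "RV2": {0: "male", 1: "female"},
}
SENTINEL = {
    "SAS": {".D": "Don't know", ".R": "Refused"},
    "SPSS": {-998: "Don't know", -999: "Refused"},
    "Excel": {"_d": "Don't know", "_r": "Refused"},
}

# One instance variable per (represented variable, application) pair.
INSTANCE_VARIABLES = {(rv, app) for rv in SUBSTANTIVE for app in SENTINEL}


def decode(rv: str, app: str, value: object) -> str:
    """Check the application's sentinel codes first, then the substantive codes."""
    if (rv, app) not in INSTANCE_VARIABLES:
        raise KeyError(f"no instance variable for {rv} in {app}")
    if value in SENTINEL[app]:
        return SENTINEL[app][value]
    return SUBSTANTIVE[rv][value]


print(len(INSTANCE_VARIABLES))  # 6
print(decode("RV2", "SPSS", -999))  # Refused
print(decode("RV1", "Excel", "f"))  # female
```

Keeping the two value domains apart means a new application adds one sentinel table, not a new copy of every substantive coding.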
Example
DDI: total objects (18)
– 1 Unit Type
– 1 Universe
– 1 CV
– 2 RV
– 6 IV
– 2 CD (substantive and sentinel)
– 5 VD
– Including much inheritance
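The DDI total can be reproduced from the structure of the example rather than counted by hand. In this sketch the `tally_ddi` function is hypothetical, and it assumes one substantive VD per represented variable: instance variables grow as RVs × applications, and value domains as substantive + sentinel.

```python
# Hypothetical tally of the DDI example; assumes one substantive VD per RV
# and one sentinel VD per application.


def tally_ddi(represented: int = 2, applications: int = 3) -> dict[str, int]:
    """Object counts for the DDI example, derived from its structure."""
    return {
        "Unit Type": 1,  # Person
        "Universe": 1,  # Patient
        "CV": 1,  # sex
        "RV": represented,  # RV1, RV2
        "IV": represented * applications,  # one IV per (RV, application)
        "CD": 2,  # one substantive, one sentinel
        "VD": represented + applications,  # substantive VDs + sentinel VDs
    }


counts = tally_ddi()
print(counts["IV"], counts["VD"], sum(counts.values()))  # 6 5 18
```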
Example
11179: sex of patient
Object class = patient
Property = sex
DEC = sex of patient
CD = {male, female}
VD1 = {<m, male>, <f, female>}
VD2 = {<0, male>, <1, female>}
Two DEs, one for each VD
Example
11179: 2 more abstract DEs
– Correspond to CV in DDI
Sex of person
Object class = person
Property = sex
DEC = sex of person
CD = {male, female}
Need VD1 and VD2, too
Example
11179: DEs for processing?
– Missing sentinels for each application
– Need 6 VDs, one CD, 6 DEs
CD = {male, female, don’t know, refused}
VD3 = {m, f, .D, .R} (SAS)
VD4 = {0, 1, .D, .R} (SAS)
VD5 = {m, f, -998, -999} (SPSS)
Etc.
Example
11179: total objects (25)
– 2 Object Class
– 1 Property
– 2 DEC
– 2 CD
– 8 VD
– 10 DE
– Little inheritance
– Each new application adds two more VDs (one per substantive VD)
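The 11179 total follows the same kind of arithmetic. This sketch (a hypothetical `tally_11179` function) assumes the breakdown given on the example slides: two patient-level DEs, two abstract person-level DEs, and one processing DE and one processing VD per pairing of substantive VD and application.

```python
# Hypothetical tally of the 11179 example, following the slide breakdown.


def tally_11179(substantive_vds: int = 2, applications: int = 3) -> dict[str, int]:
    """Object counts for the 11179 example."""
    processing = substantive_vds * applications  # one per (VD, application)
    return {
        "Object Class": 2,  # patient, person
        "Property": 1,  # sex
        "DEC": 2,  # sex of patient, sex of person
        "CD": 2,  # with and without sentinel meanings
        "VD": substantive_vds + processing,  # VD1, VD2 plus processing VDs
        "DE": 2 * substantive_vds + processing,  # patient, person, processing DEs
    }


counts = tally_11179()
print(counts["VD"], counts["DE"], sum(counts.values()))  # 8 10 25
```

Under these assumptions, adding a fourth application raises the 11179 total to 29 (two more VDs and two more DEs).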
Contact Information
Dan Gillman, Information Scientist
Office of Survey Methods Research
www.bls.gov/osmr