
Reducing Metadata Objects

Dan Gillman

November 14, 2014

Focus

Metadata describing data
Conforming to a standard may imply
– Creating too many objects
– Lack of meaningful roles
Generating nightmares for
– Discovery
– Efficiency
– Management
– Semantic interoperability

Focus

Can this be helped?
Problems
– Illustrated by ISO/IEC 11179
Potential solution
– Incorporated into DDI-4

Preliminaries

Metadata
Definition:
– Data used to describe some objects
Metadata are data first
– No data are always metadata: it is a “relative” concept
– Descriptive relationship is key

Preliminaries

Re-use
Power of metadata management
Write once – Link many
Similar to normalizing database schemas
Allows for
– Sharing meanings
– Comparison
– Targeted search
– Efficient storage / retrieval
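To make "write once, link many" concrete, here is a minimal Python sketch. The `Registry` class, its method names, and the dataset names are hypothetical and chosen only for illustration. The definition is stored once and each user holds a link to it rather than a copy, much like a foreign key in a normalized schema.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical registry: each definition is stored once, then linked."""
    definitions: dict[str, str] = field(default_factory=dict)
    links: dict[str, list[str]] = field(default_factory=dict)

    def write_once(self, key: str, definition: str) -> None:
        # Refuse a second definition under the same key.
        if key in self.definitions:
            raise ValueError(f"{key!r} is already defined")
        self.definitions[key] = definition
        self.links[key] = []

    def link(self, key: str, user: str) -> None:
        # Record a reference to the definition instead of copying it.
        self.links[key].append(user)


registry = Registry()
registry.write_once("sex", "Sex of a person")
for dataset in ("survey_a", "survey_b", "survey_c"):
    registry.link("sex", dataset)

print(len(registry.definitions))  # 1 definition
print(len(registry.links["sex"]))  # 3 links to it
```

Because every dataset points at the same entry, comparison and targeted search reduce to following links.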

Preliminaries

Problem
Dependencies
Many-to-One relationships
Let B’ be new version of B
But A can’t be related to both

[Diagram: A associated with B, multiplicity labels 1 and 0..*; the same association drawn again for A and B’]
ISO/IEC 11179

About – description of data
Title – Metadata registries
Mechanism – organize semantics
6 part standard
– Framework (1)
– Classification (2)
– Metamodel (3)
– Definitions (4)
– Naming (5)
– Registration (6)

ISO/IEC 11179

Basic model:

[Diagram: Data Element Concept and Conceptual Domain at the conceptual level; Data Element and Value Domain at the representational level; associations between them labelled with multiplicities 1 and 0..*]

ISO/IEC 11179

Plus:

[Diagram: Data Element Concept (0..*) associated with Object Class (0..1) and with Property (0..1)]

ISO/IEC 11179

New Object Class or Property
Implies new Data Element Concept
– Implies new Data Element
Change in Permissible Values
Implies new Value Domain
– Implies new Data Element
Similarly for change in Value Meanings
Implies new Permissible Values
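The cascade can be sketched in Python with immutable objects. This is an illustration, not the 11179 metamodel: the class and field names are assumptions, and the Data Element Concept is reduced to a plain string. Because a Data Element is tied to exactly one Value Domain, a change of codes produces a new Value Domain and therefore a new Data Element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueDomain:
    name: str
    permissible_values: frozenset  # (code, meaning) pairs


@dataclass(frozen=True)
class DataElement:
    name: str
    concept: str               # stands in for the Data Element Concept
    value_domain: ValueDomain  # exactly one Value Domain per Data Element


vd1 = ValueDomain("VD1", frozenset({("m", "male"), ("f", "female")}))
de1 = DataElement("DE1", "sex of patient", vd1)

# Changing the codes cannot happen in place: it yields a new Value Domain...
vd2 = ValueDomain("VD2", frozenset({("0", "male"), ("1", "female")}))
# ...and, since DE1 stays tied to VD1, a new Data Element as well.
de2 = DataElement("DE2", de1.concept, vd2)

registered = {vd1, vd2, de1, de2}
print(len(registered))  # 4 objects after one change of codes
```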

Problems

11179
One kind of data element
– No distinction between abstract and application
One kind of value domain
– Processing codes not separated
Processing steps
Sentinel values
– Missing, etc.
Software and application dependent

Problems

Dimensional data
Tables
– Many cells
– Each cell its own data element?
  • No means to differentiate cells
Time series
– Similar problem

Data Documentation Initiative (DDI)

Social Science data libraries and archives
Since 1995
Consortium based since 2005
DDI Alliance
University of Michigan

DDI

2 development threads
Codebook
– From earlier work
– Latest version 2.5
Lifecycle
– Includes processing
– Latest version 3.2
Both rendered in XML-Schema
Complex to read and use

DDI

Modernization (DDI-4)
Upgrade for Lifecycle
Rendered in UML
Built in sections
Following Generic Statistical Information Model
– Built under UNECE Statistical Division
– DDI is a Profile (ISO/IEC TR 10000-1)

DDI Variables

Differs from 11179 Data Element
Types
– Conceptual
  • No object class
  • Only has Conceptual Domain
– Represented
  • Inherits from Conceptual Variable
  • Has object class (called Unit Type)
    E.g., People, Establishment
  • Has Value Domain
    Substantive – subject matter related

DDI Variables

– Instance
  • Inherits from Represented Variable
  • Has Universe – specialized Object Class
    E.g., Patients, Hospitals
  • Has second Value Domain
    Sentinel – processing related
No DEC – implied
Specificity cascade
– For 11179 Property (DDI Variable)
– For 11179 Object Class (DDI Unit Type)
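The three variable types and their inheritance chain can be sketched as Python dataclasses. The class and field names below are illustrative assumptions, not DDI-4 identifiers. The sample values are taken from the "sex of a patient" example in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualVariable:
    name: str
    conceptual_domain: frozenset  # meanings only, no codes


@dataclass(frozen=True)
class RepresentedVariable(ConceptualVariable):
    unit_type: str                # plays the role of the 11179 object class
    substantive_vd: frozenset     # (code, meaning) pairs, subject matter related


@dataclass(frozen=True)
class InstanceVariable(RepresentedVariable):
    universe: str                 # specialization of the unit type
    sentinel_vd: frozenset        # (code, meaning) pairs, processing related


iv = InstanceVariable(
    name="sex",
    conceptual_domain=frozenset({"male", "female"}),
    unit_type="Person",
    substantive_vd=frozenset({("m", "male"), ("f", "female")}),
    universe="Patient",
    sentinel_vd=frozenset({(".D", "Don't Know"), (".R", "Refused")}),
)

# An instance variable is also a represented and a conceptual variable.
print(isinstance(iv, RepresentedVariable), isinstance(iv, ConceptualVariable))
```

Keeping the sentinel codes in a separate field is what lets the substantive Value Domain be shared across applications.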

DDI Variables

Value Domain growth
– Due to changing codes
– 11179
  • Substantive * Sentinel
– DDI
  • Substantive + Sentinel
Data Element growth
– About the same
– DDI is much more specific
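A small Python sketch of the two growth rates follows. The function names are hypothetical. It assumes that in 11179 every substantive code set has to be merged with every sentinel code set, while DDI keeps the two kinds of Value Domain separate.

```python
def merged_value_domains(substantive: int, sentinel: int) -> int:
    """11179 style: one merged domain per (substantive, sentinel) pairing."""
    return substantive * sentinel


def separate_value_domains(substantive: int, sentinel: int) -> int:
    """DDI style: substantive and sentinel domains are registered separately."""
    return substantive + sentinel


# Two substantive code sets, with a growing number of sentinel code sets.
for sentinel_sets in (3, 5, 10):
    print(
        sentinel_sets,
        merged_value_domains(2, sentinel_sets),
        separate_value_domains(2, sentinel_sets),
    )
```

With 2 substantive and 3 sentinel code sets, the merged approach needs 6 domains and the separate approach 5. The gap widens as more applications bring their own sentinel codes.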

DDI Variables

[Diagram: Conceptual, Represented, and Instance variables, linked to Conceptual Domain, Substantive Value Domain, and Sentinel Value Domain respectively]

DDI Variables

[Diagram: Conceptual, Represented, and Instance variables; Represented linked to Unit Type, Instance linked to Universe]

Example

DDI
Sex of a patient
Conceptual variable (CV) = sex
– CD = {male, female}
Represented variables (RV1 and RV2)
– Inherit from CV
– Unit type = Person
– VD1 = {<m, male>, <f, female>} for RV1
– VD2 = {<0, male>, <1, female>} for RV2

Example

DDI
For 3 applications: SAS, SPSS, Excel
– Sentinel CD = {Don’t know, Refused}
– Universe = Patient (specialization of Person)
Instance variable (IV) – for SAS
– Two – inherit from RV1 or RV2
– SenVD = {<.D, Don’t Know>, <.R, Refused>}

Example

DDI
Instance variable (IV) – for SPSS
– Two – inherit from RV1 or RV2
– SenVD = {<-998, Don’t Know>, <-999, Refused>}
Instance variable (IV) – for Excel
– Two – inherit from RV1 or RV2
– User defined sentinel codes
– SenVD = {<_d, Don’t Know>, <_r, Refused>}

Example

DDI
Total objects (18)
– 1 Unit Type
– 1 Universe
– 1 CV
– 2 RV
– 6 IV
– 2 CD (sub & sen)
– 5 VD
– Including much inheritance
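The total of 18 can be checked by enumerating the objects of the example. This is a Python sketch and the labels are made up for illustration. The six instance variables arise as one per pairing of a represented variable with an application.

```python
from itertools import product

represented = ["RV1", "RV2"]
applications = ["SAS", "SPSS", "Excel"]

# One instance variable per (represented variable, application) pair.
instance = [f"IV_{rv}_{app}" for rv, app in product(represented, applications)]

ddi_objects = {
    "Unit Type": ["Person"],
    "Universe": ["Patient"],
    "CV": ["sex"],
    "RV": represented,
    "IV": instance,
    "CD": ["substantive", "sentinel"],
    # Two substantive domains plus one sentinel domain per application.
    "VD": ["VD1", "VD2"] + [f"SenVD_{app}" for app in applications],
}

total = sum(len(objs) for objs in ddi_objects.values())
print(total)  # 18
```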

Example

11179
Sex of patient
Object class = patient
Property = sex
DEC = sex of patient
CD = {male, female}
VD1 = {<m, male>, <f, female>}
VD2 = {<0, male>, <1, female>}
Two DE’s, one for each VD

Example

11179
2 more abstract DE’s
Correspond to CV in DDI
Sex of person
Object class = person
Property = sex
DEC = sex of person
CD = {male, female}
Need VD1 and VD2, too

Example

11179
DE’s for processing?
– Missing sentinels for each application
– Need 6 VD’s, one CD, 6 DE’s
CD = {male, female, don’t know, refused}
VD3 = {m, f, .d, .r} (SAS)
VD4 = {0, 1, .d, .r} (SAS)
VD5 = {m, f, -998, -999} (SPSS)
Etc.

Example

11179
Total objects (25)
– 2 Object Class
– 1 Property
– 2 DEC
– 2 CD
– 8 VD
– 10 DE
– Little inheritance
– Each new application -> twice the VD’s
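The 11179 total can be reproduced the same way. In this Python sketch the labels are hypothetical and the grouping follows the preceding example slides: two plain Value Domains, six merged ones (each substantive code set paired with each application's sentinel codes), and one Data Element per Value Domain in each context.

```python
from itertools import product

substantive = ["m/f", "0/1"]
applications = ["SAS", "SPSS", "Excel"]

plain_vds = [f"VD({codes})" for codes in substantive]
# Sentinel codes are merged in, so every pairing needs its own Value Domain.
merged_vds = [f"VD({codes}+{app})" for codes, app in product(substantive, applications)]

data_elements = (
    [f"DE(patient, {vd})" for vd in plain_vds]      # 2 for sex of patient
    + [f"DE(person, {vd})" for vd in plain_vds]     # 2 abstract, sex of person
    + [f"DE(patient, {vd})" for vd in merged_vds]   # 6 for processing
)

objects_11179 = {
    "Object Class": ["patient", "person"],
    "Property": ["sex"],
    "DEC": ["sex of patient", "sex of person"],
    "CD": ["{male, female}", "{male, female, don't know, refused}"],
    "VD": plain_vds + merged_vds,
    "DE": data_elements,
}

total = sum(len(objs) for objs in objects_11179.values())
print(total)  # 25
```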

Example

11179
Less specificity
More objects
Lack of constructs

Contact Information

Dan Gillman
Information Scientist
Office of Survey Methods Research
www.bls.gov/osmr

[email protected]