SDMX Advanced Topics on Technical Standards
Arofan Gregory and Chris Nelson
SDMX Capacity Building Workshop Washington January 11 2007
Advanced Topics (1)
• Many of these will be presented in the context of a live prototype system
– Data structures
– Provisioning metadata
– Registry interfaces
• Submit structure and provisioning metadata
• Query for structure
• Register data and metadata set
• Query for registered data and metadata sets
– Alignment with other standards
Advanced Topics (2)
• Others will be presented by explanation and example
– Hierarchical Code Set
– Structure Set
– Reporting Taxonomy
– Services based architecture
• Notification
• RSS feed
SDMX Technical Standards
Information Model and Technical Specifications: High Level Overview (Reminder)
Information Model: High Level Schematic
• Category Scheme: comprises subject or reporting categories
• Category: can have child categories
• Data or Metadata Flow: can be linked to categories in multiple category schemes; uses specific data/metadata structure (Structure Definition); can get data/metadata from multiple data/metadata providers
• Data Provider: can provide data/metadata for many data/metadata flows using agreed data/metadata structure; publishes/reports data sets or metadata sets
• Provision Agreement
• Data Set or Metadata Set: conforms to business rules of the data/metadata flow
• Registered Data or Metadata Set: registers existence of data and metadata sets (URL, registration date etc.)
• Structure Maps: structure and code list maps
SDMX Technical Standards
SDMX Registry
SDMX Registry/Repository
• REPOSITORY, Structural Metadata (Submit, Query): describes data and metadata structures
• REPOSITORY, Provisioning Metadata (Submit, Query): describes data and metadata sources and reporting processes
• REGISTRY, Data Set/Metadata Set (Register, Query): indexes data and metadata
• Subscription/Notification: applications can subscribe to notification of new or changed objects
SDMX Registry Interfaces
SDMX Artefacts: Registry Contents
[Schematic: Category Scheme, Category, Data or Metadata Flow, Data Provider, Provision Agreement and Structure Definition, with the same relationships as in the high level schematic, plus the registered Data Set (URL, registration date etc.; registers existence of data and metadata sets) and Structure Maps (structure and code list maps)]
Registry Interfaces
[Schematic: the registry contents grouped as Structural Metadata, Provisioning Metadata, and Registered Data and Metadata; the artefacts shown are Category Scheme, Category, Data or Metadata Flow, Structure Definition and Structure Maps]
SDMX Technical Standards
Practical Examples
Flow of FAO CountrySTAT-RegionSTAT Implementation
[Diagram: CountrySTAT National Publication Server(s), the FAO SDMX Registry and the RegionSTAT Regional Publication Server, connected by flows numbered 1, 2, 3a, 3b and 4]
SDMX in Action: Prototype System
FOOD AND AGRICULTURE ORGANIZATIONOF THE UNITED NATIONS
Slide courtesy of the FAO
Prototype System: Explanation
1 CountryStat National Publication Server
• The web site is published from the files in CountryStat
SDMX Publication
• The new CountryStat files are converted to SDMX-ML data sets and made web accessible on the CountryStat web site
• These files are registered in the FAO SDMX Registry
RegionStat Regional Publication Server
• Queries the registry for new registrations; the registry responds with registration details, including the URLs of the new data sets
• Retrieves the new data sets from the CountryStat web site
• Converts the SDMX-ML files to an internal format and integrates the new data sets with existing RegionStat data sets
• Re-publishes the RegionStat web site
Diagram step markers: 1, 2, 3a, 3b, 4
Slide courtesy of the FAO
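The RegionStat side of this flow (query the registry for new registrations, then retrieve each data set from its URL) can be sketched as a small harvesting loop. This is an illustration only: the in-memory `registry` list, the `fetch` callable, the field names and the example.org URLs are all assumptions, not the FAO implementation or an SDMX registry API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass
class Registration:
    """Registry entry: metadata about a data set, not the data set itself."""
    provision_agreement: str
    url: str
    registered_at: datetime


def new_registrations(registry: List[Registration], since: datetime) -> List[Registration]:
    """The 'query the registry' step: registrations made after `since`."""
    return [r for r in registry if r.registered_at > since]


def harvest(registry: List[Registration], since: datetime,
            fetch: Callable[[str], str]) -> Dict[str, str]:
    """Retrieve each newly registered data set from the URL the registry returned."""
    return {r.url: fetch(r.url) for r in new_registrations(registry, since)}


# Hypothetical registry contents and a stand-in for an HTTP download.
registry = [
    Registration("PA-1", "http://example.org/countrystat/benin/area.xml", datetime(2007, 1, 1)),
    Registration("PA-2", "http://example.org/countrystat/benin/production.xml", datetime(2007, 1, 10)),
]
harvested = harvest(registry, since=datetime(2007, 1, 5),
                    fetch=lambda url: f"<SDMX-ML from {url}>")
```

Conversion to an internal format and re-publication would follow the harvest step; they are omitted here because the slides do not describe them in detail.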
SDMX Technical Standards
Data Structure Definitions: Registration and Query
Data Set and Structure
Reference Region
Commodity
Frequency and Time
Observation Value
Measure Type
Unit and Unit Multiplier
Measurement = 1,000 Kg
Data Set: Structure
• Comprises
– Concepts that identify the observation value (Dimensions)
– Concepts that add additional metadata about the observation value (Attributes)
– Concept that is the observation value (Measure)
– Any of these may be (Representation)
• coded
• text
• date/time
• number
• etc.
Data Set and Structure
• Dimensions: Frequency, Reference Region, Commodity, Time
• Measure: Observation Value
• Attributes: Unit, Unit Multiplier
• Example: Measurement = 1,000 Kg
Data Structure Definition
• Dimensions: concepts that identify the observation
• Attributes: concepts that add metadata
• Measures: concepts that are the observed phenomenon
• Key Group Key: concepts that identify groups of keys
• Each Dimension, Attribute and Measure takes its semantic from a Concept and has a format given by a Representation
Registry Contents - DSD
• Data Structure Definition: AGRICULTURE_COMMODITY
• Dimensions: FREQ, REF_AREA_REG, COMMODITY, TIME
• Attributes: UNIT, UNIT_MULT
• Measure: OBS_VALUE
• Code lists: CL_FREQ, CL_AREA_CTY, CL_COMMODITY, CL_MEASURE_UNIT, CL_UNIT_MULT
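As a rough illustration of how dimensions, attributes and the measure fit together, the DSD above can be modelled in a few lines. The class names, the pairing of each concept with a code list, and the code values "A", "29" and "WHEAT" are assumptions made for this sketch; they are not taken from the SDMX-ML schema or from the FAO registry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Component:
    concept: str                     # concept the component takes its semantic from
    code_list: Optional[str] = None  # representation; None means uncoded


@dataclass
class DataStructureDefinition:
    id: str
    dimensions: List[Component]
    attributes: List[Component]
    measure: Component

    def key(self, values: Dict[str, str]) -> str:
        """Identify an observation by its dimension values, in DSD order."""
        missing = [d.concept for d in self.dimensions if d.concept not in values]
        if missing:
            raise ValueError(f"missing dimension values: {missing}")
        return ".".join(values[d.concept] for d in self.dimensions)


# Concept/code list pairing below is assumed from the names on the slide.
dsd = DataStructureDefinition(
    id="AGRICULTURE_COMMODITY",
    dimensions=[
        Component("FREQ", "CL_FREQ"),
        Component("REF_AREA_REG", "CL_AREA_CTY"),
        Component("COMMODITY", "CL_COMMODITY"),
        Component("TIME"),
    ],
    attributes=[Component("UNIT", "CL_MEASURE_UNIT"), Component("UNIT_MULT", "CL_UNIT_MULT")],
    measure=Component("OBS_VALUE"),
)

# Hypothetical code values, for illustration only.
obs_key = dsd.key({"FREQ": "A", "REF_AREA_REG": "29", "COMMODITY": "WHEAT", "TIME": "2005"})
```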
Registry Interfaces: Submit Structure
Data Structure Definition Artefacts
Registry Interfaces: Submit Structure
Registry Interfaces: Query Structure
Query for KeyFamily with resolveReferences set to “true” will return all related Concepts and Code Lists
Registry Interfaces: Query Structure
The registry will respond with all DSDs maintained by the FAOSTAT agency
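The effect of resolveReferences can be pictured as following references from the queried artefact to everything it depends on. The sketch below uses a hypothetical in-memory table of references; the identifier strings and the function name are assumptions, and a real registry would answer with SDMX-ML rather than a list of ids.

```python
from typing import Dict, List

# Hypothetical registry index: artefact id -> ids of the artefacts it references.
REFERENCES: Dict[str, List[str]] = {
    "KeyFamily:AGRICULTURE_COMMODITY": [
        "Concept:FREQ", "CodeList:CL_FREQ",
        "Concept:COMMODITY", "CodeList:CL_COMMODITY",
    ],
    "Concept:FREQ": [],
    "CodeList:CL_FREQ": [],
    "Concept:COMMODITY": [],
    "CodeList:CL_COMMODITY": [],
}


def query_structure(artefact_id: str, resolve_references: bool = False) -> List[str]:
    """Return the artefact alone, or the artefact plus everything it references."""
    if artefact_id not in REFERENCES:
        return []
    if not resolve_references:
        return [artefact_id]
    result: List[str] = []
    stack = [artefact_id]
    while stack:
        current = stack.pop()
        if current in result:
            continue  # already returned; also guards against reference cycles
        result.append(current)
        stack.extend(reversed(REFERENCES.get(current, [])))
    return result


only_dsd = query_structure("KeyFamily:AGRICULTURE_COMMODITY")
with_refs = query_structure("KeyFamily:AGRICULTURE_COMMODITY", resolve_references=True)
```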
SDMX Technical Standards
Dataflows, Data Providers, Category Scheme
Registry Contents – Other Structures
• Structure Definition: FAOSTAT:AGRICULTURE_COMMODITY
• Data Flows: FAOSTAT:AGRICULTURE_AREA, FAOSTAT:AGRICULTURE_PRODUCTION
• Data Providers: FAOSTAT:OS_FAO_DATA_PROVIDER.29 (Bénin), FAOSTAT:OS_FAO_DATA_PROVIDER.42 (Burkina Faso), FAOSTAT:OS_FAO_DATA_PROVIDER.66 (Côte d’Ivoire), FAOSTAT:OS_FAO_DATA_PROVIDER.217 (Sénegal)
• Category Scheme: SDMX:SDMXStatSubMatDomainsWD1 (adoption of UNECE Classification of International Statistical Activities)
• Category: SDMX:SDMXStatSubMatDomainsWD1.Domain_2.C4.C1 (Economic Statistics.Sectoral Statistics.Agriculture, forestry, fisheries)
• The data flows are connected to the relevant Category in the Category Scheme
Registry Interface: Submit Structure
Artefacts
Registry Interface: Submit Structure
Category Scheme
Registry Interface: Submit Structure
Links the Dataflow to the (Subject Matter Domain) Category
Data Providers
Dataflow
SDMX Technical Standards
Submit Provision Agreements
Registry Contents – Structure and Provisioning
• Structure Definition: FAOSTAT:AGRICULTURE_COMMODITY
• Data Flows: FAOSTAT:AGRICULTURE_AREA, FAOSTAT:AGRICULTURE_PRODUCTION
• Data Providers: FAOSTAT:OS_FAO_DATA_PROVIDER.29 (Bénin), FAOSTAT:OS_FAO_DATA_PROVIDER.42 (Burkina Faso), FAOSTAT:OS_FAO_DATA_PROVIDER.66 (Côte d’Ivoire), FAOSTAT:OS_FAO_DATA_PROVIDER.217 (Sénegal)
• Category Scheme: SDMX:SDMXStatSubMatDomainsWD1 (adoption of UNECE Classification of International Statistical Activities)
• Category: SDMX:SDMXStatSubMatDomainsWD1.Domain_2.C4.C1 (Economic Statistics.Sectoral Statistics.Agriculture, forestry, fisheries)
• Provision Agreements: there are eight provision agreements, one for each combination of Data Provider and Data Flow
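The count of eight follows directly from pairing each of the four data providers with each of the two data flows. A minimal sketch, using the identifiers from the slides; representing a provision agreement as a plain (provider, flow) tuple is an assumption made for illustration.

```python
from itertools import product

DATA_PROVIDERS = [
    "FAOSTAT:OS_FAO_DATA_PROVIDER.29",   # Bénin
    "FAOSTAT:OS_FAO_DATA_PROVIDER.42",   # Burkina Faso
    "FAOSTAT:OS_FAO_DATA_PROVIDER.66",   # Côte d'Ivoire
    "FAOSTAT:OS_FAO_DATA_PROVIDER.217",  # Sénegal
]
DATAFLOWS = ["FAOSTAT:AGRICULTURE_AREA", "FAOSTAT:AGRICULTURE_PRODUCTION"]

# One provision agreement per (data provider, data flow) combination.
provision_agreements = list(product(DATA_PROVIDERS, DATAFLOWS))
```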
Registry Interface: Submit Provision Agreement
Unique Id. of the Dataflow
Unique Id. of the Data Provider
Registry Interface: Submit Provision Agreement Response
The status indicates success or failure
Registry Interface: Submit Provision Agreement Response
The response returns the URN as well as confirmation of the provisioning details submitted
SDMX Structured URNs
• The URNs in SDMX are compound identifiers which reflect the relationships described in the information model
– They are unique and predictable
– They can be easily validated
– They function exactly like URLs for the registry
• Each identifier tells you which organization maintains the identified object
• Each identifier tells you which agency maintains the scheme from which the identifier comes
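URN Structure
urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement=FAOSTAT:OS_FAO_DATA_PROVIDER.29.FAOSTAT:AGRICULTURE_PRODUCTION
• ProvisionAgreement: the Provision Agreement, identified by its Data Provider and Data Flow
• FAOSTAT (first occurrence): Maintenance Agency of the Data Provider Scheme
• OS_FAO_DATA_PROVIDER: Data Provider Scheme
• 29: Data Provider
• FAOSTAT (second occurrence): Maintenance Agency of the Dataflow
• AGRICULTURE_PRODUCTION: Dataflow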
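Because the URN is compound and predictable, its parts can be recovered mechanically. The sketch below splits the Provision Agreement URN shown above; the regular expression reflects only the layout of this one example and is an assumption, not the normative URN syntax from the SDMX specification.

```python
import re
from typing import NamedTuple

PREFIX = "urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement="

# Assumed layout: agency:scheme.provider.agency:dataflow (taken from the slide's example).
BODY = re.compile(
    r"(?P<provider_agency>[^:.]+):(?P<provider_scheme>[^:.]+)\.(?P<provider_id>[^:.]+)"
    r"\.(?P<flow_agency>[^:.]+):(?P<dataflow>[^:.]+)"
)


class ProvisionAgreementRef(NamedTuple):
    provider_agency: str
    provider_scheme: str
    provider_id: str
    flow_agency: str
    dataflow: str


def parse_provision_agreement_urn(urn: str) -> ProvisionAgreementRef:
    """Split a Provision Agreement URN into its data provider and dataflow parts."""
    if not urn.startswith(PREFIX):
        raise ValueError("not a Provision Agreement URN")
    match = BODY.fullmatch(urn[len(PREFIX):])
    if match is None:
        raise ValueError("unexpected identifier layout")
    return ProvisionAgreementRef(**match.groupdict())


ref = parse_provision_agreement_urn(
    PREFIX + "FAOSTAT:OS_FAO_DATA_PROVIDER.29.FAOSTAT:AGRICULTURE_PRODUCTION"
)
```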
SDMX Technical Standards
Register a Data Set
Data Set Registration
• The data set is “registered” against the provision agreement
• The registry stores metadata (e.g. URL, registration date etc.) about the data set: it does not store the data set
• The registration registers the existence of the data set
Registry Contents – Data Set Registrations
• Structure Definition: FAOSTAT:AGRICULTURE_COMMODITY
• Data Flows: FAOSTAT:AGRICULTURE_AREA, FAOSTAT:AGRICULTURE_PRODUCTION
• Data Providers: FAOSTAT:OS_FAO_DATA_PROVIDER.29 (Bénin), FAOSTAT:OS_FAO_DATA_PROVIDER.42 (Burkina Faso), FAOSTAT:OS_FAO_DATA_PROVIDER.66 (Côte d’Ivoire), FAOSTAT:OS_FAO_DATA_PROVIDER.217 (Sénegal)
• Data Set Metadata: URL, registration date etc.
• There can be eight data sets registered, one for each Provision Agreement
Registry Interface: Data Set Registration
Action is “replace”, “append” etc.
An SDMX-ML file is a simple datasource
Identifies the Provision Agreement either by URN or by Dataflow and Data Provider
Registry Interface: Data Set Registration
URL of the SDMX-ML file
URN of the Provision Agreement
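The registration step can be pictured as storing a small record against a known Provision Agreement. The sketch below is a toy in-memory registry: the class and field names and the example.org URL are assumptions, and only the two actions named on the slide are accepted because the slide's "etc." is not spelled out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional

VALID_ACTIONS = {"replace", "append"}  # only the actions named on the slide


@dataclass
class Registration:
    provision_agreement_urn: str
    datasource_url: str          # URL of the SDMX-ML file; the file itself is not stored
    action: str
    registered_at: datetime


class Registry:
    def __init__(self, known_agreements: Iterable[str]):
        self._agreements = set(known_agreements)
        self.registrations: List[Registration] = []

    def register(self, urn: str, url: str, action: str = "replace",
                 now: Optional[datetime] = None) -> Registration:
        """Record that a data set exists at `url` for the given Provision Agreement."""
        if urn not in self._agreements:
            raise KeyError(f"unknown provision agreement: {urn}")
        if action not in VALID_ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        registration = Registration(urn, url, action, now or datetime.now(timezone.utc))
        self.registrations.append(registration)
        return registration


PA_URN = ("urn:sdmx:org.sdmx.infomodel.registry.ProvisionAgreement="
          "FAOSTAT:OS_FAO_DATA_PROVIDER.29.FAOSTAT:AGRICULTURE_PRODUCTION")
registry = Registry([PA_URN])
registration = registry.register(PA_URN, "http://example.org/benin/production.xml")
```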
SDMX Technical Standards
Query for a Data Set
Query for Data Sets
• Data flows: AGRICULTURE_AREA, AGRICULTURE_PRODUCTION
• Data providers: 29 - Bénin, 42 - Burkina Faso, 66 - Côte d’Ivoire, 217 - Sénegal
• Each Provision Agreement can have registered Data Set Metadata
• Query for Data Sets
– for all Provision Agreements linked to a Data Flow, or
– linked to a specific Provision Agreement
Registry Interface: Query for Data Sets
QueryType is “DataSets”, “MetadataSets” etc.
Registry Interface: Query for Data Sets
Could be done with URN or as shown here with explicit fields
Registry Interface: Data Set Query Response
URL of the SDMX-ML file
Identification of the Provision Agreement
Registry Interface: Data Set Query Response
Note that the URN of the registered data set includes the date and time of registration
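The two query modes (all Provision Agreements linked to a Data Flow, or one specific Provision Agreement) amount to filtering the registrations on one or two fields. A minimal sketch with hypothetical registrations; the field names and URLs are assumptions, and the real interface exchanges SDMX-ML messages rather than Python objects.

```python
from typing import List, NamedTuple, Optional


class RegisteredDataSet(NamedTuple):
    data_provider: str
    dataflow: str
    url: str


# Hypothetical registrations, for illustration only.
REGISTERED = [
    RegisteredDataSet("29", "AGRICULTURE_PRODUCTION", "http://example.org/29/production.xml"),
    RegisteredDataSet("42", "AGRICULTURE_PRODUCTION", "http://example.org/42/production.xml"),
    RegisteredDataSet("29", "AGRICULTURE_AREA", "http://example.org/29/area.xml"),
]


def query_data_sets(dataflow: str, data_provider: Optional[str] = None) -> List[RegisteredDataSet]:
    """With only a dataflow, match every Provision Agreement linked to it;
    with a data provider as well, match the one specific Provision Agreement."""
    return [
        r for r in REGISTERED
        if r.dataflow == dataflow and (data_provider is None or r.data_provider == data_provider)
    ]


by_flow = query_data_sets("AGRICULTURE_PRODUCTION")
by_agreement = query_data_sets("AGRICULTURE_PRODUCTION", data_provider="29")
```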
SDMX Technical Standards
Metadata Structure Definition
Metadata – Reported according to a Quality Framework
Metadata pertaining to a Quality Framework are reported in a Metadata Set, whose structure is defined by a Metadata Structure Definition
Metadata Attribute Metadata Attribute: Metadata Content
Metadata Reporting
• “Quality” metadata about published or reported data sets are linked to the Provision Agreement, or the Data Flow, or the Data Provider
• Data flows: AGRICULTURE_AREA, AGRICULTURE_PRODUCTION
• Data providers: 29 - Bénin, 42 - Burkina Faso, 66 - Côte d’Ivoire, 217 - Sénegal
Identify Structure
•Concepts
•Hierarchies
•Representation (e.g. code list)
Metadata Structure Definition (MSD)
Metadata Structure Definition
• The Metadata Structure Definition uses defined concepts
• Full Target Identifier, Partial Target Identifier: defines “keys” of object types to which metadata can be “attached”
• Identifier Components: specifies the identifier components (“key”) of the target object
• Target Object Type: identifies target object type of the component
• Item Scheme: identifies the code list from which the value of the (key) component must be taken when metadata is reported
• Report Structure
MSD – Defining the Metadata Report
• Metadata Structure Definition: can comprise the specification of one or more reports
• Report Structure: contains Metadata Attributes
• Metadata Attributes: can have hierarchy; each takes semantic and context from a Concept
• Concept: concept defined in a Concept Scheme
• Format and Permitted Value List: definition of format and permitted values
MSD – Complete Picture
• The Metadata Structure Definition uses defined concepts
• Full Target Identifier, Partial Target Identifier: defines “keys” of object types to which metadata can be “attached”
• Identifier Components: specifies the identifier components (“key”) of the target object
• Target Object Type: identifies target object type of the component
• Item Scheme: identifies the code list from which the value of the (key) component must be taken when metadata is reported
• Report Structure: specifies to which object types the Report can be “attached”
• Metadata Attributes: can have hierarchy; each takes semantic and context from a Concept
• Concept: concept defined in a Concept Scheme
• Format and Permitted Value List: definition of format and permitted values
MSD – Identification of the Target
• Metadata Structure Definition: QUALITY_METADATA
• Identifiers shown: P_AGREEMENT, AGENCY, DATAFLOW
• Target Object Types: Dataflow, DataProvider
• Item Schemes: FAOSTAT:OS_FAO_DATA_PROVIDER, FAOSTAT:DATAFLOWS
MSD Metadata Concepts: Data Quality
Concepts (Concept Id: Description)
• DISSEMINATION_FORMATS*: Refers to the various means of dissemination used for making the data available to the public. It would include a description of the various formats available, including where and how to get the information (paper, electronic formats, longer time series)
• FREQUENCY_PERIODICITY*: Frequency refers to the time interval between the observations of a time series. Periodicity refers to the frequency of compilation of the data (e.g., a time series could be available at annual frequency but the underlying data are compiled monthly, thus have a monthly periodicity).
• PERIODICITY: The frequency of compilation of the data
• FREQUENCY: The time interval between the observations of a time series
• RELEASE_CALENDAR*: Describes the policy regarding the release of statistics according to a preannounced schedule and its availability. It also contains the release calendar information
MSD – Data Quality Report
• Report Structure: DATA_QUALITY_REPORT
• Concept Scheme: FAOSTAT:METADATA_CONCEPTS
• Metadata Attributes: SCOPE_COVERAGE, FREQUENCY_PERIODICITY, PERIODICITY, DISSEMINATION_FORMATS, SOURCE_DATA, REFERENCE_PERIOD, ADVANCE_RELEASE_CALENDAR, TIMELINESS
• Format and Permitted Value List: this varies depending on the Metadata Attribute: Scope_Coverage, Source_Data are text, Reference_Period is Date/Time, and the remainder are linked to a Code List
• Hierarchy: the reporting hierarchy must respect the concept hierarchy. No additional reporting hierarchy is specified
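One way to picture the role of the format and permitted value list is as a per-attribute check applied to a reported metadata set. The sketch below follows the formats stated for the Data Quality Report; the `PERIODICITY` code values and the function name are hypothetical, since the slides do not name the code lists.

```python
from datetime import datetime
from typing import Dict, List

TEXT, DATETIME, CODED = "text", "date/time", "coded"

# Formats as stated for the Data Quality Report.
DATA_QUALITY_REPORT: Dict[str, str] = {
    "SCOPE_COVERAGE": TEXT,
    "SOURCE_DATA": TEXT,
    "REFERENCE_PERIOD": DATETIME,
    "FREQUENCY_PERIODICITY": CODED,
    "PERIODICITY": CODED,
    "DISSEMINATION_FORMATS": CODED,
    "ADVANCE_RELEASE_CALENDAR": CODED,
    "TIMELINESS": CODED,
}

# Hypothetical permitted values; the real code lists are not given in the slides.
CODE_LISTS = {"PERIODICITY": {"A", "Q", "M"}}


def validate_report(report: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the report passes these checks."""
    errors = []
    for attribute, value in report.items():
        fmt = DATA_QUALITY_REPORT.get(attribute)
        if fmt is None:
            errors.append(f"{attribute}: not part of the report structure")
        elif fmt == DATETIME:
            try:
                datetime.fromisoformat(value)
            except ValueError:
                errors.append(f"{attribute}: not a date/time")
        elif fmt == CODED and attribute in CODE_LISTS and value not in CODE_LISTS[attribute]:
            errors.append(f"{attribute}: code {value!r} is not permitted")
    return errors


ok = validate_report({"SCOPE_COVERAGE": "National", "REFERENCE_PERIOD": "2005-01-01", "PERIODICITY": "A"})
bad = validate_report({"REFERENCE_PERIOD": "last year", "PERIODICITY": "Z", "UNKNOWN": "x"})
```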
MSD Metadata Concepts: Contact
Concepts (Concept Id: Description)
• CONTACT*: An instance of a role of an individual or an organization (or organization part or organization person) to whom an information item(s), a material object(s) and/or person(s) can be sent to or from in a specified context.
• NAME: The identity, expressed in natural language, of a person or organisation
• PERSON_NAME: The identity, expressed in natural language, of a person
• ORGANISATION_NAME: The identity, expressed in natural language, of an organisation
• ADDRESS: The identity of a building, a house or other structure.
• BUSINESS_ADDRESS: The address at which a business is located.
• E-MAIL_ADDRESS: The address of an electronic mailbox.
• TELEPHONE_NUMBER: The number by which a natural person or organisation can be contacted by telephone
MSD – Contact Report
• Report Structure: CONTACT_REPORT
• Concept Scheme: FAOSTAT:METADATA_CONCEPTS
• Metadata Attributes: CONTACT, NAME, PERSON_NAME, ORGANISATION_NAME, ADDRESS, BUSINESS_ADDRESS, E-MAIL_ADDRESS, TELEPHONE_NUMBER
• Format and Permitted Value List: all Metadata Attributes are text
• Hierarchy: the reporting hierarchy must respect the concept hierarchy but may also introduce an additional hierarchy. In this report the Contact Metadata Attribute is the parent of all other Metadata Attributes
MSD Metadata Concepts: Advance Release Calendar
Concepts (Concept Id: Description)
• REFERENCE_PERIOD: The time period to which a variable refers
• RELEASE_DATE_TIME: The specific point in time that data or metadata are made available
• DATE_TOLERANCE: The possible or permissible variance of a time period relative to a known point in time.
• RELEASE_STATUS: The state of preparedness of a statement on the availability of data or metadata
• ANNOTATION: Additional metadata
MSD – Advance Release Calendar
• Report Structure: ARC_REPORT
• Concept Scheme: FAOSTAT:METADATA_CONCEPTS
• Metadata Attributes: REFERENCE_PERIOD, RELEASE_DATE_TIME, RELEASE_STATUS, ANNOTATION, DATE_TOLERANCE
• Format and Permitted Value List: this varies depending on the Metadata Attribute: Reference_Period and Release_Date_Time are Date/Time, Release_Status is linked to a Code List, Date_Tolerance and Annotation are text
• Hierarchy: the reporting hierarchy must respect the concept hierarchy but may also introduce an additional hierarchy.
MSD - Identifiers
MSD – Report Structure
Metadata Set
Metadata Set: Quality Report
Metadata Set: Contact Report
SDMX Technical Standards
Metadata Provisioning
Registry Contents - Metadata Provisioning
• Metadata Structure Definition: FAOSTAT:QUALITY_METADATA
• Metadata Flows: FAOSTAT:QUALITY_REPORT, FAOSTAT:ARC_REPORT, FAOSTAT:CONTACT_REPORT
• Data Providers: FAOSTAT:OS_FAO_DATA_PROVIDER.29 (Bénin), FAOSTAT:OS_FAO_DATA_PROVIDER.42 (Burkina Faso), FAOSTAT:OS_FAO_DATA_PROVIDER.66 (Côte d’Ivoire), FAOSTAT:OS_FAO_DATA_PROVIDER.217 (Sénegal)
• Category Scheme: SDMX:SDMXStatSubMatDomainsWD1 (adoption of UNECE Classification of International Statistical Activities)
• Category: SDMX:SDMXStatSubMatDomainsWD1.Domain_2.C4.C1 (Economic Statistics.Sectoral Statistics.Agriculture, forestry, fisheries)
• Provision Agreements: there are 12 provision agreements, one for each combination of Data Provider and Metadata Flow
Submit Provision Agreement to the Registry
• This is identical in form to that submitted for Data except the Data Provider is paired with a Metadataflow instead of a Dataflow
SDMX Technical Standards
Register and Query for Metadata
Metadata Registration and Query
• Metadata flows: QUALITY_REPORT, ARC_REPORT, CONTACT_REPORT
• Data providers: 29 - Bénin, 42 - Burkina Faso, 66 - Côte d’Ivoire, 217 - Sénegal
• Each Provision Agreement can have registered Metadata Set Metadata
Register and Query for Metadata
• This is identical in form to the query and response for data except the artefact is a metadata set conforming to the business rules of a metadata flow instead of a data set conforming to the business rules of a data flow
SDMX Technical Standards
Hierarchical Code Lists
Hierarchical Code Lists – Example Scenario
• France is a country
• France is part of the continent of Europe
• France is a member of NATO
• France is a member of the EU
• France is a member of the G10
• When I analyse statistics I might want to see totals by
– continent
– trading bloc
– military alliance
– financial grouping
• France will be grouped with different sets of countries depending on the “view” required
• How do we express these groupings?
Reference Area
6B NATO
B0 EU
B1 NAFTA
BE Belgium
BG Bulgaria
CA Canada
CH Switzerland
CZ Czech Republic
DE Germany
DK Denmark
E1 Europe
E8 North America
EE Estonia
ES Spain
FI Finland
FR France
GB United Kingdom
GR Greece
HU Hungary
JP Japan
I2 Euro 12
IT Italy
NL Netherlands
US United States
Each Code/Parent row below is a Code Association (it relates a code to a parent code); the rows that share a parent form a Code Composition.
Europe
Code Parent
BE E1
BG E1
CH E1
CZ E1
DE E1
DK E1
EE E1
ES E1
FI E1
FR E1
GB E1
etc
EU countries
Code Parent
BE E0
CZ E0
DE E0
DK E0
EE E0
ES E0
FI E0
FR E0
GB E0
etc
NATO countries
Code Parent
BE 6B
BG 6B
CA 6B
CZ 6B
DE 6B
DK 6B
EE 6B
ES 6B
FR 6B
GB 6B
etc
NAFTA countries
Code Parent
CA B1
US B1
MX B1
North America
Code Parent
CA B1
US B1
G10 countries
Code Parent
BE G0
CA G0
CH G0
DE G0
FR G0
GB G0
JP G0
IT G0
NL G0
SE G0
US G0
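The idea that one code belongs to several groupings, and that totals depend on the chosen "view", can be sketched with the parent tables above. Only the codes actually listed are used (the tables end in "etc", so they are partial); the dictionary layout, the function names and the observation values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# view name -> (parent code, child codes), copied from the tables above (partial lists).
HIERARCHIES: Dict[str, Tuple[str, List[str]]] = {
    "Europe": ("E1", ["BE", "BG", "CH", "CZ", "DE", "DK", "EE", "ES", "FI", "FR", "GB"]),
    "NATO countries": ("6B", ["BE", "BG", "CA", "CZ", "DE", "DK", "EE", "ES", "FR", "GB"]),
    "NAFTA countries": ("B1", ["CA", "US", "MX"]),
    "G10 countries": ("G0", ["BE", "CA", "CH", "DE", "FR", "GB", "JP", "IT", "NL", "SE", "US"]),
}


def views_containing(code: str) -> List[str]:
    """List every hierarchy in which the code has a parent."""
    return [view for view, (_, codes) in HIERARCHIES.items() if code in codes]


def total_for_view(view: str, observations: Dict[str, float]) -> Tuple[str, float]:
    """Sum the observations of the child codes under the view's parent code."""
    parent, codes = HIERARCHIES[view]
    return parent, sum(observations.get(code, 0) for code in codes)


france_views = views_containing("FR")
# Hypothetical observation values.
nafta_total = total_for_view("NAFTA countries", {"CA": 1, "US": 2, "MX": 3, "FR": 10})
```

The same code list serves every view; only the associations between codes and parents change.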
Code Association
[Diagram: a Code List of Codes, arranged into Hierarchy-1 to Hierarchy-4, each made up of Code Compositions]
Schematic of the Hierarchical Code Scheme
• Hierarchical Code Scheme: comprises hierarchies
• Hierarchy: comprises code groups; a level based hierarchy has formal levels, a value based hierarchy has code groups
• Level: comprises code groups
• Code Composition: groups codes with the same parent
• Code Association: relates a code to a parent code; has Properties (properties of the association)
• Code: belongs to a Code List. The codes may be in a variety of code lists.
Item Scheme Maps
• Many types of “item scheme” use the same fundamental structure
– Code list
– Category scheme
– Concept scheme
• Two Item Schemes can be mapped
Schematic of the “Code” Mapping
• Item Scheme Association: has a source item scheme and a target item scheme; has item associations
• Item Association: has a source item and a target item
• Item Schemes and their Items: Code List (Code), Category Scheme (Category), Concept Scheme (Concept)
• Item Scheme Associations: Code List Map, Category Scheme Map, Concept Scheme Map
• Association Role
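In its simplest form a code list map is a table of source item to target item, applied when data coded against one list must be read against another. The sketch below maps ISO 3166-1 alpha-2 country codes to the data provider numbers used in the slides; treating those numbers as a target code list is an assumption made purely for illustration, as are the function name and the observation values.

```python
from typing import Dict, List, Tuple

# Hypothetical Code List Map: source code (ISO alpha-2) -> target code (provider number).
CODE_LIST_MAP: Dict[str, str] = {"BJ": "29", "BF": "42", "CI": "66", "SN": "217"}


def map_codes(observations: Dict[str, float],
              code_map: Dict[str, str]) -> Tuple[Dict[str, float], List[str]]:
    """Re-key observations through the map; report source codes with no association."""
    mapped: Dict[str, float] = {}
    unmapped: List[str] = []
    for source_code, value in observations.items():
        target_code = code_map.get(source_code)
        if target_code is None:
            unmapped.append(source_code)
        else:
            mapped[target_code] = value
    return mapped, unmapped


mapped, unmapped = map_codes({"BJ": 1.5, "SN": 2.0, "FR": 9.9}, CODE_LIST_MAP)
```

Reporting the unmapped codes, rather than dropping them silently, makes gaps in the map visible to whoever maintains it.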
Structure Maps
• Structures can also be mapped
– Data structures
– Metadata structures
Structure Sets
Structure Map
Code List Map
Information Model: Summary
• Supports data and metadata reporting and exchange
– Data and metadata structure definitions
– Data and metadata sets
• Supports the process of reporting and exchange
– Data/metadata providers
– Data/metadata flows
– Provision agreements
• Supports registration
– Data and metadata sets
– Data and metadata can be linked
• Supports query
– Categories linked to data and metadata
– Constraints for finer grained queries
– Retrieval of metadata linked to data
• Supports data analysis, comparison and conversion
– Hierarchical code schemes
– Structure, Concept, Code, Category maps
Data/Metadata Reporting, Query, Analysis, Mapping
[Schematic: Category Scheme, Category, Data or Metadata Flow, Data Provider, Provision Agreement, Structure Definition, Data Set or Metadata Set, Registered Data Set or Metadata Set, and Structure and Item Scheme Maps, with a Content Constraint and an Attachment Constraint]
SDMX Technical Standards
Reporting Taxonomy
Reporting Taxonomy
• An SDMX Reporting Taxonomy is a group of data flows and/or metadata flows which form the basis of a single real-world document or report
• They can be organized into groups and sub-groups as needed
• They can be named and identified
• Useful for managing various types of reports over time
SDMX Technical Standards
Processes
Processes
• SDMX 2.0 provides the ability to document the steps and logic of a process flow
• This is not executable, but serves as documentation to describe the processes which produce data and metadata
• It is useful as a target for the attachment of reference metadata describing processing
SDMX Technical Standards
Services Based Architecture
What is Services-Based Architecture?
• A “services-based architecture” (or services-oriented architecture, SOA) is an architecture that supports distributed applications
– Each service, or component, can exist elsewhere on the network, typically the Internet
– The services are coordinated by the use of registries and event notifications
– They communicate using XML messages (like SDMX-ML)
• This type of architecture can be very powerful when data sources and metadata sources are available in standard formats, using standard protocols
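Coordination through a registry and event notifications can be sketched as a publish/subscribe arrangement: an application subscribes to a data flow, and the registry calls it back when a data set is registered. The class, method names and URL below are assumptions; in a real deployment the notification would be an XML message or feed entry rather than a Python callback.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Callback = Callable[[str, str], None]


class NotifyingRegistry:
    """Toy registry that notifies subscribers when a data set is registered."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, dataflow: str, callback: Callback) -> None:
        """Ask to be told about new registrations for one data flow."""
        self._subscribers[dataflow].append(callback)

    def register(self, dataflow: str, url: str) -> None:
        """Register a data set and notify every subscriber of that data flow."""
        for callback in self._subscribers[dataflow]:
            callback(dataflow, url)


received: List[Tuple[str, str]] = []
registry = NotifyingRegistry()
registry.subscribe("AGRICULTURE_PRODUCTION", lambda flow, url: received.append((flow, url)))

registry.register("AGRICULTURE_PRODUCTION", "http://example.org/29/production.xml")
registry.register("AGRICULTURE_AREA", "http://example.org/29/area.xml")  # no subscriber
```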
Registry: Subscription Service
Registry: Notification Service
RSS Feed
SDMX Technical Standards
Alignment with Other Standards
Other Statistical Standards
• There are many statistical standards which are potentially used by SDMX systems:
– Data Documentation Initiative (metadata for microdata)
– ISO/IEC 11179 (for semantic models and definitions)
– eXtensible Business Reporting Language (for business reporting)
– ISO 19115 (for geographic metadata and maps)
• Typically, these standards represent the source information of aggregate SDMX data, or represent additional metadata
• SDMX has been aligned with these standards to support such systems
SDMX Standards Alignment Example: The Data Documentation Initiative (DDI)
XML specification for microdata
http://www.ddialliance.org
What is the DDI?
• Purpose
– Capture extensive metadata for archiving, dissemination and use of microdata
– 5 sections (document, survey, files, variables, documentation), hundreds of elements
• DDI Alliance
– Expert Committee, Steering Committee and Working Groups
– http://www.ddialliance.org
• DDI Users
– US & European academics and statistical agencies
– International Household Survey Network (IHSN) & developing countries
What is the DDI?
• DDI is a mature product with a long history
– ISR OSIRIS (1970), The IASSIST Codebook Action Group (SGML, DTD) (1993), Draft DDI (1997), Beta-testing (1999), DDI 1.0 (2000), DDI 2.0 (2003), DDI 3.0 (2007)
• DDI 1/2.x model: single survey
• DDI 3.0 model: the survey life cycle
Tools for DDI
• International Household Survey Network
– Objectives: Improve the availability, quality and use of survey data in developing countries
– Members: International organizations & national agencies supporting survey programs in developing countries
– Management: DFID, ILO, PARIS21, UNICEF, UNSD, WHO, World Bank
– Activities: Coordinating survey programs, Harmonizing concepts & methods, Maintaining a survey catalog, Developing data dissemination tools
– http://www.surveynetwork.org
• Microdata Management Toolkit
– DDI based user friendly package for archiving and preservation of surveys
Microdata Management Toolkit
1 Import data and compile metadata
2 Import metadata and prepare CD-ROM
3 Generate HTML based CD-ROM
Microdata Management Toolkit
• Status
– Available in English, French, Spanish, Russian
– http://www.surveynetwork.org/toolkit
• Roll-out program
– Completed training / pilot in about 20 countries, mainly in the Africa region
– Expected use by UNICEF for next round of Multiple Indicators Cluster Surveys (MICS, 55 countries)
– Asia: Partnership with United Nations Economic and Social Commission for Asia and the Pacific (ESCAP)
– Latin America: partnership with Inter-American Development Bank (IADB)
– Used by IHSN member agencies (WHO, ILO, etc.)
– Component of World Bank Accelerated Data Program (ADP)
DDI and SDMX
• SDMX: aggregated data; indicators, time series; across time; across geography; open access; easy to use
• DDI: microdata; low level observations; single time period; single geography; controlled access; expert audience
• Microdata is an important source of aggregated data
• Crucial overlaps and mappings exist between both worlds (but are commonly undocumented)
• Interoperability provides users with a full picture of the production process
Demo: SDMX – DDI Integration
• Aggregates and microdata on the website of the Nigerian statistical office
Questions?