complex sdmx mapping issues for development indicators unsd – dfid project on improving the...
TRANSCRIPT
Complex SDMX mapping issues for development
indicators
UNSD – DFID Project on Improving the collation, availability and dissemination of development
indicators (including the MDGs)
22 April 2014, Phnom Penh, Cambodia
Agenda for Today
CountryData Data Structure Definition (DSD) Mapping best practices and guidelines Review of mapping process Complex SDMX cases Overview of new features in DevInfo 7
Mapping Tool
Mapping best practices and guidelines
MDG / CountryData DSD Concepts
Type Name Type of code usedDimension Frequency i.e. Annual, Quarterly, etc.Dimension Series Indicator titleDimension Units of measurement i.e. Percent, numberDimension Location i.e. Total, Urban, RuralDimension Age group i.e. 15–49 yr olds, under 5 years oldDimension Sex i.e. Total, male, femaleDimension Reference Area Country nameDimension Source Type i.e. Survey, census, admin.Dimension Time Period i.e. 1990, 1991, etc.Measure Observation Value -Attribute Unit multiplier i.e. per 10,000, per 1,000 etc.Attribute Time period details i.e. 2001 – 2003, Q1 2010 – Q3 2011Attribute Nature of data points i.e. Estimated, Modelled, Adjusted etc.Attribute Source details Source name & dateAttribute Footnotes Details of methodology & other notes etc.
CountryData DSD
Single DSD used for all MDG (and other development indicators) - superset of MDG DSD
Using cross-domain concepts and codelists: Frequency (CL_FREQ) Sex (CL_SEX) Age (CL_AGE) (modified) Unit multiplier (CL_UNIT_MULT)
CountryData DSD
Support for diverse indicators means not all dimensions are applicable in all cases
e.g. Age Group is not applicable to indicator “Telephone lines”
Value NA is used when a dimension or attribute is not applicable.
CountryData DSD: Mappings
Not always obvious which values should be used in some dimensions
What should be SEX in indicator “Births attended by skilled health personnel”: Female? Total? Not Applicable?
What about the AGE dimension?
INDICATOR SERIES UNIT SEX AGE LOCATIONBirths attended by skilled health personnel SH_STA_BRTC Percent ?? ?? Total NA NA
CountryData DSD: Mappings (2)
Inconsistent mappings lead to duplications and other anomalies
In CountryData, mappings for time series are agreed before data exchange
For the MDG dataset, there has been a spreadsheet developed with the recommended mappings for each time series
CountryData DSD: Maintenance
CountryData codelists are maintained by UNSD
Periodically, the DSD needs to be updated A new version of the CountryData DSD (1.4),
with modified codelists, was released in January 2014
DevInfo 7
DevInfo 7 (Di7) launched in Nov 2012 SDMX 2.1 & 2.0 compliant Web-based software Most countries have migrated to version 7 from
version 6 DevInfo Support Group has been adding
enhancements and bugfixes; newest version is 7.1.0.1
Mapping to the DSD (in DevInfo)
The DevInfo mapping tool is designed to facilitate the mapping of the database to the DSD based on mapping between the codelists of the
DSD and origin database certain situations require further manual effort to
map a time series sometimes a “fix” is required to the database where
the data simply isn’t valid or where there are duplicates
Area, hierarchical dimension IUS = Indicator, Unit and Subgroup
Time series data are stored with the combination of the 3 dimensions Indicator Unit Subgroup: Combination of one or more sub-dimensions
Source & Time Period Together with IUS “uniquely” defines each data value
Footnote “Free text” field stored with data value
DevInfo Data Architecture
DevInfo Database CountryData DSD
• Area • Indicator• Unit
• Subgroup (i.e. Sex, Age, Location etc.)
• Source
• Time Period
• Footnotes
• Frequency (Default = “Annual”)• Reference Area• Series• Units of measurement• Unit multiplier (Default = 0)• Location (Default = “Total”)• Age group (Default = “All Ages”)• Sex (Default = “Both Sexes”)• Source Type (Default = “NA”)• Source details• Time Period• Time period details• Nature of data points (Default =
“C”)• Footnotes
DevInfo Mapping Process Review
DevInfo Mapping Process Review
Go to: unstats.un.org/demoginfo1
( /demoginfo2 /demoginfo3 /demoginfo4 /demoginfo5 /demoginfo6 )
Log onto administrative profile
Log onto administrative profile
Username: [email protected]: 1234
Scroll down to ‘Registry’ menu
Step 1: Codelist mapping
Step 1: (A) Map Indicator codes
Step 1: (B) Map Unit codes
Step 1: (C) Map Subgroup codes
Step 1: (C) Choose Subgroup list
Step 1: (C) Map Age subgroup
Step 1: (C) Map Sex & Location
Step 1: (D) Map Area
Step 1: Save codelist mappings
Step 1: Ignore warning
Step 1: Confirm mapping saved
Step 1: Complete
Exercise 1: Codelist mapping
Use unstats.un.org/demoginfo[1-6]
Map the codelists (where possible) for: Just one indicator, “Antenatal care coverage for at least
one visit” Unit Age Sex Location Area
Step 2: Confirm IUS mapping
Step 2: Save IUS Mappings
Why is Step 2 necessary?
The default values for SEX, LOCATION or AGE GROUP mapping may not be applicable to all mappings
The codelist mapping may only provide a partial mapping of the time series (i.e. more information is required)
Any necessary mapping changes are made in Step 2
Where are the default values?
Admin panel: Application settings
Insert screens shot/details of admin panel and default value storage…
Scroll down
Application settings: mapping default values
Antenatal care coverage for at least one visit - PercentSex Female Female
Location RuralAge 15-49 yr 15-49 yr
Time Period2000 95.6 92.22004 96.5 95.92006 98.8 98.72010 98 97.4
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Antenatal care coverage, at least one visit PERCENT T 015_035_Y F
Antenatal care coverage, at least one visit PERCENT R 015_035_Y F
Mapping of SUBGROUP (Defaults)
Indicator Unit
• Where a subgroup value is missing the default values will apply:
Default Values
• Location = T
Land under forest cover - PercentTime Period Data Value
1993 59.821997 58.62002 61.152005 602006 59.092008 57.992009 57.62010 57.56
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Land area covered by forest PERCENT T NA NA
Overriding defaults in mapping of SUBGROUP
Indicator Unit Subgroups?
Default Values• Location = T• Age Group = 000_099_Y• Sex = Both sexes
• Common example of where default subgroup mappings do not apply
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Antenatal care coverage, at least one visit PERCENT T 015_035_Y F
Antenatal care coverage, at least one visit PERCENT R 015_035_Y F
Antenatal care coverage, at least one visit PERCENT U 015_035_Y F
Antenatal care coverage, at least one visit - PercentLocation Total Rural Urban
Time Period2000 44.8 41.2 67.32005 71.8 70.4 80.72010 89.1 87.6 97
Overriding defaults in mapping of SUBGROUP
Indicator Unit Subgroup for Age and Sex?
Default Values• Age Group = 000_099_Y• Sex = T
• So subgroups coverage affects the number of manual changes which have to be made…
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Condom use at last high-risk sex PERCENT T 015_010_Y M
Condom use at last high-risk sex PERCENT T 015_010_Y F
Condom use at last high-risk sex - Percent
OtherFemale 15-24 yr
Male 15-24 yr
Rural female 15-24 yr
Rural male 15-
24 yr
Urban female 15-24 yr
Urban male 15-
24 yrTime Period
2003 32.7 51.7 21.2 38.9 32.5 50.32008 29.2 46.3 21.6 40.6 33.5 52
• Using the Other subgroup to clump age, sex and location information results in more manual mapping
Overriding defaults in mapping of SUBGROUP
Indicator Unit
Subgroup for Location, Age and Sex?
Default Values• Location = T• Age Group =
000_099_Y• Sex = T
?
Step 2: Amend IndicatorWhen using the check box to tick the mapping, you are “fixing” the mapped DSD values. If the box is unchecked again and the mappings saved, then DSD values revert to those mapped at codelist / default values (i.e. any manual changes are undone.)
Exercise 2: Mapping time series
Use unstats.un.org/demoginfo[1-6] Username = [email protected] Password = 1234
Map the time series for1. “Literacy rate of 15-24 year-olds”
2. “Condom use at last high risk sex (Version 2)”
3. “Land under forest cover”
Step 2: Complete
Final Step: Register mappings
Final Step: Generate SDMX-ML
Final Step: Complete
Complex SDMX Cases
Complex mappings under the 1st and 2nd mapping steps
Most commonly mappings need to be overridden for dimensions Sex, Age Group and Location
But sometimes manual changes are required between DevInfo and DSD indicator and unit, such as when… More than one DevInfo code relates to a single DSD
code OR More than one DSD code relates to a single DevInfo
code
IndicatorTime
PeriodData
ValueProportion of seats headed by women in
national parliament - Percent1999
11.1Seats held by men in national parliament -
Number2008
80Seats held by women in national
parliament - Number2008
14Seats in national parliament - Number 2008 94
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Seats in national parliament PERCENT T 000_099_Y FSeats in national parliament NUMBER T 000_099_Y MSeats in national parliament NUMBER T 000_099_Y FSeats in national parliament NUMBER T 000_099_Y T
Many-to-one mapping for Indicator codelist (Example 1)
Indicator
Indicator
Indicator
OtherFemale 15-24 yr
Male 15-24 yr
Total
Indicator2003 72.2 75.12008 65.9 76.1
Men 15-24 years with comprehensive knowledge of AIDS - Percent
200834.2
Women 15-24 years with comprehensive knowledge of AIDS - Percent
200828.3
Population 15-24 year-olds who have comprehensive correct knowledge of HIV/AIDS - Percent
Time Period
SERIES LABEL UNITS LOCATIONAGE_GROUPSEX
Population with comprehensive correct knowledge of HIV/AIDS PERCENT T 015_010_Y M
Population with comprehensive correct knowledge of HIV/AIDS PERCENT T 015_010_Y F
Many-to-one mapping for Indicator codelist (Example 2)
Indicator
Indicator
Indicator
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Population below national poverty line PERCENT R 000_099_Y T
Population below national poverty line PERCENT T 000_099_Y T
Population below national poverty line PERCENT U 000_099_Y T
Location Total Rural UrbanIndicator Time Period
1993 51.22000 60.3 65.7 14.3
2005-2006 56.7 61.9 28.52010 44.9 48.7 22.12001 60.42005 56.92006 56.92008 56.9
Population below national poverty line - Percent
Share of population below poverty line - Percent
Indicator Indicator
Many-to-one mapping for Indicator codelist (Example 3)
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Gender Parity Index in primary level enrolment RATIO T 000_099_Y T
Other Total Rural UrbanIndicator Time Period
Gender parity index in primary education - Ratio 2002 0.922003 0.922004 0.932005 0.932006 0.962007 0.96
Gender parity index at primary education - Index 2007-2008 0.962008 0.99 0.99 0.992009 0.96
Unit Unit
Many-to-one mapping for Unit codelist (Example 1)
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Mobile cellular telephone subscriptions NUMBER T NA NATelephone lines NUMBER T NA NA
Telephone lines - NumberLocation Total
Other Cellular lines Fixed lines
2000 72,602 58,2612001 276,034 56,1472002 505,627 59,4722003 893,035 65,7932004 1,165,035 82,4952005 1,525,125 100,7772006 2,697,616 129,863 2,827,4792007 5,163,414 165,788 5,329,202
Time Period
One-to-many mapping of Indicator
Indicator
Subgroup
Manual change
?
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Unmet need for family planning, limiting PERCENT T 015_035_Y F
Unmet need for family planning, spacing PERCENT T 015_035_Y F
Unmet need for family planning PERCENT T 015_035_Y F
Unmet need for family planning - PercentLocation Total Rural Urban
Other Limiting Spacing
1992 19.4 21 40.42000 11.6 24 35.62005 13.4 24.5 37.9 38.4 34.42010 18.9 19.5 15.5
Time Period
One-to-many mapping of Indicator
Indicator Unit
Manual change
?
Subgroup
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Gender Parity Index in primary level enrolment RATIO T 000_099_Y T
Gender Parity Index in primary level enrolment - PercentTime
PeriodData Value
2000 0.932003 0.972006 0.992009 1
“Manual” mapping of UNITIndicator Unit
Manual change• Unit =
“Ratio”
Attribute: Unit Multiplier (UNIT_MULT)
“Exponent in base 10 that multiplied by the observation numeric value gives the result expressed in the unit of measure.”
If the observation value is in millions, unit multiplier is 6; if in billions, 9, and so on. Where the number is simple units, use 0.
Mandatory attribute
Attribute: Unit Multiplier (UNIT_MULT)
Population undernourished, millionsTime Period Data Value
1993 5.821997 5.62002 6.152005 62006 5.092008 4.992009 4.62010 4.56
SERIES LABEL UNITS LOCATION AGE_GROUP SEX UNIT_MULT
Population undernourished NUMBER T 000_099_Y T 6
Back to mapping…
Example 1: Many-to-one mapping
Example 2: One-to-many mapping
Final Step: Register new mappings
Exercise 3: Complex mappings
Use unstats.un.org/demoginfo[1-6]
Map/ amend/ publish the time series for:1. “Gender parity index in primary education”
2. “Seats held by men in national parliament”
3. “Seats held by women in national parliament”
4. “Telephone lines”
Other issues encountered with generating SDMX from DevInfo
Type NameDimension SeriesDimension Units of measurementDimension LocationDimension Age groupDimension SexDimension Reference AreaDimension Time Period
• The CountryData DSD requires any data point to be uniquely described by the following dimensions:
• However, DevInfo allows data to be stored in overlapping time intervals and with multiple sources. These issues need to be resolved to conform to the “uniqueness” required by the CountryData DSD.
Multiple sources
Proportion of population with access to improved sanitation - PercentLocationSource CPC Census 1995 NCEHWS_2007 NCEHWS 2003 NCEHWS 2004
Time Period1990 111995 292000 372001 402002 42 41.62003 42 42.22004 44 44.32005 462006 472007 49
Total
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Population using improved sanitation facilities PERCENT T 000_099_Y T
Allowable in DevInfo but not in the DSD
Overlapping time
Infant mortality rate - Deaths per 1000 live birthsLocation Total
Time Period1990-1994 27.31995-1999 25.51999-2003 24.22005-2006 25.32005-2009 18.9
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Infant mortality rate PER_1000_LIVE_BIRTHS T 000_001_Y T
• This issue is only a problem where overlapping periods begin from the same year, as the mapping tool takes the first year in the period as the value for the “Time Period” dimension.
Targets in the database
Targets are also an issue when found in the database since they should not be exchanged as observed values
Target in database (Example 1)
Proportion of people living below the national poverty line - PercentLocation Total Rural Urban
Other MDG target
1990 481992 46 51.8 26.51997 39.1 42.5 22.12002 33.5 37.6 19.72015 24
Time Period
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Population below national poverty line PERCENT R 000_099_Y T
Population below national poverty line PERCENT T 000_099_Y T
Population below national poverty line PERCENT U 000_099_Y T
Sometimes stored as subgroup which can be ignored at the 2nd stage…
Location Rural Urban
SourceCPC
Census CPI Census
2005Governme
nt 2007SPC RHS
2000SPC RHS
2000SPC RHS
2000
1995 6502000 530 580 1702005 4052015 260
Time period
TotalMaternal mortality ratio - Deaths per 100,000 live births
SERIES LABEL UNITS LOCATION AGE_GROUP SEX
Maternal mortality ratio PER_100000_LIVE_BIRTHS T 000_099_Y F
Target in database (Example 2)
But other times can be found as a time period among observed values…
Use of filters at registration
To deal with the issues of multiple sources for a given time period overlapping time period beginning with the same
year targets presented alongside observed values
The registration page provides a feature to filter out data from a generated SDMX message associated with specific time periods and source references
Filter by time/ source
Final Step: Select source filter
Filter by time
Final Step: Select time filter
Final Step: Register new mappings
Final Step: Complete
Exercise 4: Filter time series
Use unstats.un.org/demoginfo[1-6]
Map/ amend/ publish the time series for;1. “Under-five mortality rate”
2. “Maternal mortality ratio (MMR)”
3. “Tuberculosis prevalence rate”
4. “Proportion of the population using improved sanitation facilities”
New features in DevInfo 7.1
Improved Quick Data Search
Database updates and publishingAfter updating your database with new data, you can optimize and publish the updates in SDMX at the same time in Admin Panel
Database updates and publishing
Database updates and publishing
Database updates and publishing
DevInfo installation and upgrading
The latest version of DevInfo is 7.1.0.1 Starting with this version, you no longer need to
uninstall and reinstall in order to upgrade Instead you will use patches available on the DevInfo
Downloads page
Exporting and importing mappings
Exporting mappings
Editing exported mappings
Editing exported mappings
Exporting and importing mappings
Importing edited mappings
Importing edited mappings
Importing edited mappings
Thank you for your attention