Daejeon, 26-29 April 2010
Restricted
An SDMX based unified data catalogue (UDC)
MSIS – Meeting on the Management of Statistical Information Systems
Gabriele Becker / Massimo Bruschi
Statistical Information Systems
Monetary & Economic Department
Bank for International Settlements
The SDMX vision
Need: up-to-date numbers, data documentation, good-quality data
Data can be offered by: NSOs, CBs, IOs
How to choose, filter out duplication, get the “freshest” data?
Data providers (originators) offer their data “in SDMX”
Dissemination = reporting = data sharing… single storage!
SDMX registries help users and organisations to find data
How “real” is this SDMX vision? What do we still need to learn?
The Unified Data Catalogue (UDC) concept
Can we “implement” the vision?
UDC: a single data catalogue that allows users to discover, select and retrieve statistical data from all registered data sources
Discovery implies access to metadata:
• DSD – data structure definitions
• concepts and code-lists
• category schemes
An SDMX registry is a natural repository
Unified Data Catalogue feasibility study to analyse this
UDC study: Objectives
Provide centralised access to a variety of internal and external data-sources
Generic search facilities against “registered” data sources
Directly retrieve data and metadata from all data sources
Use SDMX technical standards, SDMX registry, web services
Broaden SDMX knowledge within BIS (business area and IT colleagues)
User stories
Registrations
Constraints
GUI features
Navigation / Search
Query & retrieval
Output handling
Automation
Security
UDC prototype architecture
Simplistic approach: to search and retrieve data from a data source, all we need to know are the data structures and the source query language
If a source follows the SDMX-IM, we also need a (web) service connected to it that can respond to SDMX Query messages
SDMX-enabled data source: “native” or “adaptable”
SDMX-ML file + DSD + “file-query-handler” = simplest SDMX enabled source
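The idea of a “file-query-handler” can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the handler built for the study: the XML below is a simplified stand-in loosely modelled on a series/observation layout (no namespaces, no header), and the dimension ids, codes, values and the function name `query_file` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up stand-in for an SDMX-ML data file: each Series carries
# its dimension codes as attributes, each Obs holds one observation.
SAMPLE = """
<DataSet>
  <Series FREQ="Q" REF_AREA="CH" INDICATOR="GDP">
    <Obs TIME_PERIOD="2009-Q1" OBS_VALUE="101.2"/>
    <Obs TIME_PERIOD="2009-Q2" OBS_VALUE="100.8"/>
  </Series>
  <Series FREQ="Q" REF_AREA="KR" INDICATOR="GDP">
    <Obs TIME_PERIOD="2009-Q1" OBS_VALUE="98.4"/>
  </Series>
  <Series FREQ="M" REF_AREA="KR" INDICATOR="CPI">
    <Obs TIME_PERIOD="2009-01" OBS_VALUE="110.0"/>
  </Series>
</DataSet>
"""


def query_file(xml_text, selection):
    """Return the series whose dimension codes match the selection.

    `selection` maps a dimension id to the set of accepted codes; a
    dimension missing from the selection is left unconstrained.
    """
    root = ET.fromstring(xml_text)
    matches = []
    for series in root.iter("Series"):
        key = dict(series.attrib)
        if all(key.get(dim) in codes for dim, codes in selection.items()):
            obs = [(o.get("TIME_PERIOD"), float(o.get("OBS_VALUE")))
                   for o in series.iter("Obs")]
            matches.append({"key": key, "obs": obs})
    return matches


# Ask for quarterly series for area KR only.
result = query_file(SAMPLE, {"REF_AREA": {"KR"}, "FREQ": {"Q"}})
```

The handler needs nothing beyond the file and the dimension names from the DSD, which is why a file plus a DSD plus such a handler can be treated as the simplest SDMX-enabled source.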
Plan: schematic architecture
[Diagram components: SDMX Registry (web appl.) holding the registrations; UDC GUI; mappable data source with SDMX query adapter web service; SDMX data source with web service; SDMX files with web service; internal or external sources]
Components of the UDC prototype
SDMX Registry (“off the shelf” SDMX Tool)
• Data structure definitions of all “connected” data sources
• Registrations for all data flows for all connected data sources
• URLs to SDMX-files and SDMX query services
• Updated via SDMX-ML messages or interactively (“KeyMaster”)
UDC (developed for the study)
• GUI to navigate the registry information
• Queries the data sources
• Retrieves data and presents them to the user
SDMX query web services (developed for the study)
• For the different types of data sources
Data query services (partly existing, partly developed)
• For each of the connected queryable data sources
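The division of labour between registry and UDC can be illustrated with a toy in-memory model. This is a sketch under assumptions, not the API of the registry tool used in the study: the class names `Registration` and `Registry`, the identifiers `DSD_DEMO` and `DEMO_FLOW`, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    dataflow: str    # data flow identifier (hypothetical)
    dsd: str         # id of the data structure definition the data follow
    url: str         # where the data can be obtained (placeholder URL)
    queryable: bool  # True: SDMX query service, False: plain SDMX-ML file


class Registry:
    """Toy registry: stores DSDs and registrations, answers lookups."""

    def __init__(self):
        self._dsds = {}           # dsd id -> ordered list of dimension ids
        self._registrations = []

    def add_dsd(self, dsd_id, dimensions):
        self._dsds[dsd_id] = list(dimensions)

    def register(self, registration):
        # A registration is only accepted if its DSD is already known.
        if registration.dsd not in self._dsds:
            raise KeyError(f"unknown DSD: {registration.dsd}")
        self._registrations.append(registration)

    def find(self, dataflow):
        return [r for r in self._registrations if r.dataflow == dataflow]

    def dimensions(self, dsd_id):
        return list(self._dsds[dsd_id])


registry = Registry()
registry.add_dsd("DSD_DEMO", ["FREQ", "REF_AREA", "INDICATOR"])
registry.register(Registration(
    "DEMO_FLOW", "DSD_DEMO", "http://example.org/databank/query", True))
registry.register(Registration(
    "DEMO_FLOW", "DSD_DEMO", "http://example.org/files/demo.xml", False))

# A catalogue front end would look up where a data flow can be obtained.
sources = registry.find("DEMO_FLOW")
```

In this model the catalogue never stores data itself: it asks the registry for structures and locations, then contacts the file or query service behind each URL.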
What we did: detailed architecture
[Diagram components:
• Data sources: BIS Data Bank (DBQL output, SDMX-ML proxy daemon); MarkIT SQL database (SQL stored procedures); MSTAT Cubes (TS web service); SDMX-ML data files (.xml)
• SDMX-ML query web services: /databank/query, /mstat/query, /markit/query
• SDMX Registry (web appl.); R/O Registry; UDC web appl.; SDMX-ML file
• Client: browser (Internet Explorer) showing the UDC GUI
• Hosts named: medts.a (Linux), mstat.s (Win), mstat.a (Win), v.ds03 (Linux), PC (Win)]
UDC GUI key features
Browse the Categories / Data-flows / Provision registrations
Browse selected DSD: dimensions, attributes, code-lists
Build queries based on DSD (code selection)
Run query and view results (simple table)
Download results and DSDs in SDMX-ML format
Search by Concept / Codelist
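Building a query from a DSD by code selection can be sketched as follows. This is a minimal illustration, assuming a DSD reduced to a mapping from dimension id to code-list; the dimension ids, codes and the function name `build_query` are hypothetical and do not reflect the prototype's actual code.

```python
# Hypothetical DSD: dimension id -> code-list.
DSD = {
    "FREQ": ["A", "M", "Q"],
    "REF_AREA": ["CH", "KR", "US"],
    "INDICATOR": ["CPI", "GDP"],
}


def build_query(dsd, selection):
    """Validate a code selection against the DSD and fill in the gaps.

    Dimensions the user did not constrain are expanded to their full
    code-list, so the result always covers every dimension of the DSD.
    """
    for dim, codes in selection.items():
        if dim not in dsd:
            raise ValueError(f"unknown dimension: {dim}")
        unknown = set(codes) - set(dsd[dim])
        if unknown:
            raise ValueError(
                f"codes not in code-list of {dim}: {sorted(unknown)}")
    return {dim: sorted(selection.get(dim) or codelist)
            for dim, codelist in dsd.items()}


query = build_query(DSD, {"REF_AREA": ["KR"], "FREQ": ["Q", "M"]})
```

Validating against the code-lists before sending anything means a GUI can reject an impossible selection without contacting the data source.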
UDC Prototype: some results
UDC can provide (unsecured) access to
• BIS Data Bank: time series repository, SDMX-EDI IM, Linux, FAME, Sybase, own query language + query adapter
• MSTAT OLAP: IBFS data multi-dimensional cubes, MS Windows, SQL Server, SDMX Query to OLAP / MDX adapter
• MSTAT Sandbox: research data in relational database, MS Windows, SQL Server, DSD on unstructured dataset + SDMX / SQL adapter
• SDMX-ML generic files + generic file adapter
Practical use of registration, provisioning, constraints processing, …
SDMX vision is real … with some practical issues
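What a query adapter for a relational source does can be sketched as a translation from a code selection to a parameterised SQL statement. This is an illustrative sketch, not the SDMX / SQL adapter developed for the study: it uses an in-memory SQLite database instead of SQL Server, and the table name `observations`, the column names, the sample rows and the function `to_sql` are all assumptions.

```python
import sqlite3

# Hypothetical dimensions taken from a DSD laid over the relational data.
DIMENSIONS = ("FREQ", "REF_AREA", "INDICATOR")


def to_sql(selection, table="observations"):
    """Translate a code selection into a parameterised SELECT statement."""
    unknown = set(selection) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    clauses, params = [], []
    for dim in DIMENSIONS:
        codes = selection.get(dim)
        if codes:
            placeholders = ", ".join("?" for _ in codes)
            clauses.append(f"{dim} IN ({placeholders})")
            params.extend(codes)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (f"SELECT {', '.join(DIMENSIONS)}, TIME_PERIOD, OBS_VALUE "
           f"FROM {table}{where} ORDER BY TIME_PERIOD")
    return sql, params


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (FREQ TEXT, REF_AREA TEXT, "
             "INDICATOR TEXT, TIME_PERIOD TEXT, OBS_VALUE REAL)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
    [("Q", "KR", "GDP", "2009-Q1", 98.4),
     ("Q", "CH", "GDP", "2009-Q2", 100.8),
     ("M", "KR", "CPI", "2009-01", 110.0)])

sql, params = to_sql({"REF_AREA": ["KR", "CH"], "FREQ": ["Q"]})
rows = conn.execute(sql, params).fetchall()
```

Codes are passed as bound parameters rather than pasted into the SQL text, so user-selected values cannot alter the statement. An adapter for an OLAP source would follow the same pattern but emit MDX instead of SQL.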
Issues found (Aug. 2009, SDMX 2.0)
Not possible to register compact or utility files in the registry used
Not possible to register files using message groups and annotations, as these are not supported in the registry used
Missing functionality in SDMX Query message
Some issues with the registry implementation used
Constraints processing on registry did not work
ECB does not provide DSDs on their website (files are OK)
Cross-platform communication with security not solved
In general: access authorisation to query-able data sources is unresolved
Conclusions
SDMX vision is real: the UDC works
Enhancements to standards already part of SDMX 2.1
Enhancements to registry implementation (eg industrial strength required)
Non-SDMX issues (cross-platform connectivity and access authentication) exist and need to be looked into
Current SDMX offerings from other organisations are rather diverse (message types, features used, version implemented)
Diverse offerings make requirements for a UDC more complex
Next steps for the BIS
UDC can be a central part of future BIS environment
Road to UDC will take a few years
Continue the feasibility study in the next year
Refine UDC
• More data sources
• More user facilities for search and navigation
Work with SDMX standards experts on issues found
Work with other SDMX data providers