dental council of india

Upload: ajinkya-kadam

Post on 03-Jun-2018

TRANSCRIPT

  • 8/13/2019 Dental Council of India

    Data format translation and migration

    Future possibilities

    Alasdair Crockett, Data Standards Manager

    UK Data Archive

  • Past problems and future solutions

    Past/existing problems: skeletons in the back catalogue. The UKDA and other long-standing archives hold old studies in column binary or other legacy formats that are difficult, time-consuming and occasionally practically impossible to process or migrate.

    Future solutions: ensuring that we don't store up similar problems (with vastly increased amounts of data) in 20 or 30 years' time.

    This talk covers future solutions.

  • When does data format translation occur?

    To enable data processing (validation, etc.): from ingest format to processing format (SPSS in the case of the UK Data Archive).

    To ensure long-term preservation: from processing format to preservation format(s), these being SPSS portable and tab-delimited text (with data dictionary) in the case of the UK Data Archive.

    To achieve user-friendly dissemination: from preservation or processing format to the dissemination format of the user's choice, e.g. Stata, SAS or Excel, in addition to the ubiquitous SPSS.

    Migration: when previously mainstream formats become obscure or new formats are requested by users.

  • What are the potential problems of data format conversion?

    At time of processing:

    Rounding/truncation of numeric data

    Truncation of textual data

    Differences in handling internal metadata (differing label lengths, missing value handling, etc.)

    Corruption of specially formatted variables (especially date/time variables)

    Embedded special characters (line feeds, carriage returns, tabs, etc.)

    At migration:

    All of the above, plus the added problem of dealing with out-of-date, unfamiliar and/or inaccessible formats (e.g. column binary)
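
    Three of the processing-time problems listed above (numeric rounding, text truncation and embedded special characters) can be reproduced in a few lines. The following is a minimal sketch in Python rather than anything taken from the UKDA's own tooling; the function names, the two-decimal column format and the sample values are all assumptions made for illustration.

```python
import csv
import io


def fixed_decimals(value, decimals=2):
    """Write a number with a fixed number of decimals, as a narrow numeric
    column format might. Digits beyond `decimals` are lost."""
    return f"{value:.{decimals}f}"


def truncate_text(text, max_len):
    """Cut text to max_len characters, as a short string field would."""
    return text[:max_len]


def naive_tab_row(values):
    """Join values with tabs and no escaping (the risky approach)."""
    return "\t".join(str(v) for v in values)


def write_tab(rows):
    """Write tab-delimited text, quoting any field that contains a tab,
    line feed or carriage return."""
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter="\t").writerows(rows)
    return buf.getvalue()


def read_tab(text):
    """Read back text produced by write_tab."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


# Hypothetical sample values, for illustration only.
value = 3.14159265
print(float(fixed_decimals(value)) == value)   # compares the rounded value with the original
print(float(repr(value)) == value)             # compares a repr() round trip with the original

row = ["1", "first line\nsecond line\twith a tab"]
print(len(naive_tab_row(row).split("\t")))     # field count after naive joining
print(read_tab(write_tab([row])) == [row])     # round trip through quoted tab-delimited text
```

    A round-trip comparison of this kind (convert, read back, compare with the source) is one simple way to detect such losses at processing time.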

  • The Data Curation Initiative: an XML standard and conversion utilities for survey data

    The Data Curation Initiative (DCI) consists of:

    XML Standard:

    Open standard for sharing and preserving datasets

    Implemented as an XML Schema

    Stores all attributes of a survey dataset: labels, missing value definitions, variable-level notes, etc.

    Conversion software:

    From proprietary formats to DCI (with no data loss)

    From DCI to proprietary formats (text file + command file)

    File- and variable-level metadata import/export to the DDI XML schema
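
    To make the idea of an XML record holding every attribute of a variable more concrete, here is a small Python sketch. The talk does not show the actual schema, so the element and attribute names (`variable`, `label`, `valueLabel`, `missing`, `note`) and the sample variable are hypothetical. The sketch only illustrates a lossless round trip of labels, missing value definitions and a variable-level note.

```python
import xml.etree.ElementTree as ET


def variable_to_xml(var):
    """Serialise one variable's metadata to an XML element.
    Element names are hypothetical, not taken from any real schema."""
    elem = ET.Element("variable", name=var["name"])
    ET.SubElement(elem, "label").text = var["label"]
    for code, text in var["value_labels"].items():
        ET.SubElement(elem, "valueLabel", code=str(code)).text = text
    for code in var["missing"]:
        ET.SubElement(elem, "missing", code=str(code))
    if var.get("note"):
        ET.SubElement(elem, "note").text = var["note"]
    return elem


def xml_to_variable(elem):
    """Rebuild the metadata dictionary from the XML element."""
    return {
        "name": elem.get("name"),
        "label": elem.findtext("label"),
        "value_labels": {
            int(e.get("code")): e.text for e in elem.findall("valueLabel")
        },
        "missing": [int(e.get("code")) for e in elem.findall("missing")],
        "note": elem.findtext("note"),
    }


# Hypothetical survey variable, for illustration only.
q1 = {
    "name": "q1",
    "label": "Satisfaction with service",
    "value_labels": {1: "Satisfied", 2: "Dissatisfied", -9: "Refused"},
    "missing": [-9],
    "note": "Asked of all respondents",
}

xml_text = ET.tostring(variable_to_xml(q1), encoding="unicode")
print(xml_text)
print(xml_to_variable(ET.fromstring(xml_text)) == q1)
```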

  • Migration strategy

    An approach such as the Data Curation Initiative allows either:

    Traditional migration strategies: systematic migration of the whole collection on the preservation server

    Migration on request: the preservation version remains the same, but on-the-fly export utilities are updated to cater for new versions/formats as they become popular
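
    Migration on request can be pictured as a fixed preservation copy plus a registry of export utilities that grows over time. The sketch below is one possible shape for this in Python, not a description of the DCI software. The registry, the `exporter` decorator, the two delimited-text formats and the sample dataset are all assumptions made for illustration.

```python
import csv
import io

# Registry of export utilities, keyed by format name (hypothetical design).
EXPORTERS = {}


def exporter(format_name):
    """Decorator that registers an export function under format_name."""
    def register(func):
        EXPORTERS[format_name] = func
        return func
    return register


def _delimited(dataset, delimiter):
    """Render the dataset as delimited text without modifying it."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(dataset["variables"])
    writer.writerows(dataset["rows"])
    return buf.getvalue()


@exporter("tab")
def to_tab(dataset):
    return _delimited(dataset, "\t")


@exporter("csv")
def to_csv(dataset):
    return _delimited(dataset, ",")


def export(dataset, format_name):
    """Convert the preservation copy on the fly to the requested format."""
    try:
        func = EXPORTERS[format_name]
    except KeyError:
        raise ValueError(f"no exporter registered for {format_name!r}") from None
    return func(dataset)


# Hypothetical preservation copy: it is never rewritten, only read.
preserved = {"variables": ["id", "age"], "rows": [[1, 34], [2, 51]]}
print(export(preserved, "csv"))
```

    Supporting a newly popular format then means registering one more function, while the stored dataset is left untouched. A traditional migration would instead rewrite every stored file.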

  • Doesn't the DDI do this?

    Not so far

    Could build onto the DDI: in any case the DCI will populate the variable level of the DDI

    Some advantages to keeping data and metadata separate:

    A single XML file could become enormous and slow to parse

    Allows communities who don't use the DDI to use the DCI (and vice versa)