The BinX Language

Upload: kaye-ball

Post on 01-Jan-2016


Page 1: The BinX Language

e-Science Data Information and Knowledge Transformation

The BinX Language

Page 2: The BinX Language

www.edikt.org

What is BinX?

Binary in XML
- Use XML to mark up binary data
- Mark up data types
- Mark up sequences
- Mark up arrays
- Complex structures

Page 3: The BinX Language

Primitive Data Types

Mark up data types

1. <short-16 byteOrder="littleEndian">32767</short-16>
2. <integer-32 byteOrder="bigEndian">2147483647</integer-32>
3. <float-32 byteOrder="littleEndian">100.0</float-32>
4. <float-32 byteOrder="bigEndian">100.0</float-32>

Binary data, grouped by the element that describes it:

1: FF 7F
2: 7F FF FF FF
3: 00 00 C8 42
4: 42 C8 00 00
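The byte sequences on this slide can be reproduced with any tool that encodes numbers in a chosen byte order. The following is a minimal sketch using Python's standard `struct` module, not the BinX library; the names `examples`, `hex_bytes` and `encoded` are illustrative only.

```python
import struct

# struct prefixes: "<" = little-endian, ">" = big-endian.
# "h" = 16-bit signed integer, "i" = 32-bit signed integer, "f" = 32-bit IEEE 754 float.
examples = [
    ("short-16, littleEndian", "<h", 32767),
    ("integer-32, bigEndian", ">i", 2147483647),
    ("float-32, littleEndian", "<f", 100.0),
    ("float-32, bigEndian", ">f", 100.0),
]

def hex_bytes(fmt, value):
    """Encode value with the given struct format and return spaced upper-case hex."""
    return " ".join(f"{b:02X}" for b in struct.pack(fmt, value))

encoded = [hex_bytes(fmt, value) for _, fmt, value in examples]

for (label, _, value), text in zip(examples, encoded):
    print(f"{label}: {value} -> {text}")
```

The two float-32 lines show the same four bytes in opposite order, which is exactly the difference the `byteOrder` attribute records.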

Page 4: The BinX Language

Abstract "struct" types

Mark up a sequence

<struct>
  <unsignedShort-16 />
  <unsignedShort-16 />
  <byte-8 />
  <byte-8 />
  <byte-8 />
</struct>

Screen descriptor in GIF:
- Screen width: unsigned short
- Screen height: unsigned short
- Packed field: byte
- Background colour index: byte
- Pixel aspect ratio: byte
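To see what such a sequence description buys a reader of the file, here is a hedged sketch that decodes a GIF screen descriptor with Python's `struct` module. The seven sample bytes are hypothetical, and reading each `byte-8` field as unsigned is an assumption made for this illustration; nothing here is the BinX library's API.

```python
import struct

# Hypothetical 7 bytes that follow the 6-byte "GIF89a" signature.
sample = bytes([0x40, 0x01, 0xC8, 0x00, 0xF7, 0x00, 0x00])

# "<" = little-endian (GIF stores multi-byte integers least significant byte first).
# H = unsigned 16-bit, B = unsigned 8-bit: two shorts followed by three bytes,
# the same order as the fields in the struct markup.
SCREEN_DESCRIPTOR = struct.Struct("<HHBBB")

width, height, packed, background_index, aspect_ratio = SCREEN_DESCRIPTOR.unpack(sample)
print(width, height, hex(packed), background_index, aspect_ratio)
```

The format string plays the role of the `<struct>` element: it lists the field types in file order and nothing else.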

Page 5: The BinX Language

Abstract "array" types

Mark up an array

<arrayFixed>
  <integer-32 />
  <dim indexTo="99">
    <dim indexTo="9" />
  </dim>
</arrayFixed>

A 2-dimensional, 10-by-100 array of 32-bit integers
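The size and addressing implied by this array description can be worked out directly. The sketch below assumes that `indexTo` is an inclusive upper bound, that the inner `<dim>` varies fastest in storage, and that the data is big-endian; the slide states none of these explicitly, so treat them as assumptions. `OUTER`, `INNER` and `offset` are illustrative names.

```python
import struct

OUTER, INNER = 100, 10        # indexTo="99" and indexTo="9", read as inclusive bounds
ITEM = struct.Struct(">i")    # 32-bit integer; big-endian is an assumption

count = OUTER * INNER
total_bytes = count * ITEM.size

def offset(outer, inner):
    """Byte offset of element [outer][inner], assuming the inner dimension varies fastest."""
    return (outer * INNER + inner) * ITEM.size

# Build a hypothetical buffer holding the values 0..999 and read the last element back.
buffer = b"".join(ITEM.pack(i) for i in range(count))
value = ITEM.unpack_from(buffer, offset(99, 9))[0]
print(count, total_bytes, value)
```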

Page 6: The BinX Language

Embedded abstract types

Complex structures

<struct>
  <short-16 />
  <arrayFixed>
    <byte-8 />
    <dim indexTo="7" />
  </arrayFixed>
  <struct>
    <integer-32 />
    <float-32 />
    <double-64 />
  </struct>
</struct>

Page 7: The BinX Language

User-defined metadata

Label the data types and structures

<struct varName="Data Sample">
  <short-16 varName="ID" />
  <arrayFixed varName="List of 10 complex numbers">
    <struct varName="Complex">
      <float-32 varName="Real" />
      <float-32 varName="Imaginary" />
    </struct>
    <dim indexTo="9" />
  </arrayFixed>
</struct>

Page 8: The BinX Language

Reusable type definitions

Define macros for reuse

<definitions>
  <defineType typeName="FourCC">
    <arrayFixed>
      <character-8 />
      <dim count="4" />
    </arrayFixed>
  </defineType>
</definitions>

<struct varName="Wave_Header">
  <useType typeName="FourCC" varName="Keyword" />
  <integer-32 varName="Chunk_Size" />
</struct>
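As an illustration of what the Wave_Header description covers, the following sketch reads a four-character code and a 32-bit chunk size with Python's `struct` module. The sample bytes are hypothetical, and the little-endian order comes from the RIFF/WAV format rather than from the slide; this is not BinX library code.

```python
import struct

# Hypothetical first 8 bytes of a WAV file: the keyword "RIFF" and a chunk size of 36.
sample = b"RIFF" + struct.pack("<i", 36)

# 4s = four 8-bit characters (the FourCC array), i = signed 32-bit integer.
# RIFF/WAV stores integers little-endian, hence "<".
WAVE_HEADER = struct.Struct("<4si")

raw_keyword, chunk_size = WAVE_HEADER.unpack(sample)
keyword = raw_keyword.decode("ascii")
print(keyword, chunk_size)
```

Defining FourCC once and reusing it corresponds to reusing the `4s` code wherever a chunk keyword appears in the file.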

Page 9: The BinX Language

Linking to binary data

Reference the binary data file

<definitions>
  <defineType typeName="Header">… …</defineType>
  <defineType typeName="Format_Chunk">… …</defineType>
  <defineType typeName="Data_Chunk">… …</defineType>
</definitions>

<dataset src="myfile.wav">
  <useType typeName="Header" />
  <useType typeName="Format_Chunk" />
  <useType typeName="Data_Chunk" />
</dataset>

Page 10: The BinX Language

A BinX document

<binx byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X" />
  </dataset>
</binx>

Callouts:
- Root element: <binx>
- Data class section: <definitions>
- Data instance section: <dataset>
- Abstract data type: <arrayFixed>

Page 11: The BinX Language

DataBinX

DataBinX = BinX with Data

BinX description:

<dataset src="myfile.bin">
  <struct>
    <short-16 />
    <long-64 />
    <double-64 />
  </struct>
  <arrayFixed>
    <integer-32 />
    <dim count="2" />
  </arrayFixed>
</dataset>

Corresponding DataBinX document:

<dataset>
  <struct>
    <short-16>100</short-16>
    <long-64>1000</long-64>
    <double-64>5.257</double-64>
  </struct>
  <arrayFixed>
    <dim>
      <integer-32>1</integer-32>
    </dim>
    <dim>
      <integer-32>2</integer-32>
    </dim>
  </arrayFixed>
</dataset>
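The step from a BinX description plus binary data to a DataBinX document can be sketched in a few lines. The code below is an illustration only, written with Python's `struct` and `xml.etree.ElementTree` modules rather than the DataBinX utility; the bytes standing in for myfile.bin are made up, and big-endian order is an assumption because the slide does not give a `byteOrder`.

```python
import struct
import xml.etree.ElementTree as ET

STRUCT_FMT = ">hqd"   # short-16, long-64, double-64; big-endian is an assumption
ARRAY_FMT = ">2i"     # two integer-32 values (dim count="2")

# Hypothetical stand-in for the contents of myfile.bin.
raw = struct.pack(STRUCT_FMT, 100, 1000, 5.257) + struct.pack(ARRAY_FMT, 1, 2)

def to_databinx(data):
    """Decode the bytes and return a DataBinX-style XML string with the values inline."""
    short, long_, double = struct.unpack_from(STRUCT_FMT, data, 0)
    ints = struct.unpack_from(ARRAY_FMT, data, struct.calcsize(STRUCT_FMT))

    dataset = ET.Element("dataset")
    record = ET.SubElement(dataset, "struct")
    ET.SubElement(record, "short-16").text = str(short)
    ET.SubElement(record, "long-64").text = str(long_)
    ET.SubElement(record, "double-64").text = str(double)

    array = ET.SubElement(dataset, "arrayFixed")
    for item in ints:
        dim = ET.SubElement(array, "dim")
        ET.SubElement(dim, "integer-32").text = str(item)
    return ET.tostring(dataset, encoding="unicode")

xml_text = to_databinx(raw)
print(xml_text)
```

Note how 26 bytes of binary become a much longer text document; the experiment later in the deck reports the same effect at file scale.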

Page 12: The BinX Language

The BinX Library

Page 13: The BinX Language

BinX Components

The library has core functionality to support generic utilities and applications.

Layers:
- Applications: domain-specific
- Utilities: generic tools (DataBinX pack/unpack, extractor, viewer, BinX editor)
- BinX library core: BinX core functionality (parse/generate BinX documents, read/write binary data, parse/generate DataBinX)

Page 14: The BinX Language

BinX application models

Data catalogue model

Data manipulation model

Data query model

Data service model

Data transportation model

Page 15: The BinX Language

Data catalogue model

Primary storage

Binary data files

Metadata

Syntactic annotation

Semantic annotation

Classification

Domain specific

Cross-reference

XLink

[Figure labels: METADATA (Abstract to Detailed): BinX 1; BinX 1.1; BinX 1.2; BinX 1.2.1; BinX 1.2.2; BinX 1.2.3. BINARY: binary data files]

Page 16: The BinX Language

Data manipulation model

Extraction
- Subset of a dataset

Combination
- Merge several datasets

Transformation
- Conversion of data types
- Change of sequence order
- Transposition of array dimensions

Transparency
- Automatic change of byte order
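A change of byte order is the simplest of these manipulations to show in code. The sketch below re-encodes a packed array from one byte order to the other using Python's `struct` module; `swap_byte_order` is a hypothetical helper written for this illustration and is not part of the BinX library.

```python
import struct

def swap_byte_order(data, code="i", source="<", target=">"):
    """Re-encode a packed array of one primitive type from source to target byte order."""
    item_size = struct.calcsize(source + code)
    count, remainder = divmod(len(data), item_size)
    if remainder:
        raise ValueError("data length is not a multiple of the item size")
    values = struct.unpack(f"{source}{count}{code}", data)
    return struct.pack(f"{target}{count}{code}", *values)

little = struct.pack("<3i", 1, 256, -2)   # three little-endian 32-bit integers
big = swap_byte_order(little)
print(little.hex(), "->", big.hex())
```

With a description of the data types available, a tool can apply this conversion without the user having to say which bytes belong together.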

Page 17: The BinX Language

Data query model

In-dataset query
- XPath against virtual XML

Cross-dataset query
- Link into multiple datasets

Defining result format
- XQuery-based return fragment

Output interface
- SAX events

[Figure labels: Utility; BinX library; BinX data source; DataBinX SAX events; VOTable SAX events; XQuery SAX events; APP DataBinX; APP VOTable; APP Custom; XPath; XLink; Transform]

Page 18: The BinX Language

Data service model

Publishing logical datasets in BinX
- Dataset from one binary file
- Dataset from several binary files
- Dataset from multiple data sources

[Figure labels: DB; Grid; Client; BinX; binary data files]

Page 19: The BinX Language

Data transportation model

DataBinX as interlingua

[Figure labels: XML document; XSLT; DataBinX; SchemaBinX; BinX Util; BinX + Binary; ZIP tool; ZIP (MIME); Send; Receive]

Page 20: The BinX Language

Application in Astronomy

Case Study 1: Data Conversion Between FITS and VOTable

Page 21: The BinX Language

Application in astronomy

FITS and VOTable conversion

[Figure labels: DataBinX utility; BinX library core; FITS file (SIMPLE = T … … END, followed by binary data); VOTable document (<?xml version=… <VOTABLE> … … </VOTABLE>)]

Page 22: The BinX Language

FITS file

Primary HDU

Header (80-character records, columns 0 to 79):

SIMPLE = T / file does conform to FITS standard
BITPIX = 8 / number of bits per data pixel
NAXIS = 1 / number of data axes
… …
END

Data:

3D 4A 14 0F 1C FE 25 04 … …

Extension

Header:

XTENSION= 'BINTABLE' / binary table extension
BITPIX = 8 / 8-bit bytes
NAXIS = 2 / 2-dimensional binary table
… …
END

Data:

7B 3E 40 2C 16 70 E7 6F … …
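Each FITS header record is a fixed-width 80-character "card": the keyword occupies columns 1 to 8, a value card has `= ` in columns 9 and 10, and an optional comment follows a `/`. The sketch below parses such cards in plain Python. It is a simplified illustration (no embedded quotes in strings, no `D` exponents, no continuation cards), and `parse_card` and `convert` are names invented here.

```python
def convert(raw):
    """Interpret a non-string FITS value: logical T/F, integer, or float."""
    if raw == "T":
        return True
    if raw == "F":
        return False
    try:
        return int(raw)
    except ValueError:
        return float(raw)

def parse_card(card):
    """Split one header card into (keyword, value, comment)."""
    card = card.ljust(80)[:80]            # every card is exactly 80 characters
    keyword = card[:8].strip()            # columns 1-8
    if card[8:10] != "= ":                # columns 9-10 mark a value card
        return keyword, None, card[8:].strip()
    body = card[10:].strip()
    if body.startswith("'"):
        # String value; simplification: embedded '' quotes are not handled.
        close = body.index("'", 1)
        value = body[1:close].rstrip()
        comment = body[close + 1:].partition("/")[2]
    else:
        text, _, comment = body.partition("/")
        value = convert(text.strip())
    return keyword, value, comment.strip()

cards = [
    "SIMPLE  =                    T / file does conform to FITS standard",
    "BITPIX  =                    8 / number of bits per data pixel",
    "XTENSION= 'BINTABLE'           / binary table extension",
    "END",
]
parsed = [parse_card(card) for card in cards]
for entry in parsed:
    print(entry)
```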

Page 23: The BinX Language

VOTable

<VOTABLE>
  <RESOURCE>
    <PARAM name="Obs" value="Bob"/>
    <TABLE name="Stars">
      <FIELD name="Star-name" datatype="char" arraysize="10" />
      <FIELD name="RA" datatype="float" />
      <FIELD name="Dec" datatype="float" />
      <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
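The sample VOTable can be read with any XML parser. The sketch below uses Python's `xml.etree.ElementTree` to list the field definitions and to resolve the variable last dimension in `arraysize="2x3x*"` from the number of values in the row. `resolve_shape` is a hypothetical helper for this illustration, and the PARAM element is left out of the embedded copy for brevity.

```python
import xml.etree.ElementTree as ET

VOTABLE = """
<VOTABLE><RESOURCE>
  <TABLE name="Stars">
    <FIELD name="Star-name" datatype="char" arraysize="10" />
    <FIELD name="RA" datatype="float" />
    <FIELD name="Dec" datatype="float" />
    <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
    <DATA><TABLEDATA><TR>
      <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
      <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
    </TR></TABLEDATA></DATA>
  </TABLE>
</RESOURCE></VOTABLE>
"""

def resolve_shape(arraysize, n_values):
    """Replace a trailing '*' dimension with the size implied by the value count."""
    dims = arraysize.split("x")
    fixed = [int(d) for d in dims if d != "*"]
    block = 1
    for d in fixed:
        block *= d
    return fixed + [n_values // block] if "*" in dims else fixed

table = ET.fromstring(VOTABLE).find("RESOURCE/TABLE")
fields = [(f.get("name"), f.get("datatype"), f.get("arraysize"))
          for f in table.findall("FIELD")]
row = [td.text for td in table.find("DATA/TABLEDATA/TR")]

counts = [int(v) for v in row[3].split()]
shape = resolve_shape(fields[3][2], len(counts))
print(fields)
print(row[0], float(row[1]), float(row[2]), shape)
```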

Page 24: The BinX Language

FITS → DataBinX → VOTable

FITS to VOTable conversion

[Figure labels: FITS; Preprocessor; SchemaBinX; DataBinX utility; DataBinX; XSLT; XSLT transformer; VOTable]

Page 25: The BinX Language

VOTable → DataBinX → FITS

VOTable to FITS conversion

[Figure labels: VOTable; XSLT; XSLT transformer; Preprocessor; DataBinX; SchemaBinX; DataBinX utility; Binary data; Postprocessor; FITS header; FITS]

Page 26: The BinX Language

FITS-VOTable experiment

Sample FITS file
- A data table of 82 rows x 20 fields
- File size: 37 KB

DataBinX generated by the DataBinX utility
- Time spent: 268 ms
- DataBinX document size: 1.2 MB

VOTable transformed by MSXML
- Time spent: about 1 second
- VOTable document size: 51 KB

[Figure labels: F; V; DB]

Page 27: The BinX Language

Application in Astronomy

Case Study 2: Data Transportation by pipelining BinX and VOTable

Page 28: The BinX Language

The Problem

Three kinds of VOTable data sources
- Pure XML VOTable (large)
- VOTable + FITS (small)
- VOTable + Binary (smaller)

Difficulties
- Additional parser for VOTable + Binary
- Limited binary format
- Byte order and data types

Page 29: The BinX Language

The Solution: VOTable + BinX

- No coding necessary
- Smaller data files
- Easy to separate and restore
- Pipelined to work in the background
- Platform independent

Page 30: The BinX Language

Approaches

1. Embedded BinX

2. BinX document linking

Perhaps another method?

Page 31: The BinX Language

Embedded BinX

Example:

<VOTABLE xmlns:bx="http://www.edikt.org/binx/2003/06/binx">
  <TABLE name="stars">
    <FIELD name="star-name" datatype="char" arraysize="*"/>
    <FIELD name="RA" datatype="float"/>
    <DATA>
      <bx:dataset src="bin-file.dat">
        <bx:array>
          <bx:struct>
            <bx:string varName="star-name" />
            <bx:float-32 varName="RA" />
          </bx:struct>
        </bx:array>
      </bx:dataset>
    </DATA>
  </TABLE>
</VOTABLE>

Page 32: The BinX Language

BinX Document Linking

Example:

<VOTABLE>
  <TABLE name="stars">
    <FIELD name="star-name" datatype="char" arraysize="*"/>
    <FIELD name="RA" datatype="float"/>
    <DATA>
      <BINX href="stars-data-binx.xml" type="TABLEDATA"/>
    </DATA>
  </TABLE>
</VOTABLE>
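A consumer of such documents has to find out which of the two approaches a given VOTable uses. The following sketch shows one way to do that with Python's `xml.etree.ElementTree`: look inside DATA for a `dataset` element in the BinX namespace, or for a `BINX` link element. The function `locate_binx` and its return convention are assumptions made for this illustration; the element and attribute names are taken from the two examples.

```python
import xml.etree.ElementTree as ET

BX_NS = "http://www.edikt.org/binx/2003/06/binx"  # namespace URI from the embedded example

def locate_binx(votable_xml):
    """Return ("embedded", src), ("linked", href) or ("none", None) for a VOTable string."""
    data = ET.fromstring(votable_xml).find(".//DATA")
    if data is None:
        return ("none", None)
    embedded = data.find(f"{{{BX_NS}}}dataset")
    if embedded is not None:
        return ("embedded", embedded.get("src"))
    link = data.find("BINX")
    if link is not None:
        return ("linked", link.get("href"))
    return ("none", None)

EMBEDDED = f"""
<VOTABLE xmlns:bx="{BX_NS}"><TABLE name="stars"><DATA>
  <bx:dataset src="bin-file.dat"><bx:array/></bx:dataset>
</DATA></TABLE></VOTABLE>
"""

LINKED = """
<VOTABLE><TABLE name="stars"><DATA>
  <BINX href="stars-data-binx.xml" type="TABLEDATA"/>
</DATA></TABLE></VOTABLE>
"""

print(locate_binx(EMBEDDED))
print(locate_binx(LINKED))
```

In the linked case the caller still has to fetch and parse a second document, which is the consistency cost of that approach.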

Page 33: The BinX Language

Comparison of the two approaches

Embedded BinX
- Advantages:
  - One annotation file
  - Consistency with VOTable definitions
- Disadvantages:
  - Spoils the VOTable document
  - Difficult to parse

BinX document linking
- Advantages:
  - Keeps the VOTable clean
  - Easy to parse
- Disadvantages:
  - Needs a separate BinX document
  - Difficult to keep consistent

Page 34: The BinX Language

BinX Software

Today and the Future

Page 35: The BinX Language

Future releases

- Utilities (GUI BinX editor)
- XPath-based data query
- DFDL support
- Text file support
- Output through SAX events
- Output as XQuery return
- Database interfacing
- Java wrapper for utilities

Page 36: The BinX Language

Support

Information and software download:
- http://www.edikt.org/binx (coming soon)

Questions:
- [email protected]

Requirements and suggestions:
- [email protected]
- [email protected]