Describing XML Wrappers for Information Integration

Research project XRAKE

November 15, 2001

Research project XRAKE

Merja Ek, Heli Hakkarainen, Pekka Kilpeläinen, Eila Kuikka, and Tommi Penttinen

University of Kuopio, Department of Computer Science and Applied Mathematics

Content

• Research project XRAKE

• Introduction

• General ideas of XW

• Examples

• Implementation & Future

Introduction

• programming error-prone, tedious
• XW
  – declarative
  – serialised data
  – influenced by XML Schema, XSLT
• XW wrapper
  – well-formed XML
  – highly readable

General ideas of XW

• XW can
  – remove data items
  – add structure
  – remove structure
• XW cannot
  – change order of data
• crude transformation
  – followed by e.g. XSLT

[Figure: input data → XW engine (with XW wrapper specification) → XML → XSLT]

Input data:

AA x1 x2
BB y1 y2
z1 z2

XML produced by the XW engine:

<part-a>
  <e1>x1</e1>
  <e2>x2</e2>
</part-a>
<part-b>
  <line-1>
    <d1>y1</d1>
    <d2>y2</d2>
  </line-1>
  <d3>z2</d3>
</part-b>

General ideas of XW (cont'd)

• wrapper is a template for output
  – element names
  – structure ~ input structure
• input hierarchically divided into parts
• part ~ element
• part + subparts ~ element + child elements

<the-whole …>
  <part-X …>
    <subpart-1 …>
    <subpart-2 …>
    <subpart-3 …>
  <part-Y …>
    <subpart-1 …>
    <subpart-2 …>

Examples

• positional text data
  – phone invoices

• separator-delimited text

• binary data

INVOICE INVOICE NUMBER: 44196
CUSTOMER NUMBER: 25272
PERSONAL REFERENCE: WORK

John Smith
Garden Avenue 40
43234 Bigtown

PHONE SPECIFICATION

DATE         UNITS     DURATION   NUMBER    PRICE
11.1.1992    5         307 min    37126     50.00
23.6.1995    10        193 min    53829     122.00
----------------------------------------------------------------
John Smith
Garden Avenue 40
43234 Bigtown

595324
17.8.1996 907.00

XW Wrapper Specification

<xw:wrapper xw:name="phone-invoice" xw:sourcetype="text"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    …
  </invoice>
</xw:wrapper>

<xw:wrapper xw:name="phone-invoice" xw:sourcetype="text"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    <identifierdata ...>
      ...
    </identifierdata>
    <specification xw:starter="\^PHONE SPECIFICATION" ...>
      ...
    </specification>
    <invoicedata xw:starter="\^----------" ...>
      ...
    </invoicedata>
  </invoice>
</xw:wrapper>
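The way starters divide one invoice into its three parts can be sketched in Python. This is only an illustration, not the XW engine: reading "\^X" as "a line that begins with X" is an assumption, and `split_parts`, `STARTERS`, and `SAMPLE_INVOICE` are names invented for the sketch.

```python
import re

# Part names and starters taken from the wrapper, in document order.
# Interpreting "\^X" as "line begins with X" is an assumption of this sketch.
STARTERS = [
    ("identifierdata", None),                     # first part: no starter of its own
    ("specification", r"^PHONE SPECIFICATION"),
    ("invoicedata", r"^----------"),
]

SAMPLE_INVOICE = "\n".join([
    "INVOICE INVOICE NUMBER: 44196",
    "CUSTOMER NUMBER: 25272",
    "PERSONAL REFERENCE: WORK",
    "John Smith",
    "PHONE SPECIFICATION",
    "DATE UNITS DURATION NUMBER PRICE",
    "11.1.1992 5 307 min 37126 50.00",
    "----------------------------------------------------------------",
    "595324",
    "17.8.1996 907.00",
])


def split_parts(invoice_text, starters=STARTERS):
    """Divide one invoice into named parts at the lines matching each starter."""
    current = starters[0][0]
    parts = {current: []}
    remaining = list(starters[1:])
    for line in invoice_text.splitlines():
        # A line matching the next starter opens the next part.
        if remaining and re.search(remaining[0][1], line):
            current = remaining.pop(0)[0]
            parts[current] = []
        parts[current].append(line)
    return parts


parts = split_parts(SAMPLE_INVOICE)
```

Because the parts are opened strictly in template order, the sketch never reorders input, which matches the restriction that XW cannot change the order of data.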

<xw:wrapper xw:name="phone-invoice" xw:sourcetype="text"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    <identifierdata xw:childterminator="\n"
                    xw:ignoreemptysubpart="true">
      <invoicenumber xw:position="53 64"/>
      <customernumber xw:position="60 71"/>
      <personalreference xw:position="60 71"/>
      <name xw:position="1 22"/>
      <streetaddress xw:position="1 22"/>
      <postoffice xw:position="1 22"/>
    </identifierdata>
    <specification xw:starter="\^PHONE SPECIFICATION" ...>
      ...
    </specification>
    <invoicedata xw:starter="\^----------" ...>
      ...
    </invoicedata>
  </invoice>
</xw:wrapper>

<xw:wrapper ...>
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    <identifierdata xw:childterminator="\n" ...>
    </identifierdata>
    <specification xw:starter="\^PHONE SPECIFICATION"
                   xw:childterminator="\n"
                   xw:ignoreemptysubpart="true">
      <xw:ignore/>
      <specificationrow xw:occurs="unbounded">
        <date xw:position="1 12"/>
        <units xw:position="14 22"/>
        <duration xw:position="24 33"/>
        <number xw:position="35 43"/>
        <price xw:position="45 52"/>
      </specificationrow>
    </specification>
    <invoicedata xw:starter="\^----------" ...>
      ...
    </invoicedata>
  </invoice>
</xw:wrapper>

<xw:wrapper xw:name="phone-invoice" xw:sourcetype="text"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <invoice xw:starter="\^INVOICE" xw:occurs="unbounded">
    <identifierdata xw:childterminator="\n"
                    xw:ignoreemptysubpart="true">
    </identifierdata>
    <specification xw:starter="\^PHONE SPECIFICATION"
                   xw:childterminator="\n"
                   xw:ignoreemptysubpart="true">
    </specification>
    <invoicedata xw:starter="\^----------"
                 xw:childterminator="\n"
                 xw:ignoreemptysubpart="true">
      <xw:ignore xw:occurs="4"/>
      <reference xw:position="30 48"/>
      <xw:collapse>
        <duedate xw:position="30 39"/>
        <totalsum xw:position="42 50"/>
      </xw:collapse>
    </invoicedata>
  </invoice>
</xw:wrapper>

Resulting XML

<invoice>
  <identifierdata>
    <invoicenumber>44196</invoicenumber>
    <customernumber>25272</customernumber>
    <personalreference>WORK</personalreference>
    <name>John Smith</name>
    <streetaddress>Garden Avenue 40</streetaddress>
    <postoffice>43234 Bigtown</postoffice>
  </identifierdata>
  <specification>
    <specificationrow>
      <date>11.1.1992</date>
      <units>5</units>
      <duration>307 min</duration>
      <number>37126</number>
      <price>50.00</price>
    </specificationrow>
    <specificationrow>
      <date>23.6.1995</date>
      <units>10</units>
      <duration>193 min</duration>
      <number>53829</number>
      <price>122.00</price>
    </specificationrow>
  </specification>
  <invoicedata>
    <reference>595324</reference>
    <duedate>17.8.1996</duedate>
    <totalsum>907.00</totalsum>
  </invoicedata>
</invoice>
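As a rough illustration of positional extraction, the following Python sketch cuts one specification row into elements using the column ranges of the specificationrow wrapper. It assumes that xw:position gives 1-based, inclusive start and end columns and that padding is trimmed; the helper names and the padded sample row are constructed for this sketch only.

```python
# Column ranges copied from the <specificationrow> wrapper. Treating them as
# 1-based, inclusive start/end columns is an assumption of this sketch.
ROW_FIELDS = [
    ("date", 1, 12),
    ("units", 14, 22),
    ("duration", 24, 33),
    ("number", 35, 43),
    ("price", 45, 52),
]


def extract_positional(line, fields=ROW_FIELDS):
    """Cut one fixed-width line into (name, value) pairs, trimming padding."""
    return [(name, line[start - 1:end].strip()) for name, start, end in fields]


def to_xml(element, pairs):
    """Render the pairs as child elements of `element` (values need no escaping here)."""
    children = "".join(f"<{name}>{value}</{name}>" for name, value in pairs)
    return f"<{element}>{children}</{element}>"


# Hypothetical row, padded so that each value starts at its column.
row = "".join([
    "11.1.1992".ljust(13),   # columns 1-13
    "5".ljust(10),           # columns 14-23
    "307 min".ljust(11),     # columns 24-34
    "37126".ljust(10),       # columns 35-44
    "50.00",                 # columns 45-49
])

pairs = extract_positional(row)
row_xml = to_xml("specificationrow", pairs)
```

Here `row_xml` has the same element names and values as the first specificationrow of the resulting XML.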

Examples

• positional text data

• separator-delimited text
  – HL7 version 2.3 messages

• binary data

MSH|^~\&|KL-Lab||CCIMS|RDNT01|200001071300||ORU^R01...
PID|||311244A0112|ExamMod1|Smith^John||19441231|M...
OBR||76551|Res_01||||20000107060000|||||||||||||||||CH|C
OBX||NM|1535^aB-pO2^||11||||||F
NTE|||This is a comment for aB-pO2.
NTE|||Another comment for aB-pO2.
OBX||NM|1026^S -ALAT^||61|||*|||F

<!-- MSH, PID and OBR lines processed above -->
<xw:CHOICE xw:occurs='unbounded'>
  <xw:collapse xw:starter='\^OBX' xw:childseparator='|'>
    <xw:ignore xw:occurs='3'/>
    <observation/>
    <xw:ignore/>
    <result/>
    <xw:ignore xw:occurs='2'/>
    <flag/>
    <xw:ignore xw:occurs='2'/>
    <responsetype/>
  </xw:collapse>
  <xw:collapse xw:starter='\^NTE' xw:childseparator='|'
               xw:occurs='unbounded'>
    <xw:ignore xw:occurs='3'/>
    <xw:collapse/>
  </xw:collapse>
</xw:CHOICE>

<!-- MSH, PID and OBR lines processed above -->
<xw:CHOICE xw:occurs='unbounded'>
  <xw:collapse xw:starter='\^OBX' xw:childseparator='|'>
    <xw:ignore xw:occurs='3'/>
    <observation/>
    <xw:ignore/>
    <result/>
    <xw:ignore xw:occurs='2'/>
    <flag/>
    <xw:ignore xw:occurs='2'/>
    <responsetype/>
  </xw:collapse>
  <xw:ELEMENT xw:name='comment'>
    <xw:collapse xw:starter='\^NTE' xw:childseparator='|'
                 xw:occurs='unbounded'>
      <xw:ignore xw:occurs='3'/>
      <xw:collapse/>
    </xw:collapse>
  </xw:ELEMENT>
</xw:CHOICE>

Resulting XML

<response>
  ...
  <observation>1535^aB-pO2^</observation>
  <result>11</result>
  <responsetype>F</responsetype>
  <comment>This is a comment for aB-pO2.Another comment for aB-pO2.</comment>
  <observation>1026^S -ALAT^</observation>
  <result>61</result>
  <flag>*</flag>
  ...
</response>
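The handling of an OBX line can be sketched in Python as a walk over the "|"-separated fields, skipping or naming each one in template order. The template encoding (an integer for fields to ignore, a string for an element name) and the function `wrap_delimited` are inventions of this sketch, and dropping empty fields is an assumption suggested by the missing flag element in the first observation.

```python
# Template for an OBX line, mirroring the wrapper: an integer means
# "ignore that many fields", a string names the element for the next field.
OBX_TEMPLATE = [3, "observation", 1, "result", 2, "flag", 2, "responsetype"]


def wrap_delimited(line, template, separator="|"):
    """Return (element name, value) pairs for one separator-delimited line."""
    fields = line.split(separator)
    pairs, index = [], 0
    for step in template:
        if isinstance(step, int):
            index += step            # skip ignored fields
            continue
        value = fields[index] if index < len(fields) else ""
        index += 1
        if value:                    # assumption: empty fields yield no element
            pairs.append((step, value))
    return pairs


first = wrap_delimited("OBX||NM|1535^aB-pO2^||11||||||F", OBX_TEMPLATE)
second = wrap_delimited("OBX||NM|1026^S -ALAT^||61|||*|||F", OBX_TEMPLATE)
```

Only the second line carries a value in the flag position, so only `second` contains a flag pair.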

Examples

• positional text data

• separator-delimited text

• binary data
  – packet of IP-based communications protocol

Binary data

length   16b     16b     16b     16b     4*8b     4*8b     varies
name     len     chk     id      off     src      dst      pay
type     short   short   short   short   4*byte   4*byte   array of bytes

<xw:wrapper xw:name="IP-like-protocol" xw:sourcetype="binary"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <datagram>
    <xw:ignore xw:name="total-length" xw:type="short"/>
    <checksum xw:type="short"/>
    <id xw:type="short"/>
    <segment-offset xw:type="short"/>
    ...
  </datagram>
</xw:wrapper>

<xw:wrapper xw:name="IP-like-protocol" xw:sourcetype="binary"
            xmlns:xw="http://www.cs.uku.fi/XW/2001">
  <datagram>
    <xw:ignore xw:name="total-length" xw:type="short"/>
    <checksum xw:type="short"/>
    <id xw:type="short"/>
    <segment-offset xw:type="short"/>
    <xw:ELEMENT xw:name="source-address">
      <a xw:type="byte"/>
      <b xw:type="byte"/>
      <c xw:type="byte"/>
      <d xw:type="byte"/>
    </xw:ELEMENT>
    <xw:ELEMENT xw:name="destination-address">
      <a xw:type="byte"/>
      <b xw:type="byte"/>
      <c xw:type="byte"/>
      <d xw:type="byte"/>
    </xw:ELEMENT>
    <xw:ELEMENT xw:name="payload">
      <xw:collapse xw:occurs="total-length - 16" xw:type="byte"
                   xw:numeric-output-format="hexadecimal"/>
    </xw:ELEMENT>
  </datagram>
</xw:wrapper>

Resulting XML

<datagram>
  <checksum>397485</checksum>
  <id>37</id>
  <segment-offset>0</segment-offset>
  <source-address>
    <a>193</a><b>167</b><c>232</c><d>253</d>
  </source-address>
  <destination-address>
    <a>193</a><b>167</b><c>224</c><d>8</d>
  </destination-address>
  <payload>e6a9ff120a</payload>
</datagram>
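The binary layout can be decoded in a few lines with Python's struct module; the sketch below is one possible reading of the wrapper, not the XW engine itself. Big-endian byte order and unsigned 16-bit shorts are assumptions, `parse_datagram` is a name invented here, and the sample packet is built for illustration (its checksum value is made up; the other values follow the resulting XML).

```python
import struct

# Four 16-bit fields followed by two 4-byte addresses: 16 header bytes in all.
# Big-endian byte order and unsigned shorts are assumptions of this sketch.
HEADER = struct.Struct(">HHHH4s4s")


def parse_datagram(packet):
    """Decode the fixed header, then take total-length - 16 payload bytes."""
    total_length, checksum, ident, offset, src, dst = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:total_length]
    # total-length is read but not emitted, like <xw:ignore> in the wrapper.
    return {
        "checksum": checksum,
        "id": ident,
        "segment-offset": offset,
        "source-address": list(src),
        "destination-address": list(dst),
        "payload": payload.hex(),
    }


# Hypothetical packet; the checksum 0x1234 is invented for illustration.
payload = bytes.fromhex("e6a9ff120a")
packet = HEADER.pack(
    HEADER.size + len(payload), 0x1234, 37, 0,
    bytes([193, 167, 232, 253]),
    bytes([193, 167, 224, 8]),
) + payload

datagram = parse_datagram(packet)
```

The header size of 16 bytes is where the expression "total-length - 16" in the payload rule comes from.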

Implementation & Future

• Java program
  – reads wrapper specification into DOM tree
  – produces output as SAX events: characters, startElement, endElement
• further development of XW
  – attribute generation
  – content generation from input
  – enhancements to alternative/optional parts
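To show what producing output as SAX events looks like, here is a minimal sketch that uses Python's standard xml.sax interface in place of the project's Java code. The recursive helper `emit` is invented for this illustration; `XMLGenerator` is the standard-library handler that serialises the events it receives.

```python
import io
from xml.sax.saxutils import XMLGenerator


def emit(handler, name, content):
    """Send one element as SAX events; content is text or a list of children."""
    handler.startElement(name, {})
    if isinstance(content, str):
        handler.characters(content)
    else:
        for child_name, child_content in content:
            emit(handler, child_name, child_content)
    handler.endElement(name)


out = io.StringIO()
emit(XMLGenerator(out), "invoicedata", [
    ("reference", "595324"),
    ("duedate", "17.8.1996"),
    ("totalsum", "907.00"),
])
xml_text = out.getvalue()
```

Emitting events rather than building an output tree means any SAX consumer, for example an XSLT processor, could be attached in place of the serialiser.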

The end of the presentation

• Questions?