wtui18 - modelling and parsing business data using dfdl
TRANSCRIPT
![Page 1: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/1.jpg)
I18: Data Format Description
Language (DFDL)
Modeling and Parsing Business Data
Alex Wood IBM DFDL Development Team Lead
IBM Hursley Lab ,UK
![Page 2: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/2.jpg)
© 2014 IBM Corpora/on
Please Note IBM’s statements regarding its plans, direc/ons, and intent are subject to change or withdrawal without no/ce at IBM’s sole discre/on. Informa/on regarding poten/al future products is intended to outline our general product direc/on and it should not be relied on in making a purchasing decision.
The informa/on men/oned regarding poten/al future products is not a commitment, promise, or legal obliga/on to deliver any material, code or func/onality. Informa/on about poten/al future products may not be incorporated into any contract. The development, release, and /ming of any future features or func/onality described for our products remains at our sole discre/on
Performance is based on measurements and projec/ons using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considera/ons such as the amount of mul/programming in the user’s job stream, the I/O configura/on, the storage configura/on, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
2
![Page 3: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/3.jpg)
• Introduc/on
• OGF DFDL – a standard for modelling text and binary data
• IBM DFDL Component
• DFDL and IIB -‐ demo
• Ques/ons
Agenda
![Page 4: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/4.jpg)
Modeling Text and Binary Data • Much of the data in the world resides in files, is not XML, is a mixture of textual and binary with
custom syntax and encodings, and does not have a shareable machine-‐readable descrip/on
• But there has been no universal standard for modelling this data! – XML -‐> use XML Schema – RDBMS -‐> use database schema – Text/binary -‐> ??
• Exis/ng standards are too prescrip/ve: “Put your data in this format!” • Organiza/ons including IBM evolved their own way of modelling text and binary data based on
customer need. • IBM® examples…
– WebSphere® Message Broker: MRM message set – WebSphere Transforma/on Extender: Type Trees – WebSphere DataPower: FFD – WebSphere Cast Iron: Flat File Schema – Sterling B2B Integrator: DDF and IDF files
ü DFDL: a universal, shareable, non-prescriptive description for general text & binary data formats
![Page 5: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/5.jpg)
• Introduc/on
• OGF DFDL – a standard for modelling text and binary data
• IBM DFDL Component
• DFDL and IIB
• Ques/ons
Agenda
![Page 6: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/6.jpg)
Data Format Descrip:on Language (DFDL)
§ A new open standard – From the Open Grid Forum (OGF)
– http://www.ogf.org/ – Version 1.0
– ‘Proposed Recommendation’ status
§ A way of describing data… – It is NOT a data format itself!
§ A powerful modeling language … – Text, binary and bit – Commercial record-oriented – Scientific and numeric – Modern and legacy – Industry standards
§ While allowing high performance … – You choose the right data format
for the job
§ Leverage XML Schema technology – Uses W3C XML Schema 1.0 subset
& type system to describe the logical structure of the data
– Uses XSDL annotations to describe the physical representation of the data
– The result is a DFDL schema § Keep simple cases simple § Annotations are human readable § Both read and write
– A DFDL Processor can parse and serialize data using a DFDL schema
§ Intelligent parsing – Automatically resolve choices and
optionality § Validation of data when parsing and
serializing
![Page 7: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/7.jpg)
Example – Delimited Text Data
cat=5;lbound=-7.1E8 ASCII text
integer ASCII text
floating point
Separator
Separators, initiators (aka tags), & terminators are all examples in DFDL of delimiters
Initiator Initiator
§ 7
![Page 8: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/8.jpg)
Example – DFDL Schema <xs:complexType name=“numbers"> <xs:sequence> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:sequence separator=“;” encoding=“ascii” …/> </xs:appinfo>
</xs:annotation> <xs:element name=“category" type=“xs:int”> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:element representation="text"
textNumberPattern="###0" encoding="ascii" lengthKind="delimited" initiator=“cat=" …/> </xs:appinfo>
</xs:annotation> </xs:element>
<xs:element name=“lowerBound" type=“xs:float”> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:element representation="text" textNumberPattern="##0.0#E0" encoding="ascii"
lengthKind="delimited" initiator=“lbound=" …/> </xs:appinfo>
</xs:annotation> </xs:element> </xs:sequence> </xs:complexType>
DFDL properties
DFDL annotation
cat=5;lbound=-7.1E8
![Page 9: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/9.jpg)
Example – DFDL Schema (Short Form)
<xs:complexType name=“numbers"> <xs:sequence dfdl:separator=“;” dfdl:encoding=“ascii” …> <xs:element name=“category" type=“xs:int” dfdl:representation="text"
dfdl:textNumberPattern="###0" dfdl:encoding="ascii" dfdl:lengthKind="delimited" dfdl:initiator=“cat=" … /> <xs:element name=“lowerBound" type=“xs:float” dfdl:representation="text"
dfdl:textNumberPattern="##0.0#E0" dfdl:encoding="ascii" dfdl:lengthKind="delimited" dfdl:initiator="lbound=" … /> </xs:sequence> </xs:complexType>
DFDL properties
cat=5;lbound=-7.1E8
![Page 10: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/10.jpg)
§ A DFDL processor uses a DFDL schema to understand a data stream
§ It consists of a DFDL parser and (optionally) a DFDL serializer
§ The DFDL parser reads a data stream and creates a DFDL ‘infoset’
§ The DFDL serializer takes a DFDL ‘infoset’ and writes a data stream
DFDL Processor
<Document> <Element name=“numbers”/> <Element name=“category” dataType=“xs:int” dataValue=“5”/> <Element name=“lowerBound” dataType=“xs:float” dataValue=“-7.1E08”/> </Element> </Document>
cat=5;lbound=-7.1E8
<xs:complexType name=“numbers"> <xs:sequence dfdl:separator=“;” dfdl:encoding=“ascii” ... > <xs:element name=“category" type=“xs:int” dfdl:representation="text"
dfdl:encoding="ascii“ dfdl:textNumberPattern=“###0” dfdl:lengthKind="delimited" dfdl:initiator=“cat=“ ... /> <xs:element name=“lowerBound" type=“xs:float” dfdl:representation="text"
dfdl:encoding="ascii“ dfdl:textNumberPattern=“##0.0#E0” dfdl:lengthKind="delimited" dfdl:initiator=“lbound=“ ... /> </xs:sequence> </xs:complexType>
DFDL Processor
![Page 11: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/11.jpg)
DFDL 1.0 Features § Text data types such as strings, numbers, zoned decimals, calendars, booleans § Binary data types such as integers, floats, BCD, packed decimals, calendars, booleans § Fixed length data and data delimited by text or binary markup § Bi-directional text § Bit data of arbitrary length § Pattern languages for text numbers and calendars § Ordered, unordered and floating content § Default values on parsing and serializing § Nil values for handling out-of-band data § Fixed and variable arrays § XPath 2.0 expression language including variables to model dynamic data § Speculative parsing to resolve choices and optional content § Validation to XML Schema 1.0 rules § Scoping mechanism to allow common property values to be applied at multiple points § Hide elements in the data § Calculate element values
![Page 12: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/12.jpg)
When should I use DFDL? § DFDL’s sweet spot is when you need to model and parse a text or binary data format and where either:
§ You have a specification of the data format ‘on the wire’ § You have actual wire examples of the data format
§ DFDL is recommended to model: § Binary data from COBOL, C, PL/1, ASM programs § Text data with delimiters such as CSV § Text industry standards such as SWIFT, HL7, EDIFACT, X12, HIPPA… § Binary industry standards such as ISO8583, TLog, ...
§ DFDL is not recommended to model: § XML
§ Already have XML parsers and XML Schema / DTDs § JSON
§ Already have JSON parsers, and JSON schema under design § GPB, HDF5, …
§ With serialization formats like GPB, the wire format is never exposed to the consumer and access to the data is using APIs
![Page 13: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/13.jpg)
DFDL Adop:on • IBM DFDL reusable component ships with:
– WebSphere Message Broker v8.0 – IBM Integra/on Bus v9.0 – IBM Integra/on Bus open-‐beta – Ra/onal® Performance Test Server v8 (v8.0.1 onwards) – Ra/onal Test Virtualiza/on Server v8 (v8.0.1 onwards) – Ra/onal Test Workbench v8 (v8.0.1 onwards) – Ra/onal Developer for System z v8.5 – InfoSphere® Master Data Management v11
• Further IBM products and appliances looking to adopt
• Open-‐source DFDL implementa/on in progress ‘Daffodil’ – Available as an alpha release (parser only) – More features added every release
• DFDL web community on GitHub for collabora/ve authoring of DFDL schemas for commercial and scien/fic data formats
![Page 14: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/14.jpg)
DFDL Schemas Web Community • Free public repository for DFDL models
• Hosted on the popular GitHub community website
• Unlimited read-‐only access • Collabora/on encouraged • Evolving content
![Page 15: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/15.jpg)
GeMng Started with DFDL
![Page 16: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/16.jpg)
• Introduc/on
• OGF DFDL – a standard for modelling text and binary data
• IBM DFDL Component
• DFDL and IIB
• Ques/ons
Agenda
![Page 17: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/17.jpg)
IBM DFDL • Designed as an embeddable component
– First shipped in 2011 • DFDL processor
– High performance parser and serializer – Java and C – Streaming, on-‐demand, specula/ve – Pre-‐compiles DFDL schema – Parser emits SAX-‐like events
• Tooling for crea/ng DFDL models – DFDL Schema editor eclipse plugins – Wizards for CSV, COBOL & C – Debug model using real data from within tooling
• IBM DFDL implements majority of the OGF DFDL 1.0 specifica/on – Some more advanced features of DFDL are not yet available – Will be added in future DFDL deliverables un/l 100% achieved
<Document> <Element name=“numbers”/> <Element name=“category” …/> <Element name=“lowerBound” …/> </Element> </Document>
cat=5;lbound=-7.1E8
<xs:schema …> <xs:annotation> <xs:appinfo …> </xs:appinfo> </xs:annotation> ... </xs:schema>
IBM DFDL Processor
![Page 18: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/18.jpg)
What’s New in IBM DFDL • Latest release of IBM DFDL is v1.1.1 • Since IBM DFDL v1.0 several spec features have been added:
– Extrac/ng data using a prefixed length – Extrac/ng data using a regular expression – Extrac/ng binary data with delimiters – User-‐defined variables – Default values when serializing – More XPath func/ons in expressions – Unordered sequences – Asserts with recoverable errors
• Since IBM DFDL v1.0 several tooling features have been added: – Keyboard shortcuts in DFDL editor – MBCS enablement in DFDL debugger – Copy/paste in DFDL editor – Genera/on of sample values
• Con/nual increase in performance
![Page 19: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/19.jpg)
DFDL Schemas for Industry Formats • Fully supported, full func/on DFDL schemas will appear as part of IBM products such as the industry connec/vity packs for IIB
• Unsupported, part func/on DFDL schemas will appear on DFDLSchemas GitHub site as and when available
• HL7 v2.5.1, v2.6 and v2.7 – IIB Connec/vity Pack for Healthcare & GitHub
• HIPPA v5010 mandatory transac/ons – IIB Connec/vity Pack for Healthcare
• IBM/Toshiba 4690 SurePos ACE v7r3 TLog – IIB Retail Pack & GitHub
• ISO 8583 1987 & 1993 – WMB v8 and IIB v9 sample & GitHub
• NACHA 2013 – GitHub
• EDIFACT (all releases) – GitHub
• SWIFT FIN (2013-‐2014) – New! IBM Integra/on Bus Solu/on for SWIFT FIN Messaging (DFDL Edi/on)
![Page 20: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/20.jpg)
• Introduc/on
• OGF DFDL – a standard for modelling text and binary data
• IBM DFDL Component
• DFDL and IIB -‐ demo
• Ques/ons
Agenda
![Page 21: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/21.jpg)
IBM DFDL in WMB and IIB • WMB v8 embeds IBM DFDL v1.0 • IIB v9 embeds IBM DFDL v1.1 • DFDL domain and parser
– Available in usual way – input nodes, ESQL, Java, … – More capable and higher performing than MRM CWF/TDS
• DFDL models – DFDL schema files reside in libraries, not in message sets
• DFDL wizards and editor for crea/ng DFDL models • DFDL model debug and test
– Debug and test parsing & wri/ng of data within toolkit – No message flow or run/me deploy necessary!
• DFDL schema deployed in BAR file – No dic/onary file
• No automa/c migra/on from MRM
![Page 22: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/22.jpg)
IBM DFDL Demo
![Page 23: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/23.jpg)
Launcher for creating Message Models
Selecting one of these creates
a DFDL schema
New Message Model
![Page 24: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/24.jpg)
New Message Model – CSV Wizard
Choose the end of line character
Contains a header record?
Change the number of columns
![Page 25: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/25.jpg)
DFDL Editor – With Generated CSV Model
Logical structure
view DFDL
properties view Problems
view
![Page 26: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/26.jpg)
Test and Debug
Start a test parse
Parsed ‘infoset’
Parsed data
Delimiters highlighted
![Page 27: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/27.jpg)
Test and Debug – Failure
Parsed data up to
error
Trace console
Error message
Object in error
Parsed ‘infoset’ up to error
Model and data
linked
![Page 28: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/28.jpg)
Using the DFDL domain and parser
On Demand or Complete
parsing
Optional validation
DFDL domain
Specify message
name
![Page 29: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/29.jpg)
DFDL message tree
( ['MQROOT' : 0xd6d218] (0x01000000:Name):Properties = ( ['MQPROPERTYPARSER' : 0x141d34e8] (0x03000000:NameValue):MessageSet = '' (CHARACTER) (0x03000000:NameValue):MessageType =
'{}:Company' (CHARACTER) (0x03000000:NameValue):MessageFormat = '' (CHARACTER) (0x03000000:NameValue):Encoding = 273 (INTEGER) (0x03000000:NameValue):CodedCharSetId = 850 (INTEGER) .... ) (0x01000000:Name):DFDL = ( ['dfdl' : 0xd812c8] (0x01000000:Name):Company = ( (0x03000000:NameValue):CompanyName = 'IBM' (CHARACTER) (0x01000000:Name):Employee = ( (0x03000000:NameValue):EmpNo = 12345 (INTEGER)
(0x03000000:NameValue):Dept = 21 (INTEGER) (0x03000000:NameValue):EmpName = 'Steve Hanson' (CHARACTER) ... )
) ) )
DFDL message
name
Message name in tree (like XMLNSC)
DFDL domain
Compact ‘Name/Value’
syntax elements
Data types from DFDL
schema
![Page 30: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/30.jpg)
• The IBM DFDL Java classes may be used to create stand-‐alone Java applica/ons for parsing/serializing text & binary data formats
• IIB license permits IBM DFDL classes to be used by Java applica/ons: – On a computer where IIB is installed – On a remote computer where IIB is not installed
• When used remotely, charged as if IIB was installed (via ITLM) • IBM DFDL for Java is fully supported when used in this manner • Javadoc for APIs and a sample program are provided
<Document> <Element name=“myNumbers”/> <Element name=“myInt” dataType=“xs:int” dataValue=“5”/> <Element name=“myFloat” dataType=“xs:float” dataValue=“-7.1E08”/> </Element> </Document>
intval=5;fltval=-7.1E8 IBM DFDL for Java
Stand-alone Java Application
Using IBM DFDL Outside a Broker
![Page 31: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/31.jpg)
Summary • DFDL – an OGF standard for describing and parsing text and
binary data – Defines a rich set of features to handle broad range of custom and
industry data formats. – Leverages XML schema logical model and logical valida/on. – DFDL Schemas community for sharing of standard industry data
format descrip/ons
• IBM DFDL – the parser for test and binary data in IBM Integra/on Bus and other IBM products. – Highly performing run/me parser inside IIB and other IBM products. – Rich set of DFDL development and test tools in IIB Studio and other
IBM products.
![Page 32: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/32.jpg)
• DFDL 1.0 specifica:on: hvp://www.ogf.org/documents/GFD.174.pdf
• DFDL tutorials: hvp://redmine.ogf.org/dmsf/dfdl-‐wg?folder_id=5485
• DFDL developerWorks: hvp://www.ibm.com/developerworks/library/se-‐dfdl/index.html
• DFDL Wikipedia page: hvp://en.wikipedia.org/wiki/DFDL
• DFDL Schemas on GitHub: hvps://github.com/DFDLSchemas
• OGF DFDL home page: hvp://www.ogf.org/dfdl/
• Daffodil open source project: hvps://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
Useful DFDL links
![Page 33: WTUI18 - Modelling and Parsing Business Data Using DFDL](https://reader031.vdocuments.us/reader031/viewer/2022020711/55a0b6ba1a28ab1a4f8b45a1/html5/thumbnails/33.jpg)
© 2014 IBM Corpora/on
For Addi:onal Informa:on l IBM Training
hvp://www.ibm.com/training l IBM WebSphere
hvp://www-‐01.ibm.com/sozware/be/websphere/ l IBM developerWorks
www.ibm.com/developerworks/websphere/websphere2.html l WebSphere forums and community
www.ibm.com/developerworks/websphere/community/
33