Customising OASIS CIQ Specifications V3.0 to meet end user requirements – A Case Study
Ram Kumar, Chairman
OASIS CIQ Technical Committee
http://www.oasis-open.org/committees/ciq
September 2007
Agenda
- Why this case study?
- Code List: What, Why, Standard
- OASIS Code List Representation TC
- Methodology: Schematron-based Value Validation using Genericode (from OASIS Code List TC)
- OASIS CIQ TC Implementation of OASIS Code List Specifications and Methodology – A Case Study
Why this Case Study?
- Demonstrate how OASIS CIQ Specifications v3.0 can be customised to meet end user requirements without the customisation breaking conformance to the specifications
- Improve interoperability of data defined/represented using CIQ Specifications
- Define specific business rules using open industry standards to customise CIQ specifications
- Define code lists of CIQ specifications using open industry standards
Code List
What is a Code List?
- aka enumerations, aka controlled vocabularies, aka classification scheme and classification values
- A set of values to choose from which represent an agreed upon semantic concept
- Example: Days of a week = {“Mon”, “Tue”, “Wed”, “Thu”, “Fri”, “Sat”, “Sun”}
- Code List = List Name + values
- List Name = Days of a week
- Values = {“Mon”, “Tue”, “Wed”, “Thu”, “Fri”, “Sat”, “Sun”}
Why are Code Lists important?
- It is not just element and attribute names in XML that need to be semantically unambiguous and aligned for interoperability
- The lexical form of element and attribute text content also needs to be aligned, i.e. simple data items need to be represented the same way
- This is more important for applications
- For data oriented XML particularly (e.g. CIM), code lists are as important as elements and attributes: they form part of the complete vocabulary of the document
Standard for Code List
- If code lists were really so simple and obvious, there would be a single, well known and accepted way of handling them in XML
- There is no agreed solution, though
- The problem is that while code lists are a well understood concept, people do not actually agree on exactly what code lists are, and how they should be used
The code list is in the eyes of the beholder
- The XML schema may require only 3-letter codes to represent the code list
- The database may require a set of numeric codes, plus display labels (possibly in different languages)
- The application may need to know which 3-letter code corresponds to which numeric code, so that it can process the XML and update the database
- All of this code list information needs to be stored together in a single representation of the code list, so that all usages of the code list can be generated from the same source information
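As a rough illustration of the single-source idea, the sketch below keeps one table and derives the three usages from it. The column names (`alpha3`, `numeric`, `label_en`) and the three-row currency list are assumptions made for this example, not part of any specification.

```python
# Hypothetical single-source code list: one row per entry, one column per
# piece of metadata. Column names and rows are illustrative assumptions.
CURRENCIES = [
    {"alpha3": "AUD", "numeric": "036", "label_en": "Australian Dollar"},
    {"alpha3": "GBP", "numeric": "826", "label_en": "Pound Sterling"},
    {"alpha3": "NZD", "numeric": "554", "label_en": "New Zealand Dollar"},
]


def xml_schema_codes(rows):
    """View for the XML schema: only the 3-letter codes."""
    return [row["alpha3"] for row in rows]


def database_rows(rows):
    """View for the database: numeric code plus display label."""
    return [(row["numeric"], row["label_en"]) for row in rows]


def alpha3_to_numeric(rows):
    """View for the application: map XML codes to database codes."""
    return {row["alpha3"]: row["numeric"] for row in rows}
```

Because every view is computed from `CURRENCIES`, a change to the source table reaches the schema, the database and the application mapping together.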
The only constant is change
- Code lists change
- For a code list model to be useful, it has to account for the fact that the code lists will change over time
- There is little use in having a code list model that works only for a code list that is frozen in time
- The code list model has to support changes between versions of a code list
The only constant is change, continued
- Not all changes to a code list are version changes, however
- Some changes may be local changes to a distributed code list
- The ISO 3-letter currency code list contains GBP for British Pounds. However, prices on the London Stock Exchange are normally quoted in pence
- This has led to the practice of adding an extra code to the standard ISO list (e.g. GBp, GBX) in order to support pence as well as pounds
- This kind of customisation is far from uncommon
- The utility of any code list model is greatly reduced if it does not cater for local modifications of code lists
OASIS Code List Representation Technical Committee
- The OASIS Code List Representation format, “genericode”, is a single model and XML format (with a W3C XML Schema) that can encode a broad range of code list information
- The XML format is not designed for run-time or real-time use; the standardized interchange format is meant to be massaged into an optimized representation
- 27 of the 40 requirements gathered are implemented in v1.0 of the specifications
Genericode Model
- Has a tabular structure for code list information
- Each row in the table represents a single distinct entry in the code list, i.e. a single uniquely identifiable item in the code list
- Each column in the table represents a metadata value that can be defined for each distinct entry in the code list. Each column is either required or optional. A required column does not allow any row to have an undefined (nil or null) value. An optional column allows undefined values
- A genericode key is a set of one or more required columns that together uniquely identify each distinct entry in the code list. Optional columns cannot be used for keys. Each code list must have at least one key
- Genericode keys are equivalent to what people usually mean when they talk about the “codes” in a code list. However, genericode allows multiple keys for each code list, and there is no single preferred key
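The row, column and key rules above can be sketched as a small consistency check. This is only a model of the rules as stated on the slide, using plain Python dictionaries; the `Column` class, the function name and the sample data are assumptions for illustration and do not reflect the genericode XML format itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    required: bool


def check_code_list(columns, key, rows):
    """Return a list of problems; an empty list means the table is consistent."""
    problems = []
    by_name = {c.name: c for c in columns}
    # A key may only be built from required columns.
    for name in key:
        if not by_name[name].required:
            problems.append(f"key column {name!r} is optional")
    seen = set()
    for i, row in enumerate(rows):
        # Required columns must have a defined value in every row.
        for c in columns:
            if c.required and row.get(c.name) is None:
                problems.append(f"row {i}: required column {c.name!r} is undefined")
        # Key values must identify each row uniquely.
        key_value = tuple(row.get(name) for name in key)
        if key_value in seen:
            problems.append(f"row {i}: duplicate key {key_value}")
        seen.add(key_value)
    return problems


# Illustrative data: "note" is optional, so it may be undefined or absent.
COLUMNS = [Column("code", True), Column("name", True), Column("note", False)]
ROWS = [
    {"code": "Mon", "name": "Monday", "note": None},
    {"code": "Tue", "name": "Tuesday"},
]
```

Calling `check_code_list(COLUMNS, ["code"], ROWS)` reports no problems, while using `["note"]` as the key is rejected because an optional column cannot be part of a key.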
Concept
- Keep code lists (aka enumerations) out of the core XML schema by using “schemes”
- The idea is that the code list from which an element value is taken is indicated via a “scheme” attribute containing a URI which represents the scheme (code list)
- This is the same way that URIs are used to represent XML namespaces
- This is done so that a newer version of the core XML schema need not be released just because an externally controlled enumeration that it uses has changed (e.g. country code)
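A minimal sketch of the scheme idea follows, assuming the `listSchemeURI` attribute name used in the examples later in this deck. The in-memory `CODE_LISTS` registry and its three country values are hypothetical; in practice the values would be loaded from the code list file that the URI identifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: scheme URI -> allowed values (illustrative subset).
CODE_LISTS = {
    "urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1": {"AUS", "NZL", "SGP"},
}


def is_valid_code(element):
    """Check an element's text against the code list named by its scheme attribute."""
    scheme = element.get("listSchemeURI")
    allowed = CODE_LISTS.get(scheme)
    # Unknown scheme: the value cannot be validated, so treat it as invalid.
    return allowed is not None and (element.text or "").strip() in allowed


OK = ET.fromstring(
    '<CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">'
    "AUS</CountryCode>"
)
UNKNOWN_SCHEME = ET.fromstring(
    '<CountryCode listSchemeURI="urn:example:unknown">AUS</CountryCode>'
)
```

Updating the set behind a URI changes which values are accepted without touching the schema that declares `CountryCode`.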
Methodology: Schematron-based Value Validation using Genericode
XML Instance Document Validation
Namespace: xmlns="urn:oasis:names:tc:ciq:xNL:3"
xsi:schemaLocation="urn:oasis:names:tc:ciq:xNL:3"
Graphical Schema View: [figure not reproduced]
Text view of XML instance (XMLinstanceXXX.xml):
<StsMetadataRecord>
  <ESLVersionNumberID>5.0</ESLVersionNumberID>
  <Person>
    <cbc:BirthDate>1967-08-13</cbc:BirthDate>
    <LearnerRegistration>
      <cbc:NationalStudentNumberID>123456</cbc:NationalStudentNumberID>
    </LearnerRegistration>
    <PersonNameGeneric>
      <cbc:FirstName>John</cbc:FirstName>
      <cbc:LastName>Smith</cbc:LastName>
    </PersonNameGeneric>
    . . .
XML instance documents can be validated against the applicable XML Schema
Background (Glossary)
XML Data Content
In an XML instance document, any values
- between XML angle brackets ‘>’ and ‘<’, and
- between the quotes of an attribute
are message data content
Examples:
<BirthDate>1960-06-09</BirthDate>
<Country>
  <CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>
  <Name>Australia</Name>
</Country>
Background (Glossary), continued
Types of XML data content:
- Code values
- Other values (non-code values)
Examples:
<Country>
  <CountryCode>AUS</CountryCode>
</Country>
<BirthDate>1960-06-09</BirthDate>
W3C XML Schema Limitations
W3C XML Schema is mostly about data structures
But it does some Data Content Validation
It has good support for:
- data type conformity
- min/max values
- length, patterns …
It has limited support for:
- enumerations
It has no support for:
- complex business rules
- versioned changes of validation (without affecting the Schema’s version)
Business Rules Examples
Date Arithmetic: BirthDate < CurrentDate – 6 Years
Attribute Value Restriction:
- The code list value “First Name” cannot occur more than once
- The code list value “Last Name” cannot occur more than once
Element Use Restriction: the Country element is optional, but cannot occur more than once
Zero-length string: <Name></Name>
Business Rules Examples, continued
Code Lists: the code list (+version) used by CountryCode must be an accepted code list
<CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>
Code Value: CountryCode ‘XYZ’ must be valid in that Country code list version
<CountryCode listSchemeURI="urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1">AUS</CountryCode>
Co-occurrence: if Status=‘Closed’ then ClosureReason must also be present
<StatusCode>Closed</StatusCode>
<ClosureReason>Obsolete</ClosureReason>
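To make the rule examples concrete, here is a small sketch that checks four of them against an in-memory document. The `Record` wrapper element, the function names and the error messages are assumptions for illustration; it also assumes that a zero-length `Name` is meant to be rejected. The methodology described in this deck expresses such rules in Schematron rather than in application code.

```python
import xml.etree.ElementTree as ET
from datetime import date


def years_before(day, years):
    """Same calendar day `years` earlier (29 February falls back to 28 February)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def check_rules(record, today):
    """Return one message per violated rule."""
    errors = []
    # Date arithmetic: BirthDate < CurrentDate - 6 years
    birth = record.findtext("BirthDate")
    if birth and not date.fromisoformat(birth) < years_before(today, 6):
        errors.append("BirthDate must be more than 6 years before today")
    # Element use restriction: Country is optional but may occur only once
    if len(record.findall("Country")) > 1:
        errors.append("Country cannot occur more than once")
    # Zero-length string: <Name></Name> is rejected (assumed intent)
    if any(not (name.text or "").strip() for name in record.iter("Name")):
        errors.append("Name must not be a zero-length string")
    # Co-occurrence: a closed status needs a closure reason
    if record.findtext("StatusCode") == "Closed" and record.find("ClosureReason") is None:
        errors.append("ClosureReason is required when StatusCode is 'Closed'")
    return errors


RECORD = ET.fromstring(
    "<Record><BirthDate>1960-06-09</BirthDate><Name></Name>"
    "<StatusCode>Closed</StatusCode></Record>"
)
ERRORS = check_rules(RECORD, date(2007, 9, 1))
```

None of these checks can be stated in W3C XML Schema 1.0 in a way that can be versioned separately from the schema, which is the gap the following slides address.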
Data Content Validation Conclusion
- XML Schema does not cover all data content validation requirements
- Embedding content validation in XML Schema has undesired consequences in conjunction with re-use and Schema versioning
- Business rules vary more frequently than schema constraints, and the business rules between different partners would vary where the schema constraints remain the same
- By layering value constraints on top of structural/lexical constraints, the schemas can remain unchanged while being adapted to different partners through different value constraints
- Is data content validation required?
- How can data content be validated in XML instances?
Without Data Content Validation in XML
[Diagram labels: A; B; XML file; W3C XML Document Schema; Schema Validation; Design; Implementation; Data Exchange Partner Agreement]
Content Validation at A and Content Validation at B:
- Program code
- Database constraints
Interoperability issues:
- Validation at A equivalent to Validation at B?
- Data quality of message is difficult to control
- Communication of data quality issues between A & B
- Relies on trust in the sender
- Hard to ascertain equal interpretation of codes
With Data Content Validation in XML
Sender’s and receiver's data content validation must be:
- electronic
- portable
- of shared logic and error output
- platform-independent
- versioned
[Diagram labels: A; B; XML file; 1. Schema Validation (W3C XML Document Schema); 2. Content Validation (XML Content Validation); Design; Implementation; Data Exchange Partner Agreement]
With Data Content Validation in XML, continued
Sender’s and receiver's data content validation must be:
- electronic
- portable
- of equivalent logic and error output
- platform-independent
- versioned
[Diagram labels: A; B; XML file; 1. Schema Validation (W3C XML Document Schema); 2. Content Validation (Methodology); Design; Implementation; Data Exchange Partner Agreement]
Methodology - Features
Code Value Validation
Example: CountryCode must be a valid CountryCode
Code List Metadata Validation
Example: CountryCode must belong to an agreed, named Country Code list (+version): urn:oasis:names:tc:ciq:xNL:3:codelist:gc:Country-1
Complex Rules Validation
Examples:
- BirthDate < CurrentDate
- StatusCode ‘Closed’ requires a ClosureReason
Methodology - Features, continued
Completely separate from W3C XML Schema
Platform-independent ISO/IEC 19757-3 Schematron (implemented using W3C XSLT stylesheets) – Open Industry Standard
Completely independent of any XML Naming and Design Rules (NDRs)
Versioning in isolation of XML Schema
Methodology - Process Overview
Schematron-based Value Validation using Genericode
[Diagram labels: Data Exchange Partner Agreement; Data Content Validation Requirements; Validation Coding; transform; generate; W3C XML Validation Stylesheet]
Methodology - Involved Roles
Schematron-based Value Validation using Genericode
[Diagram labels: Data Content Validation Requirements; Validation Coding; transform; generate; W3C XML Validation Stylesheet; Documentation]
[Roles shown: Business Analysts & Testers; Users; (Developers); (Data Architects); Value Validation Service Staff; Run-time Operator; Specialist; Developers & Testers; Users]
Methodology Run-Time Components
[Diagram labels: A; B; XML file; W3C XML Validation Stylesheet; W3C XML Document Schema(s)]
Methodology - Value Validation
The validation process involves the use of the Schematron language and XSLT
Schematron is a rule-based XML schema language, developed by Rick Jelliffe and internationally standardized as ISO/IEC 19757-3, using XPath expressions to describe validation rules.
Schematron is used to confirm the success or failure of a set of assertions made about XML document instances.
Schematron can be used as an adjunct to DTDs, RelaxNG or XML Schemas. It allows co-occurrence constraints, non-regular constraints, and inter-document constraints
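The assertion model can be sketched in Python as follows. This is not Schematron syntax: each rule pairs a context path with tests and messages, loosely mirroring the rule and assert structure of Schematron, with Python callables standing in for XPath test expressions. The element names and messages are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, [(test, message), ...]). The test must hold for
# every node selected by the context; the message is reported when it fails.
RULES = [
    (".//Person", [
        (lambda p: p.find("BirthDate") is not None,
         "Person must have a BirthDate"),
        (lambda p: len(p.findall("Country")) <= 1,
         "Country cannot occur more than once"),
    ]),
]


def validate(root, rules):
    """Return the messages of all failed assertions, in rule order."""
    failures = []
    for context, assertions in rules:
        for node in root.findall(context):
            for test, message in assertions:
                if not test(node):
                    failures.append(message)
    return failures


DOC = ET.fromstring(
    "<Party><Person><Country>AUS</Country><Country>NZL</Country></Person></Party>"
)
FAILURES = validate(DOC, RULES)
```

Keeping the rules as data, separate from the engine that applies them, is what allows the rule set to be versioned and exchanged independently of the structural schema.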
Methodology - Overview
Methodology Data Flow Diagram
[Diagram labels: steps A to H; Default Code List (gc); Customised Code List (gc); References; CVA; sch; XSL; XSD; Methodology; XML; structure validation; code list validation; Validated XML; Application]
Methodology - Process
Application of the Process in an Enterprise
[Diagram labels: Enterprise Code Lists; Enterprise XML Schemas; Methodology; Application A (Customised enterprise code lists, Business Rules); Application B (Customised enterprise code lists, Business Rules)]
Methodology - Status
- OASIS Code List TC draft standard 0.1 (was version 0.8 under OASIS UBL TC)
- No known platform-independent alternative
- Plug-and-play run-time component
- Methodology can evolve without impacting run-time requirements
[Diagram labels: A; B; W3C XML Document Schema; W3C XML Validation Stylesheet]
Methodology - Benefits
- Verify that the instance document is valid as per the Data Exchange Partner Agreement (DEPA)
- Validate data content platform-independently
- Sender and receiver get the same validation result
- Simple run-time requirement (XSLT or Python or any other ISO Schematron implementation)
- Strong candidate to become a global industry standard (UN/CEFACT is taking an interest)
- W3C Stylesheet and Schema are industry standards
Methodology - Benefits, continued
Supports versioned validation in isolation of schema version
Documentation is in sync with implementation
Validation can be switched on/off as required (by msg. server or appl.)
Simplifies application coding
Simple run-time requirement allows for evolution of the methodology
Details of the methodology are transparent to operations
Methodology - Risks
An OASIS draft standard
Methodology not widely used yet
Methodology may change or evolve
Requires Schematron and XPath expertise
Affects the XML instance document processing (extra steps)
Affects the testing of XML Schema/XSLT release packages
OASIS CIQ TC Case Study – Using the “Schematron-based Value Validation using Genericode” Methodology to customise OASIS CIQ Specifications v3.0
OASIS CIQ Technical Committee
Open Industry Specifications for defining Party Centric Data from a global (international) perspective
Party – Person or Organisation
- Name (241+ countries in over 36 formats)
- Address/Location (241+ countries in over 130 formats)
- Party Centric Attributes
- Party Relationships
Delivering royalty free, open, international, industry and application neutral XML specifications for representing, interoperating, and managing party (person/organization) centric information
Why Genericode and the Methodology for CIQ TC?
- Keeps code lists and values outside of the core CIQ XML Schemas
- Provides users with the ability to define the semantics for the data represented in the CIQ structure
- Provides users with the ability to customize the CIQ XML Schemas without modifying the CIQ XML Schemas
- Provides users with the ability to write business rules to constrain the structure of the CIQ XML Schemas without modifying the XML schemas
OASIS CIQ Specifications
- Party Name Schema – xNL.xsd; supporting enumeration list (13) – xNL-types.xsd
- Party Address Schema – xAL.xsd; supporting enumeration list (32) – xAL-types.xsd
- Party Information Schema – xPIL.xsd; supporting enumeration list (60) – xPIL-types.xsd
CIQ Specifications without Genericode Approach
Code Lists defined in these 4 files
Use Party Name as Case Study
Code Lists defined in an XML Schema (xNL-types.xsd) that is “included” in xNL.xsd
Enumeration List referenced from xNL-types.xsd
xNL Enumeration List
Users given the choice to modify the code lists to meet their specific requirements
Basic default values are provided, but it is up to the users to use them as is or customise them
xNL Enumeration List - Drawbacks
- Each application has to have its own enumeration list
- Point to point negotiations between applications
- No standard enumeration list file that remains untouched
- Change in enumeration list will result in change to application code generation
- The Name schema might be used in multiple locations in an organisation (e.g. billing, marketing, sales, customer identification) and hence, customising the enumeration list is not straightforward
- It might be an overhead for an application to use a large code list when it requires only 3 values
Objective of this case study
Move away from embedding code lists as XML schemas that are “included” or “imported” into base XML schemas
Investigate the use of the genericode approach and UMCLVV (UBL Methodology for Code List and Value Validation) in CIQ Specifications
Implement genericode approach in CIQ Specifications as an optional feature
Customise the genericode based default code lists with specific requirements without modifying the default code lists
Apply business rule constraints on the core CIQ XML schemas without modifying the XML schemas
Case Study - Scenarios
- Add a new code list value to the default name code list (“NativePlaceName”)
- Restrict the default name code list to allow no more than one first and last name (“FirstName”, “LastName”)
- Restrict the default code list to allow only “FirstName”, “LastName”, and “NativePlaceName” as code values
- Apply business rule constraints on XML Schema
Customising the default xNL code list to cater for the above requirements without changing it is impossible
Preparing xNL Schema with Genericode Approach to Handle Code Lists
Step 1- Create default .gc files
Identify and decide on list-level and instance-level metadata to be included
Create .gc file for each enumeration list in xNL-types.xsd
Ensure that the .gc file is valid structurally against genericode-code-list.xsd file
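As an illustration of what such a file carries, the sketch below embeds a small genericode-style document and reads its list-level metadata and code values. The list name, URIs and two values are hypothetical, and the document has not been validated against the genericode schema here; structural validation needs an XML Schema validator, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, laid out in the style of a genericode 1.0 code list.
SAMPLE_GC = """\
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <Identification>
    <ShortName>SampleNameElementList</ShortName>
    <Version>1.0</Version>
    <CanonicalUri>urn:example:codelist:SampleNameElementList</CanonicalUri>
    <CanonicalVersionUri>urn:example:codelist:SampleNameElementList-1.0</CanonicalVersionUri>
  </Identification>
  <ColumnSet>
    <Column Id="code" Use="required">
      <ShortName>Code</ShortName>
      <Data Type="normalizedString"/>
    </Column>
    <Key Id="codeKey">
      <ShortName>CodeKey</ShortName>
      <ColumnRef Ref="code"/>
    </Key>
  </ColumnSet>
  <SimpleCodeList>
    <Row><Value ColumnRef="code"><SimpleValue>FirstName</SimpleValue></Value></Row>
    <Row><Value ColumnRef="code"><SimpleValue>LastName</SimpleValue></Value></Row>
  </SimpleCodeList>
</gc:CodeList>
"""


def read_code_list(xml_text):
    """Return (list-level metadata, values of the key column)."""
    root = ET.fromstring(xml_text)
    metadata = {child.tag: child.text for child in root.find("Identification")}
    key_column = root.find("ColumnSet/Key/ColumnRef").get("Ref")
    values = [
        value.findtext("SimpleValue")
        for value in root.iterfind("SimpleCodeList/Row/Value")
        if value.get("ColumnRef") == key_column
    ]
    return metadata, values


METADATA, VALUES = read_code_list(SAMPLE_GC)
```

The `Identification` block holds the list-level metadata, and each `Row` holds one code value.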
.gc file - Example
[Figure callouts: Code Value; List Level Metadata]
Instance Level Metadata
- In the absence of metadata properties for values in the instance being validated, only the values found in the associated external list representation can be used. There being no qualification of the values in the instance, all values in the external file are in play as valid values for validation
- If the instance being validated does have metadata properties specified for a given value, then that value is asserted to be a value from a particular version or identified list of values
- Instance level metadata allows an instance to disambiguate a coded value that might be the same value from two different lists
Step 2: Modify xNL.xsd
Remove references to enumeration list defined as xml schemas
Include distinct instance level metadata for all elements/attributes that use code list values
Instance Level Metadata used:
- Ref == genericode ShortName
- Ver == genericode Version
- URI == genericode CanonicalUri
- VerURI == genericode CanonicalVersionUri
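The mapping above can be sketched as a matching step: any metadata attribute present on the instance must agree with the corresponding list-level field. As a simplification, the sketch places attributes named exactly `Ref`, `Ver`, `URI` and `VerURI` directly on the element, which may differ from the actual xNL attribute names; the two code lists and their contents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical list-level metadata for two lists that share the value "FirstName".
KNOWN_LISTS = [
    {"ShortName": "DefaultNameElements", "Version": "1.0",
     "CanonicalUri": "urn:example:default-names",
     "CanonicalVersionUri": "urn:example:default-names-1.0",
     "values": {"FirstName", "LastName"}},
    {"ShortName": "LocalNameElements", "Version": "1.0",
     "CanonicalUri": "urn:example:local-names",
     "CanonicalVersionUri": "urn:example:local-names-1.0",
     "values": {"FirstName", "NativePlaceName"}},
]

# Instance attribute -> genericode list-level metadata field, per the slide.
ATTRIBUTE_TO_FIELD = {
    "Ref": "ShortName",
    "Ver": "Version",
    "URI": "CanonicalUri",
    "VerURI": "CanonicalVersionUri",
}


def candidate_lists(element, value_attribute):
    """Names of the lists that contain the value and match all metadata present."""
    value = element.get(value_attribute)
    matches = []
    for code_list in KNOWN_LISTS:
        if value not in code_list["values"]:
            continue
        # An absent metadata attribute places no constraint on the list.
        if all(
            element.get(attr) in (None, code_list[field])
            for attr, field in ATTRIBUTE_TO_FIELD.items()
        ):
            matches.append(code_list["ShortName"])
    return matches


UNQUALIFIED = ET.fromstring('<NameElement ElementType="FirstName">John</NameElement>')
QUALIFIED = ET.fromstring(
    '<NameElement ElementType="FirstName" Ref="LocalNameElements">John</NameElement>'
)
```

Without metadata, "FirstName" is acceptable from either list; with `Ref` set, the instance asserts which list the value comes from.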
Instance Level Metadata
Instance level Metadata for “ElementType” attribute
xs:string
Step 3: Prepare Context/Value Association (CVA) File
Every element and attribute information item below the document element of an XML document is in a document context described by its hierarchical ancestry of elements. A fully qualified document context specifies the information item’s precise location in the document.
Define all the default document contexts with pointers to the default genericode files produced from xNL-types.xsd
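The association between document contexts and code lists can be pictured as a lookup table. The sketch below is a Python stand-in for the idea and does not use the actual CVA file format; the context paths, list identifiers and values are illustrative assumptions, and XML namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical code lists, keyed by a local identifier.
VALUE_LISTS = {
    "name-elements": {"FirstName", "MiddleName", "LastName"},
    "name-types": {"LegalName", "Alias"},
}

# Document context (path plus attribute name) -> code lists allowed there.
CONTEXTS = [
    (".//PersonName/NameElement", "ElementType", ["name-elements"]),
    (".//PersonName", "Type", ["name-types"]),
]


def validate_contexts(root):
    """Return (path, attribute, value) for each coded value not in its lists."""
    violations = []
    for path, attribute, list_ids in CONTEXTS:
        allowed = set().union(*(VALUE_LISTS[i] for i in list_ids))
        for node in root.findall(path):
            value = node.get(attribute)
            if value is not None and value not in allowed:
                violations.append((path, attribute, value))
    return violations


DOC = ET.fromstring(
    "<PartyName>"
    '<PersonName Type="LegalName">'
    '<NameElement ElementType="FirstName">John</NameElement>'
    '<NameElement ElementType="Nickname">Johnny</NameElement>'
    "</PersonName>"
    "</PartyName>"
)
VIOLATIONS = validate_contexts(DOC)
```

Because the table is separate from the schema, pointing a context at a different code list is a change to the association only.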
CVA File
Step 4 - Prepare files for Value Validation
Run the supplied batch/shell files as part of the Methodology process to create the necessary files for code list value validation
Applying Constraints to Default Code Lists
Default Schema and Code List Values
- Add a new code value “NativePlaceName”
- Restrict the code values to have only “FirstName” and “LastName”
Step 1 – Add a new code list value
Add a new code list value “NativePlaceName”
Create a gc file with this code value
Step 2 – Restrict the default code list
Restrict the code values to only “FirstName” and “LastName”
Create a .gc file with this restriction
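Steps 1 and 2 can be sketched with sets: the default list is never edited, and each customisation lives in its own list. The default values shown are an illustrative subset, and the rule that a context accepts a value found in any of its associated lists is an assumption made for this sketch.

```python
# Default list: shipped with the specification and never edited.
# The values shown are an illustrative subset.
DEFAULT_NAME_ELEMENTS = frozenset({"Title", "FirstName", "MiddleName", "LastName"})

# Step 1: a separate list holding only the new value.
ADDITIONAL_NAME_ELEMENTS = frozenset({"NativePlaceName"})

# Step 2: a separate list holding only the permitted values.
RESTRICTED_NAME_ELEMENTS = frozenset({"FirstName", "LastName"})


def allowed_values(*code_lists):
    """A context associated with several lists accepts a value from any of them."""
    return frozenset().union(*code_lists)


# Extension: associate the context with the default list and the additions.
EXTENDED = allowed_values(DEFAULT_NAME_ELEMENTS, ADDITIONAL_NAME_ELEMENTS)

# Restriction: associate the context with the restricted list only.
RESTRICTED = allowed_values(RESTRICTED_NAME_ELEMENTS)
```

In both cases `DEFAULT_NAME_ELEMENTS` stays as published; only the association between the document context and the lists changes.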
Step 3 – Create Restriction CVA File
Applying Business Rules to Constrain Default XML Schemas
Step 4 – Define Business Rules to include constraints to default schema
Restrict the schema to accept only one First Name and one Last Name
Business Rules to define constraint
No changes to xNL Schema
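The constraint can be stated as a count over the coded attribute values. The sketch below is a Python illustration of the rule's logic only; in the methodology the rule is written in Schematron. It uses xNL-style element and attribute names without namespaces, which is a simplification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Code values that may appear at most once within one person name.
SINGLE_USE_VALUES = ("FirstName", "LastName")


def check_name_elements(person_name):
    """Return a message for each single-use value that occurs more than once."""
    counts = Counter(
        element.get("ElementType") for element in person_name.findall("NameElement")
    )
    return [
        f"{value} cannot occur more than once"
        for value in SINGLE_USE_VALUES
        if counts[value] > 1
    ]


PERSON = ET.fromstring(
    "<PersonName>"
    '<NameElement ElementType="FirstName">John</NameElement>'
    '<NameElement ElementType="FirstName">Jack</NameElement>'
    '<NameElement ElementType="LastName">Smith</NameElement>'
    "</PersonName>"
)
MESSAGES = check_name_elements(PERSON)
```

The schema still permits repeated `NameElement` entries; the extra constraint is applied by the rule layer alone.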
Step 5 - Prepare files for Value Validation
Run the supplied demonstration batch/shell files as part of the Methodology process to create the necessary files for value validation
CIQ Global Address Specification (xAL)
Can be customized to specific country address structure using the Methodology, but at the same time keeping the customized structure in compliance with xAL default structure
Example 1: Customizing xAL for Singapore
Let us assume that Singapore Address does not require the following xAL elements:
- Administrative Area
- Rural Delivery, or Post Office
- Location Coordinates
- Free Text Address
- Country
Example 1: Customising xAL for Singapore
Example 1: Business Rule for Singapore Address
No changes to xAL Schema
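A sketch of the Singapore rule's logic: the schema still allows the elements, and the rule layer reports them if they appear. The element names below are assumed spellings of the xAL elements listed above, namespaces are omitted, and the sample address is simplified; the actual rule in the methodology is written in Schematron.

```python
import xml.etree.ElementTree as ET

# Assumed element names for the items the Singapore profile does not use.
UNUSED_IN_SINGAPORE = frozenset({
    "AdministrativeArea",
    "RuralDelivery",
    "PostOffice",
    "LocationByCoordinates",
    "FreeTextAddress",
    "Country",
})


def check_singapore_address(address):
    """Return the names of child elements that the profile does not allow."""
    return [child.tag for child in address if child.tag in UNUSED_IN_SINGAPORE]


ADDRESS = ET.fromstring(
    "<Address>"
    "<Thoroughfare>1 Example Road</Thoroughfare>"
    "<PostCode>123456</PostCode>"
    "<Country>Singapore</Country>"
    "</Address>"
)
DISALLOWED = check_singapore_address(ADDRESS)
```

An address that passes this check is still a valid xAL instance, since the profile only narrows what the default structure already permits.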
Example 2: Customizing xAL to only use Free Text Address Lines
Business Rule for Example 2
No changes to xAL Schema
CIQ Specifications with Genericode Approach
Skills Required to use OASIS Code List Approach
- XML Schema Language
- Schematron Language
- XSLT (sometimes)
- XPath
- XML Processors/XML Parsers
- Batch Files / Shell Files
Experience using the Methodology and Genericode Approach
- Powerful
- The only standard for managing code lists now in industry
- Manual effort (requires patience)
- Painful without tool support
- But once everything has been set up, works beautifully
- Does not deal with mapping between schemas
References
- OASIS Codelist Representation (Genericode) Version 1.0, May 2007, http://docs.oasis-open.org/codelist/cd-genericode-1.0/doc/oasis-code-list-representation-genericode.pdf
- Schematron-based Value Validation and Genericode, Working Draft, Version 0.1, July 2007, http://www.oasis-open.org/committees/document.php?document_id=24810
- OASIS Code List Adaptation Case Study (OASIS CIQ), Version 0.3, July 2007, http://www.oasis-open.org/committees/document.php?document_id=24813
- OASIS Party Information Standards, http://www.oasis-open.org/committees/ciq
Special Thanks
Ken Holman, Chair, OASIS Code List Representation TC
Juerg Tschumperlin, Data Management Solutions, New Zealand
Thank You
http://www.oasis-open.org/committees/ciq