jess-based web interface for xml document validation

Available online at www.sciencedirect.com

www.elsevier.com/locate/eswa

Expert Systems with Applications 36 (2009) 683–689

Expert Systemswith Applications

JESS-based web interface for XML document validation

Buhwan Jeong a,*, Jungyub Woo a, Boonserm Kulvatunyou b, Hyunbo Cho a

a Department of Industrial and Management Engineering, Pohang University of Science and Technolgoy (POSTECH),

San 31, Hyoja, Pohang, Kyungbuk 790-784, South Koreab Manufacturing Systems Integration Division, National Institute of Standards and Technology (NIST), 100 Bureau Drive,

Gaithersburg, MD 20899, USA

Abstract

Ensuring consistency among data exchange specifications in XML is critical to seamless integration of various business-to-business(B2B) applications. To this end, a specification should be thoroughly verified in manifold perspectives such as grammar/syntax confor-mance, design compliance, and canonical semantics accordance. A single hand-woven testbed implementation will fail to versatilely sup-port such a variety of testings. In this paper, we propose a novel validation tool of XML documents using an expert system and openvalidation rule scripts. Particularly, we illustrate a web-based implementation to ensure an XML schema’s design compliance to designand naming rules (NDR), namely a quality of schema design (QoD) test.� 2007 Elsevier Ltd. All rights reserved.

Keywords: Expert system; Interoperability; JESS; Schema design quality; Validation; XML

1. Introduction

The issues about interoperability have become a phenom-enon in industry as well as in academia. It is well known thatthe costs occurring due to lack of interoperability among sys-tems are beyond our estimate (for instance, see two reportsfor details (Brunnermeier & Martin, 1999; Gallaher & Gil-day, 2004)). The interoperability may be achieved throughlong discussions with all the parties involved. However,practical constraints make it impossible.

Let us focus on development of standard data exchangespecifications for further explanation. Inconsistenciesamong such specifications are inevitable due to the natureof distribution, evolution, and customization. The develop-ment is distributed because (1) project team members arephysically distributed over the world, and moreover, (2)the communication among them, perhaps, having different

0957-4174/$ - see front matter � 2007 Elsevier Ltd. All rights reserved.

doi:10.1016/j.eswa.2007.10.018

* Corresponding author. Tel.: +82 54 279 8243.E-mail addresses: [email protected] (B. Jeong), woo@postech.

ac.kr (J. Woo), [email protected] (B. Kulvatunyou), [email protected] (H.Cho).

viewpoints is inefficient due to concurrent development tomeet a short budget cycle. Therefore, they are rush to pro-pose standards without enough validation time to reach toa solid agreement. Even an agreed-upon standard is alsoevolving over time because it is impossible (1) to obtainresources enough to complete a project in a budget cycle,(2) to capture all the needs and constraints at the beginningof the project, and (3) to complete the entire project in ashort period from the management perspective. Sometimes,it is due to immaturity of technologies. Furthermore, astandard specification needs to be customized for differentindustrial domains – different environments and assump-tions for implementation.

For those reasons, the standard XML specificationsmust be thoroughly validated before deployed. At leastfour conformance tests should be conducted: schema gram-mar conformance, schema semantics conformance, schemadesign conformance, and canonical semantics conformance(Kulvatunyou, Ivezic, & Jeong, 2004). We propose to use arule-based expert system for that purpose. More specifi-cally, we address to construct and automatically transforman XML document into a knowledge base, and to encode

mailto:[email protected]

mailto:woo@postech. ac.kr

mailto:woo@postech. ac.kr



XML DocumentValidation

Test Requirements

XML Schema(& Instance)

Expert System

Test Result/Report

XML/XSLT

684 B. Jeong et al. / Expert Systems with Applications 36 (2009) 683–689

test scripts from various design requirements. A web-basedimplementation and related issues are discussed.

The remainder of this paper is organized as follows: Sec-tion 2 delineates the rule-driven testbed architecture that sup-ports generic XML document validation. Section 3 givesmethods to construct a knowledge base and to encode testcases. Section 4 describes a case study, XML schema designquality testing, with issues related to web-based testbed imple-mentation. Finally, Section 5 provides conclusion remarks.

Fig. 1. Expert system-based XML document validation.
2. Rule-driven XML document validation
We address the use of a rule-based expert system to val-idate XML documents. XML document validation checksfor an XML schema/instance document according to testassertions on a validation engine. Fig. 1 depicts the highestview of rule-based XML validation in the IDEF0 model(Lingzhi, Leong, & Gay, 1996). The test assertions statetestable requirements including the W3C XML grammar,design conventions, and/or canonical semantic models.For simplicity, we take the schema design quality (QoD)testing, in the context of which the rule-based expert systemfinds design violations against conventions and best prac-tices on design. The validation rules are expressed in adeclarative manner in order to keep them open. For exam-ple, the Schematron language1 encodes primitive rules inXPath expressions, and JESS script (Friedman-Hill, 2006;Jovanovic, Gasevic, & Devedzic, 2004) flexibly encodesmore complex rules.

The XML validation further breaks down into threefunctionalities as shown in Fig. 2. In order to facilitate thetesting, the XML schema documents must be first repre-sented in a compatible format with the internal expert sys-tem. This process, namely knowledge base construction,transforms the input XML documents into a collection ofknowledge nuggets, called (unordered) facts, which arefed to the test execution process as an input. The XSLT lan-guage is useful to automatically transform an XML docu-ment into the facts. The second process is to encode thedesign conventions in validation rule scripts. Since the rulescan trigger actions based on the contents of one or morefacts, they are used as controls for test execution. The lastprocess is to fire actions (i.e., generate test results) in the testscripts depending on satisfaction of condition clauses.

3. Knowledge base construction

3.1. Templates and facts

The rule-based expert system maintains a knowledgebase that is a collection of facts. In the (unordered) knowl-edge base, the facts must be stored in a certain structure.The structure must be declared before defining facts byusing the deftemplate construct (whose syntax is

1 <http://www.schematron.com/>.

depicted in Fig. 3A). The hdeftemplate-namei is thehead of facts being created. For a template, there may bean arbitrary number of slots, which describe the propertiesof a fact (deftemplate-name). Every hslot-nameimust be an atom and the default slot qualifier states thatits default value in a new fact is given by hvaluei (whosedefault value is the atom nil). The default-dynamicversion will evaluate the given value each time a new factusing this template is asserted. Finally, the type slot qual-ifier specifies what data type the slot is allowed to hold andan acceptable value is one of any, integer, float, number,atom, string, lexeme, and object. For example, the tem-plates for XML documents are defined in Fig. 3B. The firstthree templates are for element nodes, attribute nodes, andtext nodes. The last relation template is to relate a par-ent element and its child node, which is one of element,attribute, or text node according to the type slot. Theykeep the original tree structure of a given document.

The facts are inserted into a knowledge base with the con-structs assert and deffacts in forms of the templatesabove. The assert construct, i.e., (assert (hfact-namei(hsloti hvaluei)*)), inserts a fact at each time, while thedeffacts construct, a named list of facts, does these factsat a time, i.e., (deffacts hfact-list-namei (hfact-namei (hslot-nameihslot-valuei)*)*).

3.2. XML document transformation

The W3C XSLT (eXtensible Stylesheet LanguageTransformations) is a standard specification language fortransforming an XML document into other forms suchas another XML document, an HTML document, andeven a plain ASCII document (Jovanovic & Gasevic,2005). A transformation in the XSLT language is expressedas a well-formed XML document and describes transfor-mation rules from a source (XML-native DOM) tree intoa result tree. The result tree is separated from the sourcetree; thus the structure of the result tree can be completelydifferent from that of the source tree by filtering and reor-dering elements, and adding arbitrary structures.

The transformation is achieved by associating patternswith templates. A pattern is matched against nodes in thesource tree and a template is instantiated to create part ofthe result tree. When a template is instantiated, each instruc-tion is executed and replaced by the result tree fragment.

http://www.schematron.com

A

B

Fig. 3. (A) Syntax of template definition and (B) knowledge base templates for XML documents.

Constructknowledge base

XSLT for XML2Fact

XML Schema/Instance

XML/XSLT

(Unordered) Facts

Execute Test

Expert System Engine

Test Result

Encode testscript

Expert System spec.

W3C XML Schema spec.Test rule scripts

Design conventions

Fig. 2. Detailed functional model for XML document validation.

B. Jeong et al. / Expert Systems with Applications 36 (2009) 683–689 685

Consequently, the instruction selects and processes itsdescendant elements. Processing a descendant element cre-ates a result tree fragment by finding the applicable templaterules and instantiating its template. The result tree is con-structed by finding the template rule for the root node (pos-sibly, xsd:schema when the default namespace prefix isxsd in XML Schema transformations) and instantiatingits template.

Fig. 4 depicts partial XSLT transformation rules forXML schema documents into unordered facts. The XSLTconforms the XSLT namesapce and its root element ishxsl:stylesheeti (or, hxsl:transformi is possibleas well). The type of the result tree is specified as ‘text’,since the facts are represented in a plain text. The knowl-edge base templates above (Fig. 3B) are declared at first,and then facts assertion rules are defined for each element,attribute, and text node. The function generate-id() inXSLT is used to specify each node’s unique ID.

3.3. Test scripting

A test script, called as a rule, is defined by using thedefrule construct. The construct consists of two partsseparated by the ) token. The left hand side (LHS)includes condition clauses, whereas the right hand side(RHS) defines actions executed when all the conditionclauses in the LHS are satisfied. The condition clausesare interpreted as conjunctions (and relationships) indefault when no other indicator is specified such as or

and not. Since the condition clauses are sequentially andpartially matched, it should be noted that in order to

improve the expert system’s efficiency the most specificand accurate condition clauses (which will match the fewestfacts) must be placed in the top of each rule’s LHS. Thatmeans the condition clauses in the top of an LHS are usedto determine the context of the test, whereas those in thebottom are to identify specific conflicts in the context.

Fig. 5 illustrates a sample rule to detect anonymous types,which are locally defined in an element definition. If a knowl-edge base has all the five conditions (or facts), the message inthe action section is printed out in a prompt window. In thecondition clauses, element, attribute, and relationare from the templates above; and str-index and isTypeare functions, among which the isType (?name) functionis a user-defined one to check whether the associated elementis to define either simpleType or complexType. The user-defined function is created with the deffunction con-struct. It is noted that, instead of using hard equality (eqfunction), we used weak equality (str-index function) tomake the rule independent of a particular namespace.

4. An illustrative testbed

4.1. Schema design quality test

We define the quality of schema design (QoD) as howmuch schema appearance accords to organizational bestpractices on design and naming (Kulvatunyou et al.,2004). That is, it realizes look-and-feel schemas amongintegration parties involved – homogeneous use of struc-tures and naming, ease to understand, uniformity to imple-ment, etc. Hence, the QoD test verifies an XML schema

Fig. 5. A sample rule to detect ‘anonymous type’.

Fig. 4. XSLT fragment necessary for transforming an XML schema into facts (XML2Fact).

2 <http://java.sun.com/products/servlet/>.3 <http://tomcat.apache.org/>.


against design and naming conventions, which are not nec-essarily a standard but agreed-upon among partiesinvolved. The design conventions include conventionsabout naming, namespace, structure and declaration, doc-umentation, versioning, etc. (Kulvatunyou et al., 2004).

4.2. Web implementation

Fig. 6 illustrates the web-based implementation of theQoD testbed, which consists of a web interface, an expertsystem, and a document transformation engine. We used

Java Servlet Technology2 on Apache Tomcat Server3 tointerface between users and internal systems; JESS for anexpert system; and XSLT for document transformation.JESS is used as the underlying rule-based expert systembecause of its ability to use native Java functions. A useruploads XML documents to be tested and test scripts onthe testbed. The testbed transforms the input XML docu-ments into a list of facts, and then feeds the facts into the

http://java.sun.com/products/servlet/

http://tomcat.apache.org/

Web Server

Interface (Servlet)

XML Document

Test Report

Expert System

Test Results

XML Document

XML Facts

XML Facts

Transformation

Test Scripts

Test Scripts

Fig. 6. Web-based testbed for XML document validation.


expert system with test scripts. Accordingly, the testbedreports test results to the user.

Fig. 7 demonstrates how to embed the JESS engine innative Java codes (the Server-side application) to uploadan XML schema in JESS facts and test scripts (rules) tothe JESS engine and to retrieve the execution results usingthe Rete object (i.e., the rule engine itself) and execute-

Command function. It is observed that the JESS rule (i.e.,Fig. 7A) freely uses the native Java objects/classes (e.g.,java.lang.String) and methods (e.g., addElement).

Fig. 8 shows two snapshots of the QoD testbed. The firstsnapshot is the test execution screen of selecting test casesand uploading a target XML schema, while the second isthe test result screen.

4.3. Implementation issues

When defining a test requirement and checking a schemaagainst it, the schema type should be taken into consider-ation, since each test requirement has a different test scope.For example, a test requirement is ‘a control schema mustidentify one global element declaration’. Thus, the testcases derived from this test requirement are applied onlyto control schemas that include business document andinterface level construct schemas described below. On theother hand, most of the test requirements (e.g., conventionsabout naming, namespace, etc.) can be applied across sche-mas. We classify four levels of schema documents asfollows:

� Lowest level construct schema. A schema in this typedefines the schema construct primitives (e.g., object enti-ties and data types defined in CodeLists.xsd andFields.xsd in the OAGIS specifications4), which are thebasic building blocks of the consequent schemas. Hence,each primitive may be meaningless until it is used as partof other schemas.

4 <http://www.openapplications.org>.

� Aggregate level construct schema. This type schemaincludes logical and/or physical object components.An object component is restricted from; or composedof either primitives or other object components, or both.While the lowest level construct schema defines simpleand primary data types (e.g., City, State, Country, Cur-

rencyCode, ItemName), this defines more complex andhigher level data types (e.g., Address, EmployeeInforma-

tion, Product). The Components.xsd in OAGIS falls intothis type of schema. Like the lowest level constructschema, the components are seldom used independentlyas meaningful objects.� Business document level construct schema. This type of

schema defines business documents meaningfully orga-nized with the lowest and aggregate level construct sche-mas. In OAGIS, the Noun parts (e.g., PurchaseOrder,BillOfMaterial, etc.) are classified into this schema type.� Interface level construct schema. This kind schema is the

actual BODs (business object documents) exchangedamong involved parties. In OAGIS, a BOD (e.g., Add-PurchaseOrder, ShowBillOfMaterial) consists of a Verb(e.g., Add, Show, Sync) and a Noun.

Another important testing issue is how to select testcases to cover all the users’ requirements of interest. Inthe QoD implementation, two methods of test cases selec-tion are used: user-driven (i.e., test profile) and test require-ment-driven. The first is manually done by composing atest profile, thus it is not necessary that the test cases in atest profile have similar properties (i.e., derived from thesame test requirement). The second method automaticallyexecutes all the test cases derived from a test requirementselected by a user. Further, the test cases derived fromthe test requirements associated with that requirementshould or may also be selected. To automate the selectionand execution, we further need an advanced functionalityby exploiting and inferring dependency and associationamong test requirements. This is because the conventionshave concurrently and independently been amassed byscattered individual organizations without consistent man-agement by a central authority, and as a result, manyduplicate and/or similar test requirements among the

http://www.openapplications.org

Fig. 8. Snapshots of the QoD testbed: (A) test execution and (B) test result.

A

B

Fig. 7. Server-side Java implementation for JESS.



conventions exist and consequently much more duplicatesof test cases. A simple heuristics is to use a test requirementdependency table indicating interrelationships among testrequirements. The dependency includes ‘equality/equiva-lence’, ‘belonging to/sub’, ‘inclusion/super’, ‘overlapped’,‘conflict’, and ‘difference’, to each of which we can set aselection priority depending on the degree of relevance.A more close investigation on this test selection must beadvanced. Upon a decision threshold (e.g., the number ofassociated requirements or selected test cases), test casesfrom more relevant requirements are selected and executed.

5. Conclusion

The consistency in data exchange specifications (andtheir instances) is important in success of any enterpriseintegration projects. The consistency can be ensuredthrough several types of testings, e.g., grammar/syntaxconformance, quality of design (QoD), information map-ping, and canonical semantics. Among them, we detailedthe test for schema design quality, which checks schemasagainst organizational conventions and best practices ondesign, using a rule-based expert system. As part of testbedconstruction, we outlined how to construct a knowledgebase and how to encode test scripts. The use of JESSenabled us not only to utilize full powerful functionalitiesof expert systems (e.g., openness, flexibility), but also toimplement a Java/web-based interface. The testbed, as asyntax-perspective tool, needs to enhance advanced testingcapabilities to reason semantics behind XML documents,

namely SQoD (semantic schema design quality) and SIMT(semantic information mapping test). An intelligent testselection is also a topic to be achieved.

Disclaimer

Certain commercial software products are identified inthis paper. These products were used only for demonstra-tion purposes. This use does not imply approval orendorsement by NIST, nor does it imply that these prod-ucts are necessarily the best available for the purpose.

References

Brunnermeier, S., & Martin, S. (1999). Interoperability cost analysis of USautomative supply chain. In Technical report NC TRI-7007-03.Research Triangle Park, NC: Research Triangle Institute.

Friedman-Hill, E. (2006). Jess: The rule engine for the java platform.Gallaher, M. P., O’Connor, A. C., Dettbarn, J. L., Jr., & Gilday, L. T.

(2004). Cost analysis of inadequate interoperability in the US capital

facilities industry. Gaithersburg, MD: National Institute of Standardsand Technology (NIST).

Jovanovic, J., & Gasevic, D. (2005). Achieving knowledge interoperability:An XML/XSLT approach. Expert Systems with Applications, 29,535–553.

Jovanovic, J., Gasevic, D., & Devedzic, V. (2004). A GUI for Jess. Expert

Systems with Applications, 26(4), 625–637.Kulvatunyou, B., Ivezic, N., & Jeong, B. (2004). Testing requirements to

manage data exchange specifications in enterprise integration: Aschema design quality focus. In Proceedings of the 8th world multi-

conference on systemics, cybernetics, and informatics (SCI 2004).Lingzhi, L., Leong, A., & Gay, R. (1996). Integration of information

model (IDEF1) with function model (IDEF0) for CIM informationsystems design. Expert Systems with Applications, 10(3–4), 373–380.

jess-based web interface for xml document validation

Documents