Tools for Next Generation of CMS: XML, RDF, & GRDDL
Chimezie Ogbuji (chee-meh)
Cleveland Clinic Foundation
Cardiothoracic Surgery
[email protected] / [email protected]
Background (CT Research Roadmap)
A large, relational registry for Cardiothoracic procedures
Relatively small research department with very little software engineering experience
Traditional CMS and DBMS were insufficient
Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB)
Need to replace a productive, integrated research pipeline
Data entry, clinical Q&A, patient follow-up, concurrent study management, ...
100+ research papers per year
Background (Institute of Medicine Proposal)
The Computer-Based Patient Record: An Essential Technology for Health Care ISBN: 0309055326
Old but very relevant set of requirements by the IOM (still unfulfilled).
A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc.
Can be (completely) addressed with Semantic Web architecture, document processing, and “Web 2.0” architecture.
CPR: Functional Requirements
Uniform, extensible record content
(Standard) record formats
System performance
Linkages
Intelligence
Reporting capabilities
Security
Multi-views
Accessibility
Definitions: KR / CMS
What is Knowledge Representation (KR)?
What is a Knowledge Base (KB)?
A database system which facilitates deductive reasoning over a KR
Commonly called Rule-based Systems
What are Expert Systems?
What is a Content Management System (CMS)?
Knowledge Representation
Older ideas at corners, newer ideas along sides (Credit: Conrad Barski, M.D.)
Content Management System: The What
The terms CMS and Content Repository are essentially interchangeable
Modern content repositories are best characterized by JSR 170 / 283
“.. a high-level information management system that is a superset of traditional data repositories”
Integrated support for the XPath data model is the most prominent feature (native document management)
Content Repository Feature Set
Modern CMS standards cover document management effectively:
Read/write access
Versioning
Event monitoring
Document-level access control
Concurrent access
Cross-linking
Profiles and Document Types
Anatomy of a JSR 170 Implementation
Jackrabbit
Component-based
Layers: Content Applications, Content Repository API, Implementation
Knowledge Bases and CMS
What of the requirements that Expert Systems meet?
Document management and knowledge management systems are historically isolated from each other
XML & RDF are contemporary manifestations of these methodologies
They have remained as isolated as their predecessors
They typically coincide only with regard to syntax
XML & RDF: Eating and Having your Cake
Classic example of where the document-oriented approach falls short: Modern EHR cannot facilitate dynamic research
Unified infrastructure for document and knowledge management is needed
One of the earliest examples: 4Suite Server version 0.10.0 (December 2000)
Current state of the art (GRDDL): Gleaning Resource Descriptions from Dialects of Language
GRDDL: The Elevator Pitch
Provides a way to normalize RDF concrete syntaxes
The problem:
Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ...)
The authoritative concrete syntax is not without issues
The solution:
Define mappings from XML dialects to RDF graphs
Use Turing-complete XML pipelines
English as a second language analogy
The GRDDL Picture
GRDDL: The Components
Faithful Rendition
“By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.”
Various mechanisms for nominating transformations: a specific XML attribute, XML Namespaces, HTML Profiles, and XHTML links
GRDDL-aware agents compute GRDDL results (RDF graphs)
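As a minimal sketch of the first nomination mechanism (the XML attribute), the snippet below reads the space-separated `grddl:transformation` attribute from a document's root element. The namespace URI is the GRDDL data-view namespace; the sample document, its `http://example.org/...` URIs, and the function name `nominated_transformations` are assumptions made up for illustration. A real GRDDL-aware agent would go on to fetch and apply each transformation, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace of the GRDDL "transformation" attribute.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

def nominated_transformations(xml_text):
    """Return the transformation URIs nominated on the root element.

    Hypothetical helper: it only covers the attribute mechanism, not
    namespace documents, HTML profiles, or XHTML links.
    """
    root = ET.fromstring(xml_text)
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()  # the attribute holds a space-separated list

# Illustrative source document (all example.org URIs are invented).
SAMPLE = """<patient xmlns="http://example.org/ehr#"
         xmlns:grddl="http://www.w3.org/2003/g/data-view#"
         grddl:transformation="http://example.org/ehr/to-rdf.xsl">
  <name>Jane Doe</name>
</patient>"""

print(nominated_transformations(SAMPLE))
# prints ['http://example.org/ehr/to-rdf.xsl']
```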
The CMS Alternative: “Dual Representation”
Persist XML in synchrony with its faithful rendition
Changes to the XML trigger calculation and storage of corresponding RDF
“Dual Representation” is implemented by 4Suite Server Document Definitions
The basis of how we capture patient records with maximum syntactic and semantic expressivity
Document Definition
The document definition is the mapping
Usually an XSLT document
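The dual-representation idea can be sketched in a few lines of Python. This is not the 4Suite API: the class `DualRepository`, the function `patient_definition`, and the `http://example.org/...` URIs are hypothetical, and a plain Python callable stands in for the XSLT document that would normally serve as the document definition. The point is only that every write to the XML recomputes and stores the matching RDF under the same URI.

```python
import xml.etree.ElementTree as ET

def patient_definition(doc_uri, root):
    """Hypothetical document definition: one triple per child element."""
    vocab = "http://example.org/ehr#"  # invented vocabulary
    return {(doc_uri, vocab + child.tag, (child.text or "").strip())
            for child in root}

class DualRepository:
    """In-memory stand-in for a repository with dual representation."""

    def __init__(self):
        self.documents = {}    # URI -> XML source text
        self.graphs = {}       # URI -> set of (subject, predicate, object)
        self.definitions = {}  # URI -> document definition callable

    def put(self, uri, xml_text, definition=None):
        """Store the XML, then recompute its RDF rendition."""
        if definition is not None:
            self.definitions[uri] = definition
        mapping = self.definitions[uri]  # must be given on first write
        self.documents[uri] = xml_text
        self.graphs[uri] = mapping(uri, ET.fromstring(xml_text))

repo = DualRepository()
uri = "http://example.org/docs/p1"
repo.put(uri, "<patient><name>Jane Doe</name></patient>", patient_definition)
# Updating the XML replaces the stored graph as well.
repo.put(uri, "<patient><name>Jane Roe</name></patient>")
```

Keeping the definition per document is one possible design; registering it per document class, as the later slides discuss, would work the same way.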
Content Repository Architecture
Overlap between Content Repository APIs
Dual Representation: Advantages
Maximum expressiveness and versatility of content
Unified naming convention and access control (more on this later)
Uniform, concrete RDF syntaxes
For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.)
Cheap support for XML & RDF content negotiation
Use of RDF as a semantic index for XML
Document Definition: Similarities
GRDDL
RDDL
Resource Directory Description Language
Human-readable descriptive material about a target
A directory of individual resources related to a target
Nature and Purpose
Schema, stylesheet, etc.
Lives at a namespace URI
WXS's targetNamespace
Common theme is a set of definitions for a document or a class of documents
Registering a Document to a Class
Namespace registration works well for the web (preferred approach of W3C TAG)
What if you don't control the content served from the namespace of an existing vocabulary?
Atom, DocBook, etc.
A CMS is better suited for a 'closed' / 'controlled' approach
Persist membership metadata in the CMS
SemanticDB and Dual Representation
Document and Graph Granularity
Tying documents to graphs normalizes the content granularity
Documents and their RDF graphs can be treated uniformly:
Naming convention
Targeted querying
Access control management
JSR Fine-Grained Control
'Controlled' Naming Convention
Controlled Naming Convention: Continued
RDF Dataset (from SPARQL): A collection of named graphs
The RDF is stored in a graph with the same URI as the XML source document
When RDF is used as the primary cross-document 'index' you can:
SELECT ?graph WHERE { GRAPH ?graph { ... } }
document($graph)/.. XPath ..
The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph
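The two-step pattern above (find graphs by RDF pattern, then open the XML document of the same name) can be illustrated with plain Python data structures. This is a sketch under assumptions: the dataset is a dict from graph URI to a set of triples rather than a real SPARQL store, and the documents, vocabulary, and function names are invented for the example.

```python
import xml.etree.ElementTree as ET

VOCAB = "http://example.org/ehr#"  # invented vocabulary

# XML sources keyed by document URI (illustrative data).
documents = {
    "http://example.org/docs/p1":
        "<patient><name>Jane Doe</name><procedure>CABG</procedure></patient>",
    "http://example.org/docs/p2":
        "<patient><name>John Roe</name><procedure>valve repair</procedure></patient>",
}

# RDF dataset: each named graph shares the URI of its source document.
dataset = {
    uri: {(uri, VOCAB + el.tag, el.text) for el in ET.fromstring(source)}
    for uri, source in documents.items()
}

def graphs_matching(dataset, predicate, obj):
    """Rough analogue of SELECT ?graph WHERE { GRAPH ?graph { ?s <p> o } }."""
    return sorted(uri for uri, triples in dataset.items()
                  if any(p == predicate and o == obj for _, p, o in triples))

def names_for(uris):
    """Rough analogue of document($graph)/... with a simple path."""
    return [ET.fromstring(documents[uri]).findtext("./name") for uri in uris]

hits = graphs_matching(dataset, VOCAB + "procedure", "CABG")
print(names_for(hits))
# prints ['Jane Doe']
```

Because the graph only has to answer "which documents?", it can stay small; the detail is read from the XML afterwards.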
Uniform Access Control for XML/RDF CMS
Traditionally, Access Control Lists are associated with an object
Example: a file or directory in a filesystem
Assign document / graph ACLs to a single URI
Certain users / groups can query the RDF but cannot read the XML
De-identification of EHR: HIPAA
The 4Suite repository supports unified XML/RDF ACL
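A small sketch of what a single ACL per URI might look like, covering both the XML document and its graph. It does not reproduce the 4Suite ACL model: the class `SecuredRepository`, the permission names `read-xml` and `query-rdf`, and the principals are all hypothetical.

```python
class AccessDenied(Exception):
    """Raised when a principal lacks the requested permission."""

class SecuredRepository:
    """One ACL per URI governs both the document and its graph."""

    def __init__(self):
        self.documents = {}  # URI -> XML source text
        self.graphs = {}     # URI -> set of triples
        self.acls = {}       # URI -> {permission: set of principals}

    def put(self, uri, xml_text, triples, acl):
        self.documents[uri] = xml_text
        self.graphs[uri] = set(triples)
        self.acls[uri] = acl

    def _check(self, uri, principal, permission):
        if principal not in self.acls[uri].get(permission, set()):
            raise AccessDenied("%s lacks %s on %s" % (principal, permission, uri))

    def read_xml(self, uri, principal):
        self._check(uri, principal, "read-xml")
        return self.documents[uri]

    def query_graph(self, uri, principal):
        self._check(uri, principal, "query-rdf")
        return self.graphs[uri]

repo = SecuredRepository()
uri = "http://example.org/docs/p1"
repo.put(
    uri,
    "<patient><name>Jane Doe</name><procedure>CABG</procedure></patient>",
    # De-identified graph: the name is deliberately left out.
    {(uri, "http://example.org/ehr#procedure", "CABG")},
    {"read-xml": {"clinician"}, "query-rdf": {"clinician", "researcher"}},
)
```

Here a researcher can call `repo.query_graph(uri, "researcher")` but `repo.read_xml(uri, "researcher")` raises `AccessDenied`, which mirrors the "query the RDF but cannot read the XML" case.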
Going Forward
The SPARQL RDF dataset needs to be generalized
There is a long list of representation problems solved by a formal named graph specification
RDF graphs need to be first-class objects in CMS
Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation
Where do the 4Suite Repository API and JSR 170 / 283 overlap?
How do we generalize Document Definitions?
A Proposal for XML/RDF CMS
Primary Takeaways
We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems
CMS standards are needed for the next generation of semantic / rich web applications
These standards can preemptively level the landscape of toolkits in this space
References
D. Nuescheler et al., JSR 170: Content Repository for Java http://jcp.org/en/jsr/detail?id=170
D. Connolly, Gleaning Resource Descriptions from Dialects of Language http://www.w3.org/TR/grddl/
J. Borden, T. Bray, Resource Directory Description Language http://www.rddl.org/
E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF http://www.w3.org/TR/rdf-sparql-query/
Fourthought Inc., 4Suite http://4Suite.org