Native XML Database for Information Systems
Chris Wallace, SMRG Seminar
Feb 2006
Exploring the design space
• “design as a conversation with the materials in the situation” (Schön)
• Native XML database (NXD)
  – Storing, querying and updating XML documents without mapping them into relations
  – Schema-free
  – Trees are to NXD what tables are to RDBMS
  – Tables are trees (a table is simply a shallow tree)
• Information Systems
  – Focus on semi-structured data (a mixture of simple data items, text and complex nested structures)
  – Searching, derived data, visualisation
  – Process support
  – A large problem space, variously supported by spreadsheets, Word documents, ad-hoc databases and, increasingly, web-integrated data
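The point that tables are trees can be made concrete: a flat table maps onto a two-level tree with one element per row and one child element per column. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element names `Modules`, `Module`, `ModuleCode` and `Level` are illustrative assumptions, not the FOLD schema.

```python
import xml.etree.ElementTree as ET

def table_to_tree(root_tag, row_tag, rows):
    """Map a list of row dicts to a two-level tree: root -> row elements -> column elements."""
    root = ET.Element(root_tag)
    for row in rows:
        row_el = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(row_el, column).text = str(value)
    return root

def tree_to_table(root):
    """Invert the mapping; only meaningful for trees of exactly this flat shape."""
    return [{col.tag: col.text for col in row_el} for row_el in root]

# Illustrative data: element names are hypothetical, values come from the slides.
rows = [{"ModuleCode": "UFIEKG-20-3", "Level": "3"}]
tree = table_to_tree("Modules", "Module", rows)
print(ET.tostring(tree, encoding="unicode"))
```

The reverse mapping only works for trees of this flat shape, which is the sense in which tables are a special case of trees.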
eXist Native XML Database
• Open source, written in Java
• Developed by a European team led by Wolfgang Meier
• Documents (files) are organised in collections (folders) in a file store
  – XML documents are stored in an efficient B+ tree structure with indexes
  – Non-XML resources (XQuery, CSS, JPEG, etc.) can be stored as binary
• Deployable in different ways
  – Embedded in a Java application
  – As part of a Cocoon pipeline
  – As a web application in Apache/Tomcat
  – With the embedded Jetty HTTP server (as on stocks)
• Multiple interfaces
  – REST (to a Java servlet)
  – SOAP
  – XML-RPC
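As an illustration of the REST interface, a query can be sent to eXist as an HTTP GET request carrying the XQuery in the `_query` parameter, with `_howmany` limiting the number of results. The sketch below only builds such a URL and does not contact a server. The base URL is an assumption (a default standalone install) and may differ between eXist versions and deployments; the collection path `/db/fold` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: REST prefix of a default standalone eXist install; it may
# differ between eXist versions and deployment modes (Tomcat, Cocoon, ...).
BASE = "http://localhost:8080/exist/rest"

def rest_query_url(collection, xquery, howmany=10):
    """Build a GET URL asking the REST interface to evaluate `xquery`
    against `collection`, returning at most `howmany` items."""
    params = urlencode({"_query": xquery, "_howmany": howmany})
    return f"{BASE}{collection}?{params}"

# "/db/fold" is a hypothetical collection path.
url = rest_query_url("/db/fold", "//moduleSpecification[ModuleCode='UFIEKG-20-3']")
print(url)
```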
NXD case studies
• FOLD
  – Modules, programmes, scheme operations, staff, organisational structures, events
• Family photos and history
  – Integration of metadata on family photos with family history (births, deaths and marriages)
• ISD3 assignment: a web-based calculator
  – e.g. a currency converter
Research Work
• Development of the FOLD (Faculty OnLine Data), a pilot project for UWE
• Teaching students and staff XML languages (XML Schema, XSLT, XQuery) and NXD database design
• Links with other eXist projects
• SPA2006 workshop on NXD
• XML Prague (eXist)
Research Areas
• Design practice for NXD
  – A ‘pattern language’ to help map from a conceptual model to multiple XML schemas
  – Identifier design
  – Structuring documents by responsibility and versions
• NXD in organisational use
  – Social effects of distributed responsibility
  – Visualisation of complex relationships
  – Handling integrity problems: accept inconsistency as a way of life
  – Management of veracity
The FOLD
• Faculty OnLine Data
• Technologies
  – eXist
  – Java (not yet)
  – XQuery
  – XSLT
  – CSS
  – PHP (to be eliminated)
The FOLD (2)
• Scope
  – Module and Programme specifications
  – Modular Scheme operations (runs)
  – Staff
  – Organisational structure
  – Events
• Functionality
  – Highly linked
  – (Integrating UWE sources)
  – (Personalised interface)
FOLD - Modules and Programmes
[Figure: UML class diagram of FOLD modules and programmes]
Classes and attributes shown in the diagram:
• Module: moduleCode : String
• Module Specification: version : Year, faculty : Faculty, field : Field, title : String, credits : CreditsType, level : LevelType, syllabus : RestrictedHTML, readingStrategy : RestrictedHTML
• Programme: programmeCode : String, ucasCode : String [0..1]
• ProgrammeStructure: version : Year
• Stage
• OptionGroup: id : String, comment : String [0..1], minCredits : int, maxCredits : int
• Core and Option
• Module Combination: comment : String
• Learning Outcome: assessed in Comp A : boolean, assessed in Comp B : boolean, specification : RestrictedHTML, outcomeType : Learning Outcome
• Reading item, with Book (authors, title, year, source : String) and WebSite (url : URL, text : String)
Association roles include definition, structure, core, optional, pre-requisite, co-requisite, expression and excluded; several multiplicities are 1..* {ordered}.
Note on the Module Combination expression: this is a boolean expression such as (m1 and m2 and (m4 or (m5 and m6))).
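The Module Combination note describes pre-requisites as a boolean expression. If the expression is stored as a tree rather than as a string, it can be evaluated recursively against the set of modules a student has passed. The following is a minimal Python sketch; the `<and>`, `<or>` and `<module>` element names are assumptions, not the FOLD schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of (m1 and m2 and (m4 or (m5 and m6)))
EXPR = """
<and>
  <module>m1</module>
  <module>m2</module>
  <or>
    <module>m4</module>
    <and><module>m5</module><module>m6</module></and>
  </or>
</and>
"""

def satisfied(node, passed):
    """Recursively evaluate an and/or/module tree against a set of passed modules."""
    if node.tag == "module":
        return node.text.strip() in passed
    results = (satisfied(child, passed) for child in node)
    if node.tag == "and":
        return all(results)
    if node.tag == "or":
        return any(results)
    raise ValueError(f"unknown operator <{node.tag}>")

expr = ET.fromstring(EXPR)
print(satisfied(expr, {"m1", "m2", "m4"}))  # True
print(satisfied(expr, {"m1", "m2", "m5"}))  # False: m6 is missing
```

Keeping the expression as nested elements also lets a database index and query the module codes it mentions, which a flattened string would not allow.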
FOLD Design Issues
• Conceptual modelling
• Conceptual – logical – physical mapping
• Identifiers
• Relationships and links
• Versioning
• Editing
• Views
• Responsibilities
• Processes
Mapping from the Conceptual Model to the Logical and Physical Layers
• What criteria should be used to break the whole model up into:
  – Logical units
    • Entity: a logical compound structure
  – Physical units
    • Document: a physical aggregation of entity instances
    • Collection: a physical aggregation of documents
• Examples
  – Module Specification [moduleCode]
    • Module Specification is an Entity
    • Each Module Specification is a Document
  – Module Run [moduleCode/year/runNo]
    • Module Run is an Entity
    • The set of Module Runs for a Field is a Document
• Issues
  – Where should schemas be developed?
  – No logical data in the physical layer: the physical grouping is purely for convenience
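The Module Run example can be sketched as a mapping step. Each run is a logical entity with its own key, and runs are aggregated into one physical document per field. The document carries no logical data of its own. The following is a minimal Python sketch; the dict keys, the `ModuleRuns`/`ModuleRun` element names, the `-runs.xml` file naming and the `CS` field and `m2` module are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def runs_by_field(runs):
    """Group Module Run entities into one physical document per field.

    Each run is a dict with the hypothetical keys field, moduleCode, year, runNo.
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["field"]].append(run)
    documents = {}
    for field, field_runs in grouped.items():
        root = ET.Element("ModuleRuns")
        for run in field_runs:
            # The logical key moduleCode/year/runNo identifies the entity,
            # whichever document it happens to be stored in.
            key = f'{run["moduleCode"]}/{run["year"]}/{run["runNo"]}'
            ET.SubElement(root, "ModuleRun", id=key, field=field)
        documents[f"{field}-runs.xml"] = root  # hypothetical naming convention
    return documents

runs = [
    {"field": "IS", "moduleCode": "UFIEKG-20-3", "year": 2005, "runNo": 1},
    {"field": "IS", "moduleCode": "UFIEKG-20-3", "year": 2005, "runNo": 2},
    {"field": "CS", "moduleCode": "m2", "year": 2005, "runNo": 1},
]
docs = runs_by_field(runs)
```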
Conceptual Modelling
• Conventional normalised data model
• Generality issue, e.g. Module Run
  – Roles as attributes
    • <ModuleLeader>Stewart Green</ModuleLeader>
  – Roles as entities
    • <role><title>Module Leader</title><person>Stewart Green</person></role>
  – Entities enable metadata, but defeat the use of tables for data entry
    • Need views
• Attributes vs. elements is a conceptual/logical mapping issue
  – <Module code="UFIEKG-20-3" level="3">…
  – <Module><ModuleCode>UFIEKG-20-3</ModuleCode>…
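One job of the views mentioned above is to hide the choice between the two role representations from consumers. The following is a minimal Python sketch of a lookup that returns the module leader under either form. The `ModuleRun` wrapper element is an assumption; the inner markup is copied from the slide.

```python
import xml.etree.ElementTree as ET

# The ModuleRun wrapper is hypothetical; the inner markup follows the slide.
AS_ATTRIBUTE = "<ModuleRun><ModuleLeader>Stewart Green</ModuleLeader></ModuleRun>"
AS_ENTITY = ("<ModuleRun><role><title>Module Leader</title>"
             "<person>Stewart Green</person></role></ModuleRun>")

def module_leader(run):
    """Return the module leader whether the role is modelled as an
    attribute-like element or as a role entity; None if absent."""
    leader = run.findtext("ModuleLeader")
    if leader is not None:
        return leader
    for role in run.findall("role"):
        if role.findtext("title") == "Module Leader":
            return role.findtext("person")
    return None

print(module_leader(ET.fromstring(AS_ATTRIBUTE)))
print(module_leader(ET.fromstring(AS_ENTITY)))
```

The entity form is the more general of the two, since new roles need no schema change, but only the attribute form maps directly onto a spreadsheet column.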
Conceptual Modelling Tools
• A UML class model is the closest to a suitable conceptual model
  – Allows multi-valued attributes
  – Distinguishes relationship kinds
    • Composition
    • Bi-directional associations
    • Uni-directional associations (for multiplicity resolution)
  – QSEE/Rose
    • No identifiers (primary keys)?
    • No indication of mapping to attributes or elements
    • No mapping into entities
    • No mapping into documents and collections
Identifiers
• Principle adopted: use naturally occurring identifiers wherever possible
  – Persons: “Ian Beeson”
  – Rooms: “3P14”
• Plus
  – Reduces the gap between the real-world (RW) domain and the system
  – Names in minutes of meetings and on spreadsheets are readable
• Minus
  – Duplicates
    • Duplicates are not tolerable in the RW either; they are resolved through RW negotiation within a RW namespace, e.g. the Faculty
    • Mergers generate duplicates
  – Aliases
  – Not all entities have unique identifiers
    • Programmes: the ISIS Primary Award and UCAS codes are candidates but don’t work
• Open questions
  – All names need a namespace: “Ian Beeson” at CEMS at UWE
  – Need to replace multiple naming conventions with a single naming scheme (e.g. initials)
  – URNs and the semantic web
Alias handling
• Problem: handling aliases in staff data
  – Currently a person can have multiple names; the first is the primary name
  – Better is a separate alias table
    • Look up the base table
    • If not found, try the alias table
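The two-step lookup can be sketched directly. The following is a minimal Python sketch with a base staff table and a separate alias table, both held as small XML documents. The `staff`/`person`/`aliases`/`alias` element names, the `ib` identifier and the alias “I. Beeson” are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical base table and alias table.
STAFF = ET.fromstring(
    "<staff><person id='ib'><name>Ian Beeson</name></person></staff>")
ALIASES = ET.fromstring(
    "<aliases><alias name='I. Beeson' ref='ib'/></aliases>")

def find_person(name):
    """Look up the base table first; if not found, resolve via the alias table."""
    for person in STAFF.iter("person"):
        if person.findtext("name") == name:
            return person
    for alias in ALIASES.iter("alias"):
        if alias.get("name") == name:
            return STAFF.find(f"person[@id='{alias.get('ref')}']")
    return None

print(find_person("I. Beeson").findtext("name"))  # Ian Beeson
```

Keeping aliases in their own table means the base table holds exactly one name per person, while old or informal names still resolve to the same record.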
Relationships and Links
• Relationships need to be implemented
  – One-to-many
    • RDBMS: the primary key on the ‘one’ side becomes a foreign key on the ‘many’ side
    • NXD: choose which side holds the link on the basis of complexity and responsibility
      – Sequence (modules in a stage)
      – Complex (pre-requisite expression)
  – Many-to-many
    • RDBMS: intersection table
    • NXD: as for one-to-many, or either side as appropriate
      – Groups and subgroups
• Issues: referential integrity
  – RDBMS is ‘eager’: data is not allowed in unless its links are OK, and links are maintained through updates
    • Integrity failures are transient; repair happens outside the database
  – NXD is ‘lazy’: store the data and provide on-demand or on-trigger validation
    • Integrity failures can be persisted (XLinkit) and repair happens inside the database
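The ‘lazy’ approach can be sketched as an on-demand check. Module specifications are stored even if they reference unknown modules, and a separate pass produces a report of dangling links. That report could itself be stored in the database. This is a minimal Python illustration of the idea only. It is not XLinkit's API. The `specs`, `PreRequisite`, `integrityFailures` and `danglingLink` names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical store: m2 references m9, which does not exist, but is stored anyway.
SPECS = ET.fromstring("""<specs>
  <moduleSpecification><ModuleCode>m1</ModuleCode></moduleSpecification>
  <moduleSpecification><ModuleCode>m2</ModuleCode>
    <PreRequisite>m1</PreRequisite><PreRequisite>m9</PreRequisite>
  </moduleSpecification>
</specs>""")

def integrity_report(specs):
    """On-demand validation: report dangling pre-requisite links as XML
    rather than rejecting the data when it is loaded."""
    known = {s.findtext("ModuleCode") for s in specs.iter("moduleSpecification")}
    report = ET.Element("integrityFailures")
    for spec in specs.iter("moduleSpecification"):
        for pre in spec.findall("PreRequisite"):
            if pre.text not in known:
                ET.SubElement(report, "danglingLink",
                              source=spec.findtext("ModuleCode"), target=pre.text)
    return report

print(ET.tostring(integrity_report(SPECS), encoding="unicode"))
```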
Versioning
• Based on a yearly cycle
  – Base year set in the user’s session
  – Default set in the system configuration
• Two different approaches
  – Module Run, Coursework Elements, …
    • Explicit version identifier: ModuleCode/Year/RunNo
    • Selection is explicit: [Year = $year]
  – Module Specification, Programme Structure
    • Implicit version, defined by the sequence of versions
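With explicit versioning the year is part of the identifier, so selecting a year's runs is just a filter on the key. The following is a minimal Python sketch, assuming keys of the form ModuleCode/Year/RunNo as on the slide. The `RunKey`, `parse_run_key` and `runs_for_year` names are hypothetical.

```python
from typing import NamedTuple

class RunKey(NamedTuple):
    module_code: str
    year: int
    run_no: int

def parse_run_key(key):
    """Split an explicit ModuleCode/Year/RunNo identifier into its parts."""
    module_code, year, run_no = key.split("/")
    return RunKey(module_code, int(year), int(run_no))

def runs_for_year(keys, year):
    """Explicit selection: the Python analogue of the predicate [Year = $year]."""
    return [k for k in keys if parse_run_key(k).year == year]

keys = ["UFIEKG-20-3/2004/1", "UFIEKG-20-3/2005/1", "UFIEKG-20-3/2005/2"]
print(runs_for_year(keys, 2005))
```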
Implicit Versioning
[Figure: timeline of versions 2002, 2005 and 2007. For Year=2006 the latest applicable version is 2005; for Year=2004 it is 2002.]
Implicit Versioning
let $specPath := "/db/versionTest",
    $currentYear := "2005",
    $moduleCode := request:request-parameter("moduleCode", ""),
    (: the year defaults to the current year if no parameter is supplied :)
    $year := request:request-parameter("year", $currentYear),
    (: get the set of possible versions for this module :)
    $modspecs := collection($specPath)/moduleSpecification
                   [ModuleCode = $moduleCode][Version <= $year],
    (: select the version with the highest version number :)
    $modspec := $modspecs[Version = max($modspecs/Version)]
return $modspec
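The selection rule in the query is: take the versions no later than the requested year, then take the latest of those. The following is a minimal Python sketch of that rule, checked against the 2002/2005/2007 timeline from the previous slide. The function name `effective_version` is hypothetical.

```python
def effective_version(versions, year):
    """Return the latest version not later than `year`, or None if none applies
    (the XQuery equivalent returns an empty sequence)."""
    candidates = [v for v in versions if v <= year]
    return max(candidates) if candidates else None

versions = [2002, 2005, 2007]
print(effective_version(versions, 2006))  # 2005
print(effective_version(versions, 2004))  # 2002
```

Because only the year changes between calls, moving a user's session to another base year changes every implicitly versioned document they see. No per-document configuration is needed.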
Editing
• Table-structured document editing
  – Allows maintenance using familiar spreadsheet tools (Excel 2003)
  – Schema is induced by Excel
  – Accommodations
    • Multi-valued fields as concatenated values
      – XPath string-join() and tokenize() functions
      – Embedded separator problem (a name with ‘,’ as a legitimate character)
      – Defeats indexing
    • Optional elements increase table width
    • Formatting choices are not maintained (e.g. Freeze Panes)
• Structured document editing
  – Allows maintenance with Word without a schema
    • With difficulty: Word is not schema-aware
  – Use InfoPath to create a desktop form based on the schema
    • Need to redo if the schema changes
• In-situ updates
  – With XQuery-generated forms and update
  – With XForms
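The embedded separator problem is easy to reproduce. Here a multi-valued field is joined into one cell and then split back, in the way `string-join()` and `tokenize()` would be used. The following is a minimal Python sketch; the comma separator and the name “Green, Stewart” are illustrative assumptions. The element-per-value alternative avoids the problem but no longer fits a single spreadsheet cell.

```python
import xml.etree.ElementTree as ET

SEP = ","

def to_cell(values):
    """Flatten a multi-valued field into one spreadsheet cell (cf. string-join)."""
    return SEP.join(values)

def from_cell(cell):
    """Split a cell back into values (cf. tokenize); breaks if a value contains SEP."""
    return [v.strip() for v in cell.split(SEP)] if cell else []

def to_elements(parent_tag, child_tag, values):
    """Element-per-value alternative: no separator is needed."""
    parent = ET.Element(parent_tag)
    for v in values:
        ET.SubElement(parent, child_tag).text = v
    return parent

names = ["Green, Stewart", "Ian Beeson"]  # hypothetical values
print(from_cell(to_cell(names)))  # ['Green', 'Stewart', 'Ian Beeson']: the embedded comma splits a name
```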
Views
• Views arise from the need for de-normalisation
  – Coursework Element
    • As a simple element
      – Key: moduleCode/Year/runNo/elementNo
      – Data: due date
    • As a derived complex element
      – SuggestedHours (computed from the Hours table)
      – Late date (computed from the UWE calendar)
      – Weightings (extracted from the relevant specification)
      – Module Leader (extracted from the Module Run)
• Views as transient or materialised
• View definition
• View maintenance
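Derived dates such as the late date depend on calendar arithmetic. The following is a minimal Python sketch of counting working days, assuming that working days exclude weekends and a supplied set of holidays standing in for the UWE calendar. The function name, the holiday set and the example dates are hypothetical.

```python
from datetime import date, timedelta

def add_working_days(start, n, holidays=frozenset()):
    """Return the date n working days after `start`, skipping Saturdays,
    Sundays and any date in `holidays` (a stand-in for the UWE calendar)."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current

due = date(2006, 2, 3)  # a Friday
print(add_working_days(due, 20))                     # 2006-03-03
print(add_working_days(due, 1, {date(2006, 2, 6)}))  # 2006-02-07
```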
declare function fold:courseworkElement($moduleCode, $year, $runNo, $elementNo) {
  let $mod := fold:moduleSpecification($moduleCode, $year),
      $run := fold:moduleRun($moduleCode, $year, $runNo),
      $elementRun := fold:elementRun($moduleCode, $year, $runNo, 'B', $elementNo),
      $elementSpec := $mod/Assessment/FirstAttempt/Components/ComponentB/Element[position() = $elementNo],
      $dueDate := $elementRun/DueDate,
      (: return date: 20 working days after the due date :)
      $returnDate := fold:workingDays($dueDate, 20),
      $componentWeight := $mod/Assessment/Weighting/ComponentWeightB,
      $weightInComponent := data($elementSpec/Weight),
      $weightInModule := round($weightInComponent * $componentWeight div 100),
      $load := fold:load($mod/Level),
      (: suggested hours, scaled by the element's weight in the module :)
      $hrs := round(data($mod/UWERating) div data($load/Credits)
                    * $weightInModule div 100 * data($load/Hours))
  return
    <CourseworkElement>
      <ModuleCode>{$moduleCode}</ModuleCode>
      {$mod/Title}
      <RunNo>{$runNo}</RunNo>
      {$run/ModuleLeader}
      {$run/InternalModerator}
      {$run/ExternalExaminer}
      <Component>CW</Component>
      <ElementNo>{$elementNo}</ElementNo>
      {$elementSpec/Description}
      <SuggestedHours>{$hrs}</SuggestedHours>
      <WeightInComponent>{$weightInComponent}</WeightInComponent>
      <WeightInModule>{$weightInModule}</WeightInModule>
      <DueDate>{data($dueDate)}</DueDate>
      <ReturnDate>{data($returnDate)}</ReturnDate>
    </CourseworkElement>
};
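The two derived numbers in `fold:courseworkElement` can be checked outside the database. The following Python sketch mirrors the `$weightInModule` and `$hrs` expressions with the same left-to-right arithmetic. It assumes that `UWERating` is the module's credit rating and that the load table maps credits to study hours; the 20-credit/200-hour figures are hypothetical. XQuery's `round()` rounds halves towards positive infinity, so the sketch uses `floor(x + 0.5)` rather than Python's built-in `round()`, which rounds halves to even.

```python
import math

def xq_round(x):
    """Mirror XQuery fn:round (halves round towards positive infinity);
    Python's built-in round() would round 22.5 down to 22."""
    return math.floor(x + 0.5)

def weight_in_module(weight_in_component, component_weight):
    """Element weight as a percentage of the whole module."""
    return xq_round(weight_in_component * component_weight / 100)

def suggested_hours(uwe_rating, credits, weight_in_module_pct, hours):
    """Same left-to-right arithmetic as the $hrs expression."""
    return xq_round(uwe_rating / credits * weight_in_module_pct / 100 * hours)

# Hypothetical figures: element worth 50% of a component weighted 45%,
# for a 20-credit module with an assumed load of 200 hours per 20 credits.
w = weight_in_module(50, 45)
print(w, suggested_hours(20, 20, w, 200))  # 23 46
```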
Process support
• Short term: process support
  – Form generation
  – Linkage to process documentation
• Medium term: process monitoring
  – Online capture of significant dates
    • Coursework hand-in date
    • Date exam sent to moderator
    • Date coursework returned to students
  – Derived information
    • Workload prediction based on coursework schedule and student numbers
    • Display of latest coursework returned and SMS message to students
• Long term: process management
  – Workflow
  – Process enactment software
Short-term
• Session-based logins to personalise the interface and specify parameters (currentYear)
• Form generation as passive documents
  – Update through the form is an obvious extension
• Extend operational data with date-based status
  – Date returned to students
    • If set, the work has been returned
  – Date used to generate a page of recently returned coursework
  – Date used to monitor conformance to the target return date(!)
• Link forms to textual/graphical process descriptions
  – Coursework from setting to field board
  – How to specialise a generic description?
    • By level
    • By module
    • By field
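Once a date-returned field exists, both uses listed above are simple queries over the operational data. The following is a minimal Python sketch over coursework elements held as dicts. The `id`, `target` and `returned` keys, the 7-day window and the example dates are hypothetical; the ids follow the moduleCode/Year/runNo/elementNo key pattern.

```python
from datetime import date, timedelta

def recently_returned(elements, today, window_days=7):
    """Elements whose returned date is set and falls within the last window_days."""
    cutoff = today - timedelta(days=window_days)
    return [e["id"] for e in elements
            if e.get("returned") is not None and cutoff <= e["returned"] <= today]

def late_returns(elements):
    """Elements returned after their target return date (conformance check)."""
    return [e["id"] for e in elements
            if e.get("returned") is not None and e["returned"] > e["target"]]

elements = [
    {"id": "UFIEKG-20-3/2005/1/1", "target": date(2006, 3, 3), "returned": date(2006, 3, 1)},
    {"id": "UFIEKG-20-3/2005/1/2", "target": date(2006, 2, 10), "returned": date(2006, 2, 20)},
    {"id": "UFIEKG-20-3/2005/1/3", "target": date(2006, 3, 10), "returned": None},
]
print(recently_returned(elements, date(2006, 3, 2)))
print(late_returns(elements))
```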
Responsibilities
• Responsibility allocation
  – Admin / architect decision
  – Physical-level design for responsibility
    • All Module Runs in a Field in one document
    • Modules and Programme Structures in Field collections (within Year)
  – Group access rights
    • For IS Field: ISAdmin
      – Anne Moggridge
      – Peter Rawlings
      – Lilly Cooke
      – Tracey Davis
• Need for check-in/check-out of documents
  – WebDAV (Web Folders)
Conclusion
• Slide from prototype to production
• Pluses and minuses of user enthusiasm
• Go for ‘low-hanging fruit’
• Pay attention to the learning process
  – XQuery and XSLT are non-trivial languages because they are deeply unlike Java/PHP
• Reflection forced by presentations and workshops