OASIS Electronic Trial Master File Standard Technical Committee: Content Classification Layer, January 20, 2014, 9:00 – 10:00 AM PST

Upload: lauren-flowers

Post on 24-Dec-2015

TRANSCRIPT

  • Slide 1
  • OASIS Electronic Trial Master File Standard Technical Committee Content Classification Layer January 20, 2014 9:00 – 10:00 AM PST
  • Slide 2
  • Agenda (time: topic, presenter in parentheses)
      • 9:00-9:05: Call to Order & Roll Call (Zack Schmidt)
      • 9:05-9:10: Approval of Minutes, https://www.oasis-open.org/committees/documents.php?wg_abbrev=etmf (All)
      • TC Process and Administration, deferred (Chet Ensign)
      • 9:10-9:20: Outreach Subcommittee (All, Jennifer Alpert)
      • 9:20-9:50: Tech presentation: Content Classification Layer (Z. Schmidt/Aliaa)
      • 9:50-9:55: New Business (All)
      • 9:55-10:00: Next meeting agenda / Date (Z. Schmidt)
  • Slide 3
  • Roll Call
      • Name | Company | Voting Status | Present?
      • Jennifer Alpert Palchak | CareLex | Voter | y
      • Aliaa Badr | CareLex | Voter | y
      • Oleksiy (Alex) Palinkash | CareLex | Voter | y
      • Troy Jacobson | Forte Research | Voter | y
      • Lou Chappuie | Individual | Voter | y
      • Lisa Mulcahy | Individual | Non-Voter | y
      • Robert Gehrke | Mayo Clinic | Voter | n
      • Rich Lustig | Oracle | Non-Voter | y
      • Michael Agard | Paragon Solutions | Non-Voter | y
      • Christopher McSpiritt | Paragon Solutions | Non-Voter | y
      • Jamie O'Keefe | Paragon Solutions | Non-Voter | n
      • Fran Ross | Paragon Solutions | Non-Voter | y
      • Peter Alterman | SAFE-BioPharma | Voter | y
      • Catherine Schmidt | SterlingBio | Voter | y
      • Zack Schmidt | SureClinical | Voter | y
      • Trish Whetzel, PhD | SureClinical | Non-Voter | y
      • Peter Junge | Beijing Sursen | Observer | n
      • Laura Hilty | Forte Research | Observer | n
      • Tony O'Hare | Forte Research | Observer | n
      • Eldin Rammell | Rammell Consulting | Observer | n
      • Robin Cover | OASIS staff | Non-Voter | n
      • Chet Ensign | OASIS staff | Non-Voter | n
  • Slide 4
  • Meeting Etiquette
      • Announce your name prior to making comments or suggestions
      • Keep your phone on mute when not speaking (#6)
      • Do not put your phone on hold
          • Hang up and dial in again when finished with your other call
          • Hold = Elevator Music = very frustrated speakers and participants
      • Meetings will be recorded and posted
          • Another reason to keep your phone on mute when not speaking!
      • Use the join.me Chat feature for questions / comments / votes
      • We will follow Robert's Rules of Order
      • NOTE: This meeting is being recorded and minutes will be posted on the TC page after the meeting
      • From eTMF Std TC to Participants: Hi everyone: remember to keep your phone on mute
  • Slide 5
  • Outreach Subcommittee: Status
      • New Members: Oracle joined
      • In Progress: EMC, Kaiser Permanente, Shire, Medtronics
      • Activities / Milestones
  • Slide 6
  • Tech Discussion: Status Timeline (in parallel with other Tech work from charter)
  • Slide 7
  • Content Classification System Discussion: Classification System Components
      • Classification Categories: taxonomy, hierarchy
      • Metadata (Tags): characterizes content
      • Content Model: published set of classifications and metadata for a domain (e.g., eTMF)
  • Slide 8
  • Classification Categories Component
      • Hierarchy of categories: categories, subcategories, content types
      • Defined relationships with rules: Parent-Child
      • All categories and content types are required to have unique names and machine codes
      • Each content type is associated with Metadata Properties (includes core and domain-specific)
      • Content items are linked to content types
      • Unique classification and term codes based on Universal Decimal Classification (UDC) numbering, widely used in libraries worldwide; human and machine readable; infinitely expandable
      • Can be described, edited and validated using an OWL editor (like the open source editor Protégé)
      • Supports any simple text vocabulary, including TMF Ref Model and other vocabularies
      • W3C OWL2 and RDF/XML supported
      • Figure: Study Digital Content, Classification Categories Hierarchy
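The parent-child and uniqueness rules on this slide can be sketched as a small data structure. This is only an illustration under stated assumptions, not the TC's schema: the `Category` and `Registry` names are hypothetical, and any names or codes passed in are made up by the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Category:
    """One node in the hierarchy: a category, subcategory, or content type."""
    name: str
    code: str
    parent: Optional["Category"] = None
    children: List["Category"] = field(default_factory=list)


class Registry:
    """Hypothetical helper enforcing unique names and unique machine codes."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Category] = {}
        self._by_code: Dict[str, Category] = {}

    def add(self, name: str, code: str,
            parent: Optional[Category] = None) -> Category:
        # Reject duplicates before touching the hierarchy.
        if name in self._by_name:
            raise ValueError(f"duplicate name: {name}")
        if code in self._by_code:
            raise ValueError(f"duplicate code: {code}")
        node = Category(name, code, parent)
        if parent is not None:
            parent.children.append(node)  # record the parent-child link
        self._by_name[name] = node
        self._by_code[code] = node
        return node

    def path(self, code: str) -> List[str]:
        """Return the names from the root down to the node with this code."""
        node: Optional[Category] = self._by_code[code]
        names: List[str] = []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))
```

Keeping two lookup tables makes both uniqueness checks constant-time, and `path` shows how a content type's position in the hierarchy can be recovered from its code alone.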
  • Slide 9
  • Metadata Component
      • Used to tag or index digital content items
      • Metadata Classes:
          • Core: comprised of four areas: File Properties, Classification, Audit Trail, Business Process
          • Domain-specific: metadata for a domain in life sciences such as eTMF, finance, legal administration, or others. Uses standards-based terms from groups like NCI
          • Org-Specific: metadata that meets an organization's needs; not standards based
          • General: obtained from public standards-based vocabulary terminology resources like Dublin Core
      • Annotation Properties: metadata about classification categories and metadata (Core, Org-Specific metadata)
      • Figure: Core Metadata Example, File Properties
  • Slide 10
  • Content Model Component: contains the classification hierarchy and metadata in machine readable format
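As a rough illustration of what "machine readable format" could look like, the sketch below serializes a tiny class hierarchy as RDF/XML using OWL classes and `rdfs:subClassOf`. It is not the TC's published content model: the `build_model` function, the `http://example.org/etmf#` namespace, and any class names passed in are assumptions made for the example. Only the RDF, RDFS and OWL namespace URIs are the standard W3C ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/etmf#"  # hypothetical namespace, for illustration only

# Use readable prefixes instead of auto-generated ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def build_model(classes):
    """Serialize (name, parent_name_or_None) pairs as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, parent in classes:
        cls = ET.SubElement(
            root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name}
        )
        label = ET.SubElement(cls, f"{{{RDFS}}}label")
        label.text = name
        if parent is not None:
            # A subcategory is modelled as a subclass of its parent category.
            ET.SubElement(
                cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent}
            )
    return ET.tostring(root, encoding="unicode")
```

A file produced this way should open in an OWL editor such as Protégé, though a real content model would also carry the metadata properties and annotations described on the earlier slides.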
  • Slide 11
  • Classification System Term Sources
      • Term Sourcing Concepts: terms adopted by standards bodies should be used first in the eTMF model
      • Primary Term Sources for eTMF Classification System:
          • Internet Standards Dev Orgs: W3C, IETF, ISO, etc. Required for interoperability of machine code
          • NIH NCIthesaurus: term database for FDA, CDISC, HL7, other orgs. Required for interoperability of clinical / health sciences data
      • Secondary Term Sources for eTMF Classification System:
          • Industry sources: widely used terms in enterprise content mgmt software, TMF RM
      • *Spec, Table 6, p21
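The "standards bodies first" rule can be read as a simple ranking over term sources. The sketch below is one possible reading, not a procedure from the spec: the `pick_term` function, the `SOURCE_RANK` table, and the exact source labels used as keys are hypothetical.

```python
from typing import List, Tuple

# Lower rank = preferred. The two tiers follow the slide (primary vs.
# secondary sources); the label strings themselves are illustrative.
SOURCE_RANK = {
    "W3C": 0,
    "IETF": 0,
    "ISO": 0,
    "NIH NCIthesaurus": 0,
    "ECM software": 1,
    "TMF RM": 1,
}
UNKNOWN_RANK = 2  # sources not listed on the slide sort last


def pick_term(candidates: List[Tuple[str, str]]) -> str:
    """Given (term, source) pairs, return the term from the best-ranked source.

    Ties keep the earliest candidate, because min() returns the first minimum.
    """
    if not candidates:
        raise ValueError("no candidate terms")
    term, _source = min(
        candidates, key=lambda c: SOURCE_RANK.get(c[1], UNKNOWN_RANK)
    )
    return term
```

A ranking like this only narrows the field. Choosing between two primary-source terms still needs the kind of plus/minus comparison shown in the appendix slides.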
  • Slide 12
  • Classification Categories Component: Classification Categories Hierarchy and Numbering [1]
      • Classification hierarchy and numbering is based on the UDC library numbering standard and XML naming
      • Digital dot notation, designed for human and machine readability
      • Each number is also a unique code for naming and ordering in the hierarchy
          • Primary Categories (PC): three digit. eTMF: 100-200
          • Subcategories (SC): two digit: 10-99
          • Content Types (CT): two digit: 10-99
      • Maximum number of Sub-Category divisions is 5, excluding the 3 digits for the Primary Category
      • Hierarchy Numbering/Naming Considerations:
          • Flexible, standards-based approach (W3C XML compliant naming*)
          • Ability to add multiple hierarchy divisions / levels
          • Proposed: 5 divisions = 100 × 90^5 ≈ 5.9 × 10^11 Content Types
          • Uniqueness of numbers: usable as machine code identifiers
          • Machine readable, human readable
          • No sorting issues, no need for leading zeros*, no special chars
      • [1] Per spec section 2.1.1; 6.0
      • *Leading zeros in XML syntax are ignored: http://www.w3.org/TR/REC-xml/
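The numbering rules above can be checked mechanically. The following is a minimal sketch, assuming a code is a three-digit primary category followed by up to five two-digit segments (10-99) joined by dots. The function names are hypothetical, and the eTMF-specific primary range is not enforced.

```python
import re
from typing import Tuple

MAX_DIVISIONS = 5  # two-digit segments allowed after the primary category

# Primary: 100-999 (no leading zero). Each later segment: 10-99.
_CODE_RE = re.compile(r"[1-9]\d{2}(\.[1-9]\d){0,%d}" % MAX_DIVISIONS)


def is_valid_code(code: str) -> bool:
    """True if the code follows the assumed dot-notation pattern."""
    return _CODE_RE.fullmatch(code) is not None


def sort_key(code: str) -> Tuple[int, ...]:
    """Order codes numerically, segment by segment."""
    return tuple(int(part) for part in code.split("."))


def capacity(primary_count: int = 100, per_division: int = 90,
             divisions: int = MAX_DIVISIONS) -> int:
    """Number of distinct codes at full depth: primary_count * per_division ** divisions."""
    return primary_count * per_division ** divisions
```

With the defaults, `capacity()` gives 100 × 90^5 = 590,490,000,000, which matches the slide's figure of about 5.9 × 10^11 content types. Because no segment has a leading zero and each level has a fixed width, codes at the same depth also sort correctly as plain strings.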
  • Slide 13
  • Classification Categories Component: Numbering and Naming Scheme
      • Numbering
          • Primary Categories and Sub-Categories: Category Code number
          • Content Type: Content Type ID
      • Naming Primary Categories and Sub-Categories
          • Simple text-based names
          • Unique name, 64 char limit
          • Abbreviation: 16 char limit suggested
          • Compatible with W3C XML naming standards: no special characters: ( ) ? / % # @ !
      • Example: Classification Categories Hierarchy, Naming, Numbering
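The naming limits on this slide lend themselves to a simple check. This sketch covers only the rules listed here (length limits and the example special characters) and does not implement the full W3C XML naming rules. The `check_name` function and its messages are hypothetical.

```python
from typing import List

FORBIDDEN = set("()?/%#@!")  # the special characters listed on the slide
NAME_LIMIT = 64              # maximum length of a category name
ABBREV_LIMIT = 16            # suggested maximum length of an abbreviation


def check_name(name: str, abbreviation: str = "") -> List[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems: List[str] = []
    if not name:
        problems.append("name is empty")
    if len(name) > NAME_LIMIT:
        problems.append(f"name exceeds {NAME_LIMIT} characters")
    bad = sorted(FORBIDDEN.intersection(name))
    if bad:
        problems.append("name contains special characters: " + " ".join(bad))
    if len(abbreviation) > ABBREV_LIMIT:
        problems.append(f"abbreviation exceeds {ABBREV_LIMIT} characters")
    return problems
```

Returning every violation instead of stopping at the first one suits an import workflow, where a whole vocabulary is checked before anything is loaded. Uniqueness is not checked here because it depends on the other names already in the taxonomy.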
  • Slide 14
  • Classification Categories Component: Modifying Classification Category Entities
      • General Editing Rules
          • Domain Specific Classifications
              • Cannot be deleted; Reserve/Unreserve instead
              • Modifications allowed to some annotation properties (see spec)
              • Codes (Category Codes, CT Type ID) cannot be generated
          • Organization Specific Classifications
              • Can be deleted
              • Modifications allowed for classification metadata, annotations
              • Codes (Category Codes, CT Type ID) can be generated
      • Classification Category, Content Type Editing Rules*
          • Type | Import Terms | Generate Code | Add/Modify | Delete/Reserve
          • Domain Specific | Yes | No | No/Yes** | Reserve/Unreserve
          • Organization Specific | Yes | Yes | Yes/Yes | Delete
      • *Spec, Table 6, p21
      • **Annotation metadata
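The editing rules can be expressed as a lookup table that a tool consults before applying a change. This is a sketch of one reading of the slide, not the spec's Table 6: the `Kind` enum, the action names, and `is_allowed` are hypothetical, and treating "reserve" as unavailable for organization-specific entries is an assumption drawn from the Delete/Reserve column.

```python
from enum import Enum


class Kind(Enum):
    DOMAIN = "Domain Specific"
    ORGANIZATION = "Organization Specific"


# Action names are illustrative. For DOMAIN, "modify" is limited to
# annotation metadata, which this simple table does not model.
RULES = {
    Kind.DOMAIN: {
        "import_terms": True,
        "generate_code": False,
        "add": False,
        "modify": True,
        "delete": False,
        "reserve": True,
    },
    Kind.ORGANIZATION: {
        "import_terms": True,
        "generate_code": True,
        "add": True,
        "modify": True,
        "delete": True,
        "reserve": False,  # assumption: the slide lists only Delete here
    },
}


def is_allowed(kind: Kind, action: str) -> bool:
    """Unknown actions are denied by default."""
    return RULES[kind].get(action, False)
```

Denying unknown actions by default keeps a misspelled action name from silently permitting a change to a standardized domain classification.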
  • Slide 15
  • Classification Editing Tool
      • Free, Open Source: Protégé (from Stanford University: http://protege.stanford.edu/)
      • Protégé Editor:
          • Edit Classification Taxonomy and Metadata Terms
          • Validate Taxonomy and Term name compliance
          • Create valid RDF/XML Ontology
  • Slide 16
  • Classification Categories - Summary
      • Proposed Classification System has the following properties:
          • Based on Naming and Numbering that is W3C XML compliant
              • No special characters: ( ) & # @ / etc.
              • No leading zeros in classification numbers
          • Based on Universal Decimal Classification (UDC) system for content classification: 100-199: eTMF Domain
              • UDC system used in 170+ countries worldwide; expandable, human and machine readable, sortable
              • http://en.wikipedia.org/wiki/Universal_Decimal_Classification
          • Flexible and customizable for organizations, yet interoperable
              • Domain classifications: Standardized; Organization-specific classifications: Editable
          • Defined set of rules for editing, modifying Taxonomy
              • Any Organization can Modify/Edit taxonomy using open source editors like Protégé
  • Slide 17
  • Appendix
  • Slide 18
  • Classification System Core Terms: Content Classification System Core Terms needed for Architecture
      • Objectives: Classification, Subclassification concept
          • Supports RDF/XML, OWL languages
          • Non-domain specific, generic terms
          • Easily understandable by anyone; conveys concept
          • Conveys hierarchy
          • No conflicts: not a reserved term in RDF/XML, OWL or other compilers/IDEs
          • First priority: source terms from standards bodies
  • Slide 19
  • Classification System Core Terms: Content Classification System Core Terms needed for Architecture
      • Classification, Subclassification term concept (Term Options | Source | Definition):
          • Category, SubCategory | NIH NCIthesaurus
              • Category: This term is used informally to mean a class of things (NCI code: C25372)
              • Subcategory: A subdivision that has common differentiating characteristics within a larger category (NCI code: C25692)
          • Class, SubClass | W3C OWL
              • Class: Resources may be divided into groups called classes
              • SubClass: Subclasses are classes; if a class C is a subclass of a class C', then all instances of C will also be instances of C' (W3C RDF Class def)
          • TMF Zone, Section | TMF Ref Model
              • TMF Zone = Primary Classification (no published def found online)
              • Section = SubClassification (no published def found online)
      • Legend: Proposed Term
  • Slide 20
  • Classification System Core Terms: Content Classification System Core Terms needed for Architecture
      • Classification, Subclassification term concept (Term Options | Source | +/-):
          • Category, SubCategory | NIH NCIthesaurus
              • + Everyone knows it
              • + Describes hierarchy
              • + In use by standards body (NIH NCI Thesaurus)
              • + Generic
          • Class, SubClass | W3C OWL
              • + Describes hierarchy
              • + In use by standards body
              • + Generic
              • - Could be a reserved word for some development tools
          • TMF Zone, Section | TMF Ref Model
              • + In use by TMF RM users
              • - Doesn't convey hierarchy
              • - Not in use by standards body
              • - Not generic
      • Legend: Proposed Term
  • Slide 21
  • Classification System Core Terms: Content Classification System Core Terms needed for Architecture
      • Objectives: Content Type concept
          • Supports RDF/XML, OWL languages
          • Non-domain specific, generic terms
          • Easily understandable by anyone; conveys concept
          • No conflicts: not a reserved term in RDF/XML, OWL or other compilers/IDEs
          • First priority: source terms from standards bodies
  • Slide 22
  • Classification System Core Terms: Content Classification System Core Terms needed for Architecture
      • Content Type term concept (Term | Source | Definition):
          • Content Type | W3C & CareLex, Oracle
              • W3C: Specifies the nature of a linked resource (W3C, [RFC2045] and [RFC2046])
              • CareLex: A content type is a reusable collection of metadata, business processes, behavior, and other settings for a category of items or documents in electronic content material
              • Oracle: Content types are used to define the metadata that you can associate with content
          • Artifact | TMF Ref Model
              • A collection of documents (Wikipedia) (Not published)
      • Legend: Proposed Term
  • Slide 23
  • Classification System Core Terms: Content Classification System Core Terms needed for Architecture
      • Content Type term concept (Term | Source | +/-):
          • Content Type | W3C
              • + Widely used in internet SW
              • + ECM SW use: Microsoft, Oracle, Alfresco, etc.
              • + In use by standards body (W3C)
              • + Generic
          • Artifact | TMF Ref Model
              • + In use by TMF RM users
              • - Not in use by standards body
              • - Not generic
              • - Doesn't convey concept of metadata
      • Legend: Proposed Term
  • Slide 24
  • Draft Agenda: Next Meeting
      • Roll call
      • Reports: Outreach
      • Tech Discussion: Classification Layer: Core Metadata (Charter item 2, p.2)
      • New business