Electronic Notebooks: An Interface Component for Semantic Records Systems
James D. Myers, Michael Peterson, K Prasad Saripalli, Tara Talbott
Mathematics and Computational Science Directorate
Pacific Northwest National Laboratory
2
Outline
Why have an electronic notebook?
The changing science/IT landscape
Semantic repositories: Scientific Annotation Middleware
ENs on semantic repositories: The ELN on SAM
3
PNNL Electronic Laboratory Notebook (ELN) ~1995+
Secure shared WWW based space
Hierarchical Chapters/Pages/Notes
Add/View/Search Notes
File upload, sketch, text, equations, forms, image capture, …
Interactive views of data
Editor/Viewer APIs
Cross-out capability
Digital Signatures/Timestamps
Java Client, Perl and Java (2001+) servers
…
4
What distinguishes ENs from other tools?
Emphasis on multimedia human-entered information
Chronological, page-oriented display
Master/personal project record
Records functionality:
Non-repudiation - digital signatures and timestamps
Persistence/completeness - write-once/no deletions/audit trail
Standardized lifecycle - signing/witnessing policies, archiving, retention schedules, …
5
The Systems Science Revolution
Community Resources
Bi-directional flow/feedback of information
Partial results being combined to produce new knowledge
Experiment/Theory/Model comparisons
Multiscale optimizations
Rapid Evolution
High Complexity
Shifting/Emerging disciplinary boundaries
Resources will be distributed
With multiple curators
Supernova Cosmology Requires Complex, Widely Distributed Workflow Management
Slide from Bill Johnston, LBNL
6
Advances in Problem Solving Environments/Grids/Semantic Technologies
Multiple Applications recording data:
Pedigree/Provenance
Experiment Metadata
Project Organization
Workflow
Categorization
Detected Features
Instrument logs
…
Replica Locations
Endorsements
Community Annotations
…
How do we provide EN capabilities in this larger context?
7
Semantic Repositories
Use self-describing metadata/relationships: Triple-stores, RDF, OWL
Aggregate information generated by multiple applications
Allows browsing, searching, reasoning across integrated information
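As a rough illustration of the self-describing triple model named on this slide, the sketch below stores (subject, predicate, object) statements from two applications in one store and queries across them. The resource names and predicates (`run42`, `producedBy`, `annotates`, and so on) are invented for the example; a real repository would use full URIs and an RDF toolkit rather than this toy class.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) statements."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in sorted(self._triples):
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


store = TripleStore()
# Statements contributed by a (hypothetical) instrument application.
store.add("run42", "producedBy", "spectrometer1")
store.add("run42", "hasUnits", "meters")
# Statements contributed independently by a (hypothetical) notebook client.
store.add("note7", "annotates", "run42")
store.add("note7", "author", "jdoe")

# Everything that mentions run42, regardless of which application wrote it.
about_run42 = list(store.match(s="run42")) + list(store.match(o="run42"))
```

Because neither application needs to know the other's schema, the aggregated view falls out of the shared triple shape alone.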
8
Scientific Annotation Middleware (SAM) - 5 yr DOE funded research project
Develop middleware to create semantic repositories
Enable the sharing of this information among portals and problem solving environments, software agents, scientific applications, and electronic notebooks
With different levels of sophistication
Without global schema
Improve the completeness, accuracy, and availability of the scientific record.
http://www.scidac.org/SAM/
9
SAM Architecture
[Figure: layered architecture diagram]
Layers: Notebook Services, Semantic Services, Metadata Services
Data stores: DataGrid, Database, Web
Protocol labels (first group): DAV, DASL, JMS, SAM Extensions
Protocol labels (second group): DAV, JDBC, GridFTP
10
Web Distributed Authoring and Versioning (WebDAV)
An early web service
Put/Get data with arbitrary properties (dynamic)
Properties can be discovered and accessed independently
DASL, Versioning, Transactions, …
Widely supported (MS Office, databases, file system drivers, …)
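To make "Put/Get data with arbitrary properties" concrete, here is a minimal sketch that builds the XML body of a WebDAV PROPPATCH request, the method the WebDAV specification defines for setting properties on a resource. The application namespace `http://example.org/sam#` and the property names are assumptions for illustration; sending the request to a server is left out.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"                       # namespace defined by the WebDAV spec
APP_NS = "http://example.org/sam#"    # hypothetical application namespace


def proppatch_body(props: dict) -> bytes:
    """Build a PROPPATCH request body that sets application-defined properties."""
    root = ET.Element(f"{{{DAV_NS}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV_NS}}}set")
    prop_el = ET.SubElement(set_el, f"{{{DAV_NS}}}prop")
    for name, value in props.items():
        # Each property is its own namespaced XML element.
        ET.SubElement(prop_el, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(root, encoding="utf-8")


body = proppatch_body({"units": "meters", "numColumns": 4})

# Round-trip the body to list the property names it would set.
parsed = ET.fromstring(body)
names = [el.tag for el in parsed.find(f"{{{DAV_NS}}}set/{{{DAV_NS}}}prop")]
```

A PROPFIND request can later retrieve these properties without fetching the resource content, which is what lets properties be discovered and accessed independently.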
11
Binary Format Description (BFD) Language
XML Language to describe ASCII, Binary, and XML data formats
Generic Parser to extract and semantically tag data in files/streams
The meaning of data can be captured, regardless of format, for future use
Data Format Description Language Standard
[Figure: File Format1 + BFD Description 1 -> BFD Parser -> XML Format1 -> XSLT Processor with XSL Stylesheet (reformat) -> XML Format2]
<XSIL>
  <Param Name="units">meters</Param>
  <Param Name="numColumns" Type="int"/>
  <Vector Name="orbitData">
    <Dim><XBFDvalue-of
      select="/XSIL/Param
      [@Name='numColumns']"/>
    </Dim>
    <Dim>4</Dim>
  </Vector>
  <Stream Type="remote"
    XBFDStreamnumber="0"
    Encoding="binary"/>
</XSIL>
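The idea of a generic parser driven by an external format description can be sketched as follows. This is not BFD syntax and not the BFD parser: the `DESCRIPTION` dictionary, its keys, and the output element names are all assumptions made for the example. It only shows the pattern of keeping the layout outside the parser and letting one dimension come from a value read out of the data, as `numColumns` does in the slide's example.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical format description (NOT BFD syntax): the layout lives in data,
# so one generic parser can handle many file formats.
DESCRIPTION = {
    "byte_order": "<",                 # little-endian
    "header": [("numColumns", "i")],   # field name, struct type code
    "row": ("d", "meters"),            # float64 values tagged with units
    "rows": 4,
}


def parse(data: bytes, desc: dict) -> ET.Element:
    """Read `data` according to `desc` and return semantically tagged XML."""
    order = desc["byte_order"]
    root = ET.Element("Data")
    offset = 0
    header = {}
    for name, code in desc["header"]:
        (value,) = struct.unpack_from(order + code, data, offset)
        offset += struct.calcsize(order + code)
        header[name] = value
        ET.SubElement(root, "Param", Name=name).text = str(value)
    code, units = desc["row"]
    # The column count is taken from the header value read above.
    row_fmt = order + code * header["numColumns"]
    for _ in range(desc["rows"]):
        values = struct.unpack_from(row_fmt, data, offset)
        offset += struct.calcsize(row_fmt)
        row = ET.SubElement(root, "Row", units=units)
        row.text = " ".join(repr(v) for v in values)
    return root


# Sample binary payload: a column count followed by 4 rows of 2 values.
sample = struct.pack("<i", 2) + struct.pack("<8d", *[float(i) for i in range(8)])
tree = parse(sample, DESCRIPTION)
```

Once the data is in tagged XML, a stylesheet step such as the XSLT stage in the figure can reshape it into another XML format without touching the parser.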
12
SAM Metadata Services Layer
Jakarta Slide DAV server plus configurable:
Mapping to Data Store(s)
Property Generation from binary/ASCII/XML files
Dynamic Virtual Translations
Server generated Properties and Relationships: Timestamp, size, CopyOf
[Figure labels: Fortran Application, ELN, 'LocalDisk' DAV, DAV+, Content (Prop1, Prop2), hasTranslation (BFD, Web Service, XSLT, …), Translated Content, RDF Export]
13
SAM Semantic Services Layer
SAM Metadata Layer plus configurable:
Relation-scoped Queries
Translation of DAV Properties to RDF Triples
RDF/GXL Pedigree Generation
…
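One way to picture "Translation of DAV Properties to RDF Triples" is the sketch below, which turns a resource's properties into N-Triples lines. The mapping rule (predicate URI = property namespace + local name), the `relations` configuration, and all URLs are assumptions for illustration, not SAM's documented behavior.

```python
def props_to_ntriples(resource_url: str, props: dict, relations: set) -> list:
    """Map {(namespace, name): value} properties to N-Triples statements.

    Properties listed in `relations` are treated as links to other
    resources; all other values become plain string literals.
    """
    lines = []
    for (ns, name), value in sorted(props.items()):
        predicate = f"<{ns}{name}>"
        if (ns, name) in relations:
            obj = f"<{value}>"
        else:
            escaped = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            obj = f'"{escaped}"'
        lines.append(f"<{resource_url}> {predicate} {obj} .")
    return lines


NS = "http://example.org/sam#"  # hypothetical property namespace
triples = props_to_ntriples(
    "http://example.org/dav/data2.dat",
    {(NS, "units"): "meters",
     (NS, "CopyOf"): "http://example.org/dav/data1.dat"},
    relations={(NS, "CopyOf")},
)
```

Distinguishing relationship properties from literal ones is what lets a query follow links such as CopyOf from one resource to another.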
14
Back to ENs…
What is needed to be able to provide
Unstructured human entry of information?
Chronological, page-oriented display?
A master/personal project record?
Records functionality?
15
Creating Notes
A 'standard' ELN client can create notes
Stored as content with a hasNote relationship with pages, notes
Plus…any app can store notes the same way
Page generation – works as before
16
ENs as a Primary View?
Instruments, PSEs, etc. may organize parts of the experiment that an EN should not duplicate
Define other relationships as part of the EN chapter/page/note hierarchy:
Defined by PSE: Project, Experiment1, Experiment2, Data1, Data2
Interpreted by EN: Notebook1, Chapter1, Chapter2, Page1, Page2
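The reinterpretation described on this slide can be sketched as a walk over relationships that some other application recorded, with a notebook-side table deciding which relationships count as hierarchy. The predicate names (`hasExperiment`, `hasData`), the placement of the data items under Experiment1, and the role table are assumptions for the example.

```python
from typing import Dict, List, Tuple

# Relationships as a PSE might record them (predicate names are hypothetical).
RELATIONS: List[Tuple[str, str, str]] = [
    ("Project", "hasExperiment", "Experiment1"),
    ("Project", "hasExperiment", "Experiment2"),
    ("Experiment1", "hasData", "Data1"),
    ("Experiment1", "hasData", "Data2"),
]

# Notebook-side configuration: which predicates are hierarchy edges, and
# what notebook role the target of each edge plays.
ROLE_OF_TARGET: Dict[str, str] = {"hasExperiment": "chapter", "hasData": "page"}


def notebook_view(root, relations, role_of_target, role="notebook", depth=0):
    """Return indented outline lines by walking the configured relationships.

    Assumes the configured relationships form a tree (no cycles).
    """
    lines = ["  " * depth + f"{role}: {root}"]
    for subj, pred, obj in relations:
        if subj == root and pred in role_of_target:
            lines += notebook_view(obj, relations, role_of_target,
                                   role_of_target[pred], depth + 1)
    return lines


outline = notebook_view("Project", RELATIONS, ROLE_OF_TARGET)
```

Nothing is copied into the notebook here: the outline is a view computed from relationships the PSE already owns.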
17
Records?
Digital Signatures, Timestamps, etc. are services that can be exposed as repository services and associated metadata
But:
What do we sign (content/metadata)?
Where is the edge of the record?
How deep do we travel through the web of relations?
How do we stop other applications from changing/deleting signed content?
18
Multiple Options
Simple:
Sign content plus defined subset of metadata
Stop at edge of server
Treat relationship cycles as links
Lock content and metadata subset when signed
Advanced:
Multiple self-describing signatures (e.g. XMLSignature)
Allow records across servers via trust, cached metadata/data
Define fine-grained retention schedules
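The "Simple" option can be sketched as a digest over content plus a fixed subset of metadata, following relationships only within one server and visiting each resource once so that cycles end up as plain links. Everything here is an assumption for illustration: the in-memory `store` layout, the `SIGNED_PROPS` subset, the URLs, and the use of an HMAC as a stand-in for a real digital signature. Locking of signed content is not shown.

```python
import hashlib
import hmac
import json

SIGNED_PROPS = ("author", "created")   # assumed "defined subset of metadata"
SERVER = "http://example.org/dav/"     # hypothetical server prefix


def record_digest(store: dict, start: str, server: str) -> str:
    """Hash content plus selected metadata for the record rooted at `start`."""
    visited, stack = set(), [start]
    while stack:
        url = stack.pop()
        if url in visited:
            continue                   # seen before: the cycle is just a link
        visited.add(url)
        for target in store[url]["relations"]:
            # Stop at the edge of the server: off-server targets are
            # recorded as links below but never followed.
            if target.startswith(server) and target in store:
                stack.append(target)
    h = hashlib.sha256()
    for url in sorted(visited):        # fixed order gives a canonical form
        item = store[url]
        canonical = {
            "url": url,
            "content_sha256": hashlib.sha256(item["content"]).hexdigest(),
            "props": {k: item["props"].get(k) for k in SIGNED_PROPS},
            "links": sorted(item["relations"]),
        }
        h.update(json.dumps(canonical, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


store = {
    SERVER + "note1": {
        "content": b"observed peak",
        "props": {"author": "jdoe", "created": "2004-01-01", "color": "red"},
        "relations": [SERVER + "data1", "http://other.example/x"],
    },
    SERVER + "data1": {
        "content": b"\x00\x01",
        "props": {"author": "jdoe", "created": "2004-01-01"},
        "relations": [SERVER + "note1"],   # cycle back to the note
    },
}

digest = record_digest(store, SERVER + "note1", SERVER)
# Stand-in for a real signature: an HMAC over the digest with a demo key.
signature = hmac.new(b"demo-key", digest.encode("utf-8"), hashlib.sha256).hexdigest()
```

Changing a property outside the signed subset leaves the digest unchanged, while changing any reachable content on the same server alters it, which is the trade-off the slide's "defined subset" wording implies.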
19
SAM Notebook Services Layer
SAM Metadata and Semantic Layers plus:
Notebook Management, Page Display, …
Digital Signatures
Canonicalization
Notarized Timestamps
Data/Signature Migration Capabilities
Notebook API, Notebook Components
Supports ELN 5.1, Annotation Applet, new portal-based EN client
EN Portlets
20
Collaborations
Collaboratory for Multiscale Chemical Science (CMCS): SAM as primary data system, pedigree, notebook
NEESgrid/CHEF Portal/NMI Grid User Computing Environments: ELN, SAM as a metadata/pedigree store?
Genomes-To-Life: SAM as annotation/metadata repository, notebook
Internal PNNL Projects: Concept Map Repository, Interface to Lustre, Biological Data Annotation
DOE2000 Notebook Community (1500+ email addresses): Upgrades to DOE2K Notebooks, e.g. Columbia University Environmental Science Lab Notebooks
21
A Scientific Content Repository Vision
Notebooks become just one view of the scientific information
Applications contribute data, metadata, and relationships directly
Records functionality provided by middleware, available to multiple applications
Content is stored in multiple repositories managed independently
The scientific record becomes richer and re-integrated
22
Acknowledgments
Carina Lansing, PNNL
Al Geist, Jens Schwidder, David Jung, ORNL
U.S. Department of Energy
Pacific Northwest National Laboratory is a multiprogram national laboratory operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract DE-AC06-76RL0 1830
Oak Ridge National Laboratory is a multiprogram national laboratory operated by UT-Battelle, LLC for the U.S. Department of Energy under Contract DE-AC05-00OR22725
Mathematical, Information and Computational Sciences Division of the Office of Science