![Page 1: System Development & Operations NSF DataNet site visit to MIT February 8, 2010 2/8/20101NSF Site Visit to MIT DataSpace DataSpace](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/1.jpg)
System Development & Operations
NSF DataNet site visit to MIT, February 8, 2010
2/8/2010 NSF Site Visit to MIT DataSpace
DataSpace
![Page 2](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/2.jpg)
DataSpace High-Level Architecture
• Global Network (Web): MIT Node, Other USA Nodes, International Nodes
• MIT Node (Local Network)
– DataSpace Services: Data Curation Services (Process, Catalog, Annotate, Preserve); Basic Data User Services (Discovery, Quality, Conversion, Integration); Additional Data User Services (Data Analytics, Data Visualization); Policy Management, Workflow Services; Distributed Data Management Services (Security, Replication, Administration)
– Metadata Repository for Scientific Data
– Multiple Scientific Data Repositories (DataSpace Native Architecture)
– Interface to Legacy Scientific Data Repositories
• 3rd Party Specialized Data Services
Basic Workflow
• Scientist: provides data and preliminary metadata
• Curator: processes and ingests data; completes metadata and policies (e.g. retention)
• User: searches (meta)data, accesses/integrates data, analyzes/visualizes data (via DataSpace data services or 3rd party data services)
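The Basic Workflow hands each dataset from scientist to curator to user. As a rough illustration only, those hand-offs can be modeled as a small state machine; the `Stage` enum, the `Dataset` record, the function names, the sample URIs and the `retention_years` field are all hypothetical and are not part of DataSpace.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages for one dataset (not a DataSpace API)."""
    SUBMITTED = "submitted"  # scientist has provided data and preliminary metadata
    INGESTED = "ingested"    # curator has completed metadata and attached policies


@dataclass
class Dataset:
    data_uri: str
    metadata: dict
    stage: Stage = Stage.SUBMITTED
    policies: dict = field(default_factory=dict)


def scientist_submit(data_uri: str, preliminary_metadata: dict) -> Dataset:
    """Scientist: hand over the data plus whatever metadata is already known."""
    return Dataset(data_uri=data_uri, metadata=dict(preliminary_metadata))


def curator_ingest(ds: Dataset, extra_metadata: dict, retention_years: int) -> Dataset:
    """Curator: complete the metadata, attach a retention policy, then ingest."""
    if ds.stage is not Stage.SUBMITTED:
        raise ValueError("only submitted datasets can be ingested")
    ds.metadata.update(extra_metadata)
    ds.policies["retention_years"] = retention_years
    ds.stage = Stage.INGESTED
    return ds


def user_search(catalog: list, keyword: str) -> list:
    """User: keyword search over metadata; only ingested datasets are visible."""
    kw = keyword.lower()
    return [
        ds for ds in catalog
        if ds.stage is Stage.INGESTED
        and any(kw in str(value).lower() for value in ds.metadata.values())
    ]


# Example run with made-up datasets.
raw = scientist_submit("file:///lab/ctd_casts.csv", {"title": "CTD casts"})
draft = scientist_submit("file:///lab/ctd_draft.csv", {"title": "CTD draft"})
curator_ingest(raw, {"subject": "biological oceanography"}, retention_years=10)
hits = user_search([raw, draft], "ctd")  # only `raw`; `draft` still awaits curation
```

Keeping uncurated deposits invisible to search is one possible design choice; a real service might instead expose them with a "not yet curated" flag.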
![Page 3](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/3.jpg)
PLATFORM ARCHITECTURE
![Page 4](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/4.jpg)
Platform Architecture
Diagram panels: Version 0.1 and Version 1.0
![Page 5](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/5.jpg)
![Page 6](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/6.jpg)
Federated Architecture
![Page 7](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/7.jpg)
Multiple Implementations
![Page 8](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/8.jpg)
Federated Model
• Data can be widely distributed; Web-based services can be centralized or federated
– e.g. a centralized, domain-specific search service that harvests metadata from relevant archives (“Google for biological oceanography”)
– e.g. real-time data integration across small sets of archives identified via subject search
• DataSpace will develop some services, but more importantly will create an ecosystem that others can contribute to (e.g. technology and scientific companies, universities, researchers, labs)
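The two federation examples differ mainly in when the archives are contacted. A minimal sketch, assuming toy in-memory archives with made-up names (`archive_a`, `archive_b`) and a single `subject` field; a real harvester would pull metadata over a protocol such as OAI-PMH rather than read Python dicts, and nothing here is DataSpace code.

```python
from collections import defaultdict

# Toy archives (hypothetical): record id -> metadata.
ARCHIVES = {
    "archive_a": {
        "a1": {"title": "Plankton counts", "subject": "biological oceanography"},
        "a2": {"title": "Soil cores", "subject": "geology"},
    },
    "archive_b": {
        "b1": {"title": "Zooplankton biomass", "subject": "biological oceanography"},
    },
}


def harvest(archives: dict) -> dict:
    """Centralized model: copy every archive's metadata into one subject index."""
    index = defaultdict(list)
    for archive_name, records in archives.items():
        for record_id, meta in records.items():
            index[meta["subject"]].append((archive_name, record_id))
    return dict(index)


def centralized_search(index: dict, subject: str) -> list:
    """Answer from the harvested index alone; the archives are not contacted."""
    return sorted(index.get(subject, []))


def realtime_integrate(archives: dict, index: dict, subject: str) -> list:
    """Use the index only to pick the relevant archives, then read their
    current records at request time and merge them into one result list."""
    relevant = sorted({name for name, _ in index.get(subject, [])})
    merged = []
    for name in relevant:
        for record_id, meta in archives[name].items():
            if meta["subject"] == subject:
                merged.append({"archive": name, "id": record_id, **meta})
    return merged
```

The trade-off the sketch makes visible: the harvested index answers without touching any archive but reflects contents only as of the last harvest, while the real-time path sees new records in the archives it selects, at the cost of contacting them on every request.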
![Page 9](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/9.jpg)
Development Methodology
• Behavior-Driven Development model
• Continuous Integration process
– iterative research prototyping and production implementation phases
• Small centralized development team to start
• Institutional partners add developers in years 1-2
• Transparent, open source process
• Close collaboration with Data Conservancy
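Behavior-Driven Development states requirements as Given / When / Then scenarios that double as automated tests, which a Continuous Integration server can run on every change. A minimal sketch using only Python's standard `unittest` module; the `Repository` class and the "held for curation until approved" behavior are hypothetical stand-ins chosen to echo the deposit workflow, not DataSpace code.

```python
import unittest


class Repository:
    """Hypothetical minimal repository, used only to illustrate a BDD-style spec."""

    def __init__(self):
        self.pending = []
        self.published = []

    def deposit(self, title: str, metadata: dict) -> None:
        # New deposits wait for a curator instead of becoming public immediately.
        self.pending.append({"title": title, **metadata})

    def approve(self, title: str) -> None:
        # Move the named item from the curation queue to the public collection.
        item = next(i for i in self.pending if i["title"] == title)
        self.pending.remove(item)
        self.published.append(item)


class DepositBehavior(unittest.TestCase):
    """Scenario: a scientist deposits a dataset with preliminary metadata."""

    def test_deposit_is_held_for_curation_until_approved(self):
        # Given an empty repository
        repo = Repository()
        # When a scientist deposits a dataset with preliminary metadata
        repo.deposit("CTD casts", {"creator": "Lab X"})
        # Then it is pending curation and not yet public
        self.assertEqual(len(repo.pending), 1)
        self.assertEqual(repo.published, [])
        # When a curator approves it
        repo.approve("CTD casts")
        # Then it becomes public and leaves the curation queue
        self.assertEqual([i["title"] for i in repo.published], ["CTD casts"])
        self.assertEqual(repo.pending, [])
```

Run it with `python -m unittest <file>`. Dedicated BDD tools let the same scenario live in plain-language feature files; plain `unittest` is used here only to keep the sketch dependency-free.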
![Page 10](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/10.jpg)
OPERATIONS
![Page 11](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/11.jpg)
Local Operations – MIT Example
• Scientists
– data production, early-stage curation
– lots of domain expertise, little or no curation expertise
• Libraries
– outreach and recruitment (e.g. HMI study)
– later-stage data curation, ingest
– some domain expertise, lots of curation expertise
• IS&T
– identifying and operating hardware and systems
– enterprise systems management expertise
– lots of IT expertise, some curation expertise
![Page 12](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/12.jpg)
Project-Wide Operations
• Platform governance
– distributed open source software model
– transparent decision-making process
• Service model(s) for each institutional partner
– including all data curation activities
– including cyberinfrastructure (CI) templates (e.g. hardware, cloud)
– an associated cost model for each service model
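A cost model attached to each service model could, in its simplest form, be replicated storage plus curation labor. The sketch below assumes exactly that; the `ServiceModel` fields, the replica counts and every number in the example are placeholders for illustration, not DataSpace or MIT figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceModel:
    """Hypothetical service model; all rates are placeholders, not real prices."""
    name: str
    storage_tb: float                # data volume under management
    storage_cost_per_tb_year: float  # depends on the CI template (hardware vs. cloud)
    replicas: int                    # copies kept for preservation
    curation_hours_per_year: float   # curator effort for this partner's data
    curator_hourly_rate: float

    def annual_cost(self) -> float:
        """Yearly cost = replicated storage + curation labor."""
        storage = self.storage_tb * self.replicas * self.storage_cost_per_tb_year
        curation = self.curation_hours_per_year * self.curator_hourly_rate
        return storage + curation


# Two made-up service models for one partner, differing only in CI template.
local = ServiceModel("local hardware", 50, 400.0, 2, 200, 60.0)
cloud = ServiceModel("cloud", 50, 250.0, 3, 200, 60.0)
```

With these placeholder inputs, `local.annual_cost()` is 50 × 2 × 400 + 200 × 60 = 52000.0 and `cloud.annual_cost()` is 50 × 3 × 250 + 200 × 60 = 49500.0; the point is only that each service model carries an explicit formula that can be compared across partners.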
![Page 13](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/13.jpg)
Project-Wide Operations
• Ongoing usability studies with researchers, students, and public audiences
• Develop a certification strategy for trustworthy digital repositories (TDRs) using DataSpace (.arc domain)
![Page 14](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/14.jpg)
Data Curation Lifecycle Highlights
• Deposit workflows for researchers based on locally produced data (interactive and batch)
• Data Curators
– outreach, marketing, data recruitment
– metadata creation and data ontology application
– curatorial policies developed and applied
– tailored preservation strategies (local, consortial, outsourced)
Direct access to data creators and boots-on-the-ground support services
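Batch and interactive deposit can share one validation step for the preliminary metadata a researcher supplies. A minimal sketch, assuming a CSV manifest for batch deposits and a hypothetical set of required fields (`filename`, `title`, `creator`); in practice the required fields would follow from the curatorial policies on this slide.

```python
import csv
import io

# Hypothetical minimum metadata required at deposit time; curators add the rest later.
REQUIRED_FIELDS = ("filename", "title", "creator")


def missing_fields(record: dict) -> list:
    """Names of required fields that are absent or blank in one record."""
    return [f for f in REQUIRED_FIELDS if not (record.get(f) or "").strip()]


def batch_deposit(manifest_csv: str):
    """Batch path: validate every row of a CSV manifest describing local files.

    Returns (accepted, rejected). Accepted rows would be queued for curation;
    rejected rows are reported with their missing fields so they can be fixed.
    """
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(manifest_csv)):
        missing = missing_fields(row)
        if missing:
            rejected.append((row.get("filename") or "", missing))
        else:
            accepted.append(row)
    return accepted, rejected


def interactive_deposit(**fields) -> dict:
    """Interactive path: a single record, same validation, errors raised at once."""
    missing = missing_fields(fields)
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    return fields
```

Reporting all rejected rows at once, rather than stopping at the first, suits batch uploads where a researcher fixes the whole manifest in one pass.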
![Page 15](https://reader035.vdocuments.us/reader035/viewer/2022070605/5a4d1af67f8b9ab059981a38/html5/thumbnails/15.jpg)
Data Curation Lifecycle Highlights
• Novel distributed, standards-based policy management strategy based on emerging Semantic Web standards and TRAC
• Semantic Web standards (e.g. RDF) to support improved data integration and interoperability
• Separation of the access layer (discovery, use) from the curation layer, to support broad federation and distributed tool development
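Expressing curatorial policies as RDF-style (subject, predicate, object) statements lets the same machinery answer both "what is this dataset?" and "which rules govern it?". A minimal sketch with plain Python tuples standing in for RDF; the `ex:` vocabulary, the retention rule and the dates are hypothetical, and a real deployment would use proper URIs, an RDF store and a query language such as SPARQL.

```python
from datetime import date

# Metadata and policies as (subject, predicate, object) triples, the shape RDF uses.
# The "ex:" vocabulary is hypothetical, not a published ontology.
TRIPLES = {
    ("ex:dataset/42", "ex:depositedOn", date(2005, 6, 1)),
    ("ex:dataset/42", "ex:governedBy", "ex:policy/short"),
    ("ex:dataset/43", "ex:depositedOn", date(2009, 6, 1)),
    ("ex:dataset/43", "ex:governedBy", "ex:policy/long"),
    ("ex:policy/short", "ex:retentionYears", 3),
    ("ex:policy/long", "ex:retentionYears", 10),
}


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(triples, predicate):
    """Distinct subjects that have at least one value for `predicate`, sorted."""
    return sorted({s for s, p, _ in triples if p == predicate})


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def due_for_review(triples, today: date) -> list:
    """Datasets whose governing retention period has elapsed as of `today`."""
    due = []
    for ds in subjects(triples, "ex:governedBy"):
        deposited = objects(triples, ds, "ex:depositedOn")[0]
        policy = objects(triples, ds, "ex:governedBy")[0]
        years = objects(triples, policy, "ex:retentionYears")[0]
        if add_years(deposited, years) <= today:
            due.append(ds)
    return due
```

For example, `due_for_review(TRIPLES, date(2010, 2, 8))` returns `["ex:dataset/42"]`: that dataset's 3-year policy ran out on 2008-06-01, while dataset 43's 10-year policy runs to 2019-06-01. Because the policy is data rather than code, a curator can change `ex:retentionYears` without touching the access layer that serves discovery.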