CNI Fall 2011 Meeting Presentation: Margaret Hedstrom & Robert McDonald (Dec. 2011)
SEAD: Sustainable Environment – Actionable Data
Margaret Hedstrom, SEAD PI/Project Director, Professor & Associate Dean, UM School of Information
Robert H. McDonald, SEAD Sr. Personnel, Assoc. Dean/Associate Director, Indiana University
CNI Fall Members Meeting Arlington, VA
12/12/2011
NSF DataNet Program
• new types of organizations that integrate library & archival sciences, cyberinfrastructure, computer & information sciences, & domain science expertise
• provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline;
• continuously anticipate and adapt to changes in technologies and in user needs and expectations;
• engage in research to drive the leading edge forward
• serve as component elements of an interoperable data preservation and access network
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503141
Partners
• SEAD’s Unique Contributions
– Address domain-driven needs & requirements
– Serve scientists and researchers in the “long tail”
– Integrate existing technologies, tools & services (rather than build new from scratch)
Sustainability Science
Science
Technology
Economics
Poverty & Justice
Policy
Cooperation
Data challenges
• Heterogeneity of all kinds
• Multiple scales
• Multidisciplinary
• Many small datasets
The long tail of scientific research
• Small and derived data sets
• Heterogeneous data
• Multiple sources of data
• Short-lived data with long-term value
• Value of data grows when combined & integrated
SEAD’s Goals
• Provide data services that address the needs of researchers working toward sustainability
• Integrate these services into a generalizable “Active and Social Curation” infrastructure suited to the social structure and economics of long-tail research communities
• Develop capabilities to package and migrate the most valuable datasets to a federated repository infrastructure for long-term preservation
• Education, outreach, & training to disseminate SEAD’s contributions to other projects & communities
SEAD’s Strategy
• Leverage social media for discovery of data, interest, and expertise
• Move data curation upstream in the data life cycle
• Involve domain scientists in setting priorities for evolution of data and services
• Take advantage of existing infrastructures (Institutional Repositories, ICPSR) for long-term preservation
Active and Social Curation
• Engage researchers during projects, not at the end
• Automatically capture metadata as defined by the data producers
• Provide facilities for commentary, recommendations, and mark-up of data
• Further reduce costs by re-engineering curation processes to leverage this rich metadata and volunteered effort
Active Curation Model
Active Curation
Social Media
Data
Metadata
Workflows
Review
Rating
Commenting
SEAD Status
Phase 1, Months 1-18: Develop Prototype
Phase 2, Years 3-5: Grow SEAD users, data, and functionality
SEAD start date: 10/1/2011
In other words, SEAD is not ready to accept your data!
SEAD Personnel
• Margaret Hedstrom, PI (Michigan)
• Praveen Kumar, co-PI (Illinois)
• Jim Myers, co-PI (RPI)
• Beth Plale, co-PI (Indiana)
• Ann Zimmerman, co-PI/Project Manager (Michigan)
• George Alter (ICPSR)
• Bryan Beecher (ICPSR)
• Katy Börner (Indiana)
• Robert McDonald (Indiana)
• Jude Yew, Post-doc (Michigan)
• + many more to come
http://sead-data.net
SEAD TEAM
• University of Michigan: Margaret Hedstrom (UM PI), Ann Zimmerman (Co-PI and Project Manager), George Alter, Bryan Beecher, Charles Severance, Karen Woollams, Jude Yew
• Indiana University: Beth Plale (IU PI), Katy Börner, Robert H. McDonald, Kavitha Chandrasekar, Robert Ping, Stacy Kowalczyk, Robert Light
• University of Illinois: Praveen Kumar (UIUC PI), Rob Kooper, Luigi Marini, Terry McLaren
• Rensselaer Polytechnic Institute: Jim Myers (RPI PI), Ram Prasanna Govind Krishnan, Lindsay Todd, Adam Wilson
SEAD Cyberinfrastructure
• An international resource for sustainability science
• Novel technical and business approaches to supporting the long-tail of research data
• Lifecycle support: actionable data services integrated with curation and preservation infrastructure
Key Challenges for SEAD Cyberinfrastructure
• Managed data storage and services are expensive!
• Begging for metadata doesn’t work!
• Curation and preservation are time consuming!
• The long-tail is not standardized!
• Data collections are always missing something valuable!
• Data models evolve!
• Cyberinfrastructure is obsolete by the time you build it!
• Building community as you leverage cyberinfrastructure
SEAD: Social Networking
• Co-authorship
• Co-funding
• Micro-citation
• Shared project repositories
• Shared tags
• Threaded discussions
• Quoting, forwarding, …
Linked Data and Repositories
• Tag and annotate data
• Overlay it with reference data
• Organize it in domain terminology
• Link it to people, papers, projects, conversations…
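The tagging and linking ideas on this slide can be illustrated with a very small data structure. The sketch below is not SEAD's design: it assumes links are stored as (subject, predicate, object) statements in memory, and the class name `LinkStore`, the predicate names, and every identifier such as `dataset:example-1` are hypothetical placeholders.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class LinkStore:
    """Tiny in-memory store of (subject, predicate, object) links (hypothetical)."""

    def __init__(self) -> None:
        # Index links in both directions so either end can be looked up.
        self._forward: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._reverse: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one link, for example a tag on a dataset."""
        self._forward[(subject, predicate)].add(obj)
        self._reverse[(predicate, obj)].add(subject)

    def objects(self, subject: str, predicate: str) -> Set[str]:
        """Everything a subject points to through the given predicate."""
        return set(self._forward.get((subject, predicate), set()))

    def subjects(self, predicate: str, obj: str) -> Set[str]:
        """Everything that points to an object through the given predicate."""
        return set(self._reverse.get((predicate, obj), set()))


# Illustrative data only; none of these identifiers come from SEAD.
store = LinkStore()
store.add("dataset:example-1", "taggedWith", "hydrology")
store.add("dataset:example-2", "taggedWith", "hydrology")
store.add("dataset:example-1", "createdBy", "person:example-a")
store.add("dataset:example-1", "citedIn", "paper:example-x")
```

With links indexed in both directions, a question such as "which datasets share the tag hydrology" becomes a single lookup: `store.subjects("taggedWith", "hydrology")`.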
Using Science of Science to Link Repositories
KEY SEAD Questions
•What could SEAD capture when?
•How can SEAD provide direct value to data producers, users, and curators?
•How can robust web-services and social computing lower barriers and reduce/realign costs?
SEAD: Active Content Repository
• With the ‘Big Picture’ graph in hand, curators can:
▫ Focus on what to curate and when
▫ Automate parts of the process
▫ Use existing/emerging technologies for packaging and preserving datasets
▫ Better manage federated repositories
SEAD: Leveraging Existing Resources
• Cyberinfrastructure
▫ IU Data Capacitor/HPC Capabilities
▫ UIUC/NCSA HPC Capabilities
▫ Rensselaer CCNI Capabilities
• Repositories
▫ UM Deep Blue
▫ IU ScholarWorks
▫ ICPSR Repository
▫ UIUC IDEALS
SEAD LayerCake View
•Services over an active content layer that is backed by/harvested into a federated archive infrastructure based on institutional resources
Institutional Repositories
Network of Data Producers
Web User Interface
Active Content Repository
Services Provided
Virtual Archives
User Network
Data Conservancy
IU ICPSR
Content Mining
Curation Decisions
Archival data generation
Other services
RPI UIUC UM
CI Technical Approach
Appraisal and Selection
Digital Repository Federation (OAIS compliant)
Scholarly Communication
Preservation Actions
Compound Objects - OAI-ORE
Dissemination Packages
Ingest, AIPs
Data Acquisition, Analysis and Simulation
Search, Browse, Annotation, Visualization Tools
Metadata Management: DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, …
Automated Curation Workflow/Rule Engine: Operates on Metadata, Content Objects and Trigger Events
Access Mechanisms and E-Scholarship Services
Migration and Emulation Tools
Use, Reuse, Repurposing Tools
Wide-Area File System
Ingest scripts: fixity, integrity, authentication, transformation
Active and Social Curation
OAIS Repository Federation
Curation Boundary
User
Contributor
Active Content Repository
VIVO/Linked Data
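The diagram lists fixity among the ingest script tasks. As a minimal sketch of what a fixity check generally involves, and not a description of SEAD's actual ingest scripts, the example below recomputes a file's checksum and compares it with a previously recorded value. The choice of SHA-256 and the function names `sha256_of` and `check_fixity` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path: Path, expected_hex: str) -> bool:
    """True if the file's current digest matches the digest recorded earlier."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A repository would typically record the digest when a file is first ingested and rerun `check_fixity` on a schedule; a mismatch signals that the stored bytes have changed since ingest.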
Toward PetaScale Data
• Internet2 upgrade:
▫ Total bandwidth from 100 Gbps to 8.8 Tbps
▫ Moving a petabyte of data will go from 10 days to 25 hrs
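The transfer times quoted above can be related to link speed with simple arithmetic. The sketch below assumes a decimal petabyte (10^15 bytes) and an ideal transfer with no protocol overhead or contention; both are assumptions, not statements from the slides. Under those assumptions, 10 days and 25 hours correspond to sustained rates of roughly 9 Gbps and 89 Gbps, so the quoted times appear to describe what a single transfer achieves rather than the total backbone capacity.

```python
# Assumption: a decimal petabyte, i.e. 10**15 bytes of 8 bits each.
PETABYTE_BITS = 8 * 10**15


def transfer_seconds(size_bits: float, rate_bps: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead and contention."""
    return size_bits / rate_bps


def implied_rate_gbps(size_bits: float, seconds: float) -> float:
    """Sustained rate in Gbps needed to move size_bits in the given time."""
    return size_bits / seconds / 1e9


if __name__ == "__main__":
    ten_days = 10 * 24 * 3600
    twenty_five_hours = 25 * 3600
    print(f"10 days implies {implied_rate_gbps(PETABYTE_BITS, ten_days):.2f} Gbps")
    print(f"25 hrs implies {implied_rate_gbps(PETABYTE_BITS, twenty_five_hours):.2f} Gbps")
    hours_at_100g = transfer_seconds(PETABYTE_BITS, 100e9) / 3600
    print(f"1 PB at a full 100 Gbps takes {hours_at_100g:.1f} hours")
```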
SEAD 18 Month Prototype Targets for Cyberinfrastructure
• Active and Social Content Curation
▫ Pilot Active Content Repository, VIVO deployments
▫ Exemplar services for Data Ingest, Discovery, Re-use, Curation
• CI for Long-term Access
▫ Data model, protocol design/development
▫ Pilot Federated Repository infrastructure
SEAD CI QuickView
• SEAD will quickly build a repository and data services infrastructure for sustainability research that can be responsively adapted based on community feedback – Community Agile Development
• SEAD will leverage existing tools and emerging practices to dramatically enhance the interactions of researchers and data librarians – Active Curation
• SEAD’s focus on the long-tail will force an emphasis on ease-of-use and low costs that is critical for long-term sustainability – Leverage Existing Institution Resources for Long-term Access
• SEAD will leverage experiences in the sustainability research community to provide guidance for other long-tail communities making the transition to an interdisciplinary, systems-oriented approach to research – Sustainability and Resource Growth Partnership and Collaboration
Acknowledgments
SEAD is funded by the National Science Foundation under cooperative agreement #OCI0940824
• For more on SEAD go to: http://sead-data.net
• Follow us on Twitter @SEADdatanet