CNI Fall 2011 Meeting Presentation: Margaret Hedstrom & Robert McDonald (Dec. 2011)

Page 1: CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. 2011)

SEAD: Sustainable Environment – Actionable Data

Margaret Hedstrom
SEAD PI/Project Director
Professor & Associate Dean, UM School of Information

Robert H. McDonald
SEAD Sr. Personnel
Assoc. Dean/Associate Director, Indiana University

CNI Fall Members Meeting, Arlington, VA

12/12/2011

Page 2

NSF DataNet Program

• New types of organizations that integrate library & archival sciences, cyberinfrastructure, computer & information sciences, & domain science expertise

• Provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline

• Continuously anticipate and adapt to changes in technologies and in user needs and expectations

• Engage in research to drive the leading edge forward

• Serve as component elements of an interoperable data preservation and access network

http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503141

Page 3

Partners

• SEAD’s Unique Contributions

– Address domain-driven needs & requirements

– Serve scientists and researchers in the “long tail”

– Integrate existing technologies, tools & services (rather than build new from scratch)

Page 4

Sustainability Science

[Diagram labels: Science, Technology, Economics, Poverty & Justice, Policy, Cooperation]

Page 5

Data challenges

• Heterogeneity of all kinds

• Multiple scales

• Multidisciplinary

• Many small datasets

Page 6

The long tail of scientific research

• Small and derived data sets

• Heterogeneous data

• Multiple sources of data

• Short-lived data with long-term value

• Value of data grows when combined & integrated

Page 7

SEAD’s Goals

• Provide data services that address the needs of researchers working toward sustainability

• Integrate these services into a generalizable “Active and Social Curation” infrastructure suited to the social structure and economics of long-tail research communities

• Develop capabilities to package and migrate the most valuable datasets to a federated repository infrastructure for long-term preservation

• Provide education, outreach, & training to disseminate SEAD’s contributions to other projects & communities

Page 8

SEAD’s Strategy

• Leverage social media for discovery of data, interest, and expertise

• Move data curation upstream in the data life cycle

• Involve domain scientists in setting priorities for evolution of data and services

• Take advantage of existing infrastructures (Institutional Repositories, ICPSR) for long-term preservation

Page 9

Active and Social Curation

• Engage researchers during projects, not at the end

• Automatically capture metadata as defined by the data producers

• Provide facilities for commentary, recommendations, and mark-up of data

• Further reduce costs by re-engineering curation processes to leverage this rich metadata and volunteered effort

Page 10

Active Curation Model

[Diagram labels: Active Curation; Social Media; Data; Metadata; Workflows; Review, Rating, Commenting]

Page 11

SEAD Status

SEAD start date: 10/1/2011

Phase 1 (Months 1-18): Develop prototype

Phase 2 (Years 3-5): Grow SEAD users, data, and functionality

In other words, SEAD is not ready to accept your data!

Page 12

SEAD Personnel

• Margaret Hedstrom, PI (Michigan)

• Praveen Kumar, co-PI (Illinois)

• Jim Myers, co-PI (RPI)

• Beth Plale, co-PI (Indiana)

• Ann Zimmerman, co-PI/Project Manager (Michigan)

• George Alter (ICPSR)

• Bryan Beecher (ICPSR)

• Katy Börner (Indiana)

• Robert McDonald (Indiana)

• Jude Yew, Post-doc (Michigan)

• + many more to come

Page 13

http://sead-data.net

Page 14

SEAD Team

• University of Michigan: Margaret Hedstrom (UM PI), Ann Zimmerman (Co-PI and Project Manager), George Alter, Bryan Beecher, Charles Severance, Karen Woollams, Jude Yew

• Indiana University: Beth Plale (IU PI), Katy Börner, Robert H. McDonald, Kavitha Chandrasekar, Robert Ping, Stacy Kowalczyk, Robert Light

• University of Illinois: Praveen Kumar (UIUC PI), Rob Kooper, Luigi Marini, Terry McLaren

• Rensselaer Polytechnic Institute: Jim Myers (RPI PI), Ram Prasanna Govind Krishnan, Lindsay Todd, Adam Wilson

Page 15

SEAD Cyberinfrastructure

• An international resource for sustainability science

• Novel technical and business approaches to supporting the long tail of research data

• Lifecycle support: actionable data services integrated with curation and preservation infrastructure

Page 16

Key Challenges for SEAD Cyberinfrastructure

• Managed data storage and services are expensive!

• Begging for metadata doesn’t work!

• Curation and preservation are time consuming!

• The long tail is not standardized!

• Data collections are always missing something valuable!

• Data models evolve!

• Cyberinfrastructure is obsolete by the time you build it!

• Building community as you leverage cyberinfrastructure

Page 17

SEAD: Social Networking

• Co-authorship

• Co-funding

• Micro-citation

• Shared project repositories

• Shared tags

• Threaded discussions

• Quoting, forwarding, …

Page 18

Linked Data and Repositories

• Tag and annotate data

• Overlay it with reference data

• Organize it in domain terminology

• Link it to people, papers, projects, conversations…

Page 19

Using Science of Science to Link Repositories

Page 20

Key SEAD Questions

• What could SEAD capture when?

• How can SEAD provide direct value to data producers, users, and curators?

• How can robust web services and social computing lower barriers and reduce/realign costs?

Page 21

SEAD: Active Content Repository

• With the ‘Big Picture’ graph in hand, curators can:

▫ Focus on what to curate and when

▫ Automate parts of the process

▫ Use existing/emerging technologies for packaging and preserving datasets

▫ Better manage federated repositories

Page 22

SEAD: Leveraging Existing Resources

• Cyberinfrastructure

▫ IU Data Capacitor/HPC Capabilities

▫ UIUC/NCSA HPC Capabilities

▫ Rensselaer CCNI Capabilities

• Repositories

▫ UM Deep Blue

▫ IU ScholarWorks

▫ ICPSR Repository

▫ UIUC IDEALS

Page 23

SEAD LayerCake View

• Services over an active content layer that is backed by/harvested into a federated archive infrastructure based on institutional resources

[Diagram labels: Network of Data Producers; User Network; Web User Interface; Services Provided; Content Mining; Curation Decisions; Archival data generation; Other services; Active Content Repository; Virtual Archives; Institutional Repositories; IU, ICPSR, RPI, UIUC, UM; Data Conservancy]

Page 24

CI Technical Approach

[Diagram labels]

• Active and Social Curation | Curation Boundary | OAIS Repository Federation

• User, Contributor

• Data Acquisition, Analysis and Simulation

• Search, Browse, Annotation, Visualization Tools

• Use, Reuse, Repurposing Tools

• Active Content Repository

• VIVO/Linked Data

• Wide-Area File System

• Ingest scripts: fixity, integrity, authentication, transformation

• Metadata Management: DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, …

• Automated Curation Workflow/Rule Engine: operates on Metadata, Content Objects and Trigger Events

• Appraisal and Selection

• Digital Repository Federation (OAIS compliant)

• Ingest, AIPs

• Compound Objects - OAI-ORE

• Preservation Actions

• Migration and Emulation Tools

• Dissemination Packages

• Access Mechanisms and E-Scholarship Services

• Scholarly Communication
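
The diagram names fixity among the ingest-script checks but does not say how it would be done. A minimal sketch of one common approach follows: record a checksum at ingest and recompute it later to detect change. The choice of SHA-256 and the function names (`sha256_of`, `check_fixity`, `fixity_demo`) are assumptions for illustration, not anything specified by SEAD.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, expected_hex):
    """True if the file's current digest matches the digest recorded at ingest."""
    return sha256_of(path) == expected_hex.lower()


def fixity_demo():
    """Record a digest for a hypothetical file, then alter it and re-check."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "sample.dat")  # hypothetical dataset file
        with open(path, "wb") as fh:
            fh.write(b"stream gauge readings, 2011")
        recorded = sha256_of(path)             # stored with the metadata at ingest
        ok_before = check_fixity(path, recorded)
        with open(path, "ab") as fh:
            fh.write(b"!")                     # simulate silent corruption
        ok_after = check_fixity(path, recorded)
        return ok_before, ok_after
```

Calling `fixity_demo()` checks the file once unchanged and once after a byte is appended; the second check should fail, which is the signal an ingest or audit script would act on.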

Page 25

Toward PetaScale Data

• Internet2 upgrade:

▫ Total bandwidth from 100 Gbps to 8.8 Tbps

▫ Moving a petabyte of data will go from 10 days to 25 hrs
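
The transfer-time figures can be sanity-checked with simple arithmetic. The sketch below computes the ideal time to move one petabyte at a given sustained rate, ignoring protocol overhead and contention. The rates in `RATES` are assumptions chosen for illustration; the slide does not state which per-transfer rates its "10 days" and "25 hrs" figures assume. Under these assumptions, a sustained 10 Gbps gives a little over 9 days and 100 Gbps gives about 22 hours, which is in the same range as the slide's numbers, while the aggregate 8.8 Tbps figure would correspond to minutes.

```python
PETABYTE_BITS = 8 * 10**15  # 1 PB taken as 10^15 bytes, 8 bits per byte

# Assumed sustained rates in bits per second (illustrative, not from the slide)
RATES = {
    "10 Gbps": 10e9,
    "100 Gbps": 100e9,
    "8.8 Tbps": 8.8e12,
}


def transfer_seconds(size_bits, rate_bits_per_s):
    """Ideal transfer time in seconds, with no overhead or contention."""
    return size_bits / rate_bits_per_s


def report():
    """Return (label, days, hours) for moving one petabyte at each assumed rate."""
    rows = []
    for label, rate in RATES.items():
        seconds = transfer_seconds(PETABYTE_BITS, rate)
        rows.append((label, seconds / 86400, seconds / 3600))
    return rows


if __name__ == "__main__":
    for label, days, hours in report():
        print(f"{label}: {days:.2f} days ({hours:.1f} hours)")
```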

Page 26

SEAD 18 Month Prototype Targets for Cyberinfrastructure

• Active and Social Content Curation

▫ Pilot Active Content Repository, VIVO deployments

▫ Exemplar services for Data Ingest, Discovery, Re-use, Curation

• CI for Long-term Access

▫ Data model, protocol design/development

▫ Pilot Federated Repository infrastructure

Page 27

SEAD CI QuickView

• SEAD will quickly build a repository and data services infrastructure for sustainability research that can be responsively adapted based on community feedback – Community Agile Development

• SEAD will leverage existing tools and emerging practices to dramatically enhance the interactions of researchers and data librarians – Active Curation

• SEAD’s focus on the long-tail will force an emphasis on ease-of-use and low costs that is critical for long-term sustainability – Leverage Existing Institution Resources for Long-term Access

• SEAD will leverage experiences in the sustainability research community to provide guidance for other long-tail communities making the transition to an interdisciplinary, systems-oriented approach to research – Sustainability and Resource Growth Partnership and Collaboration

Page 28

Acknowledgments

SEAD is funded by the National Science Foundation under cooperative agreement #OCI0940824

• For more on SEAD go to: http://sead-data.net

• Follow us on Twitter @SEADdatanet