Post on 13-Mar-2018

DataNet: Infrastructure to Connect Data, People, and Science

Mission: Lower barriers to conducting interdisciplinary human-environment interactions research by making data with different formats from different scientific domains easily interoperable.

Data, Tools & Services

Source Data
• Census microdata and aggregate data
• Land use/land cover and climate data
• Other population and environmental data

Data Integration
Methods to integrate diverse data using spatial location and geographic boundaries to link data contents
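As an illustration only, linking data through spatial location can be sketched in Python. This is not the project's implementation: the function names (`zonal_means`, `attach_context`), the field names, and the toy data are all hypothetical. The sketch assumes a raster and a grid of geographic unit IDs with the same shape, averages the raster cells falling in each unit, and attaches the result to individual records by unit ID.

```python
from collections import defaultdict


def zonal_means(values, zones):
    """Average raster cell values within each geographic unit.

    `values` and `zones` are equally shaped grids (lists of rows).
    `zones` holds the ID of the unit covering each cell, or None
    where a cell lies outside every unit.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value is None:
                continue  # skip cells outside all units or with no data
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


def attach_context(records, unit_key, summaries, new_field):
    """Return copies of `records` with their unit's summary attached."""
    return [
        {**record, new_field: summaries.get(record[unit_key])}
        for record in records
    ]


# Hypothetical toy data: a 2x3 temperature raster and matching unit IDs.
temperature = [[10.0, 12.0, 20.0],
               [14.0, None, 22.0]]
unit_ids = [["A", "A", "B"],
            ["A", "B", "B"]]
households = [{"id": 1, "unit": "A"}, {"id": 2, "unit": "B"}]

means = zonal_means(temperature, unit_ids)
linked = attach_context(households, "unit", means, "mean_temp")
```

Here the raster is first summarized to area-level values, and those values then supply environmental context for the microdata records. A production system would work with real raster and boundary formats rather than nested lists.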

Web-based Data Access System
• Explore available data and metadata
• Select variables of interest
• Merge data from different source datasets and formats

Human Networks

Development system testing
Opportunities to explore pre-release versions and provide feedback at conferences, including AGU (contact takugler@umn.edu to participate)

Development Community
Feedback through surveys and beta testing. Sign up at www.terrapop.org

Mission: Support the “long tail” of research through an environment with low barriers to deposit, active and social curation, and links to existing preservation infrastructure for long-term access.

Data, Tools & Services

Social Networking Environment
VIVO instance with researcher profiles, publications and data citations for discovery of expertise, publications, and data with network visualizations

Active Content Repository
Storage for data and metadata undergoing active use with capabilities for deposit, metadata extraction, previewing, tagging and social curation

Virtual Archive
Distributed storage for long-term archiving and dissemination of ‘finished’ data products in institutional repositories and topical archives

Human Networks

Active and Social Data Curation
Tools for incorporating community-generated tags, annotations, assessments, and repurposing notes in metadata and for identification and generation of archival data packages

Science Community Networking
Compiling connections among individual scientists, research teams, publications, source datasets and derived datasets, and tools for traversing the network to discover related people and work

Mission: Enable collaborative research through policy- and standards-based federation of existing data management infrastructure.

Data, Tools & Services

iRODS Data Grids
Sharable collections of remotely located datasets managed by policies that automate administrative tasks, validation, and federation

Workflow Integration
Capture processes applied to data to support documentation, repeatability, sharing, and re-execution

Interoperability Mechanisms
Enable access to community resources using their protocols and register remote data into collaboration environments

Human Networks

Collaboration Environments
Enable groups of researchers to access common datasets, workflows, and relationships between data and workflows

Educational Access to Live Data
Support controlled access to collections of data allowing students to build personal reference collections and perform defined data management and analysis tasks

Mission: Develop an institutional solution for the collection, preservation and re-use of data; encourage collaboration by enabling researchers to find someone else’s data products and assess their potential for re-use and re-combination.

Data, Tools & Services

Data Conservancy Service and Reference UI
• Robust ingest framework
• Query interface
• Archival store abstraction over the Fedora Repository
• HTTP APIs supporting ingest, query, and retrieval of data
• Browser-based user interface

Integrations with External Systems
• Antarctica Dry Valley Glacier Photograph Collection at National Snow and Ice Data Center (NSIDC) – Uses search and access APIs
• arXiv.org Pre-Print Repository – Uses search, access and ingest APIs

Human Networks

DC Instances at JHU and NSIDC
• Technical tools and organizational services for data collection, curation, management, storage, preservation, and sharing
• JHU Data Management Services – Helps researchers develop data management plans and both preserve and share research data
• NSIDC – Facilitates curation of results from knowledge documentation projects in Arctic communities by the Exchange for Local Observations and Knowledge of the Arctic project

Education
Graduate programs, training courses, webinars, and other resources on data curation and management.

Mission: Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.

Data, Tools & Services

Distributed Data Network
• Member Nodes – Existing data collections exposed through DataONE
• Coordinating Nodes – Support indexing and replication services across member nodes
• Common Search and Discovery – ONEMercury finds data in all member nodes from a single entry point

Investigator toolkit
• Data Management Planning Tool – Guides development of DMPs for grant proposals
• Data Citations – ONEMercury search results are tagged for import into common bibliography management tools
• DataUp – Best practice checks and metadata creation to prepare data in Excel for archives

Human Networks

DataONE Users’ Group
Annual meetings and other opportunities for stakeholders to learn about and guide DataONE’s development

Working Groups
Identify, describe, and implement DataONE cyber-infrastructure, governance, and other projects

Education
Training sessions, education models, and graduate courses relating to various aspects of data management for students and citizen scientists

[Figure: diagram with labels Network of Data Producers; Web User Interface; Active Content Repository; Services Provided; Content Mining; Curation Decisions; Archival data generation; Other services; Virtual Archives; Institutional Repositories; Data Conservancy; IU; ICPSR; User Network; RPI; UIUC; UM]

For more information: www.dataone.org
Amber Budden, Director for Community Engagement and Outreach
aebudden@dataone.unm.edu

For more information: www.dataconservancy.org
Shonna Clark, Project Coordinator
shonna.clark@jhu.edu

For more information: http://datafed.org
Mary Whitton, Project Manager
whitton@renci.org

For more information: http://sead-data.net
Marietta Van Buhler, Project Manager
mznblue@umich.edu

For more information: www.terrapop.org
Tracy Kugler, Project Manager
takugler@umn.edu

[Figure: diagram with operation labels get, create, replicate, synchronize, search]

Cross-DataNet Collaboration

The five DataNet projects collaborate through monthly conference calls, in-person PI meetings, and joint projects to build interoperable cyber-infrastructure and to engage with a broad network of researchers in the natural and social sciences.

Interoperable Cyber-Infrastructure

Examples of Joint Projects
• Access to TerraPop extracts in DFC collaboration environments
• Integration of Data Conservancy DCS-Lite and SEAD Active Content Repository tools
• Projects participating in DataONE as member nodes

DataNet Collaboration Areas
• Semantic integration
• Technical best practices for sustainability
• Data discovery, formats, and interoperability from the scientist’s perspective

Human Networks
• Training and education – Joint development and cross-program utilization of data management courses, sessions, and workshops
• Cross-disciplinary data awareness – Introducing scientists to data from other disciplines through cross-program conference activities and other outreach
• Long-term financial sustainability – Identifying and implementing funding and revenue models to support long-term data preservation and access
• Governance – Mechanisms for gathering stakeholder feedback and decision-making

[Figure: data types. Rasters: population and environmental data in grids. Area-level data: environmental and population summaries for spatial units. Microdata: individuals and households with their environmental and social context.]

[Figure: diagram with labels Researchers - Client; Shared Collection; Data Grid; iRODS controlled workflows; Storage]

Minnesota Population Center
