taming an ocean of data at aoos - arctic · clean up or flag invalid/suspect data monitor data...
TRANSCRIPT
![Page 1: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/1.jpg)
Taming an Ocean of Data at AOOSEnd to end data lifecycle management
![Page 2: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/2.jpg)
Background - Axiom
● Cyberinfrastructure technology development:
○ environmental, biological and geo-scientific data
● AK Headquarters, OR and RI satellite offices
● Support federal, university and NGO groups
● Mission driven
![Page 3: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/3.jpg)
Background - Axiom
● Shared cyberinfrastructure approach● Community developed software,
standards and protocols● Scalable compute and storage
infrastructure (HPC)o 5 petabytes storage; 3,000
processing cores
![Page 4: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/4.jpg)
Background - Alaska Ocean Observing System
● Officially established in 2005● Regional member of the Integrated
Ocean Observing System (IOOS)● Network of critical ocean and coastal
observations, data and information products
● “Eye on Alaska’s coasts and oceans”
![Page 5: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/5.jpg)
Goals - Alaska Ocean Observing System
● Increase access to existing coastal and ocean data
● Package information and data in useful ways to meet the needs of stakeholders
● Increase observing and forecasting capacity in all regions of the state, with a priority on the Arctic and Gulf of Alaska
![Page 6: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/6.jpg)
Challenges - Alaska Ocean Observing System● Lots of existing data and research efforts in the region
○ Varying data/metadata quality○ Limited sharing beyond specific research effort○ Often not ready for public consumption
■ Complex data formats■ Lack of context/metadata
○ Hard to find (not discoverable)○ Isolated (not interoperable/comparable/synthesizable)
![Page 7: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/7.jpg)
Biodiversitycount, richness, diversityindices
Platformsmoorings, shorestations
Productsskillassessment,shorelinechange,etc.
Gridsmodels,satellite, radar
GISHabitattypes,bathymetry,fishingzones,etc.
7
Data Types
Gliders
![Page 8: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/8.jpg)
Data Pipeline
Data Portal
![Page 9: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/9.jpg)
Data Cleaning/Upgrades
● Data○ Structural/syntax problems○ Quality Control/Quality Assurance (QA/QC)
■ Clean up or flag invalid/suspect data○ Monitor data stream sources for outages
● Metadata○ Make sure observable properties are clearly defined○ Make sure units are clearly defined○ Make sure spatial and temporal axes are defined○ Use community standards when possible
■ CF conventions/standard names■ ACDD metadata attribute conventions
○ Dataset should be self describing!!■ Even to people outside of the domain
![Page 10: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/10.jpg)
Data Portal - Public Data Exploration and Access
CatalogSearch , metadata, & data download
MapIntegrate & visualize data from many sources
Rapidly assimilate & compare different data streams
Data Views
![Page 11: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/11.jpg)
11
Downloads Using Interoperability Services
ncWMSShapefileCSVTHREDDS netCDFOPenDAPERDDAP
![Page 12: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/12.jpg)
ERDDAP
![Page 13: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/13.jpg)
That’s great for structured, predictable data. What about research programs,
synthesis projects, citizen science, etc.?
![Page 14: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/14.jpg)
ShareAnalyzePreserve
~web-based platform for collaboratively managing science projects through the entire data lifecycle~
![Page 15: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/15.jpg)
● Organize into projects, research campaigns, and organizations● Manage sharing through advanced security permissions● Coordinate data exchange across networks, groups, programs● ISO 19115/19110 standards metadata editor● Execute server side Python and R numeric workflows (Jupyter)
on uploaded data AND any data in Axiom CI stack● Archive pathway to DataONE & Datacite DOI minting
![Page 16: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/16.jpg)
Supportingtheentiredatalifecycle
DATACOLLECTION&QUALITYCONTROLScientistsorIngestion
STORAGEResearchWorkspace
DESCRIPTIONMetadataEditor
ARCHIVE&PRESERVATION
Repositorysubmissionpathway
ACCESS&DISCOVERYDataportals&search
catalogs
REUSE&TRANSFORMATIONJupyterNotebook&data
analyses
![Page 17: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/17.jpg)
Research Workspace - Metadata
● We help build large collections of diverse datasets.
● Unorganized, undocumented data collections benefit no one.
● Metadata tells the story (who, what, when, where, why, and how) of the data.
● Metadata makes data discoverable.
17
![Page 18: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/18.jpg)
● The best metadata is written by the people closest to the data.
● Many researchers aren't familiar with authoring it.
● Metadata can be difficult to write well (existing standards are complex and confusing).
● Researchers are also very busy people.
18
Research Workspace - Metadata
![Page 19: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/19.jpg)
19
![Page 20: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/20.jpg)
Research Workspace - Publish & Archive
Publish Archive
LinksData Portal
![Page 21: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/21.jpg)
Mission:
“Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.” How:Cyberinfrastructure + Community
www.dataone.org
![Page 22: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/22.jpg)
DataONE - Networked Repositories
Member Nodes
Coordinating Nodes
![Page 23: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/23.jpg)
DataONE - Member Nodes
Member Nodes● Defined policies● Persistent IDs● Immutable content● Standardized
metadata● Resource maps (bag-
it, etc)● Implement DataONE
API
![Page 24: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/24.jpg)
DataONE - Coordinating Nodes
Member Nodes (Preservation Repo + DataONE API)
Coordinating Nodes● Sync metadata from MNs
![Page 25: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/25.jpg)
DataONE - Coordinating Nodes
Member Nodes(Preservation Repo + DataONE API)
Coordinating Nodes● Sync metadata from MNs● Control MN to MN replication
![Page 26: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/26.jpg)
DataONE - Federated Search
![Page 27: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/27.jpg)
DataONE - Discoverability
carbon cycling plant biomass
ocean nitrogen avian distribution
![Page 28: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/28.jpg)
Dataset Review
Flag for Archive
Reserve DOI
Update Metadata
Package Dataset
Push to RW MN
CN Index and Sync
![Page 29: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/29.jpg)
● Locate, identify and cite research data
![Page 30: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/30.jpg)
Research Workspace - Analysis/Synthesis
● Challenges○ Data Availability○ Compute Resources○ Barriers to Entry
● Solution: Jupyter Notebooks○ Local availability of large datasets
■ TB+ model/satellite data■ Real time sensor system■ No need to download data to analyze
○ Powerful compute resources■ High bandwidth/throughput■ Powerful CPUs / large RAM■ Hardware optimized for numerical computation
○ No software management burden!
![Page 31: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/31.jpg)
● Create and share documents that contain code, equations, and visualizations
● Reproducible numerical simulations and statistical modeling● Access uploaded data stored in the Workspace or data portal
![Page 32: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/32.jpg)
![Page 33: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/33.jpg)
Research Workspace - Time-series Anomalies
● Calculate climate normals on a 31-year long, multi-terabyte dataset● Then plot temperature anomalies over a region
![Page 34: Taming an Ocean of Data at AOOS - Arctic · Clean up or flag invalid/suspect data Monitor data stream sources for outages ... Solution: Jupyter Notebooks Local availability of large](https://reader034.vdocuments.us/reader034/viewer/2022050121/5f51b390b0893262a7013df7/html5/thumbnails/34.jpg)
Thank you! Questions?