Improving Data Catalogs Kevin O’Brien - University of Washington/JISAO, NOAA/PMEL Roland Schweitzer – Weathertop Consulting Eugene Burger – NOAA/PMEL

Post on 14-Jan-2016

Page 1: UAF/OSMC Presenters: Kevin O’Brien and Eugene Burger Abstract: Kevin O’Brien and Eugene Burger are from NOAA’s Pacific Marine Environmental Laboratory

Improving Data Catalogs

Kevin O’Brien - University of Washington/JISAO, NOAA/PMEL

Roland Schweitzer – Weathertop Consulting

Eugene Burger – NOAA/PMEL

Page 2:

The Unified Access Framework (UAF)

• A Global Earth Observation Integrated Data Environment (GEO-IDE) project

• An attempt to improve scientific data management and access

• Focus on successes

Page 3:

Lots of data already available

Page 4:

Projects: (too many to name)

Data formats: netCDF, GRIB, ASCII

Applications: Matlab, ArcGIS, Ferret, GrADS, Google Earth, IDV, LAS, ERDDAP, …

Users: (too many to name)

netCDF-CF-DAP-THREDDS-WMS

Page 5:

Developing the UAF Catalog Cleaner (a ‘web crawler’)

[Figure: two catalog trees, the UAF ‘RAW’ catalog and the UAF ‘CLEAN’ catalog.]

Labels recoverable from the diagram:

• NOAA; NOAA Affiliated; IOOS National Partners; IOOS Regional Partners

• NMFS, OAR, NWS, NESDIS

• NODC, NGDC, GFDL, PMEL, AOML, OCO, PFEG, NDBC, ESRL, Coastwatch

• NAVO, AOOS, NANOOS, CENCOOS, SCCOOS, PACIOOS, GLOS, NERACOOS, MACOORA, SECOORA, CARICOOS, GCOOS

• NOMADS

Page 6:

Tree Crawl Dataset Crawl Cleaner

CatalogRef and Dataset URLs

Raw catalog XML

Page 7:

Tree Crawl Dataset Crawl Cleaner

url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/OCEAN_GEOSTROPHIC_CURRENTS/CURRENTS.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/GLOBAL_MONTHLY_CARBON_FLUXES/FLUXES.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/GLOBAL_SEASON_CARBON_FLUXES/FLUXES.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/ROMSMETEO/kk1.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/MCI_GULF/kk1.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/MSGSST/SST.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/TERRA_K490_GULF/terrak490.nc"
url="http://cwcgom.aoml.noaa.gov/thredds/dodsC/TERRA_K490_GULF_3D/terrak490.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199910.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199911.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199912.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200001.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200002.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200003.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200004.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200005.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200006.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200007.nc"
url="http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.200008.nc"

CatalogRef and Dataset URLs

Page 8:

Tree Crawl Dataset Crawl Cleaner

Aggregations

CF compliance

Access services

UAF Clean Catalog

Page 9:

UAF Clean Catalog

Page 10:

How to provide feedback to data providers?

• Remember the “Building on Success” theme

• ncISO metadata assessment tool is very successful

Page 11:
Page 12:
Page 13:

How to provide feedback to data providers?

• Remember the “Building on Success” theme

• ncISO metadata assessment tool is very successful

How about a catalog quality assessment tool?

Page 14:
Page 15:
Page 16:

Statistics for the current catalog and all its children

Links to rubric reports for child catalogs

Page 17:

Missing services

Data issues

Page 18:

Page 19:

Data issues

Original Catalog

Page 20:

The catalog cleaner can...

1. Crawl a collection of catalogs and find all of the OPeNDAP endpoints.

2. Examine each endpoint and determine whether it has gridded, CF-compliant netCDF data.
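Step 1 can be sketched in a few lines of Python. This is only an illustration of the idea, not the Catalog Cleaner's actual code: the function names, the `example.org` host and the two tiny catalogs are all made up, and the sketch simply pairs every dataset that has a `urlPath` with every service declared as OPeNDAP, ignoring per-dataset service selection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def scan_catalog(xml_text, catalog_url):
    """Return (OPeNDAP endpoint URLs, child catalog URLs) for one catalog document."""
    root = ET.fromstring(xml_text)
    # Base URLs of every service declared as OPeNDAP (case-insensitive match).
    bases = [
        urljoin(catalog_url, svc.get("base", ""))
        for svc in root.iter(f"{{{THREDDS_NS}}}service")
        if svc.get("serviceType", "").lower() == "opendap"
    ]
    # Simplification: every dataset with a urlPath is paired with every OPeNDAP base.
    endpoints = [
        urljoin(base, ds.get("urlPath"))
        for ds in root.iter(f"{{{THREDDS_NS}}}dataset")
        if ds.get("urlPath")
        for base in bases
    ]
    # catalogRef elements point at child catalogs through xlink:href.
    children = [
        urljoin(catalog_url, ref.get(f"{{{XLINK_NS}}}href"))
        for ref in root.iter(f"{{{THREDDS_NS}}}catalogRef")
        if ref.get(f"{{{XLINK_NS}}}href")
    ]
    return endpoints, children


def crawl(start_url, fetch):
    """Walk the catalog tree breadth-first; fetch(url) must return catalog XML text."""
    seen, queue, found = set(), [start_url], []
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        endpoints, children = scan_catalog(fetch(url), url)
        found.extend(endpoints)
        queue.extend(children)
    return found


def _catalog(body):
    """Build a tiny hypothetical catalog with one OPeNDAP service."""
    return (
        f'<catalog xmlns="{THREDDS_NS}" xmlns:xlink="{XLINK_NS}">'
        '<service name="odap" serviceType="OPeNDAP" base="/thredds/dodsC/"/>'
        f"{body}</catalog>"
    )


# Hypothetical two-catalog tree, held in memory so the sketch needs no network.
ROOT_URL = "http://example.org/thredds/catalog.xml"
PAGES = {
    ROOT_URL: _catalog(
        '<dataset name="SST" urlPath="EXAMPLE/SST.nc"/>'
        '<catalogRef xlink:href="child/catalog.xml" xlink:title="child" name=""/>'
    ),
    "http://example.org/thredds/child/catalog.xml": _catalog(
        '<dataset name="Winds" urlPath="EXAMPLE/WINDS.nc"/>'
    ),
}

endpoints = crawl(ROOT_URL, PAGES.__getitem__)
```

Passing `fetch` in as a parameter keeps the tree walk separate from the HTTP layer; a real crawl would hand in a function that downloads each catalog and deals with timeouts and errors.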

Page 21:

The catalog cleaner can...

1. Report problems:
   a. No grids found that follow CF
   b. Unordered time axis
   c. Data access errors (underlying files missing, misconfigured gateways, etc.)

2. Detect unaggregated time series data

3. Detect missing services
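Two of these checks are easy to illustrate. The sketch below rests on stated assumptions and is not the tool's own logic: it treats dataset URLs whose file names differ only by a YYYYMM or YYYYMMDD stamp as one unaggregated time series, and it takes "ordered" to mean strictly increasing time values. The function names are hypothetical; the URLs are taken from the dataset crawl slide.

```python
import re
from collections import defaultdict

# A run of exactly 8 or 6 digits (YYYYMMDD or YYYYMM) not touching other digits.
DATE_STAMP = re.compile(r"(?<!\d)(\d{8}|\d{6})(?!\d)")


def find_unaggregated_series(urls, min_files=2):
    """Group URLs whose file names differ only by a date stamp (assumed heuristic)."""
    groups = defaultdict(list)
    for url in urls:
        head, sep, filename = url.rpartition("/")
        templated = DATE_STAMP.sub("*", filename)
        if templated != filename:  # the file name carried a date stamp
            groups[head + sep + templated].append(url)
    return {
        pattern: sorted(members)
        for pattern, members in groups.items()
        if len(members) >= min_files
    }


def is_time_axis_ordered(time_values):
    """True if every time value is strictly greater than the one before it."""
    return all(a < b for a, b in zip(time_values, time_values[1:]))


urls = [
    "http://cwcgom.aoml.noaa.gov/thredds/dodsC/MSGSST/SST.nc",
    "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199910.nc",
    "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199911.nc",
    "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR.dailyavgs/subsurface/soill.199912.nc",
]
series = find_unaggregated_series(urls)
```

Here `series` holds a single group keyed by the `soill.*.nc` pattern, while `SST.nc` is left alone because its name has no date stamp. A file-name heuristic like this can only flag candidates; confirming that the files share a grid and form one time axis would need a look at the data itself.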

Page 22:

The catalog cleaner can...

1. Write a new catalog with remote links to the data and with local versions of missing services.
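Before local versions of missing services can be added, something has to decide which services are missing. A minimal sketch of that comparison follows; the list of wanted services is an assumption chosen for illustration (the slides do not say which services the cleaner expects), and the sample catalog is made up.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Assumption for illustration only: the services a "complete" catalog should offer.
WANTED_SERVICES = ("OPeNDAP", "WMS", "ISO")


def missing_services(xml_text, wanted=WANTED_SERVICES):
    """List the wanted service types that the catalog does not declare."""
    root = ET.fromstring(xml_text)
    # iter() also reaches services nested inside a Compound service.
    declared = {
        svc.get("serviceType", "").lower()
        for svc in root.iter(f"{{{THREDDS_NS}}}service")
    }
    return [name for name in wanted if name.lower() not in declared]


# Hypothetical catalog that declares OPeNDAP and WMS but no ISO service.
SAMPLE = f"""<catalog xmlns="{THREDDS_NS}">
  <service name="all" serviceType="Compound" base="">
    <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
    <service name="wms" serviceType="WMS" base="/thredds/wms/"/>
  </service>
</catalog>"""

missing = missing_services(SAMPLE)
```

For this sample, `missing` contains only `"ISO"`. Service types are compared case-insensitively because catalogs in the wild spell them inconsistently (`OPeNDAP`, `OPENDAP`).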

Page 23:

The catalog cleaner can… but shouldn’t…

1. Construct an aggregation to run locally, accessing remote data via OPeNDAP.

Page 24:

Why not...

1. Unacceptably poor data access performance.

2. The cleaner has no access to the data provider's local file system, so it cannot make a catalog that aggregates the files via configuration pointing to that file system.

Page 25:

What to do...

1. Use a modified version of the tool to assess the quality of a local catalog, i.e. CatalogCleaner → CatalogEvaluator.

2. Do the (not difficult) work locally to aggregate files where appropriate and turn on missing services.

Page 26:

Moving Forward…

• Welcome feedback on rubric and Catalog Cleaner tool

• Evolution of tool to an evaluation tool

• UAF master catalog to go beyond gridded files

• Use ERDDAP to include in situ featureTypes

• Building support for visualization of these in LAS

• Continue community outreach to improve catalogs

Page 27:

Thank you!

UAF: geo-ide.noaa.gov

Catalog Cleaner code and documentation: http://ferret.pmel.noaa.gov/LAS/documentation/the-uaf-catalog-cleaner/

ERDDAP: upwell.pfeg.noaa.gov/erddap

THREDDS: www.unidata.ucar.edu/projects/THREDDS

netCDF: www.unidata.ucar.edu/netcdf

OPeNDAP: www.opendap.org

CF: cf-pcmdi.llnl.gov