Collaboration Tools and Techniques for Large Model Data Sets
Rich Signell, USGS, Woods Hole, MA


Post on 18-Dec-2015


Page 1: Collaboration Tools and Techniques for Large Model Data Sets

Rich Signell, USGS
Woods Hole, MA

Page 2: Motivation

Typical model outputs range from 100 MB up to several GB.

Traditional collaboration method: users grab the whole NetCDF file from your web/ftp site, or you e-mail them a few images.

There is a better way…

Page 3: NetCDF

Machine-independent, self-describing, binary format for multidimensional scientific data

Interfaces: Fortran, C, C++, Java, Perl, Matlab, IDL, Python

Free, supported by NSF at Unidata

Page 4: NetCDF header example

netcdf swan_short {
dimensions:
    y = 376 ;
    x = 136 ;
    time = UNLIMITED ; // (82 currently)
variables:
    float depth(y, x) ;
        depth:units = "m" ;
        depth:long_name = "water depth" ;
        depth:_FillValue = -99999.f ;
        depth:coordinates = "lon lat" ;
    short hsig(time, y, x) ;
        hsig:units = "m" ;
        hsig:long_name = "significant wave height" ;
        hsig:_FillValue = 32767s ;
        hsig:add_offset = 14.5f ;
        hsig:scale_factor = 0.00047304f ;
        hsig:coordinates = "lon lat" ;
    double time(time) ;
        time:units = "days since 1968-05-23" ;
        time:long_name = "modified julian day (ROMS-style)" ;
    float lon(y, x) ;
        lon:units = "degrees_east" ;
        lon:long_name = "longitude" ;
    float lat(y, x) ;
        lat:units = "degrees_north" ;
        lat:long_name = "latitude" ;

// global attributes:
        :Conventions = "CF-1.0" ;
        :title = "SWAN driven by 7 km LAMI met model" ;
        :institution = "SACLANT Undersea Research Centre" ;
        :source = "SWAN Wave Model (NRL-SSC OpenMP version 31-Mar-2003)" ;
        :contact = "Rich Signell ([email protected])" ;
}
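The hsig variable in this header is stored as 16-bit shorts together with scale_factor and add_offset attributes. Under the usual NetCDF packing convention a reader recovers the physical value as packed * scale_factor + add_offset. The following is a minimal Python sketch of that step; the three constants are copied from the header, while the helper name unpack is only illustrative and is not part of any NetCDF library.

```python
# Packing attributes copied from the hsig variable in the header above.
SCALE_FACTOR = 0.00047304
ADD_OFFSET = 14.5
FILL_VALUE = 32767  # hsig:_FillValue


def unpack(packed):
    """Convert a stored 16-bit integer to a wave height in metres.

    Returns None for the fill value, which marks missing data.
    (unpack is an illustrative helper, not a library function.)
    """
    if packed == FILL_VALUE:
        return None
    return packed * SCALE_FACTOR + ADD_OFFSET


# Smallest and largest heights these attributes can represent,
# keeping the fill value 32767 (and its mirror -32768) out of the range.
lowest = unpack(-32767)
highest = unpack(32766)
```

With these attributes the representable range works out to roughly -1 m to 30 m, in steps of about half a millimetre.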

Page 5: NetCDF Fortran example

      PROGRAM WRITE_NC
      INCLUDE 'netcdf.inc'
      INTEGER TIMES, LATS, LONS
      PARAMETER (TIMES=3, LATS=5, LONS=10) ! dimension lengths
      INTEGER STATUS, NCID
      INTEGER RHID                         ! variable ID
      INTEGER ILON, ILAT, ITIME            ! loop indices
      DOUBLE PRECISION RHVALS(LONS, LATS, TIMES)
C     ...
C     Open the existing file for writing and look up variable 'rh'
      STATUS = NF_OPEN ('foo.nc', NF_WRITE, NCID)
      STATUS = NF_INQ_VARID (NCID, 'rh', RHID)
C     Fill the array, then write the whole variable in one call
      DO 10 ILON = 1, LONS
         DO 10 ILAT = 1, LATS
            DO 10 ITIME = 1, TIMES
               RHVALS(ILON, ILAT, ITIME) = 0.5
   10 CONTINUE
      STATUS = NF_PUT_VAR_DOUBLE (NCID, RHID, RHVALS)
      END

Page 6: DODS/OpenDAP

http://www.opendap.org

Open Data Access Protocol for delivery of multidimensional scientific data via http

DODS allows efficient slicing of data via the web, just as NetCDF works for local files. (Putting the “Net” in NetCDF!)

DODS serves not just NetCDF, but also Matlab, HDF (also GRIB, BUFR, etc…)
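To make "slicing via the web" concrete: an OpenDAP client asks for part of a variable by appending a constraint expression to the dataset URL, with one [start:stride:stop] index range per dimension (zero-based, inclusive). The sketch below only builds such a URL with the Python standard library. The server address is hypothetical, and dap_slice_url is an illustrative helper rather than part of any DODS client.

```python
def dap_slice_url(dataset_url, variable, ranges, response="ascii"):
    """Build an OpenDAP request URL for a hyperslab of one variable.

    ranges holds one (start, stop) or (start, stride, stop) tuple per
    dimension, using zero-based inclusive indices.
    """
    parts = []
    for r in ranges:
        if len(r) == 2:
            start, stop = r
            parts.append(f"[{start}:{stop}]")
        else:
            start, stride, stop = r
            parts.append(f"[{start}:{stride}:{stop}]")
    hyperslab = "".join(parts)
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


# Hypothetical server path; only the variable name hsig and its
# (time, y, x) dimension order come from the header on Page 4.
url = dap_slice_url(
    "http://example.org/dods/swan_short.nc",
    "hsig",
    [(0, 0), (100, 199), (0, 135)],
)
```

The resulting request covers the first time step and 100 of the 376 rows, so the server sends only that part of the file.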

Page 7: Accessing DODS data

DODS APIs (C++, Java)

Any NetCDF code, relinked with the DODS netCDF library instead:

    ncdump => dncdump
    ncview => dncview
    Your Fortran, C, C++, Python, Perl, Java code…

Page 8: DODS & Matlab

DODS GUI and command line tools

Relinked mexcdf53.dll, which can enable all Matlab tools that read NetCDF! For example, with the NetCDF/Matlab toolbox:

    >> url='http://long_path/myfile.nc'
    >> nc=netcdf(url);
    >> lon=nc{'lon'}(:);

Google on: “sourceforge” “mexcdf”

Page 9: DODS/OpenDAP

Serving DODS data requires almost no effort on the part of the data provider:

1. Download DODS server binaries to the cgi-bin directory on the web server
2. Put your NetCDF files on the web server
3. Go have a coffee to celebrate!

(Note: most people don’t know that getting a DODS server going is this easy!)

Page 10: DODS Success Story

DODS at sea: in a limited-bandwidth situation, grabbed only the 200 KB OBC region instead of the 18 MB NetCDF file.

30 second download instead of 45 minutes!
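The two timings are consistent with a fixed link speed, which a few lines of arithmetic can confirm. This sketch assumes 1 MB = 1000 KB and a constant transfer rate; the variable names are illustrative.

```python
subset_kb = 200            # size of the extracted region, from the slide
full_file_kb = 18 * 1000   # 18 MB full file, assuming 1 MB = 1000 KB
subset_seconds = 30        # reported download time for the subset

# Effective link speed implied by the subset download.
rate_kb_per_s = subset_kb / subset_seconds
rate_kbit_per_s = rate_kb_per_s * 8

# Time the whole file would need at the same speed.
full_minutes = full_file_kb / rate_kb_per_s / 60
```

The implied rate is a little over 50 kbit/s, and the full file at that rate takes 45 minutes, matching the slide.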

Page 11: Need for Conventions

One of the greatest things about NetCDF is that it places few demands on the data provider - they are free to specify whatever attributes they want, or none at all

This is also one of the worst things, making it hard to develop flexible software

Software for ROMS won’t work for POM, NCOM, HOPS, ECOM, etc. (and vice versa)

Page 12: CF Conventions I

Google: “CF” “ucar”

Page 13: CF Conventions II

Page 14: Making ROMS CF-compliant

Store all information about the grid (lon_u, lat_u, angle) in the .his and .avg files (not just the grid file)

Add “coordinates” attributes to curvilinear variables (e.g. zeta:coordinates=“lat_rho lon_rho”)

Add “standard_name=ocean_s_coordinate”

Make sure dimension names match coordinate variable names (ocean_time, sc_r)

Units need to be recognized by UDUNITS
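Three of the items above can be checked mechanically once a file's header is available. The sketch below works on a plain Python dictionary standing in for the header, so it needs no NetCDF library. The dictionary layout, the function name cf_issues and the dimension names eta_rho, xi_rho and s_rho are assumptions made for illustration. It is not a substitute for a real CF checker.

```python
LONLAT_UNITS = {"degrees_east", "degrees_north"}


def cf_issues(variables, horizontal_dims):
    """List CF problems in a NetCDF header described as plain dicts.

    variables:       {name: {"dims": tuple of dimension names,
                             "attrs": dict of attributes}}
    horizontal_dims: names of the curvilinear grid dimensions
    """
    issues = []
    for name, var in variables.items():
        dims, attrs = var["dims"], var["attrs"]
        is_lonlat = attrs.get("units") in LONLAT_UNITS
        on_grid = any(d in horizontal_dims for d in dims)

        # Fields on the curvilinear grid must name their lon/lat variables.
        if on_grid and not is_lonlat:
            coords = attrs.get("coordinates")
            if coords is None:
                issues.append(f"{name}: missing 'coordinates' attribute")
            else:
                for target in coords.split():
                    if target not in variables:
                        issues.append(f"{name}: unknown coordinate {target}")

        # The vertical coordinate variable must share its dimension's name.
        is_s_coord = attrs.get("standard_name") == "ocean_s_coordinate"
        if is_s_coord and dims != (name,):
            issues.append(f"{name}: dimensions {dims} should be ({name},)")
    return issues


# A header before conversion (names assumed, see the note above).
raw = {
    "lon_rho": {"dims": ("eta_rho", "xi_rho"), "attrs": {"units": "degrees_east"}},
    "lat_rho": {"dims": ("eta_rho", "xi_rho"), "attrs": {"units": "degrees_north"}},
    "zeta": {"dims": ("ocean_time", "eta_rho", "xi_rho"), "attrs": {"units": "meter"}},
    "sc_r": {"dims": ("s_rho",), "attrs": {"standard_name": "ocean_s_coordinate"}},
}
problems = cf_issues(raw, {"eta_rho", "xi_rho"})
```

Here problems holds two entries: zeta lacks a coordinates attribute, and sc_r sits on a dimension with a different name.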

Page 15: NCO I

Page 16: NCO II

Page 17: ROMS2CF script

#!/bin/bash
GFILE='../adria02_grid2.nc'
FFILE='adria03_avg.nc'

ncks -F -d ocean_time,1 "$FFILE" "${FFILE}_CF"

# Specify horizontal coordinate variables associated with "RHO fields"
ncatted -O -h -a "coordinates","temp",c,c,"lat_rho lon_rho" "${FFILE}_CF"
ncatted -O -h -a "coordinates","salt",c,c,"lat_rho lon_rho" "${FFILE}_CF"

# Specify horizontal coordinate variables associated with "U fields"
ncatted -O -h -a "coordinates","u",c,c,"lat_u lon_u" "${FFILE}_CF"
ncatted -O -h -a "coordinates","ubar",c,c,"lat_u lon_u" "${FFILE}_CF"

# Merge the ROMS grid file into the CF file so we
# have all the coordinate variables we need
ncks -O -v lon_rho,lat_rho,lon_u,lat_u,lon_v,lat_v,mask_rho,mask_u,mask_v,angle "$GFILE" "$GFILE.tmp"
ncks -A "$GFILE.tmp" "${FFILE}_CF"
rm "$GFILE.tmp"

# Add vertical coordinate info
ncatted -O -h -a "standard_name","sc_r",c,c,"ocean_s_coordinate" "${FFILE}_CF"
ncatted -O -h -a "positive","sc_r",c,c,"up" "${FFILE}_CF"
ncatted -O -h -a "formula_terms","sc_r",c,c,"s: sc_r eta: zeta depth: h a: theta_s b: theta_b depth_c: hc" "${FFILE}_CF"

# Add data from field file to template
ncks -A "$FFILE" "${FFILE}_CF"

# rename the dimension
ncrename -O -h -d s_rho,sc_r "${FFILE}_CF"

CF checker: http://titania.badc.rl.ac.uk/cgi-bin/cf-checker.pl

Google: “CF” “checker”
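The formula_terms attribute set in the script tells CF-aware software which variables feed the ocean_s_coordinate formula, so that it can turn the dimensionless s-levels into depths. The sketch below follows the ocean_s_coordinate definition as given in Appendix D of the CF Conventions, to the best of my reading, and evaluates it for a single grid point. The function name and the numbers in the usage lines are illustrative and do not come from the slides.

```python
import math


def s_coordinate_height(s, eta, depth, a, b, depth_c):
    """Height z in metres (positive up) of one s-level at one grid point.

    Argument names follow the formula_terms keys:
    s (level, -1..0), eta (sea surface height), depth (bottom depth),
    a and b (stretching parameters, a must be non-zero), depth_c.
    """
    stretch = (
        (1.0 - b) * math.sinh(a * s) / math.sinh(a)
        + b * (math.tanh(a * (s + 0.5)) / (2.0 * math.tanh(0.5 * a)) - 0.5)
    )
    return eta * (1.0 + s) + depth_c * s + (depth - depth_c) * stretch


# Illustrative numbers only.
surface = s_coordinate_height(0.0, eta=0.3, depth=100.0, a=5.0, b=0.4, depth_c=10.0)
bottom = s_coordinate_height(-1.0, eta=0.3, depth=100.0, a=5.0, b=0.4, depth_c=10.0)
```

As a sanity check, s = 0 returns the free surface (eta) and s = -1 returns the sea bed (-depth), whatever the stretching parameters are.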

Page 18: Integrated Data Viewer (IDV)

Works on local CF-compliant NetCDF files

Works on THREDDS catalog data

Page 19: Integrated Data Viewer (IDV)


Page 20: IDV

Freeware supported by the Unidata Program Center (new app, version 1.2)

Java, utilizing Java3D and VisAD (VIS5D)

Runs on Windows, Mac, Solaris (VIS5D is the limitation)

Reads NetCDF, DODS, ADDE, GeoTiff, Arc Shapefiles

Slices, dices, animates

Page 21: IDV in Action

Page 22: THREDDS

Page 23: Recommendations

Make your model output CF-compliant!

Distribute your model output via DODS

Make a THREDDS catalog for DODS data

Allow “packing” of data for efficient internet delivery (and disk utilization)

Develop software for CF-compliant data

Page 24: Abstract

Collaboration Tools and Techniques for Large Model Data Sets

Rich Signell
U.S. Geological Survey
Woods Hole, MA USA

New tools and standards are emerging that facilitate web-based collaboration with large data sets such as those produced by the ocean model ROMS. Using OpenDAP (a.k.a. DODS), ROMS NetCDF output files can be placed on a web server and users can extract just the data they need (say, the surface temperature from a particular day) from the file without any extra effort by the modeller. This, for example, allows a collaborator to issue a simple command in Matlab that will load just the model output desired from the remote web site into a local Matlab session, avoiding file format conversion and wasted network bandwidth. By linking with the OpenDAP NetCDF library instead of the standard NetCDF library, any NetCDF application can be turned into an OpenDAP application. This approach was used to rebuild the popular Matlab/NetCDF interface “Mexcdf”, so if you get the OpenDAP-enabled version of this interface from the SourceForge MexCDF site, you can use any Matlab/NetCDF application to access OpenDAP data as well.

If, in addition, the ROMS NetCDF files are modified to follow the CF Conventions, a set of conventions specifically designed for complex model output (including handling of the ROMS s-coordinate), then public domain software such as Unidata’s Integrated Data Viewer (IDV) will recognize the ROMS output files and can be used to interactively browse, analyze and visualize the results in 3D. Multiple web users can visualize and manipulate the data interactively through the collaboration facility built into IDV. The conversion to CF-compliant NetCDF can be achieved easily using the NetCDF operator tools (NCO). The NCO tools can also be used to automatically reduce the ROMS output files by a factor of 2 by converting floats to short integers, which have sufficient dynamic range for most variables. This also doubles the speed at which Internet users can obtain their requested data.
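The factor of 2 comes from storing each 4-byte float as a 2-byte short plus two per-variable attributes. The sketch below shows one way to choose those attributes from a variable's range and to measure the round-trip error. Reserving two of the 65536 codes is an assumption made here, not something the abstract states, and the function names are illustrative. The sketch does not call NCO.

```python
def packing_parameters(vmin, vmax, nbits=16):
    """Return (scale_factor, add_offset) for packing into signed integers.

    Two of the 2**nbits codes are held back (one for the fill value);
    this is an assumed choice, not one taken from the text.
    """
    steps = 2 ** nbits - 2
    scale_factor = (vmax - vmin) / steps
    add_offset = 0.5 * (vmin + vmax)
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Physical value -> stored integer, rounded to the nearest step."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Stored integer -> physical value."""
    return packed * scale_factor + add_offset


# A field spanning -1 m to 30 m (an illustrative range).
scale, offset = packing_parameters(-1.0, 30.0)
stored = pack(2.5, scale, offset)
error = abs(unpack(stored, scale, offset) - 2.5)
```

For this range the sketch gives add_offset = 14.5 and scale_factor ≈ 0.00047304, the same values listed for hsig in the header on Page 4, and the round-trip error never exceeds half a step (about a quarter of a millimetre).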

If the model data provider takes the small additional step of creating a THREDDS catalog (a straightforward XML file) of the CF-compliant ROMS output files, then the model results appear as just another data source to an IDV user. This allows users to browse and create visualizations using model results without knowing that they are using NetCDF.
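To give a feel for how small such a catalog can be, the sketch below writes one with the Python standard library. The element and attribute names (catalog, service, dataset, access, serviceType, base, urlPath) and the namespace follow my reading of the THREDDS InvCatalog 1.0 schema and should be checked against the THREDDS documentation before use. The server base URL and dataset title are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the THREDDS InvCatalog 1.0 schema.
NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"


def build_catalog(name, dods_base, files):
    """Return a minimal THREDDS-style catalog as an XML string.

    files is a list of (title, path relative to dods_base) pairs.
    """
    catalog = ET.Element("catalog", {"xmlns": NS, "name": name})
    # One OpenDAP service that all datasets are served through.
    ET.SubElement(
        catalog,
        "service",
        {"name": "dods", "serviceType": "OpenDAP", "base": dods_base},
    )
    for title, url_path in files:
        dataset = ET.SubElement(catalog, "dataset", {"name": title})
        ET.SubElement(
            dataset, "access", {"serviceName": "dods", "urlPath": url_path}
        )
    return ET.tostring(catalog, encoding="unicode")


# Hypothetical server and title; the file name is the one produced by
# the ROMS2CF script on Page 17.
xml_text = build_catalog(
    "ROMS model output",
    "http://example.org/dods/",
    [("Adriatic averages (CF)", "adria03_avg.nc_CF")],
)
```

A client combines the service base with each urlPath to obtain the OpenDAP address of a dataset.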