opendap in the cloud optimizing the use of storage systems provided by cloud computing environments...
![Page 1: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/1.jpg)
OPeNDAP in the Cloud: Optimizing the Use of Storage Systems Provided by Cloud Computing Environments
OPeNDAP: James Gallagher, Nathan Potter
and
NOAA/NODC: Deirdre Byrne, Jefferson Ogata, John Relph
26 June 2013
![Page 2: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/2.jpg)
Cloud Systems Now*
• Providers: IBM, Microsoft, Amazon, Google, Rackspace, …
• Microsoft: Azure “…handles 100 petabytes of data a day”
• Amazon: “…hundreds of thousands of users”
• Netflix: “…stopped building its own data centers in 2008;” all in Amazon by 2012
• Snapchat: 4000 pictures per second; “…never owned a computer server.” (Google cloud)
*Quentin Hardy, “Google Joins a Heavyweight Competition in Cloud Computing,” NY Times, 3 December 2013
![Page 3: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/3.jpg)
Why use OPeNDAP?
• The OPeNDAP request is smaller and returns just the data the person wants
• In cloud systems, cost is a function of data transferred as well as data stored, so smaller, targeted requests reduce costs
[Figure: OPeNDAP request, 4% download; full dataset, 100% download]
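A minimal sketch of what a targeted request looks like, assuming a DAP2-style constraint expression of the form `var[start:stride:stop]` appended to a `.dods` data URL. The host, file name, variable name and array shape below are hypothetical and chosen only so that the subset works out to the 4% shown on the slide.

```python
from math import prod


def hyperslab(start, stop, stride=1):
    """Format one dimension of a DAP2 array constraint (stop is inclusive)."""
    return f"[{start}:{stride}:{stop}]"


def subset_url(base_url, variable, ranges):
    """Build a data URL asking for part of one variable.

    ranges is a list of (start, stop, stride) tuples, one per dimension.
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{base_url}.dods?{constraint}"


def fraction_requested(shape, ranges):
    """Fraction of the full array that the constraint selects."""
    requested = prod(
        len(range(start, stop + 1, stride)) for start, stop, stride in ranges
    )
    return requested / prod(shape)


# Hypothetical dataset: 100 time steps on a 180 x 360 grid; ask for 4 time steps.
shape = (100, 180, 360)
ranges = [(0, 3, 1), (0, 179, 1), (0, 359, 1)]
print(subset_url("http://example.org/opendap/sst.nc", "sst", ranges))
print(f"{fraction_requested(shape, ranges):.0%} of the full array")
```

Some HTTP clients require the square brackets to be percent-encoded before the URL is sent.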
![Page 4: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/4.jpg)
NOAA Environmental Data Management Conceptual Cloud Architecture*
Potential locations of cloud-enabled OPeNDAP instances
*Adapted from NOAA Environmental Data Management Framework Draft v0.3, Appendix C - Dr. Jeff de La Beaujardière, NOAA Data Management Architect
![Page 5: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/5.jpg)
Constraints
• No vendor lock-in!
• No stovepipes! Flexible storage method
• What will be the client of 2020?
• Hierarchical/human browsable
[Diagram labels: dataset, file, file, file]
![Page 6: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/6.jpg)
Data stores: S3 and Glacier
• S3
  o Spinning disk with a flat file system
  o Designed to make web-scale computing easier
• Glacier
  o Near-line device with access times of 4 hours or more
  o Secure and durable storage
• EC2
  o EC2 was used to run the OPeNDAP data server
  o Linux
![Page 7: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/7.jpg)
Using S3 as a Data Store
Catalog
Data
S3
HTTP GET & HEAD requests
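The slide says S3 was reached with plain HTTP GET and HEAD requests. A minimal sketch of that access pattern using only the Python standard library follows; the bucket URL and object key are hypothetical, and the objects are assumed to be publicly readable so that no request signing is needed.

```python
import urllib.request


def s3_request(bucket_url, key, method="GET"):
    """Build a plain HTTP request for one object in a bucket."""
    if method not in ("GET", "HEAD"):
        raise ValueError("only GET and HEAD are used in this sketch")
    url = f"{bucket_url.rstrip('/')}/{key.lstrip('/')}"
    return urllib.request.Request(url, method=method)


def object_size(bucket_url, key):
    """HEAD request: learn the object's size without downloading it."""
    with urllib.request.urlopen(s3_request(bucket_url, key, "HEAD")) as resp:
        return int(resp.headers["Content-Length"])


def fetch_object(bucket_url, key):
    """GET request: download the whole object (catalog XML or data file)."""
    with urllib.request.urlopen(s3_request(bucket_url, key, "GET")) as resp:
        return resp.read()


# Hypothetical bucket and key; building the request does not touch the network.
req = s3_request("https://example-bucket.s3.amazonaws.com", "catalog.xml", "HEAD")
print(req.get_method(), req.full_url)
```

Sticking to GET and HEAD keeps the server independent of any vendor-specific client library.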
![Page 8: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/8.jpg)
Web requests
S3
Catalog, or data request
XML or data file
![Page 9: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/9.jpg)
OPeNDAP Catalog requests
To enhance performance, data were accessed from S3 only when not already cached.
[Diagram: User → catalog request → OPeNDAP Server (on EC2, with catalog cache and data cache) → catalog access → S3 (XML file); response to user: THREDDS catalog or HTML]
![Page 10: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/10.jpg)
OPeNDAP Data requests
To enhance performance, data were accessed from S3 only when not already cached.
[Diagram: User → data request → OPeNDAP Server (on EC2, with catalog cache and data cache) → data access → S3 (data file); response to user: data slice]
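The caching behavior described on the two slides above (read from S3 only when the item is not already cached) can be sketched as a simple read-through cache. This is an illustration under stated assumptions, not the server's actual code: the class name, the in-memory dictionary and the `fetch` callable are all hypothetical stand-ins.

```python
class ReadThroughCache:
    """Serve items from a local cache and go to remote storage only on a miss."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that reads one key from remote storage
        self._cache = {}         # in-memory stand-in for the catalog or data cache
        self.remote_reads = 0    # counts how often remote storage was contacted

    def get(self, key):
        if key not in self._cache:
            # Cache miss: read from remote storage once and remember the result.
            self._cache[key] = self._fetch(key)
            self.remote_reads += 1
        return self._cache[key]


# Hypothetical remote store, faked with a dictionary lookup.
remote = {"catalog.xml": b"<catalog/>", "sst.nc": b"\x00\x01"}
cache = ReadThroughCache(remote.__getitem__)

cache.get("catalog.xml")   # miss: read from the remote store
cache.get("catalog.xml")   # hit: served locally
print(cache.remote_reads)
```

Because transfers are part of the cloud cost, each cache hit saves both latency and a billable request.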
![Page 11: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/11.jpg)
Observations
• S3FS & Amazon's APIs: vendor lock-in
• XML catalogs were flexible:
• Support both direct web and…
• Subsetting server access
• Likely adaptable to other use-cases
• Easily support hierarchical structure
• Catalogs didn't need to be stored in S3
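To illustrate how an XML catalog can carry a hierarchical structure, here is a small sketch that walks a THREDDS-style catalog and lists the data files it points to. The catalog content, dataset names and paths are invented for the example; only the THREDDS InvCatalog namespace and the `name` and `urlPath` attributes are taken from the THREDDS catalog format.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical catalog: nested <dataset> elements form the hierarchy,
# and leaves carry a urlPath that points at a stored file.
CATALOG_XML = f"""<catalog xmlns="{THREDDS_NS}" name="example">
  <dataset name="sst">
    <dataset name="2013">
      <dataset name="sst_20130625.nc" urlPath="sst/2013/sst_20130625.nc"/>
      <dataset name="sst_20130626.nc" urlPath="sst/2013/sst_20130626.nc"/>
    </dataset>
  </dataset>
</catalog>"""


def list_data_files(catalog_xml):
    """Return the urlPath of every dataset element that has one."""
    root = ET.fromstring(catalog_xml)
    return [
        d.get("urlPath")
        for d in root.iter(f"{{{THREDDS_NS}}}dataset")
        if d.get("urlPath")
    ]


print(list_data_files(CATALOG_XML))
```

The same catalog could be served from S3, from local disk or from a cache, since nothing in it depends on where it is stored.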
![Page 12: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/12.jpg)
Glacier and Asynchronous Responses
• To use Glacier, a web service protocol must support asynchronous access! Glacier is a near-line device, not a spinning disk.
• Support via protocol is not enough: typical use cases cannot be met without caching ‘metadata’
  o To support web interfaces/clients, DAP metadata objects should be cached
  o To support smart clients, may need domain data in cache
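A toy simulation of asynchronous access to a near-line store may help make the first point concrete. It is a sketch under assumptions: the class, the status codes (200 for "here is the data", 202 for "accepted, come back later") and the 4-hour delay are illustrative choices, not the protocol the authors implemented.

```python
class NearLineArchive:
    """Simulated near-line store: the first request starts a retrieval,
    and the data only becomes available after a fixed delay."""

    def __init__(self, objects, delay_s):
        self._objects = objects
        self._delay_s = delay_s
        self._ready_at = {}   # key -> time at which the retrieval completes

    def request(self, key, now):
        """Return (200, data) when ready, else (202, seconds still to wait)."""
        # setdefault starts a retrieval on the first request and reuses it afterwards.
        ready_at = self._ready_at.setdefault(key, now + self._delay_s)
        if now >= ready_at:
            return 200, self._objects[key]
        return 202, ready_at - now


FOUR_HOURS = 4 * 60 * 60
archive = NearLineArchive({"sst.nc": b"data"}, delay_s=FOUR_HOURS)

print(archive.request("sst.nc", now=0))            # accepted, wait 4 hours
print(archive.request("sst.nc", now=2 * 60 * 60))  # still 2 hours to go
print(archive.request("sst.nc", now=FOUR_HOURS))   # data is available
```

A client that cannot handle the "come back later" answer simply cannot use such a store, which is why protocol support is a precondition.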
![Page 13: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/13.jpg)
Glacier Implementation
• Caching
  o Catalog
  o DAP metadata
• Support for programmatic and web clients
  o Web clients are the primary user of the DAP metadata because of their ‘click and browse’ behavior
• XML with an embedded XSL style sheet
  o Single response (XML)
  o Multiple target clients – smart and browser
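One way to serve a single XML response to both browsers and programmatic clients is to attach a style sheet through an `xml-stylesheet` processing instruction. The sketch below takes that route; note that it references the style sheet by URL rather than literally embedding it, and the element name, attribute and style sheet path are hypothetical.

```python
import xml.etree.ElementTree as ET


def with_stylesheet(root, xsl_href):
    """Serialize an element with an xml-stylesheet processing instruction.

    Browsers apply the XSL and render HTML; programmatic clients ignore
    the instruction and parse the same XML directly.
    """
    body = ET.tostring(root, encoding="unicode")
    return (
        '<?xml version="1.0"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n'
        f"{body}"
    )


# Hypothetical response element and style sheet location.
response = ET.Element("Dataset", {"name": "sst.nc"})
document = with_stylesheet(response, "/xsl/dataset.xsl")
print(document)

# A programmatic client parses the same bytes without using the style sheet.
parsed = ET.fromstring(document.encode("utf-8"))
print(parsed.get("name"))
```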
![Page 14: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/14.jpg)
![Page 15: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/15.jpg)
![Page 16: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/16.jpg)
![Page 17: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/17.jpg)
![Page 18: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/18.jpg)
![Page 19: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/19.jpg)
![Page 20: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/20.jpg)
![Page 21: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/21.jpg)
Comparison: S3 and Glacier*
• Glacier provides “secure and durable storage”
• S3 is “designed to make web-scale computing easier”
• These graphs: a tiny part of a complex cost model. They do not include the cost to move data out of the Amazon cloud, EC2 instances, etc.
*http://calculator.s3.amazonaws.com/calc5.html
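As a rough illustration of why transfer volume matters in such a cost model, the sketch below adds a storage term and a transfer-out term. The prices are placeholders picked for easy arithmetic, not Amazon's rates, and the model leaves out everything else the slide mentions (EC2 instances, request fees, retrieval fees).

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Placeholder unit prices; real values come from the provider's calculator."""
    storage_per_gb_month: float
    transfer_out_per_gb: float


def monthly_cost(prices, stored_gb, transferred_gb):
    """Storage charge plus the charge for data moved out of the cloud."""
    return (
        stored_gb * prices.storage_per_gb_month
        + transferred_gb * prices.transfer_out_per_gb
    )


# Hypothetical numbers: 1000 GB stored, users pull either everything or a 4% subset.
prices = Prices(storage_per_gb_month=0.5, transfer_out_per_gb=0.25)
full = monthly_cost(prices, stored_gb=1000, transferred_gb=1000)
subset = monthly_cost(prices, stored_gb=1000, transferred_gb=40)
print(full, subset)
```

With these placeholder prices the storage term is the same in both cases, so the whole difference comes from the transfer term.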
![Page 22: OPeNDAP in the Cloud Optimizing the Use of Storage Systems Provided by Cloud Computing Environments OPeNDAP James Gallagher, Nathan Potter and NOAA/NODC](https://reader035.vdocuments.us/reader035/viewer/2022062308/56649c7d5503460f94931bc6/html5/thumbnails/22.jpg)
Summary
• OPeNDAP server with minimal changes
• Data stored in S3 and Glacier
• Solution widely applicable: Web + Smart clients
• Given the complexity of the cost model, a combination of both S3 and Glacier is likely
• Modeling & monitoring of use required