Geospatial Analysis in the Cloud
Use of Cloud Computing for scalable geospatial data processing and access
Andrew Turner, CTO, [email protected]
Partner: U.S. Federal Geographic Data Committee
What is GeoCommons? A Brief History
Vulnerability Identification
[Map labels: Chicago, Denver, Los Angeles, Atlanta; Route 1, Route 2; Fiber Density; Electric Transmission Line Density]
Baseline connectivity of a fiber network provider in NYC. This provider is a good proxy for the structure of the entire island of Manhattan, since it holds about 80% of the rights-of-way on the island and a large number of egress points off it. The higher a peak on the map, the more frequently that path is used as a possible routing path.
WTC
Holland Tunnel
Columbus Circle
Lastly, a scenario is run in which just 10,000 sq ft of damage is done to the Holland Tunnel, and the impact is calculated. The result is an 8.6% loss of network connectivity, 134 times the impact of the WTC simulation. The image shows the dramatic effect of the loss, as well as the stress placed on the GW Bridge route out of the city.
GeoCommons: Version 1
Find interesting data
Map a relevant area
Visualize to find meaning
Layer, Modify, and Analyze
Collaborate with others
Publish and share results
Visualization
Analysis
Applying Lessons Learned
Modularize
Maker
Finder
Core
RESTful Interfaces
Application Programming Interface
Relational Databases Don’t Scale Well
Datasets as Databases
Applications: Maker, Finder, Core
Formats: KML, Shapefile, CSV (Excel), GeoRSS, Documents
Pipeline: Upload, Parse & Store, Analyze, Download, Visualize
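One way to read "datasets as databases" is that each uploaded file is parsed into its own small, self-contained database instead of into rows of one shared relational schema. The sketch below illustrates the upload and parse-and-store steps under that reading. It assumes SQLite as the per-dataset store and CSV as the input; the function name `csv_to_dataset_db` and the table name `features` are hypothetical and not taken from GeoCommons.

```python
import csv
import io
import sqlite3


def csv_to_dataset_db(csv_text, db_path=":memory:"):
    """Parse one uploaded CSV into its own SQLite database.

    Assumption: every dataset gets a separate database (in memory here),
    holding a single table named "features" with one TEXT column per
    CSV header field.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote column names so headers containing spaces still work.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE features ({})".format(columns))
    # Skip blank or malformed rows whose width does not match the header.
    conn.executemany(
        "INSERT INTO features VALUES ({})".format(placeholders),
        (row for row in reader if len(row) == len(header)),
    )
    conn.commit()
    return conn
```

Once stored this way, a dataset can be queried, analyzed, or exported on its own, and datasets can be spread across machines without a shared schema to coordinate.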
Geospatial Catalog and Server
Delivery Mechanisms
Appliances
• Sun 4150
• RAID Array
Web Scaled Racks
• 3 Appliances
• Network File Storage
• Load Balancer
• Monitoring and Tunnels
• Production & Staging racks
• Racks in office for development
Limits in Scaling
• People
• Power
• Size
• Cost
• Time
Limits in Development
• Testing on “clean” machines
• Deployment testing of upgrades
• Controlled Environments
Leveraging the Cloud
Photo: http://www.flickr.com/photos/kky/704056791
Amazon Web Services
Management Consoles
Processing via MapReduce
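As a rough illustration of the MapReduce pattern applied to geospatial data, the sketch below counts points per grid cell in plain Python. A map step emits a (cell, 1) pair per point, a shuffle groups the pairs by cell, and a reduce step sums each group. The one-degree grid and all function names are assumptions made for this example; a real job would run these phases across a cluster, not in a single process.

```python
import math
from collections import defaultdict


def map_point(lon, lat, cell_size=1.0):
    """Map step: emit (grid cell, 1) for a single point."""
    cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
    return cell, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each grid cell."""
    return {cell: sum(values) for cell, values in groups.items()}


def point_density(points, cell_size=1.0):
    """Run map, shuffle, and reduce over an iterable of (lon, lat) pairs."""
    pairs = (map_point(lon, lat, cell_size) for lon, lat in points)
    return reduce_counts(shuffle(pairs))
```

Because each point is mapped independently and each cell is reduced independently, both phases can be split across as many instances as the data size requires.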
Launching New Instances
Elastic Compute Cloud - EC2
• Virtual Servers
• Machine Images (AMI)
• On-Demand
CentOS AMI: build, bundle, register, instantiate
Elastic Block Store - EBS
• Create EBS volume (100 GB)
• Attach
• Snapshot to S3 (Diff v1, Diff v2)
• Create & Attach
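EBS snapshots are incremental: after the first snapshot, each later one stores only the blocks that have changed, which is what the "Diff v1" and "Diff v2" labels suggest. The toy model below illustrates the idea with a volume represented as a dict of block number to bytes. It is a conceptual sketch only, not how AWS implements snapshots, and every name in it is hypothetical.

```python
def take_snapshot(volume, previous_state):
    """Return only the blocks that differ from the previously saved state."""
    return {
        block: data
        for block, data in volume.items()
        if previous_state.get(block) != data
    }


def restore(diffs):
    """Rebuild a volume by applying snapshot diffs, oldest first."""
    volume = {}
    for diff in diffs:
        volume.update(diff)
    return volume
```

In this model, the first call with an empty previous state captures every block, a second call after one block changes captures just that block, and `restore([diff_v1, diff_v2])` yields the contents for a newly created volume. The model assumes a fixed set of blocks and does not track deletions.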
Public Datasets
Additional Benefits
• Federation
• Tile generation
• Content-delivery System
• Simple Queue Service (SQS)
S3 Storage
tiles/openstreetmap/9/74/97.png
tiles/openstreetmap/9/74/98.png
tiles/bluemarble/9/74/97.png
tiles/bluemarble/9/74/98.png
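Tile paths like these appear to follow a layer/zoom/x/y layout, which maps directly onto S3 object keys. The sketch below builds such a key and converts a longitude and latitude to tile indices. It assumes the common Web Mercator "slippy map" tiling scheme, which the slides do not confirm, and the function names are illustrative.

```python
import math


def tile_key(layer, zoom, x, y, prefix="tiles"):
    """Build an object key such as tiles/openstreetmap/9/74/97.png."""
    return "{}/{}/{}/{}/{}.png".format(prefix, layer, zoom, x, y)


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 degrees to XYZ tile indices (y counted from the top).

    Assumption: Web Mercator tiling with 2**zoom tiles per axis.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

Because the key is fully determined by layer, zoom, x, and y, a tile can be generated once, written to storage, and then served without touching the application servers.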
Cloud Architecture
• EC2 image of current system architecture
• EBS snapshot of the default database, stored in S3
• Current application release in S3
• Start an EC2 instance, attach data, attach code, start up
Diagram labels: create instance; attach data (Default Datasets); v1.4.3; Snapshot; Backup; Cache Downloads (S3)
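The "start an instance, attach data" sequence could be scripted roughly as follows. This is a sketch, not the deployment code behind the slides: it assumes a client object that behaves like a boto3 EC2 client (boto3 postdates this talk and is used here only for illustration), and the function name `launch_stack` and its parameters are hypothetical. Fetching the application release from S3 and starting services is assumed to happen in the instance's own startup script and is not shown.

```python
def launch_stack(ec2, image_id, snapshot_id, zone, instance_type,
                 device="/dev/sdf"):
    """Start an instance from a machine image and attach a data volume.

    `ec2` is expected to behave like a boto3 EC2 client. It is passed in
    so the sequence can be exercised with a stub instead of a live account.
    """
    # 1. Create the instance from the system-architecture image.
    reservation = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
    instance_id = reservation["Instances"][0]["InstanceId"]

    # 2. Create a fresh volume from the default-database snapshot.
    #    The volume must live in the same availability zone as the instance.
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    volume_id = volume["VolumeId"]

    # 3. Wait until both are ready, then attach the data volume.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)

    return instance_id, volume_id
```

Keeping the image, the data snapshot, and the release as three separate artifacts means any one of them can be updated without rebuilding the other two.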
Scaling
• RESTful architecture
• Caching for speed, and CDN support
• Amazon Web Services
• CloudWatch
• Elastic Scaling
• Load Balancer
Private Instances
First Users: Meedan, Media
Repeatable
Data Federation
community
Geospatial Federated Search
Geocoding
Geocoding - Scale as Required
• TIGER/Line
• SQLite
• Geocoding Engine
• API
• Upload CSV
• Geocode
• Cache Results
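The geocoding flow in the diagram (upload a CSV, geocode each row, cache the results) could be sketched as follows. The address table stands in for a TIGER/Line-derived SQLite database; its schema and the function names are assumptions for illustration, and matching is by normalized exact string only, unlike a real geocoding engine.

```python
import csv
import io
import sqlite3


def build_reference_db(rows):
    """Create an in-memory stand-in for a TIGER/Line-derived address table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE addresses (address TEXT PRIMARY KEY, lat REAL, lon REAL)"
    )
    conn.executemany("INSERT INTO addresses VALUES (?, ?, ?)", rows)
    return conn


def geocode_csv(csv_text, conn, cache, address_column="address"):
    """Geocode each CSV row, consulting `cache` before querying the database."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[address_column].strip().lower()
        if key not in cache:
            hit = conn.execute(
                "SELECT lat, lon FROM addresses WHERE address = ?", (key,)
            ).fetchone()
            cache[key] = hit  # None is cached too, so misses are not retried
        results.append((row[address_column], cache[key]))
    return results
```

Since the reference data is read-only and the cache absorbs repeated lookups, more geocoding instances can be started from the same image when upload volume rises and shut down afterward.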
Best Practices Applied to the Government
• Built using open, established tools
• Full choice - Linux, Windows
• Full Control
• Repeatable processes
• Continual backup
• Scaling dynamic and large datasets
• Synchronous and Asynchronous analysis
Level of Maturity
• Widely adopted
• Broad support and ecosystem
• Full stack support
Perceived Impediments to Adoption
• Single Vendor (open-source alternatives arising)
• Maintenance and Location
• Data Security