Geospatially Enabling Your Spark and Accumulo Clusters with LocationTech
What we’ll be covering…
What does “processing geospatial data at scale” mean?
What are Accumulo and Spark?
What is LocationTech?
How GeoMesa, GeoWave, and GeoTrellis can geospatially enable your projects.
Landsat 8
• 170 × 180 km per scene
• 2 GB per scene
• 11 bands
• 700 scenes per day
• 1.4 TB / day
• 255,500 scenes / year
• 0.25 PB / year
Landsat 8 on
• All Landsat 8 scenes from 2015 and beyond.
• Selection of cloud-free scenes from 2013 and 2014.
Apache Spark
A distributed computation engine.
An API that lets you work with distributed data as a collection.
Written in Scala, with language bindings for use with Java, Python, and R.
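The "distributed data as a collection" idea can be illustrated without a cluster. What follows is a minimal plain-Python sketch, not Spark itself: the partition layout, the scene sizes, and the helper names `map_partition` and `combine` are all assumptions made for illustration. It mimics the shape of a map followed by a reduce, where each partition is processed independently and only small partial results are merged.

```python
from functools import reduce

# Hypothetical data: scene sizes in GB, already split into three partitions
# the way a cluster might spread records across workers.
partitions = [[2.0, 1.8, 2.1], [1.9, 2.2], [2.0, 2.0, 1.7]]

def map_partition(sizes):
    """Per-partition work: return (count, total) for the local records only."""
    return (len(sizes), sum(sizes))

def combine(a, b):
    """Merge two partial results; associative, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1])

# In a real engine the first step would run on the workers and the second
# would gather the small partial results; here both run locally.
partials = [map_partition(p) for p in partitions]
count, total = reduce(combine, partials)
mean_size = total / count

print(count, round(mean_size, 4))
```

The point of the design is that `combine` only ever sees small tuples, so the bulk of the data never has to leave the partition it lives on.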
Diagram: HDFS (one Name Node, three Data Nodes) and Accumulo (one Master, three Tablet Servers)
Accumulo
BigTable clone (columnar database)
Records stored on HDFS
Lexicographically sorted table index
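A lexicographically sorted index only helps spatial queries if nearby locations end up with nearby keys. One common way to get that property is a Z-order (Morton) curve, which interleaves the bits of longitude and latitude. The sketch below is an assumption about technique, not something these slides spell out: the 16-bit resolution, the `z_key` helper, and the sample points are all hypothetical, and a sorted Python list stands in for the sorted table.

```python
from bisect import bisect_left

BITS = 16  # assumed resolution per dimension

def quantize(value, lo, hi):
    """Map a value in [lo, hi] to an integer cell in [0, 2**BITS - 1]."""
    cell = int((value - lo) / (hi - lo) * (1 << BITS))
    return min(cell, (1 << BITS) - 1)

def z_key(lon, lat):
    """Interleave longitude and latitude bits into a fixed-width hex row key."""
    x = quantize(lon, -180.0, 180.0)
    y = quantize(lat, -90.0, 90.0)
    z = 0
    for i in range(BITS):
        z |= ((x >> i) & 1) << (2 * i)       # even bit positions: longitude
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions: latitude
    return format(z, "08x")  # fixed width, so string order equals numeric order

# Hypothetical records, stored sorted by key like rows in a sorted table.
points = {
    "Philadelphia": (-75.16, 39.95),
    "Camden": (-75.12, 39.93),
    "London": (-0.13, 51.51),
    "Sydney": (151.21, -33.87),
}
table = sorted((z_key(lon, lat), name) for name, (lon, lat) in points.items())
keys = [key for key, _ in table]

def scan_prefix(prefix):
    """Range scan: every key sharing a prefix sits in one contiguous slice."""
    start = bisect_left(keys, prefix)
    stop = bisect_left(keys, prefix + "g")  # 'g' sorts after every hex digit
    return [name for _, name in table[start:stop]]

nearby = scan_prefix(z_key(-75.16, 39.95)[:4])
print(nearby)
```

A key prefix corresponds to one square cell of the curve, so a real index would typically break an arbitrary bounding box into several such ranges and scan each one.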
Hey Flyers Fans, what is the total count of Landsat 8 scenes on your phones A) per month, B) per country, C) per both?
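On a small sample, the three counts in this question are plain keyed aggregations. The sketch below uses hypothetical scene records (an acquisition date and a country code per scene); the record layout is an assumption, since the slides do not show one.

```python
from collections import Counter

# Hypothetical scene records: (acquisition date as YYYY-MM-DD, country code).
scenes = [
    ("2015-01-14", "US"),
    ("2015-01-20", "CA"),
    ("2015-02-03", "US"),
    ("2015-02-11", "US"),
    ("2015-02-27", "CA"),
]

# A) per month: key on the YYYY-MM part of the date.
per_month = Counter(date[:7] for date, _ in scenes)

# B) per country: key on the country code.
per_country = Counter(country for _, country in scenes)

# C) per both: key on the (month, country) pair.
per_both = Counter((date[:7], country) for date, country in scenes)

print(per_month, per_country, per_both)
```

At scale the same three keys would drive a distributed count-by-key, with the hard part being how a scene footprint is assigned to a country in the first place.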
SELECT tweet.text, user.name
FROM tweet, user
WHERE bbox(tweet.location, -115, 45, -110, 50)
  AND tweet.user_id = user.user_id
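The query combines a spatial filter with a join. A minimal in-memory sketch of the same logic is shown here, assuming that the `bbox` arguments are ordered (xmin, ymin, xmax, ymax) and that `location` is a (longitude, latitude) pair; neither is stated on the slide, and the sample rows are hypothetical.

```python
def in_bbox(point, xmin, ymin, xmax, ymax):
    """True if the (lon, lat) point falls inside the box, edges included."""
    lon, lat = point
    return xmin <= lon <= xmax and ymin <= lat <= ymax

# Hypothetical rows standing in for the tweet and user tables.
tweets = [
    {"text": "snow in the mountains", "user_id": 1, "location": (-112.0, 46.6)},
    {"text": "go Flyers", "user_id": 2, "location": (-75.16, 39.95)},
    {"text": "trail is open", "user_id": 2, "location": (-113.9, 48.4)},
]
users = {1: "alice", 2: "bob"}  # user_id -> name

# Spatial filter first, then the join on user_id.
rows = [
    (tweet["text"], users[tweet["user_id"]])
    for tweet in tweets
    if in_bbox(tweet["location"], -115, 45, -110, 50)
    and tweet["user_id"] in users
]

print(rows)
```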
100 spot instance m3.xlarge workers @ $0.04 / hr = $4.00 / hr
400 CPUs / ≈1.5 TB memory
1 master m3.xlarge on-demand instance @ $0.26 / hr
EMR cluster charge, $0.07 / hr
$4.37 / hr
Rendering elevation with hillshade + NLCD on AWS EMR
Hey Flyers Fans, can you take the average pixel value of each scene’s band and derive an EPSG:3857 tile set of PNGs to be served on web maps?
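Two small pieces of this question can be sketched in isolation: averaging the pixels of one band, and working out which web map tile a coordinate falls into. The code below assumes the common XYZ tiling scheme used with EPSG:3857 (Web Mercator) and represents a band as a plain nested list; both helper names are hypothetical, and PNG rendering and reprojection are left out.

```python
import math

def band_mean(pixels):
    """Average pixel value of one band, given as a list of rows."""
    values = [value for row in pixels for value in row]
    return sum(values) / len(values)

def lonlat_to_tile(lon, lat, zoom):
    """XYZ tile indices (x, y) containing a lon/lat point at a zoom level."""
    n = 1 << zoom  # tiles per side at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

# Hypothetical 2 x 2 band and a point near Philadelphia.
mean_value = band_mean([[1, 2], [3, 4]])
tile = lonlat_to_tile(-75.16, 39.95, 2)

print(mean_value, tile)
```

Web Mercator is only defined up to roughly 85 degrees of latitude, which is why the tile row is clamped rather than trusted at the poles.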
Get involved!
• Come see us at our booth
• Join the locationtech-iwg mailing list
• Share your big geospatial data challenges
THANK YOU
@lossyrob
gitter.im/geotrellis/geotrellis
github.com/geotrellis/geotrellis