Posted on 07-Jan-2016
Helping the World’s Farmers Adapt to Climate Change
Strata Conference, Oct 2012
Siraj Khaliq, CTO, The Climate Corporation
Fritchton, IN – late summer, 2012
Louisville, IL
Wichita, KS
Worst US Droughts in the Last Fifty Years
[Chart: 1956, 2012, 1988]
-16%: 2012 Estimated Corn Yield (USDA)
+6%: World food prices, month-on-month change in July 2012 (UN FAO)
Farm Economics
Large capital outlays at start of season (April): seed, equipment, pesticide, and land
Revenue comes in at harvest
1-2 years of revenue shortfall could be catastrophic
Futures help with price volatility, not weather
A real-life example
Farmer Rich Vernon talks to NPR's David Schaper (audio)
This is set to continue
Our Mission
To help all the world's people and businesses manage and adapt to climate change
Evaluating Markets
$4.2 Trillion
2012 Estimated Corn Yield (USDA)
Total Weather Insurance (TWI)
TWI Demo
HOW?
Outcome / Weather Data / Policy
Modeled Outcomes
Weather Simulations
Structure
How does weather impact crop yield?
Varies based on many inputs:
Temperature
Precipitation
Soil type
Topography
Farming practices
Crop varietal
Agronomically deduced candidates
Model at large scale: every farm in the US (20M)
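As a hypothetical illustration of an "agronomically deduced candidate", the sketch below computes growing degree days (GDD), a standard agronomic heat-accumulation feature for corn, using the common 86/50 °F method. The talk does not say which features TCC's structures actually use. The function names are assumptions for illustration. A real structure would combine features like this with precipitation, soil, and practice inputs.

```clojure
;; Hypothetical candidate feature: corn growing degree days (86/50 method).
;; Not The Climate Corporation's yield model.
(defn daily-gdd
  "GDD for one day in degrees F: daily max/min are clamped to [50, 86]
  before averaging, and 50 is subtracted, so the result is never negative."
  [tmax-f tmin-f]
  (let [clamp (fn [t] (-> t (max 50.0) (min 86.0)))
        avg   (/ (+ (clamp tmax-f) (clamp tmin-f)) 2.0)]
    (- avg 50.0)))

(defn season-gdd
  "Sums daily GDD over a sequence of [tmax tmin] pairs."
  [days]
  (reduce + 0.0 (map (fn [[tmax tmin]] (daily-gdd tmax tmin)) days)))
```

For example, `(season-gdd [[90 60] [45 30]])` returns `23.0`. The hot day contributes (86 + 60) / 2 - 50, and the cold day contributes nothing.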
Weather Simulations
What weather do we expect?
1M locations (2.5 mi x 2.5 mi grid)
10k scenarios per location, going 2 years out
2 measurements
60 TB of data per simulation set, every couple of weeks
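A rough back-of-envelope check of the data volume follows. It assumes daily values (about 730 days over 2 years) stored as 4-byte floats. Both are assumptions, since the slide does not state resolution or encoding. Under them the total comes out close to the quoted 60 TB.

```latex
\begin{aligned}
10^{6}\ \text{locations} \times 10^{4}\ \text{scenarios} \times 730\ \text{days} \times 2\ \text{measurements} &= 1.46 \times 10^{13}\ \text{values} \\
1.46 \times 10^{13}\ \text{values} \times 4\ \text{bytes/value} &\approx 5.8 \times 10^{13}\ \text{bytes} \approx 58\ \text{TB}
\end{aligned}
```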
Grid cell: 2.5 x 2.5 square miles
Expensive computation
Parallelizing is hard due to correlations
Would take 80+ years on one fast, modern server-class machine
We need to generate these within days
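Eighty machine-years is about 29,200 machine-days. Finishing in, say, 3 days (an assumed figure; the slide only says "within days") would need on the order of 10,000 machines' worth of parallel work, even with perfect scaling. The sketch below is a hypothetical illustration of why correlations constrain how that work can be split. Neighbouring cells within one scenario are drawn jointly with correlation `rho`, so they cannot be sampled independently. Separate scenarios are independent, so they can be farmed out in parallel. The two-cell model and all names are assumptions, not TCC's simulation method.

```clojure
(import 'java.util.Random)

;; Hypothetical sketch: two neighbouring grid cells whose daily anomalies
;; are correlated with coefficient rho. Not TCC's simulation method.
(defn correlated-pair
  "Draws standard-normal anomalies [z1 z2] for two neighbouring cells so
  that corr(z1, z2) = rho, via z2 = rho*z1 + sqrt(1 - rho^2)*e."
  [^Random rng rho]
  (let [z1 (.nextGaussian rng)
        e  (.nextGaussian rng)]
    [z1 (+ (* rho z1) (* (Math/sqrt (- 1.0 (* rho rho))) e))]))

(defn simulate-scenario
  "n-days of correlated anomalies for one scenario. The generator is
  seeded with the scenario id, so each scenario is reproducible."
  [scenario-id n-days rho]
  (let [rng (Random. (long scenario-id))]
    (vec (repeatedly n-days #(correlated-pair rng rho)))))

(defn simulate-all
  "Scenarios are independent of each other, so they run in parallel
  (pmap here, cluster tasks in practice); the correlated cells inside a
  scenario are always generated together by one task."
  [n-scenarios n-days rho]
  (vec (pmap #(simulate-scenario % n-days rho) (range n-scenarios))))
```

Per-scenario seeding keeps a re-run of any single scenario identical. When running this as a script, call `(shutdown-agents)` at the end so the `pmap` thread pool lets the JVM exit promptly.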
Soil Moisture Modeling
What's the soil moisture at farm X?
Depends on: soil type, weather, topography, crop
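The textbook simplification of this question is a single-layer "bucket" water balance. In that model, soil type sets the holding capacity, weather supplies precipitation, and the crop drives evapotranspiration (ET). The sketch below is an assumption-laden illustration of that idea, not TCC's soil moisture model. Topography (runoff routing) is ignored, and water above capacity is simply discarded.

```clojure
;; Hypothetical single-layer bucket model; parameter names are illustrative.
(defn step-bucket
  "Advances soil moisture (mm) by one day: add precipitation, subtract ET,
  clamp to [0, capacity]. Water beyond capacity is treated as lost."
  [capacity-mm moisture-mm {:keys [precip-mm et-mm]}]
  (-> (+ moisture-mm precip-mm (- et-mm))
      (max 0.0)
      (min capacity-mm)))

(defn moisture-series
  "Daily soil moisture trajectory, starting with initial-mm."
  [capacity-mm initial-mm days]
  (vec (reductions (partial step-bucket capacity-mm) initial-mm days)))
```

For example, a 100 mm bucket starting at 50 mm with days of (30 mm rain, 5 mm ET), (0, 10), and (80, 5) gives `[50.0 75.0 65.0 100.0]`. The last day overflows and is capped.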
Evolution of Our Technology
2007
Components: Java frontend, PricingServer (Rserve), MySQL
400 stations
All data in MySQL
Pricing servers (Rserve)
Java-based webapp
2008
Components: Java frontend, PricingServer (Rserve), MySQL, Disk
2,000 stations
Weather data now on disk
Versioning hard
Java-R bridge messy
2009-2010
Components: Java frontend, PricingServer (Java), SimulationService, Weather data service, Sim gen (Hadoop), SimpleDB / S3, MySQL
22,000 locations
Rserve replaced by Java
Simulations and S3/SimpleDB
Model gen in Hadoop
Moved fully to EC2
2011 – today
Components: Rails frontend, PricingServer (Java), Marty (HBase) geo data store, Sim gen (Cascalog), Soil moisture dataset gen (Cascalog), Structures gen (Cascalog), other Hadoop jobs, S3, MySQL
1,000,000 locations
Own big geo-data store
Many more Hadoop jobs
Eliminated SimpleDB
MapReduce at TCC
Python (Hadoop Streaming)
Some native Java
Most use higher-level frameworks
Big Wins
Cascalog/Clojure
EC2 Spot Instances
“NoSQL”
Big Win #1 - Cascalog
(defn weather-map-q
  "Creates a Cascalog query to extract individual measurement values of
  ObservationSet data and produces tuples of [date JSON-encoded map], in
  which each JSON-encoded map is keyed by station-id"
  [stations interval measurement sources start end nostra]
  (<- [?date ?json-aggregated-values]
      ; from hfs-textline
      (stations ?station-id)
      (fetch-obs-for-station [interval measurement sources start end nostra]
                             ?station-id :> ?obs)
      (extract-values-by-date ?obs :> ?date ?value)
      (aggregate-values ?value :> ?aggregated-values)
      (json/generate-string ?aggregated-values :> ?json-aggregated-values)))
Big Win #1 - Cascalog
Easily composable workflows
Can unit test Hadoop flows
Quick iteration
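To illustrate the composability and testability point without a cluster, here is a plain-Clojure analogue of the grouping that `weather-map-q` performs. It turns observations into a per-date map keyed by station id, with JSON encoding left out. This is not the Cascalog API. The observation map shape and the function names are hypothetical. The design point is that the fetch step is passed in as a function, so a test can substitute a stub for the real data source.

```clojure
(require '[clojure.test :refer [deftest is]])

;; Plain-Clojure analogue of weather-map-q's aggregation (not Cascalog).
;; Observation maps of shape {:station-id :date :value} are hypothetical.
(defn values-by-date
  "Groups observations into {date {station-id value}}, dates sorted."
  [observations]
  (reduce (fn [acc {:keys [station-id date value]}]
            (assoc-in acc [date station-id] value))
          (sorted-map)
          observations))

(defn weather-map
  "Composes a fetch step with the grouping step. fetch-obs maps a station
  id to its observations, so tests can pass a stub (even a plain map)."
  [fetch-obs stations]
  (->> stations
       (mapcat fetch-obs)
       values-by-date))

(deftest weather-map-groups-by-date
  (let [stub {"S1" [{:station-id "S1" :date "2012-07-01" :value 31.5}
                    {:station-id "S1" :date "2012-07-02" :value 33.0}]
              "S2" [{:station-id "S2" :date "2012-07-01" :value 29.0}]}]
    (is (= {"2012-07-01" {"S1" 31.5 "S2" 29.0}
            "2012-07-02" {"S1" 33.0}}
           (weather-map stub ["S1" "S2"])))))
```

Running `(clojure.test/run-tests)` in the same namespace executes the test locally, with no Hadoop involved.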
Big Win #2 – EC2 Spot Instances
Good fit for our compute approach
Can be very cheap
Good availability
Big Win #3 – NoSQL
Datasets must be:
Repeatably generated
Versioned
Indexed
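One common way to make datasets both repeatably generated and versioned is to derive the version id from the parameters that produced the dataset. Re-running with the same inputs then yields the same id, and any change yields a new one. The sketch below assumes a flat parameter map and uses a SHA-256 hash of its key-sorted form. The function names and the key layout are hypothetical, not TCC's scheme.

```clojure
(import 'java.security.MessageDigest)

;; Hypothetical sketch: version id = hash of canonicalised generation
;; parameters. Assumes a flat parameter map (nested maps are not sorted).
(defn dataset-version
  "First 12 hex chars of the SHA-256 of the key-sorted parameter map."
  [params]
  (let [canonical (pr-str (into (sorted-map) params))
        digest    (.digest (MessageDigest/getInstance "SHA-256")
                           (.getBytes ^String canonical "UTF-8"))]
    (subs (apply str (map #(format "%02x" (bit-and % 0xff)) digest)) 0 12)))

(defn dataset-key
  "Storage key combining a dataset name with its version id."
  [dataset-name params]
  (str dataset-name "/v-" (dataset-version params)))
```

Because the map is sorted before hashing, `{:a 1 :b 2}` and `{:b 2 :a 1}` produce the same version.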
Why not SQL?
Time-series data, not relational
Large size and ad hoc structure
Specific query patterns
Tens of terabytes in size
NoSQL at TCC - Marty
Our own big geo-data store
Built on HBase
Billions of records
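The talk does not describe Marty's schema. The sketch below shows one typical row-key layout for per-location time series in an HBase-like store, where rows are sorted by key and queries are range scans. Each key is a zero-padded grid-cell id followed by an ISO date, so all rows for a cell are contiguous and in date order. A Clojure `sorted-map` stands in for the table. The key format and names are assumptions for illustration.

```clojure
;; Hypothetical sketch: sorted-map as a stand-in for an HBase table.
(defn row-key
  "Zero-padded cell id, then ISO date: rows for one cell sort together
  and in date order (without padding, \"42\" would sort after \"100\")."
  [cell-id date]
  (format "%08d|%s" cell-id date))

(defn put-obs
  "Stores one value under its row key."
  [table cell-id date value]
  (assoc table (row-key cell-id date) value))

(defn scan-cell
  "Emulates a range scan: start row inclusive, stop row exclusive."
  [table cell-id from-date to-date]
  (mapv val (subseq table
                    >= (row-key cell-id from-date)
                    <  (row-key cell-id to-date))))
```

A query such as "July 2012 for cell 42" becomes a single contiguous scan. This is the kind of specific query pattern that fits a sorted key-value store better than relational joins.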
Learning #1 – Embrace Hadoop
Defines the problem clearly
Focus on the problem more than on the architecture
Great tools and community support
Learning #2 – Be Careful
Fail-fast code
Test, test, test
Run at small scale first
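A minimal sketch of the first and third points follows. A parser throws immediately on malformed or implausible input rather than passing bad values downstream, and a dry run exercises it on a small sample before a full job is launched. The record format `station,date,temp-f` and the plausibility bounds are hypothetical, chosen only for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical record format "station,date,temp-f"; bounds are illustrative.
(defn parse-reading
  "Parses one line, throwing at once on malformed or implausible input
  rather than letting a bad value flow into later stages."
  [line]
  (let [[station date temp :as fields] (str/split line #",")]
    (when (not= 3 (count fields))
      (throw (ex-info "expected 3 fields" {:line line})))
    (let [t (Double/parseDouble temp)]   ; throws on non-numeric input
      (when-not (<= -80.0 t 140.0)
        (throw (ex-info "temperature out of plausible range" {:line line})))
      {:station station :date date :temp-f t})))

(defn dry-run
  "Runs the parser over the first n lines before launching a full job."
  [lines n]
  (mapv parse-reading (take n lines)))
```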
Learning #3 – Architecture Matters
Eliminate single points of failure
Consider memory usage and I/O
Write simple flows with checkpointing
Monitoring is invaluable
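"Simple flows with checkpointing" can be sketched as a chain of named steps. Each step's output is saved, and on a re-run a step whose output is already saved is skipped. A failure late in a long flow then only costs the failed step. In this hypothetical sketch an atom stands in for durable storage such as files on HDFS or S3. Keying only on the step name is a simplification: a real flow would also key on the input dataset's version.

```clojure
;; Hypothetical sketch: an atom stands in for durable checkpoint storage.
(defn run-flow
  "Runs [step-name f] pairs in order, feeding each output to the next.
  A step whose output is already in store is skipped and its saved
  output reused, so a failed run resumes from the last checkpoint."
  [store steps input]
  (reduce (fn [data [step-name f]]
            (if-let [[_ saved] (find @store step-name)]
              saved
              (let [out (f data)]
                (swap! store assoc step-name out)
                out)))
          input
          steps))
```

For example, if the second step of `[[:double ...] [:sum ...]]` throws, re-running with a fixed `:sum` step reuses the checkpointed `:double` output instead of recomputing it.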
TCC Today
150 employees, half of them in engineering
20 PhDs
Reputation for hard science problems
… by standing on the shoulders of giants
Open Source at TCC
github.com/TheClimateCorporation
Lemur (EMR / Clojure)
Repoman (coming soon)
Marty (coming)
Questions?