Helping the World’s Farmers Adapt to Climate Change

Post on 07-Jan-2016

Strata Conference, Oct 2012

Siraj Khaliq, CTO, The Climate Corporation

Fritchton, IN – late summer, 2012

Louisville, IL

Wichita, KS

Worst US Droughts in the Last Fifty Years

1956, 1988, 2012

-16%: 2012 Estimated Corn Yield (USDA)

+6%: World food prices, month-on-month change in July 2012 (UN FAO)

Large capital outlays at start of season (April)

Seed, equipment, pesticide, and land

Revenue comes in at harvest

1-2 years of revenue shortfall could be catastrophic

Futures help with price volatility, not weather

Farm Economics

Farmer Rich Vernon talks to NPR's David Schaper (audio)

A real-life example

This is set to continue

To help all the world's people & businesses manage and adapt to climate change

Our Mission

Evaluating Markets

$4.2 Trillion

2012 Estimated Corn Yield (USDA)

Total Weather Insurance (TWI)

TWI Demo

HOW?

Outcome / Weather Data / Policy

Modeled Outcomes

Weather Simulations

Structure

Structure: How does weather impact crop yield?

Structure

Varies based on many inputs:

Temperature

Precipitation

Soil type

Topography

Farming practices

Crop varietal

Structure

Agronomically deduced candidates

Model at large scale

Every farm in the US (20M)

What weather do we expect?

Weather Simulations

Weather Simulations

1M locations (2.5 mi x 2.5 mi grid)

10k scenarios per location, going 2 years out

2 measurements

60 TB of data per simulation set, every couple of weeks
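The figures on this slide can be sanity-checked with a quick calculation. The sketch below multiplies them out under two assumptions that are not stated in the slides: one value per day for each measurement, and 4 bytes per stored value. Under those assumptions the total lands close to the quoted 60 TB.

```python
# Back-of-the-envelope size of one simulation set.
# Assumptions (NOT from the slides): daily time steps, 4-byte values.
LOCATIONS = 1_000_000            # 2.5 mi x 2.5 mi grid cells
SCENARIOS_PER_LOCATION = 10_000  # simulated weather scenarios per cell
DAYS = 2 * 365                   # "going 2 years out", assumed daily
MEASUREMENTS = 2                 # two measurements per scenario-day
BYTES_PER_VALUE = 4              # assumed single-precision float


def simulation_set_bytes() -> int:
    """Total bytes for one full simulation set under the assumptions above."""
    values = LOCATIONS * SCENARIOS_PER_LOCATION * DAYS * MEASUREMENTS
    return values * BYTES_PER_VALUE


size_tb = simulation_set_bytes() / 1e12  # decimal terabytes
print(f"{size_tb:.1f} TB per simulation set")
```

A different time step or value width would change the result proportionally, so this only shows that the quoted order of magnitude is plausible.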

Weather Simulations

2.5 x 2.5 miles (6.25 square miles)

Weather Simulations

Expensive computation

Parallelizing hard due to correlations

Would take 80+ years on one fast modern server-class machine

We need to generate these within days
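To give a feel for the gap between "80+ years on one machine" and "within days", here is a rough estimate of the cluster size involved. The target durations and the `efficiency` parameter are illustrative assumptions, not figures from the talk; an efficiency below 1.0 stands in for the parallelization overhead that correlations introduce.

```python
import math

SINGLE_MACHINE_YEARS = 80  # from the slide: "80+ years" on one server
DAYS_PER_YEAR = 365.25


def machines_needed(target_days: float, efficiency: float = 1.0) -> int:
    """Machines required to finish in target_days.

    efficiency is the assumed fraction of ideal parallel speedup (0 < e <= 1).
    """
    if target_days <= 0 or not 0 < efficiency <= 1:
        raise ValueError("target_days must be > 0 and efficiency in (0, 1]")
    total_machine_days = SINGLE_MACHINE_YEARS * DAYS_PER_YEAR
    return math.ceil(total_machine_days / (target_days * efficiency))


# Hypothetical targets: one day, three days, one week.
for days in (1, 3, 7):
    print(days, machines_needed(days), machines_needed(days, efficiency=0.5))
```

With perfect scaling, a three-day target works out to roughly 9,740 machines, which is why elastic, on-demand capacity matters for this workload.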

Soil Moisture Modeling

What's the soil moisture at farm X?

Soil Moisture Modeling

soil type, weather, topography, crop

Evolution of Our Technology

Java frontend

PricingServer (Rserve)

MySQL

2007

400 stations

All data in MySQL

Pricing servers (Rserve)

Java-based webapp

Java frontend

PricingServer (Rserve)

MySQL

2008

2,000 stations

Weather data now on disk

Versioning hard

Java-R bridge messy

Disk

Java frontend

PricingServer (java)

SimulationService

Weather data Service

Sim gen (Hadoop)

SimpleDB / S3 SimpleDB / S3

MySQL

2009-2010

22,000 locations

Rserve replaced by Java

Simulations & S3/SimpleDB

Model gen in Hadoop

Moved fully to EC2

Rails frontend

PricingServer (java)

Marty (HBase)

Geo data store

Sim gen (Cascalog)

S3

MySQL

2011 – today

1,000,000 locations

Own big geo-data store

Many more Hadoop jobs

Eliminated SimpleDB

Soil moisture dataset gen (cascalog)

Structures gen (cascalog)

Other hadoop jobs

MapReduce at TCC

Python (Hadoop streaming)

Some native Java

Most use higher-level frameworks
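For readers unfamiliar with Hadoop streaming, the sketch below shows the general shape of a Python job: the mapper emits tab-separated key/value lines, the framework sorts them by key, and the reducer consumes each key's lines as a contiguous run. The input format (`station_id,date,temperature`) and the max-temperature task are made up for illustration and are not taken from TCC's jobs.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'station_id<TAB>temperature' for each well-formed input line."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip malformed records
        station_id, _date, temp = parts
        yield f"{station_id}\t{temp}"


def reducer(lines):
    """Input arrives sorted by key; emit the max temperature per station."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for station_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{station_id}\t{max(float(v) for _, v in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce on one machine, without a cluster."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical sample records, including one malformed line.
sample = [
    "S1,2012-07-01,31.0",
    "S2,2012-07-01,28.5",
    "S1,2012-07-02,33.5",
    "bad line",
]
result = run_locally(sample)
print(result)
```

In a real streaming job the mapper and reducer would be separate scripts reading standard input and writing standard output; running them through a local harness like `run_locally` is a cheap way to test the logic first.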

Big Wins

Cascalog/Clojure

EC2 Spot Instances

“NoSQL”

Big Win #1 - Cascalog

(defn weather-map-q
  "Creates a Cascalog query to extract individual measurement values of
  ObservationSet data and produces tuples of [date JSON-encoded map], in
  which each JSON-encoded map is keyed by station-id"
  [stations interval measurement sources start end nostra]
  (<- [?date ?json-aggregated-values] ; from hfs-textline
      (stations ?station-id)
      (fetch-obs-for-station [interval measurement sources start end nostra]
                             ?station-id :> ?obs)
      (extract-values-by-date ?obs :> ?date ?value)
      (aggregate-values ?value :> ?aggregated-values)
      (json/generate-string ?aggregated-values :> ?json-aggregated-values)))
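The helper functions used in this query (`fetch-obs-for-station`, `extract-values-by-date`, `aggregate-values`) are not shown in the slides, so the following plain-Python sketch only mirrors the output shape that the docstring describes: one tuple per date, carrying a JSON-encoded map keyed by station id. The station ids and values are invented for illustration.

```python
import json
from collections import defaultdict


def weather_map(observations):
    """Group (station_id, date, value) triples into (date, JSON map) tuples."""
    by_date = defaultdict(dict)
    for station_id, date, value in observations:
        by_date[date][station_id] = value  # one map per date, keyed by station
    return [
        (date, json.dumps(values, sort_keys=True))
        for date, values in sorted(by_date.items())
    ]


# Hypothetical observations: (station id, date, measurement value).
obs = [
    ("KSFO", "2012-07-01", 18.3),
    ("KICT", "2012-07-01", 38.9),
    ("KSFO", "2012-07-02", 17.8),
]
rows = weather_map(obs)
for date, encoded in rows:
    print(date, encoded)
```

The Cascalog version expresses the same grouping declaratively and runs it as Hadoop jobs; this in-memory version is only meant to make the data flow easier to follow.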

Big Win #1 - Cascalog

Easily composable workflows

Can unit test Hadoop flows

Quick iteration

Big Win #2 – EC2 Spot Instances

Good fit to our compute approach

Can be very cheap

Good availability

Big Win #3: NoSQL

Datasets must be:

Repeatably generated

Versioned

Indexed

Big Win #3 – NoSQL

Why not SQL?

Time-series data, not relational

Large size and ad hoc structure

Specific query patterns

Tens of terabytes in size

NoSQL at TCC - Marty

Own big geo-data store

Built on HBase

Billions of records

Learning #1 – Embrace Hadoop

Defines problem clearly

Focus on problem more than architecture

Great tools and community support

Learning #2 – Be Careful

Fail-fast code

Test, test, test

Run at small scale first

Learning #3 – Architecture Matters

Eliminate single points of failure

Consider memory usage and I/O

Write simple flows with checkpointing

Monitoring is invaluable
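As one possible reading of "simple flows with checkpointing", the sketch below wraps each step of a flow so that its result is saved on completion and reused on a rerun. The `run_step` helper, the JSON checkpoint files and the step name are hypothetical illustrations, not TCC's implementation.

```python
import json
import os
import tempfile


def run_step(name, func, checkpoint_dir):
    """Run func() unless a checkpoint for this step already exists."""
    path = os.path.join(checkpoint_dir, f"{name}.json")
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)  # resume: reuse the saved result
    result = func()
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump(result, fh)
    os.replace(tmp_path, path)  # atomic rename avoids half-written checkpoints
    return result


calls = []  # records how many times the step body actually runs


def load_stations():
    """Hypothetical expensive step."""
    calls.append("load")
    return ["S1", "S2"]


with tempfile.TemporaryDirectory() as ckpt:
    first = run_step("load_stations", load_stations, ckpt)
    second = run_step("load_stations", load_stations, ckpt)  # from checkpoint

print(first, second, calls)
```

If a later step fails, rerunning the flow skips the steps whose checkpoints already exist, which keeps a long job from starting over after every failure.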

TCC Today

150 employees

Half engineering

20 PhDs

Reputation for hard science problems

… by standing on the shoulders of giants

Open Source at TCC

github.com/TheClimateCorporation

Lemur (EMR / Clojure)

Repoman (coming soon)

Marty (coming)

??
