Transcript
Page 1: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Indexing the Earth

Hadoop World NYC 2011

Oliver Guinan - VP Ground Data Systems

[email protected]

Page 2: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

HadoopWorld 2011

Session Agenda

‣ Skybox

‣ The Big Data problem

‣ Indexing the planet at scale

‣ Questions

Page 3: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Today’s data is old

Stadium under construction

(completed 2010)

Bridge under construction (completed 2009)

Convention center under construction (completed 2010)

Image taken September 2008: more than three years old

Page 4: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

A problem of scale

Page 5: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Satellite Imagery = Transparency...

215 automobiles

55,245 gallons of crude oil

6,254 containers

43% damage

-15% vegetation

[Timeline axis, by month: J F M A M J J A S O N D J F M A M J J A S O N D J F]

Page 6: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

The problem of capacity

Page 7: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Sensor network in space

Page 8: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

New approach: Many distributed, low-cost satellites

Page 9: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Total Raw Data compute

• Satellites produce ~1TB of raw data/day

[Chart: Data Captured per Year (PB) and Sensors in Network, Year 1 to Year 5; legend: Sensor Network, Single Satellite, Sensors in Network]
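
The scale behind the ~1TB/day figure can be sketched with a little arithmetic. The slide does not say whether the figure is per satellite or for the whole fleet, so this sketch assumes it is per satellite; the fleet sizes and the function name `raw_pb_per_year` are hypothetical and are not read off the chart.

```python
TB_PER_PB = 1000  # decimal units: 1 PB = 1000 TB


def raw_pb_per_year(satellites: int,
                    tb_per_sat_per_day: float = 1.0,
                    days: int = 365) -> float:
    """Raw data captured in one year, in PB.

    Assumes every satellite produces the same daily volume (an assumption,
    not a figure from the talk).
    """
    return satellites * tb_per_sat_per_day * days / TB_PER_PB


# Hypothetical fleet sizes, chosen only to show how volume grows linearly.
for fleet in (1, 5, 10, 20):
    print(f"{fleet:2d} satellites: {raw_pb_per_year(fleet):.3f} PB/year")
```

Under this assumption a single satellite stays well under 1 PB per year, and the volume grows linearly with the number of sensors in the network.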

Page 10: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Total Raw Data storage

• Satellites produce ~1TB of raw data/day

[Chart: Data Captured per Year (PB) and Sensors in Network, Year 1 to Year 5; legend: Sensor Network, Single Satellite, Sensors in Network]

Page 11: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Enter the elephant

Page 12: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Hadoop from space - processing bits

Hadoop is bad at:

๏ Calling native C code or libraries at scale

๏ Scientific computing (immature in Java)

Page 13: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Hadoop from space - processing bits

Standard Java Hadoop

๏ Hadoop knows where data is stored

๏ Jobs efficiently scheduled close to data

๏ Throughput optimized

Page 14: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Hadoop from space - processing bits

Hadoop Pipes & Streaming

๏ Hadoop schedules jobs without regard to the data required by the job

๏ Native code reads data across the network

๏ Drives up network costs and drives down throughput
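
The contrast between data-local scheduling and scheduling without regard to data can be illustrated with a toy model. This is a minimal sketch, not Hadoop's scheduler: the cluster size, replica count, and every function name here are hypothetical, and it assumes a free task slot is always available on a node that holds the data.

```python
import random


def count_remote_reads(block_hosts, assignments):
    """Count tasks placed on a node that holds no replica of their block."""
    return sum(1 for block, node in assignments.items()
               if node not in block_hosts[block])


def schedule_locality_aware(block_hosts):
    """Place each task on a node that already stores its block."""
    return {block: sorted(hosts)[0] for block, hosts in block_hosts.items()}


def schedule_locality_blind(block_hosts, nodes, rng):
    """Place each task on a random node, ignoring where the block lives."""
    return {block: rng.choice(nodes) for block in block_hosts}


# Hypothetical cluster: 10 nodes, 100 blocks, 3 replicas per block.
rng = random.Random(0)
nodes = [f"node{i}" for i in range(10)]
block_hosts = {f"block{b}": set(rng.sample(nodes, 3)) for b in range(100)}

aware = schedule_locality_aware(block_hosts)
blind = schedule_locality_blind(block_hosts, nodes, rng)

print("remote reads, locality-aware:", count_remote_reads(block_hosts, aware))
print("remote reads, locality-blind:", count_remote_reads(block_hosts, blind))
```

In this model every remote read stands for a block pulled across the network, which is the cost the slide attributes to native code running under Pipes and Streaming.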

Page 15: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Hadoop from space - processing bits

BusBoy

✓ Hadoop manages data reads & writes

✓ Hadoop schedules jobs close to the data

✓ Jobs read data and hand off to native code for processing
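
The hand-off pattern in the last bullet can be sketched as follows. This is not BusBoy's actual interface, which the slides do not show; it is a minimal illustration in which the framework side owns the read and write and an external process only sees bytes. The names `NATIVE_CMD`, `process_record`, and `run_task` are hypothetical, and a small Python one-liner stands in for compiled native code so the example runs anywhere.

```python
import subprocess
import sys

# Stand-in for a native executable (hypothetical). A real deployment would
# invoke compiled C code here; this one-liner just inverts 8-bit pixel values.
NATIVE_CMD = [
    sys.executable, "-c",
    "import sys; data = sys.stdin.buffer.read(); "
    "sys.stdout.buffer.write(bytes(255 - b for b in data))",
]


def process_record(record: bytes, cmd=NATIVE_CMD) -> bytes:
    """Hand one already-read record to an external process and return its output."""
    result = subprocess.run(cmd, input=record, stdout=subprocess.PIPE, check=True)
    return result.stdout


def run_task(records):
    """Framework side: iterate over locally read records and collect results.

    The external process never opens files or touches the network itself.
    """
    return [process_record(record) for record in records]


pixels = bytes([0, 64, 128, 255])
print(run_task([pixels]))
```

Because the external program only reads bytes from standard input and writes bytes to standard output, it can also be run and tested on its own outside the cluster.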

Page 16: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Architecture Overview

Hadoop Task

C code

math.lib gdal.lib cv.lib

BusBoy

Logging Progress Inputs Outputs

Hadoop JobTracker

HDFS HBase Hive

Page 17: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Framework Benefits - Deployment

✓ Low time to first byte

✓ Insight into job progress

✓ Diagnostics for large scale operations

✓ Logging

Page 18: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Framework Benefits - Development

✓ Prototyping outside of Hadoop

✓ Rapid turnaround

✓ Testable interfaces

Page 19: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Skybox providing Big Data

✓ Produce the most complete and timely data about the world

✓ Make data available to users to mine the raw data for information

✓ Turn Big Data into knowledge, at Earth scale

Skybox BusBoy

Page 20: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Simulated from aerial platform using flight sensor

Color Images

Page 21: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

HD Video

Page 22: Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Questions? Sample Data?

[email protected]

