BigData Spatial Analytics Mansour Raad

Post on 10-Aug-2020

BigData Spatial Analytics

Mansour Raad

Story Time...

[Slide image: Cloudera, Inc. certificate stating that Mansour Raad completed the CCDH Exam to satisfaction. Dates shown: March 2, 2012 and Mar 09, 2012.]

Finally, a big nail...

Input 1

U.S. Demographic Data

Demographic Info

• Location

• Gender

• Race

• Income

• Age

Input 2

~1000 Locations

Task...

For Each Location

For Each Demographic

50 Mile Heatmap
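
The task above can be sketched for a single location and a single demographic as follows. This is only an illustration, not the method used in the talk: the function names, the 10-mile cell size, the 69-miles-per-degree constant, and the flat (equirectangular) distance approximation are all assumptions. The full task would call `heatmap` once per location and per demographic.

```python
import math
from collections import Counter

# Assumption: roughly 69 miles per degree of latitude (an average value).
MILES_PER_DEGREE_LAT = 69.0


def offset_miles(lon, lat, center_lon, center_lat):
    """Approximate east/north offset in miles from the center point."""
    dy = (lat - center_lat) * MILES_PER_DEGREE_LAT
    dx = (lon - center_lon) * MILES_PER_DEGREE_LAT * math.cos(math.radians(center_lat))
    return dx, dy


def heatmap(points, center_lon, center_lat, radius_miles=50.0, cell_miles=10.0):
    """Sum point weights per grid cell for points within radius_miles of the center.

    points is an iterable of (lon, lat, weight) tuples, where weight is the
    demographic value being mapped (hypothetical input layout).
    """
    cells = Counter()
    for lon, lat, weight in points:
        dx, dy = offset_miles(lon, lat, center_lon, center_lat)
        if math.hypot(dx, dy) <= radius_miles:
            cell = (math.floor(dx / cell_miles), math.floor(dy / cell_miles))
            cells[cell] += weight
    return cells
```

Each key of the result is a (column, row) cell index relative to the location, so the output stays small and vector-like until it is rendered.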

“Traditional Way”

• 14 Days Later

• 850GB Raster

Gotta Be A Better Way!

Hadoop

$> cat input | map | sort | reduce > out
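
The shell pipeline above can be imitated in a single Python process to show what each stage does. This is a minimal sketch, not Hadoop code: the record layout `(location_id, demographic)` and the function names are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records):
    """Emit one ((location_id, demographic), 1) pair per input record."""
    for location_id, demographic in records:
        yield (location_id, demographic), 1


def reduce_phase(sorted_pairs):
    """Sum the values of each run of identical keys, as a reducer would."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def run(records):
    """input | map | sort | reduce, executed in one process."""
    return list(reduce_phase(sorted(map_phase(records))))
```

The sort step is what guarantees that the reducer sees all values for one key together; in Hadoop the framework performs that step between the map and reduce phases.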

Advantage

• Parallelism

• Fast Input Stream

• Fast Computational Geometry

• Distributed Cache

Vector / Raster

Cooperative Processing

g.beginGradientFill(GradientType.RADIAL, [ 0xFF0000, 0x0000FF ], ...);
g.drawRect(x, y, 200, 200);
g.endFill();
bitmapData.draw(shape, null, null, BlendMode.SCREEN, null, true);

Where To Run 10 Nodes?

~238 MB Vector

vs.

~850 GB Raster

Best Visualizer?

What Is Big Data?

Great Story Telling Tool!

Data Democratizer!

Beyond Dashboard!

You can have the best ML, the best model, and the best team; all are useless if you cannot tell a story of the results!

What Is Big Data?

(academic)

Beyond Traditional Means!

Traditional Processing

Traditional Database

•Too Big

•Too Fast

•Unstructured

Forcing new ways of thinking!

Big Data Sources...

Catch-all words, just like “Cloud” was 3 years ago!

WebLogs

“Internet Of Things”

Imagery

Health Records

VOLUME

VELOCITY VARIETY

Volume

• Very Large Amount

• More Parameters

• Multi Node

• Storage

• Processing

- Simple math is more effective with large parameters

- Scalable storage

- Program to data rather than data to program

Velocity

• Rate of digital flow

• Streaming

• Event Processing

• Feedback Loop

• Recommendations

- Clicks, locations

- Mobile / Smartphones

- Last 5 min snapshot of traffic is no good when crossing the street

- CERN

Velocity Engines

• IBM InfoSphere Streams

• Twitter Storm

• Apache S4

Variety

• Unstructured

• Incomplete

• Semantically Different

Data is messy

Storage Variety

• NoSQL

• Columnar (HBase)

• Key/Value (Redis)

• Document (MongoDB)

• Graph (Neo4J)

Hadoop

HDFS

• Multi-TB Storage

• Inexpensive Nodes

• Fault Tolerant

• Concurrent Reading

• Brings Programs To Data

MapReduce

• Software Framework

• Parallel Processing

• Jobs Executed on HDFS

• Java / Python / C++

• Spatial Libraries

MapReduce Job

input | map | sort | reduce | output

Java Jars packaged and sent to data nodes for execution

Apache Hive

“SQL”

MapReduce Job

HDFS: CSV, TSV, JSON, BINARY

MapReduce

hive> select * from cities where country='lebanon';

Spatial Storage

• CSV, TSV (Lat, Lon)

• Esri JSON format

• {geometry:{x:-123,y:45},attributes:{}}
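
A record in the JSON layout above could be read roughly like this. It is a sketch under assumptions: the slide writes the keys without quotes, whereas this example expects strictly quoted JSON with one feature per line, and `parse_feature` is a hypothetical helper name.

```python
import json


def parse_feature(line):
    """Return (x, y, attributes) from one JSON feature line.

    Assumes a point geometry with "x" and "y" members, as in the slide example.
    """
    feature = json.loads(line)
    geometry = feature["geometry"]
    return geometry["x"], geometry["y"], feature.get("attributes", {})
```

For example, `parse_feature('{"geometry":{"x":-123,"y":45},"attributes":{}}')` yields the coordinates and an empty attribute dictionary, which a mapper could then pass to a spatial test.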

• Custom

What About Spatial?

User Defined Functions

• select tolower("ESRI");

• select * from mytable where cos(rad) < 0.1;

Spatial UDF!

select * from cities where near(x,y,-84.2,39.4);
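
The slides do not show how `near` is implemented. One plausible reading is a great-circle distance test, sketched below in Python. The assumptions here are that x is longitude and y is latitude, that distance is measured with the haversine formula, and that the threshold is 50 miles (the `radius_miles` parameter is hypothetical, since the query above takes no radius).

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def near(x, y, cx, cy, radius_miles=50.0):
    """True if (x, y) lies within radius_miles of (cx, cy)."""
    return haversine_miles(x, y, cx, cy) <= radius_miles
```

A predicate like this is evaluated once per row, so it parallelizes across mappers without any coordination.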

select * from cities where contains(x,y,'#mypolys');
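
A point-in-polygon test of the kind `contains` suggests can be sketched with ray casting. This is an assumption about the technique, not the talk's implementation: how '#mypolys' is resolved to polygons is not shown in the slides, so here the polygons are passed in directly as lists of (x, y) vertices, and points exactly on a boundary are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a ray going in the +x direction from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains(x, y, polygons):
    """True if (x, y) falls inside any polygon of the set."""
    return any(point_in_polygon(x, y, polygon) for polygon in polygons)
```

With a small polygon set, every mapper can hold the polygons in memory and test each incoming point locally.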

Python GeoProcessing

HDFS RDBMS

“small data” “big data”

Hadoop Tools

ArcMap Catalog

Demo Time

The “Zoo”

• Pig - high-level language for Hadoop

• HBase - real-time random access to HDFS

• Flume - streaming data flow

• Mahout - machine learning

• ZooKeeper - distributed state management

Processing Evolution

• Transactional - Batch

• Operational - Dashboard

• Analytical - Exploratory

• Intelligent - Real-Time, Predictive

Fixed Schema

Variable Schema

“[T]here are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don't know. But there are also unknown unknowns – there are things we do not know we don't know.”

—United States Secretary of Defense, Donald Rumsfeld

[email protected]

@mraad

Upcoming Events

Date                 Event                                              Location
March 21, 2013       Esri DC Meet Up – Big Data & Location Analytics    Washington, DC
April 18, 2013       Esri DC Meet Up                                    Washington, DC
March 23–26, 2013    Esri Partner Conference                            Palm Springs, CA
March 25–28, 2013    Esri Developer Summit                              Palm Springs, CA
July 6–9, 2013       Esri National Security Summit                      San Diego, CA
July 8–12, 2013      Esri International User Conference                 San Diego, CA