Data Integration with Spark | Mar 18, 2015




Data Integration with Spark | Mar 18, 2015


2

ABOUT ME

Brian Schimpf

Palantirian since 2007

Director of Engineering


3

Founded from PayPal in 2004

PALANTIR

Human-computer symbiosis, data integration

Government space, counter-terrorism


4

PALANTIR


THE PROBLEM: TRADER OVERSIGHT


6

1995: BARINGS BANK, €825M loss

2010: SOCIETE GENERALE, €5B loss

2012: UBS, £1.4B loss, £30M fine

2013: GOLDMAN SACHS, unauthorized $8B position, $120M loss

THE PROBLEM


7

LACK OF CONTEXT
Single-point alerting drowns analysts in noise and fails to capture complex patterns of behavior.

HIGH NOISE, LOW SIGNAL
The majority of incidents begin with a small breach that may not look very different from normal trading activity.

DATA SCALE & DIVERSITY
The data is both massive in scale and incredibly diverse (including structured and unstructured data).

THE PROBLEM


OUR SOLUTION: TRADER OVERSIGHT


9

IMPROVE THE RISK MODEL


10

IMPROVE THE INTERFACE


11

DATA INTEGRATION

ANALYTICS

DECISIONS

PALANTIR IN PRACTICE


12

FOUNDRY

Developer tools for data

Variety of incoming data

Manage lots of transformations

Spark & open source


13

LOGS, JDBC, STREAMS

Dataset in HDFS/S3

SNAPSHOT – VER 1: /dataset/1/main.avro

UPDATE – VER 2: /dataset/2/main.avro

UPDATE – VER 3: /dataset/3/main.avro

FOUNDRY
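
The versioned layout on this slide can be sketched in a few lines of Python. This is only an illustration, not Foundry's implementation: the function names `version_path` and `files_to_read` are hypothetical, and the sketch assumes that version 1 is a full snapshot and that each later version is an incremental update a reader replays in order.

```python
from typing import List


def version_path(dataset_root: str, version: int) -> str:
    """Path of the Avro file for one version, following /dataset/<n>/main.avro."""
    if version < 1:
        raise ValueError("versions start at 1")
    return f"{dataset_root.rstrip('/')}/{version}/main.avro"


def files_to_read(dataset_root: str, current_version: int) -> List[str]:
    """Files a reader would need for the given version.

    Assumption (not stated on the slide): version 1 is the snapshot and every
    later version is an update, so all files from 1 to current are replayed.
    """
    return [version_path(dataset_root, v) for v in range(1, current_version + 1)]


if __name__ == "__main__":
    for path in files_to_read("/dataset", 3):
        print(path)
```

Writing each version to its own directory means a new version never modifies an existing file, so older versions stay readable as they were.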


14

DATASET A

DATASET B

DATASET C

VIEW D: Spark Transform

VIEW E: Python Script

VIEW F: SparkSQL

SchemaRDD

Pandas Dataframe

FOUNDRY
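
One way to picture views derived from datasets is as a small dependency graph that is evaluated in topological order. The sketch below is a plain-Python illustration under stated assumptions: the slide does not say which datasets feed which view, so the wiring (D from A and B, E from C, F from D and E) is hypothetical, and the lambdas merely stand in for a Spark transform, a Python script, and a SparkSQL query. `build_all` is an invented name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Tuple

# view name -> (names of its inputs, function computing the view from them)
ViewSpec = Dict[str, Tuple[List[str], Callable[..., Any]]]


def build_all(datasets: Dict[str, Any], views: ViewSpec) -> Dict[str, Any]:
    """Compute every view after all of its inputs are available."""
    graph = {name: set(inputs) for name, (inputs, _) in views.items()}
    results = dict(datasets)
    # static_order() yields each node after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue  # a raw dataset: nothing to compute
        inputs, transform = views[name]
        results[name] = transform(*(results[i] for i in inputs))
    return results


if __name__ == "__main__":
    datasets = {"A": [1, 2, 3], "B": [10, 20], "C": [5]}
    views: ViewSpec = {
        "D": (["A", "B"], lambda a, b: a + b),             # stand-in for a Spark transform
        "E": (["C"], lambda c: [x * 2 for x in c]),        # stand-in for a Python script
        "F": (["D", "E"], lambda d, e: sorted(d + e)),     # stand-in for a SparkSQL query
    }
    print(build_all(datasets, views)["F"])
```

Because every view declares its inputs, the evaluation order falls out of the graph rather than being hand-maintained.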


15

DATASET A, DATASET B, DATASET C: CURRENT VERSION 1

VIEW D, VIEW E, VIEW F: CURRENT VERSION 1 -> DEP VERSION = 1 for each dependency

Dependency versions shown: DEP A VERSION = 1, DEP B VERSION = 1, DEP C VERSION = 1, DEP D VERSION = 1, DEP E VERSION = 1

FOUNDRY


16

DATASET A, DATASET B, DATASET C: one dataset is now at CURRENT VERSION 2, the others remain at CURRENT VERSION 1

VIEW D, VIEW E, VIEW F: CURRENT VERSION 1 -> DEP VERSION = 1 for each dependency

Dependency versions shown: DEP A VERSION = 1, DEP B VERSION = 1, DEP C VERSION = 1, DEP D VERSION = 1, DEP E VERSION = 1

FOUNDRY
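
The two slides above show every node carrying a current version and every view recording the dependency versions it was last built against. When a dataset moves to version 2, any view whose recorded version for that dataset is still 1 is out of date. A minimal sketch of that check follows. It is not Foundry's algorithm: `rebuild_stale` is a hypothetical name, the dependency wiring (D on A and B, E on C, F on D and E) is assumed, and treating dataset A as the one that changed is also an assumption.

```python
from typing import Dict, List


def rebuild_stale(
    current: Dict[str, int],
    built_from: Dict[str, Dict[str, int]],
    order: List[str],
) -> List[str]:
    """Rebuild views whose recorded dependency versions are out of date.

    `order` must list views so that each view appears after the views it
    depends on. Both dictionaries are updated in place.
    """
    rebuilt = []
    for view in order:
        deps = built_from[view]
        if any(current[dep] != seen for dep, seen in deps.items()):
            # "Rebuilding" here only records the new dependency versions
            # and bumps the view's own version.
            built_from[view] = {dep: current[dep] for dep in deps}
            current[view] += 1
            rebuilt.append(view)
    return rebuilt


if __name__ == "__main__":
    current = {"A": 2, "B": 1, "C": 1, "D": 1, "E": 1, "F": 1}  # A moved to version 2
    built_from = {
        "D": {"A": 1, "B": 1},
        "E": {"C": 1},
        "F": {"D": 1, "E": 1},
    }
    print(rebuild_stale(current, built_from, ["D", "E", "F"]))
```

In this sketch a change to A rebuilds D, and D's new version in turn makes F stale, while E is left alone because C did not change.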


17

FOUNDRY


18

DATASET A

VIEW D

VIEW E

VIEW F DATASET B

DATASET C

Inspect the data - SchemaRDD

MERIDIAN


20

Take advantage of Spark improvements

Human/Computer Symbiosis

FUTURE


THANK YOU!

WE’RE RECRUITING! palantir.com/jobs

Questions? [email protected]