Page 1

www.twosigma.com

Huohua 火花: Distributed Time Series Analysis Framework for Spark

August 28, 2017

Wenbo Zhao

Spark Summit 2016

Page 2

About Me

Software Engineer @

Focus on analytics-related tools, libraries, and systems

Page 3

We view everything as a time series

Stock market prices

Temperatures

Sensor logs

Presidential polls

[Figure: S&P 500, y-axis $0.0 to $2,500.0, x-axis in three-year steps from 1/3/1950 to 1/3/2016]

[Figure: temperatures for New York and San Francisco, y-axis 50°F to 100°F]

Page 4

What is a time series?

A sequence of observations obtained in successive time order

Our goal is to forecast future values given past observations

[Figure: corn price over one day, x-axis 8:00, 11:00, 14:00, 17:00, 20:00; price labels $8.90, $8.95, $8.90, $9.06, $9.10; the next corn price is marked with a question mark]

Page 5

Multivariate time series

We can forecast better by joining multiple time series

Temporal join is a fundamental operation for time series analysis

Huohua enables fast, distributed temporal joins of large-scale, unaligned time series

[Figure: corn price (labels $8.90, $8.95, $8.90, $9.06, $9.10) and temperature (labels 75°F, 72°F, 71°F, 72°F, 68°F, 67°F, 65°F) over the same day, x-axis 8:00 to 20:00]

Page 6

What is temporal join?

A particular join function defined by a matching criterion over time

Examples of criteria

look-backward – find the most recent observation in the past

look-forward – find the closest observation in the future

[Figure: two panels, each matching an observation in time series 1 to time series 2; one panel is labeled look-forward, the other look-backward]
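
The two criteria can be sketched for a single lookup as below. This is a minimal illustration, not Huohua's API: the function names are hypothetical, times are minutes since midnight, and it assumes an observation at exactly the query time counts as a match (the slide does not say).

```python
from bisect import bisect_left, bisect_right

def look_backward(times, t):
    """Index of the most recent observation at or before t, or None."""
    i = bisect_right(times, t) - 1
    return i if i >= 0 else None

def look_forward(times, t):
    """Index of the closest observation at or after t, or None."""
    i = bisect_left(times, t)
    return i if i < len(times) else None

# Sorted observation times of "time series 2": 08:00 and 11:00.
series_2 = [8 * 60, 11 * 60]

print(look_backward(series_2, 10 * 60))  # 0, the 08:00 observation
print(look_forward(series_2, 10 * 60))   # 1, the 11:00 observation
```

Both lookups rely on the series being sorted by time, which is what lets a binary search replace a full scan.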

Page 7

Temporal join with look-backward criteria

time        weather
08:00 AM    60 °F
10:00 AM    70 °F
12:00 PM    80 °F

time        corn price
08:00 AM
11:00 AM

time        weather    corn price
08:00 AM
10:00 AM
12:00 PM

Page 8

Temporal join with look-backward criteria

[Figure: the tables from Page 7; the 08:00 AM row of the joined table now shows weather 60 °F]

Page 9

Temporal join with look-backward criteria

[Figure: the tables from Page 7; the 08:00 AM and 10:00 AM rows of the joined table now show weather 60 °F and 70 °F]

Page 10

Temporal join with look-backward criteria

[Figure: the tables from Page 7; all three rows of the joined table now show their weather values]

Page 11

Temporal join with look-backward criteria

[Figure: stacked copies of the corn price and weather tables, representing many data sets]

time        weather
08:00 AM    60 °F
10:00 AM    70 °F
12:00 PM    80 °F

time        corn price
08:00 AM
11:00 AM

time        weather    corn price
08:00 AM    60 °F      (price as of 08:00 AM)
10:00 AM    70 °F      (price as of 08:00 AM)
12:00 PM    80 °F      (price as of 11:00 AM)

Hundreds of thousands of data sources with unaligned timestamps

Thousands of market data sets

We need fast and scalable distributed temporal join

Page 12

Issues with existing solutions

A single time series may not fit into a single machine

Forecasting may involve hundreds of time series

Existing packages don’t support temporal join or can’t handle large time series

MATLAB, R, SAS, pandas

Even Spark-based solutions fall short

PairRDDFunctions, DataFrame/Dataset, spark-ts

Page 13

Huohua: a new time series library for Spark

Goal

provide a collection of functions to manipulate and analyze time series at scale

group, temporal join, summarize, aggregate …

How

build a time series aware data structure

extending RDD to TimeSeriesRDD

optimize using temporal locality

reduce shuffling

reduce memory pressure by streaming

Page 14

What is a TimeSeriesRDD in Huohua?

TimeSeriesRDD extends RDD to represent time series data

associates a time range with each partition

tracks partitions’ time ranges through operations

preserves the temporal order

[Figure labels: TimeSeriesRDD operations; time series functions]

Page 15

TimeSeriesRDD: an RDD representing time series

time       temperature
6:00 AM    60°F
6:01 AM    61°F
…          …
7:00 AM    70°F
7:01 AM    71°F
…          …
8:00 AM    80°F
8:01 AM    81°F
…          …

RDD partitions:
(6:00 AM, 60°F) (6:01 AM, 61°F)
(7:00 AM, 70°F) (7:01 AM, 71°F)
(8:00 AM, 80°F) (8:01 AM, 81°F)

Page 16

TimeSeriesRDD: an RDD representing time series

time       temperature
6:00 AM    60°F
6:01 AM    61°F
…          …
7:00 AM    70°F
7:01 AM    71°F
…          …
8:00 AM    80°F
8:01 AM    81°F
…          …

TimeSeriesRDD partitions:
range: [6:00 AM, 7:00 AM)    (6:00 AM, 60°F) (6:01 AM, 61°F)
range: [7:00 AM, 8:00 AM)    (7:00 AM, 70°F) (7:01 AM, 71°F)
range: [8:00 AM, ∞)          (8:00 AM, 80°F) (8:01 AM, 81°F)
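
The idea of attaching a half-open time range to every partition can be sketched in plain Python as follows. The `Partition` class and `partition_for` helper are hypothetical names for illustration only, not Huohua's actual types; times are minutes since midnight.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Partition:
    begin: float   # inclusive lower bound of the time range
    end: float     # exclusive upper bound (math.inf means open-ended)
    rows: list = field(default_factory=list)  # (time, value), sorted by time

def partition_for(partitions, t):
    """Return the index of the partition whose [begin, end) range contains t."""
    for i, p in enumerate(partitions):
        if p.begin <= t < p.end:
            return i
    return None

# The three ranges from the slide, in minutes since midnight.
partitions = [
    Partition(360, 420, [(360, 60), (361, 61)]),       # [6:00 AM, 7:00 AM)
    Partition(420, 480, [(420, 70), (421, 71)]),       # [7:00 AM, 8:00 AM)
    Partition(480, math.inf, [(480, 80), (481, 81)]),  # [8:00 AM, inf)
]

print(partition_for(partitions, 425))  # 1
```

Because the ranges are disjoint and ordered, knowing a timestamp is enough to know which partition can hold it, without inspecting any rows.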

Page 17

Group function

A group function groups rows with exactly the same timestamps

time       city             temperature    group
1:00 PM    New York         70°F           group 1
1:00 PM    San Francisco    60°F           group 1
2:00 PM    New York         71°F           group 2
2:00 PM    San Francisco    61°F           group 2
3:00 PM    New York         72°F           group 3
3:00 PM    San Francisco    62°F           group 3
4:00 PM    New York         73°F           group 4
4:00 PM    San Francisco    63°F           group 4
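
Grouping by exact timestamp over time-sorted rows can be sketched with the standard library. This is an illustration of the idea on the slide's table, not Huohua code; `group_by_time` is a hypothetical helper.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ("1:00 PM", "New York", 70), ("1:00 PM", "San Francisco", 60),
    ("2:00 PM", "New York", 71), ("2:00 PM", "San Francisco", 61),
    ("3:00 PM", "New York", 72), ("3:00 PM", "San Francisco", 62),
    ("4:00 PM", "New York", 73), ("4:00 PM", "San Francisco", 63),
]

def group_by_time(sorted_rows):
    """Yield (time, rows) for each run of equal timestamps in time-sorted input."""
    for time, run in groupby(sorted_rows, key=itemgetter(0)):
        yield time, list(run)

groups = list(group_by_time(rows))
print(len(groups))  # 4
```

`itertools.groupby` only merges adjacent equal keys, so this works precisely because the input is already in temporal order.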

Page 18

Group function

A group function groups rows with nearby timestamps

time       city             temperature    group
1:00 PM    New York         70°F           group 1
1:00 PM    San Francisco    60°F           group 1
2:00 PM    New York         71°F           group 1
2:00 PM    San Francisco    61°F           group 1
3:00 PM    New York         72°F           group 2
3:00 PM    San Francisco    62°F           group 2
4:00 PM    New York         73°F           group 2
4:00 PM    San Francisco    63°F           group 2
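
One way to read "nearby" is "falling into the same fixed-width time bucket". The sketch below assumes that reading, with a hypothetical two-hour bucket starting at 1:00 PM; the slide does not define the rule, so the bucket width and origin are assumptions.

```python
from itertools import groupby

def group_nearby(sorted_rows, origin, width):
    """Group time-sorted rows whose times share a bucket of the given width."""
    def bucket(row):
        return (row[0] - origin) // width
    return [list(run) for _, run in groupby(sorted_rows, key=bucket)]

# (hour of day, city, temperature in °F), taken from the slide's table.
rows = [
    (13, "New York", 70), (13, "San Francisco", 60),
    (14, "New York", 71), (14, "San Francisco", 61),
    (15, "New York", 72), (15, "San Francisco", 62),
    (16, "New York", 73), (16, "San Francisco", 63),
]

groups = group_nearby(rows, origin=13, width=2)
print([len(g) for g in groups])  # [4, 4]
```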

Page 19

Group in Spark

Groups rows with exactly the same timestamps

[Figure: an RDD whose rows carry the timestamps 1:00PM, 2:00PM, 2:00PM, 1:00PM, 3:00PM, 3:00PM, 4:00PM, 4:00PM]

Page 20

Group in Spark

Data is shuffled and materialized

[Figure: RDD, then groupBy, then RDD; the input rows are the same as on Page 19]

Page 21

Group in Spark

Data is shuffled and materialized

[Figure: groupBy in progress; the 1:00PM and 3:00PM rows are already paired, the 2:00PM and 4:00PM rows are still moving]

Page 22

Group in Spark

Data is shuffled and materialized

[Figure: after groupBy the rows are paired by timestamp: 1:00PM 1:00PM, 2:00PM 2:00PM, 3:00PM 3:00PM, 4:00PM 4:00PM]

Page 23

Group in Spark

Temporal order is not preserved

[Figure: RDD, then groupBy, then RDD; the input rows next to the grouped pairs 1:00PM, 2:00PM, 3:00PM, 4:00PM]

Page 24

Group in Spark

Another sort is required

[Figure: RDD, then groupBy, then RDD, then sortBy, then RDD]

Page 25

Group in Spark

Another sort is required

[Figure: the grouped RDD holds the pairs in the order 2:00PM, 4:00PM, 1:00PM, 3:00PM before sortBy runs]

Page 26

Group in Spark

Back to correct temporal order

[Figure: after sortBy the pairs are ordered 1:00PM, 2:00PM, 3:00PM, 4:00PM]

Page 27

Group in Spark

Back to temporal order

[Figure: RDD, then groupBy, then RDD, then sortBy, then RDD; both the grouped and the sorted RDD show the pairs 1:00PM, 2:00PM, 3:00PM, 4:00PM]
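
The groupBy-then-sortBy sequence can be imitated with a toy simulation in plain Python. This is not Spark code: a simple modulus stands in for Spark's hash partitioner, hours 13 to 16 stand in for 1:00PM to 4:00PM, and the letter values are hypothetical row labels.

```python
from collections import defaultdict

def group_by_shuffle(partitions, num_out):
    """Send every row to an output partition chosen by its key, then group there."""
    out = [defaultdict(list) for _ in range(num_out)]
    for part in partitions:
        for time, value in part:
            # The "shuffle": a row's destination depends only on its key.
            out[time % num_out][time].append(value)
    return [list(groups.items()) for groups in out]

rdd = [
    [(13, "a"), (14, "b"), (14, "c"), (13, "d")],
    [(15, "e"), (15, "f"), (16, "g"), (16, "h")],
]

grouped = group_by_shuffle(rdd, num_out=2)
flat = [time for part in grouped for time, _ in part]
print(flat)          # [14, 16, 13, 15]: grouped, but no longer in time order
print(sorted(flat))  # [13, 14, 15, 16]: a second pass restores the order
```

Every row may move to a different partition, and the keys come out in partitioner order rather than time order, which is why a separate sort is needed afterwards.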

Page 28

Group in Huohua

Data is grouped locally as streams

[Figure: a TimeSeriesRDD whose rows carry the timestamps 1:00PM, 2:00PM, 2:00PM, 1:00PM, 3:00PM, 3:00PM, 4:00PM, 4:00PM]

Page 30

Group in Huohua

Data is grouped locally as streams

[Figure: grouping in progress inside the TimeSeriesRDD; the 3:00PM rows are paired]

Page 31

Group in Huohua

Data is grouped locally as streams

[Figure: grouping in progress inside the TimeSeriesRDD; the 3:00PM rows and the 4:00PM rows are paired]
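
Local, streaming grouping can be sketched as a generator that walks one partition at a time. The sketch assumes each partition is sorted by time and that partition time ranges are disjoint and ordered; `group_stream` and the sample rows are hypothetical, not Huohua's implementation.

```python
def group_stream(sorted_rows):
    """Lazily yield (time, values) groups from one time-sorted partition."""
    current, bucket = None, []
    for time, value in sorted_rows:
        if bucket and time != current:
            yield current, bucket   # the previous group is complete
            bucket = []
        current = time
        bucket.append(value)
    if bucket:
        yield current, bucket

# Hours 13 to 16 stand in for 1:00PM to 4:00PM.
partitions = [
    [(13, "a"), (13, "d"), (14, "b"), (14, "c")],  # time range [13, 15)
    [(15, "e"), (15, "f"), (16, "g"), (16, "h")],  # time range [15, inf)
]

# Each partition is grouped on its own; no row crosses a partition boundary.
result = [list(group_stream(p)) for p in partitions]
print([t for part in result for t, _ in part])  # [13, 14, 15, 16]
```

Only one group is held in memory per partition at any moment, and the output is already in temporal order, so neither a shuffle nor a follow-up sort is needed in this sketch.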

Page 32

Benchmark for group

Running time of count after group

16 executors (10G memory and 4 cores per executor)

data is read from HDFS

[Figure: running time, y-axis 0s to 100s, against input size, x-axis 20M to 100M, for RDD, DataFrame, and TimeSeriesRDD]

Page 33

Temporal join

A temporal join function is defined by a matching criterion over time

A typical matching criterion has two parameters

direction – whether it should look backward or look forward

window – how far it should look backward or look forward

[Figure: a look-backward temporal join with its window marked]
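
The two parameters can be sketched as arguments to a single function. This is a hypothetical signature for illustration, not Huohua's API; it joins timestamps only, and it assumes the window bound is inclusive.

```python
from bisect import bisect_left, bisect_right

def temporal_join(left, right, direction="backward", window=0):
    """Pair each left time with the nearest right time in the given direction,
    at most `window` away; the match is None when nothing qualifies."""
    out = []
    for t in left:
        if direction == "backward":
            i = bisect_right(right, t) - 1
            match = right[i] if i >= 0 and t - right[i] <= window else None
        else:
            i = bisect_left(right, t)
            match = right[i] if i < len(right) and right[i] - t <= window else None
        out.append((t, match))
    return out

# Sorted timestamps in hours (sample data).
left, right = [1, 2, 4, 5], [1, 3, 5]

print(temporal_join(left, right, "backward", window=1))
# [(1, 1), (2, 1), (4, 3), (5, 5)]
print(temporal_join(left, right, "forward", window=1))
# [(1, 1), (2, 3), (4, 5), (5, 5)]
```

With `window=0` only exact timestamp matches survive, so an ordinary equi-join on time is the special case of a temporal join with an empty window.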

Page 35

Temporal join

Temporal join with a look-backward criterion and a window of length 1

[Figure: two time series; one has rows at 1:00AM, 2:00AM, 4:00AM, 5:00AM and the other has rows at 1:00AM, 3:00AM, 5:00AM]

Page 36

Temporal join

Temporal join with a look-backward criterion and a window of length 1

How do we do a temporal join on TimeSeriesRDDs?

[Figure: the two time series from Page 35, each held in a TimeSeriesRDD]

Page 37

Temporal join in Huohua

Temporal join with a look-backward criterion and a window of length 1

Partition time space into disjoint intervals

[Figure: the two input TimeSeriesRDDs and the joined TimeSeriesRDD]

Page 38

Temporal join in Huohua

Temporal join with a look-backward criterion and a window of length 1

Build the dependency graph for the joined TimeSeriesRDD

[Figure: the two input TimeSeriesRDDs and the joined TimeSeriesRDD]

Page 39

Temporal join in Huohua

Temporal join with a look-backward criterion and a window of length 1

Join data as streams per partition

[Figure: the first joined pair, (1:00AM, 1:00AM), appears in the joined TimeSeriesRDD]

Page 40

Temporal join in Huohua

Temporal join with a look-backward criterion and a window of length 1

Join data as streams

[Figure: joined so far: (1:00AM, 1:00AM), (2:00AM, 1:00AM)]

Page 41

Temporal join in Huohua

Temporal join with a look-backward criterion and a window of length 1

Join data as streams

[Figure: joined so far: (1:00AM, 1:00AM), (2:00AM, 1:00AM), (4:00AM, 3:00AM)]

Page 42

Temporal join in Huohua

Temporal join with a look-backward criterion and a window of length 1

Join data as streams

[Figure: the joined TimeSeriesRDD holds (1:00AM, 1:00AM), (2:00AM, 1:00AM), (4:00AM, 3:00AM), (5:00AM, 5:00AM)]
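
The three steps named on these slides (partition the time axis, build partition dependencies, join as streams) can be sketched as below. It is a simplified sketch under stated assumptions, not Huohua's implementation: the interval boundaries, the `dependencies` and `stream_join` helpers, and the one-hour reading of "window of length 1" are all hypothetical.

```python
import math

def dependencies(left_ranges, right_ranges, window):
    """For each left range [b, e), list the right partitions that overlap
    [b - window, e), i.e. the only ones a look-backward join can need."""
    return [[j for j, (rb, re) in enumerate(right_ranges)
             if rb < e and re > b - window]
            for b, e in left_ranges]

def stream_join(left_rows, right_rows, window):
    """Merge two time-sorted streams, pairing each left time with the latest
    right time at or before it, if that time is within `window`."""
    out, j, last = [], 0, None
    for t in left_rows:
        while j < len(right_rows) and right_rows[j] <= t:
            last = right_rows[j]
            j += 1
        ok = last is not None and t - last <= window
        out.append((t, last if ok else None))
    return out

ranges = [(1, 3), (3, 5), (5, math.inf)]  # disjoint intervals, in hours
left = [[1, 2], [4], [5]]                 # left timestamps per interval
right = [[1], [3], [5]]                   # right timestamps per interval

deps = dependencies(ranges, ranges, window=1)
print(deps)    # [[0], [0, 1], [1, 2]]

joined = [stream_join(rows, [r for j in d for r in right[j]], 1)
          for rows, d in zip(left, deps)]
print(joined)  # [[(1, 1), (2, 1)], [(4, 3)], [(5, 5)]]
```

Each joined partition reads only the right partitions listed in its dependency entry, so the work stays local to neighboring time ranges instead of requiring a full shuffle.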

Page 43

Benchmark for temporal join

Running time of count after temporal join

16 executors (10G memory and 4 cores per executor)

data is read from HDFS

[Figure: running time, y-axis 0s to 100s, against input size, x-axis 20M to 100M, for RDD, DataFrame, and TimeSeriesRDD]

Page 44

Functions over TimeSeriesRDD

group functions such as window, intervalization, etc.

temporal joins such as look-forward, look-backward, etc.

summarizers such as average, variance, z-score, etc. over

windows

intervals

cycles
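
A summarizer over windows can be sketched as a rolling average over a look-back time window. The function name and the inclusive window `[t - window, t]` are assumptions made for illustration; this is not Huohua's summarizer API.

```python
from collections import deque
from statistics import mean

def windowed_mean(rows, window):
    """For each (time, value) in time-sorted rows, average the values whose
    time lies in [t - window, t]."""
    buf, out = deque(), []
    for t, v in rows:
        buf.append((t, v))
        while buf[0][0] < t - window:
            buf.popleft()   # drop rows that fell out of the window
        out.append((t, mean(x for _, x in buf)))
    return out

rows = [(1, 70.0), (2, 71.0), (3, 72.0), (4, 73.0)]  # (hour, temperature in °F)
print(windowed_mean(rows, window=1))
# [(1, 70.0), (2, 70.5), (3, 71.5), (4, 72.5)]
```

The deque keeps only the rows still inside the window, so the summarizer streams through sorted data without materializing the whole series.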

Page 45

Open Source

Not quite yet …

https://github.com/twosigma

Page 46

Future work

DataFrame / Dataset integration

Speed up

Richer APIs

Python bindings

More summarizers

Page 47

Key contributors

Christopher Aycock

Jonathan Coveney

Jin Li

David Medina

David Palaitis

Ris Sawyer

Leif Walsh

Wenbo Zhao

Page 48

Thank you

Q&A

Page 49

This document is being distributed for informational and educational purposes only and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. The information contained herein is not intended to provide, and should not be relied upon for, investment advice. The views expressed herein are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). Such views reflect significant assumptions and subjective judgments of the author(s) of the document and are subject to change without notice. The document may employ data derived from third-party sources. No representation is made as to the accuracy of such information and the use of such information in no way implies an endorsement of the source of such information or its validity.

The copyrights and/or trademarks in some of the images, logos or other material used herein may be owned by entities other than Two Sigma. If so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for identification and comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark does not imply any association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa.