stream processing with kafka in uber, danny yuan

84
STREAM PROCESSING IN UBER MARKETPLACE

Upload: confluent

Post on 21-Apr-2017

2.738 views

Category:

Engineering


4 download

TRANSCRIPT

Page 1: Stream Processing with Kafka in Uber, Danny Yuan

STREAM PROCESSING IN UBER MARKETPLACE

Page 2: Stream Processing with Kafka in Uber, Danny Yuan

~ 68 countries / 350+ cities Transportation as reliable as running water, everywhere, for everyone

2

Page 3: Stream Processing with Kafka in Uber, Danny Yuan

AgendaWhat’s on the menu?

•Use Cases •Problem Space •Overall Architecture •Choices & Tradeoffs •Q & A

Page 4: Stream Processing with Kafka in Uber, Danny Yuan

Use Case: Realtime OLAP

Page 5: Stream Processing with Kafka in Uber, Danny Yuan

There is always need for quick exploration

Page 6: Stream Processing with Kafka in Uber, Danny Yuan

How many open cars in the world, NOW?

Page 7: Stream Processing with Kafka in Uber, Danny Yuan
Page 8: Stream Processing with Kafka in Uber, Danny Yuan

How many UberXs were driving clients in SF in the past 10 minutes by hexagons?

Page 9: Stream Processing with Kafka in Uber, Danny Yuan

How many UberXs were driving clients in SF in the past 10 minutes by hexagons?

Page 10: Stream Processing with Kafka in Uber, Danny Yuan

Driving time and other metrics over time by hexagonal area

Page 11: Stream Processing with Kafka in Uber, Danny Yuan
Page 12: Stream Processing with Kafka in Uber, Danny Yuan

Use Case: Complex Event Processing

Page 13: Stream Processing with Kafka in Uber, Danny Yuan

There are patterns in event streams

Page 14: Stream Processing with Kafka in Uber, Danny Yuan

How many drivers cancel requests more than 3 times in a row within a 10-

minute window?

Page 15: Stream Processing with Kafka in Uber, Danny Yuan

Report riders requesting a pickup 100 miles apart within a half hour window?

Page 16: Stream Processing with Kafka in Uber, Danny Yuan

IF

This —>

Then that —>

● Sigma is similar - but for offline/batch applications

Complex Event Processing

Page 17: Stream Processing with Kafka in Uber, Danny Yuan

Use Case: Supply Positioning

Page 18: Stream Processing with Kafka in Uber, Danny Yuan

Clusters Of Supply & Demand

Page 19: Stream Processing with Kafka in Uber, Danny Yuan

Predicted Health Metrics

Actual Health Metrics

Monitor Marketplace Health

Page 20: Stream Processing with Kafka in Uber, Danny Yuan

Challenges

Page 21: Stream Processing with Kafka in Uber, Danny Yuan

OLAP of Geo-spatial Temporal Data

Reasonably Large Scale

Near Real Time

Page 22: Stream Processing with Kafka in Uber, Danny Yuan

• Indexing, Lookup, Rendering

• Symmetric Neighbors

• Convex & Compact Regions

• Equal Areas

• Equal Shape

Hexagons

Page 23: Stream Processing with Kafka in Uber, Danny Yuan

Scale

Geo Space Vehicle Types Time Status

X X X

Page 24: Stream Processing with Kafka in Uber, Danny Yuan

Granular Geo Areas

Page 25: Stream Processing with Kafka in Uber, Danny Yuan

Granular Geo Areas

Over 10,000 hexagons in a city

Page 26: Stream Processing with Kafka in Uber, Danny Yuan

Multiple Vehicle Types

7 vehicle types

Page 27: Stream Processing with Kafka in Uber, Danny Yuan

Minute-level Time Buckets

1440 minutes in a day

Page 28: Stream Processing with Kafka in Uber, Danny Yuan

Many Driver States

13 driver states

Page 29: Stream Processing with Kafka in Uber, Danny Yuan

Many Cities

300 cities

Page 30: Stream Processing with Kafka in Uber, Danny Yuan

Granular Data

1 day of data: 300 x 10,000 x 7 x 1440 x 13 = 393 billion possible combinations

Page 31: Stream Processing with Kafka in Uber, Danny Yuan

Unknown Query Patterns

Any combination of dimensions

Page 32: Stream Processing with Kafka in Uber, Danny Yuan

Variety of Aggregations - Heatmap

- Top N

- Histogram

- count(), avg(), sum(), percent(), geo

Page 33: Stream Processing with Kafka in Uber, Danny Yuan

Large Data Volume

• Hundreds of thousands of events per second

• At least dozens of fields in each event

Page 34: Stream Processing with Kafka in Uber, Danny Yuan

Multiple TopicsRider States Driver States

Page 35: Stream Processing with Kafka in Uber, Danny Yuan

Let’s build a stream processing pipeline

Page 36: Stream Processing with Kafka in Uber, Danny Yuan

Pipeline Template

Page 37: Stream Processing with Kafka in Uber, Danny Yuan

Event Collection

Page 38: Stream Processing with Kafka in Uber, Danny Yuan

Multiple Event Types with Different Volume

Page 39: Stream Processing with Kafka in Uber, Danny Yuan

Hundreds of Thousands of Events Per Second

Page 40: Stream Processing with Kafka in Uber, Danny Yuan

Events Should Be Available Under a Second

Page 41: Stream Processing with Kafka in Uber, Danny Yuan

Events Should Rarely Get Lost

Page 42: Stream Processing with Kafka in Uber, Danny Yuan

Multiple Consumers

Page 43: Stream Processing with Kafka in Uber, Danny Yuan
Page 44: Stream Processing with Kafka in Uber, Danny Yuan

Natural Choice: Apache Kafka

- Low latency and high throughput

- Persistent events

- Distributes a topic by partitions

- Groups consumers by consumer groups

Page 45: Stream Processing with Kafka in Uber, Danny Yuan
Page 46: Stream Processing with Kafka in Uber, Danny Yuan

Event Processing

Page 47: Stream Processing with Kafka in Uber, Danny Yuan

Transformation

Page 48: Stream Processing with Kafka in Uber, Danny Yuan

Event Transformation Example

(Lat, Long) -> (zipcode, hexagon, S2)

Page 49: Stream Processing with Kafka in Uber, Danny Yuan

Pre-aggregation

Page 50: Stream Processing with Kafka in Uber, Danny Yuan

Joining Multiple Streams

Page 51: Stream Processing with Kafka in Uber, Danny Yuan

Sessionization

Page 52: Stream Processing with Kafka in Uber, Danny Yuan

Multi-Staged Processing

Page 53: Stream Processing with Kafka in Uber, Danny Yuan

Minimum Requirements

- Statement Management

- Checkpointing

- Automatic Resource Management

- Multi-staged processing

Page 54: Stream Processing with Kafka in Uber, Danny Yuan

Apache Samza

Page 55: Stream Processing with Kafka in Uber, Danny Yuan

Why Apache Samza? - DAG on Kafka

- Excellent integration with Kafka

- Built-in checkpointing

- Built-in state management

- Excellent support from our data team

Page 56: Stream Processing with Kafka in Uber, Danny Yuan

Samza Is Conceptually Simple

Page 57: Stream Processing with Kafka in Uber, Danny Yuan

IF

This —>

Then that —>

● Sigma is similar - but for offline/batch applications

Complex Event Processing

Page 58: Stream Processing with Kafka in Uber, Danny Yuan

● Sigma is similar - but for offline/batch applications

Complex Event Processing

Page 59: Stream Processing with Kafka in Uber, Danny Yuan

● Sigma is similar - but for offline/batch applications

Complex Event Processing

Page 60: Stream Processing with Kafka in Uber, Danny Yuan

● Sigma is similar - but for offline/batch applications

Complex Event Processing

Page 61: Stream Processing with Kafka in Uber, Danny Yuan

● Sigma is similar - but for offline/batch applications

Complex Event Processing

Page 62: Stream Processing with Kafka in Uber, Danny Yuan

● Sigma is similar - but for offline/batch applications

Slightly Expanded Version

Page 63: Stream Processing with Kafka in Uber, Danny Yuan

● Sigma is similar - but for offline/batch applications

Slightly Expanded Version

Page 64: Stream Processing with Kafka in Uber, Danny Yuan

● Sigma is similar - but for offline/batch applications

Slightly Expanded Version

Page 65: Stream Processing with Kafka in Uber, Danny Yuan

● Sigma is similar - but for offline/batch applications

Slightly Expanded Version

Page 66: Stream Processing with Kafka in Uber, Danny Yuan
Page 67: Stream Processing with Kafka in Uber, Danny Yuan

Applications

Page 68: Stream Processing with Kafka in Uber, Danny Yuan

Dashboard of Realtime Business Metrics

Page 69: Stream Processing with Kafka in Uber, Danny Yuan

Ad-Hoc Queries

Page 70: Stream Processing with Kafka in Uber, Danny Yuan

Visualization with Streaming

Page 71: Stream Processing with Kafka in Uber, Danny Yuan

Visualization with Streaming

LocationUpdatewherecity=X

LocationUpdatewherecity=Yandvehicle=‘UberX’

100%

100%

100%

10%

5%

Page 72: Stream Processing with Kafka in Uber, Danny Yuan

Visualization with Streaming

LocationUpdatewherecity=X

LocationUpdatewherecity=Yandvehicle=‘UberX’

100%

100%

100%

10%

5%

Page 73: Stream Processing with Kafka in Uber, Danny Yuan

Visualization with Streaming

LocationUpdatewherecity=X

LocationUpdatewherecity=Yandvehicle=‘UberX’

100%

100%

100%

10%

5%

Page 74: Stream Processing with Kafka in Uber, Danny Yuan

Visualization with Streaming

LocationUpdatewherecity=X

LocationUpdatewherecity=Yandvehicle=‘UberX’

100%

100%

100%

10%

5%

Page 75: Stream Processing with Kafka in Uber, Danny Yuan

Visualization with Streaming

LocationUpdatewherecity=X

LocationUpdatewherecity=Yandvehicle=‘UberX’

100%

100%

100%

10%

5%

Page 76: Stream Processing with Kafka in Uber, Danny Yuan

Visualization with Streaming

LocationUpdatewherecity=‘SF’

LocationUpdatewherecity=‘LA’andvehicle

10%

5%

100% 100%

Page 77: Stream Processing with Kafka in Uber, Danny Yuan

Ad-hoc Exploration

Page 78: Stream Processing with Kafka in Uber, Danny Yuan

A Few Trade-Offs

Page 79: Stream Processing with Kafka in Uber, Danny Yuan

Lambda vs Kappa

Page 80: Stream Processing with Kafka in Uber, Danny Yuan

We Use Lambda - Spark + HDFS/S3 for batch processing - Yes, it is painful, but

- We may need to go way back due to change of business requirements

- Batch process can run faster — they scale differently - It was not easy to start a new stream processing instance

Page 81: Stream Processing with Kafka in Uber, Danny Yuan

Processing by Event Time Is Not Always Easy

Page 82: Stream Processing with Kafka in Uber, Danny Yuan

Leverage The Storage Layer

Page 83: Stream Processing with Kafka in Uber, Danny Yuan

Dealing with Limitation of Samza -No broadcasting. We have to override SystemStreamPartitionGrouper

-No dynamic topology. Can’t have arbitrary number of

nested CEP queries

-Tedious configuration and deployment of jobs. In house

code-gem and deployment solution

Page 84: Stream Processing with Kafka in Uber, Danny Yuan

Thank You