![Page 1: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/1.jpg)
Near Real-Time Analytics
Challenges & Lessons at Uber Engineering
![Page 2: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/2.jpg)
U B E R | Data
Chinmay Soman @ChinmaySoman
● Staff Software Engineer @ Uber
● Tech Lead on Streaming Platform
● Background in distributed storage and filesystems
● Apache Samza Committer, PMC
Quick Introduction
![Page 3: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/3.jpg)
LAS VEGAS SUMMIT 2015
Billions to trillions of messages/day
~ PB of bytes/day
Apache Kafka at Uber
![Page 4: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/4.jpg)
Billions of messages processed / day
100s of TB to PB of bytes processed / day
Near Real-Time Analytics at Uber
![Page 5: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/5.jpg)
What is near real-time?
Decision Time
Seconds to 5 mins
![Page 6: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/6.jpg)
Agenda
● Evolution of Business Needs
● The case for SQL as building block
● New ecosystem using Flink
● The road ahead
![Page 7: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/7.jpg)
Evolution of Business Needs
![Page 8: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/8.jpg)
Case I - Growth Metrics
“How many cars are active right now?”
“What % of trips have been delayed in the last 5 mins?”
“What is the % of Uber X trips taken by Android users?”
![Page 9: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/9.jpg)
Events logged to Kafka
Rider eyeballs
Trip updates
![Page 10: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/10.jpg)
Artemis
Database
KAFKA
Categorize by time
Aggregate value
5 mins
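
The "categorize by time, aggregate value" step can be illustrated with a fixed (tumbling) 5-minute window. This is a minimal sketch, not the Artemis implementation: the `(timestamp, value)` event shape and the function names are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the 5-minute bucket shown on the slide


def window_start(event_ts: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start (epoch seconds) of the fixed window containing event_ts."""
    return event_ts - (event_ts % size)


def aggregate(events):
    """Sum `value` per window. `events` is an iterable of (ts, value) pairs."""
    totals = defaultdict(int)
    for ts, value in events:
        totals[window_start(ts)] += value
    return dict(totals)


# Hypothetical events: (epoch seconds, value)
events = [(0, 1), (299, 2), (300, 5), (601, 7)]
result = aggregate(events)
```

Events at 0 s and 299 s fall into the same bucket, while 300 s opens the next one.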
![Page 11: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/11.jpg)
KAFKA
Artemis
Database
Delayed events
Corruption
Backpressure
Duplicate events
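
Two of the problems listed above, duplicate and delayed events, can be sketched in a few lines. The example below assumes every event carries a unique `id` and an `event_ts` (both field names are hypothetical); it drops repeated ids and buckets by the time the event happened rather than the time it arrived. It does not address corruption or backpressure.

```python
from collections import Counter

WINDOW_SECONDS = 300  # 5-minute windows


def count_by_event_time(events):
    """Count unique events per window, keyed by event time, not arrival time."""
    seen_ids = set()  # grows without bound here; a real job would expire old ids
    counts = Counter()
    for event in events:  # iteration order is arrival order
        if event["id"] in seen_ids:
            continue  # duplicate delivery: skip
        seen_ids.add(event["id"])
        window = event["event_ts"] - event["event_ts"] % WINDOW_SECONDS
        counts[window] += 1
    return dict(counts)


# Hypothetical arrival sequence
arrivals = [
    {"id": "a", "event_ts": 10},
    {"id": "b", "event_ts": 320},
    {"id": "a", "event_ts": 10},   # duplicate of the first event
    {"id": "c", "event_ts": 290},  # delayed: arrives after "b" but belongs to window 0
]
counts = count_by_event_time(arrivals)
```

Counting by arrival order alone would have credited event "c" to the wrong window and counted "a" twice.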
![Page 12: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/12.jpg)
Artemis
Database
Pre-aggregation
KAFKA
Backfill pipeline
Data Lake
![Page 13: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/13.jpg)
Artemis
Database
Pre-aggregation per dimension
KAFKA
Data Explosion
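
A back-of-the-envelope calculation shows why pre-aggregating per dimension leads to a data explosion: the worst-case row count is the product of the dimension cardinalities. The dimension names and cardinalities below are invented for illustration and do not come from the talk.

```python
import math

# Hypothetical dimension cardinalities; none of these figures come from the talk.
cardinalities = {"city": 500, "product": 10, "platform": 3, "trip_status": 6}

WINDOWS_PER_DAY = 24 * 60 // 5  # 288 five-minute windows per day


def rows_per_window(cards):
    """Worst case: one pre-aggregated row per combination of dimension values."""
    return math.prod(cards.values())


per_window = rows_per_window(cardinalities)  # 500 * 10 * 3 * 6
per_day = per_window * WINDOWS_PER_DAY
```

Under these assumptions each window holds up to 90,000 rows, and every new dimension multiplies that figure by its cardinality.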
![Page 14: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/14.jpg)
Apollo
Intelligent Caching
KAFKA
Query Language
Fast Accurate Scalable
![Page 15: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/15.jpg)
Case II - Event processing
FRAUD
“If # Signups per device look suspicious -> Ban the driver/rider”
![Page 16: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/16.jpg)
Case II - Event processing
INTELLIGENT ALERTS
“Send me an alert if a leased vehicle leaves a geo-fence”
![Page 17: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/17.jpg)
Athena platform using Apache Samza
Robust
Ease of operation
No backpressure issues
Built in state management
![Page 18: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/18.jpg)
Fraud Rule Engine
Track the number of sign_ups per device_imei
Ban fraudsters in real-time
Event processing - Apache Samza
KAFKA
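
The fraud rule on this slide (count sign-ups per device, flag suspicious ones) can be sketched as a keyed counter. This is plain Python rather than Samza code, and the threshold value is an assumption; the talk does not say what count is considered suspicious.

```python
from collections import Counter

SIGNUP_THRESHOLD = 3  # hypothetical cut-off, not a figure from the talk


def suspicious_devices(signup_events, threshold=SIGNUP_THRESHOLD):
    """Return the device_imei values with more than `threshold` sign-ups."""
    counts = Counter(event["device_imei"] for event in signup_events)
    return {imei for imei, n in counts.items() if n > threshold}


# Hypothetical sign-up events
signups = (
    [{"device_imei": "imei-1"} for _ in range(5)]
    + [{"device_imei": "imei-2"} for _ in range(2)]
)
flagged = suspicious_devices(signups)
```

In a streaming job the per-device count would live in the framework's managed state and be updated one event at a time instead of over a finished list.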
![Page 19: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/19.jpg)
Case III - OLAP (OnLine Analytical Processing)
A / B Tests
See progress of tests in real-time
![Page 20: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/20.jpg)
Case III - OLAP use case
FORECASTING
“How many first time riders will be dropped off in a given geofence?”
![Page 21: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/21.jpg)
Our integrated platform
● Filter events
● Merge streams
● Decorate with external data
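
The three operations listed above can be sketched as a small generator pipeline. The stream contents, the field names (`ts`, `city_id`), and the `city_names` lookup table are assumptions for illustration; the merge step relies on each input already being sorted by timestamp.

```python
import heapq


def pipeline(trips, eyeballs, city_names):
    """Merge two time-ordered streams, filter, and decorate each event."""
    # Merge streams: both inputs must already be sorted by "ts".
    merged = heapq.merge(trips, eyeballs, key=lambda e: e["ts"])
    for event in merged:
        # Filter events: drop anything without a city.
        if event.get("city_id") is None:
            continue
        # Decorate with external data: attach a city name from a lookup table.
        yield {**event, "city_name": city_names.get(event["city_id"], "unknown")}


# Hypothetical inputs
trips = [
    {"ts": 1, "type": "trip_update", "city_id": 1},
    {"ts": 4, "type": "trip_update", "city_id": None},
]
eyeballs = [
    {"ts": 2, "type": "rider_eyeball", "city_id": 2},
    {"ts": 3, "type": "rider_eyeball", "city_id": 9},
]
city_names = {1: "San Francisco", 2: "New York"}  # stand-in for an external store

output = list(pipeline(trips, eyeballs, city_names))
```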
![Page 22: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/22.jpg)
Are we there yet?
[Chart: system complexity vs. year (2014, 2015, 2016, 2017); plotted systems: Artemis, Apollo, Event Processing, OLAP, ?]
![Page 23: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/23.jpg)
What’s missing?
● Cumbersome for data scientists / Ops people
● Redundant code
● Custom backfill pipelines
![Page 24: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/24.jpg)
SQL as the building block
![Page 25: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/25.jpg)
SQL + Stream Processing
70-80% of jobs can be implemented via SQL
![Page 26: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/26.jpg)
Intelligent Promotions
SQL + Stream Processing: Powerful abstraction
Rule: “All trips worth > $10 in San Francisco between Friday 5 pm and Sunday 9 pm”
Threshold: > 100
Action: “Give bonus of $500”
![Page 27: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/27.jpg)
Complicated rules
● “If number of hours online > 10 …”
● “If amount earned > 700 in a given week, then …”
● “If # uberPOOL rides > 10, then …”
● “If trip happens over some geo-fence 10 times in a given weekend, then …”
SQL + Stream Processing: Powerful abstraction
![Page 28: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/28.jpg)
Intelligent Promotions
SQL + Stream Processing: Powerful abstraction
Rule: select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600
Threshold: > 100
Action: trigger_payment()
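
The rule / threshold / action triple can be exercised end to end with an in-memory SQLite table standing in for the stream. This is only a batch sketch: the table schema, the sample rows, and the `evaluate` helper are assumptions, and a streaming engine would evaluate the query continuously instead of once.

```python
import sqlite3

# Rule: the query from the slide, unchanged apart from layout.
RULE_SQL = """
SELECT COUNT(*) FROM hp_api_created_trips
WHERE city_id = 1 AND fare > 10
  AND request_at > 1491105600 AND request_at <= 1491177600
"""
THRESHOLD = 100  # from the slide: "> 100"


def evaluate(conn, action):
    """Run the rule; call `action(count)` if the count exceeds the threshold."""
    (count,) = conn.execute(RULE_SQL).fetchone()
    if count > THRESHOLD:
        action(count)
        return True
    return False


conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the columns the rule touches.
conn.execute(
    "CREATE TABLE hp_api_created_trips (city_id INTEGER, fare REAL, request_at INTEGER)"
)
rows = [(1, 12.5, 1491105601 + i) for i in range(101)]  # 101 qualifying trips
rows += [(1, 5.0, 1491105700), (2, 50.0, 1491105700)]   # fare too low / other city
conn.executemany("INSERT INTO hp_api_created_trips VALUES (?, ?, ?)", rows)

fired = []  # stand-in for trigger_payment(): just record the count
triggered = evaluate(conn, fired.append)
```

Keeping the rule as data (a SQL string plus a threshold) is what lets new promotions be added without writing a new job.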
![Page 29: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/29.jpg)
Complicated rules
● “If number of hours online > 10 …”
● “If amount earned > 700 in a given week, then …”
● “If # Uber Pool rides > 10, then …”
● “If trip happens over some geo-fence 10 times in a given weekend, then …”
What if we created specific rules for specific driver partners?
SQL + Stream Processing: Powerful abstraction
![Page 30: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/30.jpg)
Can be used for alerts as well:
“If a driver X is outside a geofence, then …”
SQL + Stream Processing: Powerful abstraction
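
A geofence alert of this kind reduces to a containment test per location update. The sketch below assumes a circular geofence (center plus radius) and uses the haversine great-circle distance; the talk does not say how geofences are represented, and real ones are often polygons.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def outside_geofence(lat, lon, center, radius_km):
    """True if the point lies outside a circular geofence (an assumed shape)."""
    return haversine_km(lat, lon, center[0], center[1]) > radius_km


# Hypothetical geofence: 10 km around downtown San Francisco
center = (37.7749, -122.4194)
inside_point = (37.78, -122.41)       # about 1 km from the center
outside_point = (34.0522, -118.2437)  # Los Angeles

alert_inside = outside_geofence(*inside_point, center, 10.0)
alert_outside = outside_geofence(*outside_point, center, 10.0)
```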
![Page 31: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/31.jpg)
New ecosystem: AthenaX
![Page 32: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/32.jpg)
Enter Flink
Apache Calcite (SQL) Integration
Easy to manage and scale
No backpressure problem
Built in state management support
HDFS integration
Not dependent on Kafka
![Page 33: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/33.jpg)
Rule Store
KAFKA Database
select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600
Promotions using Flink
![Page 34: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/34.jpg)
Promotions using Flink
Rule Store
Database
KAFKA
select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600
![Page 35: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/35.jpg)
New ecosystem: AthenaX
HDFS
Kafka
Alerts
Kafka Streams
Other data destinations
Database Streams
HDFS Cassandra
Rule Store
HTTP
![Page 36: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/36.jpg)
Are we there yet?
[Chart: system complexity vs. year (2014, 2015, 2016, 2017); plotted systems: Artemis, Apollo, Event Processing, OLAP, ?; annotated with Flink]
![Page 37: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/37.jpg)
The road ahead ...
![Page 38: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/38.jpg)
Future Discussions
● To (Apache) Beam or not to Beam?
● Real-time Machine Learning
● Auto scaling
![Page 39: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/39.jpg)
AthenaX - Flink deep dive
Haohui Mai
Bill Liu
(11:45 am)
![Page 40: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber](https://reader031.vdocuments.us/reader031/viewer/2022030310/58f9a8ad1a28aba5278b458b/html5/thumbnails/40.jpg)
Thank you
For more: eng.uber.com
Twitter: @UberEng