better stream processing with python...kafka streams 9 • simple library, not a framework • event...

20
Better Stream Processing with Python Taking the Hipster out of Streaming Andreas Heider, Robert Wall 12.07.2017 EuroPython

Upload: others

Post on 03-Jun-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Better Stream Processingwith PythonTaking the Hipster out of Streaming

Andreas Heider, Robert Wall12.07.2017 EuroPython

Page 2: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Who are we?

• DevelopersatWinton

• Wintonisaglobalinvestmentmanagementanddatasciencecompany,foundedin1997

• Webelievethescientificmethodcanbeprofitablyappliedtothefieldofinvesting

2

Page 3: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

What do we mean by Stream processing?

3

Batch Stream

Page 4: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Example: Real Time Financial Market Data

4

Time Symbol Price Qty

10:15:01 AAPL $144 10

10:15:02 GOOG $940 5

10:15:03 AAPL $145 11

Exchange10:15:02GOOG

5@$940

10:15:01AAPL

10@$144

Trades

Page 5: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Stream processing: Binning

5

Time Symbol Price Qty

10:15:01 AAPL $144 10

10:15:02 GOOG $940 5

10:15:03 AAPL $145 11

BinningProcess

Time Symbol Avg.Price

Volume

10:15 AAPL $144.5 1300

10:15 GOOG $943 1250

10:16 AAPL $145.3 1450

Page 6: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Streaming Data at Winton

6

EventStreams

EventStreams

MarketData

AlternativeData

Internal/BusinessEvents

Monitoring

Databases

RiskManagement

InvestmentManagement

Analytics

Transformations

Research

Page 7: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Apache Kafka

7

Producer Consumer

Topic

Partition1

Partition2

Partition3

Page 8: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Sprawl of Stream Processing systems

8

Page 9: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Kafka Streams

9

• Simplelibrary,notaframework• Eventatatimestreamprocessing• Stateful processing,joinsandaggregations• Distributedprocessingandfaulttolerance• PartofmainApacheKafkaproject• Javaonlysofar:(

Page 10: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Python at Winton

Manyusers,withdifferentskillsets:

• Developers

• Researchers

• Operations

• …

10

Page 11: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Talking to Kafka using kafka-python

11

Hipster Stream Processing

Page 12: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Python Kafka Clients

12

https://github.com/dpkp/kafka-python

• PurePythonimplementation

• Friendly,pythonic interface

https://github.com/confluentinc/confluent-kafka-python

• WrapperaroundClibrary• Amazinglyhighperformanceandrobustness

Page 13: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Experiences using low-level client

13

• Whatstartsoutasa10linescriptendsupasyetanotherhomegrownstreamingframework

• Thedevilisinthedetails:• Guaranteeingatleastonce(orevenexactly-onceprocessing)• Handlingstateful processing• Distributingloadovervariousmachines• Microbatching• Handlingrebalancesnicely

Page 14: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Kafka Streams for Python

https://github.com/wintoncode/winton-kafka-streams

14

Page 15: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Demo

15

Page 16: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Goals / Roadmap

1. CleanimplementationofKafka’scorestreamsAPIinPython

2. Experimentwithmorepythonic API/DSL

3. Optimise performanceviabatching/numpy/Arrow

4. ImplementmoreadvancedfeaturesofKafka’sstreamsAPI(exactlyonce,…)

16

Page 17: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Get in touch!

• ProjectonGitHub:https://github.com/wintoncode/winton-kafka-streams

• Roadmap:https://github.com/wintoncode/winton-kafka-streams/blob/master/ROADMAP.md

• Announcementonkafka-dev

• Cometoourstandandtalktous

• ThankstoConfluent

17

Page 18: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Questions?

• ProjectonGitHub:https://github.com/wintoncode/winton-kafka-streams

• Roadmap:https://github.com/wintoncode/winton-kafka-streams/blob/master/ROADMAP.md

• Announcementonkafka-dev

• Cometoourstandandtalktous

• ThankstoConfluent

18

Page 19: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Backup

19

Page 20: Better Stream Processing with Python...Kafka Streams 9 • Simple library, not a framework • Event at a time stream processing • Statefulprocessing, joins and aggregations •

Some words of experience

• Noteverythingfitsthestreamingmodel

• Manuallychangingdataistricky• Becarefulwhatyouputin,haverecoverymethod

• Stabledeploymentcanbechallenging• EspeciallyZookeeperandbuggyclients

• Setupmonitoringfromthestart• WeusePrometheusandGrafana• https://github.com/yahoo/kafka-manager

20