better stream processing with python...kafka streams 9 • simple library, not a framework • event...
TRANSCRIPT
Better Stream Processingwith PythonTaking the Hipster out of Streaming
Andreas Heider, Robert Wall12.07.2017 EuroPython
Who are we?
• DevelopersatWinton
• Wintonisaglobalinvestmentmanagementanddatasciencecompany,foundedin1997
• Webelievethescientificmethodcanbeprofitablyappliedtothefieldofinvesting
2
What do we mean by Stream processing?
3
Batch Stream
Example: Real Time Financial Market Data
4
Time Symbol Price Qty
10:15:01 AAPL $144 10
10:15:02 GOOG $940 5
10:15:03 AAPL $145 11
…
Exchange10:15:02GOOG
5@$940
10:15:01AAPL
10@$144
Trades
Stream processing: Binning
5
Time Symbol Price Qty
10:15:01 AAPL $144 10
10:15:02 GOOG $940 5
10:15:03 AAPL $145 11
…
BinningProcess
Time Symbol Avg.Price
Volume
10:15 AAPL $144.5 1300
10:15 GOOG $943 1250
10:16 AAPL $145.3 1450
…
Streaming Data at Winton
6
EventStreams
EventStreams
MarketData
AlternativeData
Internal/BusinessEvents
Monitoring
Databases
RiskManagement
InvestmentManagement
Analytics
Transformations
Research
Apache Kafka
7
Producer Consumer
Topic
Partition1
Partition2
Partition3
Sprawl of Stream Processing systems
8
Kafka Streams
9
• Simplelibrary,notaframework• Eventatatimestreamprocessing• Stateful processing,joinsandaggregations• Distributedprocessingandfaulttolerance• PartofmainApacheKafkaproject• Javaonlysofar:(
Python at Winton
Manyusers,withdifferentskillsets:
• Developers
• Researchers
• Operations
• …
10
Talking to Kafka using kafka-python
11
Hipster Stream Processing
Python Kafka Clients
12
https://github.com/dpkp/kafka-python
• PurePythonimplementation
• Friendly,pythonic interface
https://github.com/confluentinc/confluent-kafka-python
• WrapperaroundClibrary• Amazinglyhighperformanceandrobustness
Experiences using low-level client
13
• Whatstartsoutasa10linescriptendsupasyetanotherhomegrownstreamingframework
• Thedevilisinthedetails:• Guaranteeingatleastonce(orevenexactly-onceprocessing)• Handlingstateful processing• Distributingloadovervariousmachines• Microbatching• Handlingrebalancesnicely
Kafka Streams for Python
https://github.com/wintoncode/winton-kafka-streams
14
Demo
15
Goals / Roadmap
1. CleanimplementationofKafka’scorestreamsAPIinPython
2. Experimentwithmorepythonic API/DSL
3. Optimise performanceviabatching/numpy/Arrow
4. ImplementmoreadvancedfeaturesofKafka’sstreamsAPI(exactlyonce,…)
16
Get in touch!
• ProjectonGitHub:https://github.com/wintoncode/winton-kafka-streams
• Roadmap:https://github.com/wintoncode/winton-kafka-streams/blob/master/ROADMAP.md
• Announcementonkafka-dev
• Cometoourstandandtalktous
• ThankstoConfluent
17
Questions?
• ProjectonGitHub:https://github.com/wintoncode/winton-kafka-streams
• Roadmap:https://github.com/wintoncode/winton-kafka-streams/blob/master/ROADMAP.md
• Announcementonkafka-dev
• Cometoourstandandtalktous
• ThankstoConfluent
18
Backup
19
Some words of experience
• Noteverythingfitsthestreamingmodel
• Manuallychangingdataistricky• Becarefulwhatyouputin,haverecoverymethod
• Stabledeploymentcanbechallenging• EspeciallyZookeeperandbuggyclients
• Setupmonitoringfromthestart• WeusePrometheusandGrafana• https://github.com/yahoo/kafka-manager
20