gnw03: Stream Processing with Apache Kafka by Gwen Shapira
TRANSCRIPT
I’ll tell you about
• What is stream processing and why it matters
• What is Apache Kafka
• How Kafka helps stream processing
Stay awake for this part
Stream Processing Paradigm
• Data is generated at its own rate as “Streams”
• We can process as much or as little as we want
• Continuously
• Results are available in real-time
• But nothing waits for specific results
• Time for data availability?
  • More than “few ms”
  • Less than “hours”
This is the world-changing bit
• Most of the business is…
  • Not urgent enough to require immediate response
  • But can’t wait for the next day
• “Streams of events” represent something fundamental
• Same way relational tables are fundamental
But logs are also a STREAM of events, and Kafka stores those logs
Allowing us to read the past and keep getting updates about the future
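As a rough illustration of reading the past and then continuing to receive new events, here is a minimal in-memory sketch. It is not Kafka's API: the `Log` and `Reader` classes and their methods are hypothetical names, and the list stands in for a single ordered log where each reader tracks its own offset.

```python
class Log:
    """A toy append-only log (hypothetical stand-in for one ordered log)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Return every record stored at `offset` or later."""
        return self._records[offset:]


class Reader:
    """Keeps its own position, so it can replay history and then keep up."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        """Return records not yet seen by this reader and advance its offset."""
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records


log = Log()
log.append("page_view")
log.append("click")

reader = Reader(log, offset=0)  # start at the beginning: read the past
past = reader.poll()

log.append("purchase")          # a new event arrives later
future = reader.poll()          # the same reader sees only the new event
```

Because the log keeps old records and the reader owns its offset, a second reader started at offset 0 would see all three events again without affecting the first.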
Method 2: The Stream Processing Frameworks
• Storm
• Spark
• Flink
• Samza
• Apex
• NiFi
• StreamBase
• InfoSphere Streams
• Google Dataflow (AKA Beam)
• I can go on for 5 more pages…
What do I mean by too complex?
[Architecture diagram. Labels shown: Clients (Swipe here!); Web App; Kafka; Flume Agents; Local Cache; Hadoop Cluster I; Hadoop Cluster II (Storage, Processing); HBase / Memory; Spark Streaming; HDFS; Hive/Impala; Map/Reduce; Spark; Solr; Search; HDFS Event Sink; Solr Sink; Fetching & Updating Profiles; Adjusting NRT Stats; Batch Time Adjustments; Automated & Manual Analytical Adjustments and Pattern Detection; Automated & Manual Review of NRT Changes and Counters]
Why so many moving parts?
We needed…
• HBase to handle complex state
• Spark requires HDFS
• Ingest layer
• Batch layer to handle re-calculations
No Framework
• It is just a library that does transformations
• We can add languages on top
• Kafka does everything we needed the framework to do
• You don’t need a “framework” to run queries, so why do you need one to run queries continuously?
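To make the "just a library" idea concrete, here is a small sketch in which the transformations are ordinary functions called from an ordinary program, with no cluster or job scheduler involved. The function names, the event format, and the threshold are all assumptions made for illustration; a real application would read from and write to Kafka rather than a Python list.

```python
def parse(lines):
    """Turn raw 'user,amount' lines into event dictionaries (assumed format)."""
    for line in lines:
        user, amount = line.split(",")
        yield {"user": user, "amount": float(amount)}


def large_only(events, threshold):
    """Keep only events whose amount is at or above the threshold."""
    for event in events:
        if event["amount"] >= threshold:
            yield event


# Stand-in for an input stream; hypothetical sample data.
source = ["alice,12.50", "bob,250.00", "alice,900.00"]

# The pipeline is lazy: each event flows through both steps as it is consumed.
flagged = list(large_only(parse(source), threshold=100.0))
```

The same two functions would work unchanged on an unbounded iterator, which is the sense in which running a query continuously does not require a different kind of system.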
We can convert tables to streams and back:
Stream -> Apply -> Table
Table -> Change Capture -> Stream
This is called Table-Stream Duality.
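The two conversions can be sketched with plain dictionaries and lists. This is a simplified model, not the Kafka Streams API: `apply_stream` and `capture_changes` are hypothetical names, a stream is assumed to be a list of `(key, value)` changes, and `None` is assumed to mark a deletion.

```python
def apply_stream(changes):
    """Stream -> Apply -> Table: keep the latest value seen for each key."""
    table = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)  # assumption: None marks a deleted key
        else:
            table[key] = value
    return table


def capture_changes(old, new):
    """Table -> Change Capture -> Stream: emit one change per differing key."""
    changes = []
    for key in new:
        if old.get(key) != new[key]:
            changes.append((key, new[key]))  # inserted or updated key
    for key in old:
        if key not in new:
            changes.append((key, None))      # deleted key
    return changes


stream = [("alice", 1), ("bob", 5), ("alice", 2)]
table = apply_stream(stream)

# Modify the table, then capture the modifications as a new stream of changes.
updated = dict(table, bob=7, carol=1)
del updated["alice"]
changelog = capture_changes(table, updated)

# Replaying the original stream plus the captured changes rebuilds the table.
replayed = apply_stream(stream + changelog)
```

Replaying the full change stream always reproduces the current table, which is why either form can be treated as the source of truth.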