Kafka on YARN (KOYA) at Slider Meetup, 2015-03-04
Post on 14-Jul-2015
Kafka On YARN (KOYA)
An Open Source Initiative to Integrate Kafka & YARN
Thomas Weise firstname.lastname@example.org
Siyuan Hua email@example.com
March 4th, 2015

Apache Kafka
  A high-throughput distributed messaging system.
  Fast, Scalable, Durable, Distributed
  Kafka is a natural fit to deliver events into our stream processing platform.

Kafka feeds Stream Processing
  [Diagram: a Kafka cluster (Server-1, Server-2, Server-3, each with partitions P1, P2, P3) feeds a YARN cluster made up of a Resource Manager and Node Managers running the DT AppMaster and DT Containers.]

Problem?
  It is not easy to get started with Kafka
    Initial deployment difficult (build your own tool)
  It is not easy to keep it running
    No central management (status, configuration changes)
    No automatic replacement for failed broker
  Operational Inefficiencies
    Resource fragmentation, underutilization
    Common infrastructure not leveraged, extra skill sets
  Adoption Barrier!

Why Kafka on YARN
  YARN enables:
    Horizontal scalability with commodity hardware
    Central resource management with queues, limits and locality preferences
    Framework for achieving fault tolerance and security
  Automate:
    Broker recovery
    Deployment of Kafka clusters
  Integrate:
    User friendly management (alternative to Kafka command line utilities)

Kafka on YARN through Slider
  [Diagram: a YARN cluster with a Resource Manager, Node Managers, the DT AppMaster, DT Containers, a Slider AM, and two Slider Agents with Kafka servers Server-1 and Server-2 (partitions P1, P2, P3 each).]

Why Slider?
  Automates deployment and configuration of components
  Simplify on-demand cluster creation
  Generic AM for long running services
  Management of container failures automates recovery
  Sticky allocation of components to hosts across AM restart
  Isolation: node labels to pin components to specific set of machines
  Central status
    View all servers in one place
  Areas for improvement
    Anti-affinity support (YARN limitation)
    Agent API documentation
    Flexibility in component instance specification

Configuration Example
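The content of the configuration slide is not preserved in this transcript. As a rough illustration only, a Slider-style resource specification for a small Kafka cluster might be generated as sketched below. The component name KAFKA_BROKER, the helper build_resources, and all values are assumptions made for this example and are not taken from the KOYA package. The yarn.* keys follow the layout commonly used in Slider resources.json files and should be checked against the Slider version in use.

```python
import json


def build_resources(broker_count, memory_mb=1024, vcores=1, label=None):
    """Build a Slider-style resource spec as a dict (hypothetical helper).

    KAFKA_BROKER is an assumed component name, not one confirmed by KOYA.
    """
    if broker_count < 1:
        raise ValueError("need at least one broker")
    broker = {
        "yarn.role.priority": "1",
        "yarn.component.instances": str(broker_count),
        "yarn.memory": str(memory_mb),
        "yarn.vcores": str(vcores),
    }
    # Optional node label, matching the "pin components to a specific
    # set of machines" point on the Why Slider? slide.
    if label is not None:
        broker["yarn.label.expression"] = label
    return {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": {},
        "components": {
            "slider-appmaster": {},
            "KAFKA_BROKER": broker,
        },
    }


if __name__ == "__main__":
    # Three brokers restricted to nodes labelled "kafka" (assumed label).
    print(json.dumps(build_resources(3, label="kafka"), indent=2))
```

Generating the file from a small script keeps the broker count and memory settings in one place, which fits the "Python Scripts + Configuration" approach named under Project Status.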

Project Status
  Open Source: https://github.com/DataTorrent/koya
  Python Scripts + Configuration
  Works on Hadoop 2.6 through Slider 0.6
  Install: Embedded Slider or Application Package
  First Release by Q2

Future Enhancements
  Expanded Status Info through Slider AM
  Explore Kafka management UI options
  Support for Disk as a Resource in YARN - YARN-2139
  Better control over server placement (anti-affinity) - Slider-799
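
Anti-affinity here means keeping brokers on distinct hosts, so that losing one node does not take out several brokers at once. The toy simulation below combines that rule with the sticky-host preference listed under Why Slider?. It is a sketch under stated assumptions, not how YARN or Slider implement placement; place_brokers and the host names are hypothetical.

```python
def place_brokers(broker_ids, hosts, previous=None):
    """Assign each broker to a distinct host (anti-affinity).

    previous maps broker id -> host it last ran on; that host is reused
    when it is still available (sticky allocation).
    """
    previous = previous or {}
    if len(broker_ids) > len(hosts):
        raise ValueError("anti-affinity needs at least one host per broker")
    free = list(hosts)
    placement = {}
    # First pass: keep brokers on their previous host where possible.
    for broker in broker_ids:
        host = previous.get(broker)
        if host in free:
            placement[broker] = host
            free.remove(host)
    # Second pass: give every remaining broker an unused host.
    for broker in broker_ids:
        if broker not in placement:
            placement[broker] = free.pop(0)
    return placement


if __name__ == "__main__":
    first = place_brokers([1, 2, 3], ["node1", "node2", "node3"])
    # node2 is lost and node4 joins; brokers 1 and 3 stay where they were.
    second = place_brokers([1, 2, 3], ["node1", "node3", "node4"], previous=first)
    print(first)
    print(second)
```

In this sketch a broker only moves when its previous host is gone, which matters for Kafka because a broker's log data lives on local disk.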

Q & A
Thank You!