Transcript
Page 1: Kafka on YARN (KOYA) at Slider Meetup 20150304

Kafka On YARN (KOYA)

An Open Source Initiative to Integrate Kafka & YARN

Thomas Weise

Siyuan Hua

March 4th, 2015

Page 2: Kafka on YARN (KOYA) at Slider Meetup 20150304

Apache Kafka

“A high-throughput distributed messaging system.”

“Fast, Scalable, Durable, Distributed”

Kafka is a natural fit to deliver events into our stream processing platform.

Feed

Page 3: Kafka on YARN (KOYA) at Slider Meetup 20150304

Kafka feeds Stream Processing

[Diagram: a Kafka cluster (Server-1, Server-2, Server-3, each holding partitions P1, P2, P3) feeds a YARN cluster in which Node Managers host the DT AppMaster and DT Containers, alongside the Resource Manager.]

Page 4: Kafka on YARN (KOYA) at Slider Meetup 20150304

Problem?

• It is not easy to get started with Kafka

– Initial deployment difficult (build your own tool)

• It is not easy to keep it running

– No central management (status, configuration changes,…)

– No automatic replacement for failed broker

• Operational Inefficiencies

– Resource fragmentation, underutilization

– Common infrastructure not leveraged, extra skill sets

• Adoption Barrier!

Page 5: Kafka on YARN (KOYA) at Slider Meetup 20150304

Why Kafka on YARN

• YARN enables:

– Horizontal scalability with commodity hardware

– Central resource management with queues, limits and locality preferences

– Framework for achieving fault tolerance and security

• Automate:

– Broker recovery

– Deployment of Kafka clusters

• Integrate:

– User friendly management (alternative to Kafka command line utilities)

Page 6: Kafka on YARN (KOYA) at Slider Meetup 20150304

Kafka on YARN through Slider

[Diagram: a YARN cluster with a Resource Manager and four Node Managers. The Node Managers host the DT AppMaster, DT Containers, the Slider AM, and two Kafka servers (Server-1 and Server-2, each holding partitions P1, P2, P3), each Kafka server paired with a Slider Agent.]

Page 7: Kafka on YARN (KOYA) at Slider Meetup 20150304

Why Slider?

• Automates deployment and configuration of components

– Simplify on-demand cluster creation

• Generic AM for long running services

– Management of container failures (automates recovery)

– Sticky allocation of components to hosts across AM restart

– Isolation: node labels to pin components to specific set of machines

• Central status

– View all servers in one place

• Areas for improvement

– Anti-affinity support (YARN limitation)

– Agent API documentation

– Flexibility in component instance specification
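
The recovery and sticky-allocation points above can be illustrated with a small Python sketch. This is a toy model, not Slider's actual implementation: the function names (choose_host, recover) and the broker and node names are hypothetical. It assumes the application master remembers the last host of each component instance, prefers that host when replacing a failed container, and otherwise falls back to the least-loaded healthy host.

```python
from collections import Counter


def choose_host(instance, last_host, healthy_hosts):
    """Pick a host for a replacement container (hypothetical helper).

    Sticky rule: reuse the instance's previous host if it is still healthy.
    Fallback: the healthy host that currently holds the fewest other instances.
    """
    if not healthy_hosts:
        raise RuntimeError("no healthy hosts available")
    previous = last_host.get(instance)
    if previous in healthy_hosts:
        return previous
    # Count how many *other* instances were last placed on each host.
    load = Counter(h for i, h in last_host.items() if i != instance)
    return min(sorted(healthy_hosts), key=lambda h: load[h])


def recover(failed_instances, last_host, healthy_hosts):
    """Re-place every failed instance and update the placement history."""
    placements = {}
    for instance in failed_instances:
        host = choose_host(instance, last_host, healthy_hosts)
        last_host[instance] = host
        placements[instance] = host
    return placements


if __name__ == "__main__":
    # broker-2 ran on node-b, which is no longer healthy.
    history = {"broker-1": "node-a", "broker-2": "node-b"}
    print(recover(["broker-1", "broker-2"], history, {"node-a", "node-c"}))
```

In this example broker-1 returns to node-a, while broker-2 moves to node-c because node-b is gone. Spreading the fallback by load only approximates anti-affinity; it does not enforce it, which matches the limitation noted on the slide.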

Page 8: Kafka on YARN (KOYA) at Slider Meetup 20150304

Configuration Example
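
The example shown on this slide is not part of the transcript. As a rough, hypothetical illustration of the kind of configuration involved, the Python sketch below generates a Slider-style resources.json that requests a number of broker containers. The component name KAFKA_BROKER, the helper build_resources, and all resource values are assumptions made for illustration; the key names follow the resources.json layout described in the Slider documentation and should be checked against the KOYA repository before use.

```python
import json


def build_resources(broker_count, memory_mb=1024, vcores=1):
    """Build a Slider-style resources.json structure (illustrative only).

    KAFKA_BROKER is a hypothetical component name; Slider expects
    property values as strings.
    """
    return {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": {},
        "components": {
            "slider-appmaster": {},
            "KAFKA_BROKER": {
                "yarn.role.priority": "1",
                "yarn.component.instances": str(broker_count),
                "yarn.memory": str(memory_mb),
                "yarn.vcores": str(vcores),
            },
        },
    }


if __name__ == "__main__":
    # Request three broker containers with assumed memory and CPU sizes.
    print(json.dumps(build_resources(3, memory_mb=2048), indent=2))
```

Generating the file from a small function keeps the broker count and container size in one place, which is convenient when creating clusters on demand.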

Page 9: Kafka on YARN (KOYA) at Slider Meetup 20150304

Demo

Page 16: Kafka on YARN (KOYA) at Slider Meetup 20150304

Project Status

• Open Source: https://github.com/DataTorrent/koya

• Python Scripts + Configuration

• Works on Hadoop 2.6 through Slider 0.6

• Install: Embedded Slider or Application Package

• First Release by Q2

• Future Enhancements

– Expanded Status Info through Slider AM

– Explore Kafka management UI options

– Support for Disk as a Resource in YARN - YARN-2139

– Better control over server placement (anti-affinity)

– Slider-799

Page 17: Kafka on YARN (KOYA) at Slider Meetup 20150304

Q & A

Thank You!

