apache apex connector with kafka 0.9 consumer api

20
New Kafka Input Operator based on Kafka 0.9 Consumer API Siyuan Hua, DataTorrent, Committer Apache Apex March 28 th 2016

Upload: datatorrent

Post on 08-Jan-2017

71 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Apache Apex connector with Kafka 0.9 consumer API

New Kafka Input Operator based on Kafka 0.9 Consumer API

Siyuan Hua, DataTorrent, Committer Apache ApexMarch 28th 2016

Page 2: Apache Apex connector with Kafka 0.9 consumer API

0.8 vs 0.9 Overview

Apache Apex Meetup

0.8 (Simple Consumer) 0.9

LoC 5900 2406

Fault-Tolerant Yes (At least once, exactly once) Yes (At least once, exactly once)

Scalability Scale with Kafka(static and dynamic)

Scale with Kafka(static and dynamic)

Multi-Cluster/Topic Yes Yes

Throughput throttle Yes Yes

Idempotent Yes Yes

2

Page 3: Apache Apex connector with Kafka 0.9 consumer API

0.8 vs 0.9 Overview

Apache Apex Meetup

0.8 (Simple Consumer) 0.9

Offset Management Customized management Implicit but out-of-box management

Partition Strategy 1:1, 1:M, Dynamic(Unstable), Customized

1:1, 1:M, Customized

Dependency Both public and internal API Public API

Metrics Using old Counters API Using new Apex @AutoMetric

Maturity Used in production Not mature yet

3

Page 4: Apache Apex connector with Kafka 0.9 consumer API

0.8 Consumer problem

Apache Apex Meetup

● 0.8 has Simple Consumer and High-level Consumer

● High-level Consumer couldn’t meet our requirement

● High-level Consumer doesn’t support customized assignor

● Simple Consumer is not easy to use

● Simple Consumer has no knowledge about the brokers

● Simple Consumer doesn’t handle metadata change on broker side automatically

4

Page 5: Apache Apex connector with Kafka 0.9 consumer API

0.9 Consumer API

Apache Apex Meetup

● 0.9 replaced Simple Consumer and High-level consumer with subscribe API and assign API and combine them together into one consumer class

● Assign API is good replacement for Simple Consumer in the new Kafka Input Operator

● You can explicitly assign partitions for each operator instance

● Once consumer is assigned it handle partition metadata change internally

● Confused with mixed API in one class

5

Page 6: Apache Apex connector with Kafka 0.9 consumer API

Workflow

Apache Apex Meetup

6

Page 7: Apache Apex connector with Kafka 0.9 consumer API

Partition Strategy

Apache Apex Meetup

1 to 1 Partition 1 to N Partition

7

Thomas Weise
Isn't this multiple to one?
Siyuan Hua
It is called one-to-many in the code. Maybe I should rotate the pic :)
Page 8: Apache Apex connector with Kafka 0.9 consumer API

Partition Strategy

Apache Apex Meetup

Public abstract class AbstractKafkaPartitioner{

...abstract List<Set<PartitionMeta>> assign(Map<String,

Map<String,List<PartitionInfo>>> metadata)...void partitioned(Map<Integer, Partition<AbstractKafkaInputOperator>>

map)…Response processStats(BatchedOperatorStats batchedOperatorStats)

} Customized Partition Strategy

8

Page 9: Apache Apex connector with Kafka 0.9 consumer API

Partition Strategy

Apache Apex Meetup

● Sticky Partition (Each operator instance only consumes from Kafka partitions that are assigned by AM)

● We cannot use subscribe API with customized Assignor

● 0.9 Partition logic is decoupled from Operator code

9

Page 10: Apache Apex connector with Kafka 0.9 consumer API

Offset management for 0.9 operator

Apache Apex Meetup

Offset Checkpointing:

W = last offset in window i

W W W

Current offset

Downstream operator window

. . . . . . . . . . . .

Check pointed offsets with window id

Resume from offsets of any window below

i

k ji

10

Page 11: Apache Apex connector with Kafka 0.9 consumer API

Offset management for 0.9 operator

Apache Apex Meetup

Offset committed:

W = last offset in window i

. . . . . . . . . . . .

W

Current offset

. . .

Commit Window i

Offset Topic contains App name

Offset is saved in kafka

i

i

11

Thomas Weise
Should be "Window committed" ?
Page 12: Apache Apex connector with Kafka 0.9 consumer API

Some important configuration (0.9)

Apache Apex Meetup

● initialOffset

● topics

● clusters

● strategy

● maxTuplesPerWindow

● initialPartitionCount

● consumerProps

12

Page 13: Apache Apex connector with Kafka 0.9 consumer API

MapR Streams support

Apache Apex Meetup

● MapR Streams is compatible with 0.9 Kafka client API

● The 0.9 Input Operator has been tested with MapR sandbox and all major features are working without any code change

● Use MapR Streams Client library instead of Kafka one

● Leave “clusters” property empty because MapR doesn’t require broker host name settings

● Support special character “/” in topic name because MapR Streams topic name is just path to the topic file

● Multi-cluster is not supported

13

Page 14: Apache Apex connector with Kafka 0.9 consumer API

Demo 1

Page 15: Apache Apex connector with Kafka 0.9 consumer API

Demo 2

Page 16: Apache Apex connector with Kafka 0.9 consumer API

Performance : Kafka Input Operator

Apache Apex Meetup

● 4 Kafka Brokers - 8 partitions

● 1 Zookeeper

● Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

● 256GB RAM

● 10 GigE between nodes

Page 17: Apache Apex connector with Kafka 0.9 consumer API

Load Generator

Page 18: Apache Apex connector with Kafka 0.9 consumer API

Q & A

Apache Apex Meetup

Follow Apex meetups:http://apex.incubator.apache.org/announcements.html

Learn more about Apex:http://apex.incubator.apache.org/docs.html

18

Page 19: Apache Apex connector with Kafka 0.9 consumer API

Resources

19

• Apache Apex - http://apex.apache.org/• Subscribe - http://apex.apache.org/community.html• Download - https://www.datatorrent.com/download/• Twitter

ᵒ @ApacheApex; Follow - https://twitter.com/apacheapexᵒ @DataTorrent; Follow – https://twitter.com/datatorrent

• Meetups - http://www.meetup.com/topics/apache-apex• Webinars - https://www.datatorrent.com/webinars/• Videos - https://www.youtube.com/user/DataTorrent• Slides - http://www.slideshare.net/DataTorrent/presentations • Startup Accelerator Program - Full featured enterprise product

ᵒ https://www.datatorrent.com/product/startup-accelerator/

Page 20: Apache Apex connector with Kafka 0.9 consumer API

We Are Hiring

20

[email protected]• Developers/Architects• QA Automation Developers• Information Developers• Build and Release• Community Leaders