apache apex connector with kafka 0.9 consumer api

18
New Kafka Input Operator based on Kafka 0.9 Consumer API Siyuan Hua, DataTorrent, Committer Apache Apex March 28 th 2016

Upload: apache-apex

Post on 26-Jan-2017

508 views

Category:

Technology


3 download

TRANSCRIPT

New Kafka Input Operator based on Kafka 0.9 Consumer API

Siyuan Hua, DataTorrent, Committer Apache ApexMarch 28th 2016

0.8 vs 0.9 Overview

Apache Apex Meetup

0.8 (Simple Consumer) 0.9

LoC 5900 2406

Fault-Tolerant Yes (At least once, exactly once) Yes (At least once, exactly once)

Scalability Scale with Kafka(static and dynamic)

Scale with Kafka(static and dynamic)

Multi-Cluster/Topic Yes Yes

Throughput throttle Yes Yes

Idempotent Yes Yes

2

0.8 vs 0.9 Overview

Apache Apex Meetup

0.8 (Simple Consumer) 0.9

Offset Management Customized management Implicit but out-of-box management

Partition Strategy 1:1, 1:M, Dynamic(Unstable), Customized

1:1, 1:M, Customized

Dependency Both public and internal API Public API

Metrics Using old Counters API Using new Apex @AutoMetric

Maturity Used in production Not mature yet

3

0.8 Consumer problem

Apache Apex Meetup

● 0.8 has Simple Consumer and High-level Consumer

● High-level Consumer couldn’t meet our requirement

● High-level Consumer doesn’t support customized assignor

● Simple Consumer is not easy to use

● Simple Consumer has no knowledge about the brokers

● Simple Consumer doesn’t handle metadata change on broker side automatically

4

0.9 Consumer API

Apache Apex Meetup

● 0.9 replaced Simple Consumer and High-level consumer with subscribe API and assign API and combine them together into one consumer class

● Assign API is good replacement for Simple Consumer in the new Kafka Input Operator

● You can explicitly assign partitions for each operator instance

● Once consumer is assigned it handle partition metadata change internally

● Confused with mixed API in one class

5

Workflow

Apache Apex Meetup

6

Partition Strategy

Apache Apex Meetup

1 to 1 Partition 1 to N Partition

7

Thomas Weise
Isn't this multiple to one?
Siyuan Hua
It is called one-to-many in the code. Maybe I should rotate the pic :)

Partition Strategy

Apache Apex Meetup

Public abstract class AbstractKafkaPartitioner{

...abstract List<Set<PartitionMeta>> assign(Map<String,

Map<String,List<PartitionInfo>>> metadata)...void partitioned(Map<Integer, Partition<AbstractKafkaInputOperator>>

map)…Response processStats(BatchedOperatorStats batchedOperatorStats)

} Customized Partition Strategy

8

Partition Strategy

Apache Apex Meetup

● Sticky Partition (Each operator instance only consumes from Kafka partitions that are assigned by AM)

● We cannot use subscribe API with customized Assignor

● 0.9 Partition logic is decoupled from Operator code

9

Offset management for 0.9 operator

Apache Apex Meetup

Offset Checkpointing:

W = last offset in window i

W W W

Current offset

Downstream operator window

. . . . . . . . . . . .

Check pointed offsets with window id

Resume from offsets of any window below

i

k ji

10

Offset management for 0.9 operator

Apache Apex Meetup

Offset committed:

W = last offset in window i

. . . . . . . . . . . .

W

Current offset

. . .

Commit Window i

Offset Topic contains App name

Offset is saved in kafka

i

i

11

Thomas Weise
Should be "Window committed" ?

Some important configuration (0.9)

Apache Apex Meetup

● initialOffset

● topics

● clusters

● strategy

● maxTuplesPerWindow

● initialPartitionCount

● consumerProps

12

MapR Streams support

Apache Apex Meetup

● MapR Streams is compatible with 0.9 Kafka client API

● The 0.9 Input Operator has been tested with MapR sandbox and all major features are working without any code change

● Use MapR Streams Client library instead of Kafka one

● Leave “clusters” property empty because MapR doesn’t require broker host name settings

● Support special character “/” in topic name because MapR Streams topic name is just path to the topic file

● Multi-cluster is not supported

13

Demo 1

Demo 2

Performance : Kafka Input Operator

Apache Apex Meetup

● 4 Kafka Brokers - 8 partitions

● 1 Zookeeper

● Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

● 256GB RAM

● 10 GigE between nodes

Load Generator

Q & A

Apache Apex Meetup

Follow Apex meetups:http://apex.incubator.apache.org/announcements.html

Learn more about Apex:http://apex.incubator.apache.org/docs.html

18