Kafka internals David Gruzman, www.nestlogic.com


Page 1: Kafka internals

Kafka internals
David Gruzman, www.nestlogic.com

Page 2: Kafka internals

What is Kafka? Why is it so interesting? Is it just "yet another queue" with better performance? It is not a queue, although it can be used in that sense. Let's look at it as a database / storage technology.

Page 3: Kafka internals

Data model
● Ordered, partitioned collection of key-values
● Key is optional
● Values are opaque
(A producer-side sketch of this model follows below.)
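A minimal sketch of this data model from the producer's side, using the standard Java client (org.apache.kafka.clients.producer). The topic name "events", the broker address and the serializer choices are illustrative assumptions, not taken from the slides:

  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import java.util.Properties;

  public class DataModelSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

          byte[] value = "opaque payload".getBytes();   // values are opaque bytes to Kafka
          try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
              // A record may carry a key...
              producer.send(new ProducerRecord<>("events", "user-42", value));
              // ...or omit it; within a partition, records keep their write order.
              producer.send(new ProducerRecord<>("events", value));
          }
      }
  }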

Page 4: Kafka internals

High level architecture

Page 5: Kafka internals

Role of the broker
The broker handles reads and writes.
It forwards messages for replication.
It performs compaction on its own log replicas (without affecting any other copies).

Page 6: Kafka internals

Role of the controller
Handles "cluster-wide events" signaled via Zookeeper (all in sync with the Zookeeper registries):
● Broker list changes (registration, failure)
● Leader election
● Changes in topics (deleted, added, number of partitions changed)
● Tracking of partition replicas

Page 7: Kafka internals

Kafka controller

Page 8: Kafka internals

Zookeeper role
- Kafka controller registration
- List of topics and partitions
- Partition states
- Broker registration (id, host, port)
- Consumer registration & subscriptions
(A small sketch of reading these registries follows below.)
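For illustration only, a hedged sketch (Java, plain ZooKeeper client) of reading a few of these registries. The paths are the well-known Kafka registry znodes; the connection string and timeout are assumptions:

  import org.apache.zookeeper.ZooKeeper;

  public class ZkRegistriesSketch {
      public static void main(String[] args) throws Exception {
          ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, event -> { });
          // Registered brokers: one child znode per broker id.
          System.out.println(zk.getChildren("/brokers/ids", false));
          // Topics (partition states live underneath each topic znode).
          System.out.println(zk.getChildren("/brokers/topics", false));
          // The currently elected controller.
          System.out.println(new String(zk.getData("/controller", false, null)));
          zk.close();
      }
  }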

Page 9: Kafka internals

Partitions
- Each partition has its leader broker and N followers.
- Consumers and producers work with leaders only.
- The partition is the main mechanism of scale within a topic.
- The producer specifies the target partition via a Partitioner implementation (balancing within the available topic partitions); a sketch follows below.
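A hedged sketch of a custom Partitioner, assuming the Partitioner interface of the newer Java producer client; the class name and the no-key fallback are illustrative choices:

  import org.apache.kafka.clients.producer.Partitioner;
  import org.apache.kafka.common.Cluster;
  import org.apache.kafka.common.utils.Utils;
  import java.util.Map;
  import java.util.concurrent.ThreadLocalRandom;

  public class KeyHashPartitioner implements Partitioner {
      @Override
      public void configure(Map<String, ?> configs) { }

      @Override
      public int partition(String topic, Object key, byte[] keyBytes,
                           Object value, byte[] valueBytes, Cluster cluster) {
          int numPartitions = cluster.partitionsForTopic(topic).size();
          if (keyBytes == null) {
              // No key: spread records over the available partitions.
              return ThreadLocalRandom.current().nextInt(numPartitions);
          }
          // Keyed records always land on the same partition.
          return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
      }

      @Override
      public void close() { }
  }

It is plugged in through the producer configuration, e.g. props.put("partitioner.class", KeyHashPartitioner.class.getName()).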

Page 10: Kafka internals

Access Pattern
● Writers write data in massive streams. The data is already "ordered" (this ordering is re-used).
● Readers consume data, each one from some position, sequentially.

Page 11: Kafka internals

Write path
[Diagram: the producer writes to the elected leader broker; the leader and the follower brokers each persist the data to their own local storage.]

Page 12: Kafka internals

Read path
Reads always happen via the partition leader. Kafka helps balance consumers within a group: each topic partition can be read by a single consumer at a time, to avoid several consumers within the same group reading the same message simultaneously.

Page 13: Kafka internals

Data transfer efficiency
1. Sequential disk access - optimal disk utilization
2. Zero copy - saves CPU cycles
3. Compression - saves network bandwidth (a configuration note follows below)
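For illustration, with the Java producer client compression is a single producer-side setting; the codec shown is just one of the supported options:

  compression.type=snappy   # "gzip" is another option; newer clients also support "lz4"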

Page 14: Kafka internals

Compression
Up to Kafka 0.7, a special kind of compressed message was handled by the clients (the producer and consumer sides); the broker was not involved. Starting from 0.8.1, the Kafka broker repackages messages in order to support logical offsets.

Page 15: Kafka internals

Indexing
- When data flows into the broker, it is being "indexed".
- Index files are stored alongside "segments".
- Segments are the files with the data.
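For illustration, a partition's directory on disk typically holds such pairs side by side, e.g. 00000000000000000000.log (the segment data) and 00000000000000000000.index (the offset index), where the file name is the base offset of that segment.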

Page 16: Kafka internals

Consumer API Levels
● Low level API: work with partitions and offsets
● High level API: work with topics, automatic offset management, load balancing
Can be rephrased as:
● Low level API: database
● High level API: queue
(A sketch contrasting the two styles follows below.)
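A hedged sketch of the two styles with the Java consumer client (org.apache.kafka.clients.consumer); the topic, group id and offset are illustrative:

  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.TopicPartition;
  import java.util.Collections;
  import java.util.Properties;

  public class ConsumerLevelsSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");
          props.put("group.id", "demo-group");
          props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

          // "Queue" style: subscribe to a topic; offsets and partition
          // balancing are handled by the group machinery.
          KafkaConsumer<String, String> queueStyle = new KafkaConsumer<>(props);
          queueStyle.subscribe(Collections.singletonList("events"));

          // "Database" style: pick a partition and an offset explicitly.
          KafkaConsumer<String, String> dbStyle = new KafkaConsumer<>(props);
          TopicPartition tp = new TopicPartition("events", 0);
          dbStyle.assign(Collections.singletonList(tp));
          dbStyle.seek(tp, 12345L);

          queueStyle.close();
          dbStyle.close();
      }
  }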

Page 17: Kafka internals

Layers of functionality

Page 18: Kafka internals

Levels of abstraction

Page 19: Kafka internals

Offset management
Prior to 0.8.1 it was purely Zookeeper's responsibility to hold offset metadata. Starting from 0.8.1 there is a special offset manager service. It runs with the broker, uses a special topic to store offsets, and also does in-memory caching as an optimization. We can choose which mechanism to use (an illustrative configuration follows below).
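For illustration, the old high-level consumer exposed this choice through configuration (setting names as in the 0.8.2-era docs; the values shown are just one option):

  offsets.storage=kafka        # or: zookeeper
  dual.commit.enabled=true     # commit to both stores while migrating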

Page 20: Kafka internals

Key Compaction
Kafka is capable of storing only the latest value per key. Used this way it is not a queue, it is a table. This capability makes it possible to store the whole state (the entire historical data flow), not only the latest X days (in contrast to the auto-deletion approach). An illustrative topic setup follows below.
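A hedged example of requesting this behaviour when creating a topic; the topic name, partition count and replication factor are illustrative, the relevant per-topic setting is cleanup.policy=compact:

  bin/kafka-topics.sh --zookeeper localhost:2181 --create \
      --topic user-profiles --partitions 4 --replication-factor 2 \
      --config cleanup.policy=compact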

Page 21: Kafka internals

Performance
Why is it so fast?
1. The network and disk formats of messages are the same, so all that has to be done is an append.
2. Local storage is used.
3. No attempts to cache / optimize.

Page 22: Kafka internals

Something big happens
We have a new world of needs for real-time data processing. In many cases this means streams. For many years I thought it was just counters to be calculated before saving data into HDFS for the "real work". Now I see it quite differently.

Page 23: Kafka internals

Naive use of Kafka

Page 24: Kafka internals

Possible simple solution...

Page 25: Kafka internals

Kafka as NoSQL
- Sync replication as the resilience model
- Single master per partition
- Opaque data
- Compactions
- Optimized for reads in the same order as the writes were done
- Optimized for massive writes
- Optimized for temporal data

Page 26: Kafka internals

Compute
Samza's and Kafka Streams' relation to Kafka is like MapReduce's and Spark's relation to HDFS. Kafka became a medium on top of which we build computational layers. It has to be said - there is no data locality. Samza and Kafka Streams solve common problems.

Page 27: Kafka internals

State, recovery approach
Both Samza and Kafka Streams took the approach that has served RDBMSs for a long time: snapshot + redo log. They force stateful stream processing applications to follow this paradigm. A sketch of what this looks like follows below.
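A hedged sketch in the (newer) Kafka Streams DSL; the application id, topic name and the simple count are illustrative assumptions. The point is only that the local store plays the role of the snapshot and its changelog topic plays the role of the redo log:

  import org.apache.kafka.common.serialization.Serdes;
  import org.apache.kafka.streams.KafkaStreams;
  import org.apache.kafka.streams.StreamsBuilder;
  import org.apache.kafka.streams.StreamsConfig;
  import org.apache.kafka.streams.kstream.KStream;
  import org.apache.kafka.streams.kstream.KTable;
  import java.util.Properties;

  public class SnapshotRedoSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(StreamsConfig.APPLICATION_ID_CONFIG, "counts-sketch");
          props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
          props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

          StreamsBuilder builder = new StreamsBuilder();
          KStream<String, String> events = builder.stream("events");
          // The running count lives in a local state store (RocksDB by default) - the snapshot.
          // Every update is also appended to a changelog topic in Kafka - the redo log.
          // After a failure the store is rebuilt by replaying that changelog.
          KTable<String, Long> counts = events.groupByKey().count();

          KafkaStreams streams = new KafkaStreams(builder.build(), props);
          streams.start();
      }
  }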

Page 28: Kafka internals

NestLogic case
What are we doing?
Why do we need Kafka / Spark?
How has Kafka helped us?

Page 29: Kafka internals

Statistical analysis of data segments

Page 30: Kafka internals

First shot - Spark

Page 31: Kafka internals

What was the problem
- All data has to be processed. We might not have enough resources to process a particular - huge - segment.
- Spark shuffle when the data is bigger than RAM is challenging.
- We are moving to more "real time" and streaming.
- We do not want to pay for sorting.

Page 32: Kafka internals

Kafka as shuffle engine

Page 33: Kafka internals

What we learned - flexibility
● We can re-run the "reduce" stage several times.
● Kafka clients can wait for the connection to be re-established with no timeouts, so we can repair a failed Kafka resource leader and the job will proceed.
● We can run separate clusters for map and reduce: flexibility to select their sizes. It saves us some money.
● Now we can use different technologies for Map and Reduce. We are about to replace the map stage (transformations) with ImpalaToGo.
● It can be much faster (as happened in our case).
● Significantly fewer RAM problems. It became true stream processing.

Page 34: Kafka internals

What we learned - cont.
More concise resource management. We can look at the size of the shuffle data and the number of groups (available from the Kafka cluster metadata), and only then decide on the size of the "reduce" cluster. We can interleave the map and reduce stages, because there is no sorting requirement.

Page 35: Kafka internals

Is it a universal solution?
● If you need dozens of different, concurrent jobs: Yarn + Spark is probably the best.
● If you need a single job to run smoothly and be flexible with it - our approach comes into place.

Page 36: Kafka internals

So, what do we do?
We help to distinguish your data by its nature, present it, and help to decide what should be done with each.

Page 37: Kafka internals

As data scientists...
We believe that checking the statistical homogeneity of data is very important.

Page 38: Kafka internals

As business people...
- Do not count an attack as popularity
- Do not count fraud as profit
- Do not count a bug as lack of interest
And most important:
- Work hard to distinguish all of the above

Page 39: Kafka internals

As big data experts
It is not simple to achieve. It took a lot of effort to get good results, orchestrate the operation, etc. We believe you will have better utilization of your big data, data science and devops resources.

Page 40: Kafka internals

How it looks

Page 41: Kafka internals

NestLogic Inc
We work hard to help you Know your data.

Page 42: Kafka internals

Thank you for your attention

Page 43: Kafka internals

Contact us
Contact us at [email protected] or via our site www.nestlogic.com

Page 44: Kafka internals

Helper slides

Page 45: Kafka internals

State is in RocksDB
RocksDB was selected. A few quick facts:
1. Developed by Facebook, based on LevelDB
2. Single node
3. C++ library
4. HBase-like ideas of sorting, snapshots, transaction logs
5. My speculation - the transaction log is what "glues" it with Kafka Streams
(A small API sketch follows below.)
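For illustration, a hedged sketch of the RocksJava API these facts refer to; the database path and keys are made up:

  import org.rocksdb.Options;
  import org.rocksdb.RocksDB;

  public class RocksDbSketch {
      public static void main(String[] args) throws Exception {
          RocksDB.loadLibrary();
          try (Options options = new Options().setCreateIfMissing(true);
               RocksDB db = RocksDB.open(options, "/tmp/state-store")) {
              // Embedded key-value access, local to this process (single node).
              db.put("user-42".getBytes(), "last-value".getBytes());
              System.out.println(new String(db.get("user-42".getBytes())));
          }
      }
  }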

Page 46: Kafka internals

Rebalancing - part 1
- One of the brokers is elected as the coordinator for a subset of the consumer groups. It will be responsible for triggering rebalancing attempts for certain consumer groups on consumer group membership changes or subscribed topic partition changes.
- It will also be responsible for communicating the resulting partition-consumer ownership configuration to all consumers of the group undergoing a rebalance operation.

Page 47: Kafka internals

Rebalancing - part 2
- On startup, or on co-ordinator failover, the consumer sends a ClusterMetadataRequest to any of the brokers in the "bootstrap.brokers" list. In the response, it receives the location of the co-ordinator for its group.
- The consumer sends a RegisterConsumer request to its co-ordinator broker. In the response, it receives the list of topic partitions that it should own.
- At this point, group management is done and the consumer starts fetching data and (optionally) committing offsets for the list of partitions it owns.

Page 48: Kafka internals

Consumer balancing
This is the capability to balance load and fail over between consumers in the same group.

The Kafka consumer communicates with the co-ordinator broker for this. The co-ordinator broker info is stored in ZK and is available from any broker.

These mechanisms are reused in Kafka Streams. (A consumer-side sketch follows below.)
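A hedged sketch of how a consumer observes this balancing via the Java client's ConsumerRebalanceListener; the group id and topic name are illustrative:

  import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.TopicPartition;
  import java.util.Collection;
  import java.util.Collections;
  import java.util.Properties;

  public class RebalanceSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");
          props.put("group.id", "demo-group");
          props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

          KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
          consumer.subscribe(Collections.singletonList("events"), new ConsumerRebalanceListener() {
              @Override
              public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                  // Called before ownership moves away, e.g. to commit offsets or flush state.
                  System.out.println("Revoked: " + partitions);
              }
              @Override
              public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                  // Called once the coordinator has handed this consumer its partitions.
                  System.out.println("Assigned: " + partitions);
              }
          });
          consumer.poll(1000);   // rebalance callbacks fire inside poll()
          consumer.close();
      }
  }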

Page 49: Kafka internals

Co-ordinator broker, part 1
1. Reads the list of groups it manages and their membership information from ZK.
2. If the discovered membership is alive (according to ZK), waits for the consumers in each of the groups to re-register with it.
3. Does failure detection for all consumers in a group. Consumers marked as dead by the co-ordinator's failure detection protocol are removed from the group, and the co-ordinator marks the rebalance for a group completed by communicating the new partition ownership to the remaining consumers in the group.

Page 50: Kafka internals

Co-ordinator broker, part 2
4. The co-ordinator tracks topic partition changes for all topics that any consumer group has registered interest in. If it detects a new partition for any topic, it triggers a rebalance operation (by killing the consumers' socket connections with itself). The creation of new topics can also trigger a rebalance operation, since consumers can register for topics before they are created.

Page 51: Kafka internals

Consumer and Co-ordinator

Page 52: Kafka internals

Log compaction

Any read from offset 0 to any offset Q, where Q > P, that completes in less than a configurable SLA will see the final state of all keys as of time Q. The log head is always a single segment (1GB by default).