Redlining Kafka Pipelines

Joel Koshy, Sr. Staff Software Engineer, LinkedIn

Upload: joel-koshy

Post on 21-Apr-2017

Page 1: Redlining Kafka Pipelines

 Joel Koshy  Sr. Staff Software Engineer

 LinkedIn

Page 2: Redlining Kafka Pipelines

Outline

Background

A look back at Kafka 0.7

Kafka 0.8, 0.9 and shifting bottlenecks

Kafka 0.10 message format changes

Mirroring – the 0.7 way

Q&A

Page 4: Redlining Kafka Pipelines

Kafka at LinkedIn

Page 5: Redlining Kafka Pipelines

Kafka 0.7

Page 6: Redlining Kafka Pipelines

Kafka 0.7 message format (uncompressed)

Page 7: Redlining Kafka Pipelines

Kafka 0.7 message format (compressed)

Page 8: Redlining Kafka Pipelines

Kafka 0.7 message format

Message offset is the physical offset

Page 9: Redlining Kafka Pipelines

Kafka 0.7 message format

Internal messages of compressed message sets are not addressable via a scalar offset

Page 10: Redlining Kafka Pipelines

Kafka 0.7 message format

Consumer checkpoints offset M for this message

Page 11: Redlining Kafka Pipelines

Drawbacks of 0.7 message format

•  Tricky to checkpoint within a compressed message set

•  Hard to rewind by N messages

•  Unsuitable for features such as log compaction
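
A minimal sketch of why physical offsets cause these drawbacks, assuming a toy log of length-prefixed entries. The real 0.7 format carries more header fields; `append`, `read_at`, `pack`, and `unpack` are hypothetical helpers, and zlib stands in for the compression codec.

```python
import zlib

def append(log: bytearray, payload: bytes) -> int:
    """Append one length-prefixed entry and return its physical (byte) offset."""
    offset = len(log)
    log += len(payload).to_bytes(4, "big") + payload
    return offset

def read_at(log: bytearray, offset: int) -> bytes:
    """Read the entry that starts at the given byte offset."""
    size = int.from_bytes(log[offset:offset + 4], "big")
    return bytes(log[offset + 4:offset + 4 + size])

def pack(messages: list) -> bytes:
    """Build a compressed message set: one opaque wrapper payload."""
    blob = b"".join(len(m).to_bytes(4, "big") + m for m in messages)
    return zlib.compress(blob)

def unpack(wrapper: bytes) -> list:
    """Recover the inner messages of a compressed message set."""
    blob = zlib.decompress(wrapper)
    out, pos = [], 0
    while pos < len(blob):
        size = int.from_bytes(blob[pos:pos + 4], "big")
        out.append(blob[pos + 4:pos + 4 + size])
        pos += 4 + size
    return out

log = bytearray()
first = append(log, b"m0")                               # addressable
wrapper_at = append(log, pack([b"m1", b"m2", b"m3"]))    # one offset for three messages
after = append(log, b"m4")                               # addressable

# The inner messages are reachable only by decompressing the whole wrapper.
inner = unpack(read_at(log, wrapper_at))
```

Only three positions in this toy log are addressable: the byte offsets of m0, the wrapper, and m4. A consumer that stops after m2 can checkpoint only the wrapper's offset, and rewinding by N messages has no arithmetic answer because offsets are byte positions rather than message counts.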

Page 12: Redlining Kafka Pipelines

BUT!!!

Very efficient (broker did not need to modify messages)

Page 13: Redlining Kafka Pipelines

Shifting bottlenecks over time

•  Predominantly network in 0.7

Page 14: Redlining Kafka Pipelines

Kafka 0.8 Replication

Page 15: Redlining Kafka Pipelines

Replication in Kafka 0.8

Page 17: Redlining Kafka Pipelines

Shifting bottlenecks over time

•  Predominantly network in 0.7

•  Still network in 0.8, gradually tilting toward storage

Page 18: Redlining Kafka Pipelines

Kafka 0.8 message format (uncompressed)

NOT a physical offset!

Page 19: Redlining Kafka Pipelines

Kafka 0.8 message format (compressed)

Envelope offset is the largest offset in the set

Page 20: Redlining Kafka Pipelines

Broker now needs to assign logical offsets even to internal messages
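
The extra broker work this implies can be sketched as follows. This is an illustration only: JSON plus zlib stands in for the real wire format, and `compress_set`, `decompress_set`, and `broker_append_v0` are hypothetical names, not Kafka functions.

```python
import json
import zlib

def compress_set(messages: list) -> bytes:
    """Stand-in for building a compressed message set."""
    return zlib.compress(json.dumps(messages).encode())

def decompress_set(wrapper: bytes) -> list:
    """Stand-in for deep iteration over a compressed message set."""
    return json.loads(zlib.decompress(wrapper))

def broker_append_v0(log: list, next_offset: int, wrapper: bytes) -> int:
    """Append a compressed set, assigning a logical offset to every inner message.

    Because the offsets live inside the compressed payload, the broker has to
    decompress, rewrite each inner message, and recompress.
    """
    inner = decompress_set(wrapper)
    for msg in inner:
        msg["offset"] = next_offset
        next_offset += 1
    envelope_offset = inner[-1]["offset"]   # largest offset in the set
    log.append((envelope_offset, compress_set(inner)))
    return next_offset

log = []
produced = compress_set([{"value": "a"}, {"value": "b"}, {"value": "c"}])
next_offset = broker_append_v0(log, 100, produced)
```

The decompress and recompress steps run on every compressed produce request, which is CPU the 0.7 broker never spent.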

Page 21: Redlining Kafka Pipelines

Handling produce requests

Page 25: Redlining Kafka Pipelines

Kafka 0.9 Security (and many other features)

Page 26: Redlining Kafka Pipelines

Kafka 0.9: SSL

•  Forgo zero-copy optimization

•  CPU overhead to decrypt/encrypt

•  Minor impact

•  (Used only on our mirroring pipelines at the time)

Page 29: Redlining Kafka Pipelines

Mirroring topology and storage policy changes

•  Reduced retention period across the board

•  File system tuning

Page 30: Redlining Kafka Pipelines

2015-2016

New hardware: 10Gbps NICs, bigger disks, XFS

Page 31: Redlining Kafka Pipelines

Shifting bottlenecks over time

•  Predominantly network in 0.7

•  Storage and network in 0.8, 0.9 (1Gbps NICs)

•  Increasingly CPU in 0.9 (10Gbps NICs)

Page 33: Redlining Kafka Pipelines

Kafka 0.9 broker profile

Cluster expansion

Page 34: Redlining Kafka Pipelines

Kafka 0.10 New message format

Page 35: Redlining Kafka Pipelines

Kafka 0.10 message format (uncompressed)

Page 36: Redlining Kafka Pipelines

Kafka 0.10 message format (compressed + append time)

Page 37: Redlining Kafka Pipelines

Kafka 0.10 message format (compressed + create time)
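
A rough sketch of the idea behind the 0.10 format for compressed sets, as I understand KIP-31: inner messages carry relative offsets, so the broker can stamp the envelope and leave the compressed payload untouched. The slide text does not spell this out, so treat the details as assumptions; JSON plus zlib stands in for the wire format, and all function names are hypothetical.

```python
import json
import zlib

def compress_set(messages: list) -> bytes:
    return zlib.compress(json.dumps(messages).encode())

def decompress_set(wrapper: bytes) -> list:
    return json.loads(zlib.decompress(wrapper))

def producer_build(values: list) -> bytes:
    """Producer assigns relative offsets 0..n-1 before compressing."""
    inner = [{"relative_offset": i, "value": v} for i, v in enumerate(values)]
    return compress_set(inner)

def broker_append_v1(log: list, next_offset: int, wrapper: bytes) -> int:
    """Broker still looks inside to count and validate, but does not recompress.

    The envelope gets the absolute offset of the last inner message and the
    original compressed bytes are written as-is.
    """
    count = len(decompress_set(wrapper))
    envelope_offset = next_offset + count - 1
    log.append((envelope_offset, wrapper))
    return envelope_offset + 1

def consumer_offsets(envelope_offset: int, wrapper: bytes) -> list:
    """Consumer derives absolute offsets from the envelope and relative offsets."""
    inner = decompress_set(wrapper)
    last_rel = inner[-1]["relative_offset"]
    return [(envelope_offset - (last_rel - m["relative_offset"]), m["value"])
            for m in inner]

log = []
wrapper = producer_build(["a", "b", "c"])
next_offset = broker_append_v1(log, 100, wrapper)
decoded = consumer_offsets(*log[0])
```

Compared with the 0.8 scheme, the recompression step disappears from the produce path; the offset arithmetic moves to the consumer.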

Page 38: Redlining Kafka Pipelines

Handling produce requests

Page 46: Redlining Kafka Pipelines

Handling fetch requests

Page 51: Redlining Kafka Pipelines

Migrate clients before switching to 0.10 message format

                            Ideal   Less ideal   Worse   Worst

Majority producer version   0.10    0.9          0.10    0.9

Majority consumer version   0.10    0.10         0.9     0.9
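
One way to read the table, sketched under the assumption that the broker converts messages whenever a client's format differs from the on-disk format: older producers force a conversion on produce, older consumers force a conversion on fetch. The function name and the string labels are hypothetical; the slide itself only gives the ranking.

```python
def conversions(producer_version: str, consumer_version: str,
                log_format: str = "0.10") -> list:
    """List the broker-side conversions a client mix would trigger."""
    work = []
    if producer_version != log_format:
        work.append("up-convert on produce")
    if consumer_version != log_format:
        work.append("down-convert on fetch")
    return work

# The four columns of the table, in order: ideal, less ideal, worse, worst.
columns = [("0.10", "0.10"), ("0.9", "0.10"), ("0.10", "0.9"), ("0.9", "0.9")]
for producer, consumer in columns:
    print(producer, consumer, conversions(producer, consumer))
```

Under this reading, the ranking says that conversion on the fetch path costs more than conversion on the produce path, and paying both is worst.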

Page 52: Redlining Kafka Pipelines

Migrating to the new message format

Page 66: Redlining Kafka Pipelines

USE CAUTION!!

Severe performance degradation with older clients, and there is no roll-back after switching

Page 67: Redlining Kafka Pipelines

So know your clients!

•  Useful to have a shepherding system in your service infra

•  EOL older libraries

•  Check API versions in public access logs

•  Add API version metrics to the Kafka broker

PRODUCERS CONSUMERS
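
A small sketch of the API-version check, assuming request-log records have already been parsed into dictionaries. The field names `client_id`, `api`, and `api_version` are hypothetical, not the broker's actual log schema. The threshold of 2 reflects my recollection that Produce and Fetch v2 arrived with 0.10.0; verify this against the protocol documentation for your release.

```python
from collections import Counter

def api_version_counts(requests: list) -> Counter:
    """Count requests per (API, version) pair."""
    return Counter((r["api"], r["api_version"]) for r in requests)

def clients_below(requests: list, api: str, min_version: int) -> list:
    """Return the client ids still sending an older version of the given API."""
    return sorted({r["client_id"] for r in requests
                   if r["api"] == api and r["api_version"] < min_version})

# Illustrative records only; not real log data.
requests = [
    {"client_id": "svc-a", "api": "Produce", "api_version": 2},
    {"client_id": "svc-b", "api": "Produce", "api_version": 1},
    {"client_id": "svc-b", "api": "Fetch", "api_version": 1},
    {"client_id": "svc-c", "api": "Fetch", "api_version": 2},
]
counts = api_version_counts(requests)
old_consumers = clients_below(requests, "Fetch", 2)
old_producers = clients_below(requests, "Produce", 2)
```

The same counts, exported as broker metrics per API version, would show when the majority of producers and consumers have moved.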

Page 68: Redlining Kafka Pipelines

Kafka 0.10 broker profile

Page 69: Redlining Kafka Pipelines

Impact of new message format

Page 70: Redlining Kafka Pipelines

Other benefits of 0.10 format

•  Fine-grained time-based offset lookup

•  Facilitates correctness of time-based retention

Page 71: Redlining Kafka Pipelines

“Time” under 0.9 message format

Message time is mtime of containing segment so time granularity is at segment-level

Page 72: Redlining Kafka Pipelines

“Time” under 0.9 message format

Time resets on partition reassignment

Page 73: Redlining Kafka Pipelines

Time in 0.10 message format

Time indexes built from timestamps in messages
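
A simplified sketch contrasting the two lookups. It assumes a time index is a sorted list of (timestamp, offset) pairs and a segment list is a sorted list of (base offset, mtime) pairs; the real index format and lookup rules are more involved, and both function names are hypothetical.

```python
import bisect

def offset_from_segment_mtime(segments: list, target_ms: int):
    """0.9-style lookup: resolution is one whole segment.

    segments is [(base_offset, mtime_ms)] in log order. Returns the base
    offset of the first segment last modified at or after target_ms.
    """
    for base_offset, mtime_ms in segments:
        if mtime_ms >= target_ms:
            return base_offset
    return None

def offset_from_time_index(index: list, target_ms: int):
    """0.10-style lookup: binary search over (timestamp_ms, offset) entries."""
    timestamps = [ts for ts, _ in index]
    i = bisect.bisect_left(timestamps, target_ms)
    return index[i][1] if i < len(index) else None

# Illustrative values only.
segments = [(0, 5000), (1000, 9000)]
index = [(1000, 0), (4000, 400), (6000, 1000), (8000, 1600)]

coarse = offset_from_segment_mtime(segments, 7000)
fine = offset_from_time_index(index, 7000)
```

With the same target time, the segment-based lookup can only land on a segment boundary, while the index built from message timestamps lands much closer to the requested point. Timestamps stored in messages also survive partition reassignment, unlike file mtimes.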

Page 75: Redlining Kafka Pipelines

Mirroring pipelines (embracing the 0.7 way)

Page 76: Redlining Kafka Pipelines

0.7 mirror maker was a dumb fast pipe

Page 80: Redlining Kafka Pipelines

0.7 mirror maker

… limited only by network

Page 84: Redlining Kafka Pipelines

0.8+ mirror maker

•  Needs to preserve order of keyed messages

•  0.8+ consumers do not support shallow iteration (KAFKA-732)

•  0.8+ producers do not support pass-through mode

Page 85: Redlining Kafka Pipelines

Kafka 0.8+ mirror maker profile

Page 87: Redlining Kafka Pipelines

Handling keyed messages in pass-through mode

•  Need to preserve order of keyed messages… but pass-through mirror maker cannot repartition

•  Workaround is to require identical partition counts across all clusters and do identity partitioning

•  i.e., P_input = P_output
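
Identity partitioning can be sketched as below, assuming records arrive as (source partition, opaque payload) pairs. `mirror_identity` is a hypothetical name, and the payloads stand for compressed message sets that the mirror maker never opens.

```python
def mirror_identity(source_records: list, source_partitions: int,
                    dest_partitions: int) -> dict:
    """Copy each payload to the destination partition with the same number.

    Keys are never inspected, so per-key ordering is preserved only because
    every message keeps the partition its original producer chose.
    """
    if source_partitions != dest_partitions:
        raise ValueError("pass-through mirroring needs identical partition counts")
    dest = {p: [] for p in range(dest_partitions)}
    for partition, payload in source_records:
        dest[partition].append(payload)   # P_input = P_output
    return dest

# Illustrative payloads only.
records = [(0, b"set-1"), (2, b"set-2"), (0, b"set-3")]
mirrored = mirror_identity(records, source_partitions=3, dest_partitions=3)
```

The partition-count check is the price of pass-through: with different counts the mirror maker would have to decompress and rehash keys, which is exactly the work this mode avoids.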

Page 89: Redlining Kafka Pipelines

0.10 pass-through mirroring how-to

•  Restore shallow iteration in consumer (KAFKA-1895)

•  “Todd’s trick” – introduce an identity compression codec in producer

•  Uniform partition counts across clusters

•  … and a few more subtleties (future talk)

Page 90: Redlining Kafka Pipelines

Acknowledgments

•  Jiangjie Qin (KIP-31, KIP-32, KIP-33)

•  Todd Palino (pass-through mirroring)

•  Kafka open source community
