Kafka Quotas Talk at LinkedIn

Posted on 28-Jan-2018

Category: Engineering

TRANSCRIPT

  1. Aditya Auradkar & Dong Lin
     © 2015 LinkedIn Corporation. All Rights Reserved.
  2. Motivation: Why is this important?
     - Shared resources in a multi-tenant environment
     - Bad clients can hurt others
       - Bootstrapping consumers
       - Buggy clients
     - Better QoS for well-behaved clients
       - Preserve throughput and latency for everyone else
     - API limits/billing
  3. Clients and Client-Ids
     - Quotas are enforced per client-id
     - Why client-id?
       - No quotas per topic
       - No quotas per topic × client-id combination
     - A blanket (default) produce and fetch quota applies to all clients
  4. Quota Overrides
     - Certain clients justify higher quotas
     - Changing quotas through broker config needs rolling bounces, which take too long and require too much effort
     - Store overrides in ZooKeeper
     - Brokers parse config change notifications
     - Apply the new quota immediately
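A minimal sketch of how a broker could resolve a client's quota. It assumes an in-memory map of per-client-id overrides that falls back to the blanket default and is updated in place when a config change notification is parsed, so no rolling bounce is needed. `Quota`, `QuotaConfig`, `on_config_change` and the client-id `mirror-maker` are all hypothetical names, not Kafka's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    producer_byte_rate: int  # bytes/sec
    consumer_byte_rate: int  # bytes/sec


class QuotaConfig:
    """Hypothetical per-client-id quota lookup with a default fallback."""

    def __init__(self, default: Quota) -> None:
        self.default = default
        self.overrides: dict[str, Quota] = {}

    def quota_for(self, client_id: str) -> Quota:
        # Clients without an override share the blanket default quota.
        return self.overrides.get(client_id, self.default)

    def on_config_change(self, client_id: str, quota: Quota) -> None:
        # Invoked after a change notification is parsed; the new quota is
        # used on the very next lookup, with no broker restart.
        self.overrides[client_id] = quota


config = QuotaConfig(default=Quota(1048576, 1048576))
config.on_config_change("mirror-maker", Quota(10485760, 10485760))
```

A plain dictionary lookup keeps enforcement cheap on the request path. Overrides only change on the rare notification.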
  5. Quota Overrides
     {
       "version": 1,
       "config": {
         "producer_byte_rate": "1048576",
         "consumer_byte_rate": "1048576"
       }
     }
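A sketch of turning the override node above into numeric byte rates, using its well-formed version with both values quoted. The rates are stored as strings and 1048576 bytes/sec is 1 MiB/s. The helper `parse_override` and its version check are assumptions for illustration, not the broker's actual parser.

```python
import json

# Override node contents, as on the slide (values are strings, bytes/sec).
override_json = """
{
  "version": 1,
  "config": {
    "producer_byte_rate": "1048576",
    "consumer_byte_rate": "1048576"
  }
}
"""


def parse_override(raw: str) -> dict[str, int]:
    """Return the byte-rate overrides as integers (hypothetical helper)."""
    node = json.loads(raw)
    if node.get("version") != 1:
        raise ValueError(f"unsupported version: {node.get('version')}")
    return {key: int(value) for key, value in node["config"].items()}


rates = parse_override(override_json)
print(rates)  # {'producer_byte_rate': 1048576, 'consumer_byte_rate': 1048576}
```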
  6. Broker Metrics
     - Metrics are created for each client
     - Clients can come and go
       - Don't need to retain client metrics forever
       - Garbage-collect a client's metrics if it is inactive for longer than 1 hour
       - Recreate them if the client reconnects
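The metric lifecycle above can be sketched as a registry that creates a client's entry on first use, drops entries idle for more than an hour, and recreates them on reconnect. `ClientMetricsRegistry` is a hypothetical class. The injectable `clock` exists only so the example can fake the passage of time.

```python
import time
from typing import Callable, Optional


class ClientMetricsRegistry:
    """Hypothetical per-client metric store that drops idle clients."""

    def __init__(self, inactive_expiry_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.inactive_expiry_s = inactive_expiry_s
        self.clock = clock
        # client_id -> [bytes_recorded, last_activity_timestamp]
        self._metrics: dict[str, list[float]] = {}

    def record(self, client_id: str, num_bytes: int) -> None:
        now = self.clock()
        # Create the metric on first use, or recreate it after it was GC'd.
        entry = self._metrics.setdefault(client_id, [0.0, now])
        entry[0] += num_bytes
        entry[1] = now

    def expire_inactive(self) -> list[str]:
        """Remove clients idle longer than the expiry; return their ids."""
        now = self.clock()
        expired = [cid for cid, (_, last) in self._metrics.items()
                   if now - last > self.inactive_expiry_s]
        for cid in expired:
            del self._metrics[cid]
        return expired

    def bytes_for(self, client_id: str) -> Optional[float]:
        entry = self._metrics.get(client_id)
        return entry[0] if entry is not None else None


fake_now = [0.0]
registry = ClientMetricsRegistry(clock=lambda: fake_now[0])
registry.record("client-a", 500)
fake_now[0] = 3601.0                     # just over one hour later
registry.record("client-b", 100)
print(registry.expire_inactive())        # ['client-a']
registry.record("client-a", 200)         # reconnect: metric starts from zero
```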
  7. Enforcement
     - Reduce client throughput to the desired rate
     - Compute the delay based on current throughput
       - Small violations result in small delays
     - Use smaller measurement windows to avoid long pauses
     - Client-side metrics are available to detect throttling
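A sketch of measuring a client's throughput over a few short samples rather than one long window. Old bursts age out after `num_samples * sample_ms`, which keeps the measurement window, and therefore any resulting pause, short. `WindowedRate` and its default sample count and size are illustrative assumptions, not Kafka's exact metric implementation.

```python
from collections import deque


class WindowedRate:
    """Hypothetical byte-rate meter over a bounded number of short samples.

    The rate is the total bytes in the retained samples divided by the time
    they span (at least one sample), so old bursts age out quickly.
    """

    def __init__(self, num_samples: int = 11, sample_ms: int = 1000) -> None:
        self.num_samples = num_samples
        self.sample_ms = sample_ms
        self.samples: deque = deque()  # entries: [sample_start_ms, bytes]

    def record(self, num_bytes: int, now_ms: int) -> None:
        start = now_ms - now_ms % self.sample_ms
        if not self.samples or self.samples[-1][0] != start:
            self.samples.append([start, 0])
        self.samples[-1][1] += num_bytes
        # Drop samples that fall outside the measurement window.
        while len(self.samples) > self.num_samples:
            self.samples.popleft()

    def rate(self, now_ms: int) -> float:
        """Bytes per second across the retained samples."""
        if not self.samples:
            return 0.0
        elapsed_ms = max(now_ms - self.samples[0][0], self.sample_ms)
        return sum(b for _, b in self.samples) * 1000.0 / elapsed_ms


meter = WindowedRate()
meter.record(500_000, now_ms=0)
meter.record(500_000, now_ms=400)    # same 1 s sample as the first record
print(meter.rate(now_ms=1000))       # 1000000.0
```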
  8. Delay Calculation
     - Delay = $W \cdot \frac{O - Q}{Q}$
     - W = window size, O = observed rate, Q = desired rate
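The delay formula follows from a short derivation. A client observed at rate O over a window W has sent O·W bytes. If its response is held for an extra delay D, those bytes are spread over W + D. Setting O·W / (W + D) = Q and solving gives D = W·(O − Q)/Q. The function below is a sketch of that formula. The symbol O for the observed rate and the clamp at zero for compliant clients are assumptions of this sketch.

```python
def throttle_delay_ms(window_ms: float, observed_rate: float,
                      quota_rate: float) -> float:
    """Delay = W * (O - Q) / Q, clamped at zero for compliant clients.

    Holding the response this long stretches the O * W bytes seen in the
    window over W + delay, bringing the average rate down to Q.
    """
    if observed_rate <= quota_rate:
        return 0.0
    return window_ms * (observed_rate - quota_rate) / quota_rate


# 1 s window, client observed at 1.5 MB/s against a 1 MB/s quota.
print(throttle_delay_ms(1000, 1.5e6, 1.0e6))  # 500.0
```

The delay scales with the size of the violation, so a 1% overshoot on a 1 s window costs only about 10 ms. Shrinking W also shrinks the worst-case pause.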
  9. Enforcement (produce path)
     - Components: producer, request channel, replica manager, log, quota manager, delay queue
     - Flow: 1. request, 2. process, 3. append, 4. record metric, 5. delay (into the delay queue), 6. dequeue from the delay queue, 7. response
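Steps 5 and 6 of that flow can be sketched as a delay queue. A throttled response is parked with a release time, and whatever drains the queue hands back only the responses whose delay has expired, ready to be sent in step 7. The min-heap and the `DelayQueue` class here are hypothetical stand-ins for whatever structure the broker actually uses.

```python
import heapq
import itertools
from typing import Any


class DelayQueue:
    """Hypothetical broker-side delay queue for throttled responses."""

    def __init__(self) -> None:
        self._heap: list = []          # entries: (release_ms, seq, response)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order

    def delay(self, response: Any, now_ms: float, delay_ms: float) -> None:
        """Step 5: park the response until now_ms + delay_ms."""
        heapq.heappush(self._heap,
                       (now_ms + delay_ms, next(self._seq), response))

    def dequeue_ready(self, now_ms: float) -> list:
        """Step 6: pop every response whose delay has expired."""
        ready = []
        while self._heap and self._heap[0][0] <= now_ms:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


queue = DelayQueue()
queue.delay("resp-1", now_ms=0, delay_ms=500)
queue.delay("resp-2", now_ms=0, delay_ms=100)
print(queue.dequeue_ready(200))  # ['resp-2']
print(queue.dequeue_ready(600))  # ['resp-1']
```

Because only the response is delayed, the request has already been processed and appended by the time the client waits. The client simply sees a slower response rather than an error.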
  10. Enforcement (fetch path)
      - Components: consumer, request channel, replica manager, log, quota manager, delay queue
      - Flow: 1. request, 2. process, 3. fetch offsets, 4. record metric, 5. delay (into the delay queue), 6. dequeue from the delay queue, 7. response (zero copy)
  11. Slowdown vs Error
      - Error handling is hard
        - Tricky to implement backoff and retries
        - All client implementations would need to handle quota errors
      - Need something easier
  12. Getting Started
      - Important broker configs
        - quota.producer.default (in bytes/sec)
        - quota.consumer.default (in bytes/sec)
      - Apply overrides
        ./bin/kafka-configs.sh --alter --add-config 'producer_byte_rate=1048576,consumer_byte_rate=1048576' --entity-type clients --entity-name TestTopic --zookeeper localhost:2181
      - Read overrides
        ./bin/kafka-configs.sh --describe --entity-type clients --entity-name TestTopic --zookeeper localhost:2181
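When overrides are applied from a script, building the override command as an argument list avoids shell quoting mistakes. One example is an unbalanced quote around the `--add-config` value. With `--entity-type clients`, `--entity-name` is the client-id. The helper `alter_client_quota_cmd` is hypothetical. It only assembles the same flags shown above, and running the result, for example with `subprocess.run(cmd, check=True)`, assumes a Kafka installation in the current directory.

```python
import shlex


def alter_client_quota_cmd(client_id: str, producer_byte_rate: int,
                           consumer_byte_rate: int,
                           zookeeper: str = "localhost:2181") -> list:
    """Build the kafka-configs.sh override command as an argv list."""
    add_config = (f"producer_byte_rate={producer_byte_rate},"
                  f"consumer_byte_rate={consumer_byte_rate}")
    return ["./bin/kafka-configs.sh", "--alter",
            "--add-config", add_config,
            "--entity-type", "clients",
            "--entity-name", client_id,
            "--zookeeper", zookeeper]


cmd = alter_client_quota_cmd("TestTopic", 1048576, 1048576)
# Print a copy-pasteable, correctly quoted shell form of the same command.
print(shlex.join(cmd))
```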
  13. Monitoring
      - Producer metrics: throttle-time avg and max
      - Consumer metrics: throttle-time avg and max
      - Broker metrics: byte-rate and avg throttle-time per client-id
        - byte-rate is used for enforcement
      - ZookeeperConsumerConnector and SimpleConsumer metrics are also available
  14. Rollout Strategy
      - Deploy without enforcement
      - Monitor metrics to track throughput for all clients
      - Identify candidates for overrides
      - Start with high thresholds
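The "identify candidates for overrides" step can be sketched as a filter over per-client byte rates collected while quotas are deployed without enforcement. It lists every client-id whose observed rate would exceed a proposed default. The function name, client-ids and rate values below are hypothetical sample inputs, not measurements from the talk.

```python
def override_candidates(observed_rates: dict,
                        default_quota: float) -> list:
    """Client-ids whose observed byte rate exceeds the proposed default.

    observed_rates maps client-id -> bytes/sec, e.g. taken from the broker's
    per-client byte-rate metric while running without enforcement.
    """
    return sorted(cid for cid, rate in observed_rates.items()
                  if rate > default_quota)


# Hypothetical sample rates in bytes/sec.
rates = {"app-a": 200_000.0, "mirror-maker": 8_000_000.0,
         "etl-job": 3_500_000.0}
print(override_candidates(rates, default_quota=2_000_000.0))
# ['etl-job', 'mirror-maker']
```

Starting with a high default keeps this list short. The threshold can then be lowered step by step as overrides are put in place.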
  15. Evaluation
      - Validate quota functionality: broker-throughput