cloud messaging service: technical overview
TRANSCRIPT
![Page 1: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/1.jpg)
Cloud Messaging Service Technical Overview
P R E S E N T E D B Y M a t t e o M e r l i| S e p t e m b e r 2 1 , 2 0 1 5
![Page 2: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/2.jpg)
Sections
2
1. Introduction 2. Architecture 3. Bookkeeper 4. Future 5. Q & A
CMS - Technical Overview
![Page 3: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/3.jpg)
What is CMS
3
• Hosted Pub / Sub • Multi tenant (Auth / Quotas / Load Balancer) • Horizontally scalable • Highly available, durable and consistent storage • Geo Replication • In production since 2013
CMS - Technical Overview
CMS ClusterProducer
Broker
Consumer
Bookie
ZKGlobalZK
Replication
![Page 4: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/4.jpg)
CMS key features
4 CMS - Technical Overview
• Multi-tenancy / hosted • Operating a system at scale is hard and requires deep understanding of internals • Authentication / Self service provisioning / Quotas
• SLAs (Write latency 2ms avg - 5ms 99pct) • Maintain the same latencies and throughput under backlog draining scenarios
• Simple high level API with clear ordering, durability and consistency semantics • Geo-replication
• Single API call to configure regions to replicate to • Load balancer: Dynamically optimize topics assignment to brokers • Support large number of topics • Store subscription position
• Apps don’t need to store it • Able to delete data as soon as it's consumed • Support round-robin distribution across multiple consumers
![Page 5: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/5.jpg)
Work load examples
5 CMS - Technical Overview
Challenge # Topics # Producers / topic
# Subscriptions / topic
Produced msg rate / s / topic
Fan-out 1 1 1 K 1 KThroughput & latency 1 1 1 100 K
# Topics & latency 1 M 1 10 10
Fan-in 1 1 K 1 > 100 K
• Design to support wide range of use cases • Need to be cost effective in every case
![Page 6: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/6.jpg)
2. Archi tecture
![Page 7: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/7.jpg)
Messaging model
7 CMS - Technical Overview
• Producers can attach to a topic and send messages to it • A subscription is a durable resources that is the recipient of all messages sent to
the topic, after its creation • Subscriptions do have a type:
• “Exclusive” means that only one consumer is allowed to attach to this subscription. First consumer decides the type.
• “Shared” allows multiple consumers. Messages are sent in round-robin distribution. No ordering guarantees.
• “Failover” allows multiple consumers, though only one is receiving messages at a given point, while others are in standby mode.
Consumer-5
Failover
Subscription-C
Consumer-4
Consumer-3
Consumer-2
Subscription-B
Shared
Exclusive
Consumer-1Subscription-AProducer-X
Producer-Y
Topic
![Page 8: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/8.jpg)
Client API
8
▪ Expose messaging model concepts (producer/consumer) ▪ C++ and Java ▪ Connection pooling ▪ Handle recoverable failures transparently (reconnect / resend
messages) without compromising ordering guarantees ▪ Sync / async version of every operation
CMS - Technical Overview
![Page 9: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/9.jpg)
Java producer example
9
CmsClient client = CmsClient.create("http://<broker vip>:4080");
Producer producer = client.createProducer("my-topic");
// handles retries in case of failure producer.send("my-message".getBytes());
// Async version: producer.sendAsync("my-message".getBytes()).thenRun(() -> { // Message was persisted });
CMS - Technical Overview
![Page 10: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/10.jpg)
Java consumer example
10
CmsClient client = CmsClient.create(“http://<broker vip>:4080");
Consumer consumer = client.subscribe( “my-topic", "my-subscription-name", SubscriptionType.Exclusive);
// Blocks until message available Message msg = consumer.receive(); // Do something... consumer.acknowledge(msg);
CMS - Technical Overview
![Page 11: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/11.jpg)
System overview
11 CMS - Technical Overview
Broker • State-less • Maintain in memory cache of
messages • Read from Bookkeeper when
cache miss
Bookkeeper • Distributed write-ahead log • Create many ledgers
• Append entries • Read entries • Delete ledger
• Consistent reads • Single writer (the broker)
CMS Cluster
Broker
Bookie
ZKGlobal
ZK
Replication
Nativedispatcher
ManagedLedger
BK Client
Globalreplicators
Cache
LoadBalancer
Producer App
CMS client
Consumer App
CMS client
![Page 12: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/12.jpg)
System overview
12 CMS - Technical Overview
Native dispatcher • Async Netty server
Global replicators • If topic is global, republish
messages in other regions
Global Zookeeper • ZK instance with participants in
multiple US regions • Consistent data store for
customers configuration • Accept writes with one region
down CMS Cluster
Broker
Bookie
ZKGlobal
ZK
Replication
Nativedispatcher
ManagedLedger
BK Client
Globalreplicators
Cache
LoadBalancer
Producer App
CMS client
Consumer App
CMS client
![Page 13: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/13.jpg)
Partitioned topics
13
▪ Client lib has a wrapper producer/consumer implementation
▪ No API changes
▪ Producers can decide how to assign messages to partitions: ▪ Single partition ▪ Round robin ▪ Provide a key on the message ▪ Hash of the key determines the
partition
▪ Custom routing
CMS - Technical Overview
App
CMS Cluster
Broker 1
Prod
ucer
T1
P0
P1
P2
P3
P4
T1-P0
Broker 2
Broker 3
T1-P1
T1-P2
T1-P3
T1-P4
![Page 14: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/14.jpg)
Partitioned topics
14
▪ Consumers can use all subscription type with the same semantics
▪ In “Failover” subscription type, the election is done per partition
▪ Evenly spread the partitions assignment across all available consumers
▪ No need for ZK coordination
CMS - Technical Overview
CMS Cluster
Broker 1 App
Con
sum
er-1
T1
C0
C1
C2
C3
C4
T1-P0
Broker 2
Broker 3
T1-P1
T1-P2
T1-P3
T1-P4
App
Con
sum
er-2
T1
C0
C1
C2
C3
C4
![Page 15: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/15.jpg)
3. Bookkeeper
![Page 16: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/16.jpg)
CMS Bookkeeper usage
16
▪ CMS uses Bookkeeper through a higher level interface of ManagedLedger: › A single managed ledger represent the storage of a single topic › Maintains list of currently active BK ledgers › Maintains the subscription positions using an additional ledger to checkpoint the last
acknowledged message in the stream › Cache data › Deletes ledgers when all cursors are done with them
CMS - Technical Overview
![Page 17: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/17.jpg)
Bookie internal structure
17 CMS - Technical Overview
• Writes are written both to journal and to ledger storage (in different device)
• Ledger storage writes are fsynced periodically
• Reads are only coming from ledger storage
• Entries are interleaved in entry log files
• Ledger indexes are used to find entries offset
![Page 18: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/18.jpg)
Bookkeeper issues
18
▪ Performance degrades when writing to many ledgers at the same time ▪ When there are heavy reads, the ledger storage device gets slow and
will impact writes ▪ Ledger storage flushes need to fsync many ledger index files each time
CMS - Technical Overview
![Page 19: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/19.jpg)
Bookie storage improvements
19 CMS - Technical Overview
• Writes are written both to journal and to in memory write cache
• Entries are periodically flushed • Entries are sorted by ledger to
be sequential on disk (per flush period)
• Since entries are sequential, we added read-ahead cache
• Location index is mostly kept in memory and only updated during flush
![Page 20: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/20.jpg)
Bookkeeper write latency
20
▪ After hardware, next limit to achieve low latency is JVM GC ▪ GC pauses are unavoidable. Try to keep them around ~50ms and as
least as frequents as possible › Switched BK client and servers to use Netty pooled ref-counted buffers and direct
memory to hide it from GC and eliminate payload copies › Extensively profiled allocations and substantially reduced per-entry objects allocations • Use Recycler pattern to pool objects (very efficient for same thread allocate/release) • Primitive collections • Array queue instead of linked queues in executors • Open hash maps instead of linked hash maps • BTree instead of ConcurrentSkipList
CMS - Technical Overview
![Page 21: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/21.jpg)
Bookie ledgers scalability
21 CMS - Technical Overview
Single bookie — 15K write/sB
K w
rite
late
ncy
(ms)
0
1
2
3
4
Ledgers / bookie
1 1000 5000 10000 20000 50000
Avg 99pct
![Page 22: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/22.jpg)
4. Future
![Page 23: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/23.jpg)
Auto batching
23
▪ Send messages in batches throughout the system ▪ Transparent to application ▪ Configure group timing and size: e.g.: 1ms / 128Kb ▪ For the same byte/s throughput lower the txn/s through the system › Less CPU usage in broker/bookies › Lower GC pressure
CMS - Technical Overview
![Page 24: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/24.jpg)
Low durability
24
▪ Current throughput bottleneck for bookie writes is journal syncs ▪ Could add more bookies but bigger cost ▪ Some use cases are ok to lose data in rare occasions ▪ Solution › Store data in bookies • No memory limitation, can build big backlog
› Don’t write to bookie journal • Data is stored in write cache in 2 bookies + broker cache
› Can lose < 1min data in case 1 broker & 2 bookies crash
▪ Higher throughput with less bookies ▪ Lower publish latency
CMS - Technical Overview
![Page 25: Cloud Messaging Service: Technical Overview](https://reader033.vdocuments.us/reader033/viewer/2022042706/58a45bb91a28abb8288b45ed/html5/thumbnails/25.jpg)
5. Q & A