optimizing your cloud for millions of connections: a case study · 2012. 5. 25. · optimizing your...

49
Optimizing your cloud for millions of connections: A Case Study Zbyněk Šlajchrt

Upload: others

Post on 21-Aug-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Optimizing your cloud for millions of connections: A Case StudyZbyněk Šlajchrt

Page 2: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Overall architecture

Page 3: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 4: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

DeliveryRate vs. Cost

• Max(DeliveryRate)• Min(Cost)

• DeliveryRate ~ Servers*BandwithPerServer• Servers = ActiveClients/MaxConsPerServer• Cost = ServerCost + NetworkCost• ServerCost = CostPerServer*Servers• CostPerServer ~ 1/Servers• NetworkCost ~ MinimalDeliveryRate

Page 5: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Pull vs. Push

• Pull+ Existing distribution technology, simple

- Danger of SYN flood, minimal control over connected clients

• Push+ Better control over clients, kept-alive connections => less SYNs

- Unproven technology, more complex- How many connections can we keep on one

server? Less than 1M => Pull, otherwise Push

Page 6: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 7: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

System Configuration

• CentOS 6• Max open file descriptors (startApp.sh)

ulimit -n 3000000

• System wide settings for file descriptors (/etc/sysctl.conf)fs.nr_open = 6291456

fs.file-max = 6291456

• System wide limit for open files per user (/etc/security/limits.conf)* hard nofile 6291456

Page 8: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 9: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 10: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Technologies

Page 11: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Netty

• Asynchronous event-driven network application framework written in Java by Trustin Lee

• Very good performance thanks to NIO and underlying epoll or kqueue kernel functions

• Easy programming model

Page 12: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 13: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 14: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 15: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Google Protocol Buffers

• Mechanism for serializing structured data• Language neutral• Platform neutral• Java, C++, Python• Very easy to grasp• Suitable for specifications (SRS)

Page 16: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 17: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Components

Page 18: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Key components

• Comet Session Handler• Distributor• Commands Queue Store• Bandwidth Limiter• Client module

Page 19: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 20: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Comet Session Handler

• Netty handler managing comet sessions• Creates, registers and destroys comet

sessions• Decodes command results from HTTP

requests• Encodes commands as HTTP responses

Page 21: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 22: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Distributor

• Responsible for continuous delivery of commands to clients (scheduled thread)

• Loops over all currently connected sessions

• For each session selects commands to be sent

• Uses client’s Queue Status held in session

Page 23: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 24: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 25: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 26: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Commands Queue Store

• Storage for commands to be distributed• Populated by the Viruslab• Commands are stored in one or more queues• Every queue has a priority ordinal (zero-based)• Client holds and sends the Queue Status

– List of last command IDs for every queue (used)– Serves for selecting command for a client by

Distributor

Page 27: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Command Scopes

• Priority 0 – request scope– Not kept in the client’s Queue Status =>

Command with the same ID can be delivered multiple times– Similar to JMS Topic (publish-subscribe) – commands are

distributed to the currently connected only• Priority 1 – session scope

– the Queue Status until the application is running => No command is delivered twice as long as the application runs

• Priority >1 – persistent scope– Last cmd IDs are kept persistently on the client– No command is sent twice unless the client is reinstalled

Page 28: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 29: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 30: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 31: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Bandwidth Limiter

• Protects kernel’s socket memory from exhaustion

• In case of big packets the distribution rate may exceed the bitrate of the NIC

• Limiter continuously computes the distribution rate and compares it with the predefined max bitrate

• Auto-regulating algorithm

Page 32: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 33: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 34: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Client side

• Command interpreter• DLL module written in C • Integrated with Avast AV• CURL library

Page 35: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Monitoring

Page 36: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

JMX Aggregator

• From a defined set of servers – Aggregates read-only attributes into one

tabular JMX attribute– Corresponding read-write attributes on

servers are represented by one attribute on the aggregator

Page 37: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Aggregated read-only JMX attribute example:

HeapMemoryUsage.used[Node1]=18GHeapMemoryUsage.used[Node2]=16GHeapMemoryUsage.used[Node3]=19G

Aggregated read-write attributeexample:

MaxThroughput = 40000

Page 38: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Zenoss

• Network management platform• Based on Zope app server (Python)• GPL v2• Supports JMX• Nice web interface• Maintains history

Page 39: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 40: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 41: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 42: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 43: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

SLOGAN

• Analyzer of log streams• Command line tool• SQL-like syntax (filtering, grouping,

sorting)• Events are POJOs• Can be represented in Google Protobuf

(very handy for Streaming Updates)• Events are serialized and appended to a

log file

Page 44: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

SLOGAN - Examples

Usage:tail –f app.log | slogan <query>

Query examples:

tstamp etype

tstamp –f etype==‘NOP’

tstamp max(tstamp)-min(tstamp) –g reqID

Page 45: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

H2 Database

• Used as the storage for the commands• Very fast in-memory database• Both embedded and server modes• JDBC API• Nice web-based console

Page 46: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery
Page 47: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Q&A

Thanks for your listening!

[email protected]

Page 48: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Title

• Text

Page 49: Optimizing your cloud for millions of connections: A Case Study · 2012. 5. 25. · Optimizing your cloud for millions of connections: A Case ... • Responsible for continuous delivery

Title

• Text