Download - Scalable Web Apps
Types of scaling
Horizontal scalingscale out
Vertical scalingscale up
Server #3Server #2Server #1
Think about your app as a workernot single instance
Load balancer
App #1 App #2 App #3 App #4 App #5
OS
App
Server #3
Server #2
Server #1
Load balancer
App #1 App #2
App #3 App #4
App #5Load balancer
Server #n
Think about your app as a workernot single instance
Sessions
We need:
• Common• Fast• Persistent
Storage for sessions.
Sessions
Server #3Server #2Server #1
Load balancer
App #1 App #2 App #3 App #4 App #5
Session storage
App
OS
Sessions - Redis
• Key-value in memory database (hash-tabled)• Scalable up to 1k nodes• Partitioning with Query routing• Non blocking M-S replication on nodes• Clustered (currently not production ready)http://athlan.pl/symfony2-redis-session-handler/
Redis - Partitioning with Query routing
Node #1 Node #2 Node #3
Query random
node
Miss Hit, abort
Also supported:• Client-side partitioning (app calls appropriate
node)• Proxy assisted partitioning (proxy selects
appropriate node)
Centralized Logging
• Logs should be centrailzed to avoid taking notice to each node separately
• Approaches:– File replication (rsync + cron)– syslog (easy to integrate with log4j)• syslogd over UDP p:514• rsyslog over TCP, stores data in db
Common storage, no local changes!
• Keep storage avaliable to all nodes– Symfony2 Gaufrette Bundle• FTP• Amazon S3• OpenCloud• AzureBlobStorage• Rackspace
Architecture
Server #3Server #2Server #1
Load balancer
App #1 App #2 App #3 App #4 App #5
Session storage Files storage abstraction
Centralized logging
App
OS
OS
Continuous Integration
• To keep all nodes up-to-date, you need CI• Automatize disabling nodes, building,
deploying– Jenkins CI
Contineous Integration
1. Disable service on node2. Deploy/build app
1. Copy files2. Update db schema (liquibase, ORM schema
update)3. Execute scripts
3. Re-run service
Balance the payload - HAProxy
• Very, very fast proxy!• Software TCP/HTTP load balancer• Different node selecting algorithms:– roudrobin (limit 4128)– static-rr– leastconn (lowest number of connections)
Yeah guys, this is logo :)
But no schema is needed just imagine how it works.
Balance the payload - HAProxy
• You can check node’s status by pinging• Dead node is excluded from balancing strategy
vi /etc/haproxy/haproxy.cfgoption httpchk HEAD /check.txt HTTP/1.0server webA 192.168.0.102:80 checkserver webB 192.168.0.103:80 check
Balance the payload - HAProxy
• Monitor node’s status by read stats from socket via socat.
echo "show stat" | socat /tmp/haproxy.sock stdio
Balance the payload - HAProxy
• Monitor node’s status by native stats webapp console
Nodes Monitoring - Zabbix
• Zabbix, centralized server monitoring
Zabbix + HAProxy
• UserParameter=haproxy.qcur[*], echo "show stat" | socat /tmp/haproxy.sock stdio | grep -i '$1' | sed 's/,/\ /g' | awk '{print $$3}'
Reverse Proxy and Varnish cache
http://tomayko.com/writings/things-caches-do
• Global virtual user = global cache
Reverse Proxy – Expiration model
http://tomayko.com/writings/things-caches-do
Reverse Proxy – Expiration model
http://tomayko.com/writings/things-caches-do
Reverse Proxy – Validation model
http://tomayko.com/writings/things-caches-do
Reverse Proxy – Validation model
http://tomayko.com/writings/things-caches-do
Reverse Proxy and Varnish cache
Varnish:80
Apache:81
App
Reverse Proxy and Varnish cache
Varnish:8080
Apache:8081
App
Varnish:8082
Apache:8083
App
HAProxy:80
Reverse Proxy and Varnish cache
Apache:8081
App
Apache:8082
App
HAProxy:81
Varnish:80
Varnish and ESI
<!DOCTYPE html><html><body><!-- ... some content --><!-- Embed the content of another page here --> <esi:include src="http://..." /><!-- ... more content --></body></html>
Scaling databases - Master slave
Master
Slave Slave Slave
Write
Read
• All data redundancy
MongoDB scaling
• Common models to spread data over nodes:– range keys– hash keys
• Many nodes on cheap machines• No all data redundancy in each node
MongoDB – range-based keys
• Awesome for range queries (grab data from min nodes – Query isolation)
• Not good enough to distribute data over nodes in case of monotinic incemental
http://docs.mongodb.org
MongoDB – hash-based keys
• Take notice: not good for range queries whilemerge-sorting, no Query isolation in this case
• Write scaling – Write to many nodes simultaneously (take notice to readers-writer lock, where write is exclusive)
http://docs.mongodb.org
CQRS
• Command Query Responsibility Segregation– separate application service layers for writing and
readng from DB (possibility to use different data sources like RAM or DB)
CQRS
• Examples– post-insert population cache• all SELECTs are from cache (even invalid)• consider LFU instead of LRU to invaidate cache
– pre-insert into memory• dump results periodicaly
In both approaches there is convenient to use Queues or data bus !
Queues, RabbitMQ
• RabbitMQ is based on AMQP (Advanced Message Queuing Protocol)– point-to-point– publish-and-subscribe– queueing, routing
• AMQP is not JMS (Java Message Service is an API, not protocol)
• Happy Rabit is empty Rabbit– do not try to store any data (messages) in queue
system in persistent mode to keep HA
• Simple queue
• Work queues(one consumer)
• Publish/Subscribe(many consumers)
Queues, RabbitMQ
Box vs spread architecture.
• Box architecture– no scaling– easy to maintenance
Server
RabbitMQ DBRedis
Webapp Varnish
Server #2
Box vs spread architecture.
• Spread architecture– High availability– more integrations, more administrative
Server #1
RabbitMQ
DB shardRedis
HAProxy
Varnish
Webapp
Server #3
DB shard Varnish
Webapp