MySQL Latency

Jeff Freund, CTO Clickability

Upload: srubinstein

Post on 04-Jul-2015


DESCRIPTION

Clickability CTO Jeff Freund addresses the MySQL Conference, April 14 2008.

TRANSCRIPT

Page 1: Mysql Latency

Jeff Freund, CTO, Clickability

Page 2: Mysql Latency

End of a long day, I am the last stop between you and …..

~6 hours 42 mins left

Yippee!

Future CTO

Page 3: Mysql Latency

• Software-as-a-Service Web CMS

• True Multi-Tenant SaaS platform from the ground up

• Integrated solution of all services required to run a sophisticated business website

• HQ in San Francisco, 8+ years old, 60+ employees

Global leader in On Demand Web Content Management

Page 5: Mysql Latency

250+ million pages delivered per month

Page 6: Mysql Latency

• Linux

• Apache

• MySQL

• Java

• Tomcat

Proven open source building blocks

Page 7: Mysql Latency

• Scale-out horizontally

• Distributed infrastructure, including multiple datacenters

• Multiple Layers of caching for performance

• Loose-coupling of applications around data

Page 8: Mysql Latency

[Diagram: masters M1 and M2 and slaves S1 to S6 across Data Center 1 and Data Center 2, linked by a VPN Tunnel]

Page 9: Mysql Latency

[Diagram: Application Code, RW ConnectionManager, RO ConnectionManager, Master, Slave]

con = db.getReadWriteConnection();
con = db.getReadOnlyConnection();
con = db.getSafeReadConnection();

• Intelligently Split Queries between Masters and Slaves

• Inserts/Updates/Deletes sent to Master

• Most Reads sent to Slaves

• “Safe” Reads sent to Masters – zero tolerance for latency

• Manual code updates to implement the split

• 6+ months in production to find all “Safe” Reads
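The split above could be sketched roughly as follows. Only the three get...Connection method names come from the slide; the SplitDb class, its generic connection type and the two suppliers are assumptions made for illustration, not Clickability's actual ConnectionManager code.

```java
import java.util.function.Supplier;

// Hypothetical sketch: only the three get*Connection names appear on the slide.
final class SplitDb<C> {
    private final Supplier<C> master; // hands out connections to a master
    private final Supplier<C> slave;  // hands out connections to a slave

    SplitDb(Supplier<C> master, Supplier<C> slave) {
        this.master = master;
        this.slave = slave;
    }

    // Inserts, updates and deletes always go to a master.
    C getReadWriteConnection() {
        return master.get();
    }

    // Most reads tolerate some replication latency and go to a slave.
    C getReadOnlyConnection() {
        return slave.get();
    }

    // "Safe" reads must see the latest committed data, so they go to a master.
    C getSafeReadConnection() {
        return master.get();
    }
}
```

With JDBC, C would typically be java.sql.Connection and each supplier would wrap a connection pool. The routing itself is trivial; as the slide notes, the hard part is finding every call site that needs the safe variant.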

Page 10: Mysql Latency

• The difference in time between when a transaction is committed on one database and then subsequently committed on a replicated database.

• Latency can either be “slowness” or “breakage”

Page 11: Mysql Latency

7… Hardware Maintenance / Recovery

6… Schema updates / DB Maintenance

5… Elevated transaction rates (e.g. bulk loads)

4… High query load on slaves

3… Network bottlenecks / Loss of connectivity

2… “Slave Errors” (e.g. duplicate keys, deadlocks)

Page 12: Mysql Latency
Page 13: Mysql Latency

while ( 1 )
  echo "show slave status \G;" | mysql -u USER --password=PASSWORD | grep Seconds_Behind_Master >> replication.log
  sleep 1
end

[Chart; axis label: Seconds]
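The same check could be done from application code by parsing the status text. The sketch below is an assumption-laden illustration, not part of the talk: the SlaveStatusParser class and its method name are hypothetical, and it only handles the vertical "\G" output format used in the loop above.

```java
import java.util.OptionalLong;

// Hypothetical helper; class and method names are not from the slides.
final class SlaveStatusParser {
    private static final String FIELD = "Seconds_Behind_Master:";

    // Extracts Seconds_Behind_Master from the text printed by "show slave status \G".
    // Returns empty when the field is missing or NULL (replication not running).
    static OptionalLong secondsBehindMaster(String statusOutput) {
        for (String line : statusOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(FIELD)) {
                String value = trimmed.substring(FIELD.length()).trim();
                if (value.isEmpty() || value.equalsIgnoreCase("NULL")) {
                    return OptionalLong.empty();
                }
                return OptionalLong.of(Long.parseLong(value));
            }
        }
        return OptionalLong.empty();
    }
}
```

Seconds_Behind_Master is reported in whole seconds, so this kind of monitoring cannot show sub-second latency.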

Page 14: Mysql Latency

[Diagram: masters M1 and M2 and slaves S1 to S6 across Data Center 1 and Data Center 2, linked by a VPN Tunnel]

Page 15: Mysql Latency

[Diagram: M1, M2, S4, S6, VPN Tunnel; label: INSERT]

CREATE TABLE `replTest` (
  `timecol` bigint(20) default NULL,
  KEY `idx_timecol` (`timecol`)
)

Loop:
  $val = current timestamp in epoch milliseconds
  M2: INSERT INTO replTest (timecol) VALUES ($val)
  M1: SELECT $val - max(timecol) from replTest;
  S4: SELECT $val - max(timecol) from replTest;
  S6: SELECT $val - max(timecol) from replTest;
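One pass of that loop might look like this in Java over plain JDBC. This is a sketch under assumptions: the ReplicationProbe class and its method names are hypothetical, the connections are supplied by the caller, and the subtraction is done in Java rather than in the SELECT as on the slide.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.OptionalLong;

// Hypothetical sketch of the measurement loop; only replTest/timecol come from the slide.
final class ReplicationProbe {
    // Writes the heartbeat timestamp (epoch milliseconds) to the transaction source.
    static void writeHeartbeat(Connection source, long val) throws SQLException {
        try (PreparedStatement ps =
                 source.prepareStatement("INSERT INTO replTest (timecol) VALUES (?)")) {
            ps.setLong(1, val);
            ps.executeUpdate();
        }
    }

    // Reads the newest replicated heartbeat on a replica and returns val minus it.
    // Empty means no heartbeat has reached this replica yet.
    static OptionalLong readLatencyMillis(Connection replica, long val) throws SQLException {
        try (PreparedStatement ps =
                 replica.prepareStatement("SELECT max(timecol) FROM replTest");
             ResultSet rs = ps.executeQuery()) {
            if (!rs.next()) {
                return OptionalLong.empty();
            }
            long newest = rs.getLong(1);
            if (rs.wasNull()) {
                return OptionalLong.empty();
            }
            return OptionalLong.of(latencyMillis(val, newest));
        }
    }

    // 0 means the heartbeat has arrived; otherwise the gap back to the newest one that has.
    static long latencyMillis(long val, long newestReplicated) {
        return val - newestReplicated;
    }
}
```

A caller would take val from System.currentTimeMillis(), write it to M2, then immediately read each of M1, S4 and S6 and record the results.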

Page 16: Mysql Latency

Database   Characteristics        Average Latency   Max Latency
M2         Transaction Source     N/A               N/A
M1         Local; Moderate Load   ~6 ms             ~315 ms
S4         Local; High Load       ~190 ms           ~12 seconds
S6         Remote; Minimal Load   ~5 ms             ~400 ms

• All DBs are 1 replication hop away from transaction source

• All hardware is roughly equal

• Remote location is ~60 miles away

• Data taken from 100,000 samples over an hour of standard operations

Page 17: Mysql Latency

[Chart: S4 Database replication latency; axis label: milliseconds]

95% of the time, replication latency will be 1 second or less
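Figures like the average, the maximum and the 95% mark can be derived from the raw samples along these lines. The LatencyStats class is hypothetical and the nearest-rank percentile is one common definition, chosen here as an assumption; the talk does not say how its numbers were computed.

```java
import java.util.Arrays;

// Hypothetical helper for summarising latency samples (milliseconds).
// All methods expect a non-empty array.
final class LatencyStats {
    static double average(long[] samplesMillis) {
        return Arrays.stream(samplesMillis).average().getAsDouble();
    }

    static long max(long[] samplesMillis) {
        return Arrays.stream(samplesMillis).max().getAsLong();
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

A large gap between the average and the maximum, as in the S4 row of the table, is what the long tail looks like in these numbers.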

Page 18: Mysql Latency

• Now what?

Page 19: Mysql Latency

Assume that it will happen in the course of standard operations. Build the application to accommodate it.

If you do, your Ops Team will love you for it.

Page 20: Mysql Latency

• Local ehcache on application servers

• Distributed Object Cache (memcached)

• Need to clear all caches effectively on object updates

[Diagram: Pub 1, Pub 2, Pub 3; Local cache; Distributed Object Cache; Reliable Cache Clearing Messages]
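A layered lookup with an explicit clear might be sketched as below. Everything here is an assumption for illustration: the TwoLevelCache class is hypothetical, plain maps stand in for ehcache and memcached, and the lookup order (local, then distributed, then database) is a common arrangement rather than something the slide states.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical two-layer cache; maps stand in for ehcache and memcached.
final class TwoLevelCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>(); // per application server
    private final Map<K, V> distributed;                        // shared by all servers
    private final Function<K, V> loadFromDb;                    // falls back to the database

    TwoLevelCache(Map<K, V> distributed, Function<K, V> loadFromDb) {
        this.distributed = distributed;
        this.loadFromDb = loadFromDb;
    }

    // Looks in the local cache, then the distributed cache, then the database.
    V get(K key) {
        V value = local.get(key);
        if (value == null) {
            value = distributed.get(key);
            if (value == null) {
                value = loadFromDb.apply(key);
                if (value != null) {
                    distributed.put(key, value);
                }
            }
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }

    // Run on every server when a "clear cache" message for this key arrives.
    void clear(K key) {
        local.remove(key);
        distributed.remove(key);
    }
}
```

If clear runs before the updated row has replicated, the next get reloads the old row from a slave and caches it again.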

Page 21: Mysql Latency

• Multicast Notification Bus for “clear cache” messages

• The race is on! If the message arrives before the transaction is replicated, a stale object may be reloaded...

• Frequently accessed objects most susceptible to problems

[Diagram: CMS, Pub, DB1, DB2]

Page 22: Mysql Latency

• Multicast Notification Bus with tuning parameters

• The race is on again! But the database transaction gets a tunable head start. 0.5 sec, 1 sec, 2 secs, 5 secs

• Better: it lasted for years, but in the end 99.99+% still wasn’t reliable enough... (remember the long tail on the chart?)

[Diagram: CMS, PUB, DB1, DB2]
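The tunable head start could be sketched with a scheduled task. The DelayedCacheClear class and its method names are hypothetical, and the Runnable stands in for whatever publishes the multicast message; none of these names come from the talk.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: delays the "clear cache" message so replication gets a head start.
final class DelayedCacheClear {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final long headStartMillis; // the tuning parameter, e.g. 500, 1000, 2000 or 5000

    DelayedCacheClear(long headStartMillis) {
        this.headStartMillis = headStartMillis;
    }

    // Call after the database commit; the message is sent headStartMillis later.
    ScheduledFuture<?> afterCommit(Runnable sendClearMessage) {
        return scheduler.schedule(sendClearMessage, headStartMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```

The delay only shifts the odds. Any replication lag longer than the head start still lets the message win the race, which matches the slide's conclusion.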

Page 23: Mysql Latency

• Database Queue table for messages

• Messages are committed after data, injecting them into the replication data stream.

• All apps poll the database queue table once per second.

• Guaranteed that data will arrive before message!!!

[Diagram: CMS, PUB, DB1, DB2, QueuePoller]
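A poller along these lines might look as follows. Only the QueuePoller name appears on the slide; the Message shape, the increasing id column and the two callbacks are assumptions, and the fetch callback stands in for a query against the local replica such as a SELECT of rows with id greater than the last one seen, ordered by id.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongFunction;

// Hypothetical sketch; assumes queue rows become visible in increasing id order.
final class QueuePoller {
    // One row of the assumed queue table: an increasing id and the object to evict.
    static final class Message {
        final long id;
        final String objectKey;

        Message(long id, String objectKey) {
            this.id = id;
            this.objectKey = objectKey;
        }
    }

    private final LongFunction<List<Message>> fetchNewerThan; // reads rows with id > argument
    private final Consumer<String> evict;                     // clears the object from caches
    private long lastSeenId;

    QueuePoller(LongFunction<List<Message>> fetchNewerThan, Consumer<String> evict) {
        this.fetchNewerThan = fetchNewerThan;
        this.evict = evict;
    }

    // One polling pass; returns how many messages were processed.
    int pollOnce() {
        List<Message> batch = fetchNewerThan.apply(lastSeenId);
        for (Message m : batch) {
            evict.accept(m.objectKey);
            lastSeenId = Math.max(lastSeenId, m.id);
        }
        return batch.size();
    }
}
```

Each application would call pollOnce once per second, for example from a scheduled task. Because the message row is committed after the data and travels in the same replication stream, a replica that can see the message already has the data.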

Page 24: Mysql Latency

• If you don’t need to replicate it, don’t!

• Split data functionally (i.e. separate large blog storage from relational transactions to keep the pipes clear)

• Build the appropriate recovery tools – our “rewind button”

Page 25: Mysql Latency

• Masters in multiple data centers

• Greater geographic distance between data centers

• MySQL load balancing – will messaging still be reliable???

Page 26: Mysql Latency

[email protected]

Questions? Feedback?