Zarafa Scaling & Performance

DESCRIPTION
Presentation by Steve Hardy about scaling and performance tuning at Zarafa SummerCamp 2011.

TRANSCRIPT
Scaling & Performance: Seven new insights
Steve Hardy, Zarafa
• Basics presented at last summercamp
  – Split-table ‘properties’
  – One for row-order data
  – One for column-order data
Zarafa 7.0: I/O improvements
• Row order
• Column order

Row-order and column-order
Why is hybrid better?

                            IOPS: Load e-mail   IOPS: Sort column
Row order (Zarafa 4.0)      1                   ~30
Column order (Zarafa 4.1)   ~30                 1
Hybrid (Zarafa 7.0)         1                   1

• Best of both worlds
• Drawback: needs more storage
• But this storage can be cheaper
• Not double
  – Lots of data not needed in column order (eg attachments)
• Much faster load of e-mails (less IOPS)
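The table above can be illustrated with a toy model (my assumption, not Zarafa internals): one disk read fetches one run of adjacently stored values, loading a mail needs all properties of one message, and sorting needs one property of all messages.

```python
# Toy model of row-order vs column-order vs hybrid storage layouts.
# Numbers match the slide's table: ~30 properties per message, ~30 messages.

def reads(layout, operation, n_messages=30, n_props=30):
    """Disk reads needed, assuming one read per run of adjacent values."""
    if layout == "row":            # all properties of one message adjacent
        return 1 if operation == "load_mail" else n_messages
    if layout == "column":         # one property of all messages adjacent
        return n_props if operation == "load_mail" else 1
    if layout == "hybrid":         # both copies kept: use whichever is adjacent
        return 1

assert reads("row", "load_mail") == 1 and reads("row", "sort") == 30
assert reads("column", "load_mail") == 30 and reads("column", "sort") == 1
assert reads("hybrid", "load_mail") == 1 and reads("hybrid", "sort") == 1
```

The drawback named in the slide follows directly: the hybrid layout pays for its single-read behaviour by storing (most) data twice.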
Read flags
• Read flags no longer in index
  – Not actually used, even in Zarafa 6
  – Modifying a read flag meant a key change, causing I/O

           IOPS: Set All Read   IOPS: Sort on read flag
Zarafa 6   5                    10
Zarafa 7   2                    1
Sorting
• With many folders, Zarafa 6 becomes less efficient
• Assumes N folders with equally distributed messages

           IOPS: Sorting   IOPS: Move   IOPS: Save
Zarafa 6   N               1            1
Zarafa 7   1               2            1
Counters
• Many counters are used:
  – Total count
  – Unread count
  – Deleted count
  – Folder count
  – Deleted folder count
  – Associated count
  – Deleted associated count
• In Zarafa 7, counters are tracked incrementally
• Assumes folder with 10k messages

           Records scanned/updated
           Open folder   Modify flag
Zarafa 6   10k           1
Zarafa 7   7             2
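A minimal sketch of the idea (hypothetical code, not Zarafa's implementation): Zarafa 6 recomputed folder counters by scanning every message on folder open, whereas Zarafa 7 stores the counters and adjusts them incrementally on each change, so opening a folder only reads the handful of stored counters.

```python
# Incremental counter tracking vs. full scan on folder open.

class Folder:
    def __init__(self):
        self.messages = []      # read flag per message
        self.total = 0          # counters maintained incrementally...
        self.unread = 0

    def add_message(self, read=False):
        self.messages.append(read)
        self.total += 1         # ...updated at write time
        if not read:
            self.unread += 1

    def mark_read(self, i):
        if not self.messages[i]:
            self.messages[i] = True
            self.unread -= 1    # small update instead of a rescan

    def counters_by_scan(self):
        # Zarafa 6 style: O(n) scan of all 10k records on folder open
        return len(self.messages), sum(1 for r in self.messages if not r)

f = Folder()
for _ in range(10000):
    f.add_message()
f.mark_read(0)
# Same answer, but opening the folder no longer touches 10k records:
assert (f.total, f.unread) == f.counters_by_scan() == (10000, 9999)
```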
• More efficient when writes are grouped together
• Writes are ‘deferred’ until later
• A separate process in Zarafa purges the deferred writes
• Saves up to about 50% in I/O while writing to tproperties

Deferred writes to tproperties (column order)
Write data → DeferWrite → tproperties

• Control the number of deferred objects
• Counter reset (set to ‘no’ for higher performance)
• Counter resets shown in stats (zarafa-stats --system)
• Deferred purges shown in stats
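As a rough sketch (my assumption about the mechanism, not Zarafa's actual code): instead of updating the column-order tproperties table on every save, changes are queued and flushed in one grouped write, which is much cheaper on rotating disks.

```python
# Deferred-write batching: many queued updates, few grouped flushes.

class DeferredWriter:
    def __init__(self, max_deferred=10000):   # cf. max_deferred_records
        self.max_deferred = max_deferred
        self.pending = []       # queued tproperties updates
        self.flushes = 0        # grouped writes actually issued

    def write(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.max_deferred:
            self.purge()

    def purge(self):
        # In Zarafa this is done by a separate purge process.
        if self.pending:
            self.flushes += 1   # one grouped I/O instead of len(pending) writes
            self.pending.clear()

w = DeferredWriter(max_deferred=100)
for i in range(250):
    w.write(i)
w.purge()                       # final flush
assert w.flushes == 3           # 250 writes collapsed into 3 grouped flushes
```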
New options in Zarafa 7.0
Setting                Default
max_deferred_records   10000
counter_reset          yes
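As a sketch, these options would appear in the Zarafa server configuration (the `/etc/zarafa/server.cfg` path is an assumption based on common installations; the values shown are the defaults from the table above):

```
# /etc/zarafa/server.cfg (excerpt) -- assumed location
# Maximum number of objects whose tproperties writes may be deferred
max_deferred_records = 10000
# Set to 'no' to skip counter resets for higher performance
counter_reset = yes
```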
• Biggest advantages in slower I/O subsystems
• Should show the most advantage on low-RPM disks (eg SATA)
• Allows having about 15% of your data on SSD, which takes 50% of the IOPS
  – Example: 100GB of SSD is enough to reduce IOPS by half for a total data size of 600GB
  – Depending on your vendor, 100GB of SSD is between EUR 100 and EUR 1600
  – Example 2: instead of buying 6 15k SAS disks @ 200 IOPS each (1200 IOPS total), it is also possible to buy 1 SSD disk and 5 SATA disks @ 120 IOPS, which even has a higher capacity
  – Especially interesting for large-scale (cloud) rollouts
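The arithmetic behind both examples checks out, under the slide's assumption that the hottest ~15% of data absorbs ~50% of the IOPS load:

```python
# Back-of-the-envelope check of the SSD examples (assumed numbers, not
# measurements).

total_data_gb = 600
hot_fraction = 0.15        # share of data that is "hot"
iops_absorbed = 0.50       # share of IOPS that hot data receives

ssd_gb_needed = total_data_gb * hot_fraction
assert ssd_gb_needed == 90     # so ~100GB of SSD halves the disk IOPS

# Example 2: all-SAS vs. mixed SSD/SATA
sas_iops = 6 * 200             # 6 x 15k SAS disks @ 200 IOPS = 1200 IOPS
sata_iops = 5 * 120            # 5 SATA disks @ 120 IOPS = 600 IOPS
# The SATA disks cover exactly the half of the load the SSD does not absorb:
assert sata_iops == sas_iops * (1 - iops_absorbed)
```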
I/O changes: things to remember
• Records are much more compact in memory inside MySQL in Zarafa 7.0 than in Zarafa 6
• Possibly a higher cache hit ratio is achievable by using more cache memory in MySQL and less in Zarafa
• No large-scale testing done yet; this will follow
• Even when the cache is more efficiently packed in MySQL RAM, latency between the Zarafa and MySQL servers is still an issue, so you always need some cache in Zarafa
Cache tuning
MySQL 5.5 introduced Linux native AIO
• AIO = asynchronous I/O, meaning that multiple requests can be sent to the RAID at once
• Before, there was ‘emulated AIO’, which could do at most X requests simultaneously (X = 4 for most installations)
• Some requirements:
  – O_DIRECT enabled
  – ext3, ext4, xfs or a direct block device
  – AIO-capable kernel (introduced in 2.6.24 or so)
• Multiple requests are only sent for parallel queries
• If you run one query, you can only have one outstanding I/O request
• Makes some things slow:
  – mysqldump
  – ALTER TABLE
  – Queries doing non-trivial range scans (eg select * from a where id > 10 and id < 10000)
Why AIO is not as great as it sounds
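For reference, the MySQL 5.5 settings involved look roughly like this (`innodb_use_native_aio` is on by default in 5.5 and shown only for illustration; the file path depends on your distribution):

```
# /etc/mysql/my.cnf (excerpt) -- illustrative
[mysqld]
innodb_use_native_aio = 1        # Linux native AIO (MySQL 5.5+, default on)
innodb_flush_method   = O_DIRECT # O_DIRECT, as required above
```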
• http://bugs.mysql.com/bug.php?id=60087
• Enables a prefetch mechanism for sequential reads in InnoDB
• Main advantage for Zarafa:
  – Table queries for large tables (sorting)
  – Faster mysqldumps
Our patch for MySQL
• Anyone who signs up to MySQL’s bug tracker and comments on the bug, asking for it to be reviewed, will receive a free beer from me this evening (no kidding)
Free Beer (Free as in beer)
http://bugs.mysql.com/bug.php?id=60087
http://bit.ly/kVlQVv
MSR: Mail Store Relocator
• Used to migrate user stores between cluster nodes
Server1 ⇄ Server2
• Little-known features of MSR:
  – Normally used for user migration between nodes
  – Now also allows bi-directional synchronisation
  – Possible within the cluster AND outside the cluster
  – ACLs only synchronized when inside the cluster
• Applications:
  – Distributed public store
    • Requires EVERY server in the cluster to contain a public replica
  – Shared store between geographic locations (eg a shared ‘info’ inbox)
  – Shared store between organisations
• Will improve:
  – Currently does realtime message sync, but no realtime folder sync
• Demo in session Friday 10:00
Scaling: store replication (kind of beta)
• epoll() instead of select() interface for the main socket dispatcher (reduces CPU overhead/delay)
• Compressed records
  – E-mail bodies are currently stored uncompressed in MySQL
  – Compressing plaintext & HTML bodies can save about 20% of storage
  – Schema is already compression-ready (no schema change needed in the future)
Future additions
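As a rough illustration of the compression idea (not Zarafa code; the ~20% figure on the slide is a store-wide average, and repetitive HTML compresses far better than that):

```python
# Compressing an HTML e-mail body with zlib, losslessly.
import zlib

html_body = ("<html><body>"
             + "<p>Hello, this is a fairly typical e-mail paragraph "
               "with some repeated words.</p>" * 50
             + "</body></html>").encode("utf-8")

compressed = zlib.compress(html_body, 6)         # default-ish level
assert len(compressed) < len(html_body)          # compression saves space
assert zlib.decompress(compressed) == html_body  # and is fully reversible
```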