Zarafa Scaling & Performance

DESCRIPTION
Presentation by Steve Hardy about scaling and performance tuning at Zarafa SummerCamp 2011.

TRANSCRIPT
Scaling & Performance: Seven new insights
Steve Hardy, Zarafa
• Basics presented at last summercamp
  – Split-table ‘properties’
  – One for row-order data
  – One for column-order data
Zarafa 7.0: I/O improvements
• Row order
• Column order

Row-order and column-order
Why is hybrid better?

                            IOPS: Load e-mail   IOPS: Sort column
Row order (Zarafa 4.0)      1                   ~30
Column order (Zarafa 4.1)   ~30                 1
Hybrid (Zarafa 7.0)         1                   1

• Best of both worlds
• Drawback: needs more storage
• But this storage can be cheaper
• Not double
  – Lots of data not needed in column order (eg attachments)
• Much faster load of e-mails (less IOPS)
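The table above can be illustrated with a toy model (my assumption, not Zarafa internals): one disk read fetches one run of adjacently stored values, loading a mail needs all properties of one message, and sorting needs one property of all messages.

```python
# Toy model of row-order vs column-order vs hybrid storage layouts.
# Numbers match the slide's table: ~30 properties per message, ~30 messages.

def reads(layout, operation, n_messages=30, n_props=30):
    """Disk reads needed, assuming one read per run of adjacent values."""
    if layout == "row":            # all properties of one message adjacent
        return 1 if operation == "load_mail" else n_messages
    if layout == "column":         # one property of all messages adjacent
        return n_props if operation == "load_mail" else 1
    if layout == "hybrid":         # both copies kept: use whichever is adjacent
        return 1

assert reads("row", "load_mail") == 1 and reads("row", "sort") == 30
assert reads("column", "load_mail") == 30 and reads("column", "sort") == 1
assert reads("hybrid", "load_mail") == 1 and reads("hybrid", "sort") == 1
```

The drawback named in the slide follows directly: the hybrid layout pays for its single-read behaviour by storing (most) data twice.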
Read flags
• Read flags no longer in index
  – Not actually used, even in Zarafa 6
  – Modifying a read flag meant a key change, causing I/O

           IOPS: Set All Read   IOPS: Sort on read flag
Zarafa 6   5                    10
Zarafa 7   2                    1
Sorting
• With many folders, Zarafa 6 becomes less efficient
• Assumes N folders with equally distributed messages

           IOPS: Sorting   IOPS: Move   IOPS: Save
Zarafa 6   N               1            1
Zarafa 7   1               2            1
Counters
• Many counters are used:
  – Total count
  – Unread count
  – Deleted count
  – Folder count
  – Deleted folder count
  – Associated count
  – Deleted associated count
• In Zarafa 7, counters are tracked incrementally
• Assumes folder with 10k messages

           Records scanned/updated
           Open folder   Modify flag
Zarafa 6   10k           1
Zarafa 7   7             2
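A minimal sketch of the idea (hypothetical code, not Zarafa's implementation): Zarafa 6 recomputed folder counters by scanning every message on folder open, whereas Zarafa 7 stores the counters and adjusts them incrementally on each change, so opening a folder only reads the handful of stored counters.

```python
# Incremental counter tracking vs. full scan on folder open.

class Folder:
    def __init__(self):
        self.messages = []      # read flag per message
        self.total = 0          # counters maintained incrementally...
        self.unread = 0

    def add_message(self, read=False):
        self.messages.append(read)
        self.total += 1         # ...updated at write time
        if not read:
            self.unread += 1

    def mark_read(self, i):
        if not self.messages[i]:
            self.messages[i] = True
            self.unread -= 1    # small update instead of a rescan

    def counters_by_scan(self):
        # Zarafa 6 style: O(n) scan of all 10k records on folder open
        return len(self.messages), sum(1 for r in self.messages if not r)

f = Folder()
for _ in range(10000):
    f.add_message()
f.mark_read(0)
# Same answer, but opening the folder no longer touches 10k records:
assert (f.total, f.unread) == f.counters_by_scan() == (10000, 9999)
```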
• More efficient when writes are grouped together
• Writes are ‘deferred’ until later
• A separate process in Zarafa purges the deferred writes
• Saves up to about 50% in I/O while writing to tproperties

Deferred writes to tproperties (column order)
Write data → DeferWrite → tproperties

• Control the number of deferred objects
• Counter reset (set to ‘no’ for higher performance)
• Counter resets shown in stats (zarafa-stats --system)
• Deferred purges shown in stats
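As a rough sketch (my assumption about the mechanism, not Zarafa's actual code): instead of updating the column-order tproperties table on every save, changes are queued and flushed in one grouped write, which is much cheaper on rotating disks.

```python
# Deferred-write batching: many queued updates, few grouped flushes.

class DeferredWriter:
    def __init__(self, max_deferred=10000):   # cf. max_deferred_records
        self.max_deferred = max_deferred
        self.pending = []       # queued tproperties updates
        self.flushes = 0        # grouped writes actually issued

    def write(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.max_deferred:
            self.purge()

    def purge(self):
        # In Zarafa this is done by a separate purge process.
        if self.pending:
            self.flushes += 1   # one grouped I/O instead of len(pending) writes
            self.pending.clear()

w = DeferredWriter(max_deferred=100)
for i in range(250):
    w.write(i)
w.purge()                       # final flush
assert w.flushes == 3           # 250 writes collapsed into 3 grouped flushes
```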
New options in Zarafa 7.0
Setting                Default
max_deferred_records   10000
counter_reset          yes
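As a sketch, these options would appear in the Zarafa server configuration (the `/etc/zarafa/server.cfg` path is an assumption based on common installations; the values shown are the defaults from the table above):

```
# /etc/zarafa/server.cfg (excerpt) -- assumed location
# Maximum number of objects whose tproperties writes may be deferred
max_deferred_records = 10000
# Set to 'no' to skip counter resets for higher performance
counter_reset = yes
```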
• Biggest advantages in slower I/O subsystems
• Should show the most advantage on low-RPM disks (eg SATA)
• Allows having about 15% of your data on SSD, which takes 50% of the IOPS
  – Example: 100GB of SSD is enough to reduce IOPS by half for a total data size of 600GB
  – Depending on your vendor, 100GB of SSD is between EUR 100 and EUR 1600
  – Example 2: instead of buying 6 15k SAS disks @ 200 IOPS each (1200 IOPS total), it is also possible to buy 1 SSD disk and 5 SATA disks @ 120 IOPS, which even has a higher capacity
  – Especially interesting for large-scale (cloud) rollouts
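The arithmetic behind both examples checks out, under the slide's assumption that the hottest ~15% of data absorbs ~50% of the IOPS load:

```python
# Back-of-the-envelope check of the SSD examples (assumed numbers, not
# measurements).

total_data_gb = 600
hot_fraction = 0.15        # share of data that is "hot"
iops_absorbed = 0.50       # share of IOPS that hot data receives

ssd_gb_needed = total_data_gb * hot_fraction
assert ssd_gb_needed == 90     # so ~100GB of SSD halves the disk IOPS

# Example 2: all-SAS vs. mixed SSD/SATA
sas_iops = 6 * 200             # 6 x 15k SAS disks @ 200 IOPS = 1200 IOPS
sata_iops = 5 * 120            # 5 SATA disks @ 120 IOPS = 600 IOPS
# The SATA disks cover exactly the half of the load the SSD does not absorb:
assert sata_iops == sas_iops * (1 - iops_absorbed)
```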
I/O changes: things to remember
• Records are much more compact in memory inside MySQL in Zarafa 7.0 than in Zarafa 6
• Possibly a higher cache hit ratio is achievable by using more cache memory in MySQL and less in Zarafa
• No large-scale testing done yet; this will follow
• Even when the cache is more efficiently packed in MySQL RAM, latency between the Zarafa and MySQL servers is still an issue, so you always need some cache in Zarafa
Cache tuning
MySQL 5.5 introduced Linux native AIO
• AIO = asynchronous I/O, meaning that multiple requests can be sent to the RAID at once
• Before, there was ‘emulated AIO’, which could do at most X requests simultaneously (X = 4 for most installations)
• Some requirements:
  – O_DIRECT enabled
  – ext3, ext4, xfs or a direct block device
  – AIO-capable kernel (introduced in 2.6.24 or so)
• Multiple requests are only sent for parallel queries
• If you run one query, you can only have one outstanding I/O request
• Makes some things slow:
  – mysqldump
  – ALTER TABLE
  – Queries doing non-trivial range scans (eg select * from a where id > 10 and id < 10000)
Why AIO is not as great as it sounds
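For reference, the MySQL 5.5 settings involved look roughly like this (`innodb_use_native_aio` is on by default in 5.5 and shown only for illustration; the file path depends on your distribution):

```
# /etc/mysql/my.cnf (excerpt) -- illustrative
[mysqld]
innodb_use_native_aio = 1        # Linux native AIO (MySQL 5.5+, default on)
innodb_flush_method   = O_DIRECT # O_DIRECT, as required above
```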
• http://bugs.mysql.com/bug.php?id=60087
• Enables a prefetch mechanism for sequential reads in InnoDB
• Main advantage for Zarafa:
  – Table queries for large tables (sorting)
  – Faster mysqldumps
Our patch for MySQL
• Anyone who signs up to MySQL’s bug tracker and comments on the bug, asking for it to be reviewed, will receive a free beer from me this evening (no kidding)
Free Beer (Free as in beer)
http://bugs.mysql.com/bug.php?id=60087
http://bit.ly/kVlQVv
MSR: Mail Store Relocator
• Used to migrate user stores between cluster nodes
Server1 ⇄ Server2
• Little-known features of MSR:
  – Normally used for user migration between nodes
  – Now also allows bi-directional synchronisation
  – Possible within the cluster AND outside the cluster
  – ACLs only synchronized when inside the cluster
• Applications:
  – Distributed public store
    • Requires EVERY server in the cluster to contain a public replica
  – Shared store between geographic locations (eg a shared ‘info’ inbox)
  – Shared store between organisations
• Will improve:
  – Currently does realtime message sync, but no realtime folder sync
• Demo in session Friday 10:00
Scaling: store replication (kind of beta)
• epoll() instead of select() interface for the main socket dispatcher (reduces CPU overhead/delay)
• Compressed records
  – E-mail bodies are currently stored uncompressed in MySQL
  – Compressing plaintext & HTML bodies can save about 20% of storage
  – Schema is already compression-ready (no schema change needed in the future)
Future additions
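As a rough illustration of the compression idea (not Zarafa code; the ~20% figure on the slide is a store-wide average, and repetitive HTML compresses far better than that):

```python
# Compressing an HTML e-mail body with zlib, losslessly.
import zlib

html_body = ("<html><body>"
             + "<p>Hello, this is a fairly typical e-mail paragraph "
               "with some repeated words.</p>" * 50
             + "</body></html>").encode("utf-8")

compressed = zlib.compress(html_body, 6)         # default-ish level
assert len(compressed) < len(html_body)          # compression saves space
assert zlib.decompress(compressed) == html_body  # and is fully reversible
```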