Web scale MySQL at Facebook (Domas Mituzas)
![Page 1: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/1.jpg)
![Page 2: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/2.jpg)
Web scale MySQL @ Facebook
Domas Mituzas
2011-10-03
![Page 3: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/3.jpg)
Agenda
1 Intro
2 Current
3 Future
![Page 4: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/4.jpg)
Facebook
• 800M active monthly users
• 500M active daily users
• 350M mobile users
• 7M apps and websites integrated via platform
![Page 5: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/5.jpg)
Current
1 Setup
2 Performance Overview
3 Stalls
4 Efficiency
5 Projects
![Page 6: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/6.jpg)
Setup
▪ Software
  ▪ MySQL 5.1
  ▪ Custom Facebook patch
  ▪ Launchpad - mysqlatfacebook
  ▪ Extra resiliency
  ▪ Reduced operations effort
▪ Hardware
  ▪ Variety of generations
  ▪ Many core
  ▪ Local storage
  ▪ Some flash storage
![Page 7: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/7.jpg)
UDB Performance numbers (from Sep. 2011)
▪ Query response time
  ▪ 4ms reads, 5ms writes
▪ Network bytes sent per second
  ▪ 90GB peak
▪ Queries per second
  ▪ 60M peak
▪ Rows read per second
  ▪ 1450M peak
▪ Rows changed per second
  ▪ 3.5M peak
▪ InnoDB page IO per second
  ▪ 8.1M peak
![Page 8: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/8.jpg)
Performance focus
▪ Focus on reliable throughput in production
▪ Avoid performance stalls
▪ Make sure hardware is used
▪ 99th percentile rather than average or median
▪ Worst offender analysis – topN & histograms instead of tier averages
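The difference between an average and a 99th percentile is easy to see on a small sample. The sketch below is only an illustration: the host names and latency values are made up, and the nearest-rank percentile is one of several common definitions, not necessarily the one used in the talk.

```python
import heapq
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def worst_offenders(latency_by_host, n=3, pct=99):
    """Return the n hosts with the highest pct-th percentile latency (a topN view)."""
    scored = {host: percentile(vals, pct) for host, vals in latency_by_host.items()}
    return heapq.nlargest(n, scored.items(), key=lambda kv: kv[1])


# Hypothetical sample: 98 fast queries and 2 stalled ones, in milliseconds.
latencies_ms = [4.0] * 98 + [500.0] * 2
mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

# Hypothetical tier: one healthy host and one host that stalls occasionally.
tier = {"db1": [4.0] * 100, "db2": latencies_ms}
top = worst_offenders(tier, n=1)

print(f"mean={mean_ms:.2f}ms p50={p50_ms}ms p99={p99_ms}ms worst={top}")
```

The median hides the two stalls completely and the mean blurs them, while the 99th percentile and the per-host ranking point straight at the host that stalls.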
![Page 9: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/9.jpg)
Stalls
▪ “Dogpiles”
▪ Temporary slowdown – even 0.1s is huge
![Page 10: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/10.jpg)
Stall tools
▪ Dogpiled (in-house)
▪ Snapshot aggregation of server state at distress
▪ “time machine” view into logs before the event too
▪ Aspersa (stalk, collect)
▪ Poor man’s profiler (.org)
▪ Later iterations – apmp, hpmp, tpmp
▪ GDB
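The poor man's profiler idea is to dump every thread's backtrace with GDB (`thread apply all bt`) and count identical stacks, so that the stack most threads are stuck in stands out. The sketch below covers only the aggregation step and is an approximation, not the original script. The sample trace is made up for illustration, and the parsing assumes GDB's usual `#N  0x... in function (...)` frame layout.

```python
import re
from collections import Counter

# Matches "#0  0x00007f... in func_name (...)" as well as "#1  func_name (...)".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def collapse_stacks(gdb_output):
    """Collapse each thread's frames into one string and count identical stacks."""
    stacks = Counter()
    frames = []
    for line in gdb_output.splitlines():
        line = line.strip()
        if line.startswith("Thread "):
            # A new thread header closes the previous thread's stack.
            if frames:
                stacks[",".join(frames)] += 1
            frames = []
            continue
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(1))
    if frames:
        stacks[",".join(frames)] += 1
    return stacks.most_common()


# Made-up sample of "thread apply all bt" output.
SAMPLE = """
Thread 3 (Thread 0x7f01 (LWP 103)):
#0  0x00007f0000000001 in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x0000000000400002 in os_event_wait_low ()
#2  0x0000000000400003 in srv_conc_enter_innodb ()

Thread 2 (Thread 0x7f02 (LWP 102)):
#0  0x00007f0000000001 in pthread_cond_wait () from /lib/libpthread.so.0
#1  0x0000000000400002 in os_event_wait_low ()
#2  0x0000000000400003 in srv_conc_enter_innodb ()

Thread 1 (Thread 0x7f03 (LWP 101)):
#0  0x00007f0000000009 in fsync () from /lib/libc.so.6
#1  0x0000000000400010 in os_file_flush ()
"""

ranked = collapse_stacks(SAMPLE)
for stack, count in ranked:
    print(count, stack)
```

Sampling this a few times during a stall and summing the counts gives a rough picture of where threads wait, without any instrumentation in the server.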
![Page 11: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/11.jpg)
Stalls found
▪ Tables extending – global I/O mutex held
▪ Drop table – both SQL layer and InnoDB global mutexes held
▪ Purge contention – unnecessary dictionary lock held
▪ Binlog reads – no commits can happen if old events read
▪ Kernel mutex – O(N) and O(N^2) operations
▪ Transaction creation
▪ Lock creation/removal, deadlock detection
▪ Background page flushing not really background
▪ Many more
![Page 12: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/12.jpg)
Efficiency
▪ Increasing utilization of hardware
▪ Memory to Disk ratio
▪ Finding bottlenecks
▪ Disk bound normally
▪ Sometimes network
▪ Application or server software chokepoints
▪ Rarely CPU/memory bandwidth
▪ Application design
▪ Biggest wins are in optimizing the workload
![Page 13: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/13.jpg)
Disk efficiency
▪ Normally disk IOPS bound
▪ Allowing higher queue lengths
▪ Can operate at more than 8 pending operations per disk
▪ InnoDB page size
▪ Need adjustable per table or index for real gain
▪ XFS/deadline
▪ Parallelism at MySQL layer
▪ >300 iops on 166 rps disks
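A quick back-of-the-envelope check shows why the last number is notable. 166 rotations per second corresponds to roughly a 10K RPM drive. The average seek time below is an assumption chosen for illustration, not a figure from the talk.

```python
ROTATIONS_PER_SECOND = 166   # from the slide
OBSERVED_IOPS = 300          # lower bound quoted on the slide
ASSUMED_AVG_SEEK_MS = 4.0    # assumption for illustration, not from the talk

rpm = ROTATIONS_PER_SECOND * 60

# On average the platter has to turn half a rotation before the sector arrives.
avg_rotational_latency_ms = 1000 / ROTATIONS_PER_SECOND / 2

# With one request outstanding, every I/O pays seek plus rotational latency.
serial_service_ms = ASSUMED_AVG_SEEK_MS + avg_rotational_latency_ms
serial_iops = 1000 / serial_service_ms

# Time available per I/O at the observed rate.
observed_service_ms = 1000 / OBSERVED_IOPS

print(f"{rpm} RPM, avg rotational latency {avg_rotational_latency_ms:.2f} ms")
print(f"one request at a time: about {serial_iops:.0f} IOPS")
print(f"at {OBSERVED_IOPS} IOPS each I/O gets {observed_service_ms:.2f} ms")
```

Under these assumptions a drive serving one request at a time stays well below 300 IOPS. This is consistent with the slide's emphasis on longer queues and parallelism at the MySQL layer, which give the I/O scheduler and the drive more pending requests to reorder.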
![Page 14: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/14.jpg)
Memory efficiency
▪ Compact records – Thrift compaction for objects, etc.
▪ Clustered and covering index planning
▪ FORCE INDEX – avoid unnecessary I/O and cached pages
▪ Historical data access costly
▪ Full table scans
▪ ETL-type queries, mysqldump, …
▪ Tune midpoint insertion LRU for InnoDB
▪ Incremental updating, incremental binary backups
▪ O_DIRECT data and logs access
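The midpoint insertion bullet is easier to follow with a toy model. The sketch below is a heavily simplified approximation of the idea, not InnoDB's buffer pool code. New pages enter an "old" sublist and are promoted to the "young" sublist only if they are touched again after a delay, so a one-off table scan or dump cannot push out the hot working set. The class name and parameters are hypothetical, loosely modelled on the `innodb_old_blocks_pct` and `innodb_old_blocks_time` settings.

```python
from collections import OrderedDict


class MidpointLRU:
    """Toy midpoint-insertion LRU with a young and an old sublist."""

    def __init__(self, capacity, old_pct=37, old_blocks_time=1.0):
        self.capacity = capacity
        self.old_capacity = max(1, capacity * old_pct // 100)
        self.old_blocks_time = old_blocks_time
        self.young = OrderedDict()  # key -> True, most recently used last
        self.old = OrderedDict()    # key -> time it entered the old sublist

    def access(self, key, now):
        """Touch a page at time `now`; return True on a cache hit."""
        if key in self.young:
            self.young.move_to_end(key)
            return True
        if key in self.old:
            # Promote only if the page has survived in the old sublist long enough.
            if now - self.old[key] >= self.old_blocks_time:
                del self.old[key]
                self.young[key] = True
                self._rebalance(now)
            return True
        # Miss: insert at the midpoint, i.e. the head of the old sublist.
        self.old[key] = now
        self._rebalance(now)
        return False

    def _rebalance(self, now):
        young_capacity = self.capacity - self.old_capacity
        # Demote the least recently used young pages to the old sublist.
        while len(self.young) > young_capacity:
            demoted, _ = self.young.popitem(last=False)
            self.old[demoted] = now
        # Evict from the tail of the old sublist when the cache is over capacity.
        while len(self.young) + len(self.old) > self.capacity:
            self.old.popitem(last=False)


cache = MidpointLRU(capacity=8)
for t in (0, 2):                 # two hot pages, touched twice, two seconds apart
    cache.access("a", now=t)
    cache.access("b", now=t)
for i in range(100):             # a large one-off scan, each page touched once
    cache.access(f"scan{i}", now=3)

hot_still_cached = cache.access("a", now=10) and cache.access("b", now=10)
print("hot pages survived the scan:", hot_still_cached)
```

With a plain LRU of the same size, the 100-page scan would have evicted both hot pages. Here the scan only churns through the old sublist.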
![Page 15: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/15.jpg)
Pure flash (cheating)
▪ Data stored directly on flash
▪ Limited data size
▪ Not utilizing flash card fully
▪ Still used in some cases
![Page 16: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/16.jpg)
Flashcache
▪ Flash in front of disks
▪ Can use slower disks
▪ Write-back cache
▪ Much more data storage
▪ Able to utilize much more of flash card
▪ Very long warmup time
▪ Open source (github/facebook/flashcache)
![Page 17: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/17.jpg)
MySQL 2x
▪ Flash allows for large loads
▪ Large performance difference from pure disk servers
▪ Many older servers still being used
▪ Solution?
▪ Run multiple MySQL instances per server
▪ Use ports 3307, 3308, 3309, etc…
▪ Replication prevents direct consolidation
▪ Redo a lot of port assumptions in code
![Page 18: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/18.jpg)
Application caching
▪ Old: memcached
▪ Cache invalidation stampedes, refetching full dataset on refresh, many copies
▪ New: write-through caching
▪ Incremental cache updates
▪ Cache hierarchies for datacenter local copies
▪ Efficient operations for association set
▪ Common API for all use cases
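The contrast between the two caching styles can be sketched in a few lines. Everything below is hypothetical, including the class names, the in-memory stand-in for the database, and the association-set API. It only illustrates why incremental updates avoid refetching the full dataset after every write, and it is not Facebook's caching layer.

```python
class FakeDatabase:
    """In-memory stand-in for MySQL that counts expensive full-set reads."""

    def __init__(self):
        self.assocs = {}
        self.full_reads = 0

    def insert(self, owner, member):
        self.assocs.setdefault(owner, set()).add(member)

    def load_all(self, owner):
        self.full_reads += 1
        return set(self.assocs.get(owner, set()))


class BaseCache:
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def members(self, owner):
        # On a miss the whole association set is refetched from the database.
        if owner not in self.cache:
            self.cache[owner] = self.db.load_all(owner)
        return self.cache[owner]


class InvalidatingCache(BaseCache):
    """Old style: a write deletes the cached set, so the next read refetches it."""

    def add(self, owner, member):
        self.db.insert(owner, member)
        self.cache.pop(owner, None)


class WriteThroughCache(BaseCache):
    """New style: a write goes to the database and updates the cached set in place."""

    def add(self, owner, member):
        self.db.insert(owner, member)
        if owner in self.cache:
            self.cache[owner].add(member)


def run_workload(cache_cls):
    db = FakeDatabase()
    cache = cache_cls(db)
    cache.members("user:1")                  # warm the cache
    for friend in range(5):                  # interleaved writes and reads
        cache.add("user:1", friend)
        assert friend in cache.members("user:1")
    return db.full_reads


invalidating_reads = run_workload(InvalidatingCache)
write_through_reads = run_workload(WriteThroughCache)
print(invalidating_reads, write_through_reads)
```

In a real deployment each refetch after an invalidation can be issued by many clients at once, which is where the stampedes mentioned on the slide come from.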
![Page 19: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/19.jpg)
Group commit
▪ Some OLTP workloads too busy even for modern RAID cards
▪ High I/O pressure increases response times
▪ Durability compromises increase operational overhead
▪ Dead batteries are extremely painful otherwise
▪ Now in 5.1.52-fb
![Page 20: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/20.jpg)
Admission control
▪ Server resources are limited
▪ Per account thread concurrency
▪ Reduces O(N^2) blowup chance
▪ max_connections no longer impacts server load
▪ Per-application resource throttling
▪ Now in 5.1.52-fb
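Per-account thread concurrency can be pictured as a counter per account that gates how many queries run at once, regardless of how many connections are open. The sketch below is a minimal illustration under that assumption. The class and method names are hypothetical, and the patch itself implements this inside the server rather than in client code.

```python
import threading
from collections import defaultdict


class AdmissionController:
    """Caps the number of concurrently running queries per account."""

    def __init__(self, max_running_per_account):
        self._limit = max_running_per_account
        self._lock = threading.Lock()
        self._running = defaultdict(int)

    def try_admit(self, account):
        """Return True and take a slot if the account is under its limit."""
        with self._lock:
            if self._running[account] >= self._limit:
                return False
            self._running[account] += 1
            return True

    def release(self, account):
        """Give the slot back when the query finishes."""
        with self._lock:
            if self._running[account] > 0:
                self._running[account] -= 1


controller = AdmissionController(max_running_per_account=2)

results = [controller.try_admit("web") for _ in range(3)]   # third one is refused
other_account = controller.try_admit("batch")               # separate budget
controller.release("web")
after_release = controller.try_admit("web")                 # a slot is free again

print(results, other_account, after_release)
```

A refused query could be queued or rejected. Either way, a single misbehaving application cannot pile an unbounded number of running threads onto the server.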
![Page 21: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/21.jpg)
Online Schema Change
▪ External PHP script, open source
▪ Utilizes triggers for change tracking
▪ Used on 100G+ sized tables
▪ Dump/reload + fast index creation
▪ Extendable class, may allow:
▪ PK composition changes with conflict resolution
▪ Indexing previously unindexed datasets
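The trigger-based approach can be walked through on a toy table. The sketch below uses Python's built-in `sqlite3` module purely to stay self-contained. The real tool is a PHP script working against MySQL/InnoDB, and the table names, delta-table layout, and replay logic here are simplified assumptions rather than its actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');

-- 1. Shadow table with the new schema (here: one added column).
CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT);

-- 2. Change log, filled by triggers on the live table.
CREATE TABLE users_deltas (
    seq INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, id INTEGER);
CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO users_deltas (op, id) VALUES ('I', NEW.id); END;
CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO users_deltas (op, id) VALUES ('U', NEW.id); END;
CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
    INSERT INTO users_deltas (op, id) VALUES ('D', OLD.id); END;

-- 3. Bulk copy of the rows that existed when the copy started.
INSERT INTO users_new (id, name) SELECT id, name FROM users;
""")

# Writes that arrive while the copy is (conceptually) still running.
conn.execute("INSERT INTO users VALUES (3, 'cat')")
conn.execute("UPDATE users SET name = 'bobby' WHERE id = 2")
conn.execute("DELETE FROM users WHERE id = 1")

# 4. Replay the logged changes against the shadow table, in order.
deltas = conn.execute("SELECT op, id FROM users_deltas ORDER BY seq").fetchall()
for op, row_id in deltas:
    if op == "D":
        conn.execute("DELETE FROM users_new WHERE id = ?", (row_id,))
    else:
        # Re-copy the current version of the row; a no-op if it was deleted later.
        conn.execute(
            "INSERT OR REPLACE INTO users_new (id, name) "
            "SELECT id, name FROM users WHERE id = ?",
            (row_id,),
        )

# 5. Swap the tables once the shadow copy has caught up.
conn.executescript("""
DROP TRIGGER users_ins;
DROP TRIGGER users_upd;
DROP TRIGGER users_del;
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;
""")

rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
print(rows)
```

On a production system the replay and the final swap need locking so that no write slips in between them. The sketch leaves that out.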
![Page 22: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/22.jpg)
Tools
▪ Table and user statistics
▪ Shadows
▪ Slocket
▪ pmysql
▪ Replication sampling
▪ Client log aggregation
▪ Query comments
▪ Indigo (Query monitor)
![Page 23: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/23.jpg)
Future
1 Visibility
2 Replication
3 Compression
![Page 24: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/24.jpg)
Future
▪ MySQL is never a solved problem
▪ Always investigating better/new solutions
▪ New hardware types
▪ New datacenters and topologies
▪ New use cases and clients
▪ New neighbors to share data with
![Page 25: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/25.jpg)
Visibility
▪ Never assume
▪ Use metrics to measure
▪ When metrics aren’t available, add them
▪ Full stack
▪ More InnoDB info
▪ More application info
![Page 26: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/26.jpg)
Replication
▪ Lag used to be a big problem, still is a bottleneck
▪ Possible solutions:
▪ “Better” slave prefetch
▪ Maatkit version has problems
▪ Our own version being used on some tiers successfully
▪ May be possible with InnoDB cooperation
▪ Continuent parallel slave
▪ Oracle parallel slave in 5.6
![Page 27: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/27.jpg)
InnoDB Compression
▪ Was originally planned during the 5.1 upgrade
▪ Problems
▪ Replication stream cost
▪ Increased log writes
▪ Performance in some cases
▪ Stability, monitoring, etc
![Page 28: Web scale MySQL at Facebook (Domas Mituzas)](https://reader035.vdocuments.us/reader035/viewer/2022062511/54b7aabd4a795993718b4951/html5/thumbnails/28.jpg)
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0