Scaling and Hardware Provisioning for Databases (Lessons Learned at Wikipedia)

Jaime Crespo
Percona Live Europe 2017 - Dublin, 27 Sep 2017

© 2017 Jaime Crespo. https://jynus.com. License: CC-BY-SA-4.0

Agenda

1. Introduction
2. Scaling by Introducing New Technologies
3. Scaling by Rewriting Code
4. Scaling by Rearchitecturing
5. Scaling by Throwing Hardware at the Problem
6. Which Hardware is Right for Me?
7. Conclusions

@jynus
● Sr. Database Administrator at Wikimedia Foundation
● Used to work as a trainer for Oracle (MySQL), as a consultant (Percona) and as a freelance administrator (DBAHire.com)

I have already mentioned some related topics at #PerconaLive

• Check my previous presentations at: http://www.slideshare.net/jynus/

Disclaimers

• Some negative anecdotes are going to be presented:

– Your mileage may vary

– They didn’t work for us, then, in our particular use case

• It is not the intention to criticize (great) open source developers

– Take home the ideas, not the particular details

• Intended as a “beginners” talk

– New MySQL users, developers or new purchase owners

SCALING BY INTRODUCING NEW TECHNOLOGIES

Ways to Show off One’s Ignorance
● Why did you not use ... instead?
● I read in an article that … is better
● You should migrate to ...
● I heard … is horrible/dead/a fate worse than death

– Those are not conversation starters, you are trying to make a point

– Discussing “best” technologies without context is worthless

http://smalldatum.blogspot.com.es/2016/09/excited-about-percona-live-amsterdam.html

Ways to Show Genuine Interest
● Why do you use … ?
● I found … fascinating, can you tell me more?
● I am thinking of using … myself
● I sell … and I am trying to improve my product
– Listen to what people have to say
– Focus on your product’s virtues, not others’ defects

https://blog.wikimedia.org/2013/04/22/wikipedia-adopts-mariadb/

Compare with: https://www.postgresql.org/message-id/579795DF.10502%40commandprompt.com

MySQL at Wikimedia History
● <2011: Heavily patched MySQL 4.0
● 2011-2012: Facebook fork of MySQL 5.1
● 2013-2015: MariaDB 5.5 with patches
● 2015-2017: custom MariaDB 10.0 package
● 2017- : MariaDB 10.1 (on non-production/testing)

Products at the Wikimedia Foundation

● To support our users, we develop and maintain 2 main IT products/services:
– MediaWiki (the code that runs Wikipedia)
– Wikimedia Infrastructure (the servers and services where that code runs)

MySQL/MariaDB has been the chosen backend since 2001

● Maintaining MediaWiki for multiple storage backends is “easy”
● Maintaining multiple backends on WMF infrastructure is really hard
– Snowflakes take a comparatively huge amount of time
– It is ok to have specialized backends: search and logs (elastic), analytics (hadoop), cache (memcache/cassandra), queueing (redis/kafka), gis (postgres), dynamic config (etcd), testing (sqlite)

What I never get
● “I’ve analyzed your statistics regarding jobqueue processing based on Redis and, after spending time reading your documentation, I think your message passing subsystem is inefficient. Here is the code prototype I wrote integrating MediaWiki with Apache Kafka that would make it work better. Do you have some time so I can try to convince you?

PS: Please let me take care of your next Asia Pacific TZ emergency”

WMF Focus: Efficiency
● 30K HTTP requests / Operations employee / second
– We include operations from DBAs, to managers, to people racking servers (sometimes they are the same!)
– World-wide redundancy
– Owning the full stack: no external provider other than network providers and datacenter space (no external cloud, CDN, specialized hardware, code repository or CI)

Some statistics are misleading/not useful

● HTTP req/s or MySQL queries/s can be meaningless
– “getting the HTML content of en:Dublin” and “uploading a photo of Dublin with its structured metadata” each count as 1 hit
– In many cases your aim is to minimize hits, not increase them

● Encouraging mirroring or downloading content for offline usage is part of the goal

WMF Focus: Open Source + Bare metal

● Being self-sufficient adds a huge overhead, especially to adapt to new technologies

● It pays off for us in the long term, as vendors rise and fall, suffer outages, data leaks, espionage, etc.

Common “not scaling” occurrence
● “Icinga” no longer scales for us with over 1500 hosts and 15K service checks
● Maybe a replacement is needed that allows “clustering”
● TLS certificate checking (huge CPU penalty) is moved off-server
● Load reduced dramatically; that, plus a hardware upgrade, makes migration a lower priority

Sometimes new technologies are the right path

● Ganglia and Graphite were used
● Lacking in features, too large a footprint per metric, “ugly” (custom dashboards)
● Prometheus is tested
– Provides new features, fully replaces Ganglia
– Grafana can be used as a frontend for both Graphite and Prometheus, integrating the service even if not fully replacing it

Media storage at Wikimedia History
● Very early solution: shared NFS-mounted partitions on application servers
– It worked as a quick solution when scaling was not a problem
● 2012: OpenStack Swift introduction
– Limitations on consistency and coordination
● 2013: Evaluation of Ceph as a solution
● Now: Swift still in place, improvements made both upstream and at Wikimedia (thumbor)

New Technology Adoption
● Benchmarketing
● New features looking “too good/too nice”
● 3rd party support
● Operational and team knowledge considerations

Example: TokuDB (1/3)
● Great on paper!
– Nice feature set
– Fewer IOPS than InnoDB
– Great compression ratio (1/4th of the original dataset)
– It allowed us to integrate 7 replica groups in a single server
● We migrated analytics and backups to it

Example: TokuDB (2/3)
● Main issue: “it is not InnoDB”
– Parallel replication, locking model and query plan issues
– Replication stopped due to “index crashes”
– Bogus results on non-primary key tables
– Bad tooling and upstream (Oracle) support
– MariaDB didn’t support it well, TokuDB didn’t support MariaDB

Example: TokuDB (3/3)
● We ended up migrating back to InnoDB (compressed)
– Worse IOPS, but much more stable and matching production
– Makes operations much easier
– Separate datasets (analytics) work nicely on Toku
● Excited and looking closely at RocksDB for our key-value storage, but we are not going to be the beta testers

SCALING BY REWRITING CODE

Let’s rewrite ...
● X doesn’t scale anymore
● A monolithic architecture will not work
● Y language is not appropriate for ...

MediaWiki full rewrite history
● January 2001 – “UseModWiki” (Phase I)
● January 2002 – “The PHP script” (Phase II)
● July 2002 – “MediaWiki” (Phase III)
– Since 2002 (over 15 years ago) only small-scope/gradual refactorings have happened

Problems with rewriting
● Old knowledge is lost (obscure bugs)
● Community of 3rd party developers & users lost
– Large list of plugins/bots no longer compatible
– Hard to sell: “It works for me, why change it?”
● Alienate volunteers
● Increase of the number of technologies to maintain
– The best technological solution is not necessarily the best socially and organizationally

In a way, we are constantly rewriting...

Sometimes we Have to Rewrite
● OCG (offline content generator):
– Originally created by a third party, OCG [its replacement] has been running on outdated code which may introduce security vulnerabilities and other major issues in the future.
● Web Printing Service is about to replace it

SCALING BY REARCHITECTURING

Topology
● (Non-sharded) replication fits best for a mostly read-heavy model
– 30 million wiki edits vs. 20 billion wiki pages served per month (2016, approx.)

[Diagram: writes (W) vs. reads (R)]
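The read-heavy claim can be made concrete with the two figures on this slide. The following is a back-of-the-envelope sketch only: the variable names are ours, and the numbers are the approximate 2016 monthly totals quoted above.

```python
# Approximate monthly totals quoted on the slide (2016).
edits_per_month = 30_000_000            # writes
page_views_per_month = 20_000_000_000   # reads

# How many pages are served for every edit that is saved.
reads_per_write = page_views_per_month / edits_per_month

# Share of edits in the combined (edits + page views) total.
write_share = edits_per_month / (edits_per_month + page_views_per_month)

print(f"about {reads_per_write:.0f} page views per edit")
print(f"edits are about {write_share:.2%} of the combined total")
```

That is several hundred page views (about 667) per edit. Page views are not database queries, since caches absorb many of them, so this only illustrates the order of magnitude that makes a single-writer, many-replica topology attractive.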

Sharding
● We have avoided sharding as much as possible
– Vertical slices allow for flexible growth
● When a server’s scalability limit is reached:
– New functional groups are created (search, content, global users, application-level alerts, disk cache, ...)
– More replicas are added to a project
– More groups are created to handle heavier projects
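The growth path above (new functional groups, more replicas per group, dedicated groups for heavy projects) can be sketched as a small routing table. This is a minimal illustration, not the actual Wikimedia configuration: every group name, host name and weight below is hypothetical.

```python
import random

# Hypothetical layout: all names and weights are illustrative only.
GROUP_BY_WIKI = {
    "bigwiki": "group_heavy",       # a heavy project gets a group of its own
    "smallwiki_a": "group_shared",  # smaller projects share one group
    "smallwiki_b": "group_shared",
}
DEFAULT_GROUP = "group_shared"

# Each group is one non-sharded replica set: a single master plus weighted replicas.
REPLICA_SETS = {
    "group_heavy": {"master": "db-h1", "replicas": {"db-h2": 3, "db-h3": 3, "db-h4": 1}},
    "group_shared": {"master": "db-s1", "replicas": {"db-s2": 1, "db-s3": 1}},
}


def group_for(wiki):
    """Return the replica group serving a wiki (vertical slice, not a shard)."""
    return GROUP_BY_WIKI.get(wiki, DEFAULT_GROUP)


def host_for(wiki, for_write, rng=random):
    """Writes go to the group's master; reads go to a weighted random replica."""
    replica_set = REPLICA_SETS[group_for(wiki)]
    if for_write:
        return replica_set["master"]
    hosts = list(replica_set["replicas"])
    weights = list(replica_set["replicas"].values())
    return rng.choices(hosts, weights=weights, k=1)[0]
```

Under this model, adding read capacity is one more entry in a `replicas` mapping, and relieving an overloaded group means moving a wiki to a new entry in `GROUP_BY_WIKI`. No single table is ever split across hosts, which is what keeps the setup short of real sharding.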

Redundancy/read scaling

Multi-tier/Write scaling

Specialization/optimization

Consolidation/efficiency

Scalability as handicap for high availability

● “Recent changes” role servers may seem like a great idea
– It requires multiplying by 4 the number of servers per group to have redundancy over multiple datacenters
● Multiple datacenters are a huge investment if HA is to be kept
– If you have passive components it is very easy to “get behind”

Multisource
● Seemed like a great improvement on MariaDB!
– Analytics
– Consolidation
● But:
– Bugs with GTID, TokuDB
– Bugs with namespace importing
– MariaDB is more likely to crash than the host; recovering is expensive

Migration to Multi-instance model
● Consolidation is still possible (especially for smaller services)
● Less maintenance overhead
● Analytics moved to other stores

Example: ProxySQL for Public Wiki Replica service (Cloud Storage)

● We evaluated ProxySQL as a High Availability Handler
– It had more features than we needed (level 7 proxy)
– It had high operational overhead (account maintenance)
– Other ops were not familiar with MySQL-only solutions
– It lacked (like most other solutions) features we needed
● We chose to use HAProxy as a simpler approach
– We will reevaluate ProxySQL for production, where it will most likely be a better fit

SCALING BY THROWING HARDWARE AT THE PROBLEM

Reaching resource limits? Just throw some hardware at it! (1/2)

● Software must be ready for the hardware expansion
– Can your clusters scale efficiently either horizontally or vertically?
– Better hardware solves (hides?) bugs, but it reveals new bottlenecks
● Redundancy requirements?
● Services we will have to support that have not yet even been designed
– Or services that will be discontinued in the future

Reaching resource limits? Just throw some hardware at it! (2/2)

● Did you take into account the management overhead?
– Can your staff cope with the extra servers?
– Is your automation level on par with the redundancy?
● Support services must scale at the same pace (sometimes non-trivial)
– Backups
– Analytics
– Logs
– Rack space? Network? DC ops?

In Some Cases, Buying Better Hardware Can Create Regressions

● We had databases hosted on RAIDs of HDDs
● 25% of the fleet was renewed in one datacenter, adding SSDs
– They were pooled to handle the bulk of the queries
– Master-replica lag increased, why?

The New replica servers were “too fast”

● Replicas were pooled if they could keep up with the master’s writes
– The hosts serving the “bulk” of the reads could, so the slower hosts were not waited for
– Most older servers started lagging at peak times
● We had to make the faster servers go at the speed of the slowest host, negating their impact
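The regression can be reproduced with a toy model. This is a sketch under assumptions of our own: the write-throttling rule (wait only for the replicas that together serve most of the read weight), the host names, the apply rates and the weights are all hypothetical, and this is not MediaWiki's actual logic.

```python
def allowed_write_rate(replicas, wait_for_all):
    """Pick the write rate (writes/s) the master may sustain.

    replicas: list of (name, apply_rate, read_weight) tuples.
    """
    if wait_for_all:
        # Throttle to the slowest host so that nobody lags.
        return min(rate for _, rate, _ in replicas)
    # Otherwise wait only for the fastest hosts that together
    # serve more than half of the read weight.
    total_weight = sum(weight for _, _, weight in replicas)
    covered = 0
    for _, rate, weight in sorted(replicas, key=lambda r: r[1], reverse=True):
        covered += weight
        if covered * 2 > total_weight:
            return rate
    raise ValueError("no replicas given")


def backlog_after(replicas, write_rate, seconds):
    """Unapplied writes per replica after `seconds` at a constant write rate."""
    return {name: max(0, write_rate - rate) * seconds for name, rate, _ in replicas}


# Hypothetical fleet: one new SSD host with most of the read weight, three old HDD hosts.
FLEET = [
    ("ssd-1", 1000, 400),
    ("hdd-1", 400, 100),
    ("hdd-2", 400, 100),
    ("hdd-3", 400, 100),
]

fast_rate = allowed_write_rate(FLEET, wait_for_all=False)
print(fast_rate, backlog_after(FLEET, fast_rate, seconds=60))

safe_rate = allowed_write_rate(FLEET, wait_for_all=True)
print(safe_rate, backlog_after(FLEET, safe_rate, seconds=60))
```

In the first case the SSD host alone satisfies the "bulk of the reads" rule, so writes proceed at its pace and the HDD hosts accumulate a backlog. In the second case nobody lags, but the new hardware no longer raises the write ceiling, which is the trade-off the slide describes.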

Sometimes the Scaling Limit is Not the Hardware

● https://dom.as/2009/06/26/embarrassment/
– Cache invalidation stampede creating CPU spikes on app servers
● https://blog.wikimedia.org/2016/04/22/prince-death-wikipedia/
– The service created to handle the previous problem needs a fix to allow serving expired content
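Both incidents come down to many requests regenerating the same expensive page at once. A generic way to avoid that is to let a single caller regenerate an expired entry while everyone else is served the expired copy. The sketch below illustrates that general technique only; it is not the code of the Wikimedia service mentioned above, and the class and method names are ours.

```python
import threading
import time


class StaleServingCache:
    """Toy cache: on expiry, one caller regenerates and the rest get the stale copy."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)
        self._locks = {}    # key -> lock held by whoever is regenerating that key
        self._guard = threading.Lock()

    def get(self, key, regenerate):
        with self._guard:
            entry = self._entries.get(key)
            lock = self._locks.setdefault(key, threading.Lock())

        if entry is not None and entry[1] > self._clock():
            return entry[0]  # fresh hit

        if lock.acquire(blocking=False):
            # We won the race: only this caller does the expensive work.
            try:
                value = regenerate()
                with self._guard:
                    self._entries[key] = (value, self._clock() + self._ttl)
                return value
            finally:
                lock.release()

        if entry is not None:
            return entry[0]  # someone else is regenerating: serve expired content

        # Nothing cached yet: wait for the regenerating caller, then retry.
        with lock:
            pass
        return self.get(key, regenerate)
```

A caller passes the key and a zero-argument function that renders the page. During a spike on one article, at most one worker pays the rendering cost per expiry; the others return immediately with slightly old content instead of piling CPU load onto the app servers.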

WHICH HARDWARE IS RIGHT FOR ME?

Current Hardware used for Databases (1/3)

● Quad cores
● 512GB Memory
● 4TB usable disk in RAID 10 of SSDs
● 1GB ethernet
● 1U
● Life expectancy of 5 years

Current Hardware used for Databases (2/3)

● HW RAID - must
● RAID10 Level - must
– RAID5/6 would not work for us for performance and availability reasons
– Number of drives - compromise between performance, price and expandability
● SSDs – impact difficult to measure as we are right now overprovisioning
– A good guess tells us we can handle 5x the load of older HDD hosts
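The capacity side of the RAID level choice is simple arithmetic. The sketch below uses the standard usable-capacity formulas for RAID 10, 5 and 6. The drive count and drive size are hypothetical, chosen only so that RAID 10 lands on the 4TB usable figure from slide (1/3); the slides do not say which drives were bought.

```python
def usable_capacity_gb(level, drives, drive_size_gb):
    """Usable capacity of an array of identical drives, ignoring filesystem overhead."""
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives * drive_size_gb // 2      # every block is mirrored once
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_size_gb     # one drive's worth of parity
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_size_gb     # two drives' worth of parity
    raise ValueError(f"unsupported level: {level}")


# Hypothetical configuration: 10 drives of 800 GB each.
for level in ("raid10", "raid5", "raid6"):
    print(level, usable_capacity_gb(level, drives=10, drive_size_gb=800), "GB usable")
```

RAID 10 gives up the most space for the same drives. The slide accepts that cost in exchange for the performance and availability that RAID 5/6 could not provide for this workload.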

Current Hardware used for Databases (3/3)

● Large amounts of memory
– Relatively low disk usage, ideal for capacity planning/consolidation
● Available disk space – usage at initial buy at 40% (25% with compression)
– Allows for consolidation at first
● 1U
– Old HDD hosts required 2U
– Rack space and operational cost can be more expensive than hardware itself
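To see what those percentages mean in absolute terms, here is a small worked calculation. It assumes the 4TB usable figure from slide (1/3) applies to the same hosts and counts 1 TB as 1000 GB; the variable names are ours.

```python
usable_gb = 4000  # 4TB usable disk, from slide (1/3)

used_uncompressed_gb = usable_gb * 40 // 100  # 40% used at initial buy
used_compressed_gb = usable_gb * 25 // 100    # 25% used with compression

free_uncompressed_gb = usable_gb - used_uncompressed_gb
free_compressed_gb = usable_gb - used_compressed_gb

print(f"uncompressed: {used_uncompressed_gb} GB used, {free_uncompressed_gb} GB free")
print(f"compressed:   {used_compressed_gb} GB used, {free_compressed_gb} GB free")
```

With compression, roughly three quarters of the disk is free on day one. That headroom is what makes it possible to host additional datasets on the same machine before growth forces a split.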

Hardware Standardization
● It is incredibly difficult to predict hardware needs more than a year in advance
● Sometimes overprovisioning and buying servers with the same capacity makes pivoting existing hardware easier and lowers costs
– In an eventuality, a role can be substituted with the same hardware
– Parts are more common and can be substituted with decommissioned hardware

Backup Storage
● Past purchase of a costly, branded shared-storage solution
● In the end it was buggy, not cost-effective and didn’t have the features we needed
● Open source + custom glue = lower TCO and better suited for the needs
– Shared-nothing architecture ends up being cheaper in the real world
– More work, but also more flexibility

CONCLUSIONS

Conclusions
● Listen to what other people are doing
● Test on your own
● Make mistakes == Learn
– But do not spend too much time & money on them!
● Ask questions
– What people talk about vs. what they really use
– But don’t become internet trolls

Q&A

Thank You for Attending!
● Do not forget, after the session finishes, to please log in with your Percona app and “Rate This Session”
● Special thanks, in ORDER BY RAND(), to: Sean Pringle, Domas Mituzas, Mark Callaghan, Mark Bergsma, Manuel Arostegui, Ariel Glenn, the whole Wikimedia Team, and all people at the MariaDB, Percona and MySQL/Oracle teams, and the Percona Live Organization and Sponsors