Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database
Tim Vaillancourt, Sr. Technical Operations Architect @ Percona
Agenda
● Synchronous versus Asynchronous Applications
● Scaling a Synchronous/Latency-sensitive Application
● Scaling an Asynchronous Application
  ○ Secondary/Slave Hosts
  ○ Caching
  ○ Queuing
● Efficient Usage of Data at Scale
  ○ Moving Expensive Work
  ○ Caching Techniques
  ○ Counters and In-memory Stores
  ○ Connection Pooling
● Scaling Out (Horizontal) Tricks
  ○ Pre-Sharding
  ○ Kill Switches
  ○ Limits and Graphs
● Scaling with Hardware (Vertical Scaling)
● Testing Performance and Capacity
● Knowing Your Application and Questions to Ask at Development Time
● Questions
About Me
● Started at Percona in January 2016
● Experience
  ○ Web Publishing
    ■ Big-scale LAMP-based websites
  ○ Ecommerce
    ■ Large inventory SaaS
  ○ Gaming
    ■ DevOps: 50-100 microservices, 5-7+ massive launches per year, design, launch and maintain apps
    ■ DBA at EA DICE: 2 x new titles, 5+ x legacy titles
● Technologies
  ○ MySQL
  ○ MongoDB
  ○ Cassandra
  ○ Redis and Memcached
  ○ RabbitMQ, Kafka and ActiveMQ
  ○ Solr and Elasticsearch
  ○ (Sort of) AWS, HDFS, HBase, Postgres, etc…
Services
● Monolith
  ○ One application that does everything
  ○ Example: Chrome, MySQL, huge Python app
● Microservice
  ○ Apps with different purposes, pain points or SLAs are discrete services
  ○ Often easier to scale/troubleshoot
  ○ Reduces risk of outage
  ○ Example: frontend PHP app, messaging app, encoding app, etc
● In Practice
  ○ Both can be scaled up and down with the right features
  ○ Microservices offer more flexibility
  ○ Monolith services bring problems at scale
Application Operations
● Synchronous
  ○ Blocking operation until success or failure
  ○ Slower requests
  ○ Example: a file uploading app
● Asynchronous
  ○ Request and response are separated
  ○ Fast response time back to user/application
  ○ Example: a social media site
● Slow Operations
  ○ Can cause pileups in a tiered system
Applications
● Synchronous
  ○ Pros: less code, always the right answer
  ○ Cons: blocking operations and poorer efficiency
  ○ Example: a file uploading app
● Latency/Integrity Sensitive
  ○ Pros: always the right answer
  ○ Cons: fewer scalability tricks available
  ○ Example: a stock trading app that cannot accept “slave lag”
● Asynchronous
  ○ Pros: light operations and more scalability
  ○ Cons: eventual consistency (and sometimes more code)
  ○ Example: a social media site
Types of Data Designs
● Decentralised
  ○ Data is duplicated in several places
  ○ Pros: lighter to read, decreased locking, easy to shard
  ○ Cons: increased storage space, extra duplication effort
● Centralised
  ○ Data is kept in one (or few) places and referenced
  ○ Pros: less storage, one source-of-truth
  ○ Cons: locking, inefficiencies, sharding issues
Balancing Request Impact
● Read-focused Apps
  ○ Benefit from
    ■ Values pre-computed at write/change-time
    ■ Indices and/or few “scans” for data
    ■ No/few JOINs/operations to get result
● Write-focused Apps
  ○ Benefit from
    ■ No pre-computing of values (compute at read-time)
    ■ No/few indices to update
    ■ Insert/Append > Update
  ○ Reads: compute read summaries with replicas, add indices to secondaries only, etc
Queuing Updates
● Event Metadata
  ○ Example: “UserX has the new top score!”
  ○ Without Queue example
    ■ Update Top Score in Database(s)
    ■ Send Email to Friends
    ■ Post to Facebook Page
    ■ Update cache...
  ○ With Queue example
    ■ Add event to queue ‘topscore’
    ■ Apps read queue
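A minimal sketch of the "with queue" flow for the top score event, using in-process queues as a stand-in for a real broker such as RabbitMQ or Kafka. The consumer names and the `publish`/`drain` helpers are hypothetical, chosen only for illustration.

```python
import queue

# Hypothetical consumers of the 'topscore' event; each gets its own queue
# so a slow consumer (e.g. email) never blocks the others.
consumers = {name: queue.Queue() for name in ("database", "email", "social", "cache")}

def publish(event):
    """The user-facing request only enqueues the event, then returns."""
    for q in consumers.values():
        q.put(event)

def drain(name):
    """Each consumer app reads its own queue at its own pace."""
    handled = []
    q = consumers[name]
    while not q.empty():
        handled.append(q.get())
    return handled

publish({"queue": "topscore", "user": "UserX", "score": 9001})
email_events = drain("email")   # the email app picks the event up later
```

The request path does one cheap enqueue per consumer instead of four slow operations in sequence.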
Queuing Updates
● Update Buffering
  ○ Scenario: there is a high rate of updates to buffer
  ○ Queue-based example
    ■ App adds to update buffer (queue)
    ■ Worker app works from the bottom of buffer
● Queue Operational Benefits
  ○ Spikes in traffic
  ○ Backend downtime
  ○ Communication bus
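Update buffering could look roughly like the sketch below. The `deque` and the `database` dict are stand-ins for a real queue service and database, and coalescing to the last value per key is an assumption that only suits "last write wins" data such as a current score.

```python
from collections import deque

buffer = deque()   # stand-in for a real queue service
database = {}      # stand-in for the backing database

def buffer_update(key, value):
    """App side: append the update to the top of the buffer."""
    buffer.append((key, value))

def flush(max_items=1000):
    """Worker side: take the oldest items and keep only the last value per key."""
    pending = {}
    for _ in range(min(max_items, len(buffer))):
        key, value = buffer.popleft()   # oldest first, the "bottom" of the buffer
        pending[key] = value
    database.update(pending)            # one write per key, not one per update
    return len(pending)

for score in (10, 25, 40):
    buffer_update("user:1:score", score)
buffer_update("user:2:score", 7)
writes = flush()
```

Four buffered updates collapse into two database writes, which is where the buffer absorbs traffic spikes.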
Scaling Sync./Latency-Sensitive Apps
● Rethink the Flow Using Async
● Use lots of database RAM
● Shard the database
● Reduce impact of request flow
● Apache Cassandra
  ○ Synchronous
  ○ Very write optimized
● Percona XtraDB Cluster, NDB
● Use memory-based storage
  ○ Queue persistence to database
Efficient Usage of Data at Scale
● Expensive DB Work
  ○ Focus on lightweight user-facing operations
  ○ Move aggregations/summaries/reporting to background
  ○ Use replicas for expensive jobs
  ○ Avoid or reduce (maybe cache) “JOINs”
  ○ Enable and monitor metrics
    ■ MySQL: log_queries_not_using_indexes
    ■ MongoDB: enable operationProfiling
  ○ Review metrics and improve!
    ■ Percona Monitoring and Management
Efficient Usage of Data at Scale
● Caching / In-Memory Stores
  ○ Alleviates load from database
  ○ Very fast lookups
  ○ Low connection overhead
    ■ MySQL connection buffers: ~1MB+
    ■ MongoDB connection buffers: ~1MB
    ■ Redis or Memcache connection buffers: 0-limit/infinity**
  ○ Server-Side
    ■ Hit/Miss Caching: if something is not in the cache, find + add it. TTL expiry
    ■ Inline/Preemptive Caching: update/delete cache data at change time/preemptively
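The two server-side styles can be sketched in one small class. This is an in-process illustration, not a Memcached or Redis client; `TTLCache`, `load_product` and the injectable `clock` are hypothetical names.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}   # key -> (expires_at, value)

    def get(self, key, loader):
        """Hit/miss caching: on a miss or expired entry, load and add it."""
        entry = self.store.get(key)
        if entry and entry[0] > self.clock():
            return entry[1]
        value = loader(key)
        self.set(key, value)
        return value

    def set(self, key, value):
        """Inline/preemptive caching: call this at change time."""
        self.store[key] = (self.clock() + self.ttl, value)

db_reads = []   # records every trip to the (pretend) database

def load_product(key):
    db_reads.append(key)
    return {"sku": key, "price": 100}

cache = TTLCache(ttl_seconds=60)
cache.get("sku-1", load_product)                    # miss: reads the database
cache.get("sku-1", load_product)                    # hit: no database read
cache.set("sku-1", {"sku": "sku-1", "price": 80})   # price change, cached inline
```

With the inline update, the next reader sees the new price without a miss or a stale value.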
Efficient Usage of Data at Scale
● Caching / In-Memory Stores (continued)
  ○ Client-Side
    ■ Cache client data in the client app/browser/etc
  ○ In-memory Stores
    ■ Memcached
    ■ Redis
    ■ Percona Server for MongoDB with Memory Engine :)
  ○ Use TTLs to trim data
Efficient Usage of Data at Scale
● Storing Numerical Counters and Stats
  ○ Offload to in-memory stores
    ■ Incremented/decremented counters
    ■ Aggregations, summaries, counts
  ○ Count-style Queries to Counters
    ■ Increment counter at request/change time
    ■ Read counter value at read-request time
    ■ Or, try to use an index
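A rough illustration of moving a count-style query to a counter. SQLite and `collections.Counter` stand in for the real database and an in-memory store such as Redis; the table and function names are made up for the example.

```python
import sqlite3
from collections import Counter

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
friend_counts = Counter()   # stand-in for an in-memory counter store

def add_friend(user_id, friend_id):
    db.execute("INSERT INTO friends VALUES (?, ?)", (user_id, friend_id))
    friend_counts[user_id] += 1     # increment at change time

def friend_count(user_id):
    return friend_counts[user_id]   # read time: no COUNT(*) against the database

for friend in range(2, 7):
    add_friend(1, friend)
```

The counter and the table can drift apart after a crash, so a periodic reconcile against `COUNT(*)` on a replica is a reasonable safety net.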
Efficient Usage of Data at Scale
● Connection Pooling
  ○ Removes 3-way TCP “handshake” from request (more w/SSL)
  ○ Reduces threading overhead on databases
  ○ Proxies on App server localhost/loopback
    ■ Reduces 1 x TCP ‘hop’, ie: faster connect time
    ■ Can create a LOT of DB connections with many app servers
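As a sketch of what a pool does inside an application, the class below opens a fixed number of connections once and hands them out per request. `ConnectionPool` is a hypothetical name, and SQLite is used only so the example runs without a server.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0
        for _ in range(size):
            self._idle.put(connect())   # pay the connect cost once, up front
            self.opened += 1

    @contextmanager
    def connection(self, timeout=5):
        conn = self._idle.get(timeout=timeout)   # waits if all connections are busy
        try:
            yield conn
        finally:
            self._idle.put(conn)                 # return it instead of closing

pool = ConnectionPool(
    size=2,
    connect=lambda: sqlite3.connect(":memory:", check_same_thread=False),
)
results = []
for _ in range(10):
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])
```

Ten requests reuse two connections, and the fixed `size` also caps how many connections each app server can open against the database.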
Efficient Usage of Data at Scale
● Connection Pooling (continued)
  ○ MySQL Proxies
    ■ ProxySQL
    ■ HAProxy
    ■ MaxScale
    ■ Others…
  ○ MongoDB Proxies
    ■ mongos (sharding) process
  ○ Proxy-on-Localhost or direct is fastest
Virtualization, Containers, etc
● Virtualization
  ○ Pretends to be a real computer from BIOS up
  ○ OS + Software run under a hypervisor layer
  ○ Pros
    ■ Full hardware-level emulation, eg: CentOS, Redhat, Win 10
    ■ Automation of platform (sometimes)
  ○ Cons
    ■ Emulation overhead
    ■ Slow boot-up time
    ■ Lots of OSs to update
Virtualization, Containers, etc
● Containers (cgroups, jails)
  ○ Several can run inside a single operating system and kernel
  ○ Offers controls to limit resources like RAM, CPU time, etc
  ○ Pros
    ■ Low overhead
    ■ Container creation is very fast
Virtualization, Containers, etc
● Mesos, Kubernetes, etc
  ○ Make a lot of servers distribute work, containers, etc
  ○ Apache Mesos: “Distributed systems kernel”
    ■ Agent on every host and manager servers give out work
  ○ Kubernetes
Virtualization, Containers, etc
● Many Processes per Host
  ○ Run un-related processes on hosts
  ○ Add/remove from load balancers
  ○ Not advised for disk-bound or high-bandwidth apps
Scaling Out Tricks
● Sharding
  ○ Techniques
    ■ Modulus: even distribution of keys, hard to reshape data
    ■ Map-based: 1-to-1 shard mapping using another table, config, etc; easy to reshape data
  ○ Launch with many shards in advance
    ■ 1-4 MySQL/MongoDB Instance/host
    ■ 1 MySQL/MongoDB Instance/host, 4 x databases as shards
    ■ 1 MySQL/MongoDB Instance/host, small hardware
Scaling Out Tricks
● Sharding
  [Diagrams: modulus-based versus map-based sharding]
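The difference between the two techniques can be sketched as follows. The function names and the default-shard policy are assumptions for illustration; a real map would live in a table or config service.

```python
NUM_SHARDS = 4

def modulus_shard(user_id, num_shards=NUM_SHARDS):
    """Modulus: no lookup needed, keys spread evenly."""
    return user_id % num_shards

shard_map = {}   # stand-in for a lookup table or config

def map_shard(user_id, default_shard=0):
    """Map-based: one extra lookup, but any key can live on any shard."""
    return shard_map.setdefault(user_id, default_shard)

def move_user(user_id, new_shard):
    """Reshaping map-based data touches only the keys being moved."""
    shard_map[user_id] = new_shard

# Growing a modulus scheme from 4 to 5 shards relocates most keys.
moved = sum(1 for uid in range(1000) if modulus_shard(uid, 4) != modulus_shard(uid, 5))
```

In this sample, 800 of the 1000 keys change shards when the modulus grows from 4 to 5, which is why launching with many shards in advance avoids a painful reshape later.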
Scaling Out Tricks
● Hardware
  ○ Have a strategy to add/remove capacity quickly
    ■ Cloud Instances
    ■ Mesos/Kubernetes
    ■ Automation
  ○ Use cheap application servers for in-memory stores and apps
  ○ Launch with lots of RAM, scale down post-launch
Scaling Out Tricks
● Elasticity
  ○ Ensure there is a way to add/remove hosts, examples:
    ■ Load Balancers: good health-checks are important
    ■ Application Configs: File, Database, Zookeeper
Scaling Out Tricks
● At Launch...
  ○ Scale-out
    ■ Keep spare servers online, partially configured
    ■ Launch with extra database replicas (slave/secondary)
    ■ Monitor usage and remove extra hardware post-launch
    ■ Monitor and adjust capacity
  ○ Scale-up
    ■ Launch with lots of RAM
  ○ Traffic Control
    ■ Launch one region at a time
    ■ Launch with rate limits
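One common way to "launch with rate limits" is a token bucket. The slides do not name an algorithm, so this is an assumed technique; `TokenBucket` and its parameters are hypothetical.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = burst         # maximum tokens that can accumulate
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

checkout_limit = TokenBucket(rate_per_sec=100, burst=200)
accepted = checkout_limit.allow()   # call once per incoming request
```

Raising `rate_per_sec` gradually after launch gives the same effect as opening one region at a time: the backend sees load grow in controlled steps.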
Scaling Out Tricks
● Application “Kill switches”
  ○ A switch to disable certain app features/functions
  ○ Useful when there is:
    ■ Too much traffic/scale-up
    ■ DDoS
    ■ Maintenance
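A kill switch can be as small as a set of disabled feature names checked on the request path. In this sketch the `KillSwitches` class and the "recommendations" feature are hypothetical; in production the set would be read from a shared config store so every app server sees the change.

```python
class KillSwitches:
    def __init__(self):
        self._disabled = set()

    def disable(self, feature):
        self._disabled.add(feature)

    def enable(self, feature):
        self._disabled.discard(feature)

    def is_enabled(self, feature):
        return feature not in self._disabled

switches = KillSwitches()

def product_page(product):
    """Core page always renders; the expensive extra is optional."""
    page = {"name": product}
    if switches.is_enabled("recommendations"):
        # Stand-in for an expensive query that can be shed under load.
        page["recommendations"] = ["related-1", "related-2"]
    return page

switches.disable("recommendations")   # traffic spike: shed the optional work
degraded = product_page("widget")
```

The page still renders with the feature off, so the switch degrades the experience instead of failing the request.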
Scaling Out Tricks
● Limiting Graph Structures
  ○ “Friends” / “Followers” features are often graphs
  ○ If Katy Perry or Barack Obama used your “friends” feature…
  ○ Limit the size of graphs, or queue events for fan-out updating
Scaling Out Tricks
● Batching and Parallel Work
  ○ Do large queries in parallel
    ■ Modern CPUs have many cores (2, 4, 8+)
    ■ 1 connection = 1 thread = 1 CPU core
  ○ Batch inserts/updates
    ■ 1 x update with 1000 items > 1000 x updates with 1 item
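Batching might look like the sketch below, which writes 1000 rows using four multi-row INSERT statements instead of 1000 single-row ones. SQLite is used so the example runs anywhere; the `orders` table and the chunk size of 250 are arbitrary choices for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
rows = [(i, i * 10) for i in range(1, 1001)]

def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

statements = 0
for chunk in chunks(rows, 250):
    # One statement carries many rows: INSERT ... VALUES (?, ?), (?, ?), ...
    placeholders = ",".join(["(?, ?)"] * len(chunk))
    params = [value for row in chunk for value in row]
    db.execute(f"INSERT INTO orders (id, total) VALUES {placeholders}", params)
    statements += 1
db.commit()
```

Against a networked database each statement costs a round trip and a parse, so fewer, larger statements usually win until the batch size hits packet or lock-time limits.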
Scaling Up Tricks
● Test provider turn-around time on hardware upgrading
● Test application performance on improved hardware in advance
● Scale up only resources needed
Databases
● General
  ○ Monitoring/reviewing slow queries reduces most inefficiencies
  ○ More memory will reduce disk requests
  ○ SSDs will reduce disk request time
  ○ Proper database and kernel tunings will help further
    ■ Linux has very inefficient defaults!
    ■ Try to use real local-disks, not EBS, NFS, etc
● Queries
  ○ Don’t try to make MySQL/MongoDB a queue or search engine!
  ○ Decentralizing data and pre-computing answers for reads will take you far
  ○ The best query is no query (cache)
Testing Performance and Capacity
● General
  ○ Try to emulate the real user traffic
  ○ Add micro-pauses to simulate reality
  ○ Cloud-based providers are great for running load generation
● Applications
  ○ Component testing
    ■ Test the max volume of each component on a single host
    ■ Test the max volume of each component on many hosts
    ■ Calculate host scalability, ie: “+1 host = +80% more traffic”
  ○ Feature capacity
    ■ Test the impact of each feature if not separate
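The "+1 host = +80% more traffic" figure can be derived from load-test results and then used for planning. The sketch below assumes roughly linear scaling between the smallest and largest tests, and the measured numbers are invented for the example.

```python
import math

def scaling_efficiency(throughput_by_hosts):
    """Average throughput gained per added host, relative to one host's throughput."""
    hosts = sorted(throughput_by_hosts)
    first, last = hosts[0], hosts[-1]
    per_host = throughput_by_hosts[first] / first
    gained = throughput_by_hosts[last] - throughput_by_hosts[first]
    return gained / ((last - first) * per_host)

def hosts_needed(target, single_host, efficiency):
    """Smallest n where single_host * (1 + (n - 1) * efficiency) >= target."""
    if target <= single_host:
        return 1
    return 1 + math.ceil((target / single_host - 1) / efficiency)

# Hypothetical load-test results: host count -> requests/sec sustained.
measured = {1: 1000, 2: 1800, 4: 3400}
efficiency = scaling_efficiency(measured)
plan = hosts_needed(target=5500, single_host=1000, efficiency=efficiency)
```

Efficiency usually falls as hosts are added because shared backends saturate, so it is worth re-measuring at the planned host count instead of extrapolating far beyond the largest test.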
Testing Performance and Capacity
● Databases
  ○ Replay real user traffic on real backups
  ○ Load test tools: Linkbench, Sysbench, TPCC, JMeter, etc
  ○ Single feature/query testing
    ■ Understand host capacity per feature, eg: “2000 user login queries/sec per db replica”
  ○ Know your slowest query!
Development-time Questions
● General
  ○ What does the app do?
  ○ If I break X, what happens?
  ○ Are connections to data stores “pooled”?
● Replicas
  ○ Can the app use replicas (with possible lag)?
    ■ Tip: start early, deploy replication from the start
  ○ Can we add/remove replicas without disruption?
● Sharding
  ○ Can the app understand shards/partitions?
  ○ How is data balanced post-sharding?
  ○ Are there cross-shard references?
Development-time Questions
● Caching
  ○ What data can be cached?
  ○ Will a change be read immediately?
    ■ Can we pre-cache this change?
  ○ When should the cache delete an item?
    ■ Can we set TTLs on our keys?
  ○ How do we add/remove cache servers easily?
Knowing Your App
● If you see…
  ○ The app is write heavy
    ■ Remove overhead from immediate write path
    ■ Batch writes if possible
  ○ The app is read heavy
    ■ Reduce scans/operations from the read path (index, etc)
    ■ Add as many replicas (slave/secondary) as needed
  ○ The app queries for counts often, ie: # of items, friends, etc
    ■ Move count-queries to incremented in-memory counters
    ■ Or, create an index for the count query
  ○ The app uses references or joins often
    ■ Consider decentralising the data (with fan-out updates)
Themes
● Make all features, apps, databases elastic
● Request Flow
  ○ Make the heavy workload easy / make the light workload hard
  ○ Move graph updates to background (queues, async, etc)
  ○ Move ‘counts’ to counters
● Caching
  ○ Cheaper/faster to access than DB
  ○ Try to cache before anyone reads data
● Queues
  ○ Great for replicating events while simplifying updates
  ○ Great for batching changes
● Monitor everything! Try Percona Monitoring and Management!
Join us at Percona Live Europe
● When: October 3-5, 2016
● Where: Amsterdam, Netherlands
The Percona Live Open Source Database Conference is a great event for users of any level using open source database technologies.
● Get briefed on the hottest topics
● Learn about building and maintaining high-performing deployments
● Listen to technical experts and top industry leaders
Use promo code “WebinarPLAM16” and receive €15 off the current registration price!
Questions?
Thanks for joining! Be sure to check out the Percona Blog for more technical blogs and topics!