Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database

Tim Vaillancourt
Sr. Technical Operations Architect @ Percona

Agenda

● Synchronous versus Asynchronous Applications
● Scaling a Synchronous/Latency-sensitive Application
● Scaling an Asynchronous Application
    Secondary/Slave Hosts
    Caching
    Queuing
● Efficient Usage of Data at Scale
    Moving Expensive Work
    Caching Techniques
    Counters and In-memory Stores
    Connection Pooling

Agenda

● Scaling Out (Horizontal) Tricks
    Pre-Sharding
    Kill Switches
    Limits and Graphs
● Scaling with Hardware (Vertical Scaling)
● Testing Performance and Capacity
● Knowing Your Application and Questions to Ask at Development Time
● Questions

About Me

● Started at Percona in January 2016
● Experience
    Web Publishing
        Big-scale LAMP-based Websites
    Ecommerce
        Large Inventory SaaS
    Gaming
        DevOps
            50-100 Microservices
            5-7+ x Massive Launches / Year
            Design, launch and maintain apps

About Me

● DBA at EA DICE
    2 x New Titles
    5+ x Legacy Titles
● Technologies
    MySQL
    MongoDB
    Cassandra
    Redis and Memcached
    RabbitMQ, Kafka and ActiveMQ
    Solr and Elasticsearch
    (Sort of) AWS, HDFS, HBase, Postgres, etc…

Services

● Monolith
    One application that does everything
    Example: Chrome, MySQL, huge Python app
● Microservice
    Apps with different purposes, pain points and SLAs are discrete services
    Often easier to scale/troubleshoot
    Reduces risk of outage
    Example: frontend PHP app, messaging app, encoding app, etc
● In Practice
    Both can be scaled up and down with the right features
    Microservices offer more flexibility
    Monolith services bring problems at scale

Application Operations

● Synchronous
    Blocking operation until success or failure
    Slower requests
    Example: a file uploading app
● Asynchronous
    Request and response are separated
    Fast response time back to user/application
    Example: a social media site
● Slow Operations
    Can cause pileups in a tiered system

Applications

● Synchronous
    Pros: less code, always the right answer
    Cons: blocking operations and poorer efficiency
    Example: a file uploading app
● Latency/Integrity Sensitive
    Pros: always the right answer
    Cons: fewer scalability tricks available
    Example: a stock trading app that cannot accept “slave lag”
● Asynchronous
    Pros: light operations and more scalability
    Cons: eventual consistency (and sometimes more code)
    Example: a social media site

Types of Data Designs

● Decentralised
    Data is duplicated in several places
    Pros: lighter to read, decreased locking, easy to shard
    Cons: increased storage space, extra duplication effort
● Centralised
    Data is kept in one (or few) places and referenced
    Pros: less storage, one source-of-truth
    Cons: locking, inefficiencies, sharding issues

Balancing Request Impact

● Read-focused Apps
    Benefit from
        Values pre-computed at write/change-time
        Indices and/or few “scans” for data
        No/few JOINs/operations to get result
● Write-focused Apps
    Benefit from
        No pre-computing of values (compute at read-time)
        No/few indices to update
        Insert/Append > Update
    Reads: compute read summaries with replicas, add indices to secondaries only, etc

Queuing Updates

● Event Metadata
    Example: “UserX has the new top score!”
● Without Queue example
    Update Top Score in Database(s)
    Send Email to Friends
    Post to Facebook Page
    Update cache...
● With Queue example
    Add event to queue ‘topscore’
    Apps read queue
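
A minimal sketch of the "with queue" flow, assuming an in-process `queue.Queue` as a stand-in for a real broker. The names `publish_top_score`, `drain` and the four handlers are hypothetical and only illustrate the shape: the request path enqueues one small event, and the follow-up work happens when workers read the queue.

```python
import queue

# Stand-in for a 'topscore' queue on a real message broker (assumption).
topscore_queue = queue.Queue()


def publish_top_score(user, score):
    """Request path: enqueue one small event and return immediately."""
    topscore_queue.put({"type": "topscore", "user": user, "score": score})


def drain(handlers):
    """Worker path: hand every queued event to each subscribed handler."""
    handled = []
    while True:
        try:
            event = topscore_queue.get_nowait()
        except queue.Empty:
            return handled
        for name, handler in handlers.items():
            handled.append((name, handler(event)))
        topscore_queue.task_done()


# Hypothetical handlers standing in for the four follow-up actions.
handlers = {
    "database": lambda e: f"top score = {e['score']}",
    "email": lambda e: f"mail friends of {e['user']}",
    "facebook": lambda e: f"post for {e['user']}",
    "cache": lambda e: f"refresh top score for {e['user']}",
}
```

With a real broker each consuming app would normally have its own subscription and run as a separate process; the single `drain` loop here only keeps the sketch short.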

Queuing Updates

● Update Buffering
    Scenario: there is a high rate of updates to buffer
    Queue-based example
        App adds to update buffer (queue)
        Worker app works from the bottom of buffer
● Queue Operational Benefits
    Spikes in traffic
    Backend downtime
    Communication bus
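
The update-buffering idea can be sketched as follows. This is an illustration under assumptions, not a production design: a `collections.deque` stands in for a durable queue, a dict stands in for the database table, and merging updates per key before writing is one possible way for the worker to use the buffer.

```python
from collections import deque

update_buffer = deque()  # stand-in for a durable queue (assumption)
database = {}            # stand-in for the real table (assumption)


def buffer_update(key, delta):
    """App side: append the update to the buffer instead of writing it."""
    update_buffer.append((key, delta))


def flush(batch_size=100):
    """Worker side: take the oldest updates, merge per key, write each key once."""
    merged = {}
    for _ in range(min(batch_size, len(update_buffer))):
        key, delta = update_buffer.popleft()  # oldest entry first
        merged[key] = merged.get(key, 0) + delta
    for key, delta in merged.items():
        database[key] = database.get(key, 0) + delta
    return len(merged)  # number of writes actually issued
```

If the backend is down or traffic spikes, the buffer simply grows and the worker catches up later, which is the operational benefit listed above.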

Scaling Sync./Latency-Sensitive Apps

● Rethink the Flow Using Async
● Use lots of database RAM
● Shard the database
● Reduce impact of request flow
● Apache Cassandra
    Synchronous
    Very write optimized
● Percona XtraDB Cluster, NDB
● Use memory-based storage
    Queue persistence to database

Efficient Usage of Data at Scale

● Expensive DB Work
    Focus on lightweight user-facing operations
    Move aggregations/summaries/reporting to background
    Use replicas for expensive jobs
    Avoid or reduce (maybe cache) “JOINs”
    Enable and monitor metrics
        MySQL: log_queries_not_using_indexes
        MongoDB: enable operationProfiling
    Review metrics and improve!
        Percona Monitoring and Management


● Caching / In-Memory Stores
    Alleviates load from database
    Very fast lookups
    Low connection overhead
        MySQL connection buffers: ~1MB+
        MongoDB connection buffers: ~1MB
        Redis or Memcached connection buffers: 0-limit/infinity**
    Server-Side
        Hit/Miss Caching
            If something is not in the cache: find + add it. TTL expiry
        Inline/Preemptive Caching
            Update/Delete cache data at change time/preemptively
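
A minimal sketch of both server-side styles, assuming an in-process dict in place of Memcached or Redis. The class name `TTLCache`, its methods and the `loader` callback are hypothetical. `get` is the hit/miss path with TTL expiry; `put` and `delete` are what an inline/preemptive approach would call at change time.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._data = {}             # key -> (expires_at, value)

    def get(self, key, loader):
        """Hit/miss: return the cached value, or load it, store it and return it."""
        entry = self._data.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]                      # hit
        value = loader(key)                      # miss or expired: ask the database
        self._data[key] = (now + self.ttl, value)
        return value

    def put(self, key, value):
        """Inline/preemptive: refresh the entry when the data changes."""
        self._data[key] = (self.clock() + self.ttl, value)

    def delete(self, key):
        """Inline/preemptive: drop the entry when the data changes."""
        self._data.pop(key, None)
```

The hit/miss path is simpler to add to an existing app; the inline path avoids serving stale data until the TTL expires, at the cost of touching the cache in every write path.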


● Caching / In-Memory Stores (continued)
    Client-Side
        Cache client data in the client app/browser/etc
    In-memory Stores
        Memcached
        Redis
        Percona Server for MongoDB with Memory Engine :)
    Use TTLs to trim data


● Storing Numerical Counters and Stats
    Offload to in-memory stores
        Incremented/decremented counters
        Aggregations, summaries, counts
    Count-style Queries to Counters
        Increment counter at request/change time
        Read counter value at read-request time
        Or, try to use an index
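
As a rough illustration of turning a count-style query into a counter, the sketch below keeps a per-user friend count that is adjusted at change time and read directly at request time. A `collections.Counter` stands in for an in-memory store, and the function names and key format are hypothetical.

```python
from collections import Counter

counters = Counter()  # stand-in for counters held in Redis/Memcached (assumption)


def add_friend(user):
    # The friendship row would be written to the database here.
    counters[f"friends:{user}"] += 1


def remove_friend(user):
    # The friendship row would be deleted from the database here.
    counters[f"friends:{user}"] -= 1


def friend_count(user):
    """Read path: a single key lookup instead of counting rows."""
    return counters[f"friends:{user}"]
```

Both Redis and Memcached provide atomic increment and decrement commands, so the same pattern carries over without a read-modify-write race.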


● Connection Pooling
    Removes 3-way TCP “handshake” from request (more w/SSL)
    Reduces threading overhead on databases
    Proxies on App server localhost/loopback
        Reduces 1 x TCP ‘hop’, i.e. faster connect time
        Can create a LOT of DB connections with many app servers


● Connection Pooling (continued)
    MySQL Proxies
        ProxySQL
        HAProxy
        Maxscale
        Others…
    MongoDB Proxies
        Mongos (sharding) process
    Proxy-on-Localhost or direct is fastest
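
To make the pooling idea concrete, here is a small application-side pool. It is a sketch only: `ConnectionPool` is a hypothetical class, and `sqlite3` in-memory connections stand in for MySQL or MongoDB connections so the example runs anywhere. Connections are opened once and handed back to the pool instead of being closed, so a request does not pay the connect cost again.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    def __init__(self, connect, size):
        self._idle = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the connect cost once, up front

    @contextmanager
    def connection(self, timeout=1.0):
        # Blocks while every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)       # return it to the pool instead of closing


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
```

The fixed `size` is also what keeps many app servers from opening an unbounded number of database connections: the total is roughly pool size times the number of app processes.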

Virtualization, Containers, etc

● Virtualization
    Pretends to be a real computer from BIOS up
    OS + Software run under a hypervisor layer
    Pros
        Full hardware-level emulation, e.g. CentOS, Red Hat, Win 10
        Automation of platform (sometimes)
    Cons
        Emulation overhead
        Slow boot-up time
        Lots of OSs to update


● Containers (cgroups, jails)
    Several can run inside a single operating system and kernel
    Offers controls to limit resources like RAM, CPU time, etc
    Pros
        Low overhead
        Container creation is very fast


● Mesos, Kubernetes, etc
    Make a lot of servers distribute work, containers, etc
    Apache Mesos: “Distributed systems kernel”
        Agent on every host and manager servers give out work
    Kubernetes


● Many Processes per Host
    Run un-related processes on hosts
    Add/remove from load balancers
    Not advised for disk-bound or high-bandwidth apps

Scaling Out Tricks

● Sharding
    Techniques
        Modulus
            Even distribution of keys
            Hard to reshape data
        Map-based
            1-to-1 shard mapping using another table, config, etc
            Easy to reshape data
    Launch with many shards in advance
        1-4 MySQL/MongoDB Instance/host
        1 MySQL/MongoDB Instance/host, 4 x databases as shards
        1 MySQL/MongoDB Instance/host, small hardware

Scaling Out Tricks

Sharding

[Diagrams: modulus-based sharding and mapping-based sharding]
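
The two techniques can be compared in a few lines. This is a sketch under assumptions: the shard names, the use of CRC32 as the hash, and the helper names are all hypothetical. `moved_keys` shows why modulus sharding is hard to reshape: changing the shard count reassigns many keys, whereas a map entry can be changed one key at a time.

```python
import zlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical shard names


def modulus_shard(key, shards=SHARDS):
    """Modulus: hash the key and take it modulo the shard count."""
    return shards[zlib.crc32(key.encode()) % len(shards)]


# Map-based: an explicit lookup, kept in a table or config in practice.
shard_map = {"user:1": "shard0", "user:2": "shard3"}


def mapped_shard(key, default="shard0"):
    """Map-based: look the key up; unknown keys fall back to a default shard."""
    return shard_map.get(key, default)


def moved_keys(keys, old_count, new_count):
    """Count keys whose modulus shard changes when the shard count changes."""
    return sum(
        zlib.crc32(k.encode()) % old_count != zlib.crc32(k.encode()) % new_count
        for k in keys
    )
```

Pre-sharding, as described above, sidesteps the reshaping problem for the modulus approach: the shard count stays fixed and only the placement of shards on hosts changes.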

Scaling Out Tricks

● Hardware
    Have a strategy to add/remove capacity quickly
        Cloud Instances
        Mesos/Kubernetes
        Automation
    Use cheap application servers for in-memory stores and apps
    Launch with lots of RAM, scale down post-launch

Scaling Out Tricks

● Elasticity
    Ensure there is a way to add/remove hosts, examples:
        Load Balancers
            Good health-checks are important
        Application Configs
            File
            Database
            Zookeeper

Scaling Out Tricks

● At Launch...
    Scale-out
        Keep spare servers online, partially configured
        Launch with extra database replicas (slave/secondary)
        Monitor usage and remove extra hardware post-launch
        Monitor and adjust capacity
    Scale-up
        Launch with lots of RAM
    Traffic Control
        Launch one region at a time
        Launch with rate limits
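
One common way to implement a launch-time rate limit is a token bucket. The slides do not prescribe an algorithm, so this is an assumption chosen for illustration; the class name and parameters are hypothetical. The bucket allows short bursts up to `burst` requests and refills at `rate_per_sec`.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)   # start full so an initial burst is allowed
        self.clock = clock           # injectable so refill can be tested
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill for the elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Raising `rate_per_sec` step by step after launch gives a controlled ramp-up while the graphs are watched.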

Scaling Out Tricks

● Application “Kill switches”
    A switch to disable certain app features/functions
    Useful when there is:
        Too much traffic/scale-up
        DDoS
        A maintenance
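
A kill switch can be as small as a set of disabled feature names checked in the request path. The sketch below is illustrative only: `KillSwitches`, the `recommendations` feature and `product_page` are hypothetical, and in practice the switch state would live in a shared config store rather than in process memory.

```python
class KillSwitches:
    def __init__(self, disabled=()):
        self._disabled = set(disabled)

    def disable(self, feature):
        self._disabled.add(feature)

    def enable(self, feature):
        self._disabled.discard(feature)

    def is_enabled(self, feature):
        return feature not in self._disabled


switches = KillSwitches()


def product_page(product, load_recommendations):
    """Build the page; skip the expensive feature when its switch is off."""
    page = {"product": product}
    if switches.is_enabled("recommendations"):
        page["recommendations"] = load_recommendations(product)
    return page
```

When the switch is off, the expensive query is never issued, so the core page keeps working under load.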

Scaling Out Tricks

● Limiting Graph Structures
    “Friends” / ”Followers” features are often graphs
    If Katy Perry or Barack Obama used your “friends” feature…
    Limit the size of graphs, or queue events for fan-out updating

Scaling Out Tricks

● Batching and Parallel Work
    Do large queries in parallel
        Modern CPUs have many cores (2, 4, 8+)
        1 connection = 1 thread = 1 CPU core
    Batch inserts/updates
        1 x update with 1000 items > 1000 x updates with 1 item
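
The batching point can be sketched with Python's built-in `sqlite3` module, used here only as a stand-in so the example is runnable; the `orders` table and `insert_batched` helper are hypothetical. Rows are sent in chunks, with one `executemany` call and one commit per chunk rather than one round trip and commit per row.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the real database (assumption)
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")


def insert_batched(rows, batch_size=1000):
    """Insert rows in chunks; returns the number of batches sent."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        with db:  # one transaction, committed once per chunk
            db.executemany("INSERT INTO orders (id, total) VALUES (?, ?)", chunk)
        batches += 1
    return batches
```

On MySQL the closest equivalent is a multi-row `INSERT ... VALUES (...), (...)` statement inside a single transaction.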

Scaling Up Tricks

● Test provider turn-around time on hardware upgrading
● Test application performance on improved hardware in advance
● Scale up only resources needed

Databases

● General
    Monitoring/reviewing slow queries reduces most inefficiencies
    More memory will reduce disk requests
    SSDs will reduce disk request time
    Proper database and kernel tunings will help further
        Linux has very inefficient defaults!
    Try to use real local-disks, not EBS, NFS, etc
● Queries
    Don’t try to make MySQL/MongoDB a queue or search engine!
    Decentralizing data and pre-computing answers for reads will take you far
    The best query is no query (cache)

Testing Performance and Capacity

● General
    Try to emulate the real user traffic
    Add micro-pauses to simulate reality
    Cloud-based providers are great for running load generation
● Applications
    Component testing
        Test the max volume of each component on a single host
        Test the max volume of each component on many hosts
        Calculate host scalability, i.e. “+1 host = +80% more traffic”
    Feature capacity
        Test the impact of each feature if not separate
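
The host-scalability figure can be derived from the single-host and multi-host test results. The helpers below are a sketch with hypothetical names, and they assume each added host contributes the same fraction of one host's throughput (a linear model, which real systems only approximate).

```python
def scaling_efficiency(throughput_by_hosts):
    """Average extra throughput per added host, as a fraction of one host.

    throughput_by_hosts maps host count -> measured max requests/sec.
    """
    one_host = throughput_by_hosts[1]
    max_hosts = max(throughput_by_hosts)
    gained = throughput_by_hosts[max_hosts] - one_host
    return gained / ((max_hosts - 1) * one_host)


def hosts_needed(target, one_host, efficiency):
    """Smallest host count whose estimated capacity reaches the target."""
    hosts, capacity = 1, one_host
    while capacity < target:
        hosts += 1
        capacity += one_host * efficiency
    return hosts
```

For example, measurements of 1000 req/s on one host and 3400 req/s on four hosts give an efficiency of 0.8, which matches the “+1 host = +80% more traffic” style of statement.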

Testing Performance and Capacity

● Databases
    Replay real user traffic on real backups
    Load test tools: Linkbench, Sysbench, TPCC, JMeter, etc
    Single feature/query testing
        Understand host capacity per feature, e.g. “2000 user login queries/sec per db replica”
    Know your slowest query!

Development-time Questions

● General
    What does the app do?
    If I break X, what happens?
    Are connections to data stores “pooled”?
● Replicas
    Can the app use replicas (with possible lag)?
        Tip: start early, deploy replication from the start
    Can we add/remove replicas without disruption?
● Sharding
    Can the app understand shards/partitions?
    How is data balanced post-sharding?
    Are there cross-shard references?

Development-time Questions

● Caching
    What data can be cached?
    Will a change be read immediately?
        Can we pre-cache this change?
    When should the cache delete an item?
        Can we set TTLs on our keys?
    How do we add/remove cache servers easily?

Knowing Your App

● If you see…
    The app is write heavy
        Remove overhead from immediate write path
        Batch writes if possible
    The app is read heavy
        Reduce scans/operations from the read path (index, etc)
        Add as many replicas (slave/secondary) as needed
    The app queries for counts often, i.e. # of items, friends, etc
        Move count-queries to incremented in-memory counters
        Or, create an index for the count query
    The app uses references or joins often
        Consider decentralising the data (with fan-out updates)

Themes

● Make all features, apps, databases elastic
● Request Flow
    Make the heavy workload easy / make the light workload hard
    Move graph updates to background (queues, async, etc)
    Move ‘counts’ to counters
● Caching
    Cheaper/faster to access than DB
    Try to cache before anyone reads data
● Queues
    Great for replicating events while simplifying update
    Great for batching changes
● Monitor everything! Try Percona Monitoring and Management!

Join us at Percona Live Europe

When: October 3-5, 2016
Where: Amsterdam, Netherlands

The Percona Live Open Source Database Conference is a great event for users of any level using open source database technologies.

● Get briefed on the hottest topics
● Learn about building and maintaining high-performing deployments
● Listen to technical experts and top industry leaders

Use promo code “WebinarPLAM16” and receive €15 off the current registration price!

Sponsorship opportunities are available as well: https://www.percona.com/live/plam16/be-a-sponsor

Questions?

Thanks for joining! Be sure to check out the Percona Blog for more technical blogs and topics!
