TRANSCRIPT
Scaling During Hypergrowth:
Lessons Learned at AppNexus
October 25, 2016
Brian Bulkowski
CTO and Founder, Aerospike, Inc.
■ Talk originally given by
Christopher Bowman, Sr Director, Data Systems Operations,
AppNexus on Aug 24, 2016
■ Responsible for real-time and Hadoop operations groups
■ Advisor to new technical initiatives within AppNexus
■ Recipient of Founders’ Award, 2016
■ Hear Chris in his own words – webinar available at http://aerospike.com/webinars/:
Operational Techniques for Scaling Up Real-Time Mission Critical Infrastructure
AppNexus Platform
■ World’s leading independent ad tech platform
■ Requirements
■ 50 milliseconds maximum to execute requests
■ Load doubled every 9-12 months over the past 6 years
■ Query volume grew from 50,000 per second in 2009 to over 2 million per second today
■ 40X growth
■ 100% uptime required – “the internet never sleeps”
AppNexus Real-Time Architecture - Simplified
User Data Store: Server-side Object Store
■ Mission-Critical component of the infrastructure
■ Requires near 100% uptime
■ No maintenance windows
■ Explosive Growth for 6+ years
                              2010            2016
Object count                  1 billion       50 billion
Read queries (per second)     50,000          2 million
Write queries (per second)    20,000          700,000
Key value store               Aerospike 2.0   Aerospike 3.0
Cluster size                  8 nodes         12 to 28 nodes
How did we scale with no downtime?
Choosing Key Value Store For User Data Store
■ SLA
■ Store billions of objects
■ Handle millions of reads/writes per second
■ Provide sub-millisecond latency on reads
■ Achieve near 100% uptime
■ Cost
■ In 2009, AppNexus's CTO determined that leveraging Flash-based technology was the only way to achieve scale cost-effectively
■ The key value store service needed to be very light on operational resources
■ The KVS needed to support Flash-based sub-millisecond access as well as a pure main-memory configuration for even faster access
■ Decisions taken in 2010
■ The previous solution (Schooner) was failing, and no support was available from the vendor
■ Evaluated all existing KVS options, including Couch, Cassandra, Aerospike, etc.
■ Chose Aerospike for its predictable latency and Flash-based technology that supported both a pure main-memory deployment configuration and a hybrid memory/Flash setup
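As a concrete illustration of checking the sub-millisecond read SLA above, here is a minimal sketch using the official Aerospike Python client. The host address, namespace, set, and key names are placeholder assumptions; a real check would sample live traffic rather than one hot key.

```python
# Minimal read-latency probe against a sub-millisecond SLA.
# Assumes a reachable Aerospike server and `pip install aerospike`;
# host, namespace, set, and key names are illustrative placeholders.
import time
import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("test", "users", "user:1")            # (namespace, set, user key)
client.put(key, {"segments": [1, 2, 3]})     # seed one record

samples = []
for _ in range(10_000):
    start = time.perf_counter()
    client.get(key)                          # single-record read
    samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds

samples.sort()
print(f"p50 = {samples[len(samples) // 2]:.3f} ms")
print(f"p99 = {samples[int(len(samples) * 0.99)]:.3f} ms")
client.close()
```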
TCO advantage of Flash based deployments (for demonstration purposes only)
Scenario: customer requires 500K TPS, 10 TB of storage, with 2x replication factor.

                                       DRAM-only NoSQL DB       DRAM-SSD hybrid DB
Storage per server                     180 GB (196 GB server)   2.4 TB (4 x 700 GB SSD)
Servers required                       186                      14
Cost per server                        $8,000                   $11,000
Server costs                           $1,488,000               $154,000
Power, 2 years ($0.12/kWh average)     $352,000                 $32,400
Maintenance, 2 years ($3,600/server)   $670,000                 $50,400
Total (2 years)                        $2,528,000               $247,800
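The arithmetic behind the table can be sketched as below. Server counts and unit costs come straight from the table; the simple hardware + power + maintenance cost model is an illustrative assumption, so the computed totals land near, but not exactly on, the table's figures.

```python
# Two-year TCO sketch for the scenario above (500K TPS, 10 TB, 2x replication).
# Inputs are taken from the table; the cost model is an assumption.

def tco_two_years(servers, cost_per_server, power_total, maint_per_server=3_600):
    hardware = servers * cost_per_server
    maintenance = servers * maint_per_server
    return hardware + power_total + maintenance

dram_only = tco_two_years(servers=186, cost_per_server=8_000, power_total=352_000)
hybrid = tco_two_years(servers=14, cost_per_server=11_000, power_total=32_400)

print(f"DRAM-only:       ${dram_only:,}")        # ~$2.5M
print(f"DRAM-SSD hybrid: ${hybrid:,}")           # ~$237K
print(f"Cost ratio:      {dram_only / hybrid:.1f}x")  # roughly an order of magnitude
```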
Switching horses mid-stream from Schooner to Aerospike (2010)
■ Process
■ Run a side-by-side production trial of the new infrastructure (Aerospike) against the old infrastructure (Schooner)
■ Our application was modified to write to both the old and new systems, with a dynamic switch to read from either system (see the sketch after this slide)
■ Operational practices for switching between the new and old systems on the fly were also developed until the new system had the kinks ironed out
■ Trial on virtual machines
■ First, to contain cost, we ran a trial with Aerospike running on VMs
■ We ran limited production runs that identified a few critical issues, which were then quickly addressed
■ Deployment on bare metal servers
■ Set up a 3-node test cluster to perform system integration tests before deployment
■ Deployed three clusters of 8 machines each, two in North America and one in Europe
■ Switched traffic to the new systems and, after a few weeks, switched off the old infrastructure
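A minimal sketch of the dual-write / switchable-read pattern described above. The DualStore wrapper and the read_from flag are hypothetical illustrations, not AppNexus code; old_store and new_store stand in for any two key-value clients exposing get/put.

```python
# Dual-write with a runtime-switchable read path (hypothetical sketch).

class DualStore:
    """Writes go to both stores; reads follow a flag that can be flipped live."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store
        self.read_from = "old"  # flip to "new" to cut reads over

    def put(self, key, value):
        # Dual-write keeps both systems in sync during the trial.
        self.old.put(key, value)
        try:
            self.new.put(key, value)
        except Exception:
            pass  # a failing trial system must not break production writes

    def get(self, key):
        primary = self.new if self.read_from == "new" else self.old
        return primary.get(key)

# Usage: start reading from the old system, flip once the new one is trusted.
# store = DualStore(schooner_client, aerospike_client)
# store.read_from = "new"   # dynamic switch, no redeploy needed
```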
Aerospike Key Value Store
1) No Hotspots
– Distributed Hash Table simplifies data partitioning
2) Smart Client – 1 hop to data, no load balancers (see the sketch after this list)
3) Shared Nothing Architecture
– every node is identical
4) Smart Cluster, Zero Touch
– auto-failover, rebalancing, rack aware, rolling upgrades
5) Transactions and long-running tasks prioritized in real-time
6) XDR – cross data center replication ensures
– Zero Downtime
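To make points 1 and 2 concrete, here is a sketch of hash-based partitioning with a client-side partition map. Aerospike actually hashes the record key with RIPEMD-160 into 4096 fixed partitions; this sketch uses SHA-256 for portability, and the digest-to-partition byte layout and the toy node map are assumptions for illustration only.

```python
# Sketch of DHT partitioning and one-hop routing (points 1 and 2 above).
import hashlib

N_PARTITIONS = 4096  # fixed partition count, as in Aerospike

def partition_id(set_name: str, user_key: str) -> int:
    # Aerospike uses RIPEMD-160 over the key; SHA-256 here keeps the sketch
    # portable. The low-12-bit mapping is an illustrative assumption.
    digest = hashlib.sha256(f"{set_name}:{user_key}".encode()).digest()
    return (digest[0] | (digest[1] << 8)) & (N_PARTITIONS - 1)

# The smart client keeps a partition map (partition -> owning node), so each
# request goes straight to the right node: one hop, no load balancer.
partition_map = {pid: f"node-{pid % 8}" for pid in range(N_PARTITIONS)}  # toy 8-node map

pid = partition_id("users", "user:12345")
print(f"partition {pid} -> {partition_map[pid]}")
```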
Scale out has limits
■ Continuous operation as data and load increased
■ Started with 8-node clusters with an effective unique data size of around 2 TB in 2010
■ In 2011 and 2012:
■ We expanded by adding new nodes and ended up with a 48-node cluster
■ We replaced all SSDs in existing nodes with better SSDs of similar capacity
■ All of this required rolling upgrades with no service-level downtime
■ Note: we mandated that rolling upgrades to the service be performable even at peak periods of load
■ Adding nodes to the cluster became counter-productive after a while
■ Reached capacity limits of the existing storage system architecture of 4 SSDs per node
■ Growing maintenance, energy, and operational costs as cluster sizes exceeded 40 nodes
■ SSD advances meant the new SSDs were better and the attachment could be different (PCIe versus SATA/SAS)
■ All this called for an overhaul of the system
Replace large cluster with smaller clusters
■ Leveraging Moore’s Law
■ Billions of dollars per year are invested in processors and SSDs by the likes of Intel, Micron, Samsung, etc.
■ It is important to ride this low-cost/high-performance curve to stay in business while handling explosive growth
■ Evaluated alternatives in 2012
■ Tried out Intel S3700 SSDs as well as PCIe-based Flash cards from other vendors that had excellent performance
■ Chose larger-capacity Intel SATA drives, attaching 12 to 20 such drives per node, for cost reasons
■ Configuration was Intel SSD DC S3700/S3500 series drives in the Dell R720xd chassis with H710P host controllers (a configuration sketch follows)
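As a rough illustration of how such a multi-SSD node can be declared on the server side, here is a hedged aerospike.conf namespace fragment. The namespace name, device paths, sizes, and tuning values are placeholder assumptions, not AppNexus's actual configuration.

```
namespace userdata {                # hypothetical namespace name
    replication-factor 2
    memory-size 64G                 # index stays in DRAM

    storage-engine device {
        # One entry per SSD; the setup above attached 12 to 20 drives per
        # node (only three shown here, with placeholder device paths).
        device /dev/sdb
        device /dev/sdc
        device /dev/sdd
        write-block-size 128K
        data-in-memory false        # hybrid mode: data on Flash, index in DRAM
    }
}
```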
Switching clusters in production deployment – diagram sequence
1. The application reads from and writes to the old cluster.
2. The new cluster is brought up alongside the old one; application reads and writes continue to hit the old cluster.
3. XDR replication is enabled from the old cluster to the new cluster, forwarding ongoing writes.
4. A backup of the old cluster is restored onto the new cluster to seed the pre-existing data, while XDR keeps it current.
5. Application reads and writes are switched to the new cluster, with the old cluster still serving reads as a fallback.
6. The old cluster is decommissioned; application reads and writes go to the new cluster only.
Case Study
Using a Vendor Product for Mission Critical Infrastructure
■ Relationship with Vendor is Critical
■ The vendor's agents need to be competent and able to collaborate with us in an expert manner on root cause analysis
■ Problems could lie in the network, the machine's OS configuration, or the network card, in addition to the vendor's software
■ The vendor needs to respond with enhancements and bug fixes in a timely manner
■ As the system scales, new limits are reached in the infrastructure layers
■ The vendor needs to think ahead and work on solutions for issues that will be encountered at the next level of scale, before the deployment hits them
■ Ensure that vendor is best of breed and continues to improve product to work at high scale
■ Schooner could not scale up even for one year – installed in 2009, replaced in 2010
■ Aerospike has been successfully scaling up for 6 years since installation in 2010
■ Two key Aerospike improvements that made a huge difference for operations
■ Faster server node restarts during upgrades that cut node start time from hours to seconds (2013)
■ Faster node rebalancing times that reduced migration times for a large database from over a day to minutes (2016)
Questions