TRANSCRIPT
Scaling During Hypergrowth:
Lessons Learned at AppNexus
October 25, 2016
Brian Bulkowski
CTO and Founder, Aerospike, Inc.
■ Talk originally given by
Christopher Bowman, Sr Director, Data Systems Operations,
AppNexus on Aug 24, 2016
■ Responsible for real-time and Hadoop operations groups
■ Advisor to new technical initiatives within AppNexus
■ Recipient of Founders’ Award, 2016
■ Hear Chris in his own words – webinar available at http://aerospike.com/webinars/:
Operational Techniques for Scaling Up Real-Time Mission Critical Infrastructure
AppNexus Platform
■ World’s leading independent ad tech platform
■ Requirements
■ 50 milliseconds maximum to execute requests
■ Load doubled every 9-12 months over the past 6 years
■ Query volume grew from 50,000 per second in 2009 to over 2 million per second today
■ 40X growth
■ 100% uptime required – “the internet never sleeps”
AppNexus Real-Time Architecture - Simplified
User Data Store: Server-side Object Store
■ Mission-Critical component of the infrastructure
■ Requires near 100% uptime
■ No maintenance windows
■ Explosive Growth for 6+ years
                              2010            2016
Object count                  1 billion       50 billion
Read queries (per second)     50,000          2 million
Write queries (per second)    20,000          700,000
Key value store               Aerospike 2.0   Aerospike 3.0
Cluster size                  8 nodes         12 to 28 nodes
How did we scale with no downtime?
Choosing Key Value Store For User Data Store
■ SLA
■ Store billions of objects
■ Handle millions of reads/writes per second
■ Provide sub-millisecond latency on reads
■ Achieve near 100% uptime
■ Cost
■ In 2009, AppNexus's CTO determined that leveraging Flash-based technology was the only way to achieve scale cost-effectively
■ The key value store service needed to be very light on operational resources
■ The KVS needed to support Flash-based sub-millisecond access as well as a pure main-memory configuration for even faster access
■ Decisions taken in 2010
■ The previous solution (Schooner) was failing, and no support was available from the vendor
■ Evaluated all existing KVS options, including Couch, Cassandra, Aerospike, etc.
■ Chose Aerospike for its predictable latency and Flash-based technology that supported both a pure main-memory deployment configuration and a hybrid memory/Flash setup
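As a concrete illustration of checking the sub-millisecond read SLA above, here is a minimal sketch using the official Aerospike Python client. The host address, namespace, set, and key names are placeholder assumptions; a real check would sample live traffic rather than one hot key.

```python
# Minimal read-latency probe against a sub-millisecond SLA.
# Assumes a reachable Aerospike server and `pip install aerospike`;
# host, namespace, set, and key names are illustrative placeholders.
import time
import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("test", "users", "user:1")            # (namespace, set, user key)
client.put(key, {"segments": [1, 2, 3]})     # seed one record

samples = []
for _ in range(10_000):
    start = time.perf_counter()
    client.get(key)                          # single-record read
    samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds

samples.sort()
print(f"p50 = {samples[len(samples) // 2]:.3f} ms")
print(f"p99 = {samples[int(len(samples) * 0.99)]:.3f} ms")
client.close()
```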
TCO advantage of Flash based deployments (for demonstration purposes only)
Scenario: customer requires 500K TPS, 10 TB of storage, with 2x replication factor.

                                       DRAM-only NoSQL DB       DRAM-SSD hybrid DB
Storage per server                     180 GB (196 GB server)   2.4 TB (4 x 700 GB SSD)
Servers required                       186                      14
Cost per server                        $8,000                   $11,000
Server costs                           $1,488,000               $154,000
Power, 2 years ($0.12/kWh average)     $352,000                 $32,400
Maintenance, 2 years ($3,600/server)   $670,000                 $50,400
Total (2 years)                        $2,528,000               $247,800
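The arithmetic behind the table can be sketched as below. Server counts and unit costs come straight from the table; the simple hardware + power + maintenance cost model is an illustrative assumption, so the computed totals land near, but not exactly on, the table's figures.

```python
# Two-year TCO sketch for the scenario above (500K TPS, 10 TB, 2x replication).
# Inputs are taken from the table; the cost model is an assumption.

def tco_two_years(servers, cost_per_server, power_total, maint_per_server=3_600):
    hardware = servers * cost_per_server
    maintenance = servers * maint_per_server
    return hardware + power_total + maintenance

dram_only = tco_two_years(servers=186, cost_per_server=8_000, power_total=352_000)
hybrid = tco_two_years(servers=14, cost_per_server=11_000, power_total=32_400)

print(f"DRAM-only:       ${dram_only:,}")        # ~$2.5M
print(f"DRAM-SSD hybrid: ${hybrid:,}")           # ~$237K
print(f"Cost ratio:      {dram_only / hybrid:.1f}x")  # roughly an order of magnitude
```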
Switching horses mid-stream from Schooner to Aerospike (2010)
■ Process
■ Run a side-by-side production trial of the new infrastructure (Aerospike) against the old infrastructure (Schooner)
■ Our application was modified to write to both the old and new systems, with a dynamic switch to read from either system (see the sketch after this slide)
■ Operational practices for switching between the new and old systems on the fly were also developed until the new system had the kinks ironed out
■ Trial on virtual machines
■ First, to contain cost, we ran a trial with Aerospike running on VMs
■ We ran limited production runs that identified a few critical issues, which were then quickly addressed
■ Deployment on bare metal servers
■ Set up a 3-node test cluster to perform system integration tests before deployment
■ Deployed three clusters of 8 machines each, two in North America and one in Europe
■ Switched traffic to the new systems and, after a few weeks, switched off the old infrastructure
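A minimal sketch of the dual-write / switchable-read pattern described above. The DualStore wrapper and the read_from flag are hypothetical illustrations, not AppNexus code; old_store and new_store stand in for any two key-value clients exposing get/put.

```python
# Dual-write with a runtime-switchable read path (hypothetical sketch).

class DualStore:
    """Writes go to both stores; reads follow a flag that can be flipped live."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store
        self.read_from = "old"  # flip to "new" to cut reads over

    def put(self, key, value):
        # Dual-write keeps both systems in sync during the trial.
        self.old.put(key, value)
        try:
            self.new.put(key, value)
        except Exception:
            pass  # a failing trial system must not break production writes

    def get(self, key):
        primary = self.new if self.read_from == "new" else self.old
        return primary.get(key)

# Usage: start reading from the old system, flip once the new one is trusted.
# store = DualStore(schooner_client, aerospike_client)
# store.read_from = "new"   # dynamic switch, no redeploy needed
```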
Aerospike Key Value Store
1) No Hotspots
– Distributed Hash Table simplifies data partitioning
2) Smart Client – 1 hop to data, no load balancers (see the sketch after this list)
3) Shared Nothing Architecture
– every node is identical
4) Smart Cluster, Zero Touch
– auto-failover, rebalancing, rack aware, rolling upgrades
5) Transactions and long-running tasks prioritized in real-time
6) XDR – cross data center replication ensures
– Zero Downtime
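To make points 1 and 2 concrete, here is a sketch of hash-based partitioning with a client-side partition map. Aerospike actually hashes the record key with RIPEMD-160 into 4096 fixed partitions; this sketch uses SHA-256 for portability, and the digest-to-partition byte layout and the toy node map are assumptions for illustration only.

```python
# Sketch of DHT partitioning and one-hop routing (points 1 and 2 above).
import hashlib

N_PARTITIONS = 4096  # fixed partition count, as in Aerospike

def partition_id(set_name: str, user_key: str) -> int:
    # Aerospike uses RIPEMD-160 over the key; SHA-256 here keeps the sketch
    # portable. The low-12-bit mapping is an illustrative assumption.
    digest = hashlib.sha256(f"{set_name}:{user_key}".encode()).digest()
    return (digest[0] | (digest[1] << 8)) & (N_PARTITIONS - 1)

# The smart client keeps a partition map (partition -> owning node), so each
# request goes straight to the right node: one hop, no load balancer.
partition_map = {pid: f"node-{pid % 8}" for pid in range(N_PARTITIONS)}  # toy 8-node map

pid = partition_id("users", "user:12345")
print(f"partition {pid} -> {partition_map[pid]}")
```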
Scale out has limits
■ Continuous operation as data and load increased
■ Started with 8-node clusters with an effective unique data size of around 2 TB in 2010
■ In 2011 and 2012:
■ We expanded by adding new nodes and ended up with a 48-node cluster
■ We replaced all SSDs in existing nodes with better SSDs of similar capacity
■ All of this required rolling upgrades with no service-level downtime
■ Note: we mandated that rolling upgrades to the service be performable even at peak periods of load
■ Adding nodes to the cluster became counter-productive after a while
■ Reached capacity limits of the existing storage system architecture of 4 SSDs per node
■ Growing maintenance, energy, and operational costs as cluster sizes exceeded 40 nodes
■ SSD advances meant the new SSDs were better and the attachment could be different (PCIe versus SATA/SAS)
■ All this called for an overhaul of the system
Replace large cluster with smaller clusters
■ Leveraging Moore’s Law
■ Billions of dollars per year are invested in processors and SSDs by the likes of Intel, Micron, Samsung, etc.
■ It is important to ride this low-cost/high-performance curve to stay in business while handling explosive growth
■ Evaluated alternatives in 2012
■ Tried out Intel S3700 SSDs as well as PCIe-based Flash cards from other vendors that had excellent performance
■ Chose larger-capacity Intel SATA drives, attaching 12 to 20 such drives per node, for cost reasons
■ Configuration was Intel SSD DC S3700/S3500 series drives in the Dell R720xd chassis with H710P host controllers (a configuration sketch follows)
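As a rough illustration of how such a multi-SSD node can be declared on the server side, here is a hedged aerospike.conf namespace fragment. The namespace name, device paths, sizes, and tuning values are placeholder assumptions, not AppNexus's actual configuration.

```
namespace userdata {                # hypothetical namespace name
    replication-factor 2
    memory-size 64G                 # index stays in DRAM

    storage-engine device {
        # One entry per SSD; the setup above attached 12 to 20 drives per
        # node (only three shown here, with placeholder device paths).
        device /dev/sdb
        device /dev/sdc
        device /dev/sdd
        write-block-size 128K
        data-in-memory false        # hybrid mode: data on Flash, index in DRAM
    }
}
```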
Switching clusters in production deployment – diagram sequence
1. The application reads from and writes to the old cluster.
2. The new cluster is brought up alongside the old one; application reads and writes continue to hit the old cluster.
3. XDR replication is enabled from the old cluster to the new cluster, forwarding ongoing writes.
4. A backup of the old cluster is restored onto the new cluster to seed the pre-existing data, while XDR keeps it current.
5. Application reads and writes are switched to the new cluster, with the old cluster still serving reads as a fallback.
6. The old cluster is decommissioned; application reads and writes go to the new cluster only.
Case Study
Using a Vendor Product for Mission Critical Infrastructure
■ Relationship with Vendor is Critical
■ The vendor's agents need to be competent and able to collaborate with us in an expert manner on root cause analysis
■ Problems could lie in the network, the machine's OS configuration, or the network card, in addition to the vendor's software
■ The vendor needs to respond with enhancements and bug fixes in a timely manner
■ As the system scales, new limits are reached in the infrastructure layers
■ The vendor needs to think ahead and work on solutions for issues that will be encountered at the next level of scale, before the deployment hits them
■ Ensure that vendor is best of breed and continues to improve product to work at high scale
■ Schooner could not scale up even for one year – installed in 2009, replaced in 2010
■ Aerospike has been successfully scaling up for 6 years since installation in 2010
■ Two key Aerospike improvements that made a huge difference for operations
■ Faster server node restarts during upgrades that cut node start time from hours to seconds (2013)
■ Faster node rebalancing times that reduced migration times for a large database from over a day to minutes (2016)
Questions