cloud world forum talk 062515
TRANSCRIPT
Transforming the Delivery of IT
Services, Cloud Delivery & Production
Ajay Dankar, PayPal
24-25 JUNE 2015, London, UK
PayPal’s Cloud Journey
Goals & Principles
Deliverables
Business Requirements
Learnings
Cloud Adoption @ PayPal
+ Havana
+ 12k+ hypervisors
+ 300k+ cores
+ 10+ availability zones
+ 15+ virtual private clouds
+ > 1.6 PB block storage
+ 100% KVM
+ 100% OVS
PayPal’s OpenStack-based private cloud serves 160+M customers for payment, website interactions, mobile and more…
100% of PayPal web-tier and mid-tier services run on the OpenStack cloud.
Business Requirements
Journey to Cloud began in 2012 in response to specific business asks.
Business Agility | Cost Efficiency | Enhanced Service Quality
+ Reduce time spent between “code to LTS”
+ Rapid elasticity (scale up and down as needed)
+ Self-service
+ Standardization
+ Automation
+ Superior to the current model
Goal: Enable the Developer
1 Design/Code
2 Build/Test
3 Test/Integrate
4 Deploy/Monitor
Goal: Enable the Business
1 Cloud as the interface for the data center
2 Improve resiliency & efficiency of apps
3 Lights Out Management (LOM)
Guiding Principles
+ Adopt Open Source Where Possible
+ Avoid Vendor Lock-In
+ Leverage eBay Inc.-wide Investments
Cloud Product Suite (Capabilities Aligned to Customer Needs)
PLATFORM AS A SERVICE (for PRODUCT TEAMS, the MAJORITY)
+ Self-Healing, Dependencies, Deployment, Auto-Scaling, Framework & Runtimes, System Services
+ DBaaS, Analytics, Logging, Metrics, Alerting, Messaging
INFRASTRUCTURE AS A SERVICE (for ENGINEERING & ARCHITECTS)
+ Compute, SDN, Storage, DNS, LBaaS, Orchestration
PHYSICAL INFRASTRUCTURE (for OPS)
+ Compute, DB, Networking, Firewall, Storage
2012
+ A Dev/Test Cloud
+ Less Than a Rack of Compute
+ Handcrafted by an Engineer
+ Supported by Another Engineer
+ Zero Automation
2015
+ Thousands of Nodes
+ Distributed Across Several AZs
+ Automated
+ Operated 24x7
+ Running the Business
Learning 1: Treat Infrastructure As Code
1 Fully Automate Deployments
2 Treat Automation Artifacts Like You Treat Code: Source Control, Code Reviews, Tests, Deployment
3 Take Automation as a Product Feature: Road Map, Sprints, Bugs, Backlog, Releases
4 Measure Outcomes with KPIs (well defined and agreed upon): Time to Deploy, Time to Recover, Time to Roll Out a Change
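The KPIs named on this slide (Time to Deploy, Time to Recover) can be computed from timestamped event records. The following is a minimal sketch only: the record layout, the field names, and the sample timestamps are all assumptions made for illustration, not anything described in the talk.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical event records; field names and values are illustrative only.
deployments = [
    {"started": datetime(2015, 6, 1, 10, 0), "finished": datetime(2015, 6, 1, 10, 12)},
    {"started": datetime(2015, 6, 2, 9, 30), "finished": datetime(2015, 6, 2, 9, 38)},
]
incidents = [
    {"detected": datetime(2015, 6, 3, 14, 0), "recovered": datetime(2015, 6, 3, 14, 45)},
    {"detected": datetime(2015, 6, 4, 2, 0), "recovered": datetime(2015, 6, 4, 2, 15)},
]


def mean_duration(records, start_key, end_key):
    """Return the mean elapsed time between two timestamps across records."""
    if not records:
        raise ValueError("no records to measure")
    seconds = [(r[end_key] - r[start_key]).total_seconds() for r in records]
    return timedelta(seconds=mean(seconds))


time_to_deploy = mean_duration(deployments, "started", "finished")
time_to_recover = mean_duration(incidents, "detected", "recovered")

print("Mean time to deploy:", time_to_deploy)
print("Mean time to recover:", time_to_recover)
```

Tracking the same two numbers release over release is one simple way to see whether automation work is actually moving the outcomes.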
Learning 2: Manage Drift
Sources of Drift: Automation Gaps, Transitional Habits, Debugging Incidents
Drift: System is in an Adverse State
+ Incidents Waiting to Happen
+ Impacts Time to Recover
+ Impacts Customers
Managing Drift
1 Automated Audits
2 Drift Tracking
3 Mitigation as a Planned Routine Activity
4 Culture: Reward Good Habits
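An automated audit for drift can be as simple as comparing each node's observed settings against the desired configuration held in source control. The sketch below assumes configurations are flat key/value dictionaries; the function names, host names, and settings are hypothetical and not taken from the talk.

```python
MISSING = "<missing>"  # sentinel for a key present on only one side


def find_drift(desired, actual):
    """Return {key: (desired_value, actual_value)} for every mismatched key."""
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


def audit_fleet(desired, fleet):
    """Audit every host and return only the hosts that have drifted."""
    report = {}
    for host, actual in fleet.items():
        drift = find_drift(desired, actual)
        if drift:
            report[host] = drift
    return report


# Hypothetical desired state and observed state for two nodes.
desired_config = {"log_level": "INFO", "max_connections": 1024}
fleet = {
    "node-a": {"log_level": "INFO", "max_connections": 1024},
    "node-b": {"log_level": "DEBUG", "max_connections": 1024, "debug_port": 9999},
}

report = audit_fleet(desired_config, fleet)
for host, drift in report.items():
    for key, (want, have) in drift.items():
        print(f"{host}: {key} expected {want!r}, found {have!r}")
```

Running such an audit on a schedule and recording the number of drifted hosts over time would give a basic form of the drift tracking listed above.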
Learning 3: Awareness of Systems & Operations
Measure Everything
+ Business KPIs
+ System Config
+ Alerts
+ Drift
Learning 4: Culture of Shared Accountability
Make TTR a Shared Goal
OPS
+ Working on What’s Running the Business
+ Knows How the System Fails
+ Worries About TTR
DEV
+ Working on (wants to work on) new things
+ Knows how the system is supposed to work
+ Wants to Understand Why