PayPal's Cloud Journey From Folsom to Kilo

© 2015 PayPal Inc. All rights reserved. Confidential and proprietary. PayPal's Cloud Journey From Folsom to Kilo Wei Tian -- Cloud Performance Lead at Paypal 10/ 28 / 2015 What We Learned in the Upgrade Progress

Upload: wei-tian

Post on 15-Apr-2017


TRANSCRIPT

Page 1: PayPal's Cloud Journey From Folsom to Kilo

© 2015 PayPal Inc. All rights reserved. Confidential and proprietary.

PayPal's Cloud Journey From Folsom to Kilo

Wei Tian, Cloud Performance Lead at PayPal

10/28/2015

What We Learned in the Upgrade Progress

Page 2: PayPal's Cloud Journey From Folsom to Kilo

Agenda

• About PayPal Cloud
• Past upgrades before Kilo
• Kilo upgrade
• What next?

Page 3: PayPal's Cloud Journey From Folsom to Kilo

About PayPal Cloud

• Background
  – Started in July 2012 with 1 engineer and 16 decommissioned servers
  – Today, one of the world’s largest OpenStack private clouds
    – Number of physical servers: 8064
    – Number of racks: 84
    – Total cores: 386,000
    – Block storage: 2 petabytes
    – Largest AZ with 2500+ hypervisors

• Business Goals
  – Hosting ~100% of PayPal’s production traffic (except databases and messaging)
  – Powers 100% of PaaS, Dev/QA and M&As
  – First production workload on SDN in 2013

Page 4: PayPal's Cloud Journey From Folsom to Kilo

Upgrade History

• Early 2014: multiple versions of OpenStack in 10 AZs
  – Grizzly and Folsom
• Time to upgrade to Havana: 1 YEAR
• Decision: skip Icehouse and Juno
  – Don’t want to be in constant catch-up mode
  – Upgrade directly to Kilo

Current Status
• One AZ on Kilo
• Rest in flight

Page 5: PayPal's Cloud Journey From Folsom to Kilo

Upgrade is Difficult!

• One of the largest OpenStack private clouds.
• 100% of PayPal production.
• In-service upgrade: no availability impact allowed.
• Mixed Folsom and Grizzly environments!
• Nova-network AND Neutron networking.
• Custom code in-line with upstream.
• Two months to code-ready.

Page 6: PayPal's Cloud Journey From Folsom to Kilo

Complicated Code Base – Need Manual Merge

Page 7: PayPal's Cloud Journey From Folsom to Kilo

Prepare the Code for the Upgrade – it Takes Time

Page 8: PayPal's Cloud Journey From Folsom to Kilo

Custom Database Migration

• Custom tables (besides Nova tables)

• Custom DB migration script for Nova to migrate content from the custom tables.

• Custom DB migration script for Keystone.
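As a rough illustration of what migrating content out of a custom table can look like, the sketch below copies rows into a standard-style table in a single transaction. It is not the script used at PayPal: the table names `custom_compute_zones` and `aggregates`, the one-column schema, and the use of the standard-library `sqlite3` module are all assumptions made so the example is self-contained.

```python
import sqlite3


def migrate_custom_zones(conn: sqlite3.Connection) -> int:
    """Copy rows from a hypothetical custom table into a standard-style table.

    Safe to re-run: names already present in the target table are skipped.
    Returns the number of rows copied by this call.
    """
    before = conn.total_changes
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO aggregates (name)
            SELECT z.name
            FROM custom_compute_zones AS z
            WHERE NOT EXISTS (
                SELECT 1 FROM aggregates AS a WHERE a.name = z.name
            )
            """
        )
    return conn.total_changes - before
```

Making the copy idempotent means the same script can be exercised repeatedly in dry runs before the real upgrade without duplicating rows.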

Page 9: PayPal's Cloud Journey From Folsom to Kilo

Seamless Migration from Nova-network to Neutron

• At PayPal, several data centers were running Folsom nova-network in production with thousands of VMs in service. We successfully upgraded to OpenStack Havana, adopted Neutron as the network service in place of nova-network, and replaced Linux bridge with Open vSwitch.

• The upgrade covered both control plane and data plane migration. The control plane includes Neutron network/subnet/port management and SDN controller integration, while the data plane includes tap device migration from Linux bridge to an Open vSwitch bridge, the DHCP service, security groups, and libvirt/KVM configuration.

• For more details, check out the presentation from my colleagues at the Paris Summit: https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/seamless-migration-from-nova-network-to-neutron-in-ebay-production

Page 10: PayPal's Cloud Journey From Folsom to Kilo

OpenStack Services Deployment with Virtualenv

• Each controller node runs multiple services (Keystone, Nova, Glance, Neutron, Cinder, etc.), and each service runs in its own virtual environment.

• The best way to get painless and reproducible deployments is to package the whole virtual environment of the application you want to deploy, including all dependencies but without configuration.

• Benefits of upgrading with virtualenv packages:
  – Speed. Deploying a new version is as simple as unpacking a tarball.
  – Predictability. Each service runs in its own virtualenv and can be upgraded independently.
  – Easy rollback. All versions of a service exist under different folders, and rolling back is simply a matter of pointing the startup script at a different working directory.
  – Simple Puppet script. Puppet only deals with the startup script and config file for the OpenStack services.
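The deploy-and-rollback idea described above can be sketched in a few lines. This is a minimal illustration, not PayPal's tooling: the function names, the `start.sh` file name, and the one-folder-per-version layout are hypothetical, and the tarball is assumed to be a trusted archive built for the folder it is unpacked into.

```python
import tarfile
from pathlib import Path


def deploy(tarball: Path, service_root: Path, version: str) -> Path:
    """Unpack a packaged virtualenv into its own versioned folder."""
    target = service_root / version
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a deployed version
    with tarfile.open(tarball) as archive:
        archive.extractall(target)  # assumes a trusted, internally built archive
    return target


def write_startup_script(service_root: Path, version: str, command: str) -> Path:
    """Write the startup script so the service runs from the chosen version's folder."""
    workdir = service_root / version
    if not workdir.is_dir():
        raise FileNotFoundError(f"version {version!r} is not deployed")
    script = service_root / "start.sh"
    script.write_text(f"#!/bin/sh\ncd '{workdir}'\nexec {command}\n")
    script.chmod(0o755)
    return script
```

Under these assumptions, an upgrade is `deploy(...)` followed by `write_startup_script(root, "kilo", ...)`, and a rollback is the same `write_startup_script` call with the previous version name, since the older folder is left in place.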

Page 11: PayPal's Cloud Journey From Folsom to Kilo

Kilo Upgrade from Havana

Principles

• No downtime on the data plane.
• A few hours of downtime on the control plane.
• Upgrade on Saturday while usage is low.
• Prepare hypervisors as much as we can before the upgrade day.

Process

• Code readiness for Kilo.
• Build a shadow control plane.
• Prepare run books for the overall process and for each component.
• Dry-run I.
• QA cycle I with functional, performance and load testing.
• Dry-run II.
• QA cycle II with functional, performance and load testing.
• Pre-upgrade: prepare all hypervisors.
• The official upgrade.

Page 12: PayPal's Cloud Journey From Folsom to Kilo

Code Readiness for Kilo – Branch Strategy

Page 13: PayPal's Cloud Journey From Folsom to Kilo

Code Readiness for Kilo – Coding Guidelines

Guidelines

• No changes to upstream code.
• No direct backports or merges of Havana changes.
• Completely rewrite the PayPal extensions to align with the Kilo code.
• Apply standard OpenStack extension methods.
• Actively participate in upstream.
• Report bugs to upstream.
• Fix bugs in upstream.

Ways to customize OpenStack

• WSGI middleware.
• OpenStack API extensions (resource, controller, and child resource).
• Extending manager classes for services.
• Custom filters and weighers.
• Custom RPC methods.
• Nova hooks.
• Monkey patching as a last resort.
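To make the first customization route concrete, here is a minimal sketch of WSGI middleware that adds behavior around an application without touching its code. It uses only the Python standard library; the class name, the `X-Example-Cloud` header, and the stand-in `upstream_app` are hypothetical and are not taken from PayPal's extensions.

```python
from wsgiref.util import setup_testing_defaults


class RequestTagMiddleware:
    """Wrap any WSGI app and add one response header, leaving the app untouched."""

    def __init__(self, app, header_name="X-Example-Cloud", header_value="custom"):
        self.app = app
        self.header = (header_name, header_value)

    def __call__(self, environ, start_response):
        def tagged_start_response(status, headers, exc_info=None):
            # Append our header, then hand off to the server's start_response.
            return start_response(status, list(headers) + [self.header], exc_info)

        return self.app(environ, tagged_start_response)


def upstream_app(environ, start_response):
    """Stand-in for an unmodified upstream service."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def call(app):
    """Invoke a WSGI app once in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers

    body = b"".join(app(environ, start_response))
    return captured["status"], dict(captured["headers"]), body
```

Calling `call(RequestTagMiddleware(upstream_app))` exercises the wrapped app in-process. In an OpenStack service, a class like this would typically be wired into the request pipeline through the service's Paste Deploy configuration rather than by editing upstream code; that wiring is not shown here.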

Page 14: PayPal's Cloud Journey From Folsom to Kilo

Code Readiness for Kilo – Project Structure

Page 15: PayPal's Cloud Journey From Folsom to Kilo

Build a Shadow Control Plane

Page 16: PayPal's Cloud Journey From Folsom to Kilo

Dry-Run

Page 17: PayPal's Cloud Journey From Folsom to Kilo

The Benefit of Using Latest Release from Upstream

• Latest features and bug fixes from upstream.
• We can also GIVE BACK to the community.
• No more cherry-picking bug fixes.
• Older branches DELETED from GitHub!!!
• NO CODE DUPLICATION between upstream and PayPal:
  – In Folsom, we implemented ‘compute zone’.
    – Similar to host aggregate.
  – In Grizzly, we implemented a list of ‘aggregate_XXXX_filter’.
    – Similar to the ‘aggregate_XXXX_filter’ in Havana.
  – In Havana, we implemented a Nova extension to show host status and instance faults in nova list and nova show.
    – Was added upstream in Juno.

Page 18: PayPal's Cloud Journey From Folsom to Kilo

Prepare for the Liberty Upgrade

• After the overhaul of both the code base and the upgrade process, the preparation time for the Liberty upgrade will be weeks instead of months.

• From Kilo to Liberty, we could apply the upgrade process recommended by upstream and do a live upgrade without downtime in either the control plane or the data plane.

• The Liberty upgrade is scheduled for early next year.

Page 19: PayPal's Cloud Journey From Folsom to Kilo


Questions?