2017 May 11
Duong Ha-Quang and Hieu LE
Fujitsu Vietnam Limited
OpenStack Cluster Zero-Downtime Upgrade ft. Kolla
Copyright 2017 Fujitsu Vietnam Limited
Who are we?
Duong Ha-Quang: Software Engineer at Fujitsu Vietnam
Core reviewer of Kolla
Email: [email protected]
IRC: duonghq
Hieu LE: Software Engineer at Fujitsu Vietnam
Official Vietnam OpenStack UG organizer
Email: [email protected]
IRC: hieulq
Introduction
A review of, and some thoughts on, zero-downtime upgrades for OpenStack services.
The ideas in this presentation are concepts only.
A proof of concept (PoC) in Kolla.
Agenda
1. OpenStack Upgrading overview
OpenStack upgrade assertion tag
OpenStack rolling upgrade requirements
2. From minimal to zero
3. Zero downtime upgrade proposal in Kolla
Kolla support for configuration management
Kolla support for OSM
Proposal/Demo
OpenStack Upgrading
One of the most in-demand features for any system
Cold upgrade is easy
Service-level agreements are stricter nowadays, so we need to reduce system downtime.
Source: http://imgur.com/a/kZCrS
Review of current upgrading model
BlueGreen deployment and Canary release
Rolling upgrade
BlueGreen deployment
Diagram: User(s) -> Router -> Old | New
https://martinfowler.com/bliki/BlueGreenDeployment.html
Canary release
https://martinfowler.com/bliki/CanaryRelease.html
Diagram: User(s) -> Router -> Old | New
Rolling upgrade
Eliminates the need to restart all services on new code simultaneously.
Requires services of mixed versions to work together properly mid-upgrade.
Some services may be down at any given time.
Diagram: services A, B and C at version X, being upgraded to X + 1.
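The rolling upgrade idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the node names A, B, C and the versions X and X+1 come from the slide, while the function `rolling_upgrade` and the in-memory dictionary standing in for a cluster are hypothetical.

```python
def rolling_upgrade(nodes, new_version):
    """Upgrade nodes one at a time; yield the cluster state after each step.

    `nodes` maps a node name to its running version. A value of None means
    the node is restarting (down). Hypothetical model, not a deployment tool.
    """
    for name in list(nodes):
        nodes[name] = None  # only this node is down while it restarts
        serving = [n for n, version in nodes.items() if version is not None]
        # With a single-node cluster this would fail: nothing is left to serve.
        assert serving, "rolling upgrade needs at least one other serving node"
        nodes[name] = new_version
        yield dict(nodes)


cluster = {"A": "X", "B": "X", "C": "X"}
steps = list(rolling_upgrade(cluster, "X+1"))
for state in steps:
    print(state)
```

After the first step the cluster runs A at X+1 next to B and C at X, which is why mixed-version interoperability is a hard requirement for this model.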
OpenStack rolling upgrade requirements
1. Online Schema Migration (OSM)
2. Maintenance Mode
3. Live Migration
4. Multi-version Interoperability
5. Graceful Shutdown
6. Upgrade Orchestration
7. Upgrade Gating
8. Project Tagging
https://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/rolling-upgrades.html
https://github.com/openstack/governance/blob/master/reference/projects.yaml
OpenStack upgrade assertion tag
The OpenStack Technical Committee (TC) defines five upgrade-related tags:
1. assert:supports-upgrade
2. assert:supports-accessible-upgrade
3. assert:supports-rolling-upgrade
4. assert:supports-zero-downtime-upgrade
5. assert:supports-zero-impact-upgrade
https://governance.openstack.org/tc/reference/tags/
From minimal downtime to zero-downtime upgrade
Zero-downtime upgrade requires enhancing:
Configuration management (CM)
Database migration (OSM)
Icons made by Freepik, Google and Madebyoliver from www.flaticon.com, licensed under CC 3.0 BY.
Diagram: old version -> new version. Configuration labels: new config, deprecated config, removed config, RPC pinning. Other labels: RPC, service, upgrading database.
Two main approaches for OSM
Trigger-based: e.g. Keystone, Glance; other: Facebook [1]
Triggerless: e.g. Neutron; binary log-based [2]
[1] https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/
[2] https://github.com/github/gh-ost
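To make the trigger-based approach concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the Keystone or Glance migration code: SQLite stands in for the real database, and the table `users` with the columns `name` and `display_name` is a hypothetical example. The point is that triggers keep the new column in sync while old code is still writing only the old column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users (name) VALUES ('alice');

-- Expand: add the new column next to the old one.
ALTER TABLE users ADD COLUMN display_name TEXT;

-- Triggers copy writes made by old code (which only knows `name`).
CREATE TRIGGER users_sync_insert AFTER INSERT ON users
WHEN NEW.display_name IS NULL
BEGIN
  UPDATE users SET display_name = NEW.name WHERE id = NEW.id;
END;

CREATE TRIGGER users_sync_update AFTER UPDATE OF name ON users
BEGIN
  UPDATE users SET display_name = NEW.name WHERE id = NEW.id;
END;

-- Migrate: backfill rows written before the triggers existed.
UPDATE users SET display_name = name WHERE display_name IS NULL;
""")

# Old-version code keeps running during the migration.
conn.execute("INSERT INTO users (name) VALUES ('bob')")
conn.execute("UPDATE users SET name = 'alice2' WHERE id = 1")

rows = conn.execute(
    "SELECT name, display_name FROM users ORDER BY id"
).fetchall()
print(rows)
```

Once every service runs the new version, a later contract step would drop the triggers and the old column; that step is left out of this sketch.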
Zero downtime upgrade proposal
Database online schema migration: 2 candidate solutions
1. Buffer requests during the upgrade period.
2. Use a checkpoint/snapshot and the binary log of the database.
Zero downtime upgrade proposal (1)
Buffering HTTP and RPC requests during the upgrade period.
Diagram:
Step 0: HA/LB stack, Service (ver X), Database (ver X)
Step 1 (upgrading): HA/LB stack, requests buffer, Service (X -> X + 1), Database (X -> X + 1); requests do not reach the service
Step 2: HA/LB stack, requests buffer, Service (X + 1), Database (X + 1); requests are resent in their original order
Zero downtime upgrade proposal (1)
Buffering requests during the upgrade period
Two request types need buffering:
RESTful HTTP requests from users and between projects.
Internal service RPC requests (through the MQ).
Requests must be put in the buffer in the order received so they can be replayed correctly (ideally with a timestamp).
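The ordering rule above can be sketched as a small in-memory buffer. This is an illustration only, not the Intermission/OpenResty implementation: the class `RequestBuffer`, its methods and the `backend` callable are hypothetical names, and the sketch is single-threaded, so it ignores requests that arrive while a replay is in progress.

```python
import itertools
import time
from collections import deque


class RequestBuffer:
    """Hold requests while the service is upgrading, then replay them in order."""

    def __init__(self):
        self._queue = deque()
        self._seq = itertools.count()  # tie-breaker if timestamps collide
        self.buffering = False

    def handle(self, request, backend):
        """Forward the request, or queue it if an upgrade is in progress."""
        if self.buffering:
            self._queue.append((next(self._seq), time.time(), request))
            return None  # the caller keeps waiting until the replay
        return backend(request)

    def replay(self, backend):
        """Stop buffering and resend queued requests in arrival order."""
        self.buffering = False
        results = []
        while self._queue:
            _seq, _received_at, request = self._queue.popleft()
            results.append(backend(request))
        return results


seen = []
backend = lambda req: seen.append(req) or f"done:{req}"

buf = RequestBuffer()
first = buf.handle("create net-1", backend)   # passes straight through
buf.buffering = True                           # upgrade starts
buf.handle("create net-2", backend)
buf.handle("delete net-1", backend)
replayed = buf.replay(backend)                 # upgrade finished
```

A queue plus a monotonically increasing sequence number is enough to preserve order; the timestamp is kept so that a real implementation could drop requests that have waited too long.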
Zero downtime upgrade proposal (1)
Buffering requests during the upgrade period
Pros: From the user’s POV: no perceivable service downtime if the migration time is short enough.
Cons: From the user’s POV: the system lags while requests are queued in the buffer.
If the migration time is long (mainly in database migration), some requests can time out [1].
The buffer can grow very large if the database migration takes a long time.
[1] https://blueprints.launchpad.net/keystone/+spec/allow-expired
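A rough back-of-the-envelope estimate shows how the two drawbacks scale with migration time. The sketch below assumes a constant arrival rate, a replay that starts only when the migration ends, and a single fixed timeout (for example a client timeout or a token lifetime). The function name and the numbers are illustrative, not measurements from the PoC.

```python
def buffer_estimate(rate_per_s, migration_s, timeout_s):
    """Return (requests buffered, requests already timed out at replay time)."""
    buffered = rate_per_s * migration_s
    # Requests that arrived more than `timeout_s` before the end of the
    # migration have already expired when the replay starts.
    late_window = max(0.0, migration_s - timeout_s)
    timed_out = rate_per_s * late_window
    return buffered, timed_out


# Illustrative numbers only: 50 req/s, 120 s migration, 60 s timeout.
buffered, timed_out = buffer_estimate(50, 120, 60)
print(buffered, timed_out)

# A migration shorter than the timeout loses nothing.
short_buffered, short_timed_out = buffer_estimate(50, 30, 60)
```

Both quantities grow linearly with the migration time, which is the motivation for shortening the window in which requests have to wait.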
Next idea
Zero downtime upgrade proposal (2)
Utilize checkpoint/snapshot and binary log of database.
Diagram (timeline):
1. Create checkpoint and turn on binary log. Data before the checkpoint is X; new data (Y) is recorded by the binary log.
2. Migrate X to the new schema of the next release, while new data (Y) is still written in the old schema.
3. Shut down database-related services and turn off binary log (downtime; can be eliminated with the previous approach).
4. Migrate Y using the binary log, giving a database in the new schema.
5. Start the new version and bring up the system.
Zero downtime upgrade proposal (2)
Utilize checkpoint/snapshot and binary log of database.
Pros:
Internal system downtime is much shorter than with the previous approach (downtime covers only the delta changes rather than the whole database).
Cons:
From the user’s POV: there is a short downtime while the database is turned off.
Implementation is more complicated than the previous approach.
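The checkpoint and binary log idea can be sketched with plain dictionaries. This is a conceptual model, not a MySQL binary log reader: the schemas (old `{id, name}`, new `{id, display_name}`), the change format and all function names are hypothetical, and each logged insert or update is assumed to carry the full row image.

```python
def migrate_row(old_row):
    """Convert one row from the old schema to the new schema (hypothetical)."""
    return {"id": old_row["id"], "display_name": old_row["name"]}


def migrate_change(change):
    """Translate a logged change from the old schema to the new schema."""
    op, payload = change
    if op in ("insert", "update"):
        return op, migrate_row(payload)
    return op, payload  # a delete only carries the row id


def apply_change(table, change):
    """Apply one change to a table keyed by row id."""
    op, payload = change
    if op == "delete":
        table.pop(payload["id"], None)
    else:
        table[payload["id"]] = payload


# 1. Checkpoint: data X, taken while the service is still running.
snapshot = {
    1: {"id": 1, "name": "net-a"},
    2: {"id": 2, "name": "net-b"},
}

# 2. Binary log: data Y, changes made in the old schema after the checkpoint.
binlog = [
    ("insert", {"id": 3, "name": "net-c"}),
    ("update", {"id": 1, "name": "net-a2"}),
    ("delete", {"id": 2}),
]

# 3. Long step, no downtime: migrate the snapshot to the new schema.
new_db = {row_id: migrate_row(row) for row_id, row in snapshot.items()}

# 4. Short step, services stopped: replay only the delta, in log order.
for change in binlog:
    apply_change(new_db, migrate_change(change))

print(new_db)
```

Only step 4 needs the database-related services to be stopped, so the downtime is proportional to the size of the log rather than to the size of the whole database.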
Zero downtime upgrade proposal
The two candidate methods can be combined to get the advantages of both:
- Use the 2nd approach, but add a buffer layer and turn it on while the database is shut down
→ Zero downtime from the user’s POV, only a little lag.
Diagram (flow, combining Idea 2 with the buffer from Idea 1): Create checkpoint, binary log -> Migrate current db to new schema -> Turn on buffer -> Migrate new data to new database -> Turn off binary log -> Finish
PoC in Kolla
Kolla’s mission is to provide production-ready containers and deployment tools for operating OpenStack clouds.
Three official deliverables:
kolla(-image) -> Docker images
kolla-ansible -> Deploy OpenStack using Ansible
kolla-kubernetes -> Deploy OpenStack inside a k8s cluster
Kolla support for configuration management
Kolla-Ansible implements a mechanism for configuration management: configuration overrides [1]
Kolla-Kubernetes shows good potential for automated CM
[1] http://docs.openstack.org/developer/kolla/advanced-configuration.html
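The effect of a configuration override can be illustrated with Python's standard `configparser`. This is not Kolla-Ansible's own merge code, only a sketch of the idea that operator-supplied values win over the generated defaults while untouched options are kept. The option values and the helper `merge_ini` are hypothetical.

```python
import configparser

# Defaults as a deployment tool might generate them (sample values).
BASE_CONF = """\
[DEFAULT]
debug = False

[database]
connection = mysql+pymysql://neutron:secret@db/neutron
max_retries = 10
"""

# Operator override: only the options that should change.
OVERRIDE_CONF = """\
[DEFAULT]
debug = True

[database]
max_retries = -1
"""


def merge_ini(*sources):
    """Merge INI texts; later sources override earlier ones option by option."""
    merged = configparser.ConfigParser(interpolation=None)
    for text in sources:
        merged.read_string(text)
    return merged


merged = merge_ini(BASE_CONF, OVERRIDE_CONF)
print(merged.get("DEFAULT", "debug"))
print(merged.get("database", "max_retries"))
print(merged.get("database", "connection"))
```

During an upgrade the same mechanism lets an operator introduce new options and retire deprecated ones in the override file without editing the generated configuration.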
Kolla support for OSM (1)
For OpenStack projects with native OSM support:
Patches for Neutron and Keystone OSM are in progress
https://blueprints.launchpad.net/kolla-ansible/+spec/apply-service-upgrade-procedure
Kolla support for OSM (2)
For projects without OSM support:
Implement the above ideas at the HA/LB layer.
Request buffer: Intermission/OpenResty
PoC scenario for 1st approach
PoC for HTTP requests buffering
Intermission/OpenResty configuration: proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
Intermission is bound to VIP:5000 and VIP:35357; HAProxy: 5000 -> 5050, 35357 -> 35387
Scenario: Continuously send 200 create and delete network requests to an Ocata cluster
Upgrade Neutron to master-based code while the requests are being sent.
Demonstration
Used scripts: https://github.com/vietstacker/zero-downtime-upgrade-scenario
Rolling upgrade with Kolla (we have downtime here): https://www.youtube.com/watch?v=CfCBLeV1kIM
PoC buffer request with Kolla https://www.youtube.com/watch?v=6UDQXDINw84