summit 2013 spring rob hirschfeld migrations v1

Rob Hirschfeld Dell, Distinguished Engineer http://lifeatthebar.com


Posted on 16-Jan-2015


TRANSCRIPT

Page 1: Summit 2013 spring rob hirschfeld migrations v1

Rob Hirschfeld

Dell, Distinguished Engineer

http://lifeatthebar.com

Page 2

• This session could repeat a lot from the last summit

• http://www.openstack.org/summit/san-diego-2012/openstack-summit-sessions/presentation/getting-from-folsom-to-grizzly-a-devops-upgrade-pattern

• Interoperability & Reference Architecture

• Reference Architecture w/ Heat (Tuesday @ 11:00)

• Interop Panel (Tuesday @ 5:20)

• Upgrade Projects

• https://wiki.openstack.org/wiki/Upgrade-with-minimal-downtime

• https://wiki.openstack.org/wiki/Grenade

Page 3

• The “Problem” with Migration

• Paths to Nirvana (or Roads to Perdition)

• Alternatives

• An Opinion

• Discussion

http://learn.genetics.utah.edu/content/begin/cells/organelles/


Page 4

• OpenStack has a 3-month major/minor release cycle

• Major version every 6 months

• Minor (but important) versions 3 and 6 months after release

• Lots of Changes

• Bugs are fixed

• Operating Systems upgrade

• New technologies appear

• Whole projects are split off

• We expect operators to

• Keep systems running

• Never lose data

• And… stay up to date

http://cdn2.arkive.org

sockeye-salmon-predated-by-grizzly-bear-on-migration-upstream.jpg

Page 5

• What are we upgrading?

• OpenStack - Yes!

• Dependent packages - Probably?

• Base OS - Maybe?

• What is the state during the "in-between" time?

• Infrastructure downtime?

• VM downtime? VM reboot? Controlled/informed?

• Availability windows?

• What contingency plans?

• Dry run? Maybe.

• Recover by going backwards? Maybe.

• What level of safety and trust do you need?

• Assure data integrity?

• Assure infrastructure integrity?

• Maintain security?

• How long can the migration take?

• Big bang move or gradual migration?

• How will my API consumers/ecosystem cope?

• Can Keystone Grizzly work with Folsom Nova???

• What about futures? G.1 to G.2? H to I?

• Can I skip versions? Jump from G to I?

http://www.publicdomainpictures.net

Steep Steps by Peter Griffin

Page 6

• Beginning Answers

• Distros will manage dependencies and packaging

• We can’t lose data or compromise security

• Infrastructure state and integrity will vary by solution

• Assumption of Staging

• Some managed environment (not a manual deploy)

• Staging/test environment to get "familiar" with the problem.

• Maintenance window for production - limits scope of change

• Step-wise changes are OK (big bang is not required)

• We can make trade-offs to defray expensive requirements

• Beyond Assumptions… Paradigm Shifts

• There are shared best practices

• Upgrades can be automated in a sharable way

http://www.theemailadmin.com/wp-content/uploads/2012/09/GFI229-hot-water-migration.jpg

Page 7

All the nodes update to the latest code in a short time window

• Details:

1. Cookbooks include update (instead of install) directives.

2. Control the upstream package point (e.g., run apt-get update when appropriate)

3. Force a chef-client run

4. Now at the new level

• Considerations:

• Pros: Potentially fast, continuous operation

• Cons: Don't mess up; it is your production environment

• Scope: Security updates

• Code Assumptions:

• System can function through service restarts.

• Underlying data models don't change, or they migrate appropriately.
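The in-place pattern can be sketched in Python as below. This is a minimal sketch under stated assumptions: `Node`, `update_all_in_place`, and the `run_config_agent` callback are hypothetical names, with the callback standing in for forcing a chef-client run; nothing here is a Chef API. It takes the stricter reading of the last code assumption and refuses to run when the data model would change, since this pattern has no migration step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for one managed node."""
    name: str
    package_version: str
    schema_version: int


def update_all_in_place(
    nodes: List[Node],
    target_version: str,
    target_schema: int,
    run_config_agent: Callable[[Node, str], None],
) -> List[str]:
    """Move every node to target_version in a single pass.

    This pattern has no data-migration step, so the sketch refuses to run
    when the data model would change (the slide's second code assumption).
    """
    mismatched = [n.name for n in nodes if n.schema_version != target_schema]
    if mismatched:
        raise RuntimeError(f"data model change needs a migration on: {mismatched}")

    updated = []
    for node in nodes:
        # Steps 2-3: the package source is assumed to be pinned already; the
        # callback forces the configuration agent (e.g. chef-client) to apply it.
        run_config_agent(node, target_version)
        node.package_version = target_version
        updated.append(node.name)
    return updated


if __name__ == "__main__":
    applied = []
    fleet = [Node("controller-1", "G.1", 1), Node("compute-1", "G.1", 1)]
    print(update_all_in_place(fleet, "G.2", 1, lambda n, v: applied.append((n.name, v))))
```

The up-front check is the design choice that matters: with no staging or rollback in this pattern, a mismatch found halfway through would leave production split across versions.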

Page 8

Nodes migrate in staged groups

• Details:

1. Choose a subset of machines and quiesce them.

2. Update the set

3. Freeze state (by tenant)

4. Migrate service/tenant content

5. Repurpose after complete.

• Considerations

• Pros: Safer, more controlled, and can move tenants as needed

• Cons: Takes longer; still has a cut-over point, but it is less open

http://allgodscrittersgotrhythm.blogspot.com/2010_08_01_archive.html
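A staged-group migration can be sketched as follows. Assumptions (all hypothetical, not from the slides): hosts and tenants are plain dictionaries, a host with no tenants counts as already quiesced, capacity is ignored, and "freeze"/"thaw" are recorded events rather than real operations.

```python
from typing import Dict, List


def staged_upgrade(
    hosts: Dict[str, str],      # host name -> release it runs (hypothetical inventory)
    placement: Dict[str, str],  # tenant -> host name
    seed_count: int,
    new_version: str,
) -> List[str]:
    """Staged-group upgrade: seed a new-version pool, move tenants into it
    one at a time, then repurpose each emptied old host. Capacity limits
    are deliberately ignored in this sketch."""
    events: List[str] = []
    occupied = set(placement.values())

    # Step 1: choose a subset of hosts that are already quiesced (no tenants).
    seeds = [h for h in sorted(hosts) if h not in occupied][:seed_count]
    if not seeds or len(seeds) < seed_count:
        raise ValueError("not enough idle hosts to seed the new pool")

    # Step 2: update that set.
    new_pool: List[str] = []
    for host in seeds:
        hosts[host] = new_version
        new_pool.append(host)
        events.append(f"update {host}")

    old_hosts = [h for h in sorted(hosts) if hosts[h] != new_version]
    next_target = 0
    for old in old_hosts:
        for tenant in sorted(t for t, h in placement.items() if h == old):
            # Step 3: freeze this tenant so its state cannot change mid-copy.
            events.append(f"freeze {tenant}")
            # Step 4: migrate the tenant's content onto the new-version pool.
            target = new_pool[next_target % len(new_pool)]
            next_target += 1
            placement[tenant] = target
            events.append(f"move {tenant} {old}->{target}")
            events.append(f"thaw {tenant}")
        # Step 5: the old host is now empty, so repurpose it into the new pool.
        hosts[old] = new_version
        new_pool.append(old)
        events.append(f"update {old}")
    return events
```

With `seed_count=1`, one idle host becomes the new pool, and every old host joins that pool as soon as its tenants have moved, so the cut-over happens per tenant rather than for the whole cloud at once.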

Page 9

Nodes changed individually by a system-wide orchestration that supports components of multiple versions

• Details

1. Components must be able to straddle versions

2. Orchestration updates core components to new version

3. System as a whole quiesces and is validated (requires self test)

4. Orchestration individually migrates components (return to step 3)

• Considerations

• Pros: Creates a highly resilient system that handles a higher rate of change

• Cons: More complex to create and maintain

http://www.grizzlycentral.com/forum/grizzly-tire-wheel-combos/1204-upgrade-tires-grizzly.html
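The loop in steps 2 to 4 can be sketched like this. Assumptions: releases are numbered integers, `rolling_upgrade` and `within_one_release` are hypothetical names, and the self test is reduced to "every component is within one release of every other", one simple way to express step 1's requirement that components straddle versions. The sketch validates after each core component as well, which is stricter than the slide.

```python
from typing import Callable, Dict, List


def within_one_release(versions: Dict[str, int]) -> bool:
    """Example self test: components can only straddle adjacent releases."""
    return max(versions.values()) - min(versions.values()) <= 1


def rolling_upgrade(
    versions: Dict[str, int],  # component -> release number (1 = Grizzly, 2 = Havana, ...)
    core: List[str],
    target: int,
    self_test: Callable[[Dict[str, int]], bool],
) -> List[str]:
    """Upgrade core components first, then the rest one at a time,
    validating the whole system after every single change."""
    order = list(core) + sorted(c for c in versions if c not in core)
    upgraded: List[str] = []
    for component in order:
        if versions[component] == target:
            continue
        previous = versions[component]
        versions[component] = target
        # Step 3: the system as a whole is validated before the next change.
        if not self_test(versions):
            versions[component] = previous  # undo only the change that failed
            raise RuntimeError(f"self test failed after upgrading {component}")
        upgraded.append(component)
    return upgraded
```

Because the self test runs after every change, a jump that skips a release (for example G straight to I) fails on the first component and is rolled back, which matches the slides' concern about skipping versions.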

Page 10

• Orchestration (not just deployment automation)

• Awareness of physical layout is required

• Must respect fault zones to sustain HA

• Proximity of resources matters for migration

• Networking transitions are essential

• Collaboration with development teams is essential

• Components must support current and previous versions

• Upgrade plan must be baked into configuration and tested

• Upgrade dependencies must be 1) clear and 2) minimized

• HA complicates upgrades

• Upgrade can be detected as a failure

• HA system must be able to bridge versions
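Two of the HA points above, respecting fault zones and not treating an upgrade as a failure, can be sketched as a wave planner plus a health-check filter. The zone layout, node names, and both function names are hypothetical; a real system would take zones from its inventory and maintenance state from its orchestrator.

```python
from collections import defaultdict
from typing import Dict, List, Set


def zone_waves(node_zone: Dict[str, str]) -> List[List[str]]:
    """Group nodes into upgrade waves so only one fault zone is down at a time."""
    zones: Dict[str, List[str]] = defaultdict(list)
    for node, zone in node_zone.items():
        zones[zone].append(node)
    if len(zones) < 2:
        raise ValueError("need at least two fault zones to keep serving during an upgrade")
    return [sorted(zones[z]) for z in sorted(zones)]


def is_failure(node: str, healthy: bool, in_maintenance: Set[str]) -> bool:
    """Health-check filter: a planned upgrade restart is not a failure."""
    return not healthy and node not in in_maintenance


if __name__ == "__main__":
    layout = {"ctl-a": "zone-1", "ctl-b": "zone-2", "cmp-a": "zone-1"}
    for wave in zone_waves(layout):
        # Nodes in the current wave restart on purpose, so mark them as in maintenance.
        print(wave, is_failure(wave[0], healthy=False, in_maintenance=set(wave)))
```

Refusing to plan with a single zone is deliberate: upgrading the only zone would take the whole service down.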

Page 11

Page 12

• Partial features were confusing

• We wanted to get ahead on upgrade

• It looked like dev jumped to Grizzly

• Good news:

• Some testing of upgrade

• Folsom to Grizzly ops was pretty smooth

• Bad news:

• Grizzly is more complex (more moving parts)

• Missing multi-node upgrade validation

Page 13

(Diagram: two matching stacks of components, side by side: DB, Msg Bus, Compute, Client, Dashboard, Cinder, Quantum, Glance, Keystone, Oslo, Ceilometer, Nova)

Page 14

• Fault Tolerance on BOTH SIDES AND VERSIONS

• Same Version = EASY

• Backwards Version = HARD

• Forward Version = IMPOSSIBLE

(Diagram: Keystone Havana, Keystone Grizzly, and Nova Havana; the Havana-to-Havana pairing is labeled "Easy")
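One way to read the Easy/Hard/Impossible scale is from the point of view of the component that has to do the bridging. The sketch below encodes that reading; it is an interpretation rather than something the slide spells out, and `bridging_effort` and `Effort` are hypothetical names. Only the release order (Folsom, Grizzly, Havana) is taken as fact.

```python
from enum import Enum

# Release order is real; the lowercase names are just this sketch's labels.
RELEASES = ["folsom", "grizzly", "havana"]


class Effort(Enum):
    EASY = "same version"
    HARD = "backwards version"
    IMPOSSIBLE = "forward version"


def bridging_effort(own: str, peer: str) -> Effort:
    """Effort for a component at release `own` to tolerate a peer at `peer`.

    Tolerating an older peer means carrying compatibility code (hard);
    tolerating a newer peer would mean predicting changes that did not
    exist yet when `own` shipped (impossible).
    """
    mine, theirs = RELEASES.index(own), RELEASES.index(peer)
    if mine == theirs:
        return Effort.EASY
    if theirs < mine:
        return Effort.HARD
    return Effort.IMPOSSIBLE
```

Under this reading, Nova Havana calling Keystone Grizzly is hard but possible, while asking Keystone Grizzly to understand Nova Havana is impossible, which is why the newer side has to carry the bridge.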

Page 15

• We want to limit the need to sustain old services

• New versions should support past APIs

• API consumers can migrate in steps

Ideally, both server AND client would be multi-version

(Diagram: Keystone Grizzly and Havana, Nova Grizzly and Havana, with an "API" label and migration steps labeled "Step 2" and "Step 3")
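Multi-version servers and clients usually meet through some form of version negotiation. The sketch below is a hypothetical illustration (the integer API versions and the `negotiate` helper are not from the slides or from any OpenStack API): the newer server keeps serving the old API, so consumers can migrate in steps.

```python
from typing import Iterable, Optional


def negotiate(server_versions: Iterable[int], client_versions: Iterable[int]) -> Optional[int]:
    """Return the highest API version both sides support, or None."""
    common = set(server_versions) & set(client_versions)
    return max(common) if common else None


# Hypothetical API version sets: the newer release keeps serving the old API.
OLD_SERVER, NEW_SERVER = {1}, {1, 2}
OLD_CLIENT, NEW_CLIENT = {1}, {1, 2}

if __name__ == "__main__":
    print(negotiate(NEW_SERVER, OLD_CLIENT))  # server upgraded first: old clients stay on v1
    print(negotiate(NEW_SERVER, NEW_CLIENT))  # clients upgraded later: both move to v2
    print(negotiate(OLD_SERVER, NEW_CLIENT))  # a multi-version client also tolerates an old server
```

The `None` case is the one to avoid: it appears only when a server drops an API version that some consumer still depends on.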

Page 16

• Size Matters

• Big Steps = Release Based

• Small Steps = Commit Based

• Small steps are digestible

• Easier to test small steps

• Incur less technical debt

• Expose issues to developers while code is fresh

• Large steps create risk

• More combinations to test

• More changes at one time

• Difficult to fix design issues
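The "more combinations to test" point can be made concrete with a simplified model, assuming each component touched by a step is independently either old or new while the step rolls out. Under that assumption, a step touching m components can pass through 2^m mixed states. The helper name and component lists are illustrative only.

```python
from itertools import product
from typing import List, Tuple


def mixed_states(changed: List[str]) -> List[Tuple[Tuple[str, str], ...]]:
    """Every old/new combination the changed components can pass through
    while a step is rolled out (simplified: each flips independently)."""
    return [
        tuple(zip(changed, combo))
        for combo in product(("old", "new"), repeat=len(changed))
    ]


if __name__ == "__main__":
    commit_step = ["nova"]
    release_step = ["db", "msg-bus", "keystone", "glance", "nova", "quantum", "cinder"]
    print(len(mixed_states(commit_step)), len(mixed_states(release_step)))
```

A commit-sized step leaves two states to test; a release-sized step touching seven components leaves 128 in this model, which is the exponential gap behind "large steps create risk".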


Page 17

(Chart: upgrade approaches arranged along two axes, "Small Step vs Large" and "Server vs Client": Big Bang!, Continuous Deploy, Staged Upgrade, Rolling Upgrade, Protocol Stepping, Protocol Driven, Parallel Operation, Forced Client Migration)

Page 18

(Same strategy chart as Page 17.)

Page 19

(Same strategy chart as Page 17.)

Page 20

• Servers & agents must be version tolerant

• Client protocols must be testable and documented

• Ensure non-destructive migration

• Fast-fail on client, but version tolerant on server

• The expectation that servers will migrate must be built into the system! Servers must adopt the latest protocols or clients will not follow.

• Servers must test legacy clients/protocols! We must have tests!

• We must be able to find and upgrade legacy clients
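The "fast-fail on client, but version tolerant on server" rule can be sketched as a pair of functions. The request fields, the supported version set, the defaults, and all names are hypothetical; the point is the asymmetry: the server absorbs old and extra input, while the client stops at the first protocol error.

```python
from typing import Any, Callable, Dict

# Hypothetical protocol details for the sketch.
SUPPORTED_VERSIONS = {1, 2}       # server keeps the previous version alive
KNOWN_FIELDS = {"name", "flavor"}
DEFAULTS = {"flavor": "small"}    # filled in for clients that predate the field


class ProtocolMismatch(Exception):
    """Raised by the client as soon as the server rejects its protocol."""


def server_handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Version-tolerant server: accept old and current versions, ignore
    unknown fields, and supply defaults for fields an older client omits."""
    version = request.get("version", 1)  # assume unversioned clients speak v1
    if version not in SUPPORTED_VERSIONS:
        return {"error": f"unsupported version {version}",
                "supported": sorted(SUPPORTED_VERSIONS)}
    body = {k: v for k, v in request.items() if k in KNOWN_FIELDS}
    for key, value in DEFAULTS.items():
        body.setdefault(key, value)
    return {"version": version, "body": body}


def client_call(send: Callable[[Dict[str, Any]], Dict[str, Any]],
                request: Dict[str, Any]) -> Dict[str, Any]:
    """Fast-fail client: surface a protocol problem immediately instead of guessing."""
    response = send(request)
    if "error" in response:
        raise ProtocolMismatch(response["error"])
    return response
```

A request with no `version` field plays the role of a legacy client here; keeping such cases in the server's test suite is what the slide asks for.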

Page 21

• Deployment Upstream Cookbooks/Modules

• Best Practice Discussions

• Code for Upgradeability

• Crowbar Collaboration

• Upgrade is a FEATURE!

• Orchestration + Chef

• Pull from Source Deployments

• System Discovery

• Networking Configuration

• Operating System Install

http://farm3.static.flickr.com/2561/3891653055_262410bc31.jpg