TRANSCRIPT
Presented by Bruno Lago // 05 May 2016
Things you MUST know before you deploy OpenStack
WARNING!
I AM NOT HERE TO SELL YOU A PRODUCT
So...
I don’t have to make it look good
How much will it cost?
~ USD $150k one off
For a production cluster and pre-prod environment
+ 2 to 3 people per month to run it
OR
A service provider to manage it remotely for you (~ USD $10k / month)
Selecting your hardware
Network hardware
2x 10Gbps switches per rack
2x 40Gbps switches for the spine
2x 1Gbps switches for the management network
1x 1Gbps switch for the pre-prod cluster
Features required: VLAN, VXLAN, MLAG, L3 routing using BGP ECMP
Forget Cisco, Juniper, Arista. Use open source switches!
Avoid using vendor-specific Neutron providers and go for Open vSwitch.
Not required on day one!
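As a rough illustration of how the switch list above scales, here is a minimal Python sketch that counts switches for a given number of racks. The function name and the assumption that the spine, management and pre-prod switch counts stay fixed as racks are added are mine, not the presenter's; the slide only states the per-rack figure for the 10Gbps switches.

```python
def switch_counts(racks, include_preprod=True):
    """Count switches by role for a given number of racks.

    Assumption (not stated on the slide): only the 10Gbps switches
    scale with the rack count; the other counts stay fixed.
    """
    if racks < 1:
        raise ValueError("need at least one rack")
    return {
        "10Gbps per-rack": 2 * racks,   # 2x 10Gbps switches per rack
        "40Gbps spine": 2,              # 2x 40Gbps switches for the spine
        "1Gbps management": 2,          # 2x 1Gbps switches for management
        "1Gbps pre-prod": 1 if include_preprod else 0,
    }


counts = switch_counts(racks=3)
for role, number in counts.items():
    print(f"{role}: {number}")
print("total:", sum(counts.values()))
```

With three racks this gives six 10Gbps switches and eleven switches in total, which is a quick way to sanity-check a hardware quote.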
Server specs
Compute nodes
ALL THE HYPERVISORS!
(Yeah, Right!)
KVM is by far the most widely adopted and best supported hypervisor.
Open source hypervisors are where the numbers stack up!
AND where you get most support from the community.
That said: OpenStack does work with most hypervisors in the industry
and there are successful deployments running Xen or even VMware.
Node segmentation (for financial reasons)
● Specialised object storage nodes allow optimisation for low cost, high
capacity
● Block storage nodes can be optimised independently for performance
(IO operations completed under 30ms or 10ms)
● Compute optimised for high CPU and memory density (and maybe
GPUs)
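The 30ms and 10ms targets for block storage can be checked against measured latencies. The sketch below is one possible way to do that, assuming you already have a list of per-operation latencies in milliseconds; the function name and the sample values are hypothetical.

```python
def fraction_within(latencies_ms, threshold_ms):
    """Return the fraction of IO operations that completed under the threshold."""
    if not latencies_ms:
        raise ValueError("no latency samples provided")
    within = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return within / len(latencies_ms)


# Hypothetical samples, in milliseconds.
samples = [4.2, 8.9, 12.5, 29.0, 35.1]
print("under 30ms:", fraction_within(samples, 30))
print("under 10ms:", fraction_within(samples, 10))
```

Tracking the fraction rather than the average keeps slow outliers visible, which is what a service level on IO latency is meant to catch.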
Techniques to drive quality and service levels up
Node segmentation (service levels)
Potential issues with hyper convergence:
● Kernel bug high memory
● OVS / kernel bug affecting network namespaces
Segment at least control plane, compute and storage. If possible
segment network nodes.
Useful techniques
● Run CI / automated tests in your own cloud (and ensure you can run
them on someone else’s cloud too if you have only one region)
● Run tempest scenario tests as a CI gateway and monitoring check
● Have a decent pre-production environment (YES, you need one)
● Think about communication channels with customers and prepare
communication tools ahead of time
● Use monitoring that automatically picks up every service / component
deployed
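One way to reuse the same scenario tests as both a CI gate and a monitoring check is to wrap the test command and translate its exit code into a monitoring status. This is a minimal sketch, not the presenter's tooling: the status codes follow the common Nagios plugin convention, and the exact Tempest invocation depends on your Tempest version, so it is passed in as a command list rather than hard-coded.

```python
import subprocess
import sys

# Nagios-style plugin exit codes (an assumption about your monitoring system).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def run_check(command, timeout_s=1800):
    """Run a test command and map the outcome to (status, message)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return CRITICAL, f"tests timed out after {timeout_s}s"
    except OSError as exc:
        # The command could not be started at all (missing binary, permissions).
        return UNKNOWN, f"could not start tests: {exc}"
    if result.returncode == 0:
        return OK, "scenario tests passed"
    # Keep only the last output line so the alert stays short.
    lines = (result.stdout or result.stderr).strip().splitlines()
    tail = lines[-1] if lines else "no output"
    return CRITICAL, f"scenario tests failed: {tail}"


# Stand-in command for demonstration; replace it with your Tempest invocation.
status, message = run_check([sys.executable, "-c", "print('1 test passed')"])
print(status, message)
```

In CI the status would decide whether a change can merge; in monitoring the same function can run on a schedule against production and raise an alert on a non-zero status.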
In place upgrades
(Yes Sergey, they are possible!)
● No big bang. One service at a time. Most services have backward-
compatible APIs.
● Test every change in CI with automated tests
● Rehearse every move in pre-prod
● Bulletproof live migration (Mitaka, QEMU guest agent)
● Have scripts to migrate routers and DHCP agents with minimum
downtime
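The router migration scripts mentioned above need a plan for where each router goes before a network node is drained. The sketch below shows only that planning step, under my own assumptions: routers are moved one at a time to whichever remaining agent currently hosts the fewest routers. The data layout and names are hypothetical, and the actual move would be done through the Neutron API, which is not shown here.

```python
def plan_router_moves(routers_by_agent, draining_agent):
    """Return (router, target_agent) pairs for emptying one L3 agent.

    Each router goes to the least-loaded remaining agent, so the load
    stays balanced as the draining agent is emptied.
    """
    load = {
        agent: len(routers)
        for agent, routers in routers_by_agent.items()
        if agent != draining_agent
    }
    if not load:
        raise ValueError("no other agent available to receive routers")
    moves = []
    for router in routers_by_agent.get(draining_agent, []):
        # Ties are broken by agent name so the plan is deterministic.
        target = min(sorted(load), key=load.get)
        moves.append((router, target))
        load[target] += 1
    return moves


# Hypothetical inventory: agent name -> routers it currently hosts.
inventory = {"net1": ["r1", "r2", "r3"], "net2": ["r4"], "net3": []}
for router, target in plan_router_moves(inventory, "net1"):
    print(f"move {router} -> {target}")
```

Producing the plan separately from executing it makes it easy to rehearse in pre-prod and to review before touching production.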
Common deployment mistakes
GUI driven OpenStack
Carrying your own patches
● As a rule of thumb, never run code in production that has not been
merged upstream
● Every patch that is not committed upstream creates a recurring
overhead on the team with every release of OpenStack!
● DON’T do it, unless it is absolutely necessary
● Trust me - people have wasted millions with this mistake!
● Be prepared to fix bugs and introduce new features upstream. If you
are not, then ask a service provider to do it for you
Cloud != Hypervisor
● A cloud is a complex distributed system with many moving parts
● It touches every part of your data centre
● Your team needs to be prepared to dive deep into each area to
troubleshoot incidents and problems
Keystone != IdP
● Back Keystone with OpenLDAP, Active Directory or a SAML based
IdP
● Think about how people will create / terminate accounts, reset
passwords
All projects are production ready
“A project exists, therefore I can do it in production”
How to identify production-ready projects?
● Understand your requirements
● Validate functional and non-functional requirements in real life
● Try HA procedures in real life
● Try upgrade procedures in real life
● Validate security standards
● Consider doing a code inspection yourself
Do the numbers stack up?
Can OpenStack beat the prices of “massive scale” global cloud providers?
AWS Sydney m3.large / month: USD $136.16
AWS USA m3.large / month: USD $97.36
AWS USA m3.large reserved 3Y upfront: USD $38.14
OpenStack Cloud: USD $15.13
Price difference (OpenStack vs AWS USA reserved): USD -$23.01
Price difference (%): 152% (AWS USA reserved price above OpenStack)
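The two price-difference figures can be reproduced from the monthly prices on the slide. The short Python sketch below does the arithmetic; the function names are mine, and the 152% figure only matches when the difference is expressed relative to the OpenStack price.

```python
AWS_RESERVED_USD = 38.14   # AWS USA m3.large, reserved 3Y upfront, per month
OPENSTACK_USD = 15.13      # OpenStack cloud, per month


def price_difference(ours, theirs):
    """Absolute difference; negative means ours is cheaper."""
    return round(ours - theirs, 2)


def premium_over(ours, theirs):
    """How much more theirs costs, as a fraction of our price."""
    return (theirs - ours) / ours


def saving_versus(ours, theirs):
    """How much we save, as a fraction of their price."""
    return (theirs - ours) / theirs


print(price_difference(OPENSTACK_USD, AWS_RESERVED_USD))          # -23.01
print(round(premium_over(OPENSTACK_USD, AWS_RESERVED_USD) * 100))  # 152
print(round(saving_versus(OPENSTACK_USD, AWS_RESERVED_USD) * 100)) # 60
```

Put the other way round, the same numbers mean the OpenStack price is about 60% below the cheapest AWS option listed.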