Openstack in TSSG
Jerry Horgan, Infrastructure Manager
Paul Yates, Systems Administrator
12th Nov 2015
11/25/2015 www.tssg.org/ig
Who are we?
• The TSSG is an ICT research and innovation centre based in WIT.
– Which is further sub-divided into several research and support units based on theme.
• We work in the Infrastructure Group, which:
– Provides the data centre facilities that support TSSG's testbeds, demonstrators/platforms, networks, and servers (physical and virtual).
• We are here today to discuss virtual servers usage in TSSG and their impact.
The Infrastructure Group in TSSG in WIT
The Problem
• Currently using VMware infrastructure (basic/enterprise licenses).
– ~150VMs over 2 physical hosts.
• No Self-Service.
– Average VM turnaround time is 3 hours from ticket, of which only ~30 minutes is actual work.
– Would often get requests for 15-200 VMs for a short period of time.
• No quota management.
• VMs tended to migrate between projects and get old.
• VM Lifecycle Management is time consuming and tedious.
Virtual Machine Lifecycle
Requirements
TSSG Staff Requirements:
• We issued a Questionnaire* to 30 staff.
• This was followed up with interviews of 10 staff.
Infrastructure Group Requirements:
• We wanted to re-use as much existing equipment as possible.
• At a minimum solve our Life-Cycle Issue.
• Automate as much as possible and keep costs to a minimum.
*IVI Cloud Lifecycle
Build it and they will come! Really??
Results
Questionnaire Results:
• Service Provisioning and Risk Management were seen as the weakest areas.
• Solutions Delivery, and Capacity Forecasting and Planning, were the next weakest.
Interview Results:
• IaaS needed; PaaS desired.
• Private needed; public access desired.
• On-demand self-service and rapid elasticity needed; broad network access and accounting desired.
Needs vs Wants
Selection Process
• OpenSource (free) Private Cloud IaaS Solution.
• Intuitive interface – very similar to AWS.
• Integrates with existing Network equipment (Arista and Cisco) and provides multiple
network types.
• Integrates with existing SAN (EMC).
• Highly Automatable, lots of open APIs.
• Has quotas, resource accounting, and built in security features.
• We were already using it on the XiFi EU project.
Why OpenStack
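The "highly automatable, lots of open APIs" point is the crux: everything a user clicks in the dashboard can also be driven programmatically. As a toy illustration of the quota / resource-accounting idea mentioned above (the class and method names here are ours, not OpenStack's actual code):

```python
# Toy sketch of per-project quota enforcement, in the spirit of
# OpenStack's quota system. Illustrative only -- not the real implementation.

class QuotaExceeded(Exception):
    pass

class ProjectQuota:
    """Tracks a project's VM and vCPU usage against fixed limits."""

    def __init__(self, max_vms, max_vcpus):
        self.max_vms = max_vms
        self.max_vcpus = max_vcpus
        self.vms = 0
        self.vcpus = 0

    def reserve(self, vcpus):
        """Reserve capacity for one VM, or raise QuotaExceeded."""
        if self.vms + 1 > self.max_vms or self.vcpus + vcpus > self.max_vcpus:
            raise QuotaExceeded(
                f"request for {vcpus} vCPUs denied "
                f"({self.vms}/{self.max_vms} VMs, "
                f"{self.vcpus}/{self.max_vcpus} vCPUs in use)")
        self.vms += 1
        self.vcpus += vcpus

# A project capped at 10 VMs / 16 vCPUs reserves two 4-vCPU VMs.
quota = ProjectQuota(max_vms=10, max_vcpus=16)
quota.reserve(4)
quota.reserve(4)
```

The point of hard limits like these is that VM sprawl becomes visible and self-limiting per project, instead of silently consuming shared hosts.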
Implementation
• Approach: Agile - Small incremental releases to monitor each improvement.
• How: Proved out on a local machine (KVM/Linux), then implemented on physical machines.
• Deployment: Manual install then automate with Ansible Playbooks.
• Equipment: Dell R720 Poweredge, Arista 7050T, EMC SAN, Ubuntu OS.
• Problem areas: Improving upon standard networking implementations in
OpenStack. Providing a highly available infrastructure.
• This resulted in 3 test deployments, with UAT since Jan 2015.
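To give a flavour of the "manual install then automate" step, a minimal Ansible playbook fragment in the spirit of that approach; host group names and package lists are assumptions for illustration, not the actual TSSG playbooks:

```yaml
# Illustrative sketch only -- hosts, packages and layout are invented.
- hosts: controllers
  become: true
  tasks:
    - name: Install core OpenStack control-plane packages
      apt:
        name: [keystone, glance, nova-api, neutron-server]
        state: present

- hosts: computes
  become: true
  tasks:
    - name: Install the compute agent (KVM hypervisor)
      apt:
        name: [nova-compute]
        state: present
```

The value of encoding the install this way is repeatability: each of the three test deployments can be rebuilt from the same playbooks rather than from notes.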
Methodology
Replaced previous Monolithic Plug-ins
Type Driver: maintains type-specific network state, performs provider network validation and tenant network allocation.
Mechanism Driver: responsible for taking the information established by the type driver and ensuring it is correctly applied.
Neutron ML2 Driver
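To make the type-driver role concrete, here is a toy sketch of VLAN tenant-network allocation, the kind of type-specific state an ML2 type driver maintains. The class and names are invented for illustration; this is not Neutron's real driver code:

```python
# Toy model of an ML2 *type driver* for the VLAN network type:
# hand out segmentation IDs from a configured range and track their state.
# Illustrative only -- not Neutron's actual implementation.

class NoFreeSegments(Exception):
    pass

class VlanTypeDriver:
    def __init__(self, physical_network, vlan_min, vlan_max):
        self.physical_network = physical_network
        self.free = set(range(vlan_min, vlan_max + 1))
        self.allocated = {}  # network_id -> VLAN id

    def allocate_tenant_segment(self, network_id):
        """Allocate the lowest free VLAN id to a tenant network."""
        if not self.free:
            raise NoFreeSegments("VLAN range exhausted")
        vlan = min(self.free)
        self.free.remove(vlan)
        self.allocated[network_id] = vlan
        return {"network_type": "vlan",
                "physical_network": self.physical_network,
                "segmentation_id": vlan}

    def release_segment(self, network_id):
        """Return a tenant network's VLAN id to the free pool."""
        self.free.add(self.allocated.pop(network_id))

# Allocate a segment for one tenant network from VLANs 100-199.
driver = VlanTypeDriver("physnet1", 100, 199)
seg = driver.allocate_tenant_segment("net-a")
```

A mechanism driver would then take a segment dict like `seg` and push the matching configuration onto the switches or hypervisor bridges.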
Provider Networks
Four options considered:
• Master/Slave network node. Pros: easy to implement; network node redundancy. Cons: still have a bottleneck; virtual routers take time to activate.
• Virtual Router Redundancy (VRRP). Pros: reduces the bottleneck by distributing traffic; network node redundancy. Cons: more network nodes (hardware); traffic still goes Compute to Network node.
• Arista L3 Router Service Plugin. Pros: no need for a network node; networking done on the physical switches. Cons: timing, not production ready (end of 2014).
• Distributed Virtual Routing (DVR). Pros: all traffic goes directly out of the Compute node (excluding SNAT); removes bottlenecks. Cons: replication of virtual routers.
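The VRRP option hinges on master election: among the live network nodes, the highest-priority router owns the virtual router, and a failure triggers failover to a backup. A toy model of that election (node names and priorities are invented):

```python
# Toy model of VRRP-style master election: the live router with the
# highest priority wins; on failure, a backup takes over.
# Illustrative only -- real VRRP also handles preemption, timers, etc.

def elect_master(routers):
    """routers: dict of name -> (priority, alive).
    Returns the live router with the highest priority, or None."""
    live = {name: prio for name, (prio, alive) in routers.items() if alive}
    if not live:
        return None
    return max(live, key=live.get)

routers = {"net-node-1": (200, True), "net-node-2": (100, True)}
master = elect_master(routers)        # net-node-1 wins (priority 200)

routers["net-node-1"] = (200, False)  # simulate master failure
failover = elect_master(routers)      # net-node-2 takes over
```

This is why VRRP gives network-node redundancy but not a cure for the bottleneck: one node still carries each virtual router's traffic at any given time.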
Corosync & Pacemaker
Pacemaker – stops/starts services and ensures a service is only running in one place.
Corosync – ensures cluster nodes can send and receive messages.
Services run either Active/Active or Active/Passive.
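For context, a minimal corosync.conf fragment of the kind such a two-controller cluster uses; the cluster name and node names here are placeholders, not the actual TSSG configuration:

```
totem {
    version: 2
    cluster_name: tssg-cloud
    transport: udpu
}

nodelist {
    node {
        ring0_addr: controller1
        nodeid: 1
    }
    node {
        ring0_addr: controller2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}
```

With Corosync providing membership and messaging, Pacemaker layers resource definitions on top: a virtual IP and each OpenStack API service become cluster resources it starts, stops, and relocates on failure.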
Solution
• Fully resilient, scalable, automated, private cloud deployment which is publicly (Internet) accessible.
• We’ve managed to minimise the number of non-compute / non-service nodes.
– Leveraging our existing Arista switches and our EMC SAN*.
– We’ve found that software developers and network engineers have different understandings of what a
network is.
• Provides IaaS with some PaaS features.
• However, it took a lot of hardware (18 servers) and time (15 months).
The Deployment
Benefits
• Removed VMware and associated license costs, which allowed us to grow.
– Increased number of hosts from 2 to 9 (7 compute).
– Increased from ~150VMs to between 1,000 to 2,000 VMs.
• Similar functionality for ~1/10 the cost of a VMware solution.
• Self-Service: Spin-up 1 VM in 20 seconds, 50 VMs in 1 minute.
• Quotas / Resource Accounting. VM Lifecycle becomes an end-user issue.
• Backups: Templates backed up centrally. Automated deployments through code push, Docker repo push.
• Service Provisioning (SRP), Solutions Delivery (SD), and Capacity Forecasting and Planning (CFP) all addressed. Risk Management improved but needs work.
For TSSG and IG
Future Work
• We need to determine our over-provisioning (contention) ratios.
• Develop debugging tools / procedures across all the relevant hardware and
components.
• Security of the VMs and network in particular, but also the Hypervisors etc.
• Platform as a Service (OpenShift, Docker).
• Software Defined Networking (OpenDaylight).
The next 15 months
Thank you for your attention.
Questions?