
OpenStack in TSSG

Jerry Horgan, Infrastructure Manager

Paul Yates, Systems Administrator

12th Nov 2015


Who are we?

• The TSSG is an ICT research and innovation centre based in WIT.

– It is further sub-divided into several research and support units, organised by theme.

• We work in the Infrastructure Group, which provides the Data Centre facilities to TSSG that support the Testbeds and Demonstrators / Platforms, Networks and Servers (physical and virtual).

• We are here today to discuss virtual server usage in TSSG and its impact.


The Infrastructure Group in TSSG in WIT


The Problem

• Currently using VMware infrastructure (basic/enterprise licenses).

– ~150 VMs across 2 physical hosts.

• No Self-Service.

– Average VM turnaround time is 3 hours from ticket, of which only ~30 minutes is actual work.

– We would often get requests for 15-200 VMs for short periods of time.

• No quota management.

• VMs tended to migrate between projects and grow stale.

• VM Lifecycle Management is time-consuming and tedious.


Virtual Machine Lifecycle


Requirements

TSSG Staff Requirements:

• We issued a Questionnaire* to 30 staff.

• This was followed up with interviews with 10 staff.

Infrastructure Group Requirements:

• We wanted to re-use as much existing equipment as possible.

• At a minimum, solve our lifecycle issue.

• Automate as much as possible and keep costs to a minimum.

*IVI Cloud Lifecycle


Build it and they will come! Really??


Results

Questionnaire Results:

• Service Provisioning and Risk Management were seen as the weakest areas.

• Solutions Delivery, and Capacity Forecasting and Planning, were the next weakest.

Interview Results:

• IaaS needed; PaaS desired.

• Private needed; public access desired.

• On-demand self-service and rapid elasticity needed; Broad Access and Accounting desired.


Needs vs Wants


Selection Process

• Open-source (free) private cloud IaaS solution.

• Intuitive interface – very similar to AWS.

• Integrates with existing network equipment (Arista and Cisco) and provides multiple network types.

• Integrates with existing SAN (EMC).

• Highly automatable, with lots of open APIs (see the sketch after this list).

• Has quotas, resource accounting, and built-in security features.

• We were already using it on the XiFi EU project.
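To illustrate the "open APIs" point: a minimal sketch of booting a VM through the Nova API with the Kilo-era python-novaclient. The credentials, endpoint, image and flavor names below are placeholders, not our deployment's actual values.

```python
# Minimal sketch of driving Nova over its API (Kilo-era python-novaclient).
# All credentials, names and the endpoint are placeholders.
from novaclient import client

nova = client.Client('2',                            # compute API version
                     'demo-user', 'demo-password',   # placeholder credentials
                     'demo-project',                 # placeholder tenant
                     'http://controller:5000/v2.0')  # placeholder Keystone URL

flavor = nova.flavors.find(name='m1.small')    # look up a flavor by name
image = nova.images.find(name='ubuntu-14.04')  # look up an image by name

# One call boots a VM; no ticket, no manual work.
server = nova.servers.create(name='demo-vm', image=image, flavor=flavor)
print(server.id, server.status)  # BUILD at first, ACTIVE once scheduled
```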


Why OpenStack

Implementation

• Approach: Agile – small incremental releases to monitor each improvement.

• How: prototype on a local machine (KVM/Linux), then implement on physical machines.

• Deployment: manual install first, then automated with Ansible playbooks.

• Equipment: Dell PowerEdge R720, Arista 7050T, EMC SAN, Ubuntu OS.

• Problem areas: improving upon the standard networking implementations in OpenStack; providing a highly available infrastructure.

• This resulted in 3 test deployments, with UAT since Jan 2015.


Methodology

DEPLOYING OPENSTACK

Technical Implementation


Service Layout


Minimal Architecture

Network Layout


Minimal Architecture

Generic Routing Encapsulation (GRE)

TENANT NETWORKS

Layer 2 Networking
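Tenant networks in this design ride over GRE tunnels: each tenant network gets its own tunnel (segmentation) ID, so tenants can use overlapping address ranges. As a hedged sketch with python-neutronclient (placeholder credentials and names), the tenant simply asks Neutron for a network, and the tunnel ID is allocated behind the scenes (by the ML2 type driver, described on the next slide):

```python
# Hedged sketch (placeholder credentials/names, python-neutronclient).
# The tenant never mentions GRE; the ML2 type driver allocates a tunnel
# (segmentation) ID for the new network behind the scenes.
from neutronclient.v2_0 import client

neutron = client.Client(username='demo-user', password='demo-password',
                        tenant_name='demo-project',
                        auth_url='http://controller:5000/v2.0')

net = neutron.create_network({'network': {'name': 'demo-net'}})
neutron.create_subnet({'subnet': {
    'network_id': net['network']['id'],
    'ip_version': 4,
    'cidr': '10.0.0.0/24',  # overlapping ranges across tenants are fine
}})
```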


Replaced the previous monolithic plug-ins.

Type Driver: maintains type-specific network state, and performs provider network validation and tenant network allocation.

Mechanism Driver: responsible for taking the information established by the type driver and ensuring it is correctly applied.

Neutron ML2 Driver
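To make the type/mechanism split concrete, below is a hypothetical, minimal mechanism driver skeleton against the Kilo-era ML2 driver API; a real driver such as Arista's implements hooks like these to push the committed network state to its own backend.

```python
# Hypothetical, minimal ML2 mechanism driver skeleton (Kilo-era API;
# in later releases this interface moved to neutron_lib).
from neutron.plugins.ml2 import driver_api as api


class SketchMechanismDriver(api.MechanismDriver):

    def initialize(self):
        # One-time setup: connect to the backend controller, etc.
        pass

    def create_network_postcommit(self, context):
        # Called after the network is committed to the Neutron DB.
        # context.current is the network dict, including the fields
        # the type driver allocated.
        net = context.current
        print('apply network %s (type=%s, segmentation id=%s)' % (
            net['id'],
            net.get('provider:network_type'),
            net.get('provider:segmentation_id')))

    def delete_network_postcommit(self, context):
        # Undo whatever create_network_postcommit applied.
        pass

# Drivers are registered under the 'neutron.ml2.mechanism_drivers'
# entry-point group and enabled in the ML2 configuration.
```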

Tenant Networks


Arista-neutron-ml2-driver

Arista CVX


Extension component

PROVIDER NETWORKS

Layer 3 Networking


Provider Networks
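Unlike tenant networks, provider networks are bound to an existing physical network rather than a tunnel. A hedged sketch (placeholder names; the provider attributes require admin credentials) of creating the external network that carries gateway and floating-IP traffic:

```python
# Hedged sketch (placeholder names; the provider:* attributes need admin).
from neutronclient.v2_0 import client

neutron = client.Client(username='admin', password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')

# Bound to a physical network label instead of a GRE tunnel;
# 'router:external' marks it usable for router gateways and floating IPs.
ext_net = neutron.create_network({'network': {
    'name': 'ext-net',
    'router:external': True,
    'provider:network_type': 'flat',
    'provider:physical_network': 'external',  # label from the L2 agent mapping
}})
```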


Four options considered

• Master/Slave network node
Pros: easy to implement; network node redundancy.
Cons: still a bottleneck; virtual routers take time to activate.

• Virtual Router Redundancy (VRRP)
Pros: reduces the bottleneck by distributing traffic; network node redundancy.
Cons: more network nodes (hardware); traffic still flows from compute to network node.

• Arista L3 Router Service Plugin
Pros: no need for a network node; routing done on the physical switches.
Cons: timing – not production ready (end of 2014).

• Distributed Virtual Routing (DVR) – see the sketch after this list.
Pros: all traffic goes directly out of the compute node (excluding SNAT); removes bottlenecks.
Cons: replication of virtual routers.
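With DVR, each compute node hosts a copy of the virtual router, so east-west and floating-IP traffic leaves the compute host directly and only SNAT still passes through a central node. A hedged sketch of creating and wiring up a distributed router (placeholder credentials and IDs; setting 'distributed' needs admin rights):

```python
# Hedged sketch (placeholder credentials/IDs; 'distributed' needs admin).
from neutronclient.v2_0 import client

neutron = client.Client(username='admin', password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')

EXT_NET_ID = '...'        # placeholder: the external provider network
TENANT_SUBNET_ID = '...'  # placeholder: a tenant subnet to attach

router = neutron.create_router({'router': {
    'name': 'demo-router',
    'distributed': True,  # replicated onto every relevant compute node
}})['router']

neutron.add_gateway_router(router['id'], {'network_id': EXT_NET_ID})
neutron.add_interface_router(router['id'], {'subnet_id': TENANT_SUBNET_ID})
```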


Distributed Virtual Routing

Provider Networks

Topology Changes


Story so far…

Topology


New implementation

HIGH AVAILABILITY

Reducing Single Points of Failure


MySQL Cluster


Active/Active Service nodes
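The database layer is an active/active (Galera-style) MySQL cluster, so every service node can write through its local database node. A hedged monitoring sketch (pymysql, hostnames and credentials are all assumptions) that checks each node sees the full primary cluster:

```python
# Hedged monitoring sketch (placeholder hosts/credentials; pymysql assumed):
# in a Galera-style MySQL cluster every node should report the same
# cluster size and a 'Primary' cluster status.
import pymysql

NODES = ['db1', 'db2', 'db3']  # placeholder hostnames

def galera_status(host):
    conn = pymysql.connect(host=host, user='monitor', password='secret')
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW STATUS LIKE 'wsrep_cluster_size'")
            size = int(cur.fetchone()[1])
            cur.execute("SHOW STATUS LIKE 'wsrep_cluster_status'")
            status = cur.fetchone()[1]
        return size, status
    finally:
        conn.close()

for node in NODES:
    size, status = galera_status(node)
    print('%s: %d nodes, %s' % (node, size, status))
    assert status == 'Primary' and size == len(NODES)
```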


Corosync & Pacemaker


Pacemaker – starts and stops services, and ensures each service runs in only one place.

Corosync – ensures the cluster nodes can send and receive messages between each other.

Active/Active or Active/Passive
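For a quick view of what Pacemaker currently runs where, its standard crm_mon tool prints a one-shot status summary; a trivial wrapper sketch (assuming crm_mon is installed on the node):

```python
# Trivial sketch: ask Pacemaker for a one-shot cluster status summary.
# Assumes the standard crm_mon tool is installed and on the PATH.
import subprocess

def pacemaker_summary():
    # '-1' prints the cluster status once and exits (instead of refreshing)
    result = subprocess.run(['crm_mon', '-1'],
                            capture_output=True, text=True, check=True)
    return result.stdout

print(pacemaker_summary())
```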


Scaling Service nodes


Active/Active – load balancing


Final Topology


OpenStack Kilo in Production

CONCLUSION

What's next?


Solution

• Fully resilient, scalable, automated private cloud deployment which is publicly (Internet) accessible.

• We've managed to minimise the number of non-compute / non-service nodes.

– Leveraging our existing Arista switches and our EMC SAN*.

– We've found that software developers and network engineers have different understandings of what a network is.

• Provides IaaS with some PaaS features.

• However, it took a lot of hardware (18 servers) and time (15 months).


The Deployment


Benefits

• Removed VMware and the associated license costs, which allowed us to grow.

– Increased the number of hosts from 2 to 9 (7 compute).

– Increased from ~150 VMs to between 1,000 and 2,000 VMs.

• Similar functionality for ~1/10 the cost of a VMware solution.

• Self-Service: spin up 1 VM in 20 seconds, or 50 VMs in 1 minute (see the sketch after this list).

• Quotas / Resource Accounting: VM lifecycle becomes an end-user issue.

• Backups: templates backed up centrally. Automated deployments through code push and Docker repo push.

• Service Provisioning (SRP), Solutions Delivery (SD), and Capacity Forecasting and Planning (CFP) all addressed; Risk Management improved but still needs work.
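Quota enforcement and bulk spin-up, for example, are each a single API call; a hedged sketch with python-novaclient (placeholder credentials and IDs):

```python
# Hedged sketch (placeholder credentials/IDs). Quotas are set per project,
# and min_count/max_count boot a batch of identical VMs in one request.
from novaclient import client

nova = client.Client('2', 'admin', 'secret', 'admin',
                     'http://controller:5000/v2.0')

# Cap a project at 50 instances / 100 vCPUs / 200 GB RAM.
nova.quotas.update('PROJECT_ID', instances=50, cores=100, ram=204800)

# One request, 50 identical VMs; their lifecycle is now the end user's problem.
nova.servers.create(name='batch-vm',
                    image='IMAGE_ID', flavor='FLAVOR_ID',
                    min_count=50, max_count=50)
```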


For TSSG and IG


Future Work

• We need to determine our over-provisioning (contention) ratios.

• Develop debugging tools / procedures across all the relevant hardware and components.

• Security: the VMs and the network in particular, but also the hypervisors etc.

• Platform as a Service (OpenShift, Docker).

• Software Defined Networking (OpenDaylight).


The next 15 months