
CLIMB Technical Overview

Arif Ali

Thursday 13th July

Hardware Overview

• IBM/Lenovo x3550 M4
  – 3 x Controller Nodes (Cardiff only)
  – 1 x Server Provisioning Node
• IBM/Lenovo x3650 M4
  – 3 x Controller Nodes (Warwick and Swansea)
  – 4 x GPFS Servers
• IBM/Lenovo x3750 M4
  – 21 x Cloud Compute Nodes
• IBM/Lenovo x3950 X6
  – 3 x Large Memory Nodes
• IBM Storwize V3700 Storage
  – 4 x Dual Controllers
  – 16 x Expansion Shelves

Cardiff HW Layout

Warwick Rack Layout

Swansea Rack Layout

Key Architecture Differences

• Cardiff University has x3550 M4 instead of x3650 M4 for controller nodes
• Cardiff University uses Cat6 for 10G instead of 10G DACs

Key Architecture Challenges

• Cable lengths
• Cable types
• Rack layouts
• Differences in hardware and design

Software Overview

• xCAT
• IBM Spectrum Scale (originally GPFS)
• CentOS 7
• SaltStack
• RDO OpenStack Juno/Kilo
• Icinga

xCAT

• eXtreme Cluster/Cloud Administration Toolkit
• Management of clusters (clouds, HPC, grids)
• Bare-metal provisioning
• Scriptable
• Large-scale management (lights-out, remote console, distributed shell)
• Configures key services based on tables

Why are we using xCAT?

• Provides tight integration with IBM/Lenovo systems
  – Automagic discovery
  – IPMI integration
• Can manage the Mellanox switches from the CLI
• OCF is very experienced with it and has development experience

xCAT Configuration

• Base images for each machine type
  – Highmem/compute
  – Controller
  – Storage
• Network configuration is defined within xCAT
• Only the salt-minion is configured through xCAT
• All software and configuration is done via SaltStack

What is SaltStack?

“Software to automate the management and configuration of any infrastructure or application at scale”

• Uses YAML
• Security controlled by server/client public/private keys
• Daemons run on the master and on the clients (minions)
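
A minimal sketch of what a Salt state (SLS) file looks like, since the deck relies on SaltStack for all configuration. The state below is purely illustrative (NTP is just a simple example, not taken from the CLIMB states); the top file maps minions to the states they should receive.

# /srv/salt/ntp.sls -- illustrative example of a Salt state
# Install the NTP package and keep its service running
ntp:
  pkg.installed: []
  service.running:
    - name: ntpd
    - enable: True
    - require:
      - pkg: ntp

# /srv/salt/top.sls -- the top file targets minions and assigns states
base:
  '*':            # every minion
    - ntp

From the master, salt '*' state.highstate applies the states mapped in the top file to all minions.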

Why SaltStack?

• Previously used by UoB, where some of the OpenStack configuration had already been started
• Automates the configuration
• Consistency across installations
• Re-usable for future installs
• Repeatable

OpenStack and SaltStack

• Integration of some key applications
  – Keystone
  – RabbitMQ
  – MySQL/MariaDB
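
As an illustration of this kind of integration, a hedged sketch of how one of these dependencies (RabbitMQ) can be expressed as a Salt state using Salt's built-in rabbitmq_user state; the user name, password and permissions are placeholders rather than CLIMB values.

# rabbitmq.sls -- illustrative sketch, not the actual CLIMB state
rabbitmq-server:
  pkg.installed: []
  service.running:
    - enable: True
    - require:
      - pkg: rabbitmq-server

openstack-rabbit-user:
  rabbitmq_user.present:
    - name: openstack          # placeholder user
    - password: CHANGE_ME      # placeholder password
    - perms:
      - '/':                   # full permissions on the default vhost
        - '.*'
        - '.*'
        - '.*'
    - require:
      - service: rabbitmq-server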

What is OpenStack?

“To produce the ubiquitous Open Source cloud computing platform that will meet the needs of public and private cloud providers regardless of size, by being simple to implement and massively scalable”

OpenStack Logical Architecture

Conceptual Architecture

Nova (Compute)

• Manages virtualised server resources
• Live guest migration
• Live VM management
• Security groups
• VNC proxy
• Support for various hypervisors
  – KVM, LXC, VMware, Xen, Hyper-V, ESX

APIs supported:
• OpenStack Compute API
• EC2 API
• Admin API

Nova Configuration

• EC2 API has been enabled
• Extra extensions enabled for Ceilometer monitoring
• Ephemeral storage is centrally located on GPFS
• Security groups are controlled by Neutron
• Live migration enabled
• Snapshots created using the RAW format
• Availability zones distinguish between normal cloud nodes and large memory nodes
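
A hedged sketch of how a few of these Nova settings could be expressed with Salt's ini.options_present state. The option names are the common Juno/Kilo-era nova.conf ones and the values mirror the bullets above (the ephemeral storage path comes from the storage slide later in the deck); treat this as indicative rather than the exact CLIMB configuration.

# nova-conf.sls -- illustrative; option names should be checked against the deployed release
/etc/nova/nova.conf:
  ini.options_present:
    - sections:
        DEFAULT:
          enabled_apis: ec2,osapi_compute,metadata   # keep the EC2 API enabled
          security_group_api: neutron                # security groups handled by Neutron
          instances_path: /gpfs/data/nova            # ephemeral storage on GPFS
          snapshot_image_format: raw                 # snapshots in RAW format

Live migration and the extra Ceilometer monitoring extensions need further options (libvirt flags, notification settings) that are omitted here.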

Neutron (Networking)

• Framework for Software Defined Networking (SDN)
• Responsible for managing networks, ports, routers
• Create/delete L2 networks
• L3 support
• Attach/detach host to network
• Support for SW and HW plugins
  – Open vSwitch, OpenFlow, Cisco Nexus, Arista, NCS, Mellanox

Neutron Configuration

• Security groups enabled
• VXLAN used as the network type
• Overlapping IPs are allowed
• DHCP agents per network increased to 2
• Layer 3 HA enabled
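
Again as a hedged sketch, the same Salt ini state can express the Neutron bullets; the files and option names below are the usual Kilo-era ones (neutron.conf and the ML2 plugin config) and are indicative only.

# neutron-conf.sls -- illustrative sketch
/etc/neutron/neutron.conf:
  ini.options_present:
    - sections:
        DEFAULT:
          allow_overlapping_ips: 'True'     # overlapping tenant IPs allowed
          dhcp_agents_per_network: '2'      # two DHCP agents per network
          l3_ha: 'True'                     # layer 3 HA

/etc/neutron/plugins/ml2/ml2_conf.ini:
  ini.options_present:
    - sections:
        ml2:
          type_drivers: flat,vlan,vxlan
          tenant_network_types: vxlan       # VXLAN tenant networks
        securitygroup:
          enable_security_group: 'True'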

Glance (Image Registry)

• Image registry, not image repository
• Query for information on public and private disk images
• Register new disk images
• Disk images can be stored in and delivered from a variety of stores
  – Filesystem, Swift, Amazon S3
• Supported formats
  – Raw, Machine (AMI), VHD (Hyper-V), VDI (VirtualBox), qcow2 (QEMU/KVM), VMDK (VMware), OVF (VMware), and others

Glance Configuration

• Glance images are stored on the GPFS filesystem or in Swift
• The scrub data is located locally on the controllers
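
A hedged sketch of the corresponding glance-api.conf options (Kilo keeps the store options in the glance_store section); the image path follows the GPFS layout described later in the deck, and the scrubber path is the usual default, so treat both as indicative.

# glance-conf.sls -- illustrative sketch
/etc/glance/glance-api.conf:
  ini.options_present:
    - sections:
        glance_store:
          default_store: file
          filesystem_store_datadir: /gpfs/data/glance   # images on GPFS
        DEFAULT:
          scrubber_datadir: /var/lib/glance/scrubber    # scrub data local to the controller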

Keystone (Authentication)

• Identity service provides auth credential validation and data
• Token service validates and manages the tokens used to authenticate requests after the initial credential verification
• Catalog service provides an endpoint registry used for endpoint discovery
• Policy service provides a rule-based authorisation engine and the associated rule management interface
• Each service can be configured to serve data from a pluggable back end
  – Key-Value, SQL, PAM, LDAP, Templates

Keystone Configuration

• Using the V2 API
• Local authentication, stored in the database
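
A hedged sketch of the matching keystone.conf settings: the SQL identity backend keeps users and credentials in the local database. The driver string is the Kilo-era class path and the database URL is a placeholder, so verify both against the deployed release.

# keystone-conf.sls -- illustrative sketch
/etc/keystone/keystone.conf:
  ini.options_present:
    - sections:
        identity:
          driver: keystone.identity.backends.sql.Identity   # local, database-backed auth
        database:
          connection: mysql://keystone:CHANGE_ME@controller/keystone   # placeholder URL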

Swift (Storage)

• Object server that stores objects
• Storage, retrieval, deletion of objects
• Updates to objects
• Replication
• Modelled after Amazon's S3 service

Swift Configuration

• Swift is configured to store its data on GPFS
• The rings for Swift have to be created manually
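
Because the rings are built by hand, here is a sketch of the standard swift-ring-builder steps, wrapped in a Salt cmd.run state for consistency with the rest of the configuration; the partition power, replica count, IP, port and device name are placeholders, not CLIMB values.

# swift-rings.sls -- illustrative sketch of manual ring creation
build-object-ring:
  cmd.run:
    - cwd: /etc/swift
    - creates: /etc/swift/object.ring.gz   # only run if the ring does not exist yet
    - name: |
        swift-ring-builder object.builder create 10 3 1
        swift-ring-builder object.builder add r1z1-192.0.2.10:6000/gpfs 100
        swift-ring-builder object.builder rebalance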

Cinder (Block Storage)

• Responsible for managing the lifecycle of volumes and exposing them for attachment
• Enables additional persistent block storage to be attached to virtual machines
• Allows multiple volumes to be attached per virtual machine
• Supports the following back ends
  – iSCSI
  – RADOS block devices (e.g. Ceph)
  – NetApp
  – GPFS
• Similar to the Amazon EBS service

Cinder Configuration

• The block devices within OpenStack are stored on GPFS
• Copy-on-Write enabled, a feature available in the GPFS driver for OpenStack
• The specific GPFS storage pool to be used by Cinder is specified
• OpenStack has GPFS drivers for Cinder
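
A hedged sketch of the cinder.conf options behind these bullets, using the documented GPFS driver settings; the paths and pool name mirror the storage slide later in the deck, but the snippet is indicative rather than the exact CLIMB file.

# cinder-conf.sls -- illustrative sketch of the GPFS back end
/etc/cinder/cinder.conf:
  ini.options_present:
    - sections:
        DEFAULT:
          volume_driver: cinder.volume.drivers.ibm.gpfs.GPFSDriver
          gpfs_mount_point_base: /gpfs/data/cinder   # volumes live under GPFS
          gpfs_storage_pool: nlsas                   # specific GPFS storage pool
          gpfs_images_dir: /gpfs/data/glance         # lets volumes be cloned from images
          gpfs_images_share_mode: copy_on_write      # enable Copy-on-Write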

Heat (Orchestration)

• Declarative, template-defined deployment
• Compatible with AWS CloudFormation
• Many CloudFormation-compatible resources
• Templating using HOT or CFN
• Controls complex groups of cloud resources
• Multiple use cases
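
For illustration, a minimal HOT template that boots a single instance; the image, flavour and network names are placeholders, and the template version corresponds to the Juno/Kilo era.

# single-server.yaml -- minimal illustrative HOT template
heat_template_version: 2014-10-16

description: Boot a single server (placeholder names throughout)

resources:
  server:
    type: OS::Nova::Server
    properties:
      image: centos-7        # placeholder image name
      flavor: m1.small       # placeholder flavour
      networks:
        - network: private   # placeholder network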

Horizon (Dashboard)

• Provides a simple self-service UI for end users
• Basic cloud administrator functions
• Thin wrapper over the APIs, no local state
• Out-of-the-box support for all core OpenStack projects
  – Nova, Glance, Swift, Neutron
• Anyone can add a new component
• Visual and interaction paradigms are maintained

Other useful projects

• OpenStack
  – Ceilometer
  – Trove
  – Sahara
  – Ironic
  – Magnum
• Dependencies
  – RabbitMQ
  – MariaDB Galera Server
  – HAProxy
  – keepalived

Why not RDO Manager/TripleO?

• Ironic was still in technology preview
• RDO Manager was not available in Juno/Kilo
• RDO Manager would conflict with xCAT and its DHCP configuration
• Issues with static IPs for machines until Mitaka, which would be a problem for GPFS
• Only worked in specific scenarios at the time
• A provisioning system would still be required to install the GPFS servers

What is GPFS / Spectrum Scale?

• IBM Spectrum Scale (previously known as GPFS)
• Parallel filesystem
  – A single POSIX filesystem spans multiple block devices
  – Concurrent filesystem access from multiple clients
• Feature rich (tiering, ILM, replication, snapshotting)
• Distributed architecture

Why are we using Spectrum Scale?

• Building-block scalable solution
• Large capacity
  – Supports filesystems up to 8 Yottabytes (8422162432 PB)
  – Supports files up to 8 Exabytes (8192 PB)
• Proven technology (Spectrum Scale started in 1993)
• Highly parallel
  – Scale up
  – Scale out
• Native client access over InfiniBand
• IBM is actively developing OpenStack support and involving the community
• Storage tiering

OpenStack links with GPFS

• Spectrum Scale provides OpenStack drivers (Juno)
• Spectrum Scale provides snapshotting, Copy-on-Write and large-scale concurrent file access
• Detailed documentation on the Swift configuration
• Road maps to provide further support (including Manila)

Storage Capacity and Configuration

• 533 TB at Swansea and Warwick
• 399 TB at Cardiff
• Mounted at /gpfs
• Uses RDMA
• The Cinder driver directly uses the nlsas storage pool in /gpfs/data/cinder
• A separate inode space is required for Swift, with a new fileset in /gpfs/swift
• Ephemeral storage for instances is also on GPFS, located in /gpfs/data/nova
• Finally, the Glance store is located in /gpfs/data/glance
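
A hedged sketch of how the independent Swift fileset could be created with the standard GPFS commands, again wrapped in a Salt cmd.run state; the filesystem device name gpfs is an assumption and should be checked against the actual filesystem.

# swift-fileset.sls -- illustrative sketch, the device name "gpfs" is a placeholder
create-swift-fileset:
  cmd.run:
    - name: |
        /usr/lpp/mmfs/bin/mmcrfileset gpfs swift --inode-space new
        /usr/lpp/mmfs/bin/mmlinkfileset gpfs swift -J /gpfs/swift
    - unless: /usr/lpp/mmfs/bin/mmlsfileset gpfs swift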

Initial SW install

• RDO OpenStack Juno
• CentOS 7.0
• GPFS 4.1.0-5

Upgraded Installed SW

• RDO OpenStack Kilo
• CentOS 7.1
• GPFS 4.1.1

Upgrade to Kilo

• Migration of the Salt configs was done internally at OCF
• The first run of the upgrade was carried out at Warwick
• Once fine-tuned, the system at Warwick was re-installed
• The same re-installation was carried out at Swansea
• Finally, migration of Cardiff from Juno to Kilo
  – Test bed at OCF
  – Challenges:
    • Kilo required FQDNs for hosts
    • The updated Cinder GPFS driver required manual intervention

Future Development

• Collaborative work with the openstack-salt team
• Modularise the config
• Add support for the Keystone V3 API
• Move Keystone from eventlet to a web-server-based deployment
• Add support for OpenStack versions after Kilo, such as Liberty and Mitaka
• Maybe use TripleO to do the OpenStack deployment

Questions / Comments

Contact

• Email: [email protected]
• IRC: arif-ali
• Resources: http://www.github.com/arif-ali/openstack-lab
• Support: [email protected]