CLIMB Technical Overview
TRANSCRIPT
Hardware Overview
• IBM/Lenovo x3550 M4
– 3 x Controller Nodes (Cardiff only)
– 1 x Server Provisioning Node
• IBM/Lenovo x3650 M4
– 3 x Controller Nodes (Warwick and Swansea)
– 4 x GPFS Servers
• IBM/Lenovo x3750 M4
– 21 x Cloud Compute Nodes
• IBM/Lenovo x3950 X6
– 3 x Large Memory Nodes
• IBM Storwize V3700 Storage
– 4 x Dual Controllers
– 16 x Expansion Shelves
Key Architecture Differences
• Cardiff University has x3550 M4 instead of x3650 M4 for controller nodes
• Cardiff University uses Cat6 for 10G instead of 10G DACs
Key Architecture Challenges
• Cable Lengths
• Cable Types
• Rack layouts
• Differences in Hardware and design
Software Overview
• xCAT
• IBM Spectrum Scale (originally GPFS)
• CentOS 7
• SaltStack
• RDO OpenStack Juno/Kilo
• Icinga
xCAT
• eXtreme Cluster/Cloud Administration Toolkit
• Management of clusters (clouds, HPC, grids)
• Bare-metal provisioning
• Scriptable
• Large-scale management (lights-out, remote console, distributed shell)
• Configures key services based on tables
Why are we using xCAT?
• Provides tight integration with IBM/Lenovo systems
– Automagic discovery
– IPMI integration
• Can manage the Mellanox switches from the CLI
• OCF is very experienced with it, including development experience
xCAT Configuration
• Base images for each machine type
– Highmem/compute
– Controller
– Storage
• Network configuration is defined within xCAT
• Only salt-minion is configured through xCAT
• All software and configuration is done via SaltStack
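As a rough sketch of what driving a deployment from xCAT definitions looks like, here is a node definition stanza of the kind consumed by `mkdef -z`; the node name, BMC address, and osimage name are hypothetical examples, not values from the CLIMB deployment:

```
compute01:
    objtype=node
    groups=compute,all
    arch=x86_64
    mgt=ipmi
    bmc=10.0.1.101
    provmethod=centos7-x86_64-netboot-compute
```

Attributes like `mgt=ipmi` and `provmethod` are what let xCAT drive IPMI power control and netboot provisioning from its tables.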
What is SaltStack?
“Software to automate the management and configuration of any infrastructure or application at scale”
• Uses YAML
• Security controlled by server/client public/private keys
• Daemons run on the master and on the clients (minions)
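As a minimal sketch of the YAML state files Salt uses (the package and service names here are generic examples, not the project's actual states):

```yaml
# /srv/salt/ntp/init.sls -- ensure NTP is installed, enabled, and running
ntp:
  pkg.installed: []
  service.running:
    - name: ntpd
    - enable: True
    - require:
      - pkg: ntp
```

Applying `state.apply ntp` from the master pushes this state to the targeted minions, which is how consistency across installations is kept.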
Why SaltStack?
• Previously used by UoB, where some of the OpenStack configuration had already been started
• Automate the configuration
• Consistency across installations
• Re-usable for future installs
• Repeatable
What is OpenStack?
“To produce the ubiquitous Open Source cloud computing platform that will meet the needs of public and private cloud providers regardless of size, by being simple to implement and massively scalable”
Nova (Compute)
• Manages virtualised server resources
• Live guest migration
• Live VM management
• Security groups
• VNC proxy
• Support for various hypervisors
– KVM, LXC, VMware, Xen, Hyper-V, ESX
APIs supported:
• OpenStack Compute API
• EC2 API
• Admin API
Nova Configuration
• EC2 API has been enabled
• Extra extensions enabled for Ceilometer monitoring
• Ephemeral storage is centrally located on GPFS
• Security groups are controlled by Neutron
• Live migration enabled
• Snapshots are created using the RAW format
• Availability zones distinguish between normal cloud nodes and large memory nodes
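A hedged sketch of how some of these settings map onto `nova.conf` (Juno/Kilo-era option names; the values shown are illustrative, not the project's actual configuration):

```ini
# /etc/nova/nova.conf (illustrative fragment)
[DEFAULT]
# ephemeral/instance storage centrally located on GPFS
instances_path = /gpfs/data/nova
# security groups delegated to Neutron
security_group_api = neutron
firewall_driver = nova.virt.firewall.NoopFirewallDriver
# snapshots created in RAW format
snapshot_image_format = raw
# emit notifications so Ceilometer can monitor instances
instance_usage_audit = True
notification_driver = messagingv2
```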
Neutron (Networking)
• Framework for Software Defined Networking (SDN)
• Responsible for managing networks, ports, routers
• Create/delete L2 networks
• L3 support
• Attach/detach hosts to networks
• Support for SW and HW plugins
– Open vSwitch, OpenFlow, Cisco Nexus, Arista, NCS, Mellanox
Neutron Configuration
• Security groups enabled
• VXLAN used as the network type
• Overlapping IPs are allowed
• DHCP agents increased to 2
• Layer 3 HA enabled
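These choices correspond roughly to the following Juno/Kilo-era Neutron options; this is an illustrative sketch, not the deployment's actual files:

```ini
# /etc/neutron/neutron.conf (illustrative fragment)
[DEFAULT]
allow_overlapping_ips = True
dhcp_agents_per_network = 2
l3_ha = True

# /etc/neutron/plugins/ml2/ml2_conf.ini (illustrative fragment)
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan

[securitygroup]
enable_security_group = True
```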
Glance (Image Registry)
• Image registry, not image repository
• Query for information on public and private disk images
• Register new disk images
• Disk images can be stored in and delivered from a variety of stores
– Filesystem, Swift, Amazon S3
• Supported formats
– Raw, Machine (AMI), VHD (Hyper-V), VDI (VirtualBox), qcow2 (QEMU/KVM), VMDK (VMware), OVF (VMware), and others
Glance Configuration
• The images for Glance are stored on the GPFS filesystem or in Swift
• The scrub data is located locally on the controllers
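Storing images on GPFS reduces, in Glance terms, to pointing the filesystem store at a GPFS path. A hedged sketch (Kilo-era option names; the scrubber path is a hypothetical local location, not taken from the deployment):

```ini
# /etc/glance/glance-api.conf (illustrative fragment)
[glance_store]
# images kept on the shared GPFS filesystem
default_store = file
filesystem_store_datadir = /gpfs/data/glance

[DEFAULT]
# scrubber working data kept on local disk on each controller
scrubber_datadir = /var/lib/glance/scrubber
```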
Keystone (Authentication)
• Identity service provides auth credential validation and identity data
• Token service validates and manages the tokens used to authenticate requests after initial credential verification
• Catalog service provides an endpoint registry used for endpoint discovery
• Policy service provides a rule-based authorisation engine and the associated rule management interface
• Each service can be configured to serve data from a pluggable back-end
– Key-Value, SQL, PAM, LDAP, Templates
Swift (Storage)
• Object server that stores objects
• Storage, retrieval, deletion of objects
• Updates to objects
• Replication
• Modelled after Amazon S3's service
Swift Configuration
• Swift is configured to store its data on GPFS
• The rings for Swift have to be created manually
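Putting Swift's data on a shared filesystem like GPFS mostly means pointing the storage servers' device path at the GPFS fileset and disabling the mount check that assumes local block devices. An illustrative sketch, not the deployment's actual file:

```ini
# /etc/swift/object-server.conf (illustrative fragment)
[DEFAULT]
# object data lives under the dedicated GPFS fileset
devices = /gpfs/swift
# shared filesystem, so skip the per-device mount check
mount_check = false
```

The rings themselves would still be built by hand with the standard `swift-ring-builder` tool.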
Cinder (Block Storage)
• Responsible for managing the lifecycle of volumes and exposing them for attachment
• Enables additional persistent block storage to be attached to virtual machines
• Allows multiple volumes to be attached per virtual machine
• Supports the following back-ends
– iSCSI
– RADOS block devices (e.g. Ceph)
– NetApp
– GPFS
• Similar to the Amazon EBS service
Cinder Configuration
• The block devices within OpenStack are stored on GPFS
• Copy on Write is enabled, a feature available in the GPFS driver for OpenStack
• The specific GPFS storage pool to be used for Cinder is specified
• OpenStack provides GPFS drivers for Cinder
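A hedged sketch of what these choices look like against the Juno/Kilo-era Cinder GPFS driver (paths and pool name follow the storage layout described later in this deck; treat the fragment as illustrative, not the actual config):

```ini
# /etc/cinder/cinder.conf (illustrative fragment)
[DEFAULT]
volume_driver = cinder.volume.drivers.ibm.gpfs.GPFSDriver
# volumes live under the GPFS mount
gpfs_mount_point_base = /gpfs/data/cinder
# enable Copy on Write volume creation from Glance images
gpfs_images_share_mode = copy_on_write
gpfs_images_dir = /gpfs/data/glance
# pin volumes to a specific GPFS storage pool
gpfs_storage_pool = nlsas
```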
Heat (Orchestration)
• Declarative, template-defined deployment
• Compatible with AWS CloudFormation
• Many CloudFormation-compatible resources
• Templating using HOT or CFN
• Controls complex groups of cloud resources
• Multiple use cases
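For a flavour of the HOT template format, a minimal example that boots a single server (the image and flavor names are hypothetical):

```yaml
# minimal HOT template: declaratively boot one Nova server
heat_template_version: 2014-10-16

resources:
  server:
    type: OS::Nova::Server
    properties:
      image: centos-7
      flavor: m1.small
```

Heat takes this declarative description and drives the Nova API to converge on it, rather than the user scripting the calls imperatively.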
Horizon (Dashboard)
• Provides a simple self-service UI for end users
• Basic cloud administrator functions
• Thin wrapper over the APIs, no local state
• Out-of-the-box support for all core OpenStack projects
– Nova, Glance, Swift, Neutron
• Anyone can add a new component
• Visual and interaction paradigms are maintained
Other Useful Projects
• OpenStack
– Ceilometer
– Trove
– Sahara
– Ironic
– Magnum
• Dependencies
– RabbitMQ
– MariaDB Galera Server
– HAProxy
– keepalived
Why not RDO Manager/TripleO?
• Ironic was still in technology preview
• RDO Manager was not available in Juno/Kilo
• RDO Manager would conflict with xCAT and its DHCP configuration
• Issues with static IPs for machines until Mitaka; this would be a problem for GPFS
• At the time, it only worked in specific scenarios
• A provisioning system would still be required to install the GPFS servers
What is Spectrum Scale?
• IBM Spectrum Scale (previously known as GPFS)
• Parallel filesystem
– A single POSIX filesystem spans multiple block devices
– Concurrent filesystem access from multiple clients
• Feature-rich (tiering, ILM, replication, snapshotting)
• Distributed architecture
Why are we using Spectrum Scale?
• Building-block scalable solution
• Large capacity
– Supports up to 8 Yottabyte filesystems (8422162432 PB)
– Supports up to 8 Exabyte files (8192 PB)
• Proven technology (Spectrum Scale started in 1993)
• Highly parallel
– Scale up
– Scale out
• Native client access over InfiniBand
• IBM is actively supporting development for OpenStack and involving the community
• Storage tiering
OpenStack links with GPFS
• Spectrum Scale provides OpenStack drivers (Juno)
• Spectrum Scale provides snapshotting, Copy on Write, and large concurrent file access
• Has detailed documentation on Swift configuration
• Road maps to provide further support (including Manila)
Storage Capacity and Configuration
• 533TB at Swansea and Warwick
• 399TB at Cardiff
• Mounted at /gpfs
• Uses RDMA
• The Cinder driver directly uses the nlsas storage pool in /gpfs/data/cinder
• A separate inode space is required for Swift, with a new fileset in /gpfs/swift
• Ephemeral storage for instances is also on GPFS, located in /gpfs/data/nova
• Finally, the Glance store is located in /gpfs/data/glance
Upgrade to Kilo
• Migration of the Salt configs was done internally at OCF
• The first run of the upgrade was implemented on Warwick
• Once fine-tuned, a re-installation of the system on Warwick
• The same re-installation was implemented on Swansea
• Finally, migration of Cardiff from Juno to Kilo
– Test bed at OCF
– Challenges
• Kilo required FQDNs for hosts
• The updated Cinder GPFS driver required manual intervention
Future Development
• Collaborative work with the openstack-salt team
• Modularise the config
• Add support for the Keystone V3 API
• Move Keystone from eventlet to a web-server-based deployment
• Add support for OpenStack versions after Kilo, such as Liberty and Mitaka
• Maybe use TripleO to do the OpenStack deployment
Contact
• Email: [email protected]
• IRC: arif-ali
• Resources: http://www.github.com/arif-ali/openstack-lab
• Support: [email protected]