OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman


Upload: netways

Post on 06-Jul-2015

DESCRIPTION

Why does OpenNebula fit the needs of an ISP? What customizations have we made to fit our needs? What challenges have yet to be overcome? During the deployment of our clouds we have gained quite a bit of experience with OpenNebula. We will share practical tips on deployment (HOOKS FTW), and explore the f(e)uture(s) of OpenNebula.

TRANSCRIPT

Page 1: OpenNebula Conf 2014 | ONE BIT to rule them all - Stefan Kooman

to rule them all

Stefan Kooman (@basseroet)

Page 2

● BIT is a business-to-business internet service provider specialized in colocation and managed hosting

● BIT delivers a high quality IT and internet infrastructure for demanding customers

● Reliability is the focus of BIT’s services (redundancy is the keyword)

● Operates its own datacenters (Ede, NL) and network (NL, DE, EN)

● IPv6 on all services since 2004!

● ISO 27001 certification (All Services)

BIT

Page 3

AS12859

~900 peers

3 Transits

BIT Network

Page 4

● Customers want to know where their data gets stored (compliance / privacy concerns)

● Alternative for shared hosting (customers with special requirements)

● Hybrid Solutions: bare metal servers & BIT VMs possible

● Availability (redundancy)

● ISO 27001

Why choose BIT? (instead of $(PUBLIC_)CLOUD)

Page 5

● Webshops

● Mission Critical Servers

● SMB Infrastructure servers

● Monitoring servers

● MongoDB (because no presentation is complete without mentioning it, Carlo Daffara, OpenNebulaConf, Berlin 2014)

What runs in our clouds?

Page 6

● Simple but powerful / flexible (KISS)

● Works out of the box

● Reliable

● (API) Interface(s)

● OSS

● Great community / development organization (OpenNebula Systems)

Why we choose ONE

Page 7

Cloud setups

Page 8

Cloudy?

Page 9

Cloud security

Security (confine risks) → Separate VLANs, Storage, Servers, etc.

Protect against “virtual machine escape” attack (worst case scenario)

Don't break production while exploring new features / setups (test-cloud)

Page 10

BIT-Cloud (Migrating → ONE)

Before

● Ad-hoc management of KVM hypervisors (virt-manager)

● Mixing of BIT / Customer VMs

● No easy overview of resources (capacity / business continuity planning)

● No integration with BACE (BIT Administration & Configuration Engine)

Page 11

BIT-Cloud (eat your own dog food)

After

● All VMs centrally managed

● Integration With BACE (hooks, XML-RPC)

● Web interface (Sunstone) available for (remote) management (GUI tasks) / low-level VM troubleshooting (GRML FTW!)

Page 12

BIT-Cloud (How we migrated it 1/2)

Migration Process (non pxe-based VMs)

● Create VM Templates based on old libvirt xml (virsh dumpxml domain)

● Create Images

● dd if=/old/vm/disk.img of=/var/lib/one/datastores/id/hash bs=1M

● Destroy _and_ Undefine old VM

● Instantiate VM Template

● Profit!
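The non-PXE migration steps can be sketched as a small shell script. This is a sketch under assumptions, not BIT's actual tooling: the domain name, paths and template name are hypothetical placeholders, the `run` and `migrate_vm` helpers are invented for illustration, and the old VM is stopped before its disk is copied (the slide lists the copy first). With `DRY_RUN=1`, the default here, the commands are only printed.

```shell
#!/bin/sh
# Sketch of the non-PXE migration steps. All names and paths are
# hypothetical placeholders, not values from the talk.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, do not run them

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_vm() {
    domain="$1"      # libvirt domain name of the old VM
    old_disk="$2"    # path of the old disk image
    new_disk="$3"    # target path inside the ONE datastore
    template="$4"    # ONE VM template, written by hand from the dumped XML

    # 1. Dump the old libvirt definition (goes to stdout; redirect it to a
    #    file in real use) as the reference for the ONE VM template.
    run virsh dumpxml "$domain"
    # 2. Power off the old VM so the disk is consistent while copying.
    run virsh destroy "$domain"
    # 3. Copy the disk into the datastore, as on the slide.
    run dd "if=$old_disk" "of=$new_disk" bs=1M
    # 4. Remove the old definition so it cannot be started by accident.
    run virsh undefine "$domain"
    # 5. Start the VM from the ONE template.
    run onetemplate instantiate "$template"
}

migrate_vm web01 /old/vm/disk.img /var/lib/one/datastores/ID/HASH web01-template
```

Reviewing the printed commands first and only then rerunning with `DRY_RUN=0` keeps the destructive steps (`destroy`, `undefine`) deliberate.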

Page 13

BIT-Cloud (How we migrated it 2/2)

Migration Process (pxe-based VMs) Cloud style!

● Create VM Templates based on old libvirt xml (virsh dumpxml domain)

● Destroy _and_ Undefine old VM

● Instantiate VM Template

● Profit!

Page 14

Over and out (get rid of your junk)

Good bye junk, Good bye SUN, welcome to a bright cloudy day … ehh whut?!

Page 15

A bright cloudy day (it actually does exist)

Page 16

Customer Portal: interface to securely manage all services (DNS, MAIL, VMs, MONITORING, BILLING, etc.)

● For now only

→ stop, start, reboot

→ Out of band management: Console access (KVM)

● Future

→ full-fledged provisioning / monitoring / metrics

create, destroy, clone, resize capacity, etc.

ONE & BIT Integration

Page 17

Multiple Datastores

● NetApp Qtree (NFS)

→ Provide separation between customers (“partitioning”)

→ Billing (IOPS / Disk Space)

→ Tiering possible (SAS, SSD/SATA)

→ Dedup: 41% savings on customer images, 61% on BIT images

● Future

→ Distributed Object Storage (CEPH, Gluster)

ONE – Storage

Page 18

Active checks on Front-end / Hosts (bit-monitoring daemon livestatus ↔ API)

● Vital ONE functions (oned, sched, VM status, Datastore capacity, etc.)

● Hypervisor hardware, Network Bonds

Passive checks

● Web services (VIPs)

Icinga detects NTP out-of-sync issues within 1 minute after VM live-migration!

ONE – Monitoring: Icinga FTW!
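A datastore capacity check of the kind listed above could look roughly like this. It is a minimal sketch, not BIT's bit-monitoring daemon: the function name and the 80% / 90% thresholds are hypothetical, and how the used and total figures are obtained from ONE is left out. Only the exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is the standard Nagios/Icinga plugin contract.

```shell
#!/bin/sh
# Sketch of an Icinga/Nagios-style check. Thresholds and the function name
# are hypothetical; the return codes follow the plugin convention.

check_datastore_usage() {
    used_mb="$1"
    total_mb="$2"
    warn_pct="${3:-80}"   # hypothetical default warning threshold
    crit_pct="${4:-90}"   # hypothetical default critical threshold

    # Reject empty or non-numeric input instead of guessing.
    for value in "$used_mb" "$total_mb"; do
        case "$value" in
            ''|*[!0-9]*) echo "UNKNOWN - non-numeric input"; return 3 ;;
        esac
    done
    if [ "$total_mb" -eq 0 ]; then
        echo "UNKNOWN - total capacity is zero"
        return 3
    fi

    pct=$(( used_mb * 100 / total_mb ))   # integer percentage, rounded down
    if [ "$pct" -ge "$crit_pct" ]; then
        echo "CRITICAL - datastore ${pct}% used"
        return 2
    elif [ "$pct" -ge "$warn_pct" ]; then
        echo "WARNING - datastore ${pct}% used"
        return 1
    fi
    echo "OK - datastore ${pct}% used"
    return 0
}

# Example call: check_datastore_usage 850 1000
```

A wrapper script would end with `exit $?` after the call, so that Icinga picks up the state from the exit code and the first output line.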

Page 19

● Graph as many metrics as possible (because we can)

→ OS, Apps, Storage, Network

● Trend analysis

● Finding performance issues

ONE – Graphing: Munin FTW!

Page 20

ONE – Graphing: Munin FTW!

Page 21

● Billing network traffic (volume / bandwidth)

● Billing model of pay per use instead of a three-month contract (possible in future)

ONE – Accounting: one.vm.monitoring

Page 22

● Provisioning Requirements (VM Templates)

→ Datastores

→ Clusters

→ Hosts

● Custom Attributes (awesome \o/)

→ SCHED_REQUIREMENTS="WINDOWSLICENSED=\"TRUE\""

→ SCHED_REQUIREMENTS="DATACENTER=\"BIT-1\""

→ SCHED_DS_REQUIREMENTS="NAME=system_ds_1_kvm_cluster"

ONE – Features: Filtering with the scheduler
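The escaped quotes in these expressions are easy to get wrong by hand, so a tiny helper can generate the line for a VM template. This is an illustrative sketch: `sched_requirements` is a hypothetical helper, not an OpenNebula command, and it assumes that several conditions may be combined with `&` in the requirement expression. The matching custom attribute (for example WINDOWSLICENSED="TRUE") still has to be set on the hosts for the scheduler to find a match.

```shell
#!/bin/sh
# Hypothetical helper: build a SCHED_REQUIREMENTS line from KEY VALUE pairs.

sched_requirements() {
    expr=""
    while [ "$#" -ge 2 ]; do
        # One condition, with the inner quotes escaped: KEY=\"VALUE\"
        term="$(printf '%s=\\"%s\\"' "$1" "$2")"
        # Join conditions with " & " (logical AND, assumed syntax).
        expr="${expr:+$expr & }$term"
        shift 2
    done
    printf 'SCHED_REQUIREMENTS="%s"\n' "$expr"
}

sched_requirements WINDOWSLICENSED TRUE
sched_requirements WINDOWSLICENSED TRUE DATACENTER BIT-1
```

The first call reproduces the WINDOWSLICENSED line from the slide; the second shows both conditions in one expression.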

Page 23

VM_HOOK = [
   name      = "notify_running",
   on        = "RUNNING",
   state     = "ACTIVE",
   lcm_state = "RUNNING",
   command   = "notify_running.php",
   arguments = "$ID $TEMPLATE $PREV_STATE $PREV_LCM_STATE" ]

● Executes the script “notify_running.php” to register the VM in BACE and/or update its Host / Datacenter location as soon as the VM enters the RUNNING state

ONE – Features: Hooks
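On the receiving end, a hook script gets the four arguments in the order given in `arguments`. The skeleton below is a sketch in shell rather than the PHP script from the talk, and it rests on one assumption that should be checked against the OpenNebula version in use: that `$TEMPLATE` arrives as base64-encoded VM XML. The helper names are hypothetical, and the real work (registering the VM in BACE) is replaced by a log line.

```shell
#!/bin/sh
# Hypothetical skeleton of a VM_HOOK script. Assumes $TEMPLATE is passed as
# base64-encoded XML; verify this for your OpenNebula version.

xml_first_value() {
    # Print the text of the first <TAG>...</TAG> element read from stdin.
    # A crude sed-based extraction, good enough for simple flat values.
    sed -n "s:</$1>.*::; s:.*<$1>::p" | head -n 1
}

handle_running() {
    vm_id="$1"
    template_b64="$2"
    prev_state="$3"
    prev_lcm_state="$4"

    vm_name="$(printf '%s' "$template_b64" | base64 -d | xml_first_value NAME)"

    # Placeholder for the real action, e.g. an API call to the inventory system.
    echo "VM $vm_id ($vm_name) is RUNNING (previous: $prev_state/$prev_lcm_state)"
}

# Run only when called with the four hook arguments.
if [ "$#" -ge 4 ]; then
    handle_running "$@"
fi
```

For anything beyond a flat value such as NAME, a real XML parser is a safer choice than `sed`.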

Page 24

VM_HOOK = [
   name      = "send_gratuitous_arp",
   on        = "RUNNING",
   state     = "ACTIVE",
   lcm_state = "RUNNING",
   command   = "segrarp.sh",
   arguments = "$ID $TEMPLATE $PREV_STATE $PREV_LCM_STATE",
   remote    = "yes" ]

Send out a “gratuitous ARP” on the hypervisor as soon as the VM enters the RUNNING state

→ Update the forwarding tables of upstream switches

→ Update the ARP cache of routers (only needed if the MAC address changed)

ONE – Features: Hooks

Page 25

VM_HOOK = [
   name      = "vhid_flow_fix",
   on        = "RUNNING",
   state     = "ACTIVE",
   lcm_state = "RUNNING",
   command   = "vhidflowfix.sh",
   arguments = "$ID $TEMPLATE $PREV_STATE $PREV_LCM_STATE",
   remote    = "yes" ]

The ONE / Open vSwitch OpenFlow rules that prevent ARP cache poisoning also prevent HA setups (VRRP / CARP) from working correctly

→ Add an OpenFlow rule for the VRID / VHID MAC address (00-00-5E-00-01-XX)

ONE – Features: Hooks
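The XX in the virtual MAC address is the VRID / VHID written as two hex digits, so the address and a matching allow rule can be derived from the ID. The following is a sketch only, not the vhidflowfix.sh from the talk: the bridge name, port number and priority are hypothetical, the priority has to be higher than that of the existing drop rule, and the `ovs-ofctl add-flow` command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: derive the VRRP/CARP virtual MAC and print an allow rule for it.

vrrp_mac() {
    # Virtual MAC for a VRID/VHID in the range 1..255: 00:00:5e:00:01:XX
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 255 ]; then
        return 1
    fi
    printf '00:00:5e:00:01:%02x\n' "$1"
}

vrrp_flow_cmd() {
    bridge="$1"               # Open vSwitch bridge (hypothetical name)
    port="$2"                 # OpenFlow port number of the VM interface
    vrid="$3"
    priority="${4:-50000}"    # hypothetical; must outrank the drop rule
    mac="$(vrrp_mac "$vrid")" || return 1
    # Allow frames sourced from the virtual MAC on the VM's port.
    echo "ovs-ofctl add-flow $bridge priority=$priority,in_port=$port,dl_src=$mac,actions=normal"
}

vrrp_flow_cmd br0 12 10
```

For VRID 10 on port 12 of br0 this prints a rule matching `dl_src=00:00:5e:00:01:0a`.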

Page 26

● Live migration over dedicated Network Interfaces

deploy_id=$1
dest_host=$2
HOSTNAME=$(cut -f1 -d. <<< "$2")
DOMAIN=$(cut -f2- -d. <<< "$2")
MIGSUF="migration"
DEST_MIGR_HOST=$HOSTNAME-$MIGSUF.$DOMAIN
exec_and_log "virsh --connect $LIBVIRT_URI migrate --live $deploy_id $QEMU_PROTOCOL://$DEST_MIGR_HOST/system --migrateuri tcp://$DEST_MIGR_HOST" "Could not migrate $deploy_id to $dest_host"

● Example

host1             IN    A    172.17.17.1
host1-migration   IN    A    10.10.10.1

ONE – Easily hackable (../remotes/vmm/kvm/kvmrc)
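The host name rewriting in that snippet can be isolated and tried on its own. The function below is a hypothetical helper, written with POSIX parameter expansion instead of `cut` and here-strings, and example.com is a placeholder domain. It also handles a destination name without a domain part, where `cut -f2- -d.` would return the whole name and yield host1-migration.host1.

```shell
#!/bin/sh
# Hypothetical helper: map a host name to its migration-network name,
# e.g. host1.example.com -> host1-migration.example.com

migration_host() {
    fqdn="$1"
    suffix="${2:-migration}"
    host="${fqdn%%.*}"              # everything before the first dot
    case "$fqdn" in
        *.*)
            domain="${fqdn#*.}"     # everything after the first dot
            echo "$host-$suffix.$domain"
            ;;
        *)
            # Short name without a domain: only append the suffix.
            echo "$host-$suffix"
            ;;
    esac
}

migration_host host1.example.com
migration_host host1
```

The resulting name only helps if DNS (or /etc/hosts) resolves it to the address on the dedicated migration network, as in the example records above.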

Page 27

ONE – OneGate

● European Kerio Cloud VMs run on ONE clouds @BIT

We are working on automating / speeding up provisioning process

● Provisioning through:

ONE API (XML-RPC)

OS (ONE contextualized Golden Image, and OneGate as an asynchronous communication channel)

● Configuration management:

Kerio API

Page 28

ONE – OneFlow (yet to be implemented)

● Autoscaling of VMs based on elasticity rules

→ CPU alone is probably not enough as an indicator (cpufreq scaling on the hypervisor)

● Should work well in a load-balanced environment

→ Need to integrate with F5 somehow

Page 29

● Set fixed machine type (QEMU) for MS Windows(TM) VMs to avoid breaking during Virtual Hardware upgrades

● Use Virtio (DISK / NIC) whenever you can

→ pfSense 2.1.5: virtio (1 Gbps) vs Intel e1000 (300 Mbps)

● Group VMs together that interconnect a lot: webserver(s) / database(s) (saves inter-host bandwidth, increases performance, lowers latency)

● Expose CPU flags when you need them (e.g. HPC / Rendering)

● Disable KSM (Kernel Samepage Merging) if you need all CPU cycles

Do's and Don'ts (lessons learned so far)

Page 30

Group VMs on Host

~ 2 GB/s or ~ 17 Gb/s

Page 31

ONE - Test-cloud

Page 32

● 20 Gbps ethernet Migration links

→ Yes, live-migration goes much faster :-) (~600 MB/s)

● VLAN management on Brocade switch done by ONE (NETCONF / RFC 6241)

ONE - Test-cloud

Page 33

● Administer ALL IPs in ONE

→ Enable IP aliases ((web)servers in need of extra IPs)

→ Possibility to have OpenFlow “arp cache poisoning” and “IP Hijack prevention” rules enabled

→ Contextualization adjustments needed to handle extra IPs

● Network Integration (SDN)

ONE – Challenges: yet to overcome

Page 34

● Inter OpenNebula Clouds: Ultimate Hybrid

→ Complementary to Federation: separate administrative boundaries

● ONE CX (Cloud Exchange)

ONE – Future (let's connect them all)

Page 35

#1727: Resize disk images

#2347: [anti-]affinity functionality for VMs to be placed in the same physical host

#2650: Re-read oned.conf on reload

#3181: IPv6 hijacking prevention

#2921: (Per VM) DISKIO IO information in Sunstone

#2648: ACL edit/view wizard

#3015: Multi (domain) LDAP authentication support

#2925: Ability to filter in Sunstone on resource usage (CPU, RAM, NETWORK, DISK)

(and thanks for implementing “Multiple Datastores”, Federation, Clone between datastores, VLAN trunks, Address Range (AR), and countless others)

ONE – I want (feature requests, aka Sinterklaasverlanglijstje, Dutch for “Sinterklaas wish list”)

Page 36

Thanks for your attention!

And thank you very much, OpenNebula Systems / Netways!