several nines cluster control.pptx
TRANSCRIPT
Our guest speaker will be Riaan Nolan of Foodpanda/Hellofood, Rocket Internet’s global online food delivery marketplace, operating in over 40 countries.
● eCommerce infrastructure challenges - AIA Case Study (i)
● Provisioning highly available environments across multi-server and multi-AZs
● Building and maintaining configuration management systems such as Puppet
● Enabling self-service infrastructure services to internal dev teams
● Health and performance monitoring
● Elastic scaling & Automating failure handling
● Disaster recovery
http://www.severalnines.com/sites/default/files/AIA_Case_Study.pdf (i)
eCommerce infrastructure challenges - AIA Case Study
Before Cluster Control
● Split Brain o.0
- One node started with an empty gcomm:// address, thank you Puppet :)
● Track back your data.
- Use anything (call-centre logs, emails sent via AWS SES) to recreate your data.
● Disasters are expensive in time & money for EVERYONE!
- Everything was gone: time, orders, changes on staging, new products, QA tests, the whole kit.
● Be agile and move quickly NOW!
- Restore to a trusted backup point, isolate the faulty data, extract everything you can, product SKUs etc.
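The empty gcomm:// failure above lends itself to an automated pre-flight check. A minimal Python sketch, assuming the Galera settings live in a my.cnf-style text file and the option is spelled wsrep_cluster_address; the function name bootstraps_new_cluster is hypothetical and not part of Puppet, Galera or Cluster Control. It flags a config whose cluster address lists no peers, which is the value that makes a node bootstrap a new cluster instead of joining the existing one.

```python
import re

# Matches e.g.  wsrep_cluster_address=gcomm://10.0.0.1,10.0.0.2
# Commented-out lines do not match because the option name must come first.
_ADDRESS_RE = re.compile(r"^\s*wsrep_cluster_address\s*=\s*(\S*)")


def bootstraps_new_cluster(config_text: str) -> bool:
    """Return True if the last wsrep_cluster_address setting lists no peers.

    Hypothetical helper: an address of plain "gcomm://" tells a Galera node
    to start its own cluster, which is how the split brain began.
    """
    address = None
    for line in config_text.splitlines():
        match = _ADDRESS_RE.match(line)
        if match:
            # Later settings override earlier ones, so keep the last match.
            address = match.group(1).strip("\"'")
    if address is None or not address.startswith("gcomm://"):
        return False
    # Drop any "?option=value" suffix before looking at the peer list.
    peers = address[len("gcomm://"):].split("?", 1)[0]
    return peers == ""


if __name__ == "__main__":
    bad = "[mysqld]\nwsrep_cluster_address=gcomm://\n"
    good = '[mysqld]\nwsrep_cluster_address="gcomm://10.0.0.1,10.0.0.2"\n'
    print(bootstraps_new_cluster(bad))   # True
    print(bootstraps_new_cluster(good))  # False
```

A check like this could run in a sandbox or before a node is (re)started by configuration management, so a templating mistake is caught before it reaches a production node.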
What have I learned
Workflow Before Cluster Control
After Cluster Control
eCommerce infrastructure challenges - AIA Case Study
● Look for help!
● If 20% sacrifice can fix 100% of your problems, go!
- Puppet does not have to control Cluster Control 100% and why should it?
- Getting a commercial license VS. actually hiring a DBA that knows Puppet & Galera.
● Fix bugs and refine your processes.
- Experience is never a bad thing, and it’s how you play the cards that you were dealt.
- Re-Create Disasters in a Sandbox
What have I learned
Workflow After Cluster Control
Provisioning highly available environments across multi-server and multi-AZs
http://aws.amazon.com/cloudformation
http://aws.amazon.com/cloudformation/aws-cloudformation-templates
https://docs.puppetlabs.com
https://forge.puppetlabs.com
HIERA & FACTER
Puppet Manifests
Hardware Stack
Building and maintaining configuration management systems such as Puppet
HIERA & FACTER
Puppet Manifests
Hardware Stack
https://help.github.com
https://raw.githubusercontent.com/nerdgirl/git-cheatsheet-visual/master/gitcheatsheet.png (Wallpaper)
https://docs.puppetlabs.com
https://forge.puppetlabs.com
Enabling self-service infrastructure services to internal dev teams
WTF!
Enabling self-service infrastructure services to internal dev teams - Seriously o.0
Create End-Points in your Workflow
● Devs and PMs are your most demanding customers. SALE NOW ON!
- And they are also your most prized possession. Keep them happy!
● They want everything protected, with little holes.
- They will ask you to .htaccess-protect staging, but to let requests from the Payment Gateways through without .htaccess.
● Keeping them happy could be as simple as an SSH tunnel.
- In your SSH config add something like this:
LocalForward 33306 MyRDSInstanceReplica.mydomain.com:3306
● A Vagrant or Docker Environment.
- Use your existing Puppet code to spin up Docker or Vagrant instances.
● Create Environments for everyone to play in.
- So what if you already have staging, create dev, dev2, qa, beta, whatever they need.
● Version Control
- Keep everything under Version Control, Duh!
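The SSH tunnel bullet above shows only the LocalForward line. As a rough illustration of the full stanza a developer might need, here is a small Python sketch that renders one. The alias staging-db and the host gateway.mydomain.com are made-up placeholders, since the slides do not say which SSH host the tunnel goes through; Host, HostName and LocalForward are standard ssh_config keywords.

```python
def ssh_tunnel_stanza(alias, gateway_host, local_port, remote_host, remote_port):
    """Render an ssh_config Host block that forwards a local port.

    Connections to localhost:<local_port> are forwarded through
    gateway_host to remote_host:<remote_port>.
    """
    lines = [
        f"Host {alias}",
        f"    HostName {gateway_host}",
        f"    LocalForward {local_port} {remote_host}:{remote_port}",
    ]
    return "\n".join(lines) + "\n"


stanza = ssh_tunnel_stanza(
    alias="staging-db",                   # hypothetical alias
    gateway_host="gateway.mydomain.com",  # hypothetical SSH host
    local_port=33306,                     # from the slide
    remote_host="MyRDSInstanceReplica.mydomain.com",
    remote_port=3306,
)
print(stanza)
```

With a stanza like this in ~/.ssh/config, running ssh -N staging-db keeps the tunnel open, and a MySQL client pointed at 127.0.0.1 port 33306 reaches the replica.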
THE END.
Health and performance monitoring
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/WhatIsCloudWatch.html
http://www.elasticsearch.org/overview/kibana
http://newrelic.com
https://www.icinga.org
http://www.severalnines.com/clustercontrol
Elastic scaling & Automating failure handling
"MyFleetAutoScalingGroup" : {
  "Type" : "AWS::AutoScaling::AutoScalingGroup",
  "Properties" : {
    "AvailabilityZones" : { "Fn::GetAZs" : { "Ref" : "AWS::Region" } },
    "LaunchConfigurationName" : { "Ref" : "MyFleetLaunchConfig" },
    "MinSize" : { "Ref" : "InstanceCountMin" },
    "MaxSize" : { "Ref" : "InstanceCountMax" },
    "DesiredCapacity" : { "Ref" : "InstanceCountDesired" },
    "LoadBalancerNames" : [
      { "Ref" : "ProductionPublicElasticLoadBalancer" },
      { "Ref" : "StagingPublicElasticLoadBalancer" }
    ]
  }
}
"MyFleetPublicElasticLoadBalancer" : {
  "Type" : "AWS::ElasticLoadBalancing::LoadBalancer",
  "Properties" : {
    "CrossZone" : true
  }
}
"MyFleetLaunchConfig" : {
  "Type" : "AWS::AutoScaling::LaunchConfiguration",
  "Properties" : {
    "UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
      "#!/bin/bash\n",
      "# e.g. fleetserver-1104eb3a.mydomain.com\n",
      "HOSTNAME=\"fleetserver-$(curl -s http://169.254.169.254/latest/meta-data/instance-id | cut -d '-' -f2).mydomain.com\"\n",
      "echo \"${HOSTNAME}\" > /etc/hostname\n",
      "hostname \"${HOSTNAME}\"\n",
      "apt-get update\n",
      "apt-get install -y puppet knockd\n",
      "knock puppet.mydomain.com 7777 3333\n",
      "puppet agent -tv --server puppet.mydomain.com --waitforcert 300 --configtimeout 300\n"
    ] ] } }
  }
}
node /^fleetserver/ {
  class { '::myfleet': }
}
service { 'myservice':
  ensure => running,
}
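The launch configuration names each instance after its EC2 instance ID, and the Puppet node definition then matches on that name. A minimal Python sketch of the same derivation, for illustration only: fleet_hostname is a hypothetical helper that mirrors the cut -d '-' -f2 step of the UserData script, and NODE_PATTERN copies the /^fleetserver/ regex from the node definition.

```python
import re


def fleet_hostname(instance_id, prefix="fleetserver", domain="mydomain.com"):
    """Build the hostname the UserData script would set for an instance.

    Mirrors `cut -d '-' -f2`: take the second dash-separated field of the
    instance ID (for "i-1104eb3a" that is "1104eb3a").
    """
    parts = instance_id.split("-")
    if len(parts) < 2 or not parts[1]:
        raise ValueError(f"unexpected instance ID: {instance_id!r}")
    return f"{prefix}-{parts[1]}.{domain}"


# Same regex as the Puppet node definition: node /^fleetserver/ { ... }
NODE_PATTERN = re.compile(r"^fleetserver")

hostname = fleet_hostname("i-1104eb3a")
print(hostname)                             # fleetserver-1104eb3a.mydomain.com
print(bool(NODE_PATTERN.search(hostname)))  # True
```

Because every instance launched by the Auto Scaling group gets a name starting with fleetserver, a single node definition covers the whole fleet, however many instances scaling adds or removes.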
Disaster recovery
"MyRDSInstance" : {
  "Type" : "AWS::RDS::DBInstance",
  "Properties" : {
    "MultiAZ" : "true"
  },
  "DeletionPolicy" : "Snapshot"
}
"Conditions" : {
  "CreateReadReplica" : { "Fn::Equals" : [ { "Ref" : "RDSReadReplica" }, "true" ] }
}
"MyRDSInstanceReplica" : {
  "Type" : "AWS::RDS::DBInstance",
  "Condition" : "CreateReadReplica",
  "Properties" : {
    "SourceDBInstanceIdentifier" : { "Ref" : "MyRDSInstance" }
  }
}
Re-cap (verb, /rēˈkap/): state again as a summary; recapitulate. "a way of recapping the story"
● Disasters are expensive in time & money for EVERYONE!
- Everything was gone: time, orders, changes on staging, new products, QA tests, the whole kit.
● Be agile and move quickly NOW!
- Restore to a trusted backup point, isolate the faulty data, extract everything you can, product SKUs etc. Never import bad data!
● If 20% sacrifice can fix 100% of your problems, go! Go Now!
- Puppet does not have to control Cluster Control 100% and why should it?
- Getting a commercial license VS. actually hiring a DBA that knows Puppet & Galera
● Log everything. Seriously EVERYTHING!
● Devs and PMs are your most demanding customers. SALE NOW ON!
- And they are also your most prized possession. Keep them happy!
● Use Puppet and, if you can, CloudFormation. Everything must be in code! Living the dream .. <3
- If your infrastructure, applications and configurations are in code, you are only one commit away from a fix.
● Back up all the things! One must back up xXx
- Backup, BACKUP, BAAACCCKKKUUUPPP!!!
THANK YOU!
Who is using Cluster Control?