Puppet Camp Boston 2014: Orchestrating Infrastructure Change Using Puppet Rake, MCollective, LM and...
Post on 14-May-2015
Application Deployment Orchestration with Puppet and Jenkins
Anton Gurov, Chaminda Delpagodage
August 20, 2014
About Us
Chaminda Delpagodage
↘ Paydiant Technical Operations Team
↘ Release Engineering, Systems Administration, Automation
↘ linkedin.com/in/chamindad
Anton Gurov
↘ Paydiant Technical Operations Team
↘ Infrastructure, Systems Administration, Security
↘ linkedin.com/in/antongurov
Cloud-based mobile wallet solution
Open ecosystem for mobile payments, offers and loyalty
Completely white-label
“Bank grade” platform of shared services
↘ SaaS
↘ Secure SDKs for iPhone and Android
Top tier investors and well capitalized
Paydiant Puppet Use
Puppet Enterprise (PE) users since day one
100% PE coverage of Paydiant platform
↘ PE handles everything after instance bootstrap
Multiple environments actively managed by PE
↘ 4 Puppet Masters in multiple datacenters and security zones
↘ 8 Environments
Licensed node count doubling every year
[Chart: nodes under management (hosts, axis 0 to 900) by year: 2011, 2012, 2013, 2014 (estimated by year-end)]
Paydiant Puppet Use
‘11-12 – Bi-annual production platform releases
↘ Waterfall – major platform change
↘ Big outage – 1-2 days on the weekend
‘13-14 – Transition to daily/weekly non-production and monthly production releases
↘ Agile – smaller platform changes
↘ Zero-downtime deployment
↘ 100% Production release success rate since inception
Heavy usage of Puppet Dashboard, Puppet APIs and Jenkins
Puppet Dashboard as data repository
Why Dashboard?
↘ Visual, flexible, powerful (if used right)
↘ Allows for business data edits by teams unfamiliar with Puppet
↘ Hiera not available at the time
Decided early on to keep Puppet code and data separate
Came up with our own Dashboard pattern – “Classes, Parameters and Supergroups”
[Diagram. Puppet Module: Code. Puppet Dashboard: Business Data, Puppet Module Parameters]
Puppet Dashboard as data repository
Classes, Parameters and Supergroups pattern overview
[Diagram. Groups: supergroup_type_A; class_A, class_B, class_C, …; parameters_X, parameters_Y, parameters_Z, … Nodes: node 1, node 2, node 3, node 4, … node X]
[Diagram. Groups: supergroup_type_B; class_A, class_B, class_C, …; parameters_X, parameters_Y, parameters_Z, … Nodes: node 1, node 2, node 3, node 4, … node X]
Puppet Dashboard as data repository
Class building block
Group name prefixed with class_
Contains Puppet class and some default variables/parameters for the class
[Diagram. Groups: class_A (def: default params; incl: class A), class_B (def: default params; incl: class B), class_C (def: default params; incl: class C), … Classes: class A, class B, class C, …]
Puppet Dashboard as data repository
Class building block - example
Puppet Dashboard as data repository
Parameters building block
Group name prefixed with parameters_
Only contains data and data overrides
Arbitrary hierarchy levels
Allows for inheritance and reuse
[Diagram. parameters_X (def: default params); parameters_X_1 (incl: parameters_X; def: params overrides; def: additional params); parameters_X_2 (incl: parameters_X; def: params overrides; def: additional params); supergroup_A, supergroup_B, supergroup_C]
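The inheritance and override behaviour of parameters_ groups can be sketched roughly as follows. This is a simplified model for illustration only, not Puppet Dashboard's actual resolution logic; the `ParamGroup` class, its `resolve` method and the sample keys (`db_host`, `db_port`, `pool_size`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParamGroup:
    """Hypothetical stand-in for a Dashboard 'parameters_' group."""
    name: str
    params: Dict[str, str] = field(default_factory=dict)
    includes: List["ParamGroup"] = field(default_factory=list)

    def resolve(self) -> Dict[str, str]:
        """Merge parameters from included (parent) groups, then apply own values on top."""
        merged: Dict[str, str] = {}
        for parent in self.includes:
            merged.update(parent.resolve())
        merged.update(self.params)  # own values override inherited ones
        return merged


# parameters_X holds defaults; parameters_X_1 overrides one value and adds another.
parameters_x = ParamGroup("parameters_X", {"db_host": "SET ME!", "db_port": "5432"})
parameters_x_1 = ParamGroup(
    "parameters_X_1",
    {"db_host": "db1.example.com", "pool_size": "20"},
    includes=[parameters_x],
)

resolved = parameters_x_1.resolve()
print(resolved)
```

A supergroup that includes parameters_X directly would see the defaults unchanged, while one that includes parameters_X_1 would see the overridden and additional values.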
Puppet Dashboard as data repository
Parameters building block – inheritance example
Puppet Dashboard as data repository
Supergroup building block == server “role”
Group name prefixed with supergroup_
Contains all the “ingredients” for the node to configure and define itself
Node can belong to only one supergroup (many-to-one)
[Diagram. Groups: supergroup_type_A (incl: class_A, class_B, parameters_X, parameters_Z; def: params overrides (if any); def: additional params (if any)). Nodes: node 1, node 2]
Puppet Dashboard as data repository
Supergroup building block - example
2-3 pages condensed
Classes, Parameters and Supergroups pattern Pros
All parameters and classes are visible on the Supergroup page
↘ See missing parameters (if inherited “SET ME!” from parent for example)
↘ See parameter clashes (Dashboard will warn if parameter is defined in 2 places)
↘ See exactly where parameter is defined
Allows teams unfamiliar with Puppet to make changes via Dashboard
Arbitrary data hierarchy/inheritance
Data reuse
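The checks listed above (missing “SET ME!” values, clashes, and where a parameter is defined) can be illustrated with a small sketch. It is a toy model under stated assumptions, not how Dashboard itself computes its warnings: the `GROUPS` table, the `audit` function and all parameter names are hypothetical, and a clash is simply taken to mean "defined in more than one included group".

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical data: each group name maps to the parameters it defines.
GROUPS: Dict[str, Dict[str, str]] = {
    "class_A": {"a_port": "8080"},
    "class_B": {"b_port": "9090"},
    "parameters_X": {"db_host": "SET ME!", "log_level": "INFO"},
    "parameters_Z": {"log_level": "DEBUG"},
}

# A supergroup (server "role") lists the groups it includes.
SUPERGROUP_TYPE_A: List[str] = ["class_A", "class_B", "parameters_X", "parameters_Z"]


def audit(includes: List[str]) -> Tuple[Dict[str, List[str]], List[str], List[str]]:
    """Return (where each parameter is defined, unset parameters, clashing parameters)."""
    sources: Dict[str, List[str]] = defaultdict(list)
    values: Dict[str, Set[str]] = defaultdict(set)
    for group in includes:
        for key, value in GROUPS[group].items():
            sources[key].append(group)
            values[key].add(value)
    unset = sorted(k for k, v in values.items() if "SET ME!" in v)
    clashes = sorted(k for k, groups in sources.items() if len(groups) > 1)
    return dict(sources), unset, clashes


sources, unset, clashes = audit(SUPERGROUP_TYPE_A)
print("defined in:", sources)
print("still unset:", unset)
print("clashes:", clashes)
```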
Classes, Parameters and Supergroups pattern Cons
Version control is difficult
↘ Have to resort to group cloning/export/import (custom Rake copy/clone command from Puppet support)
↘ Puppet roadmap to fix this
Dashboard UI could use some help
↘ Too much data on the screen sometimes
↘ Lack of sorting/grouping
Can’t store complex multi-line variables like text blobs
Zero-Downtime Deployment architecture …
High-level platform representation
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.1. Backend Load Balancer: BE-A v.1, BE-B v.1. DB: v.1]
parameters_deployment-staging-FE-BankA
↘ paydiant_deployment_bank=STAGING-FRONTEND-A
↘ paydiant_app_operation_mode=LIVE
↘ paydiant_app_version=1
parameters_deployment-staging-BE-BankA
↘ paydiant_deployment_bank=STAGING-BACKEND-A
↘ paydiant_app_operation_mode=LIVE
↘ paydiant_app_version=1
parameters_deployment-staging-FE-BankB
↘ paydiant_deployment_bank=STAGING-FRONTEND-B
↘ paydiant_app_operation_mode=LIVE
↘ paydiant_app_version=1
parameters_deployment-staging-BE-BankB
↘ paydiant_deployment_bank=STAGING-BACKEND-B
↘ paydiant_app_operation_mode=LIVE
↘ paydiant_app_version=1
Disable B (FE+BE)
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.1. Backend Load Balancer: BE-A v.1, BE-B v.1. DB: v.1]
parameters_deployment-staging-FE-BankB
↘ paydiant_deployment_bank=STAGING-FRONTEND-B
↘ paydiant_app_operation_mode=MAINTENANCE
↘ paydiant_app_version=1
parameters_deployment-staging-BE-BankB
↘ paydiant_deployment_bank=STAGING-BACKEND-B
↘ paydiant_app_operation_mode=MAINTENANCE
↘ paydiant_app_version=1
DB changes Phase 1
Run first phase of database changes (i.e. add new stuff & migrate data)
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.1. Backend Load Balancer: BE-A v.1, BE-B v.1. DB: v.2a]
Upgrade B (FE+BE)
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.2. Backend Load Balancer: BE-A v.1, BE-B v.2. DB: v.2a]
parameters_deployment-staging-FE-BankB
↘ paydiant_deployment_bank=STAGING-FRONTEND-B
↘ paydiant_app_operation_mode=MAINTENANCE
↘ paydiant_app_version=2
parameters_deployment-staging-BE-BankB
↘ paydiant_deployment_bank=STAGING-BACKEND-B
↘ paydiant_app_operation_mode=MAINTENANCE
↘ paydiant_app_version=2
Re-enable B (FE+BE)
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.2. Backend Load Balancer: BE-A v.1, BE-B v.2. DB: v.2a]
parameters_deployment-staging-FE-BankB
↘ paydiant_deployment_bank=STAGING-FRONTEND-B
↘ paydiant_app_operation_mode=LIVE
↘ paydiant_app_version=2
parameters_deployment-staging-BE-BankB
↘ paydiant_deployment_bank=STAGING-BACKEND-B
↘ paydiant_app_operation_mode=LIVE
↘ paydiant_app_version=2
Disable A (FE+BE)
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.2. Backend Load Balancer: BE-A v.1, BE-B v.2. DB: v.2a]
parameters_deployment-staging-FE-BankA
↘ paydiant_deployment_bank=STAGING-FRONTEND-A
↘ paydiant_app_operation_mode=MAINTENANCE
↘ paydiant_app_version=1
parameters_deployment-staging-BE-BankA
↘ paydiant_deployment_bank=STAGING-BACKEND-A
↘ paydiant_app_operation_mode=MAINTENANCE
↘ paydiant_app_version=1
Upgrade A (FE+BE)
[Diagram. Frontend Load Balancer: FE-A v.2, FE-B v.2. Backend Load Balancer: BE-A v.2, BE-B v.2. DB: v.2a]
parameters_deployment-staging-FE-BankA
↘ paydiant_deployment_bank=STAGING-FRONTEND-A
↘ paydiant_app_operation_mode=MAINTENANCE
↘ paydiant_app_version=2
parameters_deployment-staging-BE-BankA
↘ paydiant_deployment_bank=STAGING-BACKEND-A
↘ paydiant_app_operation_mode=MAINTENANCE
↘ paydiant_app_version=2
Re-enable A (FE+BE)
[Diagram. Frontend Load Balancer: FE-A v.2, FE-B v.2. Backend Load Balancer: BE-A v.2, BE-B v.2. DB: v.2a]
parameters_deployment-staging-FE-BankA
↘ paydiant_deployment_bank=STAGING-FRONTEND-A
↘ paydiant_app_operation_mode=LIVE
↘ paydiant_app_version=2
parameters_deployment-staging-BE-BankA
↘ paydiant_deployment_bank=STAGING-BACKEND-A
↘ paydiant_app_operation_mode=LIVE
↘ paydiant_app_version=2
DB changes Phase 2
Run second phase of database changes (clean up old v.1 data)
[Diagram. Frontend Load Balancer: FE-A v.2, FE-B v.2. Backend Load Balancer: BE-A v.2, BE-B v.2. DB: v.2]
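The whole sequence can be modelled as an ordered list of parameter changes, which makes the zero-downtime property easy to check. The sketch below is an illustration, not the presenters' tooling: the `STEPS` table and `run_sequence` function are hypothetical, the FE and BE groups of a bank are collapsed into one entry, and the database is reduced to a single `schema` value.

```python
from typing import Dict, List, Tuple

State = Dict[str, Dict[str, str]]

# (description, target, parameter, new value), in the order shown on the slides.
STEPS: List[Tuple[str, str, str, str]] = [
    ("Disable B (FE+BE)", "B", "operation_mode", "MAINTENANCE"),
    ("DB changes phase 1", "DB", "schema", "2a"),
    ("Upgrade B (FE+BE)", "B", "app_version", "2"),
    ("Re-enable B (FE+BE)", "B", "operation_mode", "LIVE"),
    ("Disable A (FE+BE)", "A", "operation_mode", "MAINTENANCE"),
    ("Upgrade A (FE+BE)", "A", "app_version", "2"),
    ("Re-enable A (FE+BE)", "A", "operation_mode", "LIVE"),
    ("DB changes phase 2", "DB", "schema", "2"),
]


def run_sequence(steps: List[Tuple[str, str, str, str]]) -> State:
    """Apply each step and check the zero-downtime invariants after every change."""
    state: State = {
        "A": {"operation_mode": "LIVE", "app_version": "1"},
        "B": {"operation_mode": "LIVE", "app_version": "1"},
        "DB": {"schema": "1"},
    }
    for description, target, key, value in steps:
        # A bank may only change version while it is out of rotation.
        if key == "app_version" and state[target]["operation_mode"] != "MAINTENANCE":
            raise RuntimeError(f"{description}: bank {target} must be in maintenance first")
        state[target][key] = value
        live = [b for b in ("A", "B") if state[b]["operation_mode"] == "LIVE"]
        if not live:
            raise RuntimeError(f"{description}: no bank left serving traffic")
        print(f"{description:22} live banks: {live}")
    return state


final_state = run_sequence(STEPS)
```

At every step at least one bank stays LIVE, and the schema only reaches its final version after both banks run version 2.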
Details of the upgrade sequence …
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.1. Backend Load Balancer: BE-A v.1, BE-B v.1. DB: v.1]
Putting a set of nodes into maintenance mode
Putting nodes into maintenance mode
Using LB node health check – http://nodeX:8080/healthcheck.jsp
Puppet ERB template for healthcheck.jsp content
………
Pseudo code:
  If “maintenance mode”: throw exception
  Else:
    If “module A” present: check if module A is up
    If “module B” present: check if module B is up
    …
  Throw 503 if any exception caught
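The pseudo code above could look roughly like this when written out. The real page is a JSP rendered from an ERB template; this Python version only sketches the decision logic, and the `healthcheck` function, its arguments and the module check callables are hypothetical.

```python
from typing import Callable, Dict, Tuple


def healthcheck(
    operation_mode: str,
    module_checks: Dict[str, Callable[[], bool]],
) -> Tuple[int, str]:
    """Return (HTTP status, message) following the pseudo code above.

    `module_checks` contains one callable per module present on the node.
    """
    try:
        if operation_mode == "MAINTENANCE":
            raise RuntimeError("node is in maintenance mode")
        for name, is_up in module_checks.items():
            if not is_up():
                raise RuntimeError(f"{name} is down")
    except Exception as exc:  # any failure takes the node out of rotation
        return 503, str(exc)
    return 200, "OK"


checks = {"module A": lambda: True, "module B": lambda: True}
print(healthcheck("LIVE", checks))
print(healthcheck("MAINTENANCE", checks))
```

Because the load balancer only looks at the status code, flipping the operation mode parameter is enough to drain a node without touching the load balancer configuration.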
Putting nodes into maintenance mode cont.
A parameter group controls the maintenance mode
E.g. Parameter group “parameters_deployment-staging-BankB” controls “paydiant_app_operation_mode” for the nodes in set FE-B of the Staging environment
Putting nodes into maintenance mode cont.
Update group parameter using Rake API (as ‘puppet-dashboard’ user)
RACK_ENV=production /opt/puppet/bin/rake -s -X -f /opt/puppet/share/puppet-dashboard/Rakefile nodegroup:variables['parameters_deployment-staging-BankB','paydiant_app_operation_mode=MAINTENANCE']
Puppet run-once using MCO (as ‘peadmin’ user)
mco puppet runonce --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B
While loop… check the health check page till all nodes return 503 (i.e. in maintenance) status
mco shellcmd --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B --cmd=\''curl --silent http://localhost:8080/healthcheck/healthcheck.jsp
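The "while loop" step can be sketched as a bounded polling function. This is a minimal illustration, assuming some caller-supplied `fetch_statuses` callable that returns one HTTP status per node; how that callable wraps `mco shellcmd` and parses its output is deliberately left out, since the plugin's output format is not shown in the slides. The function name, node names and defaults are hypothetical.

```python
import time
from typing import Callable, Dict


def wait_for_status(
    fetch_statuses: Callable[[], Dict[str, int]],
    expected: int,
    attempts: int = 30,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every node reports `expected`; give up after `attempts` polls."""
    for attempt in range(1, attempts + 1):
        statuses = fetch_statuses()
        pending = sorted(node for node, code in statuses.items() if code != expected)
        if statuses and not pending:
            return True
        print(f"attempt {attempt}: waiting for {pending}")
        if attempt < attempts:
            sleep(delay_seconds)
    return False


# Fake fetcher for demonstration: the second node drains one poll later than the first.
_polls = iter([
    {"fe-b-1": 200, "fe-b-2": 200},
    {"fe-b-1": 503, "fe-b-2": 200},
    {"fe-b-1": 503, "fe-b-2": 503},
])
drained = wait_for_status(lambda: next(_polls), expected=503, sleep=lambda _s: None)
print("all nodes in maintenance:", drained)
```

The same function covers the reverse transition by passing `expected=200`. Bounding the number of attempts keeps an orchestration job from hanging forever if a node never changes state.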
Upgrading applications on a set of nodes
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.2. Backend Load Balancer: BE-A v.1, BE-B v.2. DB: v.2a]
Upgrading Application Version
Disable Puppet agent
mco puppet disable --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B
Stop Tomcat service
mco service tomcat stop --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B
Cleanup exploded Tomcat webapps directory (for sanity)
mco shellcmd --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B --cmd='find $TOMCAT_HOME/webapps/ -maxdepth 1 -mindepth 1 -type d -exec rm -rf {} \;'
Upgrading Application Version Cont.
Upgrade the application version
RACK_ENV=production /opt/puppet/bin/rake -s -X -f /opt/puppet/share/puppet-dashboard/Rakefile nodegroup:variables['parameters_deployment-staging-BankB','paydiant_app_version=2']
Re-enable Puppet
mco puppet enable --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B
Puppet run-once
mco puppet runonce --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B
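The ordering of the upgrade steps matters: the agent is disabled before Tomcat is stopped and the version parameter is changed, and is only re-enabled afterwards. A small sketch that assembles the commands from the slides in that order is shown below. The `upgrade_commands` helper is hypothetical, it only builds command strings and runs nothing, and the exact quoting of the Rake task arguments is an assumption.

```python
from typing import List

RAKE = (
    "RACK_ENV=production /opt/puppet/bin/rake -s -X "
    "-f /opt/puppet/share/puppet-dashboard/Rakefile"
)


def upgrade_commands(bank_fact: str, param_group: str, version: str) -> List[str]:
    """Return the upgrade commands from the slides, in execution order."""
    target = f"--with-fact fact_paydiant_deployment_bank={bank_fact}"
    cleanup = "find $TOMCAT_HOME/webapps/ -maxdepth 1 -mindepth 1 -type d -exec rm -rf {} \\;"
    return [
        f"mco puppet disable {target}",       # 1. stop agent runs
        f"mco service tomcat stop {target}",  # 2. stop the application server
        f"mco shellcmd {target} --cmd='{cleanup}'",  # 3. remove exploded webapps
        f"{RAKE} nodegroup:variables['{param_group}','paydiant_app_version={version}']",  # 4. new version
        f"mco puppet enable {target}",        # 5. allow agent runs again
        f"mco puppet runonce {target}",       # 6. apply the new version
    ]


commands = upgrade_commands(
    "STAGING-FRONTEND-B", "parameters_deployment-staging-BankB", "2"
)
for number, command in enumerate(commands, start=1):
    print(number, command)
```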
Taking a set of nodes out of maintenance mode
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.2. Backend Load Balancer: BE-A v.1, BE-B v.2. DB: v.2a]
Taking nodes out of maintenance mode
Update parameter using Rake API (as ‘puppet-dashboard’ user)
RACK_ENV=production /opt/puppet/bin/rake -s -X -f /opt/puppet/share/puppet-dashboard/Rakefile nodegroup:variables['parameters_deployment-staging-BankB','paydiant_app_operation_mode=LIVE']
Puppet run-once using MCO (as ‘peadmin’ user)
mco puppet runonce --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B
While loop… check the health check page till all nodes return 200 (i.e. live) status
mco shellcmd --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B --cmd=\''curl --silent http://localhost:8080/healthcheck/healthcheck.jsp
Switching traffic to upgraded stack
[Diagram. Frontend Load Balancer: FE-A v.1, FE-B v.2. Backend Load Balancer: BE-A v.1, BE-B v.2. DB: v.2a]
Viewing transition in Splunk across multiple datacenters
Jenkins …
What is Jenkins
Tool to schedule and monitor the execution of repeated jobs
Why Jenkins?
Configurability
↘ Different types of input parameters
↘ Invoke shell scripts
↘ Post-build actions (automatic/manual)
Why Jenkins? cont.
Plugin support
↘ More than 600 plugins (https://wiki.jenkins-ci.org/display/JENKINS/Plugins)
↘ E.g. vSphere plugin (stop/start, snapshots, rollbacks…)
↘ Build pipeline plugin
↘ Parameterized remote trigger plugin
Why Jenkins? cont.
Keeps all your console logs in a single place
↘ No need to hunt for 10 log files on 5 different machines
↘ Visual representation of passed/failed/in-progress status, based on downstream shell scripts or other jobs
Why Jenkins? cont.
And it’s…
[Diagram labels: MCO; Rake API; DB; FE-*; BE-*; source code, Liquibase change sets]
Jenkins – Puppet Integration
Jenkins – Puppet Integration cont.
Jenkins invokes local Bash scripts, which in turn use SSH to call:
↘ MCO (as ‘peadmin’ user on Puppet Master)
↘ Rake API (as ‘puppet-dashboard’ user on Puppet Master)
SSH login as ‘peadmin’ and ‘puppet-dashboard’ is password-less, using PKI
↘ Generate RSA keypair for the local Jenkins user, using ssh-keygen command
↘ Append the public key to ~/.ssh/authorized_keys file of ‘peadmin’ and ‘puppet-dashboard’ users, on Puppet Master
Special-purpose MCO subcommands we use:
↘ puppet
↘ service
↘ shellcmd* (ask your Puppet Enterprise Support for this custom MCO plugin)
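The SSH hop from the Jenkins host to the Puppet Master can be sketched as a small helper that builds the `ssh` argument vector. The slides describe Bash scripts; this Python version is only an illustration of the same idea. The host name `puppetmaster.example.com` and the `ssh_argv` helper are hypothetical, and `BatchMode=yes` is an added assumption that makes `ssh` fail rather than prompt if key-based login is not set up.

```python
import shlex
from typing import List

PUPPET_MASTER = "puppetmaster.example.com"  # hypothetical host name


def ssh_argv(user: str, remote_command: str, host: str = PUPPET_MASTER) -> List[str]:
    """Build the ssh argument vector for running one command as `user` on `host`."""
    # BatchMode=yes: never prompt for a password; fail if the key is not accepted.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]


# MCO calls run as 'peadmin'; Rake API calls would use 'puppet-dashboard' instead.
mco_call = ssh_argv(
    "peadmin",
    "mco puppet runonce --with-fact fact_paydiant_deployment_bank=STAGING-FRONTEND-B",
)
print(shlex.join(mco_call))
```

Passing the list to `subprocess.run(mco_call, check=True)` would execute it and raise on a non-zero exit status, which lets the calling Jenkins job fail visibly.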
Links
Rake API: https://docs.puppetlabs.com/pe/latest/console_rake_api.html
MCO: https://docs.puppetlabs.com/mcollective/reference/basic/basic_cli_usage.html
Jenkins: http://jenkins-ci.org/
Liquibase: http://www.liquibase.org/documentation/index.html
Recap/Takeaways…
Use Puppet Enterprise
↘ Support is awesome (Celia Cottle, Jay Wallace, Ken Johnson, Zachary Stern – you guys rock!)
↘ Got help and features from James Turnbull and Nigel Kersten with some early versions of PE
↘ Live management and MCollective are essential for any self-respecting enterprise
Zero-downtime upgrades
↘ To Dashboard or not to Dashboard?
↘ Database update phases
↘ Managing LB health check monitors dynamically using Puppet
Automation baby steps – don’t boil the ocean
↘ Understand what you are doing before automating it - develop runbooks
↘ Identify manual steps and script some of them
↘ Add scripts to orchestration tool (Jenkins, ServiceNow, whatever else you use in-house)
Thank you.