Large-scale infrastructure automation at Verizon
TRANSCRIPT
“Infrastructure engineering is 60% social, and only 40% technical. Changing people is far more important than changing technology.”
4 YEARS AGO.
• Pretty typical configuration management.
• Centralized Chef servers.
• Lots of unmaintainable Ruby.
• Ruby that generates Ruby which is evaluated at runtime (yikes!).
• Developer contract is non-existent. Operations need to understand every application in detail.
• Code complete to finally deployed took around two weeks.
3 YEARS AGO.
• Implemented immutable machine images with HashiCorp Packer.
• Developer / ops contract becomes an RPM/DEB file along with two YAML manifests.
  • One manifest for provisioning.
  • Another for runtime deployment setup.
• Drive the entire release workflow from source repositories.
• Orchestrated with many linked Jenkins jobs and schedules.
• Code complete to finally deployed took around 40 minutes.
TODAY.
• Developer / operations contract is just a Linux container.
• Repository contains a YAML manifest.
• Realization that placement and orchestration are entirely separate.
• Intelligent and fully automated cleanup.
• Application dependency management.
• Automated traffic bleeding.
• Integrated alerting with Prometheus, general notifications with Slack or email.
• Code complete to deployed takes around 5 minutes.
GOALS.
• System elements should be awesome at just one thing.
• Reduce system complexity by increasing responsibility of engineering teams.
• Break it, you bought it.
• All application specifications are checked into source control.
• Focus on orchestration, not placement.
• Force automation in every aspect of work.
  • Manual access to systems is a crutch that enables automation avoidance.
- name: hello world
  type: job                 # unit type
  description: >
    mindlessly prints hello world to the console for five minutes
  schedule: hourly          # job stuff
  retries: 2                # job stuff
  expiration_policy: >
    retain-latest-two-major
  dependencies:
    - ref: [email protected]
- name: howdy
  type: service             # unit type
  description: >
    always responds with hello world
  ports:                    # service stuff
    - default->8080/http
  expiration_policy: >
    retain-latest-two-major
  dependencies:
    - ref: [email protected]
- name: foobar-proxy
  type: proxy
  description: >
    proxy inbound from outside
  routes:                   # routes
    - name: expose the ssl port
      expose: inbound->443/https
      destination: [email protected]>default
  expiration_policy: >
    retain-until-deprecated
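Since the manifest is the contract between developers and operations, it helps if a machine can check it. The following is a minimal sketch of such a check, not the validator used at Verizon. It assumes the manifest has already been parsed into a Python dict. The required fields per unit type are guessed from the three examples above (schedule for jobs, ports for services, routes for proxies), and validate_unit, COMMON_FIELDS and TYPE_FIELDS are hypothetical names.

```python
from typing import Any, Dict, List

# Assumption: fields that appear in all three example units are required.
COMMON_FIELDS = {"name", "type", "description", "expiration_policy"}

# Assumption: one type-specific field per unit type, inferred from the
# examples; fields such as retries are treated as optional.
TYPE_FIELDS = {
    "job": {"schedule"},
    "service": {"ports"},
    "proxy": {"routes"},
}


def validate_unit(unit: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing field: {f}" for f in sorted(COMMON_FIELDS - set(unit))]

    unit_type = unit.get("type")
    if unit_type is None:
        return errors  # already reported as a missing common field
    if unit_type not in TYPE_FIELDS:
        errors.append(f"unknown type: {unit_type}")
        return errors

    for f in sorted(TYPE_FIELDS[unit_type] - set(unit)):
        errors.append(f"{unit_type} unit missing field: {f}")
    return errors


if __name__ == "__main__":
    service = {
        "name": "howdy",
        "type": "service",
        "description": "always responds with hello world",
        "ports": ["default->8080/http"],
        "expiration_policy": "retain-latest-two-major",
    }
    print(validate_unit(service))
```

A check like this could run in the repository's build, so a broken manifest is rejected before anything is deployed.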
LIFECYCLE.
• Various cleanup strategies
  • Graph pruning
  • Explicit deprecation cycles
  • User selected policies for versions
    • Retain last two major
    • Retain last two minor
    • Retain latest
    • Retain always
• Eliminates the “Do we still need this?” conversations between ops and development.
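Graph pruning can be pictured as repeatedly removing expired units that nothing else still depends on. The sketch below is one possible reading of that idea, not the algorithm the system actually uses. The unit names, the dependencies mapping, and the prunable function are all hypothetical.

```python
from typing import Dict, List, Set


def prunable(dependencies: Dict[str, Set[str]], expired: Set[str]) -> List[str]:
    """Return expired units in a safe removal order.

    dependencies maps each deployed unit to the units it depends on.
    A unit is removed only when it is expired and no unit that is still
    deployed depends on it.
    """
    remaining = set(dependencies)
    removed: List[str] = []
    changed = True
    while changed:
        changed = False
        # sorted() copies into a list, so mutating `remaining` below is safe.
        for unit in sorted(remaining):
            if unit not in expired:
                continue
            has_dependents = any(
                unit in dependencies[other]
                for other in remaining
                if other != unit
            )
            if not has_dependents:
                remaining.remove(unit)
                removed.append(unit)
                changed = True
    return removed


if __name__ == "__main__":
    # Hypothetical deployment: version 1 of a service and its database
    # have been superseded by version 2.
    deps = {
        "proxy@1": {"howdy@2"},
        "howdy@1": {"db@1"},
        "howdy@2": {"db@2"},
        "db@1": set(),
        "db@2": set(),
    }
    print(prunable(deps, expired={"howdy@1", "db@1"}))
```

In this example db@1 only becomes removable after howdy@1 is gone, which is why removal proceeds in passes rather than in a single sweep.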
- name: hello world
  type: job
  description: >
    mindlessly prints hello world to the console for five minutes
  schedule: hourly
  retries: 2
  expiration_policy: >
    retain-latest-two-major
  dependencies:
    - ref: [email protected]
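To make the user-selected version policies concrete, here is a rough sketch of how an expiration_policy value might decide which deployed versions to clean up. The semantics are assumptions: "retain last two major" is read as keeping every version whose major number is one of the two highest, and "retain last two minor" as keeping every version in the two highest major.minor lines. Only retain-latest-two-major appears in the manifests; the other policy identifiers and all function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch)


def retain_latest_two_major(versions: Iterable[Version]) -> Set[Version]:
    versions = set(versions)
    majors = sorted({v[0] for v in versions}, reverse=True)[:2]
    return {v for v in versions if v[0] in majors}


def retain_latest_two_minor(versions: Iterable[Version]) -> Set[Version]:
    versions = set(versions)
    minors = sorted({v[:2] for v in versions}, reverse=True)[:2]
    return {v for v in versions if v[:2] in minors}


def retain_latest(versions: Iterable[Version]) -> Set[Version]:
    versions = set(versions)
    return {max(versions)} if versions else set()


def retain_always(versions: Iterable[Version]) -> Set[Version]:
    return set(versions)


# Hypothetical identifiers, modelled on the one name seen in the manifests.
POLICIES: Dict[str, Callable[[Iterable[Version]], Set[Version]]] = {
    "retain-latest-two-major": retain_latest_two_major,
    "retain-latest-two-minor": retain_latest_two_minor,
    "retain-latest": retain_latest,
    "retain-always": retain_always,
}


def expired(versions: Iterable[Version], policy: str) -> List[Version]:
    """Versions the given policy no longer retains, oldest first."""
    versions = set(versions)
    return sorted(versions - POLICIES[policy](versions))


if __name__ == "__main__":
    deployed = [(1, 0, 0), (1, 1, 0), (2, 0, 0), (2, 1, 0), (3, 0, 0)]
    print(expired(deployed, "retain-latest-two-major"))
```

Because the policy is declared in the manifest, cleanup can be computed mechanically rather than negotiated case by case.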
TL;DR.
• Automate everything. Your future sanity depends on it.
• Define concrete protocols at system integration points; favor machine verifiable protocols where possible.
• Your path to success involves people. Listen, learn and be open to criticism.
• Consul & Vault provide building-block functionality that just works.
• Never settle for mediocre tools.
• Know when buying is better than building, but don’t be afraid to build if it adds value.