chefconf 2015 - chef retrospective
Looking Back: Chef Retrospectives at Workday & CommerceHub
@JoeNuspl @gwaldo
Joe and I have come to Chef from drastically different places, and our working conditions are almost guaranteed to be different than yours, but here are some lessons that we’ve learned the long way.
Workday
An NYSE-listed company (WDAY) that provides enterprise cloud applications for human capital management (HCM), payroll, financial management, recruiting, and analytics.
J
Workday Environment
• 9 physical data centers worldwide, plus Amazon and HP cloud
• 124 roles
• 153 cookbooks
• More than 10K servers under chef control
• PCI and Regulatory compliance
J
Connect e-Retailers with Suppliers, providing drop-shipping services. Processed > 44 million orders for the top online retailers in US & Canada (> $7B retail sales)
CommerceHub Environment
(kidding)
CommerceHub Environment
• Low-Thousands of VMs (VMware)
• Mostly monolithic codebase
• Java on Windows originally, now split w/ Ubuntu
• Many Roles and (small) Envs
Introduction of Chef
Workday
• 0.8.2 in 2010
• Knew no Ruby
• Hired to apply engineering discipline to operations
CommerceHub
• Chef 11 in 2013
• Knew no Ruby
• Hired as DevOps / Automation Cheerleader
The Good @ Workday: SSH
• 2FA ssh into the data center, then multi-hop ssh to get to the final machine
• Wrote an ssh wrapper that grabs the PIN from SecurID.app and sets up ssh control masters and SOCKS proxies along the way.
• A VP regularly uses it to get access to some realtime performance dashboards.
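A rough sketch of what the command-building half of such a wrapper could look like. This is not the Workday tool: the helper name, host names, and port are made up for illustration, the PIN handling is left out entirely, and it leans on OpenSSH's `-J` jump-host flag, which needs a newer OpenSSH than was common when the wrapper was written.

```ruby
# Hypothetical helper: builds an ssh argv that reuses one authenticated
# connection (control master), hops through intermediate hosts, and
# optionally opens a local SOCKS proxy.
def build_ssh_command(target, jump_hosts: [], socks_port: nil, control_dir: '~/.ssh')
  argv = [
    'ssh',
    '-o', 'ControlMaster=auto',                      # share one connection per host
    '-o', "ControlPath=#{control_dir}/cm-%r@%h:%p",  # socket file; ssh expands ~ and %tokens
    '-o', 'ControlPersist=10m'                       # keep the master alive between calls
  ]
  argv += ['-J', jump_hosts.join(',')] unless jump_hosts.empty? # multi-hop
  argv += ['-D', socks_port.to_s] if socks_port                  # dynamic SOCKS proxy
  argv << target
end

# Example with placeholder hosts; pass the array to system(*argv) to run it.
argv = build_ssh_command('app01.dc1.example.com',
                         jump_hosts: ['bastion.dc1.example.com', 'hop.dc1.example.com'],
                         socks_port: 1080)
puts argv.join(' ')
```

Because the control master stays up, the second factor is only needed on the first connection, which is likely the part that makes a wrapper like this pleasant for occasional users.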
J
The Good @ Workday: Jira Automation
• Don’t just automate servers; automate workflow
• Automate routine Jira / Confluence updates
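As one illustration of a routine Jira update, here is a minimal sketch that prepares a comment on an issue through Jira's REST API (`POST /rest/api/2/issue/{key}/comment`). The helper name, host, issue key, and credentials are placeholders, and this is not the Workday automation itself.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) the HTTP request that
# adds a comment to a Jira issue.
def build_jira_comment_request(base_url, issue_key, text, user:, token:)
  uri = URI.join(base_url, "/rest/api/2/issue/#{issue_key}/comment")
  request = Net::HTTP::Post.new(uri)
  request.basic_auth(user, token)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(body: text)
  [uri, request]
end

uri, request = build_jira_comment_request(
  'https://jira.example.com', 'OPS-123',
  'Chef run finished on 42 nodes; no failures.',
  user: 'automation', token: 'not-a-real-token'
)
puts "#{request.method} #{uri}"

# To actually send it (left commented out so the sketch has no side effects):
# Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
```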
J
Good at CommerceHub
• Solid Infrastructure, ramping up spending
• There was a lot of desire for improvements
• People care
• Some automation was already in place
• exp. around Testing
The Bad @ Workday: Chef Workarounds
• cookbook_file resources would update the file on every Chef run, so we used templates for everything.
• Search was slow and unreliable, so we ran knife exec scripts to collect the search data and stuff it into a data bag.
• too much “convert this shell script into chef code”
J W: Chef Search result order (“Sensunamis”)
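A minimal sketch of the "collect search results into a data bag" workaround, written in plain Ruby so it runs without a Chef server. It assumes the result-order problem is that search returns nodes in no guaranteed order, so a rendered config can change between runs and trigger restarts; sorting before storing avoids that. The helper name, item id, and node data are invented for illustration. In a real knife exec script the `nodes` array would come from a search, and the resulting hash would be saved as a data bag item.

```ruby
require 'json'

# Hypothetical helper: reduce raw search results to the fields consumers
# need, in a stable order, shaped like a data bag item.
def build_node_index_item(item_id, nodes)
  entries = nodes
            .map { |n| { 'name' => n.fetch('name'), 'ipaddress' => n.fetch('ipaddress') } }
            .sort_by { |e| e['name'] } # stable order, whatever order search returned
  { 'id' => item_id, 'nodes' => entries }
end

# Stand-in for search results arriving in arbitrary order.
search_results = [
  { 'name' => 'web02', 'ipaddress' => '10.0.0.12' },
  { 'name' => 'web01', 'ipaddress' => '10.0.0.11' }
]

item = build_node_index_item('web_servers', search_results)
puts JSON.pretty_generate(item)
```

Templates that read the data bag item then render identically from run to run unless membership really changed.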
The Bad @ Workday: Community Cookbook Quality Variance
• A majority assumed:
• running ubuntu
• Internet access
• can compile code
J W: I understand the Internet Access assumption, but not the code compiling one. Is it that you wouldn’t want to compile everything, but options for specifying a built package aren’t available? A bigger problem that I have with community cookbooks is that many simply don’t work. Ask about this on stage. ‘In fact, mcollective removes things like…'
The Bad @ Workday: Not having a “gold standard cookbook”
• Programmers tend to plagiarize.
• It is encouraged as “code reuse”
• People inevitably choose the worst example
• Causing the crap to spread
J
Bad at CommerceHub
• Key people wanted different things
• Lots of “Key People”
• “Can you automate this environment first?”
• Gatekeepers
• Little insight tooling (logging, metrics, alerting)
• Surprise! Chef requires Engineering Effort
It’s not a DevOpsey conference without a @littleidea quote. But seriously, it seems that some people thought “Hire a DevOp, and it’ll magically get better!”
The Ugly @ Workday: Data Bag Misuse
• Created the silo data bag to hold data-center-specific overrides
• Predated Chef::Environments
• Grew out of control: 280K of JSON.
J
The Ugly @ Workday
• Not being tightly integrated with the rest of the Infrastructure team
• Not creating build pipeline sooner
• Not creating easy-to-use test environments sooner
• Occasional excessive logic in Templates
• Lacking a clear “Gold Standard” cookbook design example.
J Not tooting our own horn
Ugly at CommerceHub
• Developers sometimes uninterested in Chef/Ruby “Ops Work”
• Not establishing opinions early (TIMTOWTDI)
• Many small Environments
• Many teams solving the same problems*
Ugly at CommerceHub
• Resistance to including Ops Eng work in timeframes
• Aligning People + Interest + Time/Opportunity/Dollars
• Berkshelf and Testing are late additions to Chef workflow
What do you call…
A group of Wolves? a Pack
A group of Crows? a Murder
A group of Developers? a Merge Conflict
W: Despite interest in Chef, there was significant resistance to adopt.
Why Resistant to Change?
• You’re going to automate me out of a job
• I inherited the pile of crap, I don’t understand how it works, so if you break it I won’t be able to fix it.
• If it ain’t broke, don’t fix it. (or “I made this pile of crap. Don’t change it.”)
• Damn it Jim, I’m a sysadmin, not a programmer.
• Used to Ops being invisible.
Resistant to Change
• “I’d just have to verify that it worked anyway.”
• Overemphasis on Standardization and Consensus.
• The people know the processes. They made them.
• “I don’t trust code.”
• “It’ll take longer to do the automation than the work.”
Friction
• Status Quo
• Language
• Common Idioms
• “I have to learn Ruby?!”
• Analysis Paralysis
• Training, because Learning Curve
• “Windows Support*”
Friction
• Ops Engineering
• More used to the Former than the Latter
Mistakes Were Made
J
What could we have done better?
• Lots of things
• Identify the goals of your org & make them:
• See the light
• Enter the light
• And shine
• Fight the Silver Bullet mentality
J
What could we have done better?
• Be more explicit about engineering effort involved. (It’s software engineering)
• Chef is powerful, but not always the best tool for the job.
• Identify as part of a skill and job promotion.
W:
What could we have done better?
• Be more explicit about code reviews.
• Be more opinionated early-on.
• Testing up-front.
W:
Wins
• Consistency
• No more snowflake hunts
• Mitigating environment differences
• Capacity additions made easy
• Facilitating Services split-outs
We don’t want everything to sound too dour, because Chef has been a huge win for us. None of these are news, but we’re so close to Chef that they can become so familiar as to become invisible.
Wins
• Gateway drug to automation-addiction
• People Upgrades
• Bringing visibility of Operations work
• Reduction of “Works on my machine” rage
So, where does that leave us?
So here is where we have suggestions from Mugglesville.
Request #1: “Best Practices”
We’re often asked for “Best Practices”, but people see things like this. Their reaction is…
…If I try to be non-prescriptive, and give options, it can come across as wishy-washy….
Well…
…Trying to figure out what they need leads to exasperation…
…and sometimes they wonder if we know what we’re doing. Having strong feelings leads to the original problem when they see an opposing view. (ROLES STAHP)
Or you end up being wrong because they do something(s) unexpected.
Solution #1 “Recommended Practices”
• Present Options/Views of a subject (e.g. Roles)
• Explain pros & cons of the approach.
• “If your environment looks like ABC, this may make sense for you.”
• Review periodically, and describe changes visibly.
So, let’s give it to them.
Request #2: “Where the [devops] did that value come from?”
Attributes. I love ‘em.
“Can you take a look at something?
I can’t figure out why the value isn’t $val.”
This is where I take them through the process of figuring out what values are being set, and where in the order they fit. This is time-consuming. And I often come down to showing them this:
https://docs.chef.io/attributes.html#attribute-precedence
I love this page. It gives new Chefs hives. 15 attribute levels. But you want to help, so you sit down.
And start digging… After a while you still can’t figure it out, and then they say…
“Oh, I must have set this value on the node itself…”
“WHY WOULD YOU DO THAT?!” (I’d want to scream)
What I’d like to see is something like this:
Solution #2 `knife (…) inspect (…)`
The process to determine what value is set. Let’s make it a little more verbose.
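To make the wish concrete, here is a small sketch of the kind of answer such a command could give: every level at which a value was set, lowest to highest, and which one won. It is plain Ruby, not an existing knife subcommand. The helper name and sample values are invented, and the level list is written out to follow the precedence table on the docs page above.

```ruby
# The 15 precedence levels, lowest to highest.
PRECEDENCE = [
  'default (attribute file)',
  'default (recipe)',
  'default (environment)',
  'default (role)',
  'force_default (attribute file)',
  'force_default (recipe)',
  'normal (attribute file)',
  'normal (recipe)',
  'override (attribute file)',
  'override (recipe)',
  'override (role)',
  'override (environment)',
  'force_override (attribute file)',
  'force_override (recipe)',
  'automatic (ohai)'
].freeze

# Hypothetical helper: given { level => value }, return the ordered trail
# of settings and the winning [level, value] pair.
def explain_attribute(settings)
  unknown = settings.keys - PRECEDENCE
  raise ArgumentError, "unknown levels: #{unknown.join(', ')}" unless unknown.empty?

  trail = PRECEDENCE.select { |level| settings.key?(level) }
                    .map { |level| [level, settings[level]] }
  { trail: trail, winner: trail.last }
end

# Sample: the role says 8081, but a normal attribute persisted on the node
# outranks every default.
report = explain_attribute(
  'default (role)' => 8081,
  'default (attribute file)' => 8080,
  'normal (recipe)' => 9090
)
report[:trail].each { |level, value| puts format('%-32s %s', level, value) }
puts "winner: #{report[:winner].inspect}"
```

Normal attributes persist on the node object between runs, which is how a value someone set by hand months ago keeps quietly winning.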
Request #3: Windows
Look, I love this community. And I honestly don’t hate Windows. But Chef-on-Windows has not been great these last 2 years.
Request #4: Versions on Roles & Envs
It’s time, no?
W: Finally, a Plea to Chef: Chef is not our job. Our priorities are not the same. We’re asking for empathy and patience, and we’ll give you the same.
Now, we’d like to end on a high-note…
Introducing
Sous Chef
https://github.com/commercehub-oss/sous_chef/
(not an official logo) The work of Larry Zarou, this is a cookbook to help you set up a cookbook-testing pipeline.
Currently opinionated toward CHUB’s environment, but contribs welcome!
@joenuspl Workday
@gwaldo CommerceHub
Thank you!
Please rate in the app