engineering your startup to innovate at...

Post on 22-May-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Engineering Your Startup to Innovate at Scale

Randy Shoup @randyshoup

linkedin.com/in/randyshoup

Background• VP Engineering at Stitch Fix

o Combining “Art and Science” to revolutionize apparel retail

• Consulting “CTO as a service”o Helping companies scale engineering organizations and technology

• Director of Engineering for Google App Engineo World’s largest Platform-as-a-Service

• Chief Engineer / Distinguished Architect at eBayo Multiple generations of eBay’s infrastructure

@randyshoup linkedin.com/in/randyshoup

Stitch Fix

@randyshoup linkedin.com/in/randyshoup

Stitch Fix

@randyshoup linkedin.com/in/randyshoup

Stitch Fix

@randyshoup linkedin.com/in/randyshoup

Stitch Fix

@randyshoup linkedin.com/in/randyshoup

Combining Humans and Data Science• 1:1 Ratio of Data Science to Engineering

o Almost 100 software engineerso Almost 100 data scientists and algorithm developerso Unique in our industry

• Apply intelligence to *every* part of the businesso Buyingo Inventory managemento Logistics optimizationo Styling recommendationso Demand prediction

• Humans and machines augmenting each other

@randyshoup linkedin.com/in/randyshoup

Styling at Stitch Fix

Personal styling

Inventory

@randyshoup linkedin.com/in/randyshoup

PersonalizedRecommendations

InventoryAlgorithmic

recommendations

Machine learning

@randyshoup linkedin.com/in/randyshoup

Expert HumanCuration

Human curation

Algorithmic recommendations

@randyshoup linkedin.com/in/randyshoup

Data-DrivenExperimentation

• Experimenting with …o Algorithmso Client Experienceo Stylist Interactionso Growth engineering

@randyshoup linkedin.com/in/randyshoup

Faster is Better

Willingness to go fast

Ability to go fast+

Lack of Fear

Capability+

High-Performing Organizations

• Multiple deploys per day vs. one per month

• Commit to deploy in less than 1 hour vs. one week

• Recover from failure in less than 1 hour vs. one day

• Change failure rate of 0-15% vs. 31-45%

@randyshoup linkedin.com/in/randyshoup

https://puppet.com/resources/whitepaper/state-of-devops-report

High-Performing Organizations

è2.5x more likely to exceed business goalso ProfitabilityoMarket shareo Productivity

@randyshoup linkedin.com/in/randyshoup

https://puppet.com/resources/whitepaper/state-of-devops-report

Speed vs. Stability?

Faster is Better

Innovating at Scale

•Organizing for Speed

•What to Build / What NOT to Build

•When to Build

•How to Build

•Delivering and Operating

Innovating at Scale

•Organizing for Speed

•What to Build / What NOT to Build

•When to Build

•How to Build

•Delivering and Operating

Conway’s Law• Organization determines architecture

o Design of a system will be a reflection of the communication paths within the organization

• Modular system requires modular organizationo Small, independent teams lead to more flexible, composable systemso Larger, interdependent teams lead to larger systems

• We can engineer the system we want by engineering the organization

@randyshoup linkedin.com/in/randyshoup

Small “Service” Teams

• Full-Stack, “2 Pizza” Teamso No team should be larger than can be fed by 2 large pizzaso Typically 4-6 peopleo All disciplines required for the team to function

• Aligned to Business Domainso Clear, well-defined area of responsibilityo Single service or set of related serviceso Deep understanding of business problems

• Growth through “cellular mitosis”

@randyshoup linkedin.com/in/randyshoup

Ideally, 80% of project work should be within a team boundary.

Autonomy and Accountability

• Give teams autonomy• Freedom to choose technology, methodology, working environment• Responsibility for the results of those choices

• Hold team accountable for *results*• Give a team a goal, not a solution• Let team own the best way to achieve the goal• Innovate and experiment to achieve the goal

Innovating at Scale

•Organizing for Speed

•What to Build / What NOT to Build

•When to Build

•How to Build

•Delivering and Operating

“Building the wrong thing is the biggest waste in software development.”

-- Mary and Tom Poppendieck, Lean Software Development

What problem are you trying to solve?

“A problem well-stated is a problem half-solved.”

-- Charles Kettering, former head of research for General Motors

What Problem Are You Trying to Solve?

• Focus on what is important for your business

• Problem might be solved without any technology at allo Redefine the problemo Change the business processo Implement manually for a while before automating in an application

@randyshoup linkedin.com/in/randyshoup

Buy, Not Build• Use Cloud Infrastructure

o Faster, cheaper, better than we can do ourselveso Stitch Fix has no owned physical infrastructure anywhere in the world

• Prefer Open Sourceo Kubernetes, Docker, Istioo MySQL, Postgres, Redis, Elastic Searcho Machine learning modelso Etc.o Usually better than the commercial alternatives (!)

@randyshoup linkedin.com/in/randyshoup

Buy, Not Build• Third-Party Services

o Stitch Fix uses >50 third party serviceso Logging, monitoring, alertingo Project management, bug trackingo Billing, fraud detectiono Etc.

• Focus on our core competencyo Use services for everything else (!)

@randyshoup linkedin.com/in/randyshoup

Soon it will be just as common to run your own data center as it is to run your own electrical power generation.

Experimental Discipline

• State your hypothesiso What metrics do you expect to move and whyo Understand your baseline

• Run a real A | B testo Sample sizeo Isolated treatment and control groupso No peeking or quitting early!

• Obsessively log and measureo Understand customer and system behavioro Understand why this experiment worked or did not

Experimental Discipline

• Listen to the datao Data trumps hope and intuitiono Develop insights for next experiment

• Thinking of the experiment is art; evaluating it is science

• Rinse and Repeato This is a journey, not a single step

eBay Machine-Learned Ranking

• Ranking function for search resultso Which item should appear 1st, 10th, 100th, 1000th

o Before: Small number of hand-tuned factorso Goal: Thousands of factors

• Incremental Experimentationo Predictive models: query->view, view->purchase, etc.o Hundreds of parallel A | B testso Full year of steady, incremental improvements

è 2% increase in eBay revenue (~$120M / year)

eBay Site Speed

• Reduce user-experienced latency for search results

• Iterative Processo Implement a potential improvemento Release to the site in an A | B testo Monitor metrics –time to first byte, time to click, click rate, purchase rate

è 2% increase in eBay revenue (~$120M / year)

Innovating at Scale

•Organizing for Speed

•What to Build / What NOT to Build

•When to Build

•How to Build

•Delivering and Operating

Prioritization• Scarce resources require prioritization

o We always have more to do than resources to do ito Opportunity cost -- deciding to do X means deciding not to do Yo Every decision is a tradeoff

• Priority ← Return on Investment o Impact / Effort

• Prioritization is a business decision, not a technical decision

@randyshoup linkedin.com/in/randyshoup

Fewer Things,More Done

Fewer Things,More Done

• Maximize resources applied too Priority 1, then o Priority 2o etc.

• Incremental Deliveryo Deliver increments along the way instead of everything at the end

• Deliver Value Fastero Time Value of Moneyo Benefit now is worth more than benefit in the future

@randyshoup linkedin.com/in/randyshoup

“When you solve problem one, problem two gets a promotion.”

Innovating at Scale

•Organizing for Speed

•What to Build / What NOT to Build

•When to Build

•How to Build

•Delivering and Operating

Microservices

• Single-purpose• Simple, well-defined interface• Modular and composable• Independently deployable

A

C D E

B

Evolution toMicroservices

• eBay • 5th generation today• Monolithic Perl à Monolithic C++ à Java à microservices

• Twitter• 3rd generation today• Monolithic Rails à JS / Rails / Scala à microservices

• Amazon• Nth generation today• Monolithic Perl / C++ à Java / Scala à microservices

No one starts with microservices…

Past a certain scale, everyone ends up with microservices

If you don’t end up regretting your early technology decisions, you probably over-engineered.

QualityDiscipline

• Quality and Reliability are “Priority-0 features”o Equally important to users as product features and engaging user

experience

• Developers responsible for o Featureso Qualityo Performanceo Reliabilityo Manageability

Test-Driven Development

• Tests help you go fastero Tests “have your back”o Development velocity

• Tests make better codeo Confidence to break thingso Courage to refactor mercilessly

• Tests make better systemso Catch bugs earlier, fail faster

@randyshoup linkedin.com/in/randyshoup

OptimizingDeveloper Effort

@randyshoup linkedin.com/in/randyshoup

• 75% reading existing code

• 20% modifying existing code

• 5% writing new code

https://blogs.msdn.microsoft.com/peterhal/2006/01/04/what-do-programmers-really-do-anyway-aka-part-2-of-the-yardstick-saga/

OptimizingDeveloper Effort

@randyshoup linkedin.com/in/randyshoup

• 75% reading existing code

• 20% modifying existing code

• 5% writing new code

https://blogs.msdn.microsoft.com/peterhal/2006/01/04/what-do-programmers-really-do-anyway-aka-part-2-of-the-yardstick-saga/

“Do you have time to do it twice?”

“We don’t have time to do it right!”

The more constrained you are on time or resources, the more important it is to build it right the first time.

Build It Right (Enough)The First Time

• Build one great thing instead of two half-finished things

• Right ≠ Perfect (80 / 20 Rule)

• è Basically no bug tracking system (!)o Bugs are fixed as they come upo Backlog contains features we want to buildo Backlog contains technical debt we want to repay

@randyshoup linkedin.com/in/randyshoup

Innovating at Scale

•Organizing for Speed

•What to Build / What NOT to Build

•When to Build

•How to Build

•Delivering and Operating

You Build It, You Run It.-- Werner Vogels

ContinuousDelivery

• Repeatable Deployment Pipelineo Low-risk, push-button deploymento Rapid release cadenceo Rapid rollback and recovery

• Most applications deployed multiple times per day

• More solid systemso Release smaller units of worko Smaller changes to roll back or roll forwardo Faster to repair, easier to understand, simpler to diagnose

@randyshoup linkedin.com/in/randyshoup

FeatureFlags

• Configuration “flag” to enable / disable a feature for a particular set of userso Independently discovered at eBay, Yahoo, Facebook, Google, etc.

• More solid systemso Decouple feature delivery from code deliveryo Rapid on and offo Develop / test / verify in productiono Dark launches

• Enables experimentation

Canaries and Dark Launches

• Canary Deploymento Deploy to a small number of systems or userso Avoid catastrophic failures

• Dark Launcho Execute the code and systems for a feature without displaying results to

the usero Work out performance bottlenecks and system interactions

Innovating at Scale

•Organizing for Speed

•What to Build / What NOT to Build

•When to Build

•How to Build

•Delivering and Operating

Faster is Better

Thanks!• Stitch Fix is hiring!

o www.stitchfix.com/careerso Based in San Franciscoo Hiring everywhere!o More than half remote, all across USo Application development, Platform engineering,

Data Science

• Please contact meo @randyshoupo linkedin.com/in/randyshoup

top related