building a distributed build system at google scale (strangeloop 2016)
TRANSCRIPT
Building a Distributed Build System
at ScaleAysylu Greenberg
September 17, 2016
Aysylu Greenberg
@aysylu22
Building Distributed Build System
at Google Scale
Building Distributed Build System
at Google Scale
What does a build system do?Photo by smemon / CC BY
Build Scenario
Project with dependencies
Makefilemain.o : main.c defs.h cc -c main.cmain.c : curl –O http://remote/main.c
Build Scenario
Project with dependencies
Find dependenciesMakefilemain.o : main.c defs.h cc -c main.cmain.c: curl –O http://remote/main.c
Build Scenario
Project with dependencies
Find dependencies
Build project with the dependencies
Download build artifacts
Test Scenario
Project with dependencies
Find dependencies
Build project with the dependencies
Download build artifacts Run the test
Get the results of the test
Building Distributed Build System
at Google Scale
Building Distributed Build System
at Google Scale
Scale• Engineers: >30,000 developers in 40+ offices
Scale• Engineers: >30,000 developers in 40+ offices• Commits: 15K by humans + 30K by robots/day
Scale• Engineers: >30,000 developers in 40+ offices• Commits: 15K by humans + 30K by robots/day• Source code: 2 billion LOC
Scale• Engineers: >30,000 developers in 40+ offices• Commits: 15K by humans + 30K by robots/day• Source code: 2 billion LOC• Builds and tests: 5M per day through BuildRabbit
Scale• Engineers: >30,000 developers in 40+ offices• Commits: 15K by humans + 30K by robots/day• Source code: 2 billion LOC• Builds and tests: 5M per day through BuildRabbit• Petabytes of output artifacts
Scale• Engineers: >30,000 developers in 40+ offices• Commits: 15K by humans + 30K by robots/day• Source code: 2 billion LOC• Builds and tests: 5M per day through BuildRabbit• Petabytes of output artifacts• 1 repository
Working in One RepositoryPhoto by flyingblogspot / CC BY SA
Repository of Artifacts vs Building from SourcePhoto by flyingblogspot / CC BY SA
Working in One Repository • Linear revision history
Working in One Repository • Linear revision history• Extensive code sharing & reuse
• Everything can be cross-referenced• Large-scale refactoring for codebase modernization• Easier collaboration across teams
Working in One Repository • Linear revision history• Extensive code sharing & reuse• Simplified dependency management
• No confusion about authoritative version of the file• No forking of shared libraries• Components for library releases = Git subtree or Git
subcomponents to separate release from WIP versions
Working in One Repository • Linear revision history• Extensive code sharing & reuse• Simplified dependency management• Predictable, repeatable builds from source
Working in One Repository • Linear revision history• Extensive code sharing & reuse• Simplified dependency management• Predictable, repeatable builds from source• Optimizations to avoid compiling same artifacts
Working in One Repository • Linear revision history• Extensive code sharing & reuse• Simplified dependency management• Predictable, repeatable builds from source• Optimizations to avoid compiling same artifacts• Decouple each team's processes as much as
possible
Building Distributed Build System
at Google Scale
Building Distributed Build System
at Google Scale
Build System StoryPhoto by imanka / CC BY
Towards Distributed Build System
Towards Distributed Build System
Towards Distributed Build System
Towards Distributed Build System
Towards Distributed Build System
Towards Distributed Build System
BuildRabbit:Distributed Build System
BuildRabbit: Distributed Build System
IN THE CLOUD
What does BuildRabbit do?
Continuous integration
system
BuildRabbit
Source System
Google engineers &
teams
Release infrastructure
Integration testing
infrastructure
Build artifact storage
Blaze
Building Distributed Build System
at Google Scale
Building Distributed Build System
at Google Scale
From Push to Pull: Client / Server
Client forUser
BuildRabbitScheduler
BuildRabbitWorker
From Push to Pull: Build Service
User
Persistent Queue
BuildRabbit Worker
Build ArtifactsBuild Progress
Info
eventRPCstream
From Push to Pull: Build Service
User
Persistent Queue
BuildRabbit Worker
Build ArtifactsBuild Progress
Info
eventRPCstream
Client for
User
BuildRabbitScheduler
BuildRabbitWorker
Client for
User
BuildRabbitScheduler
BuildRabbitWorker
User
Persistent Queue
BuildRabbit Worker
Build Progress InfoBuild Artifacts
Client for
User
BuildRabbitScheduler
BuildRabbitWorker
User
Persistent Queue
BuildRabbit Worker
Build Progress InfoBuild Artifacts
RESILIENCE LOGIC
RESILIENCE LOGIC
Continuous integration
system
BuildRabbit
Source System
Google engineers &
teams
Release infrastructure
Integration testing
infrastructure
Build artifact storage
Blaze
Migration with Zero DowntimePhoto by moonjazz / CC BY SA
MIGRATE BACKENDS FIRST
Client for
User
BuildRabbitScheduler
BuildRabbitWorker
User
Persistent Queue
BuildRabbit Worker
Build Progress InfoBuild Artifacts
FOCUS ON MIXED MODEMIGRATE BACKENDS FIRST
TARGET LAUNCH-FRIENDLY CLIENTSFOCUS ON MIXED MODEMIGRATE BACKENDS FIRST
Client for
User
BuildRabbitScheduler
BuildRabbitWorker
User
Persistent Queue
BuildRabbit Worker
Build Progress InfoBuild Artifacts
DECOUPLE LAUNCH OF SERVICESTARGET LAUNCH-FRIENDLY CLIENTSFOCUS ON MIXED MODEMIGRATE BACKENDS FIRST
Client for
User
BuildRabbitScheduler
BuildRabbitWorker
User
Persistent Queue
BuildRabbit Worker
Build Progress InfoBuild Artifacts
MAXIMUM VISIBILITY INTO SYSTEM STATEDECOUPLE LAUNCH OF SERVICESTARGET LAUNCH-FRIENDLY CLIENTSFOCUS ON MIXED MODEMIGRATE BACKENDS FIRST
PRACTICE THE LAUNCHMAXIMUM VISIBILITY INTO SYSTEM STATEDECOUPLE LAUNCH OF SERVICESTARGET LAUNCH-FRIENDLY CLIENTSFOCUS ON MIXED MODEMIGRATE BACKENDS FIRST
PRACTICE THE LAUNCH, A LOTMAXIMUM VISIBILITY INTO SYSTEM STATEDECOUPLE LAUNCH OF SERVICESTARGET LAUNCH-FRIENDLY CLIENTSFOCUS ON MIXED MODEMIGRATE BACKENDS FIRST
HAVE A SOLID ROLLBACK PLANPRACTICE THE LAUNCH, A LOTMAXIMUM VISIBILITY INTO SYSTEM STATEDECOUPLE LAUNCH OF SERVICESTARGET LAUNCH-FRIENDLY CLIENTSFOCUS ON MIXED MODE
Distributed System Design PatternsPhoto by oslointhesummertime / CC BY
Distributed vs Centralized Control PlanePhoto by oslointhesummertime / CC BY
Distributed Design Patterns: Distributed Control
Design implementation is scaled from 1 machine
Bigger overhead during evolution
Distributed Design Patterns: Centralized Control
Distributed design from the start
Bigger overhead during design phase
Building a Distributed Build System
at ScaleAysylu Greenberg
@aysylu22