aws loft presentation-here cciaws-de-media.s3.amazonaws.com/images/_munich_loft_slides/aws l… ·...

32
Peter Caron, HERE AWS Loft| November 16, 2016 AWS Loft

Upload: letuong

Post on 01-Sep-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

Peter Caron, HEREAWS Loft| November 16, 2016

AWS Loft

Migrating and Running Continuous Integration Systems at Scale in AWS

Peter CaronAWS Loft, München | November 16, 2016

Agenda

1. Transition to CloudLift and ShiftCrawl, Walk, Run, Sprint

2. CI / CD Overview

3. Challenges

HERE’s Business

HERE is one the world’s leading map data companies and is now able to deliver the next generation of mobility and location-based services.

HERE Products

HERE software serves map, traffic and location data to a variety of target platforms• HERE Open Location Platform• Embedded Automobile Navigation • Enterprise Extensions• Mobile Apps

HERE’s Challenge

HERE needed a CI system that could meet complex and heterogenousdeployments and releases that could scale.

Before the Cloud

Region #x

HomemadeBuild Tools

Build Systems ~5+ Builds per month40T+ Tests cycle

CD Platform Pipelines 17 Services2 Unique pipelines1000+VMs on VMWare1 ish Deployments/month100s Acceptance tests/month~10 Build runs / month

Region #2

JenkinsUnder

the Desk

Region #1

JenkinsIn the Data Centre ?

AWS Services in Production

Region #1

security groupEC2 instances

JenkinsMaster

Amazon EFS

AmazonS3

Spot Instances

AWSDevice Farm

Amazon Cloud Watch

Amazon VPC

Region #2

security groupEC2 instances

JenkinsMaster

Common CI Systems – CCI (Jenkins / Electric Flow)110K+ Builds per day25M+ Tests per day

CI for Micro-services - JaaS (Jenkins as a Service)130 Products and services

CD Platform Pipelines (go as a Service)36 Services668 Unique pipelines600+ VMs on AWS 40+ Deployments/month100s Acceptance tests/day1400+ Build runs / month

What kind(s) of integration and testing to use?

Jenkins Unit testing

go integration testing

go deployment

Mesos / Marathon deployment orchestration

Real Device testing

Customer Integration

Static Data Services Micro ServicesReal-time Data Services Embedded and Downloaded

Applications

Transition to Cloud

Moving CI workstreams to AWS

Moving our CI / CD infrastructure to AWS …• Git• Gerrit• Jenkins• Go• Splunk It was a simple lift and shift from our local infrastructure

… and everything worked well from Day 1

Uh, not exactly!.

Plan to Grow

• Get your Workflow right• i.e. Get your CI act together first

• Know your Capacity and Limits• Focus on Testing• Set Expectations Internally• Know your Fallback options• Monitor changes (costs)

Create a Culture Change1. Start small, iterate• A single developer group before your flagship product

2. Understand your changes • There is infrastructure outside the control of your developer.

Don’t let is become Expensive Hosting 2.03. Infrastructure as Code is not just a buzz word• Apply it if you have one or more people using CI

4. Measure Results and Adapt WoW• Only react to verifiable metrics

What did we learn?

• Don’t trust the plugins• Capacity is always underestimated• Costs will be high• Plan fallback• Trust the developers – just enough• Moving the Cloud will help nothing• People will use it

… and what could we have done better?

Do Continuous Integration

Moving CI ways of working to a Cloud

Mainline

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Submit Verification

Submit Verification

Submit Verification

Submit Verification

Submit Verification

Submit Verification

Full Verification Full Verification Full Verification

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Every 5 minDuration < 20min

Runs on each successful SV

Duration < 20min

Runs every 3 hrsDuration : 3hrs

Runs every day ?Duration : ? Release

candidateE2E (Manual

Tests)

Client Pipelines

Artifact

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Baseline Sanity Tests

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Pre-submit Verification

Submit Verification

Submit Verification

Submit Verification

Submit Verification

Submit Verification

Submit Verification

Submit Verification

Full VerificationFull VerificationFull Verification

Service Pipelines

Challenges and Lessons LearnedMaintaining a CI ways in a Cloud

Our biggest challengeHandling the Loads

2,499,109

81,283

Common CI Runs 2015-2016

TransparencyMeasurement and Monitoring

Dashboards provide visibility

What is the system doing?

Know what your system is doing!

TransparencyMeasure it, Use it What slowed down this build?

Other Challenges

EC2 Instance types and plugins• Build Rotator • Fluentd• BFA Plugin

Special Challenges (Peaks and Valleys)• CO2 (Choose your AWS region first)• Performance (Watch your Queues)• Use Containers (Duh!)

• Hierarchy Killer Plugin• Timestamp Plugin

The Advantages of CI in the Cloud

• Security in our infrastructure• Stability: automated tests run reliably in a consistent

infrastructure• Rapid scaling: slaves come online fast• Cost control: slaves go off-line fast• Common AWS tools are known to Engineers• High availability: Master servers are always available• Multi-regional presence reduces latency• Parallel builds and testing will reduced time and costs

Load and ScalabilityNumber of Build Runs per day

Speed and PredictabilityMean Duration of pre-commit validation runs

Final Thought

Avoid creating a big pile of poo!

Questions?

Thank youContact

Peter CaronService Automation and Continuous Integration

HEREInvalidenstrasse 11610115 Berlin

[email protected]

PluginsPlugin name Version affected Issue Action Download

BuildRotator Plugin ---

LogRotator that comes with Jenkins tries to be much smarter then needed. So, it loads entire job history at least twice to understand what could be removed and what -not.

Update plugin. Replace "LogRotator" to "BuildRotator" as build discard mechanism everywhere.

BuildRotator.hpi

Fluentd --- Send data to Fluentd Install and enjoy. fluentd.hpi

BFA Plugin 1.13.0 and earlier

When we have a huge amount of aborted builds, BFA needs to process all of them, that creates queue and slowdown Jenkins/feedback itself.

Build failure analyzer => Advanced => "Ignore aborted builds" option should be enabled in Jenkins configuration.

Available in Jenkins

PluginsPlugin name Version affected Issue Action Download

HierarchyKillerPlugin 0.98 and earlier

When plugin goes to kill some item from queue, it kills first job in queue instead of killing job that was connected to upstream.

FIX: Correct API call was used.

Update plugin and have fun. build-hierarchy-killer.hpi

Timestamper Plugin 1.8.4 and earlier

Even then Jenkins needs only last 150 KB, plugin reads entire log (because of the # of users we have up to 3 GB) to calculate timestamp for last X lines.Main problem that plugin stores timestamps in encoded format - VarInt.

FIX: Read only last 150 KB of logs for finished builds.

Update plugin and enjoy. Available in Jenkins

Contributions

Plugins• S3• BFA• DSL• EC2• Unit• Gerrit• ccache

Core• Jenkins• XML library