Heavenly Hell – Automated Tests at Scale, Wojciech Seliga

#atlassian

WOJCIECH SELIGA • SENIOR DEV MANAGER • ATLASSIAN • @WSELIGA

Heavenly Hell: Automated Tests at Scale

About me

• Coding since the age of 6
• Agile Practices (incl. TDD) since 2003
• Dev Nerd, Tech Leader, Agile Coach, Speaker, PHB
• 7 years with Atlassian (JIRA Senior Dev Manager)
• Spartez Co-founder & CEO

XP Promise

(Chart: cost of change over time, Waterfall vs. XP)

The Story

About 2.5 years ago:

Almost 10 years of accumulated legacy automated tests

About 20,000 tests* on all levels of abstraction
*just in core JIRA

Very slow (even hours) and fragile feedback loop

Serious performance and reliability issues

Dispirited devs accepting RED as the norm

Feedback Speed

Test Quality

Test Code is Not Trash

Design, maintain, refactor, share, review, prune, respect, discuss, restructure, rewrite

Test Pyramid

Unit Tests (including QUnit): 90% (fastest, lowest overall confidence)

REST / HTML Tests: 9%

Selenium: 1% (slowest, highest overall confidence)

Optimum Balance

Isolation, Speed, Coverage, Level, Access, Effort

Dangerous to tamper with

Maintainability, Quality / Determinism

Almost two years later…

People

People - Motivation

Making GREEN the norm

Shades of Red

Build Tiers and Policy

Tier A1 - green soon after all commits: unit tests and functional* tests

Tier A2 - green at the end of the day: WebDriver and bundled plugins tests

Tier A3 - green at the end of the iteration: supported platforms tests, compatibility tests

Wallboards: Constant Awareness

Training

• Favouring assertThat over assertTrue/False and assertEquals

• Avoiding races - Atlassian Selenium with its TimedElement

• Favouring unit tests over functional tests (including QUnit over WebDriver)

• Promoting Page Objects

• Brownbags, blog posts, code reviews

https://ecosystem.atlassian.net/browse/SELENIUM
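The TimedElement bullet rests on one idea: instead of checking page state once, and racing with asynchronous page logic, poll a condition until it becomes true or a timeout expires. A minimal plain-Java sketch of that idea follows; `Waits.waitUntil` is a hypothetical helper written for illustration, not the Atlassian Selenium API.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper illustrating the idea behind TimedElement (not its real API):
// poll a condition until it holds or the timeout expires, instead of asserting once.
final class Waits {
    private Waits() {}

    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // condition met within the timeout
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller decides how to fail the test
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A typical call would be `Waits.waitUntil(() -> resultsPanel.isDisplayed(), Duration.ofSeconds(10), Duration.ofMillis(100))`, where `resultsPanel` is a hypothetical page element. Unlike a fixed `Thread.sleep` before an assertion, which is either too short (flaky) or too long (slow), polling waits only as long as the page actually needs.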

Quality

Automatic Flakiness Detection and Quarantine

Re-run failed tests and see if they pass

Quarantine - Healing
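The re-run-and-quarantine policy can be sketched as a small classifier: a test that fails and then passes when re-run, with no code change in between, is nondeterministic, so it goes to quarantine instead of turning the build red; a test that keeps failing is a genuine failure. `FlakinessDetector` and its re-run limit are hypothetical names written for illustration, not the tooling used for JIRA.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of automatic flakiness detection with a quarantine list.
final class FlakinessDetector {
    enum Verdict { PASSED, FLAKY, FAILED }

    private final int maxReruns;
    private final Set<String> quarantine = new LinkedHashSet<>();

    FlakinessDetector(int maxReruns) {
        this.maxReruns = maxReruns;
    }

    // Runs the test; on failure re-runs it up to maxReruns times.
    Verdict run(String testName, BooleanSupplier test) {
        if (test.getAsBoolean()) {
            return Verdict.PASSED;
        }
        for (int i = 0; i < maxReruns; i++) {
            if (test.getAsBoolean()) {
                // Same code, different outcome: the test is nondeterministic.
                quarantine.add(testName);
                return Verdict.FLAKY;
            }
        }
        return Verdict.FAILED; // fails consistently: treat as a real regression
    }

    // Names of quarantined tests; a CI setup would exclude these from the
    // main build's verdict until someone fixes ("heals") them.
    Set<String> quarantined() {
        return Set.copyOf(quarantine);
    }

    void heal(String testName) {
        quarantine.remove(testName);
    }
}
```

Quarantine only helps if it is temporary: the healing step (fixing the race and calling `heal`) is what keeps the list from becoming a second graveyard of ignored tests.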

SlowMo - expose races

Selenium 1

Ditching Selenium 1: the sky did not fall in

Ditching - benefits

• Freed build agents - better system throughput

• Boosted morale

• A gazillion developer hours saved

• Money saved on infrastructure

Ditching - due diligence

• conducting the audit - analysis of the coverage we lost

• determining which tests need to be rewritten (e.g. security related)

• rewriting the tests (good job for new hires + a senior mentor)

Flaky Browser-based Tests

Playing with "loading" CSS class does not really help

Races between test code and asynchronous page logic

Races Removal with Tracing

```javascript
// In the browser:
function mySearchClickHandler() {
    doSomeXhr().always(function () {
        // This executes when the XHR has completed (either success or failure)
        JIRA.trace("search.completed");
    });
}
// In production code JIRA.trace is a no-op
```

```java
// In my page object:
@Inject
TraceContext traceContext;

public SearchResults doASearch() {
    Tracer snapshot = traceContext.checkpoint();
    getSearchButton().click(); // causes mySearchClickHandler to be invoked
    // This waits until the "search.completed" event has been emitted,
    // *after* the previous snapshot
    traceContext.waitFor(snapshot, "search.completed");
    return pageBinder.bind(SearchResults.class);
}
```
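To make the mechanism behind `checkpoint()` and `waitFor()` concrete, here is a hypothetical in-memory model of it. `InMemoryTraceContext` is invented for illustration and is not the Atlassian Selenium `TraceContext` API. The key property is that `waitFor` only accepts events recorded after the checkpoint, so a `search.completed` event left over from an earlier search cannot satisfy the wait.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of trace-based synchronisation (not the real API).
final class InMemoryTraceContext {
    private final List<String> events = new ArrayList<>();

    // Called by the code under test (the role JIRA.trace plays in the browser).
    synchronized void trace(String key) {
        events.add(key);
        notifyAll(); // wake up any test thread blocked in waitFor
    }

    // Remembers how many events had been recorded before the test's action.
    synchronized int checkpoint() {
        return events.size();
    }

    // Waits until `key` is recorded after `checkpoint`, or gives up after `timeout`.
    synchronized boolean waitFor(int checkpoint, String key, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!events.subList(checkpoint, events.size()).contains(key)) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                wait(remainingMillis); // releases the lock until trace() calls notifyAll
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

Compared with polling for a "loading" CSS class, the test no longer guesses at page state: it waits for an explicit signal that the asynchronous work it triggered has finished.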

Can we halve our build times?

Parallel Execution - Theory

(Diagram: test batches running in parallel between start of build and end of build)

Parallel Execution - Reality Bites

(Diagram: the same batches, delayed by agent availability)

Dynamic Test Execution Dispatch - Hallelujah
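Why dynamic dispatch helps can be shown with a small deterministic simulation, hypothetical and for illustration only. With static batching, each agent gets a fixed slice of the tests up front, so one agent that becomes available late delays the whole build; with dynamic dispatch, whichever agent frees up first takes the next test from a shared queue.

```java
import java.util.PriorityQueue;

// Hypothetical simulation: durations are test run times (minutes),
// agentStart[i] is when agent i becomes available (agent availability).
final class DispatchSimulation {
    private DispatchSimulation() {}

    // Static batching: tests are split into equal-sized contiguous batches up front.
    static double staticMakespan(double[] durations, double[] agentStart) {
        int agents = agentStart.length;
        int perBatch = (durations.length + agents - 1) / agents; // ceiling division
        double end = 0;
        for (int a = 0; a < agents; a++) {
            int from = a * perBatch;
            int to = Math.min(durations.length, from + perBatch);
            if (from >= to) {
                continue; // this agent received an empty batch
            }
            double t = agentStart[a];
            for (int i = from; i < to; i++) {
                t += durations[i];
            }
            end = Math.max(end, t);
        }
        return end;
    }

    // Dynamic dispatch: the agent that becomes free earliest takes the next test.
    static double dynamicMakespan(double[] durations, double[] agentStart) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>();
        for (double s : agentStart) {
            freeAt.add(s);
        }
        double end = 0;
        for (double d : durations) {
            double done = freeAt.poll() + d;
            freeAt.add(done);
            end = Math.max(end, done);
        }
        return end;
    }
}
```

For example, with eight 1-minute tests and four agents, one of which only becomes available at minute 6, static batching finishes at minute 8 while dynamic dispatch finishes at minute 3: the late agent simply never picks up any work.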

"You can't manage what you can't measure."

(not by W. Edwards Deming)

If you believe in that alone, you are doomed.

You can't improve the system if you can't measure it

Profiler, build statistics, logs, statsd → Graphite
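For the statsd → Graphite part, a build step can report each phase duration as a statsd timer. The statsd line protocol for a timer is `<name>:<value>|ms`, sent as a UDP datagram (port 8125 by default); statsd aggregates the values and forwards them to Graphite. The class and the metric name in the usage note are hypothetical, written for illustration rather than taken from a specific statsd client library.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal statsd timer client, written for illustration only.
final class StatsdTimer {
    private StatsdTimer() {}

    // statsd line protocol for a timer: <metric>:<value>|ms
    static String timerLine(String metric, long millis) {
        return metric + ":" + millis + "|ms";
    }

    // Fire-and-forget UDP send: the build never waits for the metrics server,
    // and a lost datagram only loses one data point.
    static void send(String host, int port, String line) throws IOException {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, InetAddress.getByName(host), port));
        }
    }
}
```

Usage: measure a phase with `System.nanoTime()` and call `StatsdTimer.send("localhost", 8125, StatsdTimer.timerLine("ci.jira.unit.compilation", elapsedMs))`. Plotting such timers per phase in Graphite is what makes a breakdown like the build anatomy below visible over time.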

Anatomy of Build*

• Agent Availability/Setup
• SCM Update
• Fetching Dependencies
• Compilation
• Packaging
• Executing Tests
• Publishing Results

*Any resemblance to a Maven build is entirely accidental

JIRA Unit Tests Build

• Agent Availability/Setup (mean 10min)
• SCM Update (2min)
• Fetching Dependencies (1.5min)
• Compilation (7min)
• Packaging (0min)
• Executing Tests (7min)
• Publishing Results (1min)


Decreasing test execution time to ZERO alone would not let us achieve our goal!
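Adding up the phase durations of the JIRA unit tests build shows why (agent availability is a mean, so this is an approximation):

```latex
\begin{aligned}
T_{\text{build}} &= \underbrace{10}_{\text{agent}} + \underbrace{2}_{\text{SCM}} + \underbrace{1.5}_{\text{deps}} + \underbrace{7}_{\text{compile}} + \underbrace{0}_{\text{package}} + \underbrace{7}_{\text{tests}} + \underbrace{1}_{\text{publish}} = 28.5\ \text{min} \\
T_{\text{build}} - T_{\text{tests}} &= 28.5 - 7 = 21.5\ \text{min} \;>\; \tfrac{1}{2}\,T_{\text{build}} = 14.25\ \text{min}
\end{aligned}
```

Even with instant tests the build would still take about 75% of its original time, so halving it meant attacking every other phase as well.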


• starved builds due to busy agents building very long builds

• time synchronization issue - NTPD problem

SCM Update - Checkout time

• Proximity of SCM repo

• Shallow git clones are not so fast and lightweight, and they generate extra CPU load on the git server

• git clone per agent/plan + git pull + git clone per build (hard links!)

• Much less load on Stash server (no need to queue up)

2 min → 5 seconds
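The checkout strategy in the third bullet can be sketched as the git command lines it implies. This is an illustration under assumptions (the cache location, the use of `git pull` to refresh it, and the class and method names are hypothetical), not Atlassian's actual Bamboo configuration. The speed-up comes from the last step: when the source of `git clone` is a plain local path, git hard-links the object files where possible (same filesystem) instead of transferring them.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of the per-agent cache + per-build local clone strategy.
final class CheckoutPlan {
    private CheckoutPlan() {}

    // Once per agent and build plan: a full clone from the central server (Stash).
    static List<String> initCache(String remoteUrl, Path cacheDir) {
        return List.of("git", "clone", remoteUrl, cacheDir.toString());
    }

    // Before every build: update the cache; only new objects cross the network.
    static List<String> refreshCache(Path cacheDir) {
        return List.of("git", "-C", cacheDir.toString(), "pull");
    }

    // Every build: clone from the local cache. With a plain local path as the
    // source, git hard-links object files where possible instead of copying them.
    static List<String> cloneForBuild(Path cacheDir, Path buildDir) {
        return List.of("git", "clone", cacheDir.toString(), buildDir.toString());
    }

    // Runs one command, failing on a non-zero exit code.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (process.waitFor() != 0) {
            throw new IOException("Command failed: " + command);
        }
    }
}
```

Each build still gets a fresh, isolated working copy, but the central server only serves one incremental `pull` per agent instead of a full clone per build.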

Fetching Dependencies

• Fix Predator

• Sandboxing/isolation agent trade-off: turning
  rm -rf $HOME/.m2/repository/com/atlassian/*
  into
  find "$HOME/.m2/repository/com/atlassian/" -name '*SNAPSHOT*' -prune -exec rm -rf {} +

• Network hardware failure found (dropping packets)

1.5 min → 10 seconds
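The trade-off in the second bullet (keep released artifacts cached on the agent, throw away only SNAPSHOT versions, which can change between builds) can also be sketched in Java with `java.nio.file`, for agents where a shell one-liner is not convenient. The class name is hypothetical; the sketch removes whole directories whose names contain `SNAPSHOT`, which in a Maven repository are the `-SNAPSHOT` version directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical cleaner: deletes SNAPSHOT version directories, keeps releases.
final class SnapshotCleaner {
    private SnapshotCleaner() {}

    // Returns the number of SNAPSHOT directories removed under `root`.
    static int deleteSnapshots(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return 0;
        }
        List<Path> snapshotDirs;
        try (Stream<Path> paths = Files.walk(root)) {
            snapshotDirs = paths
                    .filter(Files::isDirectory)
                    .filter(p -> p.getFileName() != null && p.getFileName().toString().contains("SNAPSHOT"))
                    .collect(Collectors.toList());
        }
        int removed = 0;
        for (Path dir : snapshotDirs) {
            if (!Files.exists(dir)) {
                continue; // already deleted together with a SNAPSHOT parent
            }
            try (Stream<Path> tree = Files.walk(dir)) {
                // Reverse order deletes children before their parent directories.
                for (Path p : tree.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                    Files.delete(p);
                }
            }
            removed++;
        }
        return removed;
    }
}
```

Released versions are immutable, so reusing them across builds is safe and saves the download time; SNAPSHOTs are exactly the artifacts where a stale local copy would break build isolation.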

Compilation

• Restructuring multi-pom Maven project and dependencies

• Maven 3 parallel compilation FTW!
  -T 1.5C (1.5 threads per CPU core)*
  *optimal factor found through scientific trial-and-error research

7 min → 1 min

Unit Test Execution

• Splitting unit tests into 2 buckets: good and legacy (much longer)
  11,000 good tests (1.5min)
  3,000 poor tests (5min), rewritten entirely over the next year

• Maven 3 parallel test execution (-T 1.5C)

7 min → 5 min

Functional Tests

• Selenium 1 removal did help

• Faster reset/restore (avoiding unnecessary work, e.g. intercepting SQL operations for debugging purposes, since building stack traces is costly)

• Restoring via Backdoor REST API (JIRA TestKit)

• Using REST API for common setup/teardown operations
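Restoring data through a REST call instead of driving the administration UI turns a multi-page browser interaction into a single HTTP request. A sketch using the JDK's `java.net.http` client follows; the endpoint path `rest/backdoor/1.0/restore`, the JSON payload, and the base URL are hypothetical placeholders, not the actual JIRA TestKit API.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical client for a test-only "backdoor" REST resource.
// The endpoint path and payload are placeholders, not the real JIRA TestKit API.
final class BackdoorClient {
    private final URI baseUri; // e.g. http://localhost:8080/jira/ (note the trailing slash)

    BackdoorClient(URI baseUri) {
        this.baseUri = baseUri;
    }

    // Builds a request asking the server to restore a prepared backup.
    // The file name is inserted without JSON escaping, so it must be a plain name.
    HttpRequest restoreRequest(String backupFileName) {
        String json = "{\"fileName\":\"" + backupFileName + "\"}";
        return HttpRequest.newBuilder(baseUri.resolve("rest/backdoor/1.0/restore"))
                .timeout(Duration.ofMinutes(2))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

The request would be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` in the test's setup method, so that every functional test starts from a known state without touching the UI.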


Publishing Results

• Server log allocation per test → now using Backdoor REST API (was Selenium)

• Bamboo DB performance degradation for rich build history

1 min → 40 s

Unexpected Problem

• Stability Issues with our CI server (hardware)

• The bottleneck changed from I/O to CPU

• Too many agents per physical machine

JIRA Unit Tests Build Improved

• Agent Availability/Setup (3min)*
• SCM Update (5sec)
• Fetching Dependencies (10sec)
• Compilation (1min)
• Packaging (0min)
• Publishing Results (40sec)


Improvements Summary

Tests              Before    After     Improvement
Unit tests         29 min    17 min    41%
Functional tests   56 min    34 min    39%
WebDriver tests    39 min    21 min    46%
Overall            124 min   72 min    42%

* Additional ca. 5% improvement expected once the new git clone strategy is consistently rolled out everywhere
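The improvement column is the relative reduction in build time, and the Overall row is the sum of the three rows (29 + 56 + 39 = 124 min before, 17 + 34 + 21 = 72 min after):

```latex
\text{Improvement} = \frac{T_{\text{before}} - T_{\text{after}}}{T_{\text{before}}},
\qquad
\text{overall: } \frac{124 - 72}{124} = \frac{52}{124} \approx 41.9\% \approx 42\%
```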

Better speed increases responsibility

Fewer commits (authors) per single build

vs.

The Quality Follows

But that's still bad

We want a CI feedback loop of a few minutes at most

Splitting The Codebase

Inevitable Split - Fears

• Organizational concerns - understanding, managing, integrating, releasing, coordinating

• Mindset change - if something worked for 10+ years, why change it?

• Trust - does this library still work?

• We damned ourselves with big buckets for all tests - where do they belong?

Splitting code base

• Step 0 - JIRA Importers Plugin (3.5 years ago)

• Step 1 - New Issue View and Navigator

• Step 2 - now everything else follows (e.g. Workflow Designer)

(JIRA 6.0)

Getting back from hell to heaven is difficult. Hell sucks in your soul.

Key takeaways:

• Visibility and problem awareness help
• Maintaining a huge testbed is difficult and costly
• Measure the problem to get a baseline
• No prejudice - no sacred cows
• Automated tests are not a one-off investment; they are a continuous journey
• Performance is a damn important feature


Test performance is a damn important feature!

XP vs Sad Reality

(Chart: cost of change over time, Waterfall vs. XP (ideal) vs. sad reality)

Images - Credits

• Green Traffic Light - by flrnt, CC-BY-SA-2.0

• Turtle - by Jonathan Zander, CC-BY-SA-3.0

• Loading - by MatthewJ13, CC-SA-3.0

• Merlin Tool - by L. Mahin, CC-BY-SA-3.0

• Flashing Red Light - by Chris Phan, CC BY 2.0

• In Heaven - by Daniel Pascoal, CC BY-NC-ND 2.0

https://www.flickr.com/photos/flrnt/

http://photography.jznet.org/

http://matthewj13.deviantart.com/art/Loading-Animated-GIFs-244672794

http://commons.wikimedia.org/wiki/User:Lucyin

https://www.flickr.com/photos/dpg/

Thank you!
