Uber Mobility Meetup: Mobile Testing
TRANSCRIPT
Evolution of Octopus
Bian Jiang, Nirav Nagda
Uber Mobility Meetup
July 7, 2016
From cross-app/cross-device testing, to network record/replay, to scenario-based testing
Soon after we started at Uber, we encountered a fresh testing challenge. Can you guess what it is?
Our Uber Challenge
Octopus: Our platform agnostic test runner
GTAC 2015: Octopus Takes on the Uber Challenge of Cross-app/Cross-device Testing
https://www.youtube.com/watch?v=p6gsssppeT0
Uber Eng Blog: https://eng.uber.com/rescued-by-octopus/
End-to-End (L)
Scenario-based Tests (M)
Unit Tests (S)
Top: small number of large end-to-end tests (dozens)
Middle: middling amount of medium integration tests (hundreds)
Bottom: large number of small unit tests (thousands)
PREREQUISITES: HEALTHY TEST PYRAMID
Octopus Evolution (Hermetic Tests)
Record mode:- Records the network responses
Replay mode:- Plays back the network responses from local files
HERMETIC TESTS: USING NETWORK RECORD/REPLAY
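The record/replay idea above can be sketched in a few lines. This is an illustrative sketch only (class and method names are hypothetical, not Octopus's actual API, and the real system persists recordings to local files): in record mode real network responses are captured keyed by request; in replay mode the cached response is served so the test never touches the network.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a network record/replay cache for hermetic tests.
public class NetworkReplayCache {
    public enum Mode { RECORD, REPLAY }

    private final Mode mode;
    private final Map<String, String> responses;

    public NetworkReplayCache(Mode mode) {
        this(mode, new HashMap<>());
    }

    // In replay mode, the map would be loaded from the local recording files.
    public NetworkReplayCache(Mode mode, Map<String, String> recordings) {
        this.mode = mode;
        this.responses = recordings;
    }

    public String fetch(String requestKey, Supplier<String> realCall) {
        if (mode == Mode.RECORD) {
            String response = realCall.get();    // hit the real network
            responses.put(requestKey, response); // and capture the response
            return response;
        }
        String cached = responses.get(requestKey);
        if (cached == null) {
            throw new IllegalStateException("No recording for: " + requestKey);
        }
        return cached; // served hermetically, no network involved
    }

    public Map<String, String> recordings() {
        return responses;
    }
}
```

A test recorded once can then replay deterministically on any machine, which is what makes the Android and iOS examples that follow hermetic.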
ANDROID EXAMPLE: NETWORK RECORD/REPLAY
@Test
@Replay
public void whenDestinationPromptTap_shouldGoToLocationEditor() {
    HomeLayout
        .verifyLayout()
        .clickDestinationPrompt();
}
(record = true)
func testFavoriteLocationsScreen() {
    do {
        launchAppWithOctopusConfig(try OctopusConfigBuilder()
            .setNetworkReplayCache("rider_favorite_locations")
            .setFakeLocation("555 Market St")
            .build())
    } catch {
        print("Error when launching app")
    }
    login(withUsername: "fake_user", password: "fake_password")
    let favoriteLocationsScreen = FavoriteLocationsScreen(
        testCase: self,
        application: application)
    favoriteLocationsScreen.waitForRequiredElements()
}
IOS EXAMPLE: NETWORK RECORD/REPLAY
Octopus Demo
Octopus Evolution (Rules & Scenario-based Testing)
● What are rules?● How do they help?
○ Easier test code○ Shared setup/tearDown code logic○ Abstract out the complicated logic from tests
UBER RULES
● Experiments● Location● Accounts● Animations● App reset
RULES EXAMPLES
EXAMPLE: LOCATION RULE
@Location(device = TestLocation.CHICAGO_SUBURB)
@Replay
@Test
public void whenValidAccount_shouldBeAbleToSignIn() {
    WelcomeLayout.verifyLayout().clickLogin();
    LoginLayout.verifyLayout().signInWith(Account.GENERIC.getEmail(),
        Account.GENERIC.getPassword());
    RideLayout.verifyLayout();
}
● What is scenario-based testing?● How does it help?
○ Speeds up the test run○ Easier to maintain○ Deterministic tests
SCENARIO-BASED TESTING
SCENARIO-BASED TESTING: ANDROID EXAMPLE

public void cancelTripRequest() {
    // Only when recording
    replayScenarioManager.doWhenRecording(new Interactions() {
        @Override
        public void perform() {
            AcceleratorsLayout.verifyLayout().clickItem(0);
            ProductSelectionLayout.verifyLayout().swipeToIndex(1)
                .clickProduct(R.string.uberx);
            ConfirmationLayout.verifyLayout().clickGo();
        }
    });

    // Actual test: the test will start from here in replay mode.
    DispatchingLayout.verifyLayout().clickCancel();
    ConfirmationLayout.verifyLayout();
}
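The doWhenRecording(...) gate in the example above is simple to model. The sketch below is an assumption about how such a gate could work, not Uber's actual implementation: setup interactions execute only while recording; in replay mode the scenario's network responses already exist, so the test skips straight to the step under test, which is what speeds up runs and keeps them deterministic.

```java
// Illustrative sketch of a record-only interaction gate for scenario tests.
public class ReplayScenarioManager {
    public interface Interactions {
        void perform();
    }

    private final boolean recording;

    public ReplayScenarioManager(boolean recording) {
        this.recording = recording;
    }

    // Runs the setup interactions only in record mode;
    // replay mode starts directly from the step under test.
    public void doWhenRecording(Interactions interactions) {
        if (recording) {
            interactions.perform();
        }
    }
}
```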
Special thanks to mobile-test-platform@, octopus-eng@, mobile-eng@ and all the amazing people at Uber :)
Uber eng blog: https://eng.uber.com/rescued-by-octopus/
GTAC talk on Octopus: https://www.youtube.com/watch?v=p6gsssppeT0
What an "ideal" CI setup for mobile looks like
Valera Zakharov
Ideal =
Reliable
Scalable
Performant
Controllable
Debuggable
On Mobile
Different screen resolutions
Different OS versions
On Android
Need Android Runtime to run instrumentation tests
$ adb shell am instrument -w <test_package_name>/<runner_class>
Basic Building Blocks
Orchestration
ADB
Device
Host (build node)
Ideal Orchestration
Obtains/releases a device
Sets up the device for testing
Issues ‘adb shell am instrument’ calls for each test
Collects debug info
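The third orchestration duty above, issuing one 'adb shell am instrument' call per test rather than one giant run, gives each test its own process, logcat, and debug info. A minimal sketch of building that per-test invocation (the `-e class Foo#method` flag follows the standard AndroidJUnitRunner convention; the helper itself is illustrative):

```java
import java.util.Arrays;
import java.util.List;

// Builds the argv for a single-test 'adb shell am instrument' invocation,
// as an ideal orchestrator would issue for each test.
public class InstrumentCommand {
    public static List<String> forTest(String serial, String testPackage,
                                       String runnerClass, String testClass,
                                       String testMethod) {
        return Arrays.asList(
            "adb", "-s", serial, "shell", "am", "instrument", "-w",
            "-e", "class", testClass + "#" + testMethod,
            testPackage + "/" + runnerClass);
    }
}
```

The orchestrator would hand each such command to a process runner, collecting the per-test output separately.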
Ideal Android Debug Bridge
Issue shell commands
Install apks
Push data to device
Pull data from device
Does all of the above reliably and quickly
Device
Ideal Test Device
Starts reliably and quickly
Runs as fast as a top-of-the-line phone
Allows you to scale
Provides tests with control over system settings
Test Services
Prevents UI popups
IActivityController
Android Vanilla Setup
Orchestration
Jenkins or another equivalent
./gradlew connectedCheck
Adb
Stock
Device
Locally connected emulator or physical device
Vanilla Setup Problems
Orchestration
one process for all tests
no sharding
no separate logcat/debug info
Adb
Slow
Flaky
Vanilla Setup Problems
Device
Not test friendly
Physical devices
not designed for CI usage
hard to maintain and scale
Emulator:
hard to configure properly
system images not well maintained
“fast” version can’t run on AWS
Ideal =
Reliable
Scalable
Performant
Controllable
Debuggable
Google’s Internal Setup
Orchestration
Custom (hooked into the internal build system)
Adb
adb.turbo ™
Device
Stock x86 emulator running in dedicated data-center machines
Properly configured/maintained
Config abstracted away by a script
Includes special services for testing
Custom ActivityController for dismissing UI popups
Test services for tweaking system settings, screenshots, etc
Existing services come up short
Firebase Google Cloud Test Lab
Runs all tests in one instrumentation
Virtual Devices are slow
Not debuggable
Xamarin
Doesn’t run instrumentation tests
AWS
Runs all tests in one instrumentation
Only real devices available
So… now what
¯\_(ツ)_/¯
Some thoughts...
Orchestration
Tools for discovering tests in an apk (based on dexdump)
An alternative to ./gradlew connectedCheck
android_test --apk my.apk --tests test.apk
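A discovery tool like the one suggested above would enumerate tests from the apk's dex metadata before issuing any device commands. Real dexdump output is far more verbose; the sketch below assumes a simplified "Lcom/foo/BarTest;->testName" style listing and is illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: scan a dexdump-style listing for JUnit-convention test methods
// (classes ending in "Test", methods starting with "test").
public class TestDiscovery {
    private static final Pattern TEST_METHOD =
        Pattern.compile("L([\\w/]+Test);->(test\\w+)");

    public static List<String> discover(String dexDump) {
        List<String> tests = new ArrayList<>();
        Matcher m = TEST_METHOD.matcher(dexDump);
        while (m.find()) {
            // Convert the Lcom/foo/BarTest; descriptor to com.foo.BarTest#testX
            tests.add(m.group(1).replace('/', '.') + "#" + m.group(2));
        }
        return tests;
    }
}
```

With the test list in hand, the runner can shard and invoke tests individually instead of relying on a single connectedCheck pass.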
ADB
adb.turbo (or equivalent) if planning to run at Google-scale
Device
Custom IActivityController
No go on physical devices
Test Services works well as a community project
Some parts won’t work on physical devices
To scale, need a fast emulator in the cloud
Genymotion cloud?
User Mode Linux for Android?
Faster Tests with Test Parallelization
Justin Martin & Daniel Ribeiro
Too many tests? No such thing!
MACHINE TIME: ~150 minutes
WALL TIME: ~50 minutes
Distributed vs. Non-distributed

                 MACHINE TIME    WALL TIME
NON-DISTRIBUTED  ~150 minutes    ~50 minutes
DISTRIBUTED      ~125 minutes    ~20 minutes
Separation of Powers
BUILD: Mac Pros
TEST: Mac Minis
Reruns Are Even Faster!
(Chart: machine time and wall time on rerun, Mac Pro time vs. Mac Mini time)
XCKnife
FASTER DISTRIBUTED TESTS FOR iOS
GITHUB.COM/SQUARE/XCKNIFE
SQU.RE/XCKNIFE
Each iteration takes a test class from the queue and assigns it to the machine that is the least busy
XCKnife Balancing
Balanced Sharding
Approximate
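The balancing loop described above, taking the next test class from the queue and giving it to the least busy machine, is plain greedy scheduling. A sketch under that reading (XCKnife's actual partitioner also weighs historical timing data per target; this helper is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Greedy sharding: each test class goes to the machine with the least
// accumulated estimated work so far.
public class ShardBalancer {
    // durations: test class -> estimated seconds; returns shard index per class.
    public static Map<String, Integer> assign(Map<String, Double> durations, int machines) {
        double[] load = new double[machines];
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : durations.entrySet()) {
            int least = 0;
            for (int i = 1; i < machines; i++) {
                if (load[i] < load[least]) {
                    least = i; // pick the least busy machine
                }
            }
            load[least] += e.getValue();
            assignment.put(e.getKey(), least);
        }
        return assignment;
    }
}
```

The result is approximate rather than optimal, matching the "Approximate" caveat above, but it keeps shard wall times close enough for large suites.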
Key takeaways
Disclaimer
Diminishing Returns
Testing Pyramid
Testing Matrix:
Many devices
OS versions
Orientations
Localizations
Test Explosion
It is distributed
Availability
Operations matter
Test your infrastructure
Everything breaks
When you fix, please open source it
Toolchain
UX Matters
Bad actors
3x3: Speeding Up Mobile Releases
Drew Hannay (@drewhannay)
Project Voyager
New version of the flagship LinkedIn app
250+ committers across Android & iOS
~1 year of development
Investment in mobile infrastructure at LinkedIn
Before Voyager
12 releases per year
RC build + manual regression suite
Mad dash to commit code before the RC cutoff
Missing the cutoff meant a long wait for the next release
Product & marketing plans were made around the monthly releases
Hard to iterate on member feedback
3x3
Release three times per day, no more than three hours from code commit to member availability
Why three hours?
Not enough time for manual testing steps
Not enough time to test everything
The goal isn’t 100% automation, it’s faster iterations
We don’t want engineers spending most of their time maintaining tests that break whenever a design changes
UI tests are chosen based on production-critical business flows
Faster iteration helps emphasize craftsmanship
Devs can take the extra time to write quality code since the next release is soon
Commit Pipeline
Code Review
Static Analysis
Unit Tests
Build Release Artifacts
UI Tests
Alpha Release
Feature Development
Production Release
Beta Release
Static analysis
Compile-time contract with API server using Rest.li
Rest.li data templates are shared between API server & clients
Provides static analysis checks that guarantee backwards compatibility
Client models are code generated for further safety
Java Checkstyle
Android Lint
Over 200 checks provided by Google
Several custom checks written for LinkedIn-specific patterns
SwiftLint
Forked version of Realm’s SwiftLint
Added custom checks for LinkedIn patterns
Building the code
Over 500k lines of code between Android & iOS
Building production binaries for a large codebase is slow
iOS & Swift
At one point early on, Swift compilation took over two hours
Refactoring into sub-projects and modules led to a more than 50% speed up
Android Split APKs
Separate binary for each combination of screen density and CPU architecture
Distributed builds
Build the release binaries on separate machines while tests are running
What do we test?
Unit tests
Layout tests
Unit tests for views
Stress test views with long strings, short strings
Make sure views don’t overlap, and render properly in right-to-left mode
Scenario tests
Validate that key business metric flows are working properly
Usually flows that span multiple screens in the app
App gets mock data from a local fixture server
Not an exhaustive suite
Test stability
UI tests use Android Espresso & iOS KIF frameworks
Needed to create a consistent test environment across dev & build machines
Android
Self-contained, standardized Android emulator bundle
Custom test harness that runs one test per application launch
iOS
KIF contained hard-coded waits that passed on dev boxes, but failed on slower build servers
Forked KIF to convert to a checkpoint-based system, where the code tells the test when to proceed to the next step
Test speed
Android
Use Espresso’s IdlingResource API to avoid sleeps and waits
Run up to 16 Android emulators in parallel on a single build machine
Custom test harness allows optimal test parallelization
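Espresso's IdlingResource API, mentioned above, replaces fixed sleeps with a query the framework polls before proceeding. A plain-Java sketch of the counting pattern behind Espresso's CountingIdlingResource (the real API lives in androidx.test; this standalone class just illustrates the mechanism):

```java
import java.util.concurrent.atomic.AtomicInteger;

// The app increments while async work is in flight and decrements when it
// finishes; the test framework polls isIdleNow() instead of sleeping.
public class CountingIdler {
    private final AtomicInteger inFlight = new AtomicInteger(0);

    public void increment() {
        inFlight.incrementAndGet(); // async work started
    }

    public void decrement() {
        inFlight.decrementAndGet(); // async work finished
    }

    public boolean isIdleNow() {
        return inFlight.get() == 0; // safe to let the test proceed
    }
}
```

Because the test advances the instant the counter hits zero, no time is wasted on worst-case waits, which is where the speedup over sleeps comes from.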
iOS
Refactoring KIF cut UI testing time by more than 80%
Distributed testing -> Shard tests across multiple machines
Significantly faster, but led to greater exposure to any tooling instability
Nontrivial overhead in starting a child job
Android multi-emulator test run
iOS KIF refactoring
iOS multi-simulator testing
Partner teams
Historically, several partner teams validated the build before a release
For example, we needed sign off from the localization team
Lint checks catch hardcoded or improperly formatted strings
Layout tests catch strings that are too long and RTL layout bugs
Semantic correctness of translations is still validated by translators manually
Getting to members
Every three hours, internal alpha testers get a new build
Mainly members of the Flagship team
Product managers, devs, and execs who want to see the latest code ASAP
Every week, the rest of the company gets a new beta build
iOS build is submitted to Apple for review
After a week of beta, the build is promoted to production
Assuming Apple’s review is complete, iOS is released
Take advantage of Google Play staged rollout for Android
Dogfooding
Android: Google Play alpha/beta channel
Easy upgrades for employees, even while off the corporate network
Somewhat difficult to get set up, but easy once registered
iOS: TestFlight
Nice, but limited number of users
iOS: Custom enterprise distribution
Scales to our number of users, but employees must be on corporate wifi to upgrade
Splash screen in the production app encourages employees to use beta builds
Minimizing risk & enabling experiments
Take advantage of LinkedIn’s existing A/B testing infrastructure
New features are developed behind feature flags
Code can be ramped dynamically to different groups of members
Performance of new features or changes can be monitored
Dynamic configuration
Server-controlled kill switch
Crashing or buggy code can often be disabled without a new build
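The kill-switch idea can be sketched as a flag lookup backed by server-pushed configuration. This is an illustrative assumption, not LinkedIn's actual API: the client falls back to a safe default when no config has arrived, and flipping a flag server-side disables the buggy code path without shipping a new build.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical feature-flag store fed by dynamic server configuration.
public class FeatureFlags {
    private final Map<String, Boolean> serverConfig = new HashMap<>();

    // Simulates applying a config payload fetched from the server.
    public void applyServerConfig(Map<String, Boolean> config) {
        serverConfig.putAll(config);
    }

    // Code paths check this; the default applies until config arrives.
    public boolean isEnabled(String feature, boolean defaultValue) {
        return serverConfig.getOrDefault(feature, defaultValue);
    }
}
```

The same lookup doubles as the ramp mechanism: the server can return different values for different member groups.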
3x3 after 6 months: areas to improve
Release automation
Production uploads to the app stores are still a manual process
Getting release notes & translations is painful
Automated performance testing
We can sample performance of the app in production, but don’t have a great way of catching issues before release
Android Monkey testing
Enables wide range of API level & device coverage with very low overhead cost
iOS speed improvements
Keep up with Swift evolution
Bring 3x3 framework to other LinkedIn apps
Questions
3x3 blogs & videos
3x3: Speeding up mobile releases
3x3: iOS Build Speed and Stability
Test Stability - How We Make UI Tests Stable
UI Automation: Keep it Functional - and Stable!
Consistent Android Testing Environments with Gradle (slides)
Effective Layout Testing Library for iOS
Managing iOS Continuous Integration at Enterprise Scale