Uber Mobility Meetup: Mobile Testing
TRANSCRIPT
Evolution of Octopus
Bian Jiang, Nirav Nagda
Uber Mobility Meetup
July 7, 2016
From cross-app/cross-device testing, to network record/replay, to scenario-based testing
Soon after we started at Uber, we encountered a fresh testing challenge. Can you guess what it is?
Our Uber Challenge
Octopus: Our platform agnostic test runner
GTAC 2015: Octopus Takes on the Uber Challenge of Cross-app/Cross-device Testing
https://www.youtube.com/watch?v=p6gsssppeT0
Uber Eng Blog: https://eng.uber.com/rescued-by-octopus/
End-to-End (L)
Scenario-based Tests (M)
Unit Tests (S)
Top: small number of large end-to-end tests (dozens)
Middle: middling amount of medium integration tests (hundreds)
Bottom: large number of small unit tests (thousands)
PREREQUISITES: HEALTHY TEST PYRAMID
Octopus Evolution (Hermetic Tests)
Record mode:- Records the network responses
Replay mode:- Plays back the network responses from local files
HERMETIC TESTS: USING NETWORK RECORD/REPLAY
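The record/replay idea above can be sketched in a few lines. This is an illustrative sketch only (class and method names are hypothetical, not Octopus's actual API, and the real system persists recordings to local files): in record mode real network responses are captured keyed by request; in replay mode the cached response is served so the test never touches the network.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a network record/replay cache for hermetic tests.
public class NetworkReplayCache {
    public enum Mode { RECORD, REPLAY }

    private final Mode mode;
    private final Map<String, String> responses;

    public NetworkReplayCache(Mode mode) {
        this(mode, new HashMap<>());
    }

    // In replay mode, the map would be loaded from the local recording files.
    public NetworkReplayCache(Mode mode, Map<String, String> recordings) {
        this.mode = mode;
        this.responses = recordings;
    }

    public String fetch(String requestKey, Supplier<String> realCall) {
        if (mode == Mode.RECORD) {
            String response = realCall.get();    // hit the real network
            responses.put(requestKey, response); // and capture the response
            return response;
        }
        String cached = responses.get(requestKey);
        if (cached == null) {
            throw new IllegalStateException("No recording for: " + requestKey);
        }
        return cached; // served hermetically, no network involved
    }

    public Map<String, String> recordings() {
        return responses;
    }
}
```

A test recorded once can then replay deterministically on any machine, which is what makes the Android and iOS examples that follow hermetic.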
ANDROID EXAMPLE: NETWORK RECORD/REPLAY
@Test
@Replay
public void whenDestinationPromptTap_shouldGoToLocationEditor() {
    HomeLayout
        .verifyLayout()
        .clickDestinationPrompt();
}
(record = true)
func testFavoriteLocationsScreen() {
    do {
        launchAppWithOctopusConfig(try OctopusConfigBuilder()
            .setNetworkReplayCache("rider_favorite_locations")
            .setFakeLocation("555 Market St")
            .build())
    } catch {
        print("Error when launching app")
    }
    login(withUsername: "fake_user", password: "fake_password")
    let favoriteLocationsScreen = FavoriteLocationsScreen(
        testCase: self,
        application: application)
    favoriteLocationsScreen.waitForRequiredElements()
}
IOS EXAMPLE: NETWORK RECORD/REPLAY
Octopus Demo
Octopus Evolution (Rules & Scenario-based Testing)
● What are rules?● How do they help?
○ Easier test code○ Shared setup/tearDown code logic○ Abstract out the complicated logic from tests
UBER RULES
● Experiments● Location● Accounts● Animations● App reset
RULES EXAMPLES
EXAMPLE: LOCATION RULE
@Location(device = TestLocation.CHICAGO_SUBURB)
@Replay
@Test
public void whenValidAccount_shouldBeAbleToSignIn() {
    WelcomeLayout.verifyLayout().clickLogin();
    LoginLayout.verifyLayout().signInWith(Account.GENERIC.getEmail(),
        Account.GENERIC.getPassword());
    RideLayout.verifyLayout();
}
● What is scenario-based testing?● How does it help?
○ Speeds up the test run○ Easier to maintain○ Deterministic tests
SCENARIO-BASED TESTING
SCENARIO-BASED TESTING: ANDROID EXAMPLE

public void cancelTripRequest() {
    // Only when recording
    replayScenarioManager.doWhenRecording(new Interactions() {
        @Override
        public void perform() {
            AcceleratorsLayout.verifyLayout().clickItem(0);
            ProductSelectionLayout.verifyLayout().swipeToIndex(1)
                .clickProduct(R.string.uberx);
            ConfirmationLayout.verifyLayout().clickGo();
        }
    });

    // Actual test: the test will start from here in replay mode.
    DispatchingLayout.verifyLayout().clickCancel();
    ConfirmationLayout.verifyLayout();
}
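The doWhenRecording(...) gate in the example above is simple to model. The sketch below is an assumption about how such a gate could work, not Uber's actual implementation: setup interactions execute only while recording; in replay mode the scenario's network responses already exist, so the test skips straight to the step under test, which is what speeds up runs and keeps them deterministic.

```java
// Illustrative sketch of a record-only interaction gate for scenario tests.
public class ReplayScenarioManager {
    public interface Interactions {
        void perform();
    }

    private final boolean recording;

    public ReplayScenarioManager(boolean recording) {
        this.recording = recording;
    }

    // Runs the setup interactions only in record mode;
    // replay mode starts directly from the step under test.
    public void doWhenRecording(Interactions interactions) {
        if (recording) {
            interactions.perform();
        }
    }
}
```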
Special thanks to mobile-test-platform@, octopus-eng@, mobile-eng@ and all the amazing people at Uber :)
Uber eng blog: https://eng.uber.com/rescued-by-octopus/
GTAC talk on Octopus: https://www.youtube.com/watch?v=p6gsssppeT0
What an "ideal" CI setup for mobile looks like
Valera Zakharov
Ideal =
Reliable
Scalable
Performant
Controllable
Debuggable
On Mobile
Different screen resolutions
Different OS versions
On Android
Need Android Runtime to run instrumentation tests
$ adb shell am instrument -w <test_package_name>/<runner_class>
Basic Building Blocks
Orchestration
ADB
Device
Host (build node)
Ideal Orchestration
Obtains/releases a device
Sets up the device for testing
Issues ‘adb shell am instrument’ calls for each test
Collects debug info
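The third orchestration duty above, issuing one 'adb shell am instrument' call per test rather than one giant run, gives each test its own process, logcat, and debug info. A minimal sketch of building that per-test invocation (the `-e class Foo#method` flag follows the standard AndroidJUnitRunner convention; the helper itself is illustrative):

```java
import java.util.Arrays;
import java.util.List;

// Builds the argv for a single-test 'adb shell am instrument' invocation,
// as an ideal orchestrator would issue for each test.
public class InstrumentCommand {
    public static List<String> forTest(String serial, String testPackage,
                                       String runnerClass, String testClass,
                                       String testMethod) {
        return Arrays.asList(
            "adb", "-s", serial, "shell", "am", "instrument", "-w",
            "-e", "class", testClass + "#" + testMethod,
            testPackage + "/" + runnerClass);
    }
}
```

The orchestrator would hand each such command to a process runner, collecting the per-test output separately.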
Ideal Android Debug Bridge
Issue shell commands
Install apks
Push data to device
Pull data from device
Does all of the above reliably and quickly
Device
Ideal Test Device
Starts reliably and quickly
Runs as fast as a top-of-the-line phone
Allows you to scale
Provides tests with control over system settings
Test Services
Prevents UI popups
IActivityController
Android Vanilla Setup
Orchestration
Jenkins or another equivalent
./gradlew connectedCheck
Adb
Stock
Device
Locally connected emulator or physical device
Vanilla Setup Problems
Orchestration
one process for all tests
no sharding
no separate logcat/debug info
Adb
Slow
Flaky
Vanilla Setup Problems
Device
Not test friendly
Physical devices
not designed for CI usage
hard to maintain and scale
Emulator:
hard to configure properly
system images not well maintained
“fast” version can’t run on AWS
Ideal =
Reliable
Scalable
Performant
Controllable
Debuggable
Google’s Internal Setup
Orchestration
Custom (hooked into the internal build system)
Adb
adb.turbo ™
Device
Stock x86 emulator running in dedicated data-center machines
Properly configured/maintained
Config abstracted away by a script
Includes special services for testing
Custom ActivityController for dismissing UI popups
Test services for tweaking system settings, screenshots, etc
Existing services come up short
Firebase Google Cloud Test Lab
Runs all tests in one instrumentation
Virtual Devices are slow
Not debuggable
Xamarin
Doesn’t run instrumentation tests
AWS
Runs all tests in one instrumentation
Only real devices available
So… now what
¯\_(ツ)_/¯
Some thoughts...
Orchestration
Tools for discovering tests in an apk (based on dexdump)
An alternative to ./gradlew connectedCheck
android_test --apk my.apk --tests test.apk
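A discovery tool like the one suggested above would enumerate tests from the apk's dex metadata before issuing any device commands. Real dexdump output is far more verbose; the sketch below assumes a simplified "Lcom/foo/BarTest;->testName" style listing and is illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: scan a dexdump-style listing for JUnit-convention test methods
// (classes ending in "Test", methods starting with "test").
public class TestDiscovery {
    private static final Pattern TEST_METHOD =
        Pattern.compile("L([\\w/]+Test);->(test\\w+)");

    public static List<String> discover(String dexDump) {
        List<String> tests = new ArrayList<>();
        Matcher m = TEST_METHOD.matcher(dexDump);
        while (m.find()) {
            // Convert the Lcom/foo/BarTest; descriptor to com.foo.BarTest#testX
            tests.add(m.group(1).replace('/', '.') + "#" + m.group(2));
        }
        return tests;
    }
}
```

With the test list in hand, the runner can shard and invoke tests individually instead of relying on a single connectedCheck pass.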
ADB
adb.turbo (or equivalent) if planning to run at Google-scale
Device
Custom IActivityController
No go on physical devices
Test Services works well as a community project
Some parts won’t work on physical devices
To scale, need a fast emulator in the cloud
Genymotion cloud?
User Mode Linux for Android?
Faster Tests with Test Parallelization
Justin Martin & Daniel Ribeiro
Too many tests? No such thing!
MACHINE TIME: ~150 minutes
WALL TIME: ~50 minutes
Distributed vs. Non-distributed

                 MACHINE TIME    WALL TIME
NON-DISTRIBUTED  ~150 minutes    ~50 minutes
DISTRIBUTED      ~125 minutes    ~20 minutes
Separation of Powers
BUILD: Mac Pros
TEST: Mac Minis
Reruns Are Even Faster!
(Chart: machine time and wall time on rerun, Mac Pro time vs. Mac Mini time)
XCKnife
FASTER DISTRIBUTED TESTS FOR iOS
GITHUB.COM/SQUARE/XCKNIFE
SQU.RE/XCKNIFE
Each iteration takes a test class from the queue and assigns it to the machine that is the least busy
XCKnife Balancing
Balanced Sharding
Approximate
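The balancing loop described above, taking the next test class from the queue and giving it to the least busy machine, is plain greedy scheduling. A sketch under that reading (XCKnife's actual partitioner also weighs historical timing data per target; this helper is illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Greedy sharding: each test class goes to the machine with the least
// accumulated estimated work so far.
public class ShardBalancer {
    // durations: test class -> estimated seconds; returns shard index per class.
    public static Map<String, Integer> assign(Map<String, Double> durations, int machines) {
        double[] load = new double[machines];
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : durations.entrySet()) {
            int least = 0;
            for (int i = 1; i < machines; i++) {
                if (load[i] < load[least]) {
                    least = i; // pick the least busy machine
                }
            }
            load[least] += e.getValue();
            assignment.put(e.getKey(), least);
        }
        return assignment;
    }
}
```

The result is approximate rather than optimal, matching the "Approximate" caveat above, but it keeps shard wall times close enough for large suites.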
Key takeaways
Disclaimer
Diminishing Returns
Testing Pyramid
Testing Matrix:
Many devices
OS versions
Orientations
Localizations
Test Explosion
It is distributed
Availability
Operations matter
Test your infrastructure
Everything breaks
When you fix, please open source it
Toolchain
UX Matters
Bad actors
3x3: Speeding Up Mobile Releases
Drew Hannay (@drewhannay)
Project Voyager
New version of the flagship LinkedIn app
250+ committers across Android & iOS
~1 year of development
Investment in mobile infrastructure at LinkedIn
Before Voyager
12 releases per year
RC build + manual regression suite
Mad dash to commit code before the RC cutoff
Missing the cutoff meant a long wait for the next release
Product & marketing plans were made around the monthly releases
Hard to iterate on member feedback
3x3
Release three times per day, no more than three hours from code commit to member availability
Why three hours?
Not enough time for manual testing steps
Not enough time to test everything
The goal isn’t 100% automation, it’s faster iterations
We don’t want engineers spending most of their time maintaining tests that break whenever a design changes
UI tests are chosen based on production-critical business flows
Faster iteration helps emphasize craftsmanship
Devs can take the extra time to write quality code since the next release is soon
Commit Pipeline
Code Review
Static Analysis
Unit Tests
Build Release Artifacts
UI Tests
Alpha Release
Feature Development
Production Release
Beta Release
Static analysis
Compile-time contract with API server using Rest.li
Rest.li data templates are shared between API server & clients
Provides static analysis checks that guarantee backwards compatibility
Client models are code generated for further safety
Java Checkstyle
Android Lint
Over 200 checks provided by Google
Several custom checks written for LinkedIn-specific patterns
SwiftLint
Forked version of Realm’s SwiftLint
Added custom checks for LinkedIn patterns
Building the code
Over 500k lines of code between Android & iOS
Building production binaries for a large codebase is slow
iOS & Swift
At one point early on, Swift compilation took over two hours
Refactoring into sub-projects and modules led to a more than 50% speed up
Android Split APKs
Separate binary for each combination of screen density and CPU architecture
Distributed builds
Build the release binaries on separate machines while tests are running
What do we test?
Unit tests
Layout tests
Unit tests for views
Stress test views with long strings, short strings
Make sure views don’t overlap, and render properly in right-to-left mode
Scenario tests
Validate that key business metric flows are working properly
Usually flows that span multiple screens in the app
App gets mock data from a local fixture server
Not an exhaustive suite
Test stability
UI tests use Android Espresso & iOS KIF frameworks
Needed to create a consistent test environment across dev & build machines
Android
Self-contained, standardized Android emulator bundle
Custom test harness that runs one test per application launch
iOS
KIF contained hard-coded waits that passed on dev boxes, but failed on slower build servers
Forked KIF to convert to a checkpoint-based system, where the code tells the test when to proceed to the next step
Test speed
Android
Use Espresso’s IdlingResource API to avoid sleeps and waits
Run up to 16 Android emulators in parallel on a single build machine
Custom test harness allows optimal test parallelization
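Espresso's IdlingResource API, mentioned above, replaces fixed sleeps with a query the framework polls before proceeding. A plain-Java sketch of the counting pattern behind Espresso's CountingIdlingResource (the real API lives in androidx.test; this standalone class just illustrates the mechanism):

```java
import java.util.concurrent.atomic.AtomicInteger;

// The app increments while async work is in flight and decrements when it
// finishes; the test framework polls isIdleNow() instead of sleeping.
public class CountingIdler {
    private final AtomicInteger inFlight = new AtomicInteger(0);

    public void increment() {
        inFlight.incrementAndGet(); // async work started
    }

    public void decrement() {
        inFlight.decrementAndGet(); // async work finished
    }

    public boolean isIdleNow() {
        return inFlight.get() == 0; // safe to let the test proceed
    }
}
```

Because the test advances the instant the counter hits zero, no time is wasted on worst-case waits, which is where the speedup over sleeps comes from.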
iOS
Refactoring KIF cut UI testing time by more than 80%
Distributed testing -> Shard tests across multiple machines
Significantly faster, but led to greater exposure to any tooling instability
Nontrivial overhead in starting a child job
Android multi-emulator test run
iOS KIF refactoring
iOS multi-simulator testing
Partner teams
Historically, several partner teams validated the build before a release
For example, we needed sign off from the localization team
Lint checks catch hardcoded or improperly formatted strings
Layout tests catch strings that are too long and RTL layout bugs
Semantic correctness of translations is still validated by translators manually
Getting to members
Every three hours, internal alpha testers get a new build
Mainly members of the Flagship team
Product managers, devs, and execs who want to see the latest code ASAP
Every week, the rest of the company gets a new beta build
iOS build is submitted to Apple for review
After a week of beta, the build is promoted to production
Assuming Apple’s review is complete, iOS is released
Take advantage of Google Play staged rollout for Android
Dogfooding
Android: Google Play alpha/beta channel
Easy upgrades for employees, even while off the corporate network
Somewhat difficult to get set up, but easy once registered
iOS: TestFlight
Nice, but limited number of users
iOS: Custom enterprise distribution
Scales to our number of users, but employees must be on corporate wifi to upgrade
Splash screen in the production app encourages employees to use beta builds
Minimizing risk & enabling experiments
Take advantage of LinkedIn’s existing A/B testing infrastructure
New features are developed behind feature flags
Code can be ramped dynamically to different groups of members
Performance of new features or changes can be monitored
Dynamic configuration
Server-controlled kill switch
Crashing or buggy code can often be disabled without a new build
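The kill-switch idea can be sketched as a flag lookup backed by server-pushed configuration. This is an illustrative assumption, not LinkedIn's actual API: the client falls back to a safe default when no config has arrived, and flipping a flag server-side disables the buggy code path without shipping a new build.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical feature-flag store fed by dynamic server configuration.
public class FeatureFlags {
    private final Map<String, Boolean> serverConfig = new HashMap<>();

    // Simulates applying a config payload fetched from the server.
    public void applyServerConfig(Map<String, Boolean> config) {
        serverConfig.putAll(config);
    }

    // Code paths check this; the default applies until config arrives.
    public boolean isEnabled(String feature, boolean defaultValue) {
        return serverConfig.getOrDefault(feature, defaultValue);
    }
}
```

The same lookup doubles as the ramp mechanism: the server can return different values for different member groups.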
3x3 after 6 months: areas to improve
Release automation
Production uploads to the app stores are still a manual process
Getting release notes & translations is painful
Automated performance testing
We can sample performance of the app in production, but don’t have a great way of catching issues before release
Android Monkey testing
Enables wide range of API level & device coverage with very low overhead cost
iOS speed improvements
Keep up with Swift evolution
Bring 3x3 framework to other LinkedIn apps
Questions
3x3 blogs & videos
3x3: Speeding up mobile releases
3x3: iOS Build Speed and Stability
Test Stability - How We Make UI Tests Stable
UI Automation: Keep it Functional - and Stable!
Consistent Android Testing Environments with Gradle (slides)
Effective Layout Testing Library for iOS
Managing iOS Continuous Integration at Enterprise Scale