a byom (bring your own metrics) platform

63
A/B testing at Uber A BYOM (Bring Your Own Metrics) Platform Milène Darnis Product Manager, Experimentation

Upload: others

Post on 04-Apr-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A BYOM (Bring Your Own Metrics) Platform

A/B testing at Uber A BYOM (Bring Your Own Metrics) Platform

Milène Darnis Product Manager, Experimentation

Page 2: A BYOM (Bring Your Own Metrics) Platform

Overview of A/B testing at Uber1

2Decoupling experimentation events from business metrics

5 Conclusion

4 Future work

3 Extending the platform

Page 3: A BYOM (Bring Your Own Metrics) Platform

Overview of A/B testing at Uber1

5 Conclusion

4 Future work

3 Extending the platform

2Decoupling experimentation events from business metrics

Page 4: A BYOM (Bring Your Own Metrics) Platform

1 Million Taxi Trips in NYC Data Source: www.nyc.gov

Uber’s mission is to ignite opportunity by setting the world in motion.

Page 5: A BYOM (Bring Your Own Metrics) Platform

1 Million Taxi Trips in NYC Data Source: www.nyc.gov

In order to do this, we run thousands of experiments every year.

Uber’s mission is to ignite opportunity by setting the world in motion.

Page 6: A BYOM (Bring Your Own Metrics) Platform

Where we experiment

Page 7: A BYOM (Bring Your Own Metrics) Platform

Where we experiment

Backend Python, Java, Go

Mobile iOS, Android

Page 8: A BYOM (Bring Your Own Metrics) Platform

Where we experiment

Backend Python, Java, Go

Mobile iOS, Android

Page 9: A BYOM (Bring Your Own Metrics) Platform

Where we experiment

Backend Python, Java, Go

Mobile iOS, Android

Page 10: A BYOM (Bring Your Own Metrics) Platform

What we experiment on

User facing features

Page 11: A BYOM (Bring Your Own Metrics) Platform

What we experiment on

User facing features

Page 12: A BYOM (Bring Your Own Metrics) Platform

What we experiment on

User facing features Bug fixes

🐞

Page 13: A BYOM (Bring Your Own Metrics) Platform

Analyzing A/B tests, a seemingly simple problem…

Basic statistics

Metrics computation

Data logging🗒

📊

Page 14: A BYOM (Bring Your Own Metrics) Platform

…but not at Uber’s scale and pace

Page 15: A BYOM (Bring Your Own Metrics) Platform

8Pipelines of 3,000+ lines of SQL

1-2Runs per day

60%Unused metrics

BEFORE We computed all the metrics for all experiments

Page 16: A BYOM (Bring Your Own Metrics) Platform

BEFORE To onboard new metrics

Page 17: A BYOM (Bring Your Own Metrics) Platform

BEFORE To onboard new metrics

For each new metric:

Metricrequest🙋

Experimentation engineers need to understand the metric

🤔

Implementation in code '

Validation of the metric between the two teams👍

Metric is used for experimentation📊

Page 18: A BYOM (Bring Your Own Metrics) Platform

BEFORE We had other problems too

Duplicate efforts across teams

Use of slightly different methodologies

Waste of time and resources

Incomprehension

🗑 🤷

People doing analysis themselves

Page 19: A BYOM (Bring Your Own Metrics) Platform

BEFORE We had other problems too

Duplicate efforts across teams

Use of slightly different methodologies

Waste of time and resources

Incomprehension

🗑 🤷

Our team productivity suffered

Less time for more interesting problems

People doing analysis themselves

Page 20: A BYOM (Bring Your Own Metrics) Platform

Overview of A/B testing at Uber1

5 Conclusion

4 Future work

3 Extending the platform

2Decoupling experimentation events from business metrics

Page 21: A BYOM (Bring Your Own Metrics) Platform

Changing the paradigm

Experimentation team

Experimentation data

Who is seeing which experiment when

Other teams

Team metrics

Self-serve creation and edit of metrics

Page 22: A BYOM (Bring Your Own Metrics) Platform

AFTER Experimentation events: the need for small data

2

Step 1 15B records

Step 2 5B records

Step 3 250M records

1 3

Page 23: A BYOM (Bring Your Own Metrics) Platform

Logging of events happens automatically

Mobile or backend SDK

Experimentation service

get_treatment_result(experiment_name, user_id=user_id)

control_group

Page 24: A BYOM (Bring Your Own Metrics) Platform

Logging of events happens automatically

Mobile or backend SDK

Experimentation service

get_treatment_result(experiment_name, user_id=user_id)

control_group

Kafka

• timestamp • experiment_name • user_id • treatment_group

→ Approx. 15 billion records emitted daily. → 1.5 TB of data.

Page 25: A BYOM (Bring Your Own Metrics) Platform

30 minutes Spark Jobs average runtime

Step 2: Deduplication using Spark 1 2▶ 3

Kafka

HDFS

Hive

→ Down to approx. 5 billion records daily Partitioned by date

Page 26: A BYOM (Bring Your Own Metrics) Platform

TIMESTAMP USER_ID EXPERIMENT_NAME TREATMENT_GROUP

2018-09-12 10:12:45 42986 experiment_abc treatment_1

2018-09-12 11:13:27 32989 experiment_abc treatment_2

2018-09-12 11:17:45 98829 experiment_abc control

2018-09-12 14:45:34 98397 experiment_abc control

2018-09-12 14:38:28 42986 experiment_abc treatment_1

1 2 3Still a lot of records

Page 27: A BYOM (Bring Your Own Metrics) Platform

1st user

2nd user

Entered the experiment

treatment

treatment control

1 2 ▶ 3Step 3: only keeping the relevant data

Page 28: A BYOM (Bring Your Own Metrics) Platform

1st user

2nd user

Entered the experiment

treatment

treatment control

1 2 ▶ 3Step 3: only keeping the relevant data

ENTRY_DATE EXIT_DATE USER_ID EXPERIMENT_NAME* TREATMENT_GROUP

2018/09/01 NULL 1 experiment_abc treatment

2018/09/01 2018/09/30 2 experiment_abc treatment

2018/09/30 NULL 2 experiment_abc control

Page 29: A BYOM (Bring Your Own Metrics) Platform

1st user

2nd user

Entered the experiment

treatment

treatment control

Now 250 million daily records

1 2 ▶ 3Step 3: only keeping the relevant data

ENTRY_DATE EXIT_DATE USER_ID EXPERIMENT_NAME* TREATMENT_GROUP

2018/09/01 NULL 1 experiment_abc treatment

2018/09/01 2018/09/30 2 experiment_abc treatment

2018/09/30 NULL 2 experiment_abc control

Page 30: A BYOM (Bring Your Own Metrics) Platform

1st user

2nd user

Entered the experiment

treatment

treatment control

* PARTITION COLUMN

Now 250 million daily records

1 2 ▶ 3Step 3: only keeping the relevant data

ENTRY_DATE EXIT_DATE USER_ID EXPERIMENT_NAME* TREATMENT_GROUP

2018/09/01 NULL 1 experiment_abc treatment

2018/09/01 2018/09/30 2 experiment_abc treatment

2018/09/30 NULL 2 experiment_abc control

Page 31: A BYOM (Bring Your Own Metrics) Platform

Takeaway #1

Whenever possible, present the data in a format that is easy to consume, not easy to compute.

Page 32: A BYOM (Bring Your Own Metrics) Platform

Supported:

• Hive • Presto • Vertica

Letting people define their own experimentation metrics

Page 33: A BYOM (Bring Your Own Metrics) Platform

Tying it all together

3 metrics selected

Page 34: A BYOM (Bring Your Own Metrics) Platform

Tying it all together

3 metrics selected

metric_1

Page 35: A BYOM (Bring Your Own Metrics) Platform

Tying it all together

3 metrics selected

metric_1

metric_2

metric_3

metric_1

Page 36: A BYOM (Bring Your Own Metrics) Platform

2.9 minutes Average runtime of a metric

5.75 metrics Average number of metrics per report

Everything happens asynchronously

Page 37: A BYOM (Bring Your Own Metrics) Platform

Before vs. After

Freshness ⭕ 24 hours ✅ 4 hours

Metrics selection ⭕ Static set ✅ BYOM

Metrics onboarding ⭕ Engineering work required

✅ No engineering work needed

Resources ⭕ Computation of irrelevant metrics

✅ Computation of what’s needed only

Speed ✅ Faster ⭕ Slower

Before After

Page 38: A BYOM (Bring Your Own Metrics) Platform

User delight, making the wait pleasant

Page 39: A BYOM (Bring Your Own Metrics) Platform

User delight, making the wait pleasant

Page 40: A BYOM (Bring Your Own Metrics) Platform

Takeaway #2

Balance speed versus flexibility based on user needs.

Page 41: A BYOM (Bring Your Own Metrics) Platform

Overview of A/B testing at Uber1

5 Conclusion

4 Future work

3 Extending the platform

2Decoupling experimentation events from business metrics

Page 42: A BYOM (Bring Your Own Metrics) Platform

It’s now easy to add new features

Data Engineering Data Science

Page 43: A BYOM (Bring Your Own Metrics) Platform

ENTRY_DATE EXIT_DATE USER_ID EXPERIMENT_NAME TREATMENT_GROUP

2018/09/01 NULL 1 experiment_abc treatment_1

2018/09/01 2018/09/30 2 experiment_abc treatment_2

Data engineering: slicing and dicing of results

Page 44: A BYOM (Bring Your Own Metrics) Platform

ENTRY_DATE EXIT_DATE USER_ID EXPERIMENT_NAME TREATMENT_GROUP

2018/09/01 NULL 1 experiment_abc treatment_1

2018/09/01 2018/09/30 2 experiment_abc treatment_2

=Enriched experimentation logs

join on user_id = user_id and to_date(entry_date) = date

DATE USER_ID CITY

2018/09/01 1 Veracruz

2018/09/01 2 Medellin

Data engineering: slicing and dicing of results

Page 45: A BYOM (Bring Your Own Metrics) Platform

Additional run time: +35% on average

Segmented results for a fraction of time

Page 46: A BYOM (Bring Your Own Metrics) Platform

Data Science innovation

Outlier removal Removes irregularities in the data, enables more robust results

Page 47: A BYOM (Bring Your Own Metrics) Platform

Data Science innovation

Outlier removal Removes irregularities in the data, enables more robust results

Pre-existing bias detection and correction Using CUPED method to adjust results and increase statistical power

Page 48: A BYOM (Bring Your Own Metrics) Platform

Takeaway #3

Instead of trying to do it all, do what you are great at and build an infrastructure that lets others add the missing pieces.

Page 49: A BYOM (Bring Your Own Metrics) Platform

Overview of A/B testing at Uber1

5 Conclusion

4 Future work

3 Extending the platform

2Decoupling experimentation events from business metrics

Conclusion

Page 50: A BYOM (Bring Your Own Metrics) Platform

What looking for the right metric looks like

Metrics governance

Infinite Scroll by @artrayd

Page 51: A BYOM (Bring Your Own Metrics) Platform

What looking for the right metric looks like

Metrics governance

Infinite Scroll by @artrayd

Page 52: A BYOM (Bring Your Own Metrics) Platform

Filter metrics by:

IMPORTANCE

TEAM

TYPE

DECISION METRICS GUARDRAIL METRICS

Used to conclude an experiment

Making sure experiments don’t introduce regressions

Uber for Business

High

A vast but organized catalog of metrics

Page 53: A BYOM (Bring Your Own Metrics) Platform

Self-serve metrics != metrics as an afterthought

Page 54: A BYOM (Bring Your Own Metrics) Platform

Self-serve metrics != metrics as an afterthought

Page 55: A BYOM (Bring Your Own Metrics) Platform

Upcoming work

Page 56: A BYOM (Bring Your Own Metrics) Platform

Overview of A/B testing at Uber1

5 Conclusion

4 Future work

3 Extending the platform

2Decoupling experimentation events from business metrics

Page 57: A BYOM (Bring Your Own Metrics) Platform

>> Check uber.com/careers

Data Scientists All levels

Software Engineers All levels

We’re hiring!

Page 58: A BYOM (Bring Your Own Metrics) Platform

Egor Gryaznov

Suman Bhattacharya

Tianxia Zhou

Spencer Lin

Colin Reid

David Schnurr

Acknowledgments

Page 59: A BYOM (Bring Your Own Metrics) Platform

If you only remember three things

Page 60: A BYOM (Bring Your Own Metrics) Platform

01. Easy to consume data > Easy to compute data

If you only remember three things

Page 61: A BYOM (Bring Your Own Metrics) Platform

01. Easy to consume data > Easy to compute data

02. Speed ↔ Flexibility tradeoff

If you only remember three things

Page 62: A BYOM (Bring Your Own Metrics) Platform

01. Easy to consume data > Easy to compute data

02. Speed ↔ Flexibility tradeoff

03. Leverage your strengths, build products that users can contribute to

If you only remember three things

Page 63: A BYOM (Bring Your Own Metrics) Platform

Thank you!Milène Darnis <[email protected]>

Proprietary and confidential © 2018 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.