bighead - data council council...airbnb’s product a global travel community that offers magical...

Bighead: Airbnb’s End-to-End Machine Learning Infrastructure. Andrew Hoh, ML Infra @ Airbnb


Post on 20-May-2020


Page 1: Bighead - Data Council Council...Airbnb’s Product A global travel community that offers magical end-to-end trips, including where you stay, what you do and the people you meet. But

Bighead: Airbnb’s End-to-End Machine Learning Infrastructure

Andrew Hoh, ML Infra @ Airbnb


● Background

● Design Goals

● Architecture Deep Dive

● Open Source


Background


Airbnb’s Product

A global travel community that offers magical end-to-end trips, including where you stay, what you do and the people you meet.


Airbnb is already driven by Machine Learning

● Search Ranking

● Smart Pricing

● Fraud Detection


But there are *many* more opportunities for ML

● Paid Growth - Hosts

● Classifying / Categorizing Listings

● Experience Ranking + Personalization

● Room Type Categorizations

● Customer Service Ticket Routing

● Airbnb Plus

● Listing Photo Quality

● Object Detection - Amenities

● ....


Intrinsic Complexities with Machine Learning

● Understanding the business domain

● Selecting the appropriate Model

● Selecting the appropriate Features

● Fine tuning


Incidental Complexities with Machine Learning

● Integrating with Airbnb’s Data Warehouse

● Scaling model training & serving

● Keeping consistency between: Prototyping vs Production, Training vs Inference

● Keeping track of multiple models, versions, experiments

● Supporting iteration on ML models

→ ML models take on average 8 to 12 weeks to build

→ ML workflows tended to be slow, fragmented, and brittle


The ML Infrastructure Team addresses these challenges

Vision

Airbnb routinely ships ML-powered features throughout the product.

Mission

Equip Airbnb with shared technology to build production-ready ML applications with no incidental complexity.


Supporting the Full ML Lifecycle


Bighead: Design Goals


● Seamless

● Versatile

● Consistent

● Scalable


Seamless

● Easy to prototype, easy to productionize

● Same workflow across different frameworks


Versatile

● Supports all major ML frameworks

● Meets various requirements

○ Online and Offline

○ Data size

○ SLA

○ GPU training

○ Scheduled and Ad hoc


Consistent

● Consistent environment across the stack

● Consistent data transformation

○ Prototyping and Production

○ Online and Offline


Scalable

● Horizontal

● Elastic


Bighead: Architecture Deep Dive


[Architecture diagram]

Prototyping: Redspot

Lifecycle Management: Bighead Service / UI

Production: Real Time Inference (Deep Thought); Batch Training + Inference (ML Automator, Airflow)

Execution Management: Bighead Library

Environment Management: Docker Image Service

Feature Data Management: Zipline


Redspot: Prototyping with Jupyter Notebooks


Jupyter Notebooks? What are those?

“Creators need an immediate connection to what they are creating.”

- Bret Victor


The ideal Machine Learning development environment?

● Interactivity and Feedback

● Access to Powerful Hardware

● Access to Data


Redspot: a Supercharged Jupyter Notebook Service

● A fork of the JupyterHub project

● Integrated with our Data Warehouse

● Access to specialized hardware (e.g. GPUs)

● File sharing between users via AWS EFS

● Packaged in a familiar JupyterHub UI


Redspot


Redspot: a Supercharged Jupyter Notebook Service

Versatile

● Customized Hardware: AWS EC2 Instance Types e.g. P3, X1

● Customized Dependencies: Docker Images e.g. Py2.7, Py3.6+Tensorflow

Consistent

● Promotes prototyping in the exact environment that your model will use in production

Seamless

● Integrated with Bighead Service & Docker Image Service via APIs & UI widgets


Docker Image Service: Environment Customization


Docker Image Service - Why

● ML Users have a diverse, heterogeneous set of dependencies

● Need an easy way to bootstrap their own runtime environments

● Need to be consistent with the rest of Airbnb’s infrastructure


Docker Image Service - Dependency Customization

● Our configuration management solution

● A composition layer on top of Docker

● Includes a customization service that faces our users

● Promotes Consistency and Versatility
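
As a rough illustration of what "a composition layer on top of Docker" could look like, the sketch below renders a Dockerfile from a shared base image plus user-selected packages. The helper name render_dockerfile, the base image tag, and the package pins are hypothetical; the talk does not describe the service's actual API.

```python
from typing import Sequence


def render_dockerfile(base_image: str, pip_packages: Sequence[str]) -> str:
    """Compose a Dockerfile: a shared base image plus user-chosen pip packages.

    Hypothetical helper for illustration only; not the Docker Image Service API.
    """
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # Sort so the same dependency set always yields the same file.
        pkgs = " ".join(sorted(pip_packages))
        lines.append(f"RUN pip install --no-cache-dir {pkgs}")
    return "\n".join(lines) + "\n"


# Illustrative inputs: a Python 3.6 base image with two user dependencies.
dockerfile = render_dockerfile("python:3.6", ["tensorflow==1.12.0", "numpy"])
print(dockerfile)
```

Rendering from a small declarative input is one way to let users customize dependencies while the base image stays under central control.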


Bighead Service: Model Lifecycle Management


Model Lifecycle Management - Why

● Tracking ML model changes is just as important as tracking code changes

● ML model work needs to be reproducible to be sustainable

● Comparing experiments before you launch models into production is critical


Bighead Service

Seamless

● Context-aware visualizations that carry over from the prototyping experience

Consistent

● Central model management service

● Single source of truth about the state of a model, its dependencies, and what’s deployed
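
To make the "single source of truth" idea concrete, here is a minimal in-memory sketch of a model record that tracks versions, the environment each was built in, and which version is deployed. All class, field, and image names are hypothetical and are not taken from Bighead Service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    version: int
    docker_image: str            # environment the version was trained in
    metrics: Dict[str, float]    # evaluation results used to compare experiments


@dataclass
class ModelRecord:
    """Hypothetical registry entry: every version plus the deployed one."""
    name: str
    versions: List[ModelVersion] = field(default_factory=list)
    deployed_version: Optional[int] = None

    def register(self, docker_image: str, metrics: Dict[str, float]) -> ModelVersion:
        mv = ModelVersion(len(self.versions) + 1, docker_image, dict(metrics))
        self.versions.append(mv)
        return mv

    def best(self, metric: str) -> ModelVersion:
        # Compare experiments on one metric before launching anything.
        return max(self.versions, key=lambda v: v.metrics[metric])

    def deploy(self, version: int) -> None:
        if not any(v.version == version for v in self.versions):
            raise ValueError(f"unknown version {version} for model {self.name}")
        self.deployed_version = version


record = ModelRecord("listing_quality")
record.register("py36-tf:1", {"auc": 0.81})
record.register("py36-tf:2", {"auc": 0.84})
record.deploy(record.best("auc").version)
print(record.deployed_version)
```

Keeping the environment next to the metrics is what lets a deployed version be traced back to the exact image and results that justified it.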


Bighead Library


ML Models are highly heterogeneous in

Frameworks

Training data

● Data quality

● Structured vs Unstructured (image, text)

Environment

● GPU vs CPU

● Dependencies


ML Models are hard to keep consistent

● Data in production is different from data in training

● Offline pipeline is different from online pipeline

● Everyone does everything in a different way


Bighead Library

Versatile

● Pipeline on steroids - compute graph for preprocessing / inference / training / evaluation / visualization

● Composable, Reusable, Shareable

● Support popular frameworks

Consistent

● Uniform API

● Serializable - same pipeline used in training, offline inference, online inference

● Fast primitives for preprocessing

● Metadata for trained models
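
The "serializable, same pipeline everywhere" property can be sketched in a few lines. The example below is an assumption-laden toy, not the Bighead Library API: the Scale, Clip, and Pipeline classes and the JSON format are all invented here to show a pipeline being saved at training time and restored unchanged for serving.

```python
import json
from typing import Dict, Iterable, List


class Scale:
    """Multiply every value by a constant factor."""

    def __init__(self, factor: float):
        self.factor = factor

    def transform(self, xs: List[float]) -> List[float]:
        return [x * self.factor for x in xs]

    def params(self) -> Dict[str, float]:
        return {"factor": self.factor}


class Clip:
    """Clamp every value into the range [lo, hi]."""

    def __init__(self, lo: float, hi: float):
        self.lo, self.hi = lo, hi

    def transform(self, xs: List[float]) -> List[float]:
        return [min(max(x, self.lo), self.hi) for x in xs]

    def params(self) -> Dict[str, float]:
        return {"lo": self.lo, "hi": self.hi}


# Registry used to rebuild steps from their serialized type name.
STEP_TYPES = {"Scale": Scale, "Clip": Clip}


class Pipeline:
    """Uniform API: every step exposes transform(); steps run in order."""

    def __init__(self, steps: Iterable):
        self.steps = list(steps)

    def transform(self, xs: List[float]) -> List[float]:
        for step in self.steps:
            xs = step.transform(xs)
        return xs

    def dumps(self) -> str:
        return json.dumps(
            [{"type": type(s).__name__, "params": s.params()} for s in self.steps]
        )

    @classmethod
    def loads(cls, blob: str) -> "Pipeline":
        return cls(STEP_TYPES[d["type"]](**d["params"]) for d in json.loads(blob))


training_pipeline = Pipeline([Scale(0.5), Clip(0.0, 1.0)])
# The serving side rebuilds the identical pipeline from the serialized form.
serving_pipeline = Pipeline.loads(training_pipeline.dumps())
print(serving_pipeline.transform([1.0, 4.0, -2.0]))
```

Because the serving pipeline is reconstructed from the same serialized definition, preprocessing cannot silently drift between training and inference.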


Bighead Library: ML Pipeline


Visualization - Pipeline


Easy to Serialize/Deserialize


Visualization - Training Data


Visualization - Transformer


Deep Thought: Online Inference


Hard to make online model serving...

Easy to do

● Data scientists can’t launch models without an engineering team

● Engineers often need to rebuild models

Scalable

● Resource requirements vary across models

● Throughput fluctuates over time

Consistent with training

● Different data

● Different pipeline

● Different dependencies


Deep Thought

Seamless

● Integration with event logging, dashboard

● Integration with Zipline

Consistent

● Docker + Bighead Library: Same data source, pipeline, environment from training

Scalable

● Kubernetes: Model pods can easily scale

● Resource segregation across models
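
For a sense of how pod counts can follow fluctuating throughput, the sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas × current metric / target metric). Whether Deep Thought uses this autoscaler is not stated in the talk; the function name, the per-pod QPS metric, and the replica bounds are assumptions.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_qps_per_pod: float,
    target_qps_per_pod: float,
    min_replicas: int = 1,
    max_replicas: int = 50,
) -> int:
    """Replica count that brings per-pod load back to the target.

    Follows the Horizontal Pod Autoscaler rule; the bounds are illustrative.
    """
    raw = math.ceil(current_replicas * current_qps_per_pod / target_qps_per_pod)
    # Clamp so one model cannot scale to zero or take over the cluster.
    return max(min_replicas, min(max_replicas, raw))


# Load at 150 QPS per pod against a target of 100: scale 4 pods up.
print(desired_replicas(4, 150.0, 100.0))
# Load at 20 QPS per pod: scale down to the minimum.
print(desired_replicas(4, 20.0, 100.0))
```

Running each model in its own set of pods means this calculation is done per model, which is what keeps one model's traffic spike from starving another.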


ML Automator: Offline Training and Batch Inference


ML Automator - Why

Automated training, inference, and evaluation are necessary

● Scheduling

● Resource allocation

● Saving results

● Dashboards and alerts

● Orchestration


ML Automator

Seamless

● Automate tasks via Airflow: Generate DAGs for training, inference, etc. with appropriate resources

● Integration with Zipline for training and scoring data

Consistent

● Docker + Bighead Library: Same data source, pipeline, environment across the stack

Scalable

● Spark: Distributed computing for large datasets
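
The idea of generating DAGs for training and inference can be sketched without Airflow itself. The example below builds a task dependency graph per model and derives a valid run order with the standard library's graphlib (Python 3.9+). The task names and the build_task_graph helper are hypothetical; a real ML Automator would presumably emit Airflow DAG definitions instead.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def build_task_graph(model_name: str, with_inference: bool = True) -> Dict[str, Set[str]]:
    """Map each task id to the set of task ids it depends on (hypothetical tasks)."""
    fetch = f"{model_name}.fetch_features"
    train = f"{model_name}.train"
    evaluate = f"{model_name}.evaluate"
    graph: Dict[str, Set[str]] = {fetch: set(), train: {fetch}, evaluate: {train}}
    if with_inference:
        # Batch scoring only runs once the evaluated model is available.
        graph[f"{model_name}.batch_inference"] = {evaluate}
    return graph


def execution_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(build_task_graph("smart_pricing"))
print(order)
```

Generating the graph from a model name rather than hand-writing it is what removes scheduling and orchestration from the model author's list of chores.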


ML Automator


Zipline: ML Data Management Framework


Feature management is hard

● Inconsistent offline and online datasets

● Tricky to generate training sets that depend on time correctly

● Slow training set backfills

● Inadequate data quality checks or monitoring

● Unclear feature ownership and sharing


Zipline

Seamless

● Integration with Deep Thought and ML Automator

Consistent

● Consistent data across training/scoring

● Consistent data across development/production

● Point-in-time correctness across features to prevent label leakage

Scalable

● Leverages Spark and Flink to scale Batch and Streaming workloads
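
Point-in-time correctness means each training row may only see feature values that existed at the label's timestamp. The sketch below shows one simple way to do such an "as of" lookup with a binary search. The data layout and the feature_as_of helper are assumptions for illustration, not Zipline's API, and the choice to include values recorded exactly at the label timestamp is also an assumption.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# key -> [(timestamp, value)], each list sorted by timestamp
FeatureLog = Dict[str, List[Tuple[int, float]]]


def feature_as_of(log: FeatureLog, key: str, ts: int) -> Optional[float]:
    """Latest feature value recorded at or before ts, or None if there is none."""
    history = log.get(key, [])
    timestamps = [t for t, _ in history]
    i = bisect_right(timestamps, ts)
    # Values recorded after ts are never returned, which avoids label leakage.
    return history[i - 1][1] if i else None


log: FeatureLog = {"listing_1": [(10, 3.0), (20, 5.0), (30, 9.0)]}
labels = [("listing_1", 25, 1), ("listing_1", 5, 0)]  # (key, label timestamp, label)

training_rows = [
    (key, ts, feature_as_of(log, key, ts), label) for key, ts, label in labels
]
print(training_rows)
```

A naive join on the latest feature value would have given both rows 9.0, a value that did not exist when either label was observed.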


Zipline Addresses the Consistency Challenge Between Training and Scoring

[Diagram: Zipline sits between the data sources (Production Data Stores, Data Warehouse) and the consumers (Model Scoring, Model Training), supplying features for both scoring and training.]


Big Summary

End-to-End platform to build and deploy ML models to production that is seamless, versatile, consistent, and scalable

● Model lifecycle management

● Feature generation & management

● Online & offline inference

● Pipeline library supporting major frameworks

● Docker image customization service

● Multi-tenant training environment

Built on open source technology

● TensorFlow, PyTorch, Keras, MXNet, Scikit-learn, XGBoost

● Spark, Jupyter, Kubernetes, Docker, Airflow


To be Open Sourced

We are selecting our first couple of private collaborators. If you are interested, please email me at [email protected]


Questions?


Appendix
