Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF
DESCRIPTION
Abstract: Agile Machine Learning for Recommender Systems
What can data scientists and machine learning engineers learn from software developers? When it comes to process, tools, and managing complexity, the answer is: quite a bit. When we first started to deploy machine learning at if(we), it felt like we had hit a speed bump in the middle of the highway. Accustomed to shipping software to millions of members multiple times a day and to constantly iterating toward better products, we were stunned at how long it took us to try new ideas using the available machine learning tools. I will share what we've learned from applying agile software development principles to building recommender systems, describing the tools and platforms that allow us to go from new ideas to proven product improvements in just a few days.
TRANSCRIPT
![Page 1: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/1.jpg)
Agile Machine Learning for Real-time Recommender Systems
@jssmith github.com/ifweco
Johann Schleier-Smith CTO, if(we)
![Page 2: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/2.jpg)
what it should look like
![Page 3: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/3.jpg)
1. Gain understanding of machine learning
2. Gain understanding of the product usage
3. See opportunity to make the product better
4. Create training data
5. Train predictive models
6. Put models in production
7. See improvements
![Page 5: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/5.jpg)
what it often looks like
![Page 6: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/6.jpg)
1. Gain understanding of machine learning
2. Gain understanding of the product usage
3. See opportunity to make the product better
4. Pull records from database to create interesting features (usually aggregates)
5. Train predictive models
6. Go implement models for production
7. See improvements
![Page 7: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/7.jpg)
1. Gain understanding of machine learning
2. Gain understanding of the product usage
3. See opportunity to make the product better
4. Pull records from database to create interesting features (usually aggregates)
5. Train predictive models
6. Go implement models for production
7. See improvements
3-6 months
![Page 9: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/9.jpg)
1. Gain understanding of machine learning
2. Gain understanding of the product usage
3. See opportunity to make the product better
4. Pull records from database to create interesting features (usually aggregates)
5. Train predictive models
6. Go implement models for production
7. See improvements
Cool!
Was it worth it?
![Page 10: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/10.jpg)
• Profitable startup actively pursuing big opportunities in social apps
• Millions of users of existing brands
• Thousands of social contacts per second
![Page 11: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/11.jpg)
real-time recommendations
challenges
![Page 12: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/12.jpg)
• >10 million candidates to select from
• >1000 updates/sec
• Must be responsive to current activity
• Users expect instant query results
Tagged dating feature
![Page 13: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/13.jpg)
implementation pain points
![Page 14: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/14.jpg)
• Data scientist hands model description to software engineer
• May need to translate features from SQL to Java
• Aggregate features require batch processing
• May need to adjust features and model to achieve real-time updates
• Fast scoring requires high-performance in-memory data structures
![Page 15: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/15.jpg)
![Page 16: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/16.jpg)
time for new thinking
![Page 17: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/17.jpg)
one way that works better
![Page 18: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/18.jpg)
![Page 19: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/19.jpg)
4. Pull records from database to create interesting features (usually aggregates)
5. Train predictive models
6. Go implement models for production
![Page 20: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/20.jpg)
Pull records from database to create interesting features (usually aggregates)
Train predictive models
Go implement models for production
![Page 21: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/21.jpg)
Create interesting features
Train predictive models
Put models in production
![Page 23: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/23.jpg)
one right way to data
![Page 24: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/24.jpg)
event history
one right way to data
![Page 25: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/25.jpg)

    History
      .filterTime(start, PLUS_INFINITY)
      .foreach { e: Event => model.update(e) }

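The replay loop above can be sketched end to end as follows. This is only an illustration under stated assumptions: `Event`, `History`, and `CountingModel` are hypothetical stand-ins written for this example, not the API of Antelope or any other library, and `PLUS_INFINITY` is modeled as `Long.MaxValue`.

```scala
import scala.collection.mutable

object ReplaySketch {
  // Hypothetical event type, for illustration only.
  final case class Event(ts: Long, eventType: String, userId: Long)

  // In-memory stand-in for the event history, kept in timestamp order.
  final class History(events: Seq[Event]) {
    private val ordered = events.sortBy(_.ts)

    // Keep events with start <= ts < end.
    def filterTime(start: Long, end: Long): History =
      new History(ordered.filter(e => e.ts >= start && e.ts < end))

    def foreach(f: Event => Unit): Unit = ordered.foreach(f)
  }

  // Trivial "model" whose only state is an event count per user.
  final class CountingModel {
    private val counts = mutable.Map.empty[Long, Int].withDefaultValue(0)
    def update(e: Event): Unit = counts(e.userId) = counts(e.userId) + 1
    def count(userId: Long): Int = counts(userId)
  }

  val PLUS_INFINITY: Long = Long.MaxValue

  // Made-up sample data, deliberately out of order.
  val history = new History(Seq(
    Event(3L, "OpenApp", 1L),
    Event(1L, "Register", 1L),
    Event(2L, "Register", 2L)
  ))

  // Replay everything from `start` onward into a fresh model.
  def replay(start: Long): CountingModel = {
    val model = new CountingModel
    history.filterTime(start, PLUS_INFINITY).foreach(e => model.update(e))
    model
  }
}
```

Because the model state is built only by calling `update`, the same code path can serve a historical replay and a live stream.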
![Page 26: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/26.jpg)
everything is an event
![Page 27: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/27.jpg)
• Bob registers
• Alice registers
• Alice updates profile
• Bob opens app
• Bob sees Alice in recommendations
• Bob swipes yes on Alice
• Alice receives push notification
• Alice sees Bob swiped yes
• Alice swipes yes
• Alice sends message to Bob
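One way to picture this timeline is as an append-only log from which other state is derived. The sketch below assumes a hypothetical `Event` shape, made-up event type names, and made-up timestamps; none of these come from the talk.

```scala
object EventLogSketch {
  // Hypothetical event shape: who did what, to whom (if anyone), and when.
  final case class Event(
      ts: Long,
      actor: String,
      eventType: String,
      target: Option[String] = None
  )

  // The timeline above as an append-only log (timestamps are invented).
  val log: Vector[Event] = Vector(
    Event(1L, "Bob", "Register"),
    Event(2L, "Alice", "Register"),
    Event(3L, "Alice", "UpdateProfile"),
    Event(4L, "Bob", "OpenApp"),
    Event(5L, "Bob", "SeeRecommendation", Some("Alice")),
    Event(6L, "Bob", "SwipeYes", Some("Alice")),
    Event(7L, "Alice", "ReceivePush"),
    Event(8L, "Alice", "SeeSwipedYes", Some("Bob")),
    Event(9L, "Alice", "SwipeYes", Some("Bob")),
    Event(10L, "Alice", "SendMessage", Some("Bob"))
  )

  // Derived state is computed from the log alone: pairs who both swiped yes.
  def mutualMatches(events: Seq[Event]): Set[Set[String]] = {
    val yes = events.collect {
      case Event(_, actor, "SwipeYes", Some(target)) => (actor, target)
    }.toSet
    yes.collect { case (a, b) if yes.contains((b, a)) => Set(a, b) }
  }
}
```

Truncating the log to the first six events yields no matches, which is the property that makes replaying history to any point in time possible.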
![Page 28: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/28.jpg)
writing the model
![Page 29: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/29.jpg)

    class MyModel {
      def update(e: Event) { … }
      def topN(ctx: Context, n: Int) = { … }
    }

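To make the two-method shape concrete, here is a minimal sketch of a model that fills in both bodies. The popularity logic, the `Event` and `Context` fields, and the tie-breaking rule are all assumptions chosen for illustration; the slide leaves them unspecified.

```scala
import scala.collection.mutable

object ModelSketch {
  // Hypothetical event and context types.
  final case class Event(ts: Long, userId: Long, itemId: Long)
  final case class Context(userId: Long)

  // Popularity model: state is updated one event at a time.
  final class PopularityModel {
    private val views = mutable.Map.empty[Long, Int].withDefaultValue(0)
    private val seen =
      mutable.Map.empty[Long, Set[Long]].withDefaultValue(Set.empty[Long])

    def update(e: Event): Unit = {
      views(e.itemId) = views(e.itemId) + 1
      seen(e.userId) = seen(e.userId) + e.itemId
    }

    // Most-viewed items the requesting user has not seen yet;
    // ties are broken by the smaller item id.
    def topN(ctx: Context, n: Int): Seq[Long] =
      views.toSeq
        .filterNot { case (item, _) => seen(ctx.userId).contains(item) }
        .sortBy { case (item, count) => (-count, item) }
        .take(n)
        .map(_._1)
  }
}
```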
![Page 30: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/30.jpg)
models are all about features
![Page 31: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/31.jpg)

    class MyFeature {
      def update(e: Event) { … }
      def score(ctx: Context, candidateId: Long): Double = { … }
    }

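As an illustration of a feature with this interface, the sketch below answers "has this candidate already swiped yes on me?". The feature itself, the `Event` fields, and the `"SwipeYes"` type name are assumptions made for this example.

```scala
import scala.collection.mutable

object FeatureSketch {
  // Hypothetical event and context types.
  final case class Event(ts: Long, eventType: String, actorId: Long, targetId: Long)
  final case class Context(userId: Long)

  // Feature: 1.0 if the candidate has already swiped yes on the requesting user.
  final class LikedMeFeature {
    private val yesSwipes = mutable.Set.empty[(Long, Long)]

    // State changes only in response to events.
    def update(e: Event): Unit =
      if (e.eventType == "SwipeYes") yesSwipes += ((e.actorId, e.targetId))

    // Scoring reads the current state for one (user, candidate) pair.
    def score(ctx: Context, candidateId: Long): Double =
      if (yesSwipes.contains((candidateId, ctx.userId))) 1.0 else 0.0
  }
}
```

The score depends on both the requesting user and the candidate, and it changes as soon as a new swipe event arrives.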
![Page 32: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/32.jpg)
model training
![Page 33: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/33.jpg)

    History
      .filterTime(start, PLUS_INFINITY)
      .foreach { e: Event =>
        // Record features as they stood before this event, then update state.
        writeTrainingData(outcome(e), model.features(context(e)))
        model.update(e)
      }

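The ordering inside the loop matters: features are read before the event is applied, so a training row never contains its own outcome. A minimal sketch of that idea follows, with a hypothetical click-through model; `Event`, `Row`, `ClickModel`, and the two features are all assumptions made for this example.

```scala
import scala.collection.mutable

object TrainingSketch {
  // Hypothetical impression event with its observed outcome.
  final case class Event(ts: Long, userId: Long, itemId: Long, clicked: Boolean)
  final case class Row(label: Double, features: Vector[Double])

  // Model state: impressions and clicks seen so far for each item.
  final class ClickModel {
    private val shown = mutable.Map.empty[Long, Int].withDefaultValue(0)
    private val clicks = mutable.Map.empty[Long, Int].withDefaultValue(0)

    def update(e: Event): Unit = {
      shown(e.itemId) = shown(e.itemId) + 1
      if (e.clicked) clicks(e.itemId) = clicks(e.itemId) + 1
    }

    // Features as of "now": [impressions so far, clicks so far].
    def features(itemId: Long): Vector[Double] =
      Vector(shown(itemId).toDouble, clicks(itemId).toDouble)
  }

  // Roll through history in time order, emitting one row per event.
  def trainingData(history: Seq[Event]): Vector[Row] = {
    val model = new ClickModel
    val rows = Vector.newBuilder[Row]
    history.sortBy(_.ts).foreach { e =>
      // Read features first, so the row reflects only earlier events.
      rows += Row(if (e.clicked) 1.0 else 0.0, model.features(e.itemId))
      model.update(e)
    }
    rows.result()
  }
}
```

Swapping the two statements inside the loop would leak each outcome into its own features, which is the kind of mistake this structure is meant to make hard.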
![Page 34: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/34.jpg)
live demo
![Page 35: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/35.jpg)
live demo
Kaggle competition with Best Buy data
https://www.kaggle.com/c/acm-sf-chapter-hackathon-small
![Page 36: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/36.jpg)
product update events

    {
      "timestamp" : "2012-05-03 6:43:15",
      "eventType" : "ProductUpdate",
      "eventProperties" : {
        "sku" : "1032361",
        "regularPrice" : "19.99",
        "name" : "Need for Speed: Hot Pursuit",
        "description" : "Fasten your seatbelt and get ready to drive like your life depends on it...",
        ...
      }
    }

![Page 37: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/37.jpg)
product view events

    {
      "timestamp" : "2011-10-31 09:48:46",
      "eventType" : "ProductView",
      "eventProperties" : {
        "skuSelected" : "2670133",
        "query" : "Modern warfare"
      }
    }

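A feature over these events might count how often each SKU was selected for a given query. The sketch below is an assumption about what such a feature could look like, not a record of what the demo did; JSON parsing is omitted and `eventProperties` is modeled as a plain `Map`.

```scala
import scala.collection.mutable

object BestBuySketch {
  // Mirrors the JSON shape shown on the slides.
  final case class Event(
      timestamp: String,
      eventType: String,
      eventProperties: Map[String, String]
  )

  // Counts how often each SKU was selected for each normalized query.
  final class QuerySkuCounts {
    private val counts =
      mutable.Map.empty[(String, String), Int].withDefaultValue(0)

    private def normalize(q: String): String = q.trim.toLowerCase

    // Only ProductView events that carry both properties change the state.
    def update(e: Event): Unit =
      if (e.eventType == "ProductView") {
        for {
          q <- e.eventProperties.get("query")
          sku <- e.eventProperties.get("skuSelected")
        } {
          val key = (normalize(q), sku)
          counts(key) = counts(key) + 1
        }
      }

    def score(query: String, sku: String): Double =
      counts((normalize(query), sku)).toDouble
  }
}
```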
![Page 38: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/38.jpg)
demo
![Page 39: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/39.jpg)
1. Gain understanding of machine learning
2. Gain understanding of the product usage
3. See opportunity to make the product better
4. Create training data
5. Train predictive models
6. Put models in production
7. See improvements
![Page 41: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/41.jpg)
1. Gain understanding of machine learning
2. Gain understanding of the product usage
3. See opportunity to make the product better
4. Create training data
5. Train predictive models
6. Put models in production
7. See improvements
Fast cycles!!
![Page 42: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/42.jpg)
![Page 43: Johann Schleier-Smith, Co-Founder and CTO, if(we) at MLconf SF](https://reader033.vdocuments.us/reader033/viewer/2022042816/559445ea1a28ab13738b45ee/html5/thumbnails/43.jpg)
• All data in form of events – no exceptions!
• Roll through history to generate training examples
• Sample training data carefully to avoid feedback
• Model is static while features are live and personal
• Use interesting features with boring algorithms
• Expressiveness > performance > scalability
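The "static model, live features" point can be sketched as a fixed set of weights applied to feature values read at request time. Logistic regression is used here only as one example of a "boring" algorithm; the talk does not name one, and `LogisticModel` and `topN` are hypothetical names for this illustration.

```scala
object ScoringSketch {
  // Weights are learned offline and stay fixed until the next retraining.
  final case class LogisticModel(bias: Double, weights: Vector[Double]) {
    def score(features: Vector[Double]): Double = {
      require(features.length == weights.length, "feature/weight length mismatch")
      val z = bias + weights.zip(features).map { case (w, x) => w * x }.sum
      1.0 / (1.0 + math.exp(-z))
    }
  }

  // Rank candidates using feature values as they are at request time;
  // ties are broken by the smaller candidate id.
  def topN(
      model: LogisticModel,
      candidates: Map[Long, Vector[Double]],
      n: Int
  ): Seq[Long] =
    candidates.toSeq
      .map { case (id, features) => (id, model.score(features)) }
      .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
      .take(n)
      .map(_._1)
}
```

Only the feature vectors change between requests, so recommendations can react to new events without retraining.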
github.com/ifweco/antelope @jssmith