demystifying recommendation systems
![Page 1: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/1.jpg)
Demystifying Recommendation Systems
![Page 2: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/2.jpg)
About Rumman
• Senior Data Scientist and Instructor at Metis
• Practicing Data Scientist
• Find me on Twitter @ruchowdh
• Visit my website at rummanchowdhury.com
• Check out my jobs page
• …and my blog
![Page 3: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/3.jpg)
About Metis
• Data Science Bootcamp
• Part of Kaplan
• Accredited by ACCET
• 12 weeks, full-time, including 60 hours of online pre-work
• Evening and weekend training courses
• Third party financing options
• $3,000 scholarship for women, underrepresented minority groups, and veterans or members of the U.S. military
![Page 4: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/4.jpg)
Overview
• What is a recommendation engine?
• What are the types of recommendation systems?
• What are the drawbacks of the most common recommendation engines, and how do I deal with them?
• How do I fine-tune my model?
![Page 5: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/5.jpg)
What are recommendation systems?
![Page 6: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/6.jpg)
What are recommendation systems?
Automated systems that seek to suggest whether a given item (product, event, movie, song, etc.) will be desirable to a user.
Or, more data science-y: predict what a user’s review will be for items that they have not reviewed
![Page 7: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/7.jpg)
Where does a recommendation system lie in the space of data science and analytics?
• Descriptive
  • Averages, percentages, etc.
  • Explains events afterward or as they happen
• Predictive
  • Uses modeling of past behavior to make predictions about the future
• Prescriptive
  • Informed decisions about what actions should be taken, based on data
![Page 8: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/8.jpg)
How do I pick the best kind of recommender system for my data?
• What is your existing data?
• How quickly does your inventory change?
• How much information can you get on a user? (explicit and implicit)
• Does your model need to scale well?
![Page 9: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/9.jpg)
What are the kinds of recommendation systems?
![Page 10: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/10.jpg)
What are the kinds of recommender systems?
• Search (knowledge-based)
  • Pros: items will be close matches to expressed needs, no cold-start issues
  • Cons: static, requires manual tagging, will not work well with very similar inventories or rapidly changing inventories
  • Example: Amazon’s basic search
![Page 11: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/11.jpg)
What are the kinds of recommender systems?
• Content-based
  • Items are mapped based on characteristics into an item-feature space, and recommendations are based on specified characteristics
  • Pros: Easier comparison between items
  • Cons: Cold-start problem, need good content descriptions, need item ratings
  • Example: Search for ‘ai’ vs ‘AI’, ‘mit’ vs ‘MIT’
![Page 12: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/12.jpg)
What are the kinds of recommender systems?
• Collaborative filtering: based on user and item similarities
  • Pros: can provide less-obvious matches
  • Cons: cold-start problem for new users and new items, requires a feedback rating
![Page 13: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/13.jpg)
Limitations, or, Ask yourself, do you really need a recommendation engine?
• Recommendation systems have to update immediately.
• You have to have a sufficiently inexpensive model and have the bandwidth to return results fast.
• You have more information than you think:
  • existing item popularity
  • geography based on IP address
  • cookies
![Page 14: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/14.jpg)
How does Content-Based recommendation work?
• Users and items are represented by vectors in a feature space
• Approaches:
  • Map users and items to the same feature space, compute distance between a user and an item.
![Page 15: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/15.jpg)
Example: Content-Based Recommendation
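Features = (big box office, aimed at kids, famous actors)
Items (movies):
• Finding Nemo = (5, 5, 2)
• Mission Impossible = (3, -5, 5)
• Jiro Dreams of Sushi = (-4, -5, -5)
Predicted ratings*:
• Finding Nemo: (-3*5 + 2*5 + 2*2) = -1
• Mission Impossible: (-3*3 + 2*(-5) + 2*5) = -9
• Jiro Dreams of Sushi: (-3*(-4) + 2*(-5) + 2*(-5)) = -8
* Ratings for a user with a described preference of (-3, 2, 2) for these features; each predicted rating is the dot product of the user's preference vector and the item's feature vector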
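The arithmetic in this example can be read as a dot product between the user's preference vector and each movie's feature vector. A minimal sketch, using the vectors listed on the slide; the function name `predicted_rating` is made up for illustration:

```python
# Feature order: (big box office, aimed at kids, famous actors)
movies = {
    "Finding Nemo": (5, 5, 2),
    "Mission Impossible": (3, -5, 5),
    "Jiro Dreams of Sushi": (-4, -5, -5),
}
user_preference = (-3, 2, 2)

def predicted_rating(user, item):
    """Dot product of a user preference vector and an item feature vector."""
    return sum(u * f for u, f in zip(user, item))

# Score every movie, then rank from most to least preferred.
scores = {title: predicted_rating(user_preference, feats)
          for title, feats in movies.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for title in ranked:
    print(title, scores[title])
```

A higher score means a closer match between what the user says they like and what the item offers.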
![Page 16: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/16.jpg)
How does Content Based Recommendation work?
• Another option is to create features from user+item pairs and use an algorithm (such as a classifier) to predict like/dislike
• Each user/item pair has a labeled outcome, such as purchased/not purchased. You can train a model to predict purchase behavior.
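As a rough sketch of the user+item pair approach, the toy example below builds one feature vector per pair and fits a small logistic regression by gradient descent. The users, items, labels, and the choice of element-wise products as pair features are all assumptions made for illustration, not something the slides prescribe:

```python
import math

# Hypothetical data: preference vectors for users, feature vectors for items,
# and a purchased (1) / not purchased (0) label for each (user, item) pair.
users = {"ann": (1, 1, -1), "bob": (-1, 1, 1)}
items = {"a": (1, 1, 1), "b": (-1, -1, 1), "c": (1, -1, -1)}
purchases = {
    ("ann", "a"): 1, ("ann", "b"): 0, ("ann", "c"): 1,
    ("bob", "a"): 1, ("bob", "b"): 1, ("bob", "c"): 0,
}

def pair_features(user, item):
    """One feature per dimension: how well the user's taste matches the item."""
    return [u * f for u, f in zip(users[user], items[item])]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    n_features = len(examples[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in examples:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(examples)
    return weights, bias

examples = [(pair_features(u, i), y) for (u, i), y in purchases.items()]
weights, bias = train(examples)

def purchase_probability(user, item):
    """Predicted probability that this user purchases this item."""
    x = pair_features(user, item)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

In practice you would likely reach for a library classifier and far richer features; the point here is only that each labeled pair becomes one training row.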
![Page 17: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/17.jpg)
How does Collaborative Filtering work?
• Collaborative filtering refers to a family of methods for predicting ratings where instead of thinking about users and items in terms of a feature space, we are only interested in the existing user-item ratings themselves.
• In this case, our dataset is a ratings matrix whose rows correspond to users and whose columns correspond to items.
![Page 18: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/18.jpg)
Example: Netflix movie recommendations
![Page 19: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/19.jpg)
How does collaborative filtering work?
• Method 1: Item-based CF, a.k.a. neighborhood methods or memory-based CF
  • Ratings data are used to create an item-item similarity matrix.
  • Recommendations are made based on the items most similar to those a user has already rated highly.
• This method does not scale well.
  • Why? You need a fully populated matrix of item-item similarity. This doesn’t work well if you have a lot of items or if your items change a lot.
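The item-based method can be sketched as follows. The ratings are invented, cosine similarity is just one possible choice of similarity measure, and the similarity-weighted average used for scoring is an assumption for illustration:

```python
import math
from itertools import combinations

# Hypothetical ratings: user -> {item: rating}. Missing entries are unrated.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5},
    "u3": {"A": 1, "C": 5, "D": 4},
    "u4": {"C": 4, "D": 5},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, r in user_ratings.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (missing = 0)."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def item_similarity(ratings):
    """Build the full item-item similarity matrix as nested dicts."""
    vecs = item_vectors(ratings)
    sim = {i: {} for i in vecs}
    for i, j in combinations(vecs, 2):
        sim[i][j] = sim[j][i] = cosine(vecs[i], vecs[j])
    return sim

def recommend(user, ratings, sim, top_n=2):
    """Rank unseen items by a similarity-weighted average of the user's ratings."""
    seen = ratings[user]
    scores = {}
    for candidate in sim:
        if candidate in seen:
            continue
        num = sum(sim[candidate].get(i, 0.0) * r for i, r in seen.items())
        den = sum(abs(sim[candidate].get(i, 0.0)) for i in seen)
        if den:
            scores[candidate] = num / den
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

sim = item_similarity(ratings)
print(recommend("u2", ratings, sim))
```

Note that `item_similarity` compares every pair of items, which is where the scaling problem on the slide comes from: the work grows with the square of the number of items.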
![Page 20: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/20.jpg)
How does CF work?
• Method 2: Model-based CF uses matrix decomposition via singular value decomposition (SVD) to reduce dimensionality and extract latent variables.
  • We express users and items in terms of these variables.
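A minimal sketch of the latent-variable idea, using NumPy's `numpy.linalg.svd` on a small, fully filled-in ratings matrix. The matrix values and the helper name `truncated_svd` are made up for illustration. Real ratings matrices have missing entries, and plain SVD needs a complete matrix, so in practice the gaps must be imputed or the factors fit on observed ratings only:

```python
import numpy as np

# Hypothetical dense ratings matrix: rows = users, columns = items.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

def truncated_svd(matrix, k):
    """Keep the k largest singular values; return user and item latent factors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    user_factors = U[:, :k] * s[:k]   # each row: one user in latent space
    item_factors = Vt[:k, :].T        # each row: one item in latent space
    return user_factors, item_factors

user_factors, item_factors = truncated_svd(R, k=2)

# Predicted ratings are dot products of user and item latent vectors.
approx = user_factors @ item_factors.T
print(np.round(approx, 2))
```

With `k=2`, each user and each item is described by two numbers instead of a full row or column of ratings, which is the dimensionality reduction the slide refers to.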
![Page 21: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/21.jpg)
Why is model-based CF preferred?
• Scalable, flexible, accurate, domain independent, and requires no explicit information.
![Page 22: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/22.jpg)
What are the drawbacks, and how can I address them?
![Page 23: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/23.jpg)
Let’s discuss the drawbacks
• Cold-start problem!
• Data is typically very sparse
• Need granularity in your data
![Page 24: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/24.jpg)
Drawback: Cold Start problem
• Build an initial profile based on implicit data, then evolve it based on explicit feedback as it comes in.
• You can use content-based information to ease cold-start and data sparsity problems; this is sometimes called a ‘hybrid’ filtering method.
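One way to picture a hybrid approach is to lean on a content-based score for new users and shift weight toward the collaborative score as explicit feedback accumulates. The blending formula and the parameter `k` below are assumptions chosen for illustration, not a standard prescribed by the slides:

```python
def blend_weight(n_explicit_ratings, k=5):
    """Weight given to the collaborative score; grows toward 1 with more feedback."""
    return n_explicit_ratings / (n_explicit_ratings + k)

def hybrid_score(content_score, cf_score, n_explicit_ratings, k=5):
    """Blend a content-based score with a collaborative-filtering score."""
    if cf_score is None:  # no collaborative signal yet (brand-new user or item)
        return content_score
    w = blend_weight(n_explicit_ratings, k)
    return w * cf_score + (1 - w) * content_score

# A brand-new user gets the content-based score unchanged.
print(hybrid_score(content_score=3.0, cf_score=None, n_explicit_ratings=0))
# After a few explicit ratings, the collaborative score starts to count.
print(hybrid_score(content_score=2.0, cf_score=4.0, n_explicit_ratings=5))
```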
![Page 25: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/25.jpg)
Drawback: Sparsity of Data
• In the famous Netflix Prize dataset, ~99% of possible ratings were missing.
• Data is skewed and sparse
  • or, most people don’t rate many items, and most items aren’t rated
  • the items that are rated tend to be rated very often
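To check how sparse and skewed your own data is, you can count the observed ratings against the number of possible user-item pairs. A small sketch with invented data:

```python
from collections import Counter

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4},
    "u3": {"A": 2, "C": 5},
    "u4": {"A": 1},
}
all_items = ["A", "B", "C", "D", "E"]

def sparsity(ratings, n_items):
    """Fraction of possible user-item ratings that are missing."""
    observed = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1 - observed / (len(ratings) * n_items)

def ratings_per_item(ratings):
    """How many ratings each item received (unrated items count as 0)."""
    return Counter(item for user_ratings in ratings.values() for item in user_ratings)

missing_fraction = sparsity(ratings, len(all_items))
item_counts = ratings_per_item(ratings)

print(f"missing: {missing_fraction:.0%}")
print(item_counts.most_common())
```

Here one item collects most of the ratings while others have one or none, which is the skew described above in miniature.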
![Page 26: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/26.jpg)
Drawback: Granularity of data
• Traditional model-based CF works well for non-binary data (e.g., a 5-star rating). It doesn’t work well for binary data (e.g., click/no click, purchased/did not purchase).
• You will need to tweak your measurements of item similarities
![Page 27: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/27.jpg)
Quick overview of measurement
• Non-binary ratings:
  • Pearson correlation coefficient
  • Euclidean distance
  • Manhattan distance
• Binary ratings:
  • Jaccard similarity
  • Cosine similarity
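For reference, the measures listed above can each be written in a few lines. This is a plain-Python sketch for small inputs; the function names are chosen here for illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def euclidean(x, y):
    """Straight-line distance between two rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute differences between two rating vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def jaccard(a, b):
    """Overlap between two sets of items (e.g., items each user clicked)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(x, y):
    """Cosine of the angle between two vectors (works for 0/1 vectors too)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))
print(euclidean([1, 2], [4, 6]), manhattan([1, 2], [4, 6]))
```

Pearson, Jaccard, and cosine are similarities (higher means more alike), while Euclidean and Manhattan are distances (lower means more alike).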
![Page 28: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/28.jpg)
How do I refine my model?
![Page 29: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/29.jpg)
Normalization
• Some items are rated significantly higher (e.g., blockbuster movies, Oscar winners)
• Some users rate lower (or higher) than the norm
• Ratings can change over time
![Page 30: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/30.jpg)
Normalization
• Need to offset per user
• Need to offset per item
• Ex: The mean rating across all users for item x is some value. How does it differ from the mean rating across all items? How does my rating differ from the mean rating of that item?
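The per-user and per-item offsets can be sketched as deviations from the global mean. The ratings below are invented, and computing the two offsets independently and adding them (global mean + user offset + item offset) is one common simplification, assumed here for illustration:

```python
from collections import defaultdict

# Hypothetical explicit ratings as (user, item, rating) triples.
ratings = [
    ("ann", "x", 5), ("ann", "y", 3),
    ("bob", "x", 4), ("bob", "y", 2),
    ("cat", "x", 3), ("cat", "y", 1),
]

def mean(values):
    return sum(values) / len(values)

global_mean = mean([r for _, _, r in ratings])

by_user, by_item = defaultdict(list), defaultdict(list)
for user, item, r in ratings:
    by_user[user].append(r)
    by_item[item].append(r)

# How far each user and each item sits from the global mean.
user_offset = {u: mean(rs) - global_mean for u, rs in by_user.items()}
item_offset = {i: mean(rs) - global_mean for i, rs in by_item.items()}

def baseline(user, item):
    """Expected rating from offsets alone; unknown users or items get offset 0."""
    return global_mean + user_offset.get(user, 0.0) + item_offset.get(item, 0.0)

# What is left after removing user and item effects is what the model should explain.
residuals = {(u, i): r - baseline(u, i) for u, i, r in ratings}

print(global_mean, user_offset, item_offset)
```

Training a similarity measure or factorization on the residuals, rather than on raw ratings, keeps generous raters and universally loved items from dominating the results.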
![Page 31: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/31.jpg)
Capturing data trends
• Rating distributions:
  • ratings aren’t random, they follow a distribution - model this distribution
• Feature importance: You can regress on your feature vectors to understand which values impact ratings
• Feature generation: Characterize your users and create one-hot features (this can save a lot of time, and help with cold-start problems)
![Page 32: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/32.jpg)
Temporal factors
• There can be an upward trend in ratings over time
• Seasonal shifts due to holidays, awards, etc.
• Anchoring (i.e., an item is rated relative to a previous iteration or version of that item)
![Page 33: Demystifying Recommendation Systems](https://reader031.vdocuments.us/reader031/viewer/2022030114/589b538d1a28ab4a398b6ec5/html5/thumbnails/33.jpg)
Conclusions
• Think about your data, your capabilities, and your needs prior to creating a recommendation system
• Consider the pros and cons of each type
• Refine your model thoughtfully