
REAL-TIME RECOMMENDATIONS FOR RETAIL: ARCHITECTURE, ALGORITHMS, AND DESIGN

Juliet Hougland and Jonathan Natkins

Who Are We?

Jonathan Natkins
  Field Engineer at WibiData
  Before that, Cloudera Software Engineer
  Before that, Vertica Software/Field Engineer

Juliet Hougland
  Data Scientist, previously at WibiData
  MS in Applied Math
  BA in Math-Physics

Recommendations in Retail

Personalized versus Non-Personalized

Recommender Contexts

Taste History
  Based on everything you know about a user
  Interests over months/years

Current Taste
  Based on a user’s immediate history
  Interests over minutes/hours

Ephemeral
  Extreme version of current taste
  For example, location

Demographic*
  Similar to taste history, but less subjective
  Geographic region, age bracket, etc.

Why Does Real-Time Matter?

Relevancy

I am a Special Snowflake

Natty

Requirements for a Real-Time System

General System Requirements
  Handle millions of customers/users
  Support collection and storage of complex data
    Static and event-series

Real-Time System Requirements
  Quickly retrieve subsets of data for a single user
  Aggregate/derive new, first-class data per user

What is Kiji?

The Kiji project is a modular, open-source framework for building real-time applications that collect, store, and analyze entity-centric data

kiji.org
github.com/kijiproject

Three Challenges

Developing models for use in real-time
Scoring models in real-time
Deploying models into a production environment

How Can We Make Real-Time Models?

Population interests change slowly

Individual interests change quickly




Models don’t need to be retrained frequently

Application of a model should be fast

A Common Workflow

Train a model over the entire dataset
Save fitted model parameters to a file or another table
Access the model parameters when generating new recommendations based on new data

This is EXPENSIVE
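The three workflow steps can be sketched in a few lines of Python. This is a toy illustration, not Kiji code: the popularity model, the JSON file, the function names, and the sample events are all assumptions made for the example. The point is the split between an expensive batch step (training over everything) and a cheap per-request step (loading parameters and applying them to new data).

```python
import json
import os
import tempfile
from collections import Counter


def train_popularity_model(events):
    """Batch step: scan the whole dataset and compute each item's share of events."""
    counts = Counter(item for _user, item in events)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def save_model(params, path):
    """Persist the fitted parameters (here: a JSON file; a table would also work)."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_model(path):
    """Read previously fitted parameters back in."""
    with open(path) as f:
        return json.load(f)


def recommend(params, recent_items, n=2):
    """Per-request step: rank items the user has not just interacted with."""
    seen = set(recent_items)
    ranked = sorted((i for i in params if i not in seen),
                    key=lambda i: (-params[i], i))
    return ranked[:n]


# Hypothetical (user, item) events, for illustration only.
events = [
    ("beaker", "banana slicer"), ("user_b", "banana slicer"),
    ("user_b", "pineapple slicer"), ("beaker", "lab coat"),
    ("user_b", "lab coat"), ("user_c", "lab coat"),
]

model_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(train_popularity_model(events), model_path)
recommendations = recommend(load_model(model_path), ["banana slicer"])
```

Only `recommend` needs to run when new data arrives; the training scan can be repeated on a much slower schedule.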

Developing Models

KijiExpress
  Scala interface for interacting with Kiji data
  Uses Scalding for designing complex dataflows

Model Lifecycle
  Allows analysts and data scientists to break apart a model into phases

Scoring Models in Real-Time

Batch isn’t real-time



[Figure: number of users plotted against number of interactions. A few users have many interactions; a lot of users have few interactions.]

Fresheners Compute Lazily

[Diagram: Client, Kiji Scoring Server, HBase]

Client: read a column
Get from HBase
Freshness Policy: is the data fresh?
  Yes: return to client
  No: run the Scorer, return to client, and write back for next time
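The read path in the diagram can be sketched as follows. This is a minimal illustration and not the Kiji API: the class `LazyScoreStore`, the age-based freshness rule, and the in-memory dictionary standing in for HBase are all assumptions. It shows the lazy part of the idea, where a score is recomputed only when someone reads a stale or missing value.

```python
import time


class LazyScoreStore:
    """Toy stand-in for a column with a freshener attached (hypothetical, not Kiji)."""

    def __init__(self, scorer, max_age_seconds, clock=time.time):
        self._scorer = scorer            # computes a fresh value for a user
        self._max_age = max_age_seconds  # freshness policy: maximum allowed age
        self._clock = clock              # injectable clock, handy for testing
        self._cells = {}                 # user_id -> (value, written_at)

    def _is_fresh(self, written_at):
        """Freshness policy: a value is fresh if it is young enough."""
        return self._clock() - written_at <= self._max_age

    def read(self, user_id):
        """Return the stored value if fresh; otherwise score, write back, return."""
        cell = self._cells.get(user_id)
        if cell is not None and self._is_fresh(cell[1]):
            return cell[0]
        value = self._scorer(user_id)
        self._cells[user_id] = (value, self._clock())  # write back for next time
        return value


# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
scorer_calls = []


def example_scorer(user_id):
    scorer_calls.append(user_id)
    return "recs-for-%s-v%d" % (user_id, len(scorer_calls))


store = LazyScoreStore(example_scorer, max_age_seconds=60, clock=lambda: now[0])
first = store.read("beaker")   # nothing stored yet: scorer runs
second = store.read("beaker")  # still fresh: served from the store
now[0] = 61.0
third = store.read("beaker")   # stale: scorer runs again
```

Users who never read their recommendations never trigger the scorer, which matters when most users have few interactions.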

Kiji Application Stack

Deployment Challenges

Kiji Model Repository

Link between application and models
Stores Freshener metadata
  FreshnessPolicy, Scorer, attached column
  Location of trained model
Stores Scorer code
  Code repository makes model scoring code available to the application from a central location
New models can be deployed to the Model Repository and made immediately available to the application

Kiji Model Repository

Retail Recommendation

Types of Recommenders

Recommendation Algorithms
  Collaborative Filtering Methods
    Memory Based
    Model Based
  Content Based Methods

Content-Based Recommenders

Orange-Nosed

Lab Assistant

Meeps a lot

Build models around entities using features that we think reflect inherent characteristics

Content-Based Recommenders

safer

faster knife

Pandora: Content-Based

Expertly-Characterized Music

Collaborative Filtering

Represent user-item affinities as a sparse matrix

Users ≈ Rows (e.g. Beaker)
Items ≈ Columns (e.g. Banana Slicer, Pineapple Slicer)

Aspirational Ratings

I put in my queue… I actually watch

Collaborative Filtering


Simple aggregate predictors

Collaborative Filtering: How It Works

Similar Users
Similar Products

Similar Entities

What do we mean by similar?
  Jaccard Index: a measure of set similarity
  Cosine Similarity: the cosine of the angle between two vectors
  Pearson Correlation: statistical measure, similar to cosine
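The three measures can be written down directly. The sketch below uses only the Python standard library; the function names are chosen for this example. It also makes the "similar to cosine" remark concrete: Pearson correlation is the cosine similarity of the two vectors after subtracting each vector's mean.

```python
import math


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def pearson(x, y):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([p - mean_x for p in x], [q - mean_y for q in y])
```

Jaccard suits binary data such as "bought or not", while cosine and Pearson apply to numeric ratings; mean-centering makes Pearson insensitive to users who rate everything high or low.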

Naively, we could compare every entity to each other

…But that would not scale well with increasing numbers of entities
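One common way around the all-pairs comparison is to visit only pairs of items that actually co-occur in some user's history, since every other pair has similarity zero. The sketch below is an assumption-laden illustration, not necessarily the approach on the slides: it treats purchases as binary, uses cosine similarity on the resulting 0/1 vectors, and the name `item_similarities` and the sample baskets are made up for the example.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarities(baskets):
    """Cosine similarity between items, computed only for co-occurring pairs.

    baskets: dict mapping user -> set of items the user interacted with.
    Returns {(item_a, item_b): similarity} with item_a < item_b.
    """
    item_counts = defaultdict(int)  # number of users per item
    pair_counts = defaultdict(int)  # number of users per co-occurring item pair
    for items in baskets.values():
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
    # For 0/1 vectors, cosine = co-occurrences / sqrt(count_a * count_b).
    return {
        (a, b): n / math.sqrt(item_counts[a] * item_counts[b])
        for (a, b), n in pair_counts.items()
    }


# Hypothetical baskets, for illustration only.
baskets = {
    "u1": {"banana slicer", "pineapple slicer"},
    "u2": {"banana slicer", "pineapple slicer"},
    "u3": {"banana slicer", "lab coat"},
}
sims = item_similarities(baskets)
```

The work is now driven by the size of each user's history rather than by the square of the catalog size, although a single user with a very large history is still costly.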

Building the Similarity Matrix

Collaborative Filtering: Is This Useful?

Problem: Too much data!
  Tracking user preferences and all their events generates huge amounts of data

Problem: Too little data!
  Dimensions of user-space and item-space are usually very large
  More variables make it more difficult to generate user preferences

Problem: Cold start
  If you don’t know anything about a user, what should you recommend?

Problem: More ratings means slower computations
  Identifying neighborhoods of entities is expensive

Collaborative Filtering: Why Is It Useful?

Because it works
Content-agnostic

All that matters is co-occurrence of events

Amazon: Item-Item Collaborative Filtering

Used for personalized recommendations
Fill screen real estate with related items
Produces specific, but non-creepy recommendations

Linden, G.; Smith, B.; York, J., "Amazon.com recommendations: item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan/Feb 2003


Item-Item Collaborative Filtering

Beaker buys a banana slicer
Then:
  Generate list of candidate items to predict ratings for
  Predict ratings for candidate items
  Select Top-N items
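The three steps can be sketched end to end. This is an illustration under stated assumptions, not the presenters' implementation: the predictor is a similarity-weighted average of the user's own ratings, `similar_items` is assumed to be a precomputed item-to-neighbors map, and the names and numbers are invented for the example.

```python
def recommend_top_n(user_ratings, similar_items, n=3):
    """Item-item CF in three steps: candidates, predicted ratings, top-N.

    user_ratings:  {item: rating} for items the user has already rated.
    similar_items: {item: {neighbor_item: similarity}}, precomputed offline.
    """
    # 1. Candidates: neighbors of the user's items that the user has not rated.
    candidates = set()
    for item in user_ratings:
        candidates.update(similar_items.get(item, {}))
    candidates -= set(user_ratings)

    # 2. Predict: similarity-weighted average of the user's own ratings.
    predictions = {}
    for candidate in candidates:
        numerator = denominator = 0.0
        for item, rating in user_ratings.items():
            sim = similar_items.get(item, {}).get(candidate, 0.0)
            numerator += sim * rating
            denominator += abs(sim)
        if denominator > 0:
            predictions[candidate] = numerator / denominator

    # 3. Top-N: highest predicted rating first, ties broken by item name.
    return sorted(predictions.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical similarities and ratings, for illustration only.
similar_items = {
    "banana slicer": {"pineapple slicer": 0.8, "lab coat": 0.2},
    "goggles": {"lab coat": 0.9},
}
beaker_ratings = {"banana slicer": 5.0, "goggles": 2.0}
top = recommend_top_n(beaker_ratings, similar_items, n=2)
```

Because candidates come only from neighbors of items the user already touched, the per-request work depends on the user's history and neighborhood sizes, not on the full catalog.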

Accessing External Data

KeyValueStore API enables external data access when applying a model
External data might be…
  Trained model parameters
  Hierarchical/Taxonomic data
  Geo-lookup
Store external data flexibly
  Text files, sequence files, Kiji tables, etc.
  Data access is decoupled from use during execution

If the data doesn’t fit in memory, put it in a table

How Much Less Work Can We Do?

We can choose a predictor that allows us to truncate a sum

There are two ways terms in the sum of our predictor can be small

No rating
Small similarity





Ignore unrated items





Ignore dissimilar items


If we only present a few recommendations, we don’t need to predict ratings for all items
Choose your candidate set to estimate ratings wisely or infer from nearest neighbors
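The truncation idea can be sketched with a weighted-sum predictor. The exact predictor from the slides is not recoverable here, so this example assumes a common form: the predicted rating for a target item is the similarity-weighted average of the user's ratings. Terms for unrated items never enter the sum, and terms with similarity at or below a threshold (or outside the k most similar) are dropped. The function name, parameters, and numbers are hypothetical.

```python
def predict_rating(user_ratings, sims_to_target, k=None, min_sim=0.0):
    """Weighted-sum prediction with a truncated sum.

    user_ratings:   {item: rating} for items the user has rated.
    sims_to_target: {item: similarity between that item and the target item}.
    k:              keep at most the k most similar rated items (None = no limit).
    min_sim:        drop terms whose similarity is not above this threshold.
    """
    # Iterating over the user's ratings skips unrated items for free.
    terms = [
        (sims_to_target[item], rating)
        for item, rating in user_ratings.items()
        if sims_to_target.get(item, 0.0) > min_sim
    ]
    terms.sort(key=lambda t: -t[0])  # most similar first
    if k is not None:
        terms = terms[:k]
    denominator = sum(sim for sim, _rating in terms)
    if denominator == 0:
        return None  # nothing to base a prediction on
    return sum(sim * rating for sim, rating in terms) / denominator


# Hypothetical data, for illustration only.
ratings = {"a": 5.0, "b": 1.0, "c": 3.0}
sims = {"a": 0.9, "b": 0.05, "d": 0.7}  # "c" has no known similarity; "d" is unrated

full = predict_rating(ratings, sims)                    # uses "a" and "b"
truncated = predict_rating(ratings, sims, min_sim=0.1)  # uses only "a"
```

In this toy case the truncated estimate uses one term instead of two and differs only slightly from the full one, which is the trade the slides describe: less work per prediction for a small change in the estimate.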

Organizing Data in Item-Item CF

Accessing Data During Freshening

Want to Know More?

The Kiji Project
  kiji.org
  github.com/kijiproject

Questions about this presentation?
  Twitter: @JulietHougland or @nattyice
  Email: [email protected]
