TRANSCRIPT
![Page 1: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/1.jpg)
Streaming Predictions of User Behavior in Real-Time
Ethan Dereszynski (Webtrends)
Eric Butler (Cedexis)
OSCON 2014
![Page 2: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/2.jpg)
How come you never see a headline like "Psychic Wins Lottery"?
Jay Leno
![Page 3: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/3.jpg)
![Page 4: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/4.jpg)
![Page 5: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/5.jpg)
![Page 6: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/6.jpg)
Enabling Interesting Predictions:
Leverage Streaming Data
![Page 7: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/7.jpg)
Streams Data
websockets
![Page 8: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/8.jpg)
Streams Data
websockets
1 second
![Page 10: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/10.jpg)
The best way to predict the future is to invent it.
Alan Kay
![Page 11: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/11.jpg)
Session Data
Each user “click” triggers an event
Event information captured by embedded tag
![Page 12: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/12.jpg)
Session Data
A session is a string of events that all correspond to a single “visit” to a web site.
Event 1 Event 2
![Page 13: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/13.jpg)
Session Data
A session ends when a visitor leaves the site, closes the browser, or goes idle for 30 minutes.
Event 1 Event 2 Event 3
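The 30-minute idle rule can be sketched in a few lines. This is only an illustration: the `(visitor_id, timestamp)` event shape, the visitor names, and the dates are made up, and explicit "left the site" or "closed the browser" signals are not modeled.

```python
from datetime import datetime, timedelta

IDLE_TIMEOUT = timedelta(minutes=30)  # idle cutoff stated on the slide


def sessionize(events):
    """Group (visitor_id, timestamp) events into sessions per visitor.

    A new session starts when the gap since the visitor's previous event
    exceeds IDLE_TIMEOUT.
    """
    sessions = {}  # visitor_id -> list of sessions, each a list of timestamps
    for visitor_id, ts in sorted(events, key=lambda e: (e[0], e[1])):
        visitor_sessions = sessions.setdefault(visitor_id, [])
        if visitor_sessions and ts - visitor_sessions[-1][-1] <= IDLE_TIMEOUT:
            visitor_sessions[-1].append(ts)   # still within the same visit
        else:
            visitor_sessions.append([ts])     # first event, or idle too long
    return sessions


# Hypothetical events for two visitors.
t0 = datetime(2014, 7, 22, 19, 38, 47)
events = [
    ("visitor-1", t0),
    ("visitor-1", t0 + timedelta(seconds=5)),
    ("visitor-1", t0 + timedelta(minutes=45)),  # more than 30 idle minutes later
    ("visitor-2", t0 + timedelta(minutes=1)),
]
sessions = sessionize(events)
print({v: [len(s) for s in ss] for v, ss in sessions.items()})
```

In a streaming setting the same rule would be applied incrementally, closing a session once a timer expires, rather than by sorting a complete batch as done here.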
![Page 14: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/14.jpg)
Learning from Streaming Data
Sessions provide examples of visit behavior
Not all sessions are equally likely
- Many paths are rarely, if ever, taken
- Frequent paths suggest common ways visitors behave on a given site
Learning Models of Visitor Behavior
- Predict future actions
- Provides a rich, new feature to identify/segment users
- Identify users who have a common trajectory, or subtrajectory, through the web site
- More than just a label
- Behavior tells us something about how users achieve a goal on a web site
![Page 15: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/15.jpg)
Event Data
JSON containing parameter/value pairs
Describes content of page (triggered by event)
Contains geo, device, referrer, etc.
50-100 parameters per page (event)
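As a rough illustration of what one event might look like once parsed, the sketch below turns a JSON payload into a flat list of "parameter=value" tokens. Every parameter name and value here is invented, and treating each pair as one token is an assumption made for illustration, not something the slides specify.

```python
import json

# Hypothetical event payload; real events carry 50-100 parameters.
raw_event = (
    '{"page_title": "Blue Widget", "geo_country": "US", '
    '"device_type": "mobile", "referrer": "search", "cart_value": null}'
)


def event_to_tokens(raw):
    """Turn a JSON event into sorted "parameter=value" tokens, skipping nulls."""
    params = json.loads(raw)
    return sorted(
        f"{name}={value}" for name, value in params.items() if value is not None
    )


tokens = event_to_tokens(raw_event)
print(tokens)
```

A token list like this is the kind of "bag of words" input that a topic model can consume, with each event playing the role of a document.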
![Page 16: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/16.jpg)
Challenges of Real Data
How do we describe each event?
- Number of parameters per event can be large
- Space of possible “events” is massive
Not all parameters are relevant to the user’s actions
[Figure: number of events, Client 1 vs. Client 2]
![Page 17: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/17.jpg)
About Topic Models
Each topic is a distribution over all words in the dictionary
Each document is generated by a mixture of topics
D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.
![Page 18: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/18.jpg)
Abstraction Layer: Global/Local Topic – Latent Dirichlet Allocation (GLT-LDA)
Topic modeling technique for document clustering
- Documents assigned to a single topic (instead of a mixture)
- Global “Noise” topic explains redundant parameters
Clusters parameters into topics
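Model variables:
- $\phi_k$: distribution over parameters for topic $k$
- $\phi_G$: distribution over noise parameters
- $w_{j,i}$: $j$th parameter in event $i$
- $x_{j,i}$: noise indicator for the $j$th parameter in event $i$
- $\theta$: topic distribution
- $\lambda_i$: noise rate for document $i$
- $z_i$: topic label for document $i$
Distributions:
$\phi_G, \phi_1, \dots, \phi_K, \theta \sim \text{Dirichlet}$
$z_i, w_{j,i} \sim \text{Multinomial}$
$x_{j,i} \sim \text{Binomial}$
$\lambda_i \sim \text{Beta}$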
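To make the generative story concrete, here is a minimal sampling sketch under stated assumptions: each event gets one topic label, each parameter is flagged as noise or not, and noise parameters are drawn from a global distribution while the rest come from the event's topic. The sizes, the flat Dirichlet(1, ..., 1) and Beta(1, 1) priors, and all variable names are made up; the slides do not give hyperparameters.

```python
import random

rng = random.Random(0)
K, V, N_PARAMS = 3, 20, 8  # topics, vocabulary size, parameters per event (all made up)


def dirichlet(dim):
    """Draw from a flat Dirichlet(1, ..., 1) by normalizing Gamma(1, 1) draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


topic_dists = [dirichlet(V) for _ in range(K)]  # one distribution over parameters per topic
noise_dist = dirichlet(V)                       # global noise distribution
topic_prior = dirichlet(K)                      # distribution over topic labels


def sample_event():
    """Sample one event: a topic label, noise indicators, and parameter ids."""
    z = rng.choices(range(K), weights=topic_prior)[0]          # single topic for the event
    noise_rate = rng.betavariate(1.0, 1.0)                     # per-event noise rate
    x = [rng.random() < noise_rate for _ in range(N_PARAMS)]   # noise indicator per parameter
    w = [
        rng.choices(range(V), weights=noise_dist if is_noise else topic_dists[z])[0]
        for is_noise in x
    ]
    return z, x, w


z, x, w = sample_event()
print(z, x, w)
```

Fitting the model runs this story in reverse: given only the observed parameters, infer the topic label and which parameters were noise.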
![Page 19: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/19.jpg)
The Dataset
Collection of visitor traces of varying length
Event 1 Event 2 … Event t
Visitor 1
Visitor 2
…
Visitor n
![Page 20: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/20.jpg)
Representing Behavior: Two Approaches
Enumerate the space of all possible paths and count
- This would require a very large table.
- Most of the entries would be 0.
- Not clear how to handle variable length visits
Hidden Markov Model (HMM)
- Encodes visitor behavior in a probabilistic model
- Calculates likelihood (or probability) of specific trajectories
- Enables prediction of future actions a visitor may take on the site
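A quick back-of-the-envelope comparison shows why the lookup table is impractical. The counts below are illustrative assumptions only (10 distinct actions, paths of up to 5 events, 3 hidden states); none of these numbers come from the talk.

```python
def path_table_size(n_actions, max_len):
    """Number of distinct action paths of length 1..max_len."""
    return sum(n_actions ** t for t in range(1, max_len + 1))


def hmm_free_parameters(n_states, n_actions):
    """Free parameters of a discrete HMM: initial, transition, and emission rows."""
    initial = n_states - 1
    transitions = n_states * (n_states - 1)
    emissions = n_states * (n_actions - 1)
    return initial + transitions + emissions


print(path_table_size(10, 5))      # 111110 table entries
print(hmm_free_parameters(3, 10))  # 35 parameters
```

The table grows exponentially with path length, while the HMM's parameter count does not depend on path length at all, which is also why variable-length visits are easy for it to handle.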
![Page 21: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/21.jpg)
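The Hidden Markov Model
Site visit (emission) probabilities:
$P(A_t \mid S_t = j) = \text{Multinomial}(\beta_{j0}, \beta_{j1}, \dots, \beta_{jM})$
Stochastic state transitions:
$P(S_t \mid S_{t-1} = j) = \text{Multinomial}(\alpha_{j0}, \alpha_{j1}, \dots, \alpha_{jK})$
Hidden: $S_0, S_1, \dots, S_t$
Observed: $A_0, A_1, \dots, A_t$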
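Given emission and transition probabilities like these, the likelihood of a whole action sequence can be computed with the standard forward algorithm. The sketch below is a toy version: the two-state model and all of its numbers are invented, and it skips the rescaling a real implementation would need to avoid underflow on long sessions.

```python
def forward_likelihood(actions, initial, transition, emission):
    """Return P(actions) under a discrete HMM via the forward algorithm.

    initial[j]       = P(S_0 = j)
    transition[j][k] = P(S_t = k | S_{t-1} = j)
    emission[j][a]   = P(A_t = a | S_t = j)
    """
    n_states = len(initial)
    # forward[j] = P(actions so far, current state = j)
    forward = [initial[j] * emission[j][actions[0]] for j in range(n_states)]
    for a in actions[1:]:
        forward = [
            emission[k][a] * sum(forward[j] * transition[j][k] for j in range(n_states))
            for k in range(n_states)
        ]
    return sum(forward)


# Toy model: 2 hidden states, 2 action types. All numbers are invented.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward_likelihood([0, 1, 1], initial, transition, emission)
print(likelihood)
```

The cost is linear in the sequence length and quadratic in the number of hidden states, so scoring a trajectory stays cheap even for long visits.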
![Page 22: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/22.jpg)
The Hidden Markov Model
Visitors arrive at a site with an intention
- The current intention specifies the probability they will take some action (trigger an event)
- After the page is selected, the intention transitions to a new value (could be the same as the previous intention)
[Diagram: intentions Viewing Products, Product Comparison, and Make Purchase, labeled with probabilities .6, .4, .7, .3]
![Page 23: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/23.jpg)
The Hidden Markov Model
Visitors arrive at a site with an intention
- The current intention specifies the probability they will take some action (trigger an event)
- After the page is selected, the intention transitions to a new value (could be the same as the previous intention)
[Diagram: intentions Viewing Products, Product Comparison, and Make Purchase, labeled with probabilities .7, .3, .7, .3, .15, .85]
![Page 24: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/24.jpg)
Predictive Model: Learning and Runtime
Offline:
- Session data is recorded into a batch file for training
- Trained with the expectation-maximization (EM) algorithm
Online:
- The model is used to predict specific visitor actions
  - CartAdd (add an item to the shopping cart)
  - Purchase (complete the purchase funnel)
- Conditions predictions on observed actions the visitor has taken so far
- Update predictions each time a new action is taken by the visitor.
- Can be generalized to other predictive queries
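One common way to condition on the actions seen so far, and to refresh the estimate on every new action, is the recursive filtering step of an HMM. The slides do not say the production system was written this way, so treat the class below as a sketch; its name, the toy model, and the integer action codes are all assumptions.

```python
class OnlineFilter:
    """Tracks P(current hidden state | actions so far) for one visitor session."""

    def __init__(self, initial, transition, emission):
        self.transition = transition
        self.emission = emission
        self.belief = list(initial)  # prior over the first hidden state
        self.started = False

    def update(self, action):
        """Fold one newly observed action into the belief and return it."""
        n = len(self.belief)
        if self.started:
            # Predict: push the previous belief through the transition model.
            predicted = [
                sum(self.belief[j] * self.transition[j][k] for j in range(n))
                for k in range(n)
            ]
        else:
            predicted = self.belief
            self.started = True
        # Correct: weight each state by how well it explains the action.
        unnormalized = [predicted[k] * self.emission[k][action] for k in range(n)]
        total = sum(unnormalized)
        if total == 0.0:
            raise ValueError("action has zero probability under the model")
        self.belief = [u / total for u in unnormalized]
        return self.belief


# Toy model with invented numbers: 2 hidden states, 2 action types.
session_filter = OnlineFilter(
    initial=[0.5, 0.5],
    transition=[[0.7, 0.3], [0.4, 0.6]],
    emission=[[0.9, 0.1], [0.2, 0.8]],
)
for action in [0, 1, 1]:
    print(session_filter.update(action))
```

Each update costs a fixed amount of work regardless of how long the session has been running, which is what makes per-event prediction feasible on a stream.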
![Page 25: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/25.jpg)
Online Inference
Goal: Compute the probability that actions t+1 to t+5 contain at least one Purchase / CartAdd.
t t+1 t+2 t+3 t+4 t+5
act. act. act. act. act. act.
state state state state state state
![Page 26: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/26.jpg)
Online Inference
Goal: Compute the probability that actions t+1 to t+5 contain at least one Purchase / CartAdd.
t t+1 t+2 t+3 t+4 t+5
act. act. act. act. act. act.
state state state state state state
Prediction window
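One way to compute this window probability is through its complement: propagate the probability of seeing no target action for five steps, then subtract from one. The sketch below assumes a belief over the hidden state at time t is already available; the two-state model, the action codes, and all numbers are invented, and it ignores the possibility that the session ends inside the window.

```python
def prob_target_within(belief, transition, emission, targets, horizon=5):
    """P(at least one target action among the next `horizon` actions).

    belief[j] = P(S_t = j | actions observed so far).
    """
    n = len(belief)
    # Probability that state k emits one of the target actions.
    p_target = [sum(emission[k][a] for a in targets) for k in range(n)]
    # no_target[j] = P(state j at the current step, no target action seen yet)
    no_target = list(belief)
    for _ in range(horizon):
        no_target = [
            (1.0 - p_target[k]) * sum(no_target[j] * transition[j][k] for j in range(n))
            for k in range(n)
        ]
    return 1.0 - sum(no_target)


# Toy model with invented numbers. Action codes: 0 = ProductView, 1 = CartAdd, 2 = Purchase.
transition = [[0.8, 0.2], [0.3, 0.7]]
emission = [[0.95, 0.04, 0.01], [0.50, 0.30, 0.20]]
belief = [0.8, 0.2]

p = prob_target_within(belief, transition, emission, targets={1, 2}, horizon=5)
print(round(p, 3))
```

Because the belief changes with every new action, re-running this after each event yields a prediction that slides forward with the visitor.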
![Page 27: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/27.jpg)
Sequence Time Action
t = 0 ?
t = 1 ?
t = 2 ?
t = 3 ?
t = 4 ?
![Page 28: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/28.jpg)
Sequence Time Action
t = 0 19:38:47.182Z Landing: Clicked Ad
t = 1 19:38:52.571Z ListView
t = 2 19:39:01.941Z ProductView
t = 3 ?
t = 4 ?
t = 5 ?
t = 6 ?
t = 7 ?
![Page 29: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/29.jpg)
Sequence Time Action
t = 0 19:38:47.182Z Landing: Clicked Ad
t = 1 19:38:52.571Z ListView
t = 2 19:39:01.941Z ProductView
t = 3 19:39:15.467Z Link
t = 4 19:43:08.296Z Link
t = 5 19:50:23.952Z ProductView
t = 6 ?
t = 7 ?
t = 8 ?
t = 9 ?
t = 10 ?
![Page 30: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/30.jpg)
Sequence Time Action
t = 0 19:38:47.182Z Landing: Clicked Ad
t = 1 19:38:52.571Z ListView
t = 2 19:39:01.941Z ProductView
t = 3 19:39:15.467Z Link
t = 4 19:43:08.296Z Link
t = 5 19:50:23.952Z ProductView
t = 6 19:50:47.646Z AddedToCart
t = 7 ?
t = 8 ?
t = 9 ?
t = 10 ?
t = 11 ?
![Page 31: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/31.jpg)
Sequence Time Action
t = 0 19:38:47.182Z Landing: Clicked Ad
t = 1 19:38:52.571Z ListView
t = 2 19:39:01.941Z ProductView
t = 3 19:39:15.467Z Link
t = 4 19:43:08.296Z Link
t = 5 19:50:23.952Z ProductView
t = 6 19:50:47.646Z AddedToCart
t = 7 19:51:01.273Z ProductView
t = 8 19:51:11.691Z Link
t = 9 19:51:20.499Z Link
t = 10 ?
t = 11 ?
t = 12 ?
t = 13 ?
t = 14 ?
![Page 32: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/32.jpg)
Sequence Time Action
t = 0 19:38:47.182Z Landing: Clicked Ad
t = 1 19:38:52.571Z ListView
t = 2 19:39:01.941Z ProductView
t = 3 19:39:15.467Z Link
t = 4 19:43:08.296Z Link
t = 5 19:50:23.952Z ProductView
t = 6 19:50:47.646Z AddedToCart
t = 7 19:51:01.273Z ProductView
t = 8 19:51:11.691Z Link
t = 9 19:51:20.499Z Link
t = 10 19:51:27.320Z ListView
t = 11 19:51:47.992Z ProductView
t = 12 19:52:04.216Z ListView
t = 13 19:52:11.398Z ProductView
t = 14 19:52:20.873Z Link
t = 15 ?
t = 16 ?
t = 17 ?
t = 18 ?
t = 19 ?
![Page 33: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/33.jpg)
Sequence Time Action
t = 0 19:38:47.182Z Landing: Clicked Ad
t = 1 19:38:52.571Z ListView
t = 2 19:39:01.941Z ProductView
t = 3 19:39:15.467Z Link
t = 4 19:43:08.296Z Link
t = 5 19:50:23.952Z ProductView
t = 6 19:50:47.646Z AddedToCart
t = 7 19:51:01.273Z ProductView
t = 8 19:51:11.691Z Link
t = 9 19:51:20.499Z Link
t = 10 19:51:27.320Z ListView
t = 11 19:51:47.992Z ProductView
t = 12 19:52:04.216Z ListView
t = 13 19:52:11.398Z ProductView
t = 14 19:52:20.873Z Link
t = 15 19:54:18.080Z ViewedCart
t = 16 19:55:32.557Z StartCheckout
t = 17 19:57:13.246Z CompletedPurchase
t = 18 19:57:39.698Z ConfirmCheckout
t = 19-24 ?
![Page 34: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/34.jpg)
Streams Data
websockets
![Page 35: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/35.jpg)
Prediction Architecture:
Validation Bolt: Validates raw events from Kafka
Prediction Bolt: Augments events with prediction values and confidence labels
![Page 36: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/36.jpg)
Prediction Architecture:
Validation Bolt: Validates raw events from Kafka
Prediction Bolt: Augments events with prediction values and confidence labels
Event Stream Bolt: Dispatches individual events to Streams
Session Stream Bolt: Dispatches full sessions to Streams
websockets
![Page 37: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/37.jpg)
Prediction Architecture:
Validation Bolt: Validates raw events from Kafka
Prediction Bolt: Augments events with prediction values and confidence labels
Event Stream Bolt: Dispatches individual events to Streams
Session Stream Bolt: Dispatches full sessions to Streams
ROC Bolt: Completed sessions are used to score the predictive model’s accuracy
Model receives new thresholds for confidence labels
websockets
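The scoring step can be illustrated with a small ROC computation over completed sessions. The slides do not say how the new thresholds are chosen; picking the threshold that maximizes the true positive rate minus the false positive rate (Youden's J) is a stand-in assumption here, and the scored examples are invented. The sketch also assumes both outcomes occur at least once.

```python
def roc_points(scored):
    """Return (threshold, tpr, fpr) for each distinct score, highest first.

    scored: list of (predicted_probability, outcome) pairs, where outcome is
    True if the predicted action really happened in the session.
    A prediction counts as positive when its score >= threshold.
    """
    positives = sum(1 for _, outcome in scored if outcome)
    negatives = len(scored) - positives
    points = []
    for threshold in sorted({score for score, _ in scored}, reverse=True):
        tp = sum(1 for s, y in scored if s >= threshold and y)
        fp = sum(1 for s, y in scored if s >= threshold and not y)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def best_threshold(scored):
    """Pick the threshold with the largest tpr - fpr (Youden's J)."""
    return max(roc_points(scored), key=lambda p: p[1] - p[2])[0]


# Invented (prediction, did the visitor purchase?) pairs from finished sessions.
scored = [
    (0.9, True), (0.8, True), (0.7, True), (0.6, False),
    (0.4, True), (0.3, False), (0.2, False),
]
print(best_threshold(scored))  # 0.7
```

A threshold chosen this way could then decide which predictions get a high-confidence label, and it can be recomputed as more sessions complete.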
![Page 38: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/38.jpg)
Streams Demo
![Page 39: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/39.jpg)
Results
![Page 40: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/40.jpg)
Next Steps
Integrating visitor information across multiple visits
Automated re-training of predictive model
- Adjust to seasonal and trend effects
Generative models for Anomaly Detection
- What does a Likely/Unlikely session look like?
Richer models of visitor behavior
- Hierarchical models for behavior
![Page 41: Streaming Predictions of User Behavior in Real- Time Ethan DereszynskiEthan Dereszynski (Webtrends) Eric ButlerEric Butler (Cedexis) OSCON 2014](https://reader035.vdocuments.us/reader035/viewer/2022081516/56649e8f5503460f94b93c62/html5/thumbnails/41.jpg)
Questions? Thank you! [email protected]