

Summer Internship 2014 Report

Work done by – Rishabh Misra, Thapar University

Books read –

Reinforcement Learning: An Introduction –

o Learnt what Reinforcement Learning is all about and how it can be used in providing recommendations.
o Learnt about the Bandit Problem and how this problem can be used to introduce various basic learning methods.

Research papers read –

A Contextual-Bandit Approach to Personalized News Article Recommendation –

o Learnt how the news recommendation problem can be modelled as a contextual bandit problem.
o Learnt about previously proposed non-contextual algorithms like epsilon-greedy and upper confidence bound (UCB) and their performance in recommendation.
o Learnt about the LinUCB algorithm, for the hybrid as well as the disjoint model, and how it performs better than non-contextual algorithms.
o Learnt how to evaluate these algorithms on the given dataset.
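As a rough illustration of the disjoint LinUCB model from this paper, here is a minimal Python sketch; the function names and the arms structure are my own, alpha is the exploration parameter, and each arm's matrix A starts as the d x d identity with b as the zero vector:

```python
import numpy as np

def linucb_choose(arms, x, alpha=1.0):
    """Disjoint LinUCB: pick the arm with the highest upper confidence bound.

    arms: dict arm_id -> (A, b), where A is the d x d design matrix and
    b the d-dim response vector accumulated for that arm.
    x: d-dim context feature vector shared by all arms (disjoint model).
    """
    best_arm, best_ucb = None, -np.inf
    for arm_id, (A, b) in arms.items():
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b                                # ridge-regression estimate
        ucb = theta @ x + alpha * np.sqrt(x @ A_inv @ x)  # mean + exploration bonus
        if ucb > best_ucb:
            best_arm, best_ucb = arm_id, ucb
    return best_arm

def linucb_update(arms, arm_id, x, reward):
    """Update the chosen arm's sufficient statistics with the observed reward."""
    A, b = arms[arm_id]
    arms[arm_id] = (A + np.outer(x, x), b + reward * x)
```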

Exponentiated Gradient LINUCB for Contextual Multi-Armed Bandits –


o Learnt about the Exponentiated Gradient LinUCB algorithm and how it uses gradient optimization to find the optimal amount of exploration in the LinUCB algorithm.

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms –

o Learnt about offline evaluation algorithms for bandit problems: one considers the sequence of logged events to be an infinitely long stream, the other a finite stream.
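A minimal sketch of the replay-style evaluator these papers build on, assuming the events were logged by a uniformly random policy; the policy object's choose/update interface is my own:

```python
def replay_evaluate(policy, logged_events):
    """Replay-style offline evaluator for a bandit policy.

    logged_events: iterable of (context, logged_arm, reward) triples, assumed
    to be logged by a uniformly random policy. Only events where the
    evaluated policy picks the logged arm contribute to the estimate.
    """
    total_reward, matches = 0.0, 0
    for context, logged_arm, reward in logged_events:
        if policy.choose(context) == logged_arm:   # event "accepted"
            policy.update(context, logged_arm, reward)
            total_reward += reward
            matches += 1
    return total_reward / max(matches, 1)          # average per-trial reward
```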

Parametric Bandits: The Generalized Linear Case –

o Learnt about the GLM-UCB algorithm.
o Learnt how this algorithm models the problem in the statistical framework of Generalized Linear Models, which allows it to address various specific reward structures widely found in applications.

Group Recommendations via Multi-armed Bandits –

o Learnt about the Group-UCB algorithm and how it can be used to provide recommendations to a persistent group of people.
o Learnt how the Group-UCB algorithm can be implemented on the MovieLens dataset.

Learning Diverse Rankings with Multi-Armed Bandits –

o Learnt about algorithms whose aim is to provide at least one relevant item among the top k recommended items.
o The ‘Ranked Explore and Commit’ algorithm uses a greedy strategy, assuming the user’s interest will not change. Greedy selection is based on the CTR of each document, which is presented a fixed number of times to users.
o The ‘Ranked Bandit’ algorithm uses a multi-armed bandit for each rank and, at each rank, selects the bandit’s top item that has not been recommended at a previous rank (see the sketch below).
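A minimal sketch of the ‘Ranked Bandit’ selection step; the per-rank bandit objects with a choose() method are my own interface, and the duplicate fallback follows the paper's rule of substituting an arbitrary unshown document:

```python
def ranked_bandit_select(bandits, all_docs, k):
    """One bandit per rank proposes a document; duplicates of higher-ranked
    picks are replaced by an arbitrary document not yet shown."""
    shown = []
    for rank in range(k):
        doc = bandits[rank].choose()
        if doc in shown:  # already recommended at a higher rank
            doc = next(d for d in all_docs if d not in shown)
        shown.append(doc)
    return shown
```

The user's feedback (a click at some rank) would then be fed back as reward to that rank's bandit.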


Controlled experiments on the web: survey and practical guide –

o Learnt about various experiments on websites and software that use users’ feedback to improve the overall appeal and usefulness of the website/software.
o Learnt about some experiments performed in a controlled environment and the analysis of their impact on the software’s usefulness.

Content Recommendation on Web Portals –

o Learnt about methods for content recommendation on different types of web portals and about the various factors that affect recommendations on the web.
o Learnt about various algorithms that address the content analysis, user profile modelling, scoring and ranking tasks.

Matrix Factorization Techniques for Recommender Systems –

o Learnt about various matrix factorization techniques that take into account various additional inputs to solve the problem.
o Learnt about ‘stochastic gradient descent’ and ‘alternating least squares’ as the optimization and learning algorithms.
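For reference, a minimal sketch of one stochastic gradient descent update for the regularized matrix factorization objective; the names are my own, with P and Q holding the user and item factor vectors:

```python
import numpy as np

def sgd_step(P, Q, u, i, r, lr=0.01, reg=0.1):
    """One SGD update for regularized matrix factorization:
    minimize (r - p_u . q_i)^2 + reg * (|p_u|^2 + |q_i|^2)."""
    p_u, q_i = P[u].copy(), Q[i].copy()
    err = r - p_u @ q_i                   # prediction error on this rating
    P[u] += lr * (err * q_i - reg * p_u)  # gradient step for the user factors
    Q[i] += lr * (err * p_u - reg * q_i)  # gradient step for the item factors
```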

Probabilistic Matrix Factorization –

o Learnt about the probabilistic matrix factorization technique, which scales linearly with the number of observations and performs well on very sparse and imbalanced datasets.
o Learnt about a constrained version of probabilistic matrix factorization, which assumes that users who rate similar sets of movies have similar interests.


Probabilistic Group Recommendation via Information Matching –

o Probabilistic group recommendations are given based on the group relevance of an item, which considers the item’s relevance to each person as an individual as well as its relevance to the whole group.
o Learnt about the various algorithms and their performance in recommending items to different types of groups.

Interactive Collaborative Filtering –

o Learnt how to incorporate an interactive feedback mechanism into collaborative filtering.
o Learnt how to use Thompson sampling and upper confidence bound algorithms to select items, and probabilistic matrix factorization to incorporate the feedback from users, thus interactively updating the low-rank factor matrices.

Second Order Online Collaborative Filtering –

o Learnt about the confidence-weighted online collaborative filtering method.
o It uses a second-order optimization technique to find the optimal solution of low-rank matrix factorization, which converges faster than first-order techniques.
o At each step, apart from updating the user and item weight vectors, their mean and covariance matrices are also estimated.

Scalable Variational Bayesian Matrix Factorization with Side Information –

o Learnt about the Variational Bayesian Matrix Factorization method, which has time complexity linear in K (the latent factor dimension) and can be easily parallelized on multi-core systems.

Online Multi-Task Collaborative Filtering for On-the-Fly Recommender Systems –

o Online multi-task collaborative filtering algorithms tackle the problem of having to re-train the whole model on the arrival of new training data.
o Learnt about various algorithms for online collaborative filtering and for handling novel sample extension.

Parameter estimation for text analysis –

o Learnt about parameter estimation methods common with discrete probability distributions.
o Learnt about estimation approaches like maximum likelihood, maximum a posteriori and Bayesian estimation.
o Learnt about central concepts like conjugate distributions and Bayesian networks.

MovieTweetings: a Movie Rating Dataset Collected From Twitter –

o Describes how the datasets available for evaluation, like MovieLens and Netflix, are outdated and can give absurd results as they lack recent and relevant movies.
o These existing datasets are often filtered to remove extreme cases, which prevents experimental results from generalizing to real-life scenarios.
o The proposed dataset has recent movies and ratings in the form of tweets; each tweet contains a rating and an IMDb link to the movie page.
o This dataset does not filter extreme cases and can thus provide more accurate results for algorithms.


Algorithms Implemented –

Epsilon-greedy –

o First implemented it in MATLAB on random data and plotted the graph (average reward vs. iterations) to compare its performance with other algorithms.
o Implemented it in Python for the Yahoo! Today Module dataset. Used the average reward over all iterations as the metric to compare its performance.
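A minimal sketch of the epsilon-greedy loop on a k-armed bandit, tracking the average reward per iteration for the plots mentioned above; the rewards_fn stub stands in for the random data:

```python
import numpy as np

def epsilon_greedy(q_estimates, counts, rewards_fn, epsilon=0.1, steps=1000):
    """Epsilon-greedy action selection on a k-armed bandit.

    q_estimates: value estimate per arm; counts: pull counts per arm.
    rewards_fn(arm) returns a sampled reward for pulling that arm.
    """
    avg_rewards, total = [], 0.0
    for t in range(1, steps + 1):
        if np.random.rand() < epsilon:                 # explore
            arm = np.random.randint(len(q_estimates))
        else:                                          # exploit
            arm = int(np.argmax(q_estimates))
        r = rewards_fn(arm)
        counts[arm] += 1
        q_estimates[arm] += (r - q_estimates[arm]) / counts[arm]  # incremental mean
        total += r
        avg_rewards.append(total / t)   # series for the reward-vs-iteration plot
    return avg_rewards
```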

Softmax Action Selection –

o Implemented it in MATLAB and plotted the graph (average reward vs. iterations) to compare its performance with other algorithms.
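A minimal sketch of softmax (Boltzmann) action selection, where the temperature controls how strongly high-value arms are favoured:

```python
import numpy as np

def softmax_select(q_estimates, temperature=0.5):
    """Sample an arm with probability proportional to exp(Q(a) / temperature)."""
    prefs = np.asarray(q_estimates) / temperature
    prefs -= prefs.max()                     # stabilize the exponentials
    probs = np.exp(prefs)
    probs /= probs.sum()
    return int(np.random.choice(len(q_estimates), p=probs))
```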


UCB –

o Implemented it in MATLAB on random data and plotted the graph (average reward vs. iterations) to compare its performance with other algorithms.
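A minimal sketch of the UCB1 selection rule, under the standard convention that every arm is pulled once before the bonus term applies:

```python
import numpy as np

def ucb1_select(q_estimates, counts, t):
    """UCB1: mean estimate plus an exploration bonus that shrinks as an arm
    is pulled more often; t is the total number of pulls so far."""
    for arm, n in enumerate(counts):
        if n == 0:              # try every arm once first
            return arm
    bonuses = np.sqrt(2.0 * np.log(t) / np.asarray(counts))
    return int(np.argmax(np.asarray(q_estimates) + bonuses))
```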


LinUCB –

o First implemented it in MATLAB on random data and plotted the graph (average reward vs. iterations) to compare its performance with other algorithms.
o Implemented it in Python for the Yahoo! Today Module dataset. Used the average reward over all iterations as the metric to compare it with other algorithms.
o Implemented it in Python for the MovieLens dataset.

GLM-UCB –

o First implemented it in MATLAB on random data and plotted the graph (average reward vs. iterations) to compare its performance with other algorithms.
o Implemented it in Python for the Yahoo! Today Module dataset. Used the average reward over all iterations as the metric to compare it with other algorithms.
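A rough sketch of the GLM-UCB selection step with a logistic link; theta is assumed to be refit by maximum likelihood on past observations, and the rho * sqrt(log t) exploration rate here is a simplified stand-in for the paper's slowly growing rate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glm_ucb_select(arm_features, theta, M_inv, t, rho=1.0):
    """GLM-UCB with a logistic link: score each arm by predicted mean reward
    plus an exploration bonus from the design matrix M = sum of x x^T."""
    best_arm, best_score = None, -np.inf
    for arm, x in enumerate(arm_features):
        bonus = rho * np.sqrt(np.log(t + 1)) * np.sqrt(x @ M_inv @ x)
        score = sigmoid(theta @ x) + bonus
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm
```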

Hybrid-LinUCB –

o First implemented it in MATLAB on random data.
o Implemented it in Python for the Yahoo! Today Module dataset. Used the average reward over all iterations as the metric to compare it with other algorithms.

Probabilistic Matrix Factorization –

o PMF on the MovieLens dataset –
- Factorized the rating matrix into two low-dimensional latent factor matrices. Plotted the learning process over the iterations.


o Stochastic Gradient Descent –
- Implemented in Python (a sketch of the three variants appears after this list).
- The first implementation feeds the data in sequential order (according to timestamps) when updating the gradients.
- The second implementation uses batches of data to update the gradients.
- The third implementation feeds the data in random order when updating the gradients.
- Plotted the performance of these variants to compare them.

o Dual averaging method –
- Implemented in Python.
- Used an online approach to calculate the root mean squared error.
- Plotted the graph to compare its performance with the other stochastic algorithms.
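Two simplified sketches of the above: driving the earlier sgd_step helper in the three data orders, and a generic (unregularized) dual averaging step; both are illustrations under my own naming, not the exact internship code:

```python
import numpy as np

def run_sgd(triples, P, Q, order="random", lr=0.01, reg=0.1, batch_size=100):
    """Run matrix-factorization SGD over (user, item, rating) triples in one
    of three data orders; triples are assumed timestamp-sorted so the
    "sequential" variant replays them in time order."""
    if order == "random":
        triples = [triples[j] for j in np.random.permutation(len(triples))]
    if order == "batch":
        # accumulate gradients over each batch, then apply one update
        for start in range(0, len(triples), batch_size):
            dP, dQ = np.zeros_like(P), np.zeros_like(Q)
            for u, i, r in triples[start:start + batch_size]:
                err = r - P[u] @ Q[i]
                dP[u] += err * Q[i] - reg * P[u]
                dQ[i] += err * P[u] - reg * Q[i]
            P += lr * dP
            Q += lr * dQ
    else:  # "sequential" or "random": one rating at a time
        for u, i, r in triples:
            sgd_step(P, Q, u, i, r, lr, reg)

def dual_averaging(grad_fn, dim, steps=1000, beta=1.0):
    """Generic dual averaging: the next iterate depends only on the running
    sum of past gradients, x_{t+1} = -g_sum / (beta * sqrt(t))."""
    x, g_sum = np.zeros(dim), np.zeros(dim)
    for t in range(1, steps + 1):
        g_sum += grad_fn(x, t)
        x = -g_sum / (beta * np.sqrt(t))
    return x
```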

Thompson Sampling –

o Implemented it in Python.
o Applied it on the factor matrices obtained from factorizing the MovieLens dataset with one hundred thousand observations.
o Evaluated it using the policy evaluator proposed in the research paper ‘A Contextual-Bandit Approach to Personalized News Article Recommendation’.
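The internship code applies Thompson sampling on top of PMF factors; as a simpler self-contained illustration, here is the basic Bernoulli version with Beta posteriors:

```python
import numpy as np

def thompson_select(successes, failures):
    """Bernoulli Thompson sampling: draw one sample from each arm's Beta
    posterior and play the arm with the largest draw."""
    samples = np.random.beta(np.asarray(successes) + 1, np.asarray(failures) + 1)
    return int(np.argmax(samples))

def thompson_update(successes, failures, arm, reward):
    """Binary-reward update for the chosen arm's posterior."""
    if reward > 0:
        successes[arm] += 1
    else:
        failures[arm] += 1
```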

Group UCB –

o Implemented it in Python.
o Each user is assigned a weight. Using this weight and the users’ reviews for a particular item, a score for each item is calculated.
o The item with the maximum score is then recommended to the group.
o The group’s feedback for the selected item is then used to update the model accordingly.
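A minimal sketch of the weighted scoring step described above; the full Group-UCB rule also adds a UCB-style confidence term, which is omitted here:

```python
import numpy as np

def recommend_to_group(weights, review_matrix):
    """Recommend the item with the highest weighted group score.

    weights: one weight per user; review_matrix[u, i]: user u's (predicted or
    observed) rating for item i. Score = weighted average of user ratings.
    """
    w = np.asarray(weights, dtype=float)
    scores = (w @ review_matrix) / w.sum()   # one weighted score per item
    return int(np.argmax(scores))
```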

Work on Collaborative Tweet Recommendation –

Created Timelines –


o First separated out all the users having more than 15 followers and 50 retweets.
o Created separate files for each user’s retweets and for the tweets of all of that user’s followers.
o Created the timeline file for each user in a one-retweet and four-tweets pattern. The file also contains the publisher id of each tweet. A user’s retweets cannot occur again as tweets of their followers.

Created Tweet Ids –

o Removed all the non-English tweets.
o Assigned a unique id to each tweet, which is used in the creation of other features.

Extraction of Relation Features –

o Extracted a mention score that measures the number of times the user is mentioned by the publisher.
- Used the timeline file created for the users to find the mention score for each user.
o Extracted a binary ‘friends’ feature that tells whether the publisher and the follower are friends or not.
o Co-follow score: calculated the similarity between the followee sets of the user and the publisher, using the timeline file for each user.
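The report does not spell out the similarity measure for the co-follow score; a Jaccard overlap of the two followee sets is one natural choice:

```python
def cofollow_score(user_followees, publisher_followees):
    """Jaccard similarity between two followee sets (assumed metric)."""
    a, b = set(user_followees), set(publisher_followees)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```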

Extraction of Tweet Content-based Features –

o Length of tweet –
- Created a file that contains each tweet id and the length of the tweet.
o Hashtag count –
- For each tweet, stored the hashtag count. A tweet is considered more informative if it has more hashtags.
o URL count –
- For each tweet, stored the URL count. URLs are included because of the restriction on tweet length, so more URLs mean more information.
o Retweet count –
- First made files that contain all the retweets of all the users (those with more than 15 followers and 50 retweets) with ids obtained from the tweet id file.
- Used the ids in these files to count the number of times each tweet has been retweeted.

Extraction of Publisher Authority Features –

o Mention count –
- Created a file of users’ screen names and ids.
- For each tweet, extracted the names mentioned using ‘@’ and maintained a mention count for each user by id.
o Followee count –
- Extracted the information from the data file provided with the tweet information.
o Follower count –
- Extracted the information from the data file provided with the tweet information.
o Status count –
- For each user, stored the number of statuses posted.

Extraction of Content-relevance Features –

o Relevance to status history –
- Created a file with the counts of all words occurring in all tweets, excluding stop words and punctuation marks, and retained only the top ten thousand word counts.
- Created a file containing the tweets of users with more than 15 followers and more than 50 retweets, with all retweets removed.
- Using the top ten thousand words from all tweets, built a vector counting the number of times each word occurs in a particular user’s tweets.
- The relevance of a tweet is then calculated as the dot product of the previously constructed vector and the vector constructed for the current tweet (both relevance computations are sketched in code after this feature list).
- This is done for every tweet of a particular user.

o Relevance to retweet history –
- Created a file with the counts of all words occurring in all retweets, excluding stop words and punctuation marks, and retained only the top ten thousand word counts.
- Created a file containing the retweets of users with more than 15 followers and more than 50 retweets.
- Using the top ten thousand words from all retweets, built a vector counting the number of times each word occurs in a particular user’s retweets.
- The relevance of a retweet is then calculated as the dot product of the previously constructed vector and the vector constructed for the current retweet.
- This is done for every retweet of a particular user.

o Relevance to hashtags –
- Created a file for each user containing a list of all the hashtags used by that user.
- Calculated the hashtag relevance of each tweet by adding up the number of times each of its hashtags appears in the user’s hashtag history.


o Finally created a file containing data in the format: user id, publisher id, tweet id, relevance to status history, relevance to retweet history, relevance to hashtags, and a binary field denoting whether it is a tweet or a retweet.
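Both relevance features above reduce to the same computation; a minimal sketch, assuming the user history has already been aggregated into word counts over the top-ten-thousand-word vocabulary:

```python
from collections import Counter

def relevance(user_history_counts, tweet_tokens, vocab):
    """Relevance of a tweet (or retweet) to a user's history: the dot product
    of the user's word-count vector and the tweet's word-count vector, both
    restricted to the fixed top-ten-thousand-word vocabulary."""
    tweet_counts = Counter(t for t in tweet_tokens if t in vocab)
    return sum(user_history_counts.get(w, 0) * n for w, n in tweet_counts.items())
```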

Video Lectures Referred –

Stanford’s Machine Learning course by Andrew Ng on Coursera –
o Topics referred –
- Basics of Machine Learning
- Cost Function Analysis
- Gradient Descent for Linear Regression
- Gradient Descent for Multiple Variables
- Octave Basics
- Classification
- Regularisation
- Clustering

Probabilistic Systems Analysis and Applied Probability from MIT OpenCourseWare –
o Topics referred –
- Probability Models and Axioms
- Conditioning and Bayes’ Rule
- Independence
- Counting
- Discrete Random Variables; Probability Mass Functions; Expectations
- Discrete Random Variable Examples; Joint PMFs
- Multiple Discrete Random Variables; Expectations, Conditioning, Independence
- Continuous Random Variables
- Multiple Continuous Random Variables
- Continuous Bayes’ Rule; Derived Distributions

Johns Hopkins courses done on Coursera –
o The Data Scientist’s Toolbox
o R Programming
o Getting and Cleaning Data

Languages Learnt –

o Python
o Octave
o MATLAB