Project proposal – Recommendation Engine

Parinita, Master's in Computational Linguistics, University of Washington

Recommendation Engine

Two types of recommendation engine are proposed:

1) Proactive: a query-less recommendation engine (based on an Automated Collaborative Filtering mechanism)

2) Reactive: query-based content personalization

Both are based on user personalization, where user data is gathered from web server logs. The data is used to learn about the implicit and explicit preferences of individual users, and this information is used to personalize their information retrieval processes.

Each user profile records relevancy information that discriminates between the jobs a user merely looks at or considers and those she is truly interested in.

Graded user profiles make it possible to 1) recommend jobs that match the user's interests, based on what similar users have previously liked, and 2) supplement each user's search queries with additional relevant search terms and filter the retrieval results to weed out irrelevant hits.

1. Query-less recommendation service

LinkedIn has high-quality user profiles available, which makes it a good basis for building a proactive, personalized and intelligent model of information access.

Goal: proactively recommend new jobs to users based on what similar users have previously liked. User profiles are constructed automatically and passively by mining server logs.

Fig. 1: From server logs to a graded user profile

Relevancy information can be extracted by preprocessing web server logs. Each log entry records a single job access by a user and encodes details such as the time and type of access and the job and user IDs. From these entries, three kinds of relevancy information are recorded:

a) “Revisit Data”: the number of times a user has accessed a job, which indicates their interest in it. Misleading data such as "irritation clicks" (repeated clicks on a job description while it is downloading) is removed.

b) “Read-Time”: the time difference between successive requests by the same user. Spurious read-times (e.g. when the user logs off) are eliminated.

c) “Activity Data”: use of the online application or email facility, taken as a measure of high relevancy.
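To make the log-mining step concrete, the following Python sketch turns raw log entries into per-job relevancy information and a coarse grade. It is a minimal sketch under stated assumptions: the LogEntry fields, the access-type labels ("view", "apply", "email"), the 5-second irritation-click window, the 30-minute read-time cutoff and the grade scheme are all hypothetical choices, not values taken from the proposal or the cited papers.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical log record: one job access by one user.
@dataclass
class LogEntry:
    user_id: str
    job_id: str
    timestamp: float   # seconds since some epoch
    access_type: str   # assumed labels: "view", "apply", "email"

IRRITATION_WINDOW = 5.0      # assumed: re-views within 5 s are irritation clicks
MAX_READ_TIME = 1800.0       # assumed: longer gaps are spurious (e.g. logoff)
LONG_READ = 60.0             # assumed threshold used by the grading sketch
ACTIVITY_TYPES = {"apply", "email"}

def clean_events(events):
    """Sort one user's events by time and drop irritation clicks."""
    cleaned, last_view = [], {}
    for e in sorted(events, key=lambda e: e.timestamp):
        if e.access_type == "view":
            prev = last_view.get(e.job_id)
            last_view[e.job_id] = e.timestamp
            if prev is not None and e.timestamp - prev < IRRITATION_WINDOW:
                continue
        cleaned.append(e)
    return cleaned

def build_profiles(entries):
    """Return {user_id: {job_id: {"visits", "read_time", "activity"}}}."""
    by_user = defaultdict(list)
    for e in entries:
        by_user[e.user_id].append(e)
    profiles = {}
    for user, events in by_user.items():
        events = clean_events(events)
        jobs = defaultdict(lambda: {"visits": 0, "read_time": 0.0, "activity": False})
        # Pair each event with the same user's next request to get read-times.
        for cur, nxt in zip(events, events[1:] + [None]):
            stats = jobs[cur.job_id]
            if cur.access_type in ACTIVITY_TYPES:
                stats["activity"] = True
            elif cur.access_type == "view":
                stats["visits"] += 1
                if nxt is not None and nxt.timestamp - cur.timestamp <= MAX_READ_TIME:
                    stats["read_time"] += nxt.timestamp - cur.timestamp
        profiles[user] = dict(jobs)
    return profiles

def grade(stats):
    """Hypothetical 1-3 grade; the proposal does not define a grading scale."""
    if stats["activity"]:
        return 3  # applied or emailed: high relevancy
    if stats["visits"] > 1 or stats["read_time"] >= LONG_READ:
        return 2  # revisited or read at length
    return 1      # a single, brief view
```

Read-times are computed on the cleaned event stream, so removing an irritation click does not throw away the reading time that follows it.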

Recommendations can be made based on the similarity between user profiles. First, the set of users related to the target user is identified; then the profile items of these users that are not in the target profile are ranked for recommendation to the target user.

Similarity measures: a) the degree of overlap between their profile items;

b) the correlation coefficient between their grading lists, whereby a k-nearest neighbor (k-NN) strategy is used to select the most similar users.

Fig. 2: Direct vs. indirect user relationships

There are two approaches: direct and indirect.

Direct relationships: recommendations rely on direct user relationships, for example between A and B. Recommendations may then be based on a small number of profiles with low degrees of similarity, and may even result in no recommendations at all. This approach also ignores potential indirect relationships between users: C may have the same job taste as A, but because C has seen a different set of jobs, this will not be recognized.

Indirect relationships: these exploit indirect, transitive relationships. User B is directly related to users A and C, so A and C are indirectly related through B. Users are grouped prior to recommendation: profiles are clustered into virtual communities such that all of the users in a given community are related. The single-link clustering technique can be used with a thresholded version of the similarity metric, so that each community is a maximal set of users in which every user has a similarity value greater than the threshold with at least one other community member. Each target user is then recommended the most frequently occurring jobs in its virtual community.
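Single-link clustering with a thresholded similarity amounts to linking every pair of users whose similarity exceeds the threshold and taking the connected components. The Python sketch below does this with a union-find structure and then recommends the most frequent jobs in the target's community. Assumptions: profiles are lists of job IDs, jaccard is a stand-in similarity, and excluding jobs the target has already seen is a hypothetical choice not stated in the proposal.

```python
from collections import Counter

def jaccard(p, q):
    """Stand-in profile similarity: overlap of the two job sets."""
    a, b = set(p), set(q)
    return len(a & b) / len(a | b) if a | b else 0.0

def communities(users, sim, threshold):
    """Single-link clustering: link users whose similarity exceeds the threshold
    and return the connected components, i.e. the maximal communities."""
    parent = {u: u for u in users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for i, a in enumerate(users):
        for b in users[i + 1:]:
            if sim(a, b) > threshold:
                parent[find(a)] = find(b)
    groups = {}
    for u in users:
        groups.setdefault(find(u), []).append(u)
    return list(groups.values())

def community_recommend(target, profiles, groups, n=3):
    """Most frequent jobs in the target's community, skipping jobs it has seen."""
    group = next(g for g in groups if target in g)
    counts = Counter(job for u in group if u != target
                     for job in profiles[u] if job not in profiles[target])
    return [job for job, _ in counts.most_common(n)]
```

With A = [j1, j2, j3], B = [j2, j3, j4, j5] and C = [j4, j5, j6], A and C share no jobs, yet with a threshold of 0.3 all three land in one community through B, so A is recommended C's jobs as well.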

2. Case-Based User Profiling for Content Personalization

Two-step personalized retrieval engine

When a user enters a new search query, a server-side similarity-based search engine is used to select a set of similar job cases. This is followed by personalization, a post-processing retrieval task in which the result set is compared to the user's profile in order to filter out irrelevant jobs.

Step 1: Similarity-Based Retrieval

A similarity-based retrieval technique is used rather than an exact-match technique. Each case is made up of a fixed set of features such as job type, salary, key skills and minimum experience, and the similarity between each job case and the target query is computed.
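A common way to compute case-to-query similarity is a weighted average of per-feature similarities, and the Python sketch below follows that pattern. Everything concrete here is assumed for illustration: the feature names as dictionary keys, the weights, the normalization ranges, exact matching on job type, and a simple skill-overlap stand-in for the concept-tree similarity described in the text.

```python
# Hypothetical weights and value ranges; the proposal does not specify them.
WEIGHTS = {"job_type": 1.0, "salary": 1.0, "min_experience": 1.0, "key_skills": 2.0}
RANGES = {"salary": 100_000, "min_experience": 20}

def numeric_sim(x, y, value_range):
    """1.0 for equal values, falling linearly to 0.0 at the full range."""
    return max(0.0, 1.0 - abs(x - y) / value_range)

def skills_sim(query_skills, job_skills):
    """Stand-in for concept-tree similarity: share of requested skills present."""
    if not query_skills:
        return 1.0
    return len(set(query_skills) & set(job_skills)) / len(set(query_skills))

def case_similarity(query, case):
    """Weighted average of per-feature similarities over the features in the query."""
    total = weight_sum = 0.0
    for feature, value in query.items():
        w = WEIGHTS[feature]
        if feature == "job_type":
            s = 1.0 if case[feature] == value else 0.0
        elif feature == "key_skills":
            s = skills_sim(value, case[feature])
        else:
            s = numeric_sim(value, case[feature], RANGES[feature])
        total += w * s
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

def retrieve(query, cases, n=10):
    """Return the n most similar job cases; nothing is discarded for a mismatch."""
    return sorted(cases, key=lambda c: case_similarity(query, c), reverse=True)[:n]
```

Unlike exact matching, a job with a slightly higher salary than requested still scores well instead of being excluded.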

The key skills feature contains symbolic values, represented as concept trees.

Symbolic feature similarity is based on subsumption relationships and on the distances between nodes in these trees.

Any job containing a concept that is a descendant of the query concept's node in the tree is taken to be an exact match. Otherwise, similarity depends on concept proximity: the closer two concepts are in the tree, the more similar they are.
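The subsumption and proximity rules can be illustrated with a small Python sketch. The skills taxonomy below is invented for the example, the tree is stored as a child-to-parent map, and the decay 1 / (1 + d), where d is the number of edges between the two concepts via their lowest common ancestor, is an assumed proximity function rather than one given in the proposal.

```python
# Hypothetical skills taxonomy (child -> parent); the real concept trees are not given.
PARENT = {
    "programming": None,
    "java": "programming", "python": "programming",
    "j2ee": "java", "swing": "java",
    "django": "python",
}

def ancestors(concept):
    """Path from the concept up to the root, starting with the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path

def concept_similarity(query_concept, job_concept):
    """1.0 if the job concept is the query concept or one of its descendants;
    otherwise decays with the tree distance between the two concepts."""
    job_path = ancestors(job_concept)
    if query_concept in job_path:
        return 1.0  # subsumption: treated as an exact match
    query_path = ancestors(query_concept)
    common = next((c for c in job_path if c in query_path), None)  # lowest common ancestor
    if common is None:
        return 0.0  # concepts in unrelated trees
    distance = job_path.index(common) + query_path.index(common)
    return 1.0 / (1.0 + distance)
```

For a query asking for "java", a job listing "j2ee" is an exact match; for a query asking for "j2ee", the sibling "swing" (two edges away) scores 1/3 while the more distant "django" scores 0.2.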

Step 2: Result Personalization

Each individual retrieved job is classified as either relevant or not relevant, using a nearest-neighbor classification algorithm that takes the graded job cases in the target user's profile as training data:

a) compare the candidate job to each profile job case, using a similarity metric to locate the k nearest profile jobs;

b) take the majority classification of these nearest neighbors.
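The classification step can be sketched in Python as a plain k-NN majority vote. The sketch assumes the graded profile has already been reduced to (job case, label) pairs with the hypothetical labels "relevant" and "not relevant", and it accepts any job-case similarity function, for example the feature-based similarity used in Step 1.

```python
from collections import Counter

def classify(candidate, profile, similarity, k=3):
    """Label a retrieved job by majority vote of its k most similar profile cases.
    profile is a list of (job_case, label) pairs."""
    nearest = sorted(profile, key=lambda item: similarity(candidate, item[0]),
                     reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def personalize(results, profile, similarity, k=3):
    """Keep only the retrieved jobs classified as relevant for this user."""
    return [job for job in results
            if classify(job, profile, similarity, k) == "relevant"]
```

An odd k avoids tied votes between the two classes; with an even k, Counter.most_common breaks ties in favor of the label seen first, which here is the label of the closest neighbor.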

The server logs are then analyzed again to improve the recommendations, making this a cyclical process.

Bibliography:

Rachael Rafter, Keith Bradley, Barry Smyth. Automated Collaborative Filtering Applications for Online Recruitment Services. Smart Media Institute, Department of Computer Science, University College Dublin, Ireland.

Keith Bradley, Rachael Rafter, Barry Smyth. Case-Based User Profiling for Content Personalisation. Smart Media Institute, Department of Computer Science, University College Dublin, Ireland.

Robert Krauthgamer, James R. Lee. Navigating Nets: Simple Algorithms for Proximity Search.

Susan Gauch, Mirco Speretta, Aravind Chandramouli, Alessandro Micarelli. User Profiles for Personalized Information Access. Electrical Engineering and Computer Science, Information & Telecommunication Technology Center, Lawrence, Kansas.

Another interesting approach:

http://www.careerbuildercommunications.com/pdf/searchebook.pdf