![Page 1: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/1.jpg)
Dr. Susan Gauch
When is a rock not a rock?
Conceptual Approaches to Personalized Search and
Recommendations
Nov. 8, 2011
TResNet
![Page 2: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/2.jpg)
Outline
• Background
• Motivation
• Collecting User Information
• Building Conceptual Profiles
• Using User Profiles in Search
  – Misearch
• Using User Profiles in Recommender Systems
  – MyCiteSeerx
• Issues with User Profiles
![Page 3: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/3.jpg)
Background
• Information retrieval (IR) studies the indexing and retrieval of textual documents
• Searching for pages on the World Wide Web is the most recent “killer app”
• Concerned with retrieving documents relevant to a query
• Concerned with retrieving from large sets of documents efficiently
![Page 4: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/4.jpg)
Web Search System
[Figure: components of a web search system: Web Spider, Document corpus, Query String, IR System, Ranked Documents (1. Page1, 2. Page2, 3. Page3, …)]
![Page 5: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/5.jpg)
The Vector-Space Model
• Assume t distinct terms remain after preprocessing; call them index terms or the vocabulary.
• These “orthogonal” terms form a vector space. Dimension = t = |vocabulary|
• Each term, i, in a document or query, j, is given a real-valued weight, $w_{ij}$.
• Both documents and queries are expressed as t-dimensional vectors:
  $d_j = (w_{1j}, w_{2j}, \dots, w_{tj})$
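A minimal sketch of the vector-space representation described above. The slide does not say how the weights are computed, so raw term frequency is used here as an assumption (tf-idf is a common alternative); the helper names `build_vocabulary` and `to_vector` and the sample texts are hypothetical.

```python
from collections import Counter


def build_vocabulary(texts):
    """Return the sorted list of distinct terms (the index terms)."""
    return sorted({term for text in texts for term in text.lower().split()})


def to_vector(text, vocabulary):
    """Map a text to a t-dimensional vector of raw term-frequency weights.

    Assumption: the weight of term i is its count in the text.
    """
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


# Two toy documents and one query, all expressed over the same vocabulary.
docs = ["rock music concert", "granite rock rock quarry"]
vocab = build_vocabulary(docs)
doc_vectors = [to_vector(d, vocab) for d in docs]
query_vector = to_vector("rock concert", vocab)
```

Documents and the query share one vocabulary, so every vector has the same dimension t and can be compared term by term.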
![Page 6: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/6.jpg)
Graphic Representation
[Figure: vectors D1, D2, and Q plotted on axes T1, T2, T3]
D1 = 2T1 + 3T2 + 5T3
D2 = 3T1 + 7T2 + T3
Q = 0T1 + 0T2 + 2T3
• Is D1 or D2 more similar to Q?
• How to measure the degree of similarity? Distance? Angle?
![Page 7: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/7.jpg)
![Page 8: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/8.jpg)
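Cosine Similarity Measure
• Cosine similarity measures the cosine of the angle between two vectors.
• Inner product normalized by the vector lengths.

$$\mathrm{CosSim}(d_j, q) = \frac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}|\,|\vec{q}|} = \frac{\sum_{i=1}^{t} w_{ij}\, w_{iq}}{\sqrt{\sum_{i=1}^{t} w_{ij}^{2} \cdot \sum_{i=1}^{t} w_{iq}^{2}}}$$

[Figure: vectors D1, D2, and Q on axes t1, t2, t3, with two angles labeled 1 and 2]
D1 is a 6 times better match than D2 using cosine similarity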
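As a worked check of the cosine measure, the sketch below applies it to the vectors D1 = 2T1 + 3T2 + 5T3, D2 = 3T1 + 7T2 + T3, and Q = 0T1 + 0T2 + 2T3 from the Graphic Representation slide. The function name `cos_sim` is an assumption chosen for illustration.

```python
import math


def cos_sim(d, q):
    """Inner product of d and q, normalized by the two vector lengths."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm = math.sqrt(sum(w * w for w in d)) * math.sqrt(sum(w * w for w in q))
    return dot / norm if norm else 0.0


D1 = [2, 3, 5]
D2 = [3, 7, 1]
Q = [0, 0, 2]

sim1 = cos_sim(D1, Q)  # 10 / (sqrt(38) * 2), roughly 0.81
sim2 = cos_sim(D2, Q)  # 2 / (sqrt(59) * 2), roughly 0.13
ratio = sim1 / sim2    # roughly 6.2
```

The ratio of about 6.2 is consistent with the slide's statement that D1 is a 6 times better match than D2.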
![Page 9: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/9.jpg)
Motivation
• Search engines contain very large collections
  – Google reports over 1 trillion web pages
• Receive very short queries
  – 68% are 3 words long or less
• Users examine few results
  – rarely go beyond first page
  – rarely examine more than 1 result
  – Exacerbated by small mobile screens
![Page 10: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/10.jpg)
Ambiguity
• How to return precise results with ambiguous queries?
• Return results based on simple keyword matches
• No consideration of differing meanings
• If the query is “salsa”, is it…
![Page 11: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/11.jpg)
![Page 12: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/12.jpg)
Dealing with Ambiguity
• Expand user queries using a thesaurus
  – “An Expert System for Searching in Full-Text,” Susan Gauch, 1990
  – Basically, make query vectors longer so they are more likely to match documents
• Represent documents and queries using high-level concepts instead of keywords
  – “Conceptual Search with KeyConcept,” Susan Gauch, 2010
  – Basically, reduce the dimensions in vectors to provide a conceptual match
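The first approach, thesaurus-based query expansion, can be sketched as follows. This is an illustration only, not the cited expert system: the toy `THESAURUS` entries and the function name `expand_query` are hypothetical.

```python
# Hypothetical toy thesaurus mapping a term to related terms.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "film": ["movie"],
}


def expand_query(query, thesaurus):
    """Return the query terms plus their thesaurus entries, without duplicates."""
    expanded = []
    for term in query.lower().split():
        # Keep the original term first, then add any related terms.
        for candidate in [term] + thesaurus.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


expanded_terms = expand_query("car film review", THESAURUS)
```

The expanded query has more non-zero dimensions than the original, which is what makes it more likely to overlap with a relevant document's vector.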
![Page 13: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/13.jpg)
Ontologies
• A structured set of concepts
• Where do ontologies come from?
![Page 14: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/14.jpg)
Semantic Web
• Manually build ontologies
• Experts manually tag data items
• Very “intelligent” but not scalable
![Page 15: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/15.jpg)
IR Community
• Use implicit ontologies
  – Wikipedia
  – Open Directory Project
• Develop automated techniques to tag items
• Not as “intelligent” but much more scalable
![Page 16: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/16.jpg)
Need for Personalization
• All users get identical results for identical queries
• No distinction between veterinarian and child for query “beagle puppy”
• Need for personalized results based on background and current context
• How to pick the best 10 (or 1!) results for _you_?
![Page 17: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/17.jpg)
How to Personalize
• Build a user profile that represents user interests
  – Collect information
  – Construct user profile
  – Use user profile for personalized interactions
![Page 18: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/18.jpg)
Collecting User Information
• Explicit user information
  – Users fill in site-specific surveys
  – Users too lazy/busy
  – Data may be deliberately or accidentally inaccurate
  – Information becomes out of date
![Page 19: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/19.jpg)
Implicit user information
  – Software collects information about user activity as they perform regular activities
  – Information is
    • indirect
    • noisy
  – Various approaches used by well-known applications
![Page 20: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/20.jpg)
Implicit Sources
• Browsing histories
  – User connects to Internet via a proxy
  – User periodically shares history
  – Pros:
    • captures browsing activity at multiple sites
  – Cons:
    • captures history from only one computer
![Page 21: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/21.jpg)
My Browsing History
![Page 22: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/22.jpg)
Used to Autofill URLs
![Page 23: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/23.jpg)
Implicit Sources
• Desktop toolbar
  – User must install desktop toolbar
  – Communication between toolbar and site
  – Pros:
    • interactions tracked across multiple sites
    • access to desktop windows, file system
  – Cons:
    • user must install software
    • fine line between toolbar and spyware
![Page 24: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/24.jpg)
Google’s Toolbar
![Page 25: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/25.jpg)
Used to Personalize Search
![Page 26: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/26.jpg)
Implicit Sources
  – User Account
    • user activity is tracked via cookies/session variables
    • best if user signs in to retain same profile across multiple machines
  – Pros:
    • users tracked across all interactions
  – Cons:
    • only works at one site
    • users must create an account
![Page 27: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/27.jpg)
Amazon’s Login
![Page 28: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/28.jpg)
Used for Recommendations
![Page 29: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/29.jpg)
Our Approach
  – Personalization based on implicit data
  – Represent profile using weighted conceptual taxonomy
  – Use profile for personalization in many different ways
    • OBIWAN – Web browsing
    • Misearch – Web search
    • MyCiteSeerx – recommender system
![Page 30: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/30.jpg)
Building a Conceptual Profile
• Need an ontology for the domain
• Need a collection of text that represents the user’s interests
• Need classification technique
  – train classifier with training data
  – classify user texts w.r.t. ontology/taxonomy/concept hierarchy/thesaurus/knowledge base
  – accumulate weights
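The three steps above (train, classify, accumulate weights) can be sketched as follows. The slides do not name the classifier, so a simple nearest-centroid approach over term-frequency vectors is assumed here; the concept names, training texts, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_vector(text):
    """Sparse term-frequency vector for a text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counters (missing terms count as 0)."""
    dot = sum(w * b[t] for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def train(training_docs):
    """Build one centroid per concept by summing its training documents."""
    centroids = {}
    for concept, texts in training_docs.items():
        centroid = Counter()
        for text in texts:
            centroid.update(term_vector(text))
        centroids[concept] = centroid
    return centroids


def classify(text, centroids):
    """Score a text against every concept centroid."""
    vec = term_vector(text)
    return {concept: cosine(vec, c) for concept, c in centroids.items()}


def build_profile(user_texts, centroids):
    """Accumulate classification scores per concept, then normalize to sum to 1."""
    profile = defaultdict(float)
    for text in user_texts:
        for concept, score in classify(text, centroids).items():
            profile[concept] += score
    total = sum(profile.values())
    return {c: (w / total if total else 0.0) for c, w in profile.items()}


training = {
    "Cooking": ["salsa recipe tomato onion", "bake bread recipe oven"],
    "Music": ["salsa dance band rhythm", "guitar band concert"],
}
centroids = train(training)
profile = build_profile(["tomato salsa recipe", "oven bread"], centroids)
```

Here the ambiguous term "salsa" appears in both concepts, but the accumulated weights still favor Cooking because the user's other terms match only that concept.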
![Page 31: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/31.jpg)
Building the User Profile
[Figure: a concept hierarchy (Root; Arts, Games; Music, Design, Comics), each concept with training documents Doc 1 … Doc n; other labeled components: Traditional Indexer, Concept Database, New documents, Classifier, Results]
![Page 32: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/32.jpg)
User Profile Representation
[Figure: concept tree under Root, with a weight for each concept]
• Entertainment: 0.01
• Homemaking: 0.04
• Cooking: 0.49
• Lessons: 0.3
• Videos: 0.1
![Page 33: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/33.jpg)
MiSearch
• User search histories
  – information available to search engine itself
  – collect the user’s queries, clicked on search results
  – no software installed
• Users create accounts
  – login
  – just track userid in a cookie during the session
  – Similar to Amazon, eBay, etc.
![Page 34: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/34.jpg)
Personalizing Search Results
• Submit query to Internet search engine (e.g., Google)
• Categorize each result into same concept hierarchy to create result profiles
  – top 3 levels of ODP, ~3,000 categories
• Calculate similarity between result profile and user profile
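A minimal sketch of the re-ranking step, assuming cosine similarity between concept-weight profiles and ignoring the search engine's original rank (the slides do not say whether or how the original rank is combined). The category names, result titles, and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse concept-weight dicts."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rerank(results, user_profile):
    """Order (title, result_profile) pairs by similarity to the user profile.

    sorted() is stable, so ties keep the engine's original order.
    """
    return sorted(results, key=lambda r: cosine(r[1], user_profile), reverse=True)


# Hypothetical profile of a user interested in photography.
user_profile = {"Arts/Photography": 0.7, "Shopping/Photography": 0.3}

# Hypothetical results for an ambiguous query, in original engine order.
results = [
    ("Western canon reading list", {"Arts/Literature": 0.9, "Reference/Education": 0.1}),
    ("Canon camera handbook", {"Arts/Photography": 0.6, "Shopping/Photography": 0.4}),
]

reranked = rerank(results, user_profile)
```

For this user the camera handbook moves to the top; a user whose profile is weighted toward literature would see the opposite order.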
![Page 35: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/35.jpg)
Ambiguous: “canon book”
![Page 36: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/36.jpg)
User Profile (Classics)
![Page 37: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/37.jpg)
![Page 38: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/38.jpg)
User Profile (Photography)
![Page 39: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/39.jpg)
![Page 40: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/40.jpg)
MyCiteSeerx
• Categorize contents of CiteSeerx with respect to ACM CCS topic hierarchy
• Users create an account
• Capture their queries and clicked-on documents
• Build a conceptual profile
• Compare user concepts to document concepts to create recommendations
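The final comparison step can be sketched as a top-k selection over documents the user has not yet clicked on. This is an assumption about how such a recommender might work, not the MyCiteSeerx implementation; the document identifiers, the weights, and the function names are hypothetical, and the concept labels (H.3, H.5, I.2) are used only as illustrative category codes.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two sparse concept-weight dicts."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(user_profile, doc_profiles, seen, k=2):
    """Return up to k unseen document ids, most similar to the user profile first."""
    candidates = (
        (cosine(user_profile, profile), doc_id)
        for doc_id, profile in doc_profiles.items()
        if doc_id not in seen
    )
    # Drop documents that share no concept with the user profile.
    return [doc_id for score, doc_id in heapq.nlargest(k, candidates) if score > 0]


user_profile = {"H.3": 0.8, "I.2": 0.2}
doc_profiles = {
    "paper-a": {"H.3": 1.0},
    "paper-b": {"H.5": 1.0},
    "paper-c": {"H.3": 0.5, "I.2": 0.5},
    "paper-d": {"I.2": 1.0},
}
seen = {"paper-a"}  # documents the user has already clicked on

recommendations = recommend(user_profile, doc_profiles, seen)
```

Excluding already-seen documents is a design choice made for this sketch so that recommendations surface new material rather than repeating the user's click history.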
![Page 41: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/41.jpg)
User interested in IR
![Page 42: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/42.jpg)
Their recommendations
![Page 43: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/43.jpg)
User interested in multimedia
![Page 44: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/44.jpg)
Their recommendations
![Page 45: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/45.jpg)
Recent Work
• Bridge gap between Semantic Web and Information Retrieval
  – Semi-automatically build domain-specific ontologies
    • Do text mining from domain-specific literature collection
![Page 46: Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e995503460f94b9c754/html5/thumbnails/46.jpg)
Conclusions
• Information on which to base user profiles can be collected via interactions with a specific site
• Conceptual profiles can be used to improve search (misearch)
• Conceptual profiles can be used to provide conceptual recommendations for the CiteSeerx collection
• Creates issues for profile sharing and user privacy
• Leads to work on how to reuse/expand/build ontologies for narrow domains