Information Management on the World-Wide Web
1
Information Management on the World-Wide Web
Junghoo “John” Cho
UCLA Computer Science
2
The Web and Information Galore
3
10 Years Ago
Reading papers for research
– Stacks of papers
– Long wait
4
With Web
5
Challenges (1)
Information overload
– Too much information, too little time
6
Information Overload
“XML” to Google
– 14 Million matching documents!
“XML” to Amazon
– 464 matching books!
Which one to read?
7
Challenges (2)
Hidden Web
– Not indexed by Search Engines
– “Hidden” from an average user
– Browse every site manually?
…
8
Challenges (3)
Transience
9
Challenges (4)
Scattered & unstructured data
– All Computer Science faculty members and graduate students in the US?
10
Projects In Our Group
Web Archive
Hidden Web Integration
Page Ranking Algorithm
User Recommendation System
11
User Recommendation System
464 books on XML
Which one to read?
– The one that my colleagues and friends recommend?
12
Amazon’s Recommendation System
1 – 5 star rating by individual users
Books can be sorted by “average user rating”
13
My Typical Scenario
Sort books by their average user rating
Browse top 20 books to decide what to read
14
Questions
Is “5 star” by one user better than “4.9 star” by 100 users?
– Intuitively, I prefer 4.9 star by 100 users
– More “reliable” rating
How much can I trust the rating of a particular person?
– How do I know that the person’s rating is reliable?
15
Our Approach
“Inherent quality” or “rating” of a book
– How many users would recommend the book (i.e., give it a high rating) if all users had read the book?
More user ratings give more information on the “quality” of the book
– An average user is likely to give a high rating to a high-quality book
16
Probabilistic Rating Model
How likely is it that the book deserves a “4 star” rating?
– Rating probability distribution
[Figure: probability density (y-axis, 0 to 1) over book rating/quality (x-axis, 1 to 5)]
17
Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density (y-axis, 0 to 1) over book rating/quality (x-axis, 1 to 5)]
18
Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density (y-axis, 0 to 1) over book rating/quality (x-axis, 1 to 5), after a five-star rating by a user]
19
Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density (y-axis, 0 to 1) over book rating/quality (x-axis, 1 to 5), after a one-star rating by a user]
20
Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density (y-axis, 0 to 1) over book rating/quality (x-axis, 1 to 5), after many ratings]
21
Bayesian Inference Theory
Given a user rating UR, what is the inherent rating IR?
$$P(IR \mid UR) = \frac{P(UR \mid IR)\, P(IR)}{P(UR)}$$
– $P(IR)$: probability of book rating BEFORE user rating
– $P(IR \mid UR)$: probability of book rating AFTER user rating
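A minimal Python sketch of this update, assuming the inherent rating IR takes the discrete values 1 to 5. The slides do not specify the likelihood P(UR | IR), so the exponentially decaying form below is purely an assumption, and the names `likelihood` and `bayes_update` are illustrative only.

```python
import math

RATINGS = [1, 2, 3, 4, 5]

def likelihood(ur, ir):
    """Assumed P(UR = ur | IR = ir): decays with |ur - ir|, normalized over ur."""
    norm = sum(math.exp(-abs(u - ir)) for u in RATINGS)
    return math.exp(-abs(ur - ir)) / norm

def bayes_update(prior, ur):
    """Return P(IR | UR = ur), given the prior P(IR) as a dict {ir: probability}."""
    unnormalized = {ir: likelihood(ur, ir) * p for ir, p in prior.items()}
    p_ur = sum(unnormalized.values())  # P(UR), the normalizing constant
    return {ir: v / p_ur for ir, v in unnormalized.items()}

prior = {ir: 1 / len(RATINGS) for ir in RATINGS}  # uniform: nothing known yet
posterior = bayes_update(prior, 5)                # after one five-star rating
```

Under this assumed likelihood, a single five-star rating shifts probability mass toward IR = 5 while leaving some mass on lower ratings; calling `bayes_update` again with the posterior as the new prior folds in further ratings.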
22
User Model
The characteristics of a user
Sensitivity: Slope of the curve (+1: good, -1: bad, 0: not useful)
[Figure: two panels, “Good” and “Bad”, each plotting user rating (y-axis, 1 to 5) against book quality (x-axis, 1 to 5)]
23
User Model
The characteristics of a user
Bias: Average “height” of the curve
[Figure: two panels, “Positive bias” and “Negative bias”, each plotting user rating (y-axis, 1 to 5) against book quality (x-axis, 1 to 5)]
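One way to make the two user characteristics concrete, as a sketch under assumptions the slides do not state: fit a least-squares line through a user's (book quality, user rating) pairs, take its slope as the sensitivity, and take the mean gap between user rating and book quality as the bias. The function name `user_characteristics` and the sample ratings are hypothetical.

```python
def user_characteristics(qualities, ratings):
    """Return (sensitivity, bias) for one user.

    sensitivity: least-squares slope of user rating against book quality
    bias: mean rating minus mean quality (an assumed reading of "average height")
    """
    n = len(qualities)
    mean_q = sum(qualities) / n
    mean_r = sum(ratings) / n
    var_q = sum((q - mean_q) ** 2 for q in qualities)
    cov = sum((q - mean_q) * (r - mean_r) for q, r in zip(qualities, ratings))
    sensitivity = cov / var_q if var_q > 0 else 0.0  # flat line if quality never varies
    bias = mean_r - mean_q
    return sensitivity, bias

quality = [1, 2, 3, 4, 5]
good_user = user_characteristics(quality, [1, 2, 3, 4, 5])      # tracks quality
bad_user = user_characteristics(quality, [5, 4, 3, 2, 1])       # inverts quality
generous_user = user_characteristics(quality, [2, 3, 4, 5, 5])  # rates high
```

With these sample ratings, the first user gets a slope near +1, the second a slope near -1, and the third a positive bias, matching the “good”, “bad”, and “positive bias” cases on the slides.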
24
Iterative Model Refinement
As more users rate a book, we get better estimates of book quality
As we estimate a book’s quality better, we get a better idea of a user’s sensitivity and bias
25
Iterative Model Refinement
[Diagram with three boxes: User-provided Rating, Book Rating Estimate, User Characteristics]
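The refinement loop can be sketched as an alternating estimation. This is a simplified illustration, not the authors' algorithm: it models only user bias (sensitivity is left out for brevity), uses plain averages instead of probability distributions, and the name `refine` and the sample ratings are hypothetical.

```python
from collections import defaultdict

def refine(ratings, iterations=20):
    """Alternate between book-quality and user-bias estimates.

    ratings: dict mapping (user, book) -> rating given by that user.
    Returns (quality, bias) as plain dicts.
    """
    bias = defaultdict(float)  # every user starts with zero bias
    quality = {}
    for _ in range(iterations):
        # Step 1: book quality = mean of bias-corrected ratings.
        by_book = defaultdict(list)
        for (user, book), r in ratings.items():
            by_book[book].append(r - bias[user])
        quality = {b: sum(v) / len(v) for b, v in by_book.items()}
        # Step 2: user bias = mean gap between the user's ratings and book quality.
        by_user = defaultdict(list)
        for (user, book), r in ratings.items():
            by_user[user].append(r - quality[book])
        bias = defaultdict(float, {u: sum(v) / len(v) for u, v in by_user.items()})
    return quality, dict(bias)

# User B always rates one star higher than user A; only B has rated book "z".
ratings = {
    ("A", "x"): 4, ("A", "y"): 2,
    ("B", "x"): 5, ("B", "y"): 3, ("B", "z"): 4,
}
quality, bias = refine(ratings)
```

In this sketch only the difference between user biases is pinned down by the data, not their absolute level. Even so, B's positive bias pulls the estimate for book "z" below B's raw four-star rating.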
26
Final Recommendation
Recommend the book with the highest expected rating
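As a small illustration of this rule, the sketch below computes the expected rating of each book from a discrete rating distribution and picks the maximum. The book titles and probabilities are made-up sample data, and `expected_rating` and `recommend` are hypothetical names.

```python
def expected_rating(dist):
    """Expected value of a rating distribution given as {rating: probability}."""
    return sum(rating * p for rating, p in dist.items())

def recommend(books):
    """Return the title whose rating distribution has the highest expected value."""
    return max(books, key=lambda title: expected_rating(books[title]))

# Made-up rating distributions for two books.
books = {
    "Book A": {1: 0.0, 2: 0.0, 3: 0.1, 4: 0.3, 5: 0.6},
    "Book B": {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.3},
}
best = recommend(books)
```

Here `best` is "Book A", whose distribution puts more mass on high ratings.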
27
Initial Results
Our system prefers a book rated 4.9 stars by 100 users to a book rated 5 stars by 1 user
If a user gives random ratings, the system ignores the user’s rating
More thorough evaluation on the way
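The first result can be illustrated with a toy Bayesian update. The sketch does not reproduce the authors' system or evaluation. It assumes a uniform prior over inherent ratings 1 to 5 and a hypothetical likelihood that decays exponentially with |UR − IR|, and it models the 4.9-star average as 90 five-star and 10 four-star ratings.

```python
import math

RATINGS = [1, 2, 3, 4, 5]

def likelihood(ur, ir):
    """Assumed P(UR = ur | IR = ir): decays with |ur - ir|, normalized over ur."""
    norm = sum(math.exp(-abs(u - ir)) for u in RATINGS)
    return math.exp(-abs(ur - ir)) / norm

def expected_after(user_ratings):
    """Expected inherent rating after applying Bayes' rule once per user rating."""
    post = {ir: 1 / len(RATINGS) for ir in RATINGS}  # uniform prior
    for ur in user_ratings:
        post = {ir: likelihood(ur, ir) * p for ir, p in post.items()}
        total = sum(post.values())
        post = {ir: p / total for ir, p in post.items()}  # renormalize each step
    return sum(ir * p for ir, p in post.items())

one_user = expected_after([5])                    # 5.0 average from 1 rating
many_users = expected_after([5] * 90 + [4] * 10)  # 4.9 average from 100 ratings
```

Under these assumptions `many_users` exceeds `one_user`: a single rating leaves considerable probability on lower inherent ratings, while 100 consistent ratings concentrate the distribution near the top.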
28
Other Projects
Web Archive
Hidden Web Integration
Page Ranking Algorithm
29
Ph.D. Students on the Projects
Alex Ntoulas
Rob Adams
Victor Liu
– In Dr Chu’s group
30
Thank You
Questions?