Truth Discovery with Multiple Conflicting Information Providers on the Web
Xiaoxin Yin, Jiawei Han, Philip S. Yu
Industrial and Government Track short paper

Advisor: Dr. Koh Jia-Ling
Speaker: Che-Wei Liang
Date: 2007.11.20



Page 2

Outline

• Introduction
• Problem Definitions
• Computational Model
  – Web Site Trustworthiness and Fact Confidence
  – Iterative Computation
• Empirical Study
• Conclusions

Page 3

Introduction

• World-wide web
  – A necessary part of our lives.
  – Ex: Amazon.com, ShopZilla.com.
• Is the world-wide web always trustworthy?
  – There is no guarantee of the correctness of information on the web.

Page 4

Introduction

• Example 1: Authors of books
  – Author information found on the web can be incomplete or incorrect. [Example table not preserved.]

Page 5

Introduction

• Ranking web pages
  – According to authority based on hyperlinks.
  – Ex: Authority-Hub analysis, PageRank, and more general link-based analysis.
• Does the authority or popularity of a web site imply the accuracy of its information?

Page 6

Introduction

• Veracity problem
  – Discover the true fact about each object.

Page 7

Problem Definitions

• Definition 1: Confidence of facts.
  – The probability of a fact f being correct, denoted by s(f).
• Definition 2: Trustworthiness of web sites.
  – The expected confidence of the facts provided by a web site w, denoted by t(w).

Page 8

Problem Definitions

• Facts may conflict with or support each other.
  – Ex: “Jennifer Widom”, “J. Widom”
• Concept of implication
  – imp(f1 → f2): f1’s influence on f2’s confidence.
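
The slides do not say how imp(f1 → f2) is computed, so the following is only an illustrative sketch for string-valued facts such as author names. It assumes implication can be approximated by string similarity minus a threshold; the function body, the `base_sim` parameter, and the name "John Smith" are hypothetical and not taken from the slides.

```python
from difflib import SequenceMatcher


def imp(f1: str, f2: str, base_sim: float = 0.5) -> float:
    """Hypothetical implication imp(f1 -> f2) for string-valued facts.

    Positive when the two values are similar (f1 supports f2),
    negative when they differ (f1 conflicts with f2).
    base_sim is an assumed threshold, not a value from the slides.
    """
    similarity = SequenceMatcher(None, f1, f2).ratio()  # value in [0, 1]
    return similarity - base_sim


print(imp("Jennifer Widom", "J. Widom"))    # similar names: positive
print(imp("Jennifer Widom", "John Smith"))  # unrelated names: negative
```

This version is symmetric, whereas the arrow in imp(f1 → f2) suggests the real measure may be directional; a domain-specific rule could replace the similarity call.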

Page 9

Basic heuristic

• Basic heuristics
  1. Usually there is only one true fact for a property of an object.
  2. This true fact appears to be the same or similar on different web sites.

Page 10

Basic heuristic (cont.)

• Basic heuristics (cont.)
  3. The false facts on different web sites are less likely to be the same or similar.
  4. In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.

Page 11

Web Site Trustworthiness and Fact Confidence

• Trustworthiness t(w): the average confidence of the facts provided by w,

$t(w) = \frac{\sum_{f \in F(w)} s(f)}{|F(w)|}$

where F(w) is the set of facts provided by w.
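
A minimal sketch of this computation, assuming t(w) is the plain average of s(f) over F(w), in line with the definition of t(w) as the expected confidence of a site's facts. The fact identifiers and confidence values are made up for illustration.

```python
def trustworthiness(site_facts, confidence):
    """t(w): average confidence s(f) over the facts F(w) provided by site w."""
    return sum(confidence[f] for f in site_facts) / len(site_facts)


# Hypothetical confidences s(f) for three facts.
s = {"f1": 0.9, "f2": 0.6, "f3": 0.3}

# A site that provides f1 and f2 gets the average of 0.9 and 0.6.
print(trustworthiness(["f1", "f2"], s))
```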

Page 12

Web Site Trustworthiness and Fact Confidence

• It is more difficult to estimate the confidence of a fact.

Page 13

Web Site Trustworthiness and Fact Confidence

• Simple case
  – f1 is the only fact about object o1.
  – Assume w1 and w2 are independent.
• Confidence s(f)

$s(f) = 1 - \prod_{w \in W(f)} (1 - t(w))$

where W(f) is the set of web sites providing f.
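
A minimal sketch of the simple case, assuming that under independence a fact is wrong only if every site providing it is wrong, so s(f) = 1 − Π (1 − t(w)) over W(f). The trustworthiness values are illustrative.

```python
import math


def confidence(site_trust):
    """s(f) = 1 - prod(1 - t(w)) over the sites in W(f), assuming independence."""
    return 1.0 - math.prod(1.0 - t for t in site_trust)


# Two independent sites with t(w1) = 0.9 and t(w2) = 0.8 provide the same fact.
print(confidence([0.9, 0.8]))
```

Each additional independent provider can only raise the confidence, which matches the heuristic that a true fact tends to appear on several sites.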

Page 14

Web Site Trustworthiness and Fact Confidence

• Trustworthiness score of a web site

$\tau(w) = -\ln(1 - t(w))$

• τ(w) is between 0 and +∞ and better characterizes how accurate w is.
  – Ex: t(w1) = 0.9, t(w2) = 0.99
    t(w2) = 1.1 × t(w1)
    τ(w2) = 2 × τ(w1)
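
The example can be checked numerically, assuming the score is defined as τ(w) = −ln(1 − t(w)), which is the definition consistent with τ(w2) = 2 × τ(w1) for these values. The helper name `tau` is chosen here for illustration.

```python
import math


def tau(t):
    """Trustworthiness score: maps t(w) in [0, 1) to [0, +inf)."""
    return -math.log(1.0 - t)


t1, t2 = 0.9, 0.99
print(t2 / t1)            # trustworthiness differs by a factor of about 1.1
print(tau(t2) / tau(t1))  # scores differ by a factor of about 2
```

The site with ten times fewer errors (1% versus 10%) gets twice the score, a difference that the raw trustworthiness values barely show.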

Page 15

Web Site Trustworthiness and Fact Confidence

• Confidence score of a fact

$\sigma(f) = -\ln(1 - s(f))$

  – Property:

$\sigma(f) = \sum_{w \in W(f)} \tau(w)$
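
A short derivation of the property, assuming σ(f) = −ln(1 − s(f)), τ(w) = −ln(1 − t(w)), and s(f) = 1 − Π (1 − t(w)) over the sites in W(f):

```latex
\begin{align*}
\sigma(f) &= -\ln\bigl(1 - s(f)\bigr) \\
          &= -\ln \prod_{w \in W(f)} \bigl(1 - t(w)\bigr) \\
          &= \sum_{w \in W(f)} -\ln\bigl(1 - t(w)\bigr) \\
          &= \sum_{w \in W(f)} \tau(w)
\end{align*}
```

Working with scores turns the product over sites into a sum, which is easier to adjust for related facts.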

Page 16

Web Site Trustworthiness and Fact Confidence

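• Adjusted confidence score of a fact f

$\sigma^*(f) = \sigma(f) + \rho \cdot \sum_{o(f') = o(f)} \sigma(f') \cdot imp(f' \to f)$

where o(f) is the object that fact f is about and ρ is a parameter between 0 and 1 that controls the influence of related facts.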
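
A minimal sketch of the adjustment, assuming σ*(f) adds to σ(f) a ρ-weighted sum of σ(f') · imp(f' → f) over the other facts f' about the same object. The dictionaries `sigma` and `obj`, and the constant implication of −1 for conflicting facts, are illustrative assumptions.

```python
def adjusted_score(f, sigma, obj, imp, rho=0.5):
    """sigma*(f): sigma(f) plus rho times the implication-weighted scores
    of the other facts about the same object."""
    related = sum(
        sigma[f2] * imp(f2, f)
        for f2 in sigma
        if f2 != f and obj[f2] == obj[f]
    )
    return sigma[f] + rho * related


# Two conflicting facts about the same object (hypothetical scores).
sigma = {"A": 4.0, "B": 2.0}
obj = {"A": "o1", "B": "o1"}


def conflict(f2, f):
    """Assumed implication: different facts fully conflict."""
    return -1.0


print(adjusted_score("A", sigma, obj, conflict))  # 4.0 - 0.5 * 2.0
print(adjusted_score("B", sigma, obj, conflict))  # 2.0 - 0.5 * 4.0
```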

Page 17

Web Site Trustworthiness and Fact Confidence

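• Compute the confidence of f based on σ*(f) in the same way as computing it based on σ(f):

$s^*(f) = 1 - e^{-\sigma^*(f)}$

• The assumption that different web sites are independent is incorrect, so add a dampening factor γ, 0 < γ < 1:

$s^*(f) = 1 - e^{-\gamma \cdot \sigma^*(f)}$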

Page 18

Web Site Trustworthiness and Fact Confidence

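• Negative-confidence problem
  – A fact f may conflict with some facts provided by trustworthy web sites, which gives σ*(f) < 0 and s*(f) < 0. A negative probability is unreasonable.
• Use the logistic function instead:

$s(f) = \frac{1}{1 + e^{-\gamma \cdot \sigma^*(f)}}$

  – If γ · σ*(f) is well above 0, s(f) is very close to s*(f).
  – If γ · σ*(f) is below 0, s(f) is close to zero but still positive.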
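
The two cases can be compared numerically. This sketch assumes s*(f) = 1 − e^(−γ·σ*(f)) and the logistic form s(f) = 1 / (1 + e^(−γ·σ*(f))); the function names and the sample scores are illustrative.

```python
import math


def s_star(sigma_adj, gamma=0.3):
    """Dampened confidence; becomes negative when sigma*(f) < 0."""
    return 1.0 - math.exp(-gamma * sigma_adj)


def s_logistic(sigma_adj, gamma=0.3):
    """Logistic confidence; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-gamma * sigma_adj))


for sigma_adj in (20.0, -5.0):
    print(sigma_adj, s_star(sigma_adj), s_logistic(sigma_adj))
```

For the large positive score the two values nearly coincide, while for the negative score only the logistic form stays a valid probability.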

Page 19

Iterative Computation

• TRUTHFINDER: an iterative method
  – Initially, TruthFinder has little information about the web sites and the facts.
  – In each iteration, it improves its knowledge about trustworthiness and confidence.
  – It stops when the computation reaches a stable state.
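
The loop described above can be sketched as follows. This is not the authors' implementation: the input format, the initial trustworthiness `t0`, the stopping tolerance `tol`, and the iteration cap are assumptions, and the update rules assume τ(w) = −ln(1 − t(w)), σ(f) as the sum of τ(w) over providers, the ρ-weighted implication adjustment, and the logistic confidence with dampening γ.

```python
import math


def truthfinder(claims, imp=None, rho=0.5, gamma=0.3,
                t0=0.9, tol=1e-6, max_iter=100):
    """Iterative sketch. claims maps site -> non-empty set of (object, value) facts.

    imp(g, f) is an optional implication function; t0, tol and max_iter
    are assumed settings, not values taken from the slides.
    """
    sites = list(claims)
    facts = {f for fs in claims.values() for f in fs}
    providers = {f: [w for w in sites if f in claims[w]] for f in facts}
    t = {w: t0 for w in sites}  # start with the same trust for every site
    s = {}
    for _ in range(max_iter):
        # Site scores; clamp t below 1 to avoid log(0).
        tau = {w: -math.log(1.0 - min(t[w], 1.0 - 1e-12)) for w in sites}
        sigma = {f: sum(tau[w] for w in providers[f]) for f in facts}
        s = {}
        for f in facts:
            adj = sigma[f]
            if imp is not None:
                # Influence of other facts about the same object.
                adj += rho * sum(sigma[g] * imp(g, f)
                                 for g in facts if g != f and g[0] == f[0])
            s[f] = 1.0 / (1.0 + math.exp(-gamma * adj))
        # Site trust is the average confidence of the facts it provides.
        new_t = {w: sum(s[f] for f in claims[w]) / len(claims[w]) for w in sites}
        change = max(abs(new_t[w] - t[w]) for w in sites)
        t = new_t
        if change < tol:  # stable state reached
            break
    return t, s


# Hypothetical data: w3 disagrees with w1 and w2 about book1.
claims = {
    "w1": {("book1", "A"), ("book2", "B")},
    "w2": {("book1", "A"), ("book2", "B")},
    "w3": {("book1", "X"), ("book2", "B")},
}
trust, conf = truthfinder(claims, imp=lambda g, f: -1.0)
print(trust)
print(conf)
```

In this toy run the fact supported by two sites ends with higher confidence than the conflicting one, and the dissenting site ends with lower trustworthiness.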

Page 20

Empirical Study

• Compare with VOTING
  – VOTING chooses the fact that is provided by the most web sites.
• Setup: Intel PC with a 1.66GHz dual-core processor, 1GB memory, Windows XP Professional.
• Parameters: ρ = 0.5 and γ = 0.3.
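
A minimal sketch of the VOTING baseline as described above, assuming each site's claims are given as a set of (object, value) pairs; the input format and the sample data are illustrative.

```python
from collections import Counter


def voting(claims):
    """For each object, pick the value provided by the most web sites.

    claims maps site -> set of (object, value) facts.
    """
    votes = {}
    for site_facts in claims.values():
        for obj, value in site_facts:
            votes.setdefault(obj, Counter())[value] += 1
    return {obj: counter.most_common(1)[0][0] for obj, counter in votes.items()}


claims = {
    "w1": {("book1", "A"), ("book2", "B")},
    "w2": {("book1", "A"), ("book2", "B")},
    "w3": {("book1", "X"), ("book2", "B")},
}
print(voting(claims))
```

Unlike the iterative method, this baseline treats every site as equally reliable, so a widely copied error would win the vote.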

Pages 21-24

Empirical Study

[Experimental results shown on these slides are not preserved in the transcript.]

Page 25

Conclusions

• Introduce and formulate the veracity problem
  – Resolving conflicting facts from multiple web sites.
  – Finding the true facts among them.
• Propose TRUTHFINDER
  – Uses web site trustworthiness and fact confidence to find trustworthy web sites and true facts.
• Experiments show that TRUTHFINDER achieves high accuracy.
