
Page 1

5 December 2012, The University of Hong Kong

Crowd Mining (joint work with Y. Amsterdamer, Y. Grossman, and T. Milo)

PIERRE SENELLART

Page 2

2 / 26 Télécom PT & Tel Aviv U. Pierre Senellart

Association rule mining

One of the most studied aspects of data mining [Agrawal et al., 1993]

Discovering rules in a database of transactions D

Transaction: set of items

Rule: X → Y with X, Y sets of items. Only interested in rules with support and confidence greater than given thresholds θ_s, θ_c

$\mathrm{supp}(X \to Y) = \frac{\#\{t \in D \mid X \cup Y \subseteq t\}}{\#D}$

$\mathrm{conf}(X \to Y) = \frac{\#\{t \in D \mid X \cup Y \subseteq t\}}{\#\{t \in D \mid X \subseteq t\}}$

Typical application: market basket analysis, e.g., Diaper → Beer
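As a concrete illustration of the definitions above, here is a minimal sketch computing support and confidence over a toy transaction database (the transactions are hypothetical):

```python
# Support and confidence of a rule X -> Y over a transaction database D,
# following the definitions above. The toy transactions are hypothetical.

def support(D, X, Y):
    """Fraction of transactions containing X ∪ Y."""
    xy = X | Y
    return sum(1 for t in D if xy <= t) / len(D)

def confidence(D, X, Y):
    """Among transactions containing X, fraction also containing X ∪ Y."""
    xy = X | Y
    n_x = sum(1 for t in D if X <= t)
    n_xy = sum(1 for t in D if xy <= t)
    return n_xy / n_x

D = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
]

print(support(D, {"diaper"}, {"beer"}))     # 2/4 = 0.5
print(confidence(D, {"diaper"}, {"beer"}))  # 2/3
```

A rule is then kept only if both values exceed the thresholds θ_s and θ_c.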

Page 3

Crowd-sourced data

Many applications where raw, extensional, exhaustive data is not available

But intensionally hidden in people’s collective minds

⇒ Resort to asking humans (the crowd) for bits of the data they know (shopping history, life habits, etc.)

Humans are bad at remembering the full history; also bad at discovering correlations

The crowd is a costly resource [Parameswaran and Polyzotis, 2011]

Page 4

Mining association rules from the crowd

Goal of this work: determining association rules on crowd-sourced data, by:

asking questions to humans that are easy to answer;

determining which is the best question to ask at any given point;

deducing from all answers a (probabilistic) set of valid association rules;

optimizing this computation as much as possible.

Page 5

Outline

Introduction

Concepts

Crowd Mining Algorithm

The CrowdMiner System

Experiments

Conclusions

Page 6

User support and confidence

A set of users U

Each user u ∈ U has a (hidden) transaction database D_u

Each rule X → Y is associated with its user support and user confidence:

$\mathrm{supp}_u(X \to Y) = \frac{\#\{t \in D_u \mid X \cup Y \subseteq t\}}{\#D_u}$

$\mathrm{conf}_u(X \to Y) = \frac{\#\{t \in D_u \mid X \cup Y \subseteq t\}}{\#\{t \in D_u \mid X \subseteq t\}}$

Page 7

Significant rules

Significant rules are those whose overall support and confidence are above specified thresholds θ_s, θ_c

Overall support and confidence are defined as the mean user support and confidence:

$\mathrm{supp}(r) = \mathrm{avg}_{u \in U}\, \mathrm{supp}_u(r) \qquad \mathrm{conf}(r) = \mathrm{avg}_{u \in U}\, \mathrm{conf}_u(r)$

Goal: finding significant rules while asking as few questions to the crowd as possible

Page 8

Questions to the crowd

Two kinds of questions:

Closed questions X → Y? Ask a user for her (approximate) support and confidence for this rule;

Open questions ? → ? Ask a user for one arbitrary rule and its (approximate) support and confidence.

Users will not be precise, but that's fine.

Example (Morning → Jogging):
"How often do you go jogging in the morning?"
"I go jogging three times per week in the morning."

conf_u(Morning → Jogging) = 3/7  supp_u(Morning → Jogging) = 3/21

(if there is one transaction for each morning, afternoon, and evening)
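The arithmetic of the jogging example can be spelled out as a tiny sketch (the one-transaction-per-morning/afternoon/evening layout comes from the slide; the week granularity is as stated):

```python
# One transaction per morning, afternoon and evening: 21 per week.
transactions_per_week = 3 * 7   # total transactions in D_u over one week
mornings = 7                    # transactions containing "Morning"
morning_jogging = 3             # transactions containing both items

supp_u = morning_jogging / transactions_per_week  # 3/21
conf_u = morning_jogging / mornings               # 3/7
```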


Page 10

Outline

Introduction

Concepts

Crowd Mining Algorithm

The CrowdMiner System

Experiments

Conclusions

Page 11

Algorithm components

[Flow diagram of the algorithm components: choose the next question → open or closed question? → choose the next closed question → choose candidate rules → rank the rules by grade → estimate next error → estimate current error → estimate mean distribution → estimate sample distribution → estimate rule significance]

One general framework for crowd mining

One particular choice of implementation of all black boxes

We do not claim any optimality

But we validate by experiments

Page 12

Estimating distributions

Attention: support and confidence are correlated, we need to consider bivariate distributions!

Central limit theorem: the sample distribution of (confidence, support) pairs for a rule is normally distributed

Hypothesis: the distribution of (confidence, support) values for rules among the whole set of users is normally distributed

The sample mean μ and covariance matrix Σ are unbiased estimators of those of the original distribution

[Scatter plot of (support, confidence) samples for a rule; axes: Support and Confidence, from 0.0 to 1.0]
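A minimal numpy sketch of this estimation step, assuming a handful of hypothetical (support, confidence) answers collected for one rule:

```python
import numpy as np

# Each row is one user's (support, confidence) answer for the rule.
# The values are hypothetical.
answers = np.array([
    [0.10, 0.55],
    [0.08, 0.60],
    [0.12, 0.50],
    [0.09, 0.58],
])

mu = answers.mean(axis=0)              # sample mean of the bivariate data
sigma = np.cov(answers, rowvar=False)  # unbiased sample covariance (ddof=1)
```

These two statistics then parameterize the bivariate normal distribution used when testing rule significance.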


Page 16

Estimating rule significance

A rule is significant if:

$\int_{\theta_s}^{\infty}\!\int_{\theta_c}^{\infty} \mathcal{N}_{\mu,\frac{1}{K}\Sigma}(c, s)\, dc\, ds > 0.5$

μ, Σ are the sample mean and covariance matrix, K is the number of samples, 𝒩 is the bivariate normal distribution

Efficient algorithms [Genz, 2004] for numerical integration of bivariate normal distributions.

The current error probability on rule significance is simply the distance of this integral to 0 or 1
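The significance test above can also be approximated by plain Monte Carlo sampling; the sketch below uses hypothetical numbers and numpy sampling in place of the exact Genz integration:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.30, 0.65])        # sample mean (support, confidence)
sigma = np.array([[0.02, 0.005],
                  [0.005, 0.03]])  # sample covariance matrix
K = 20                             # number of answers collected
theta_s, theta_c = 0.25, 0.55      # significance thresholds

# Mass of N(mu, Sigma/K) above both thresholds, estimated by sampling.
draws = rng.multivariate_normal(mu, sigma / K, size=100_000)
p = np.mean((draws[:, 0] > theta_s) & (draws[:, 1] > theta_c))

significant = p > 0.5
error_prob = min(p, 1 - p)  # distance of the integral to 0 or 1
```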

Page 17

Estimating next error

The current distribution 𝒩(μ, Σ) for a rule can be used as an estimator of what the next answer would be

We sample according to 𝒩(μ, Σ), recompute rule significance and error probabilities, and deduce from that the next error probability in this particular case

[Two scatter plots of (support, confidence) samples, before and after adding a sampled next answer; axes: Support and Confidence, from 0.0 to 1.0]

By averaging over all samples, we obtain an estimate of the next error probability

The difference between next error and current error (expected error reduction) is an estimate of how much we gain by asking a question on this rule!
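A sketch of this estimate, with the exact integration again replaced by Monte Carlo sampling and all quantities hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_s, theta_c = 0.25, 0.55

def error_prob(answers, n_draws=20_000):
    """Error probability of the significance decision for one rule,
    given the (support, confidence) answers collected so far."""
    mu = answers.mean(axis=0)
    sigma = np.cov(answers, rowvar=False)
    K = len(answers)
    draws = rng.multivariate_normal(mu, sigma / K, size=n_draws)
    p = np.mean((draws[:, 0] > theta_s) & (draws[:, 1] > theta_c))
    return min(p, 1 - p)

# Hypothetical answers collected so far for one rule.
answers = rng.multivariate_normal([0.30, 0.60],
                                  [[0.02, 0.0], [0.0, 0.03]], size=15)

current = error_prob(answers)

# Simulate possible next answers from N(mu, Sigma), fold each into the
# sample, recompute the error probability, and average.
mu, sigma = answers.mean(axis=0), np.cov(answers, rowvar=False)
next_errors = [error_prob(np.vstack([answers, a]))
               for a in rng.multivariate_normal(mu, sigma, size=50)]

expected_error_reduction = current - np.mean(next_errors)
```

The expected error reduction is then used as the grade when ranking candidate closed questions.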


Page 21

Putting everything together

[Flow diagram of the algorithm components, repeated from the earlier slide]

Candidate rules are rules of length 1, rules for which we have samples, and rules for which subrules are significant (analogous to Apriori [Agrawal et al., 1994])

The grade of a rule is the expected error reduction when known, an estimate based on subrules otherwise

We decide between closed or open by flipping a coin (exploitation vs. exploration)
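The open-vs-closed coin flip can be sketched as a standard ε-greedy choice; ε and the grades below are hypothetical, not values from the talk:

```python
import random

def choose_question(grades, epsilon, rng):
    """With probability epsilon, explore with an open question;
    otherwise exploit: ask a closed question about the best-graded rule."""
    if rng.random() < epsilon:
        return ("open", None)
    best_rule = max(grades, key=grades.get)
    return ("closed", best_rule)

grades = {"Morning -> Jogging": 0.12, "Morning -> Coffee": 0.05}
rng = random.Random(42)
kind, rule = choose_question(grades, epsilon=0.2, rng=rng)
```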

Page 22

Outline

Introduction

Concepts

Crowd Mining Algorithm

The CrowdMiner System

Experiments

Conclusions

Page 23

Architecture

[Architecture diagram with components: Initial Data, Query Selector, Data Aggregator, Best Rules Extractor, Rule Database, Question Display Portal Interface, Portal User Interface; labeled flows: ask question, answer question, user query, results, rule+query, [rule, conf, supp]]

Page 24

Outcome

Rank | Change | Rule                            | Support | Conf. | Error Prob.
-----|--------|---------------------------------|---------|-------|------------
1    | +1     | Morning → Jogging               | 0.087   | 0.61  | 0.521e-11
2    | -1     | Jogging → Energy Drink, Granola | 0.085   | 0.5   | 0.66e-8
3    |        | Morning → Coffee                | 0.067   | 0.52  | 0.54e-7
…    | …      | …                               | …       | …     | …
1752 | -8     | Upset Stomach → Chamomile       | 0.032   | 0.05  | 0.03
1753 |        | Vegetarian, Yoga → Raw Foods    | 0.009   | 0.047 | 0.012
…    | …      | …                               | …       | …     | …

Page 25

Outline

Introduction

Concepts

Crowd Mining Algorithm

The CrowdMiner System

Experiments

Conclusions

Page 26

Datasets

We experimented on several datasets:

Real-world Retail dataset [Brijs et al., 1999] from a shopping basket application; since the data is anonymized, users are assigned transactions in a random fashion

Edits on categories in Simple English Wikipedia: transactions are articles, items are high-level categories (WordNet-level classes of YAGO [Suchanek et al., 2007]) assigned to articles, users are editors of these articles

Synthetic dataset (not discussed here)

Page 27

Experimental setting

Baselines:
Random: at each step, we choose a random rule to ask a user about
Greedy: ask about the known rule with the fewest samples (starting with smaller rules)

Settings:
Zero-knowledge: we start with no information about the world
Known items: the set of items is known, no information about rules
Rule refinement: we already know some rules (not discussed here)

We evaluate in terms of precision, recall, and F-measure of predicted significant rules, as well as absolute number of errors (not discussed here)
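For reference, these measures reduce to the usual definitions over the set of predicted significant rules (the toy rule sets below are hypothetical):

```python
def evaluate(predicted, truth):
    """Precision, recall and F-measure of predicted significant rules."""
    tp = len(predicted & truth)
    precision = tp / len(predicted)
    recall = tp / len(truth)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

predicted = {"r1", "r2", "r3"}        # rules the algorithm reports
truth = {"r1", "r2", "r4", "r5"}      # actually significant rules
p, r, f = evaluate(predicted, truth)  # p = 2/3, r = 1/2
```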

Page 28

F-measure, zero-knowledge

[Two plots of F-measure vs. number of samples (0–2000) for CrowdMiner, Random, and Greedy: Retail dataset (left) and Wikipedia dataset (right)]

Page 29

Precision and recall, zero-knowledge

[Two plots on the Retail dataset: precision (left) and recall (right) vs. number of samples (0–2000) for CrowdMiner, Random, and Greedy]

Better precision: we make sure to reduce the global expected number of errors; Greedy loses precision as new rules are explored

Much better recall: due to adding potentially large rules as candidates once candidate subrules are found (Greedy will only add such rules much later)

Page 30

F-measure, known items

[Plot of F-measure vs. number of samples (0–2000) on the Retail dataset, known-items setting, for CrowdMiner, Random, and Greedy]

Good initial precision of the greedy algorithm: the best thing to do is to start by asking about rules of small size anyway

CrowdMiner overtakes Greedy: larger rules are soon made candidates and their significance assessed

Page 31

Outline

Introduction

Concepts

Crowd Mining Algorithm

The CrowdMiner System

Experiments

Conclusions

Page 32

In brief

How to design an interactive poll? Many situations when one wants to find correlations in non-extensionally accessible data

“Crowd-sourced” Apriori (but with subtleties)

Good behavior in practice

Many other design choices for replacing black boxes, especially in the presence of priors

Connections with active learning [Lindenbaum et al., 2004]

Page 33

Perspectives

What are the best next k questions to ask? Allows parallelization. Also possible to do that by sampling, and not significantly more costly!

Take into account correlations between rules to refine estimates

Which user to ask which question?

Page 34

References I

R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Record, 22(2), 1993.

R. Agrawal, R. Srikant, et al. Fast algorithms for mining association rules. In VLDB, 1994.

Tom Brijs, Gilbert Swinnen, Koen Vanhoof, and Geert Wets. Using association rules for product assortment decisions: A case study. In Knowledge Discovery and Data Mining, 1999.

Alan Genz. Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Statistics and Computing, 14, 2004.

Page 35

References II

M. Lindenbaum, S. Markovitch, and D. Rusakov. Selective sampling for nearest neighbor classifiers. Machine Learning, 54(2), 2004.

Aditya G. Parameswaran and Neoklis Polyzotis. Answering queries using humans, algorithms and databases. In CIDR, 2011.

Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. YAGO: A core of semantic knowledge. Unifying WordNet and Wikipedia. In WWW, 2007.