link prediction with the linkpred tool

Measuring scholarly impact: Methods and practice Link prediction with the linkpred tool Raf Guns University of Antwerp raf.guns @ uantwerpen.be


Post on 06-Aug-2015


TRANSCRIPT

Page 1: Link prediction with the linkpred tool

Measuring scholarly impact: Methods and practice

Link prediction with the linkpred tool

Raf Guns

University of Antwerp

raf.guns@uantwerpen.be

Page 2: Link prediction with the linkpred tool

If you want to follow along…

Download and install Anaconda Python from http://continuum.io/downloads

Download the example data from http://bit.ly/1HpZvIa

Page 3: Link prediction with the linkpred tool

“A pair of scientists who have five mutual previous collaborators, for instance, are about twice as likely to collaborate as a pair with only two, and about 200 times as likely as a pair with none.” (Newman, 2001; emphasis mine)

Page 4: Link prediction with the linkpred tool

Agenda

What is link prediction? (and why?)

Example data

The linkpred tool

Link prediction in practice

Conclusion

Page 5: Link prediction with the linkpred tool

What is link prediction?

Page 6: Link prediction with the linkpred tool

Networks

Page 7: Link prediction with the linkpred tool

Networks in informetrics

Citation: papers, journals, authors, patents, …

Collaboration: authors, institutions, countries, …

Co-citation

Bibliographic coupling

Web links

And so on

Page 8: Link prediction with the linkpred tool

Definitions

A network G = (V, E) consists of:

A set V of nodes or vertices

A set E of links or edges

Each link connects two nodes from V

Neighbourhood N(v) of node v: all nodes connected to v

Node degree |N(v)| of v: the number of nodes connected to v, i.e. the number of elements in the set N(v)
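These definitions can be made concrete with a small sketch, assuming the network is stored as a Python dictionary that maps each node to the set of its neighbours, N(v). The helpers `build_network` and `degree` and the toy edges are hypothetical illustrations, not part of linkpred.

```python
def build_network(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    neighbours = {}
    for u, v in edges:
        # Each undirected link adds v to N(u) and u to N(v)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return neighbours


def degree(neighbours, v):
    """Degree |N(v)|: the number of nodes connected to v."""
    return len(neighbours.get(v, set()))


# Hypothetical toy network, not the example data
G = build_network([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print(sorted(G["c"]))  # N(c)
print(degree(G, "c"))  # |N(c)|
```

The same dictionary-of-sets representation is reused in the other sketches in these notes.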

Page 9: Link prediction with the linkpred tool

Change in networks

Most networks are not static, e.g. in a collaboration network:

New authors appear

Old authors disappear

New collaborations are initiated

Previous collaborators stop collaborating

Page 10: Link prediction with the linkpred tool

Change in networks

Some changes are more plausible than others

Page 11: Link prediction with the linkpred tool

Change in networks

Different mechanisms have been identified

Assortativity: similar nodes are more likely to connect

Preferential attachment: well-connected nodes attract more new connections

Cf. cumulative advantage, Matthew effect
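Preferential attachment is commonly turned into a link prediction score by multiplying the degrees of the two nodes, so that pairs involving well-connected nodes rank highest (this is presumably what the DegreeProduct predictor listed later refers to). The sketch below assumes the dictionary-of-sets representation; the function name and toy network are hypothetical.

```python
from itertools import combinations


def degree_product_scores(neighbours):
    """Score each unlinked pair (u, v) by |N(u)| * |N(v)|."""
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # already linked: nothing to predict
        scores[(u, v)] = len(neighbours[u]) * len(neighbours[v])
    return scores


# Hypothetical network in which "hub" is well connected
G = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}
for pair, score in sorted(degree_product_scores(G).items(), key=lambda x: -x[1]):
    print(pair, score)
```

Here the top-ranked pair is ("d", "hub"): the hub's high degree dominates, regardless of whether d and hub have anything else in common.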

Page 12: Link prediction with the linkpred tool

The link prediction question

Liben-Nowell and Kleinberg (2003, 2007):

“Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future?”

Page 13: Link prediction with the linkpred tool

Link prediction steps

1. Data gathering

2. Preprocessing

3. Prediction

4. Evaluation

Page 14: Link prediction with the linkpred tool

Steps

Page 15: Link prediction with the linkpred tool

Why link prediction?

You want to know which links will appear in the future

Recommendation

Finding missing links

Finding ‘anomalous’ links (correct or incorrect)

Evaluating network formation and evolution models

Page 16: Link prediction with the linkpred tool

Our example data

Page 17: Link prediction with the linkpred tool

Data

Guns and Rousseau (2013)

Collaboration between cities in Africa and South Asia

Topic: malaria

In three consecutive time periods

Available as three Pajek network files: http://bit.ly/1HpZvIa
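linkpred reads Pajek files itself, but it can help to see what such a file contains. The sketch below is a minimal reader for simple Pajek .net files, assuming a `*Vertices` section with lines of the form `number "label"` and an `*Edges` or `*Arcs` section with lines `source target [weight]`. It does not handle other Pajek sections, and the function name and sample contents are hypothetical.

```python
import shlex


def read_pajek(text):
    """Return (labels, weights) from the text of a simple Pajek .net file.

    labels:  {vertex number: label}
    weights: {(label1, label2): weight}, treating every link as undirected
    """
    labels, weights = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section header such as "*Vertices 3", "*Edges" or "*Arcs"
            section = line.split()[0].lower()
            continue
        parts = shlex.split(line)  # keeps quoted labels with spaces together
        if section == "*vertices":
            labels[parts[0]] = parts[1] if len(parts) > 1 else parts[0]
        elif section in ("*edges", "*arcs"):
            u = labels.get(parts[0], parts[0])
            v = labels.get(parts[1], parts[1])
            w = float(parts[2]) if len(parts) > 2 else 1.0  # missing weight = 1
            key = tuple(sorted((u, v)))
            weights[key] = weights.get(key, 0.0) + w  # merges u->v and v->u
    return labels, weights


# Hypothetical file contents, not the actual example data
sample = """*Vertices 3
1 "City A"
2 "City B"
3 "City C"
*Edges
1 2 2
2 3
"""
labels, weights = read_pajek(sample)
print(weights)
```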

Page 18: Link prediction with the linkpred tool

1997-2001

Page 19: Link prediction with the linkpred tool

2002-2006

Page 20: Link prediction with the linkpred tool

2007-2011

Page 21: Link prediction with the linkpred tool

The linkpred tool

Page 22: Link prediction with the linkpred tool

About

https://github.com/rafguns/linkpred

Cross-platform (written in Python)

Open source: BSD license

Command-line tool!

Alternative: LPmade (https://github.com/rlichtenwalter/LPmade)

Page 23: Link prediction with the linkpred tool

How and where to get linkpred

1. Install Anaconda Python: http://continuum.io/downloads

2. Open a command-line window

3. Run command:

> pip install https://github.com/rafguns/linkpred/archive/stable.zip

4. Wait until installation is finished

Page 24: Link prediction with the linkpred tool

Basic usage

> linkpred

Should display brief usage instructions

> linkpred --help

Displays more complete help output

Page 25: Link prediction with the linkpred tool

Basic usage

> linkpred training-network-file --predictors predictor --output output-type

Read the network in training-network-file, predict using predictor, and produce output of type output-type

> linkpred training-network-file test-network-file --predictors predictor --output output-type

Read the network in training-network-file, predict using predictor, evaluate against test-network-file, and produce output of type output-type

Page 26: Link prediction with the linkpred tool

Link prediction in practice

Page 27: Link prediction with the linkpred tool

Preprocessing

Nodes may also appear and disappear: restrict to the intersection of the node sets of the training and test networks (only where a test network is available)

Restrict by degree (default: only discard isolated nodes)

Directed networks are not supported: convert to undirected first
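The three preprocessing steps can be sketched as follows, assuming the training and test networks are given as lists of (u, v) pairs. This is not linkpred's implementation: the `preprocess` helper and its `min_degree` parameter are hypothetical, and degrees are computed once rather than iteratively.

```python
def preprocess(train_edges, test_edges, min_degree=1):
    """Hypothetical preprocessing helper (not linkpred's implementation).

    1. Convert (possibly directed) edge lists to undirected pairs.
    2. Keep only nodes present in both the training and the test network.
    3. Drop nodes with fewer than `min_degree` training links among kept nodes.
    """
    def undirected(edges):
        # (u, v) and (v, u) collapse into one sorted pair; self-loops are dropped
        return {tuple(sorted(pair)) for pair in edges if pair[0] != pair[1]}

    def node_set(edges):
        return {node for pair in edges for node in pair}

    def restrict(edges, keep):
        return {(u, v) for u, v in edges if u in keep and v in keep}

    train, test = undirected(train_edges), undirected(test_edges)
    keep = node_set(train) & node_set(test)  # nodes present in both periods

    degree = dict.fromkeys(keep, 0)
    for u, v in restrict(train, keep):
        degree[u] += 1
        degree[v] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}

    return restrict(train, keep), restrict(test, keep)


train = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "x")]
test = [("a", "c"), ("b", "c"), ("c", "y")]
print(preprocess(train, test))
```

Nodes x and y are dropped because each occurs in only one of the two networks; with the default `min_degree=1`, only nodes left without training links would additionally be discarded.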

Page 28: Link prediction with the linkpred tool

Prediction: choosing predictors

Local: AdamicAdar, AssociationStrength, CommonNeighbours, Cosine, DegreeProduct, Jaccard, MaxOverlap, MinOverlap, NMeasure, Pearson, ResourceAllocation

Global: GraphDistance, Katz, RootedPageRank, SimRank

Other: Community, Copy, Random

Page 29: Link prediction with the linkpred tool

Local predictors

Tendency towards triadic closure

Number of common neighbours is a simple but powerful predictor.
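A minimal sketch of the common neighbours predictor, assuming the network is a dictionary mapping each node to its set of neighbours: every pair that is not yet linked is scored by the size of the intersection of the two neighbourhoods. The function name and toy network are illustrative only.

```python
from itertools import combinations


def common_neighbours(neighbours):
    """Score each unlinked pair (u, v) by |N(u) ∩ N(v)|."""
    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # existing links are not predicted again
        shared = len(neighbours[u] & neighbours[v])
        if shared:
            scores[(u, v)] = shared
    return scores


# Hypothetical network: a and d both collaborate with b and with c
G = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
ranking = sorted(common_neighbours(G).items(), key=lambda item: -item[1])
print(ranking)
```

The only unlinked pair, (a, d), gets score 2: closing the triangles a–b–d and a–c–d is exactly the triadic closure tendency.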

Page 30: Link prediction with the linkpred tool

Local predictors

Common neighbours

Normalizations of common neighbours: Jaccard coefficient, cosine measure, …

Adamic/Adar (Adamic & Adar, 2003)
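The following sketch computes three of these scores for a single pair using their usual set-based definitions: Jaccard divides the common neighbours by the union of the neighbourhoods, cosine divides by the geometric mean of the degrees, and Adamic/Adar gives rare (low-degree) common neighbours more weight. linkpred's exact normalizations may differ; the function names and toy network are assumptions.

```python
import math


def jaccard(nbrs, u, v):
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)|"""
    union = nbrs[u] | nbrs[v]
    return len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0


def cosine(nbrs, u, v):
    """|N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|)"""
    denominator = math.sqrt(len(nbrs[u]) * len(nbrs[v]))
    return len(nbrs[u] & nbrs[v]) / denominator if denominator else 0.0


def adamic_adar(nbrs, u, v):
    """Sum of 1 / log|N(z)| over common neighbours z of two distinct nodes."""
    # A neighbour shared by two distinct nodes has degree >= 2, so log(...) > 0
    return sum(1 / math.log(len(nbrs[z])) for z in nbrs[u] & nbrs[v])


# Hypothetical network
G = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
print(jaccard(G, "a", "b"), cosine(G, "a", "b"), adamic_adar(G, "a", "b"))
```

For (a, b) the two common neighbours are c (degree 2) and d (degree 3), so c contributes more to the Adamic/Adar score than d does.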

Page 31: Link prediction with the linkpred tool

Weighted networks

In weighted networks, links have weights (e.g. number of joint papers, number of citations…)

Link weights are often ignored!

Most predictors in linkpred can use link weights

General idea: a higher link weight (e.g., more joint papers) means a stronger connection
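To illustrate the general idea, here is one simple way to add weights to common neighbours: each shared neighbour z contributes the average of the two link weights w(u, z) and w(v, z) instead of 1. This particular weighting is an illustrative assumption, not necessarily the one linkpred uses; the function name and weights are hypothetical.

```python
def weighted_common_neighbours(weights, u, v):
    """Sum over shared neighbours z of (w(u, z) + w(v, z)) / 2.

    weights: {(node1, node2): weight} for undirected links.
    """
    nbrs = {}
    for (a, b), w in weights.items():
        nbrs.setdefault(a, {})[b] = w
        nbrs.setdefault(b, {})[a] = w
    shared = nbrs.get(u, {}).keys() & nbrs.get(v, {}).keys()
    return sum((nbrs[u][z] + nbrs[v][z]) / 2 for z in shared)


# Hypothetical weights, e.g. numbers of joint papers
weights = {("x", "z1"): 5, ("y", "z1"): 3, ("x", "z2"): 1, ("y", "z2"): 1}
print(weighted_common_neighbours(weights, "x", "y"))
```

Unweighted, x and y would score 2 (two common neighbours); weighted, the intensive collaborations through z1 raise the score to 5.0.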

Page 32: Link prediction with the linkpred tool

Global predictors

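Graph distance: the lowest number of links needed to travel from a to b

Problem: the small-world phenomenon (most pairs of nodes are only a few links apart, so many candidate pairs receive the same distance)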
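Graph distance can be computed with a breadth-first search. In the sketch below, unlinked nodes are scored by 1 / distance so that closer nodes rank higher; this transformation, the function names and the toy network are assumptions, and linkpred may convert distances into scores differently.

```python
from collections import deque


def distances_from(nbrs, source):
    """Breadth-first search: fewest links from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in nbrs[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def distance_scores(nbrs, source):
    """Score unlinked, reachable nodes by 1 / distance (closer = higher)."""
    return {t: 1 / d for t, d in distances_from(nbrs, source).items() if d >= 2}


G = {"a": {"b"}, "b": {"a", "c", "e"}, "c": {"b", "d"}, "d": {"c"}, "e": {"b"}}
print(distance_scores(G, "a"))
```

Even in this tiny network, c and e tie at distance 2; in a small world, such ties affect a large share of all pairs.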

Page 33: Link prediction with the linkpred tool

Global predictors

Katz (1953):

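$\mathrm{score}(i, j) = \sum_{k=1}^{\infty} \beta^{k} \, (A^{k})_{ij}$

$a_{ij}$: 1 if i and j are linked, 0 otherwise (the entries of the adjacency matrix $A$)

$(A^{k})_{ij}$: number of walks with length k from i to j

$\beta$: parameter, “probability of effectiveness of a single link”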

Longer walks: lower effectiveness
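The Katz score adds, for every walk length k, β^k times the number of walks of length k between i and j. The sketch below truncates this infinite sum at a fixed maximum length, using plain Python lists for the adjacency matrix. The function name, the cut-off `max_length` and the default β = 0.1 are illustrative assumptions, not linkpred's settings; note that the full sum only converges when β is smaller than 1 divided by the largest eigenvalue of the adjacency matrix.

```python
def katz_scores(nodes, edges, beta=0.1, max_length=4):
    """Truncated Katz sum: sum over k = 1..max_length of beta**k * (A**k)[i][j]."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[index[u]][index[v]] = A[index[v]][index[u]] = 1  # undirected links

    def matmul(X, Y):
        return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)]

    scores = [[0.0] * n for _ in range(n)]
    power = A  # A**1: walks of length 1
    for k in range(1, max_length + 1):
        for i in range(n):
            for j in range(n):
                scores[i][j] += beta ** k * power[i][j]
        power = matmul(power, A)  # walks of length k + 1
    return {(u, v): scores[index[u]][index[v]]
            for u in nodes for v in nodes if u != v}


# Path a - b - c: a and c are connected only through a walk of length 2
s = katz_scores(["a", "b", "c"], [("a", "b"), ("b", "c")], beta=0.5, max_length=2)
print(s[("a", "b")], s[("a", "c")])
```

Longer walks are discounted by higher powers of β, which matches "longer walks: lower effectiveness".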

Page 34: Link prediction with the linkpred tool

Global predictors

Rooted PageRank

Page 35: Link prediction with the linkpred tool

Global predictors

Rooted PageRank
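Rooted PageRank is usually defined, following Liben-Nowell and Kleinberg, as follows: the score of (a, b) is the stationary probability of being at b in a random walk that starts at a, moves to a random neighbour at each step, and jumps back to a with a fixed probability. A power-iteration sketch follows; the parameter name `restart`, its default of 0.15 and the fixed number of iterations are assumptions, and linkpred's parameters may be named and set differently.

```python
def rooted_pagerank(nbrs, root, restart=0.15, iterations=100):
    """Approximate stationary probabilities of a random walk that jumps back
    to `root` with probability `restart` at every step."""
    prob = {node: 0.0 for node in nbrs}
    prob[root] = 1.0
    for _ in range(iterations):
        new = {node: 0.0 for node in nbrs}
        new[root] = restart  # probability mass that jumps back to the root
        for node, p in prob.items():
            if not nbrs[node]:
                new[root] += (1 - restart) * p  # dead end: return to the root
                continue
            share = (1 - restart) * p / len(nbrs[node])
            for nb in nbrs[node]:
                new[nb] += share  # walk to a random neighbour
        prob = new
    return prob


G = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(rooted_pagerank(G, "a"))
```

Nodes that are easy to reach from the root, by many or short routes, end up with a higher probability and therefore a higher score.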

Page 36: Link prediction with the linkpred tool

Global predictors

SimRank (Jeh & Widom, 2002)

“Objects that link to similar objects are similar themselves.”

Starting point: a node is maximally similar to itself: W(v, v) = 1
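In Jeh and Widom's formulation, the similarity of two distinct nodes a and b is c / (|N(a)| |N(b)|) times the sum of the similarities of all pairs formed by a neighbour of a and a neighbour of b, where c (between 0 and 1) is a decay factor; it is computed by iterating from the starting point above. The naive sketch below only suits small networks, since each iteration touches every pair of nodes and every pair of their neighbours. The function name and the fixed number of iterations are assumptions.

```python
def simrank(nbrs, c=0.8, iterations=10):
    """Naive iterative SimRank on an undirected network."""
    nodes = list(nbrs)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0  # a node is maximally similar to itself
                elif nbrs[a] and nbrs[b]:
                    total = sum(sim[(x, y)] for x in nbrs[a] for y in nbrs[b])
                    new[(a, b)] = c * total / (len(nbrs[a]) * len(nbrs[b]))
                else:
                    new[(a, b)] = 0.0  # isolated nodes get no similarity
        sim = new
    return sim


# a and b are both linked only to h, so they are similar to each other
G = {"a": {"h"}, "b": {"h"}, "h": {"a", "b"}}
print(simrank(G, c=0.8)[("a", "b")])
```

Here a and b "link to similar objects" (the very same node h), so their similarity equals c.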

Page 37: Link prediction with the linkpred tool

Demo

Predict

Save predictions to a file and import them into, e.g., Excel
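linkpred can save its predictions itself (the cache-predictions output). If you compute scores in your own Python code instead, a CSV file is an easy way to get them into a spreadsheet. In the sketch below, the column names and the `write_predictions` helper are assumptions; this is not linkpred's file format.

```python
import csv
import io


def write_predictions(scores, handle):
    """Write (node1, node2, score) rows, best-scoring pairs first."""
    writer = csv.writer(handle)
    writer.writerow(["node1", "node2", "score"])
    for (u, v), score in sorted(scores.items(), key=lambda item: -item[1]):
        writer.writerow([u, v, score])


# Hypothetical scores; in practice they come from a predictor
scores = {("a", "d"): 2, ("b", "e"): 1}
buffer = io.StringIO()  # replace with open("predictions.csv", "w", newline="")
write_predictions(scores, buffer)
print(buffer.getvalue())
```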

Page 38: Link prediction with the linkpred tool

Evaluation

Step 4: ‘How well does it work?’

How? Compare to a ‘known good’ test network

Four groups:

Prediction \ Test  | Link           | Non-link
Predicted          | True positive  | False positive
Not predicted      | False negative | True negative
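The four groups can be counted with a short sketch, assuming every unordered pair of nodes is a candidate and that pairs are stored as frozensets so that {u, v} and {v, u} are the same. In practice, pairs already linked in the training network are often left out; this sketch does not do that. The function name and data are hypothetical.

```python
from itertools import combinations


def confusion_counts(nodes, predicted, test_links):
    """Count (TP, FP, FN, TN) over all unordered pairs of `nodes`."""
    tp = fp = fn = tn = 0
    for u, v in combinations(nodes, 2):
        pair = frozenset((u, v))
        if pair in predicted:
            if pair in test_links:
                tp += 1  # predicted and present in the test network
            else:
                fp += 1  # predicted but absent
        elif pair in test_links:
            fn += 1  # present but not predicted
        else:
            tn += 1  # neither predicted nor present
    return tp, fp, fn, tn


nodes = ["a", "b", "c", "d"]
predicted = {frozenset(("a", "b")), frozenset(("c", "d"))}
test_links = {frozenset(("a", "b")), frozenset(("a", "c"))}
print(confusion_counts(nodes, predicted, test_links))
```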

Page 39: Link prediction with the linkpred tool

Evaluation

Simply save results to a text file: --output cache-evaluations

Create a chart: recall-precision or ROC

Page 40: Link prediction with the linkpred tool

Evaluation: recall-precision

Precision: fraction of predicted links that are correct (i.e., present in the test network)

Recall: fraction of the links in the test network that are correctly predicted
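A recall-precision curve is obtained by walking down the ranked list of predictions and computing both measures after each additional prediction. The sketch assumes the ranking and the test links are given as frozensets of node pairs; the function name and data are hypothetical.

```python
def recall_precision_curve(ranked_pairs, test_links):
    """Return (recall, precision) after each prediction in a ranked list."""
    points, hits = [], 0
    for k, pair in enumerate(ranked_pairs, start=1):
        if pair in test_links:
            hits += 1
        # recall: share of test links found; precision: share of top-k that are correct
        points.append((hits / len(test_links), hits / k))
    return points


ranked = [frozenset(("a", "b")), frozenset(("c", "d")), frozenset(("a", "c"))]
test_links = {frozenset(("a", "b")), frozenset(("a", "c"))}
print(recall_precision_curve(ranked, test_links))
```

Recall can only grow as more predictions are considered, while precision typically drops: this is the trade-off the chart visualizes.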

Page 41: Link prediction with the linkpred tool

Evaluation: ROC

False positive rate: fraction of non-links that are (incorrectly) predicted as links

True positive rate: fraction of actual links that are correctly predicted (= recall)
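ROC points can be computed in the same way, assuming the total number of candidate pairs is known so that the number of non-links follows from it. The function name, its `n_candidates` parameter and the data are hypothetical.

```python
def roc_points(ranked_pairs, test_links, n_candidates):
    """Return (false positive rate, true positive rate) after each prediction."""
    positives = len(test_links)
    negatives = n_candidates - positives  # candidate pairs that are non-links
    tp = fp = 0
    points = [(0.0, 0.0)]
    for pair in ranked_pairs:
        if pair in test_links:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


# All 6 pairs of 4 nodes, in ranked order
ranked = [frozenset(p) for p in [("a", "b"), ("c", "d"), ("a", "c"),
                                 ("a", "d"), ("b", "c"), ("b", "d")]]
test_links = {frozenset(("a", "b")), frozenset(("a", "c"))}
print(roc_points(ranked, test_links, n_candidates=6))
```

When the whole candidate list is ranked, the curve ends at (1.0, 1.0); a random ranking would stay close to the diagonal.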

Page 42: Link prediction with the linkpred tool

Profiles

A simple way to save and reuse the configuration of a complex prediction run (options, predictors, parameters…)

Usage example:

> linkpred network-file --profile profile.yml

Format: YAML, see https://en.wikipedia.org/wiki/YAML

Page 43: Link prediction with the linkpred tool

Example profile

predictors:
  - name: AdamicAdar
    displayname: Adamic/Adar
  - name: GraphDistance
    displayname: Graph distance
    parameters:
      weight: weight
  - name: SimRank
    displayname: SimRank (c=0.4)
    parameters:
      c: 0.4
  - name: SimRank
    displayname: SimRank (c=0.8)
    parameters:
      c: 0.8
output:
  - cache-predictions
  - recall-precision

Page 44: Link prediction with the linkpred tool

Conclusion

Page 45: Link prediction with the linkpred tool

About link prediction

Link prediction is possible because link formation is not a purely random process

Limitations:

Unaware of social and other circumstantial factors

Which predictor is ‘best’ for a concrete situation?

Trade-off between prediction accuracy and non-triviality

Page 46: Link prediction with the linkpred tool

About linkpred

Relatively simple but powerful

Limitations:

Not suitable for very large and/or dense networks

Does not incorporate more complex setups like predictor combinations, machine learning, etc.

All results can be exported for analysis in other software (cache-*)

Open source: contributions welcome!