Transcript
Page 1: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 1

The Logarithmic Dimension Hypothesis

Anthony BonatoRyerson University

MITACS International Problem Solving Workshop

July 2012

Page 2: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 2

Workshop team• David Gleich (Purdue)

• Dieter Mitsche (Ryerson)

• Stephen Young (UCSD)

• Myunghwan Kim (Stanford)

• Amanda Tian (York)

Page 3: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 3

Friendship networks• network of friends (some real, some virtual) form

a large web of interconnected links

Page 4: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 4

6 degrees of separation

• (Stanley Milgram, 67): famous chain letter experiment

Page 5: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 5

6 Degrees in Facebook?• 900 million users, > 70

billion friendship links• (Backstrom et al., 2012)

– 4 degrees of separation in Facebook

– when considering another person in the world, a friend of your friend knows a friend of their friend, on average

• similar results for Twitter and other OSNs

Page 6: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 6

Complex Networks• web graph, social networks, biological networks, internet

networks, …

Page 7: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 7

The web graph

• nodes: web pages

• edges: links

• over 1 trillion nodes, with billions of nodes added each day

Page 8: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 8

On-line Social Networks (OSNs)Facebook, Twitter, LinkedIn, Google+…

Page 9: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 9

Key parameters• degree distribution:

• average distance:

• clustering coefficient:

|})deg(:)({| , iuGVuN ni

)(,

1

2),()(

GVvu

nvudGL

)(

1-1

)()( ,2

)deg(|))((| )(

GVxxcnGC

xxNExc

Page 10: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 10

Properties of Complex Networks• power law degree distribution

(Broder et al, 01)

2 some ,, bniN bni

Page 11: The Logarithmic Dimension Hypothesis

Power laws in OSNs (Mislove et al,07):

Log Dimension Hypothesis 11

Page 12: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 12

Small World Property• small world networks

(Watts & Strogatz,98)– low distances

• diam(G) = O(log n)• L(G) = O(loglog n)

– higher clustering coefficient than random graph with same expected degree

Page 13: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 13

Sample data: Flickr, YouTube, LiveJournal, Orkut

• (Mislove et al,07): short average distances and high clustering coefficients

Page 14: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 14

• (Zachary, 72)

• (Mason et al, 09)

• (Fortunato, 10)

• (Li, Peng, 11): small community property

Community structure

Page 15: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 15

(Leskovec, Kleinberg, Faloutsos,05):• densification power law: average degree is

increasing with time• decreasing distances

• (Kumar et al, 06): observed in Flickr, Yahoo! 360

Page 16: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 16

Geometry of OSNs?

• OSNs live in social space: proximity of nodes depends on common attributes (such as geography, gender, age, etc.)

• IDEA: embed OSN in 2-, 3- or higher dimensional space

Page 17: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 17

Dimension of an OSN• dimension of OSN: minimum number of

attributes needed to classify nodes

• like game of “20 Questions”: each question narrows range of possibilities

• what is a credible mathematical formula for the dimension of an OSN?

Page 18: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 18

Geometric model for OSNs• we consider a geometric

model of OSNs, where– nodes are in m-

dimensional Euclidean space

– threshold value variable: a function of ranking of nodes

Page 19: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 19

Geometric Protean (GEO-P) Model(Bonato, Janssen, Prałat, 12)

• parameters: α, β in (0,1), α+β < 1; positive integer m• nodes live in m-dimensional hypercube (torus metric)• each node is ranked 1,2, …, n by some function r

– 1 is best, n is worst – we use random initial ranking

• at each time-step, one new node v is born, one randomly node chosen dies (and ranking is updated)

• each existing node u has a region of influence with volume

• add edge uv if v is in the region of influence of u nr

Page 20: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 20

Notes on GEO-P model

• models uses both geometry and ranking• number of nodes is static: fixed at n

– order of OSNs at most number of people (roughly…)

• top ranked nodes have larger regions of influence

Page 21: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 21

Simulation with 5000 nodes

Page 22: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 22

Simulation with 5000 nodes

random geometric GEO-P

Page 23: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 23

Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012)

• a.a.s. the GEO-P model generates graphs with the following properties:– power law degree distribution with exponent

b = 1+1/α– average degree d = (1+o(1))n(1-α-β)/21-α

• densification– diameter D = O(nβ/(1-α)m log2α/(1-α)m n)

• small world: constant order if m = Clog n– clustering coefficient larger than in comparable

random graph

Page 24: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 24

Spectral properties• the spectral gap λ of G is defined by the

difference between the two largest eigenvalues of the adjacency matrix of G

• for G(n,p) random graphs, λ tends to 0 as order grows

• in the GEO-P model, λ is close to 1• (Estrada, 06): bad spectral expansion in real

OSN data

Page 25: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 25

Dimension of OSNs

• given the order of the network n, power law exponent b, average degree d, and diameter D, we can calculate m

• gives formula for dimension of OSN:

Dn

nd

bb

Dnm

loglog

loglog

211

loglog

Page 26: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 26

6 Dimensions of Separation

OSN DimensionFacebook 7YouTube 6Twitter 4Flickr 4

Cyworld 7

Page 27: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 27

Uncovering the hidden reality• reverse engineering approach

– given network data (n, b, d, D), dimension of an OSN gives smallest number of attributes needed to identify users

• that is, given the graph structure, we can (theoretically) recover the social space

Page 28: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 28

Logarithmic Dimension Hypothesis

• Logarithmic Dimension Hypothesis (LDH): the dimension of an OSN is best fit by about log n, where n is the number of users OSN– theoretical evidence GEO-P and MAG

(Leskovec, Kim,12) models–empirical evidence? – (Sweeney, 2001)

Page 29: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 29

Experimental design• supervised machine learning

– Alternating Decision Trees (ADT)– approach of (Janssen et al, 12+) based on earlier

work on PIN by (Middendorf et al, 05)• classify OSN data vs simulated graphs from GEO-P

model in various dimensions• develop a feature vector (graphlets, degree distribution

percentiles, average distance, etc) to classify the correct dimension

• ADT will classify which dimension best fits the data – cross-validation and robustness testing

Page 30: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 30

Example

Page 31: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 31

• preprints, reprints, contact:search: “Anthony Bonato”

Page 32: The Logarithmic Dimension Hypothesis

Log Dimension Hypothesis 32


Top Related