how does the web really grow? · web’s link structure •the web is a small-world: small diameter...

19
How does the Web really grow? Filippo Menczer School of Informatics Department of Computer Science Indiana University http://informatics. indiana .edu/ fil / Research supported by NSF CAREER Award IIS-0133124

Upload: others

Post on 04-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

How does the Webreally grow?

Filippo MenczerSchool of Informatics

Department of Computer Science

Indiana University

http://informatics.indiana.edu/fil/

Research supported by NSF CAREER Award IIS-0133124

Page 2: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Conclusion

To explain the emergenttopology of the Web,

we model the cognitiveprocesses behind Web

dynamics – content matters!

Page 3: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Characterizing theWeb’s link structure

• The Web is a small-world:small diameter(Barabasi, Albert & al @ ND,Huberman,Adamic & al @ PARC/HP)

• The Web is a bow tie(Broder & al @ AltaVista/Compaq/IBM)

• The Web is scale-free: power laws(all of the above)

• Subsets of the Web are not scale free(Pennock, Lawrence, Giles & al @ NECI)

log degree

log N

Page 4: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Web growth models

Pr(i) µ k(i)• Preferential attachment “BA”

–At each step t add page pt

–Create m new links from pt to pi<t

(Barabasi & Albert 1999, de Solla Price 1976)

Pr(i) µh(i)k(i)• Modified BA / fitness / hidden variable(Bianconi & Barabasi 2001, Adamic & Huberman 2000…)

Pr(i) µy ⋅ k(i) + (1-y) ⋅ c• Mixture

(Pennock & al. 2002,Cooper & Frieze 2001, Dorogovtsev & al 2000)

Pr(i) µy ⋅ Pr( j Æ i) + (1-y) ⋅ c• Web copying (Kleinberg, Kumar & al 1999, 2000)

i = argmin(frit + gi)• Mixture with metric distance

(Fabrikant, Koutsoupias & Papadimitriou 2002, Aldous 2003)

classes

Page 5: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Mapping the relationshipbetween topologies

• Given any pair of pages, need ‘similarity’ or‘proximity’ metric for each topology:– Content: textual/lexical (cosine) similarity– Link: co-citation/bibliographic coupling

• Data: Open Directory Project (dmoz.org)– RDF Snapshot: 2002-02-14 04:01:50 GMT– After cleanup: 896,233 URLs in 97,614 topics– After sampling: 150,000 URLs in 47,174 topics

• 10,000 from each of 15 top-level branches

Page 6: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Content similarity†

s c p1, p2( ) =p1 ⋅ p2

p1 ⋅ p2

term i

term j

term k

p1p2

p1 p2

s l (p1, p2) =Up1

«Up2

Up1»Up2

Link similarity

Page 7: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Individual metric distributions

Page 8: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Correlationbetweencontentand link

similarity

0 0.1 0.2 0.3 0.4 0.5

Arts

Business

Shopping

Society

Recreation

Computers

All pairs

Health

Science

Games

Kids and Teens

Sports

Adult

Reference

News

Home

r

3.84 x 109

pairs

Page 9: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Joint distribution map

sc

sl

100

>104

Page 10: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Link probabilityvs lexical distance

r =1 s c -1

Pr(l | r) =(p,q) : r = r Ÿs l > l

(p,q) : r = r

Phasetransition

Power law tail

Pr(l | r) ~ r-a(l)

r*

Proc. Natl.Acad. Sci. USA99(22): 14014-14019, 2002

Page 11: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Local content-based growth model

g=4.3±0.1

g=4.26±0.07

• Similar to preferentialattachment (BA)

• At each step t add page pt

• Create m new links frompt to existing pages

• Use degree (k) info onlyfor nearby pages(popularity/importance of

similar/related pages)

Pr(pt Æ pi< t ) =k(i)mt

Pr(pt Æ pi< t ) =k(i)mt

if r(pi, pt ) < r*

c[r(pi, pt )]-a otherwise

Ï Ì Ô

Ó Ô

Page 12: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

So, many models get thedegree distribution right

• Which is "right"?• Need an independent observation (other

than degree) to validate models• Distribution of content similarity across

linked pairs– Across all pairs:

Page 13: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

None of these models is right!

Page 14: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Back to the mixture model

• Extra free parameter– Can tune popularity with "random"

– Bias choice by content similarity insteadof uniform distribution

Pr(i) µy ⋅ k(i) + (1-y) ⋅ c• Pennock & al. 2002• Cooper & Frieze 2001• Dorogovtsev & al 2000

Page 15: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Mixture model class

• Degree-uniform mixture:

• Degree-similarity mixture:

Page 16: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Both mixture models get thedegree distribution right…

Page 17: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

…but the degree-similarity mixturemodel predicts the similarity

distribution much better

Proc. Natl.Acad. Sci. USA,

forthcoming

Page 18: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Conclusion

• To explain the emergent topology of theWeb, we model the cognitive processesbehind Web dynamics – content matters!

• What about paper citation networks?

Page 19: How does the Web really grow? · Web’s link structure •The Web is a small-world: small diameter (Barabasi, Albert & al @ ND, Huberman,Adamic & al @ PARC/HP) •The Web is a bow

Questions?

http://informatics.indiana.edu/fil/

Research supported by NSF CAREER Award IIS-0133124