cs224w: social and information network analysis jure...
TRANSCRIPT
![Page 1: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/1.jpg)
CS224w: Social and Information Network Analysis Jure Leskovec, Stanford University
http://cs224w.stanford.edu
![Page 2: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/2.jpg)
What do the following things
have in common?
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 2
![Page 3: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/3.jpg)
World economy 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 3
![Page 4: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/4.jpg)
Human cell 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 4
![Page 5: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/5.jpg)
Roads 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 5
![Page 6: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/6.jpg)
Brain 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 6
![Page 7: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/7.jpg)
domain2
domain1
domain3
router
Internet 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 7
![Page 8: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/8.jpg)
Friends & Family 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 8
![Page 9: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/9.jpg)
Media & Information 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 9
![Page 10: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/10.jpg)
Society 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 10
![Page 11: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/11.jpg)
The Network! 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 12
![Page 12: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/12.jpg)
Behind each such system there is an intricate wiring diagram, a network, that
defines the interactions between the components
We will never understand these systems unless we understand the
networks behind it
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 13
![Page 13: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/13.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 14
Facebook social graph 4-degrees of separation [Backstrom-Boldi-Rosa-Ugander-Vigna, 2011]
![Page 14: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/14.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 15
Graph of the Internet (Autonomous Systems) Power-law degrees [Faloutsos-Faloutsos-Faloutsos, 1999]
Robustness [Doyle-Willinger, 2005]
![Page 15: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/15.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 16
Connections between political blogs Polarization of the network [Adamic-Glance, 2005]
![Page 16: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/16.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 17
Seven Bridges of Königsberg [Euler, 1735]
Return to the starting point by traveling each link of the graph once and only once.
![Page 17: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/17.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 18
Citation networks and Maps of science [Börner et al., 2012]
![Page 18: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/18.jpg)
19
Understand how humans navigate Wikipedia
Get an idea of how people connect concepts
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis
[West-Leskovec, 2012]
![Page 19: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/19.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 20
9/11 terrorist network [Krebs, 2002]
![Page 20: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/20.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 21
Nodes:
Links:
Companies
Investment
Pharma
Research Labs
Public
Biotechnology
Collaborations
Financial
R&D
Bio-tech companies [Powell-White-Koput, 2002]
![Page 21: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/21.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 22
Human brain has between 10-100 billion neurons
[Sporns, 2011]
![Page 22: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/22.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 23
Metabolic networks: Nodes: Metabolites and enzymes
Edges: Chemical reactions
Protein-Protein Interaction Networks: Nodes: Proteins
Edges: ‘physical’ interactions
![Page 23: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/23.jpg)
How do we reason about networks? Empirical: Study network data to find organizational
principles
Mathematical models: Probabilistic, graph theory
Algorithms for analyzing graphs What do we hope to achieve from studying
networks? Patterns and statistical properties of network data
Design principles and models
Understand why networks are organized the way they are (Predict behavior of networked systems)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 24
![Page 24: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/24.jpg)
What do we study in networks? Structure and evolution:
What is the structure of a network?
Why and how did it became to have such structure?
Processes and dynamics:
Networks provide “skeleton” for spreading of information, behavior, diseases
How do information and diseases spread?
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 25
![Page 25: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/25.jpg)
Why is the role of networks expanding? Data availability
Rise of Mobile, Web 2.0 and Social media
Universality
Networks from science, nature, and technology are more similar than one would expect
Shared vocabulary between fields
Computer Science, Social science, Physics, Economics, Statistics, Biology
Impact!
Social networking, Social media, Drug design 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 26
![Page 26: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/26.jpg)
Age and size of networks
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 27
CS!!
![Page 27: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/27.jpg)
Network data: Orders of magnitude
436-node network of email exchange at a corporate research lab [Adamic-Adar, SocNets ‘03]
43,553-node network of email exchange at an university [Kossinets-Watts, Science ‘06]
4.4-million-node network of declared friendships on a blogging community [Liben-Nowell et al., PNAS ‘05]
240-million-node network of communication on Microsoft Messenger [Leskovec-Horvitz, WWW ’08]
800-million-node Facebook network [Backstrom et al. ‘11]
28 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis
![Page 28: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/28.jpg)
Large on-line applications with
hundreds of millions of users
The Web is our “laboratory” for
understanding the pulse of humanity.
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 29
![Page 29: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/29.jpg)
Google Market cap: $250 billion
Cisco Market cap: $100 billion
FacebookMarket cap: $50 billion
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 30
![Page 30: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/30.jpg)
If you were to understand the spread of diseases, can you do it without social networks?
If you were to understand the WWW structure and information, hopeless without invoking the Web’s topology.
If you want to understand dissemination of news or evolution of science, it is hopeless without considering the information networks
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 33
![Page 31: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/31.jpg)
![Page 32: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/32.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 35
Bob West (head TA)
Jacob Bank
Ashton Anderson
Yu (Wayne) Wu Anshul Mittal
See course website for office hour schedule!
![Page 33: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/33.jpg)
http://cs224w.stanford.edu
Slides posted at least 30 min before the class
Readings:
Mostly chapters from Easley&Kleinberg
Papers
Optional readings:
Papers and pointers to additional literature
This will be very useful for project proposals
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 36
![Page 34: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/34.jpg)
Piazza Q&A website:
http://piazza.com/stanford/fall2012/cs224w
If you don’t have @stanford.edu email address, send us your email and we will manually register you to Piazza
For e-mailing course staff, always use:
We will post course announcements to Piazza (make sure you check it regularly)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 37
![Page 35: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/35.jpg)
Final grade will be composed of:
Homeworks: 50%
Homework 0: 2%
Homeworks 1,2,3,4: 12% each
Substantial class project: 50%
Proposal: 20%
Project milestone: 15%
Final report: 50%
Poster presentation: 15%
Extra credit for class/Piazza participation
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 38
![Page 36: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/36.jpg)
Week Assignment Due on THU
2 Homework 0 October 4
3 Homework 1 October 11
4 Project proposal October 18
5 Homework 2 October 25
Work on the project
7 Homework 3 November 8
8 Project milestone November 15
Thanksgiving break
9 Homework 4 November 29
Poster session December 10 12:15-3;15pm
Final report December 11 (no late days!)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 39
![Page 37: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/37.jpg)
Assignments take time. Start early! How to submit?
Paper (Print code!): In class and in cabinet in Gates
SCPD: Submit via SCPD
In addition, write-ups (proposal, milestone, final report) have to also be submitted electronically
Email PDF to [email protected]
2 late days for the quarter:
1 late day expires at the start of next class
Max 1 late day per assignment
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 40
![Page 38: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/38.jpg)
Substantial course project:
Experimental evaluation of algorithms and models on an interesting network dataset
A theoretical project that considers a model, an algorithm and derives a rigorous result about it
Develop scalable algorithms for massive graphs
Can become part of SNAP library
Performed in groups of 3 students Project is the main work for the class
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 41
![Page 39: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/39.jpg)
Good background in:
Algorithms
Graph theory
Probability and Statistics
Linear algebra
Programming:
You should be able to write non-trivial programs
4 recitation sessions:
2 to review programming tools (SNAP, NetworkX)
2 to review basic mathematical concepts
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 42
![Page 40: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/40.jpg)
Introduce properties, models and tools for Large real-world networks Processes taking place on networks through real applications and case studies Goal: find patterns, rules, clusters, outliers, …
… in large static and evolving graphs
… in processes spreading over the networks
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 43
![Page 41: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/41.jpg)
Covers a wide range of network analysis techniques – from basic to state-of-the-art
You will learn about things you heard about:
Six degrees of separation, small-world, page rank, network effects, P2P
networks, network evolution, spectral graph theory, virus propagation, link prediction, power-laws, scale free networks, core-periphery, network communities, hubs and authorities, bipartite cores, information cascades, influence maximization, …
Covers algorithms, theory and applications It’s going to be fun
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 44
![Page 42: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/42.jpg)
![Page 43: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/43.jpg)
Network is a collection of objects where some pairs of objects are connected by links What is the structure of the network?
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 46
![Page 44: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/44.jpg)
Objects: nodes, vertices N
Interactions: links, edges E
System: network, graph G(N,E)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 47
![Page 45: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/45.jpg)
Network often refers to real systems
Web, Social network, Metabolic network
Language: Network, node, link
Graph: mathematical representation of a network
Web graph, Social graph (a Facebook term)
Language: Graph, vertex, edge
We will try to make this distinction whenever it is appropriate, but in most cases we will use the two terms interchangeably
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 48
![Page 46: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/46.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 49
Peter Mary
Albert
Tom
co-worker
friend brothers
friend
Protein 1 Protein 2
Protein 5
Protein 9
Movie 1
Movie 3 Movie 2
Actor 3
Actor 1 Actor 2
Actor 4
|N|=4
|E|=4
![Page 47: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/47.jpg)
Choice of the proper network representation determines our ability to use networks successfully:
In some cases there is a unique, unambiguous representation
In other cases, the representation is by no means unique
The way you assign links will determine the nature of the question you can study
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 50
![Page 48: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/48.jpg)
If you connect individuals that work with each other, you will explore a professional network
If you connect those that have a sexual relationship, you will be exploring sexual networks
If you connect scientific papers that cite each other, you will be studying the citation network
If you connect all papers with the same word in the title, you will be exploring what? It is a network, nevertheless
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 51
![Page 49: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/49.jpg)
Undirected Links: undirected
(symmetrical)
Examples:
Collaborations
Friendship on Facebook
Directed Links: directed
(arcs)
Examples:
Phone calls
Following on Twitter
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 52
A
B
D
C
L
M F
G
H
I
A G
F
B
C
D
E
![Page 50: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/50.jpg)
Bridge edge: If we erase it, the graph becomes disconnected. Articulation point: If we erase it, the graph becomes disconnected.
Largest Component: Giant Component Isolated node (node F)
Connected (undirected) graph:
Any two vertices can be joined by a path.
A disconnected graph is made up by two or more connected components
D C
A
B
H
F
G
D C
A
B
H
F
G
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 53
![Page 51: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/51.jpg)
E
C
A
B
G
F
D
Strongly connected directed graph
has a path from each node to every other node and vice versa (e.g., A-B path and B-A path)
Weakly connected directed graph
is connected if we disregard the edge directions
Graph on the left is connected
but not strongly connected (e.g.,
there is no way to get from F to G
by following the edge directions).
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 54
![Page 52: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/52.jpg)
Q: What does the Web “look like”?
Here is what we will do next:
We will take a real system (i.e., the Web)
We will collect lots of Web data
We will represent the Web it as a graph
We will use language of graph theory to reason about the structure of the graph
Do a computational experiment on the Web graph
Learn something about the structure of the Web!
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 55
![Page 53: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/53.jpg)
Q: What does the Web “look like” at a global level? Web as a graph:
Nodes = web pages
Edges = hyperlinks
Side issue: What is a node?
Dynamic pages created on the fly
“dark matter” – inaccessible database generated pages
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 56
![Page 54: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/54.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 57
I teach a class on
Networks. CS224W: Classes are
in the Gates
building Computer Science
Department at Stanford
Stanford University
![Page 55: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/55.jpg)
In early days of the Web links were navigational Today many links are transactional
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 58
I teach a class on
Networks. CS224W: Classes are
in the Gates
building Computer Science
Department at Stanford
Stanford University
![Page 56: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/56.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 59
![Page 57: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/57.jpg)
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 60
Citations References in an Encyclopedia
![Page 58: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/58.jpg)
How is the Web linked? What is the “map” of the Web?
Web as a directed graph [Broder et al. 2000]:
Given node v, what can v reach?
What other nodes can reach v?
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 61
In(v) = {w | w can reach v}
Out(v) = {w | v can reach w}
E
C
A
B
G
F
D
For example:
In(A) = {A,B,C,E,G}
Out(A)={A,B,C,D,F}
![Page 59: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/59.jpg)
Two types of directed graphs:
Strongly connected:
Any node can reach any node via a directed path
In(A)=Out(A)={A,B,C,D,E}
DAG – Directed Acyclic Graph:
Has no cycles: if u can reach v, then v can not reach u
Any directed graph can be expressed in terms of these two types!
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 62
E
C
A
B
D
E
C
A
B
D
![Page 60: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/60.jpg)
Strongly connected component (SCC) is a set of nodes S so that:
Every pair of nodes in S can reach each other
There is no larger set containing S with this property
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 63
E
C
A
B
G
F
D
Strongly connected
components of the graph:
{A,B,C,G}, {D}, {E}, {F}
![Page 61: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/61.jpg)
Fact: Every directed graph is a DAG on its SCCs
(1) SCCs partitions the nodes of G
Each node is in exactly one SCC
(2) If we build a graph G’ whose nodes are SCCs, and with an edge between nodes of G’ if there is an edge between corresponding SCCs in G, then G’ is a DAG
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 64
E
C
A
B
G
F
D
(1) Strongly connected components of
graph G: {A,B,C,G}, {D}, {E}, {F}
(2) G’ is a DAG:
G G’ {A,B,C,G}
{E}
{D}
{F}
![Page 62: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/62.jpg)
Claim: SCCs partitions nodes of G. This means: Each node is member of exactly 1 SCC.
Proof by contradiction:
Suppose there exists a node v which is a member of 2 SCCs S and S’.
But then S S’ is one large SCC!
Contradiction!
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 65
v
S S’
![Page 63: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/63.jpg)
Claim: G’ (graph of SCCs) is a DAG. This means: G’ has no cycles.
Proof by contradiction: Assume G’ is not a DAG
Then G’ has a directed cycle.
Now all nodes on the cycle are mutually reachable, and all are part of the same SCC
But then G’ is not a graph of connections between SCCs (SCCs are defined as maximal sets) Contradiction!
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 66
{A,B,C,G}
{E}
{D}
{F}
Now {A,B,C,G,E,F} is a SCC!
G’
G’ {A,B,C,G}
{E}
{D}
{F}
![Page 64: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/64.jpg)
Goal: Take a large snapshot of the Web and try to understand how its SCCs “fit together” as a DAG
Computational issue:
Want to find a SCC containing node v?
Observation:
Out(v) … nodes that can be reached from v
SCC containing v is: Out(v) ∩ In(v)
= Out(v,G) ∩ Out(v,G), where G is G with all edge directions flipped
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 67
v
Out(v)
![Page 65: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/65.jpg)
For example:
Out(A) = {A, B, C, D, E, F, G}
In(A) = {A, B, D, E, F, G}
So, SCC(A) = Out(A) ∩ In(A) = {A, B, D, E, F, G}
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 68
E
C
A
B
D
F
G
![Page 66: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/66.jpg)
There is a giant SCC There won’t be 2 giant SCCs Heuristic argument:
It just takes 1 page from one SCC to link to the other SCC
If the 2 SCCs have millions of pages the likelihood of this not happening is very very small
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 69
Giant SCC1 Giant SCC2
![Page 67: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/67.jpg)
Broder et al., 2000:
Altavista crawl from October 1999
203 million URLS
1.5 billion links
Computer: Server with 12GB of memory
Undirected version of the Web graph:
91% nodes in the largest weakly conn. component
Are hubs making the web graph connected?
Even if they deleted links to pages with in-degree >10 WCC was still ≈50% of the graph
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 70
![Page 68: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/68.jpg)
Directed version of the Web graph:
Largest SCC: 28% of the nodes (56 million)
Taking a random node v
Out(v) ≈ 50% (100 million)
In(v) ≈ 50% (100 million)
What does this tell us about the conceptual
picture of the Web graph?
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 71
![Page 69: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/69.jpg)
203 million pages, 1.5 billion links [Broder et al. 2000] 9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 72
318 A. Broder et al. / Computer Networks 33 (2000) 309–320
Fig. 9. Connectivity of the Web: one can pass from any node of IN through SCC to any node of OUT. Hanging off IN and OUT are
TENDRILS containing nodes that are reachable from portions of IN, or that can reach portions of OUT, without passage through SCC. It
is possible for a TENDRIL hanging off from IN to be hooked into a TENDRIL leading into OUT, forming a TUBE: i.e., a passage from
a portion of IN to a portion of OUT without touching SCC.
regions have, if we explore in the direction ‘away’
from the center? The results are shown below in the
row labeled ‘exploring outward – all nodes’.
Similarly, we know that if we explore in-links
from a node in OUT, or out-links from a node in
IN, we will encounter about 100 million other nodes
in the BFS. Nonetheless, it is reasonable to ask:
how many other nodes will we encounter? That is,
starting from OUT (or IN), and following in-links
(or out-links), how many nodes of TENDRILS and
OUT (or IN) will we encounter? The results are
shown below in the row labeled ‘exploring inwards
– unexpected nodes’. Note that the numbers in the
table represent averages over our sample nodes.
Starting point OUT IN
Exploring outwards – all nodes 3093 171
Exploring inwards – unexpected nodes 3367 173
As the table shows, OUT tends to encounter larger
neighborhoods. For example, the second largest
strong component in the graph has size approxi-
mately 150 thousand, and two nodes of OUT en-
counter neighborhoods a few nodes larger than this,
suggesting that this component lies within OUT. In
fact, considering that (for instance) almost every cor-
porate Website not appearing in SCC will appear in
OUT, it is no surprise that the neighborhood sizes
are larger.
3.3. SCC
Our sample contains 136 nodes from the SCC.
To determine other properties of SCC, we require
a useful property of IN and OUT: each contains a
few long paths such that, once the BFS proceeds
beyond a certain depth, only a few paths are being
explored, and the last path is much longer than any
of the others. We can therefore explore the radius
at which the BFS completes, confident that the last
![Page 70: CS224w: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2012/slides/01-intro.pdf · Rise of Mobile, Web 2.0 and Social media ... through real applications](https://reader034.vdocuments.us/reader034/viewer/2022050413/5f899bbd266c6f4e62578818/html5/thumbnails/70.jpg)
Learn: Some conceptual organization of the Web (i.e., the bowtie)
Not learn: Treats all pages as equal
Google’s homepage == my homepage
What are the most important pages How many pages have k in-links as a function of k?
The degree distribution: ~ 1 / k2
Link analysis ranking -- as done by search engines (PageRank)
Internal structure inside giant SCC Clusters, implicit communities?
How far apart are nodes in the giant SCC: Distance = # of edges in shortest path
Avg = 16 [Broder et al.]
9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 73