the structure of the web. getting to knowing the web how big is the web and how do you measure it?...

15
The Structure of the Web

Upload: sharyl-reynolds

Post on 17-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

The Structure of the Web

Page 2: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Getting to knowing the Web

How big is the web and how do you measure it?How many people use the web?How many use search engines?What is the shape of the web?

How do people search for information?Can we categorize web searchers?

Page 3: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

The Web Is a Graph“M

ap o

f the

Inte

rnet

” (1

998)

Page 4: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Examples of Graphs

i

ab

c

g

edf

h

Page 5: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Graph G: Formal Definition

• A graph G consists of two sets G = {V, E}– A set V of vertices, or nodes

(entities)– A set E VV of edges

(relationships between entities)

i

ab

c

g

edf

h

Nodes are students in a large HS, edges join two who had a romantic relationship at some point during the 18-month period in which the study was conducted

Page 6: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Peter Mary

Albert

Albert

co-worker

friendbrothers

friend

Protein 1 Protein 2

Protein 5

Protein 9

Movie 1

Movie 3Movie 2

Actor 3

Actor 1 Actor 2

Actor 4

N=4L=4

Network Science: Graph Theory 2012

Examples of Relationships (Graphs)

Page 7: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Connected, Paths and Distances

• A Connected graph• has a path between

each pair of distinct vertices• A Path AB is

• A sequence of edgesfrom A to B

• Minimum Distance• The shortest path • How do you measure it?

Page 8: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Computing Minimum Distances

• Algorithm:

Page 9: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Directed Graphs and DAGs

• Directed graph– Each edge is a directed edge, or an arc, or a link– Can have two arcs between a pair of vertices,

one in each direction– Vertex y is adjacent to vertex x iff

there is a directed edge from x to y

• Directed Path – A sequence of directed edges between two vertices

• Directed Acyclic Graph (DAG)– Directed graph that has no cycles

Page 10: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

The WEB is aDIRECTED GRAPH

Page 11: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Structureof The Web• A Web Page

corresponds to a node

• A Hyperlink corresponds to a directed edge

Page 12: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

Structureof The Web

• The Web is a directed graphwith one largeStronglyConnectedComponent

• Is there a directed path fromthe Univ. of Xto Company Z’s Home?

• How aboutto USNews College Rankings?

• The other way(s) around?

Page 13: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

How big is the web?

Number of accessible web pages – May 2005 estimate: 11.5 Billion pagesMost recent estimates? ________

The deep (or hidden or invisible) web “contains 400-550 times more information”

(Are they serious?)

Coverage (i.e. the proportion of the web indexed) is crucial for search engines.Today, ____________ pages are indexed

Page 14: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

How do you measure the size of web?

Capture-recapture method SE1 = # of pages indexed search engine 1. QSE2 = # of pages returned by search engine 2 for “typical” queries. OVR = # of pages returned by both search engines for typical queries.

Estimate :SE1 / WWW = OVR / QSE2 =>WWW = (SE1 x QSE2) / OVR

SE1 OVR

QSE2

WWW

Page 15: The Structure of the Web. Getting to knowing the Web How big is the web and how do you measure it? How many people use the web? How many use search engines?

How hard is it to go from one page to another?

Over 75% of the time there is no directed path from one random web page to another.

When a directed path exists its average length is 16 clicks.

When an undirected path exists its average length is 7 clicks.

Short average path between pairs of nodes is characteristic of a small-world network.

Kleinberg: The small-world phenomenon (we will study later)