BA990/ECE590: Statistical inference on graphs Spring 2020


Administrivia

• Schedule: Wed and Fri, 3:05PM - 4:20PM, Seminar Room G, Fuqua

• Instructor: Prof. Jiaming Xu, [email protected], Fuqua W313I. Office hours: by appointment

• Course website: https://faculty.fuqua.duke.edu/~jx77/Course_graphs.html

Administrivia

1 Course prerequisites:
  ▸ Maturity with probability theory and linear algebra
  ▸ Familiarity with statistical theory, optimization, and algorithms

2 Participation (30%):
  ▸ Attending classes on time :-)
  ▸ Raising good questions or providing good answers to questions
  ▸ Proofreading lecture notes and pointing out errors

3 Homework (30%): three to four problem sets

4 Final project (40%):
  ▸ Team of 1-2 students
  ▸ Either presenting paper(s) or a standalone research project
  ▸ Guidelines and project ideas will be announced before March 1
  ▸ Mid-term project report due March 18 (after spring break)
  ▸ Project presentation (15 mins each team) on April 10 and 15
  ▸ Final project report due April 27

Administrivia

1 Materials: Lecture notes and additional reading materials will be posted on the course website

2 March 4 and 6 classes will be in RAND classroom just below Seminar Room G

3 Classes will be video recorded using Duke’s Panopto system

Statistical problems

• Statistical tasks: using data to make informed decisions (hypothesis testing, estimation, etc.)

$$\underbrace{\theta \in \Theta}_{\text{parameter}} \;\mapsto\; \underbrace{X}_{\text{data}} \;\mapsto\; \underbrace{\hat{\theta}}_{\text{estimate}}$$

• Understanding the fundamental limits:

Q1 Characterize the statistical (information-theoretic) limit: What is possible/impossible?

Q2 Can statistical limits be attained computationally efficiently, e.g., in polynomial time? If yes, how? If not, why?

• In this course:
  ▸ Data = graph
  ▸ Parameter = hidden (latent, or planted) structure
  ▸ Focus on large-graph limit (number of vertices → ∞)

Basic definitions of graphs

A graph G = (V,E) consists of

• A vertex set $V = [n] \equiv \{1, \ldots, n\}$ for some positive integer $n$.

• An edge set $E \subset \binom{V}{2}$. Each element of $E$ is an edge $e = (i, j)$ (unordered pair). We say $i$ and $j$ are connected and write $i \sim j$ if $(i, j) \in E$.

We mostly focus on graphs that are undirected and simple.

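Adjacency matrix representation: $A = (A_{ij})_{i,j \in [n]}$ is an $n \times n$ symmetric binary matrix with zero diagonal and

$$A_{ij} = \mathbf{1}\{i \sim j\} = \begin{cases} 1 & (i, j) \in E \\ 0 & \text{o.w.} \end{cases}$$

Example: [Figure: a graph on vertices 1, 2, 3, 4 with edges 1 ∼ 2, 2 ∼ 3, 3 ∼ 4, 1 ∼ 4]

$$A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}$$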
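As a minimal sketch of the adjacency matrix representation, the following builds $A$ from a list of unordered pairs for the 4-vertex example. The function name `adjacency_matrix` and the list-of-lists layout are illustrative assumptions, not part of the course material.

```python
def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix of a simple undirected graph.

    Vertices are labeled 1..n, matching V = [n]; `edges` holds unordered pairs.
    (Hypothetical helper, written only for illustration.)
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("simple graphs have no self-loops")
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1  # symmetry: the pair (i, j) is unordered
    return A


# Edges of the 4-vertex example: 1 ~ 2, 2 ~ 3, 3 ~ 4, 1 ~ 4.
cycle_edges = [(1, 2), (2, 3), (3, 4), (1, 4)]
A = adjacency_matrix(4, cycle_edges)
for row in A:
    print(row)
```

The resulting matrix is symmetric with zero diagonal, as the definition requires.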

Basic definitions of graphs

• The neighborhood of a given vertex $v \in V$:

$$N(v) = \{u \in V : u \sim v\}$$

• The degree of $v$: $d_v = |N(v)|$

• Induced subgraph: For any $S \subset V$, the subgraph induced by $S$ is $G[S] = (S, E_S)$, where

$$E_S \triangleq \{(u, v) \in E : u, v \in S\}$$

• A clique is a complete subgraph. A graph is complete iff all pairs of vertices in the graph are connected.
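The definitions above translate almost directly into code. The sketch below assumes a graph stored as a list of unordered pairs. The helper names (`neighborhood`, `degree`, `induced_edges`, `is_clique`) and the small test graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def neighborhood(edges, v):
    """N(v): the set of vertices u with u ~ v."""
    return {a if b == v else b for a, b in edges if v in (a, b)}


def degree(edges, v):
    """d_v = |N(v)|."""
    return len(neighborhood(edges, v))


def induced_edges(edges, S):
    """E_S: the edges of G with both endpoints in S."""
    S = set(S)
    return [(u, v) for u, v in edges if u in S and v in S]


def is_clique(edges, S):
    """True iff every pair of distinct vertices in S is connected."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) in present for pair in combinations(S, 2))


# Illustrative graph: the 4-cycle 1-2-3-4 plus the chord 1 ~ 3.
edges = [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)]
print(neighborhood(edges, 1))          # neighbors of vertex 1
print(degree(edges, 2))                # degree of vertex 2
print(induced_edges(edges, {1, 2, 3})) # edge set of G[{1, 2, 3}]
print(is_clique(edges, [1, 2, 3]), is_clique(edges, [1, 2, 4]))
```

Here $\{1, 2, 3\}$ is a clique, while $\{1, 2, 4\}$ is not because 2 and 4 are not connected.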

Planted clique – graph view

1 A set $S$ of $k$ vertices is chosen to form a clique

2 For every other pair of vertices, add an edge w.p. $1/2$
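The two-step construction can be sketched as a small sampler. This is an illustration under stated assumptions: the clique set is drawn uniformly at random among all $k$-subsets (the slide only says it "is chosen"), and the function name `sample_planted_clique` and the parameter values are hypothetical.

```python
import random
from itertools import combinations


def sample_planted_clique(n, k, rng):
    """Return (S, edges) for one planted clique instance on vertices 1..n.

    Assumption: S is uniform over k-subsets of [n].
    """
    S = set(rng.sample(range(1, n + 1), k))
    edges = set()
    for i, j in combinations(range(1, n + 1), 2):
        if i in S and j in S:
            edges.add((i, j))  # step 1: all pairs inside S are connected
        elif rng.random() < 0.5:
            edges.add((i, j))  # step 2: every other pair is an edge w.p. 1/2
    return S, edges


rng = random.Random(0)  # fixed seed so the sketch is reproducible
S, edges = sample_planted_clique(n=30, k=6, rng=rng)
print(sorted(S), len(edges))
```

Each unordered pair is stored once as `(i, j)` with `i < j`, so `edges` plays the role of $E \subset \binom{V}{2}$.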

Planted clique – adjacency matrix view

Planted clique recovery

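Planted clique $S$ $\longrightarrow$ graph $G$ $\longrightarrow$ Estimated clique $\hat{S}$

• Minimax framework: Find an estimator $\hat{S} = \hat{S}(G)$ that performs well in the worst case

$$\min_{S \in \binom{[n]}{k}} P_S\left[\hat{S}(G) = S\right] \approx 1$$

• Bayesian framework: Find an estimator $\hat{S} = \hat{S}(G)$ that performs well on average

$$\mathbb{E}_{S \sim \mathrm{Unif}\left(\binom{[n]}{k}\right)} P_S\left[\hat{S}(G) = S\right] \approx 1$$

• The two formulations are equivalent by the permutation invariance of the model:

$$\sup_{\hat{S}} \min_{S \in \binom{[n]}{k}} P_S\left[\hat{S}(G) = S\right] = \sup_{\hat{S}} \mathbb{E}_{S \sim \mathrm{Unif}\left(\binom{[n]}{k}\right)} P_S\left[\hat{S}(G) = S\right].$$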
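To make the Bayesian criterion concrete, the sketch below estimates $\mathbb{E}_{S \sim \mathrm{Unif}} P_S[\hat{S}(G) = S]$ by Monte Carlo for one particular estimator. The estimator used here (return the $k$ vertices of largest degree) is an assumption chosen for simplicity and is not taken from the slides. The function names and the values of $n$, $k$, and the number of trials are likewise hypothetical.

```python
import random
from itertools import combinations


def sample_planted_clique(n, k, rng):
    """Draw S uniformly among k-subsets of {0..n-1} and return (S, adjacency sets)."""
    S = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Pairs inside S are always connected; all other pairs w.p. 1/2.
        if (i in S and j in S) or rng.random() < 0.5:
            adj[i].add(j)
            adj[j].add(i)
    return S, adj


def top_degree_estimator(adj, k):
    """Illustrative estimator (an assumption): the k vertices of largest degree."""
    order = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    return set(order[:k])


def success_rate(n, k, trials, seed=0):
    """Monte Carlo estimate of the average probability of exact recovery."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        S, adj = sample_planted_clique(n, k, rng)
        hits += top_degree_estimator(adj, k) == S
    return hits / trials


rate_large = success_rate(n=100, k=70, trials=20)  # very large clique
rate_small = success_rate(n=100, k=5, trials=20)   # small clique
print(rate_large, rate_small)
```

Because $S$ is redrawn uniformly in every trial, the returned fraction approximates the Bayesian objective for this estimator. By the permutation invariance noted above, the optimal value of that objective matches the optimal worst-case value.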

Community detection in networks

• Networks with community structures arise in many applications

• Task: Discover underlying communities based on the network topology alone

Example 1

Santa Fe Institute Collaboration network [Girvan-Newman ’02]

Example 2

Protein-protein interaction networks [Jonsson et al. ’06]

Example 3

Political blogosphere and the 2004 U.S. election [Adamic-Glance ’05]

Figure 1: Community structure of political blogs (expanded set), shown using utilizing the GUESS visual-ization and analysis tool[2]. The colors reflect political orientation, red for conservative, and blue for liberal.Orange links go from liberal to conservative, and purple ones from conservative to liberal. The size of eachblog reflects the number of other blogs that link to it.

Because of bloggers’ ability to identify and frame break-ing news, many mainstream media sources keep a close eyeon the best known political blogs. A number of mainstreamnews sources have started to discuss and even to host blogs.In an online survey asking editors, reporters, columnists andpublishers to each list the “top 3” blogs they read, Dreznerand Farrell [4] identified a short list of dominant “A-list”blogs. Just 10 of the most popular blogs accounted for overhalf the blogs on the journalists’ lists. They also found that,besides capturing most of the attention of the mainstreammedia, the most popular political blogs also get a dispro-portionate number of links from other blogs. Shirky [12]observed the same effect for blogs in general and Hindmanet al. [7] found it to hold for political websites focusing onvarious issues.While these previous studies focused on the inequality of

citation links for political blogs overall, there has been com-paratively little study of subcommunities of political blogs.In the context of political websites, Hindman et al. [7] notedthat, for example, those dealing with the issue of abortion,gun control, and the death penalties, contain subcommuni-ties of opposing views. In the case of the pro-choice andpro-life web communities, an earlier study [1] found pro-lifewebsites to be more densely linked than pro-choice ones. Ina study of a sample of the blogosphere, Herring et al.[6] dis-covered densely interlinked (non-political) blog communitiesfocusing on the topics of Catholicism and homeschooling, aswell as a core network of A-list blogs, some of them political.Recently, Butts and Cross [3] studied the response in the

structure of networks of political blogs to polling data andelection campaign events. In another political blog study,Welsch [15] gathered a single-day snapshot of the network

neighborhoods of Atrios, a popular liberal blog, and In-stapundit, a popular conservative blog. He found the In-stapundit neighborhood to include many more blogs thanthe Atrios one, and observed no overlap in the URLs citedbetween the two neighborhoods. The lack of overlap in lib-eral and conservative interests has previously been observedin purchases of political books on Amazon.com [8]. Thisbrings about the question of whether we are witnessing acyberbalkanization [11, 13] of the Internet, where the prolif-eration of specialized online news sources allows people withdifferent political leanings to be exposed only to informationin agreement with their previously held views. Yale law pro-fessor Jack Balkin provides a counter-argument7 by pointingout that such segregation is unlikely in the blogosphere be-cause bloggers systematically comment on each other, evenif only to voice disagreement.

In this paper we address both hypotheses by examining ina systematic way the linking patterns and discussion topicsof political bloggers. In doing so, we not only measure thedegree of interaction between liberal and conservative blogs,but also uncover differences in the structure of the two com-munities. Our data set includes the posts of 40 A-list blogsover the period of two months preceding the U.S. Presiden-tial Election of 2004. We also study a large network of over1,000 political blogs based on a single day snapshot that in-cludes blogrolls (the list of links to other blogs frequentlyfound in sidebars), and so presents a more static picture ofa broader blogosphere.

From both samples we find that liberal and conservative blogs did indeed have different lists of favorite news sources,

⁷ http://balkin.blogspot.com/2004 01 18 balkinarchive.html#107480769112109137

Stochastic block model – graph view

1 n nodes are partitioned into 2 equal-sized communities

2 For every pair of nodes in the same community, add an edge with probability p

3 For every pair of nodes in different communities, add an edge with probability q
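The three steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `sample_sbm`, the ±1 label encoding, the seed, and the values n = 200, p = 0.3, q = 0.1 are assumptions made here, and the sketch assumes that the edges are drawn independently across pairs, as is standard for this model.

```python
import random
from itertools import combinations


def sample_sbm(n, p, q, seed=None):
    """Sample a two-community SBM; returns (sigma, edges)."""
    if n % 2 != 0:
        raise ValueError("n must be even for two equal-sized communities")
    rng = random.Random(seed)
    # Step 1: give n/2 nodes the label +1 and n/2 nodes the label -1.
    sigma = [1] * (n // 2) + [-1] * (n // 2)
    rng.shuffle(sigma)
    # Steps 2 and 3: one independent coin flip per unordered pair of nodes.
    edges = set()
    for i, j in combinations(range(n), 2):
        prob = p if sigma[i] == sigma[j] else q
        if rng.random() < prob:
            edges.add((i, j))
    return sigma, edges


# Illustrative parameters (not taken from the slides).
sigma, edges = sample_sbm(n=200, p=0.3, q=0.1, seed=0)
within = sum(1 for i, j in edges if sigma[i] == sigma[j])
across = len(edges) - within
print(f"{len(edges)} edges: {within} within communities, {across} across")
```

With p > q, the count of within-community edges is typically noticeably larger than the count of cross-community edges, which is the signal a community detection method tries to exploit.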

Stochastic block model – adjacency matrix view

[Figure: sparsity pattern of the adjacency matrix; both axes run from 0 to 200; nz = 7962]
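The adjacency matrix view can be reproduced on a toy graph. The sketch below is an illustration under stated assumptions: the 8-node edge list and the helper names `adjacency_matrix`, `relabel`, and `count_nonzeros` are hypothetical, and "nz" in the plot is assumed to count the nonzero entries of the matrix, so that each undirected edge contributes two of them.

```python
import random


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix (list of lists) of an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1
    return A


def relabel(A, order):
    """Reorder rows and columns so that new index k is old node order[k]."""
    return [[A[u][v] for v in order] for u in order]


def count_nonzeros(A):
    """Number of nonzero entries of a 0/1 matrix."""
    return sum(sum(row) for row in A)


def show(A):
    """Print a text version of the sparsity pattern."""
    for row in A:
        print("".join("#" if x else "." for x in row))


# Hypothetical toy graph: nodes 0-3 and nodes 4-7 form the two communities.
n = 8
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (4, 5), (4, 7), (5, 6), (6, 7), (3, 4)]
A = adjacency_matrix(n, edges)

# Random relabeling of the nodes: same graph, different matrix layout.
order = list(range(n))
random.Random(0).shuffle(order)
B = relabel(A, order)

show(A)
print()
show(B)
print("nonzeros:", count_nonzeros(A), count_nonzeros(B))
```

When the nodes are sorted by community, the two dense diagonal blocks are visible. After a random relabeling the same graph has the same number of nonzeros, but the block structure is no longer apparent, which is why recovering the communities amounts to finding the hidden ordering.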

Stochastic block model – estimation

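Planted community $\sigma \in \{\pm 1\}^n$ $\longrightarrow$ graph $G$ $\longrightarrow$ Estimated community $\hat{\sigma}$

$$
\mathbb{P}\left[(i, j) \in E\right] =
\begin{cases}
p & \text{if } \sigma_i = \sigma_j, \\
q & \text{if } \sigma_i \neq \sigma_j.
\end{cases}
$$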

• If p and q are known, estimate the planted community σ

• If p and q are unknown, jointly estimate p, q and σ

When p = q, the SBM reduces to the Erdős-Rényi random graph G(n, p).
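For the case where p and q are known, one possible estimator is the maximum likelihood labeling. The sketch below is an illustration only, not the method developed in the course: it enumerates every balanced labeling, which is feasible only for very small n, and it assumes 0 < q < p < 1 and independent edges. The names `log_likelihood` and `brute_force_mle` and the toy graph are hypothetical. Since a labeling and its negation define the same partition, the label of node 0 is fixed to +1.

```python
import math
from itertools import combinations


def log_likelihood(sigma, edges, p, q):
    """Log-likelihood of the observed edge set under labels sigma.

    Assumes 0 < p < 1 and 0 < q < 1, so that every logarithm is finite.
    """
    n = len(sigma)
    total = 0.0
    for i, j in combinations(range(n), 2):
        prob = p if sigma[i] == sigma[j] else q
        total += math.log(prob) if (i, j) in edges else math.log(1.0 - prob)
    return total


def brute_force_mle(n, edges, p, q):
    """Maximize the likelihood over all balanced labelings (tiny n only)."""
    # Store each undirected edge as (smaller index, larger index).
    edges = {(min(i, j), max(i, j)) for i, j in edges}
    best_sigma, best_ll = None, -math.inf
    # Fix node 0 in community +1: sigma and -sigma give the same likelihood.
    for rest in combinations(range(1, n), n // 2 - 1):
        plus = {0, *rest}
        sigma = [1 if i in plus else -1 for i in range(n)]
        ll = log_likelihood(sigma, edges, p, q)
        if ll > best_ll:
            best_sigma, best_ll = sigma, ll
    return best_sigma


# Hypothetical toy graph: two 4-cliques joined by the single edge (3, 4).
clique_a = list(combinations(range(0, 4), 2))
clique_b = list(combinations(range(4, 8), 2))
edges = clique_a + clique_b + [(3, 4)]

sigma_hat = brute_force_mle(8, edges, p=0.8, q=0.2)
print(sigma_hat)
```

Because the two communities have equal size, the numbers of within-community and cross-community pairs are the same for every candidate labeling, so for p > q maximizing the likelihood is the same as minimizing the number of edges that cross the partition. If p = q, every labeling has the same likelihood, which matches the remark that the model then carries no community information.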

Asymptotic Notation

Let $a_n$ and $b_n$ be two sequences of numbers, where $b_n > 0$ for all sufficiently large $n$. Then:

• $a_n = O(b_n)$ or $a_n \lesssim b_n$, if there exist constants $C$ and $n_0$ such that $|a_n| \le C b_n$ for all $n \ge n_0$.

• $a_n = \Omega(b_n)$ or $a_n \gtrsim b_n$, if there exist constants $c > 0$ and $n_0$ such that $a_n \ge c b_n$ for all $n \ge n_0$.

• $a_n = \Theta(b_n)$ or $a_n \asymp b_n$, if $a_n = O(b_n)$ and $a_n = \Omega(b_n)$.

• $a_n \sim b_n$, if $a_n / b_n \to 1$ as $n \to \infty$.

• $a_n = o(b_n)$, if $a_n / b_n \to 0$ as $n \to \infty$.

• $a_n = \omega(b_n)$, if $a_n / b_n \to \infty$ as $n \to \infty$.

• $a_n \ll b_n$, if $a_n \ge 0$ and $a_n = o(b_n)$.

Also, we say that a sequence of events $E_n$ holds with high probability (w.h.p.) if $\mathbb{P}\{E_n\} \to 1$ as $n \to \infty$.
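As a quick check of these definitions, here are a few standard examples. The particular sequences are chosen for illustration and do not come from the slides.

$$
\begin{aligned}
3n^2 + n &= O(n^2) \ \text{ and } \ 3n^2 + n = \Omega(n^2), \ \text{ hence } \ 3n^2 + n = \Theta(n^2), \\
n^2 + n &\sim n^2, \ \text{ since } \ \frac{n^2 + n}{n^2} = 1 + \frac{1}{n} \to 1, \\
\log n &= o(n), \ \text{ since } \ \frac{\log n}{n} \to 0; \ \text{ equivalently, } \ n = \omega(\log n).
\end{aligned}
$$

The first line holds with, for instance, $C = 4$, $c = 3$, and $n_0 = 1$, because $3n^2 \le 3n^2 + n \le 4n^2$ for all $n \ge 1$. Note that $3n^2 + n \sim n^2$ is false, since the ratio tends to 3 rather than 1: the relation $\sim$ tracks the leading constant, while $\Theta$ does not.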