Piyush Kumar (Lecture 2: PageRank) Welcome to COT5405

Upload: kelly-barrett

Post on 11-Jan-2016


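Quick Recap: Linear Algebra

Matrices

$$A = \begin{pmatrix} 2 & 3 & 7 \\ 1 & 1 & 5 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 3 & 1 \\ 2 & 1 & 4 \\ 4 & 7 & 6 \end{pmatrix}$$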

Source: http://www.phy.cuhk.edu.hk/phytalent/mathphy/

1.1 Matrices

Square matrices

When m = n, the matrix is square, i.e.,

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

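1.2 Operations of matrices

Sums of matrices

Example: if

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & 3 & 0 \\ -1 & 2 & 5 \end{pmatrix},$$

evaluate A + B and A - B.

$$A + B = \begin{pmatrix} 1+2 & 2+3 & 3+0 \\ 0+(-1) & 1+2 & 4+5 \end{pmatrix} = \begin{pmatrix} 3 & 5 & 3 \\ -1 & 3 & 9 \end{pmatrix}$$

$$A - B = \begin{pmatrix} 1-2 & 2-3 & 3-0 \\ 0-(-1) & 1-2 & 4-5 \end{pmatrix} = \begin{pmatrix} -1 & -1 & 3 \\ 1 & -1 & -1 \end{pmatrix}$$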

Scalar multiplication

Example:

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{pmatrix}.$$

Evaluate 3A.

$$3A = \begin{pmatrix} 3 \times 1 & 3 \times 2 & 3 \times 3 \\ 3 \times 0 & 3 \times 1 & 3 \times 4 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 9 \\ 0 & 3 & 12 \end{pmatrix}$$

Properties

If matrices A, B and C are conformable, then

A + B = B + A (commutative law)

A + (B + C) = (A + B) + C (associative law)

λ(A + B) = λA + λB, where λ is a scalar (distributive law)

Can you prove them?

Matrix multiplication

If A = [a_ij] is an m × p matrix and B = [b_ij] is a p × n matrix, then AB is defined as the m × n matrix C = AB, where C = [c_ij] with

$$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{ip} b_{pj}$$

for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Example:

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} -1 & 2 \\ 2 & 3 \\ 5 & 0 \end{pmatrix},$$

and C = AB. Evaluate c_21.

$$c_{21} = 0 \times (-1) + 1 \times 2 + 4 \times 5 = 22$$

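Matrix multiplication

Example:

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} -1 & 2 \\ 2 & 3 \\ 5 & 0 \end{pmatrix}.$$

Evaluate C = AB.

$$c_{11} = 1 \times (-1) + 2 \times 2 + 3 \times 5 = 18$$

$$c_{12} = 1 \times 2 + 2 \times 3 + 3 \times 0 = 8$$

$$c_{21} = 0 \times (-1) + 1 \times 2 + 4 \times 5 = 22$$

$$c_{22} = 0 \times 2 + 1 \times 3 + 4 \times 0 = 3$$

$$C = AB = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{pmatrix} \begin{pmatrix} -1 & 2 \\ 2 & 3 \\ 5 & 0 \end{pmatrix} = \begin{pmatrix} 18 & 8 \\ 22 & 3 \end{pmatrix}$$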
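The entry formula for C = AB can be checked with a few lines of Python. This is a minimal sketch with matrices stored as lists of rows; the helper name matmul is our own, not something from the slides.

```python
def matmul(a, b):
    """Multiply an m x p matrix by a p x n matrix, both given as lists of rows."""
    p = len(b)
    if any(len(row) != p for row in a):
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    # c[i][j] = sum over k of a[i][k] * b[k][j]
    return [[sum(row[k] * b[k][j] for k in range(p)) for j in range(n)]
            for row in a]

A = [[1, 2, 3], [0, 1, 4]]
B = [[-1, 2], [2, 3], [5, 0]]
C = matmul(A, B)
print(C)  # [[18, 8], [22, 3]]
```

Note that C[1][0] is the slide's c_21, since Python indices start at 0.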

Properties

If matrices A, B and C are conformable, then

A(B + C) = AB + AC

(A + B)C = AC + BC

A(BC) = (AB)C

However,

AB ≠ BA in general

AB = 0 does NOT necessarily imply A = 0 or B = 0

AB = AC does NOT necessarily imply B = C
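Small counterexamples make the three warnings concrete. The 2 × 2 matrices below are our own illustrative choices, not taken from the slides.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# AB != BA in general
A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
AB = matmul(A, B)  # [[1, 0], [0, 0]]
BA = matmul(B, A)  # [[0, 0], [0, 1]]

# PQ = 0 although neither P nor Q is the zero matrix
P = [[1, 0], [0, 0]]
Q = [[0, 0], [0, 1]]
PQ = matmul(P, Q)  # [[0, 0], [0, 0]]

# PX = PY although X != Y (P is not invertible, so it cannot be cancelled)
X = [[1, 2], [3, 4]]
Y = [[1, 2], [5, 6]]
PX, PY = matmul(P, X), matmul(P, Y)  # both [[1, 2], [0, 0]]
```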

Identity Matrix

Examples of identity matrices:

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

1.3 Types of matrices

The transpose of a matrix

The matrix obtained by interchanging the rows and columns of a matrix A is called the transpose of A (written A^T).

Example: the transpose of

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \quad \text{is} \quad A^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.$$

For a matrix A = [a_ij], its transpose A^T = [b_ij], where b_ij = a_ji.

The inverse of a matrix

If matrices A and B are such that AB = BA = I, then B is called the inverse of A (symbol: A^{-1}), and A is called the inverse of B (symbol: B^{-1}).

Example: show that

$$B = \begin{pmatrix} 6 & -2 & -3 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}$$

is the inverse of the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 3 \\ 1 & 2 & 4 \end{pmatrix}.$$

Ans: Note that

$$AB = BA = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Can you show the details?
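The details of the inverse example can be checked mechanically. A minimal sketch: B is entered with the minus signs that make AB = BA = I hold (the extracted slide text may have dropped them), and the helper names are our own.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [1, 3, 3], [1, 2, 4]]
B = [[6, -2, -3], [-1, 1, 0], [-1, 0, 1]]

# Both products must be checked: the definition requires AB = BA = I.
is_inverse = matmul(A, B) == identity(3) and matmul(B, A) == identity(3)
print(is_inverse)  # True
```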

Symmetric matrix

A matrix A such that A^T = A is called symmetric, i.e., a_ji = a_ij for all i and j.

A + A^T must be symmetric. Why?

Example:

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix}$$

is symmetric.

A matrix A such that A^T = -A is called skew-symmetric, i.e., a_ji = -a_ij for all i and j.

A - A^T must be skew-symmetric. Why?

1.4 Properties of matrix

(AB)^{-1} = B^{-1} A^{-1}

(A^T)^T = A and (λA)^T = λA^T

(A + B)^T = A^T + B^T

(AB)^T = B^T A^T
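One possible answer to the two "Why?" questions on the symmetric-matrix slide, using only the transpose rules listed here, (A + B)^T = A^T + B^T and (A^T)^T = A, for a square matrix A:

```latex
\begin{aligned}
(A + A^T)^T &= A^T + (A^T)^T = A^T + A = A + A^T, \\
(A - A^T)^T &= A^T - (A^T)^T = A^T - A = -(A - A^T).
\end{aligned}
```

The first line says A + A^T equals its own transpose (symmetric); the second says A - A^T equals minus its transpose (skew-symmetric).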

The determinant of a 2 × 2 matrix:

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$

$$\det(A) = |A| = a_{11} a_{22} - a_{21} a_{12}$$

Note:

1. For every square matrix, there is a real number associated with this matrix, called its determinant.

2. It is common practice to delete the matrix brackets:

$$\left| \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \right| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$

Source: http://www.management.ntu.edu.tw/~jywang/course/

Historically, the use of determinants arose from the recognition of special patterns that occur in the solutions of linear systems:

$$a_{11} x_1 + a_{12} x_2 = b_1$$

$$a_{21} x_1 + a_{22} x_2 = b_2$$

$$x_1 = \frac{b_1 a_{22} - b_2 a_{12}}{a_{11} a_{22} - a_{21} a_{12}} \quad \text{and} \quad x_2 = \frac{b_2 a_{11} - b_1 a_{21}}{a_{11} a_{22} - a_{21} a_{12}}$$

Note:

1. a_11 a_22 - a_21 a_12 ≠ 0

2. x_1 and x_2 have the same denominator, and this quantity is called the determinant of the coefficient matrix A
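The pattern in the solution formulas can be tried on a concrete system. A minimal sketch using exact fractions; the function name and the sample system (2x1 + x2 = 5, x1 + 3x2 = 10) are our own illustration.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1, a21*x1 + a22*x2 = b2 by the determinant formulas."""
    det = a11 * a22 - a21 * a12  # the shared denominator
    if det == 0:
        raise ZeroDivisionError("determinant is zero: no unique solution")
    x1 = Fraction(b1 * a22 - b2 * a12, det)
    x2 = Fraction(b2 * a11 - b1 * a21, det)
    return x1, x2

x1, x2 = solve_2x2(2, 1, 1, 3, 5, 10)
print(x1, x2)  # 1 3
```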

Ex. 1: (The determinant of a matrix of order 2)

$$\begin{vmatrix} 2 & -3 \\ 1 & 2 \end{vmatrix} = 2(2) - 1(-3) = 4 + 3 = 7$$

$$\begin{vmatrix} 2 & 1 \\ 4 & 2 \end{vmatrix} = 2(2) - 4(1) = 4 - 4 = 0$$

$$\begin{vmatrix} 0 & 3 \\ 2 & 4 \end{vmatrix} = 0(4) - 2(3) = 0 - 6 = -6$$

Note: The determinant of a matrix can be positive, zero, or negative

1.5 Determinants

The following properties are true for determinants of any order.

1. If every element of a row (column) is zero, e.g., $\begin{vmatrix} 1 & 2 \\ 0 & 0 \end{vmatrix} = 1 \times 0 - 2 \times 0 = 0$, then |A| = 0.

2. |A^T| = |A| (determinant of a matrix = that of its transpose)

3. |AB| = |A||B|
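The order-2 determinant formula and the three properties can be spot-checked on small matrices. A minimal sketch for the 2 × 2 case only; the helper names are our own, and a numerical check on a few matrices is of course not a proof.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix: a11*a22 - a21*a12."""
    (a11, a12), (a21, a22) = m
    return a11 * a22 - a21 * a12

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

A = [[2, -3], [1, 2]]
B = [[0, 3], [2, 4]]
Z = [[1, 2], [0, 0]]  # has a zero row

print(det2(A), det2(B), det2(Z))                 # 7 -6 0
print(det2(transpose(A)) == det2(A))             # True
print(det2(matmul(A, B)) == det2(A) * det2(B))   # True
```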

Eigenvalues and Eigenvectors

Ax = λx, with x ≠ 0

Equivalently (A − λI)x = 0, so the inverse of A − λI should not exist?

det(A − λI) = 0.

Fact: A and transpose(A) have the same eigenvalues. Why?
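For a 2 × 2 matrix, det(A − λI) = 0 is the quadratic λ² − (trace)λ + det = 0, so the eigenvalues can be computed directly. A minimal sketch restricted to real eigenvalues; the function name and the sample matrix are our own.

```python
import math

def eigenvalues_2x2(a):
    """Real eigenvalues of a 2 x 2 matrix from det(A - lambda*I) = 0."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    disc = trace * trace - 4 * det  # discriminant of the characteristic polynomial
    if disc < 0:
        raise ValueError("eigenvalues are complex")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)

A = [[4, 1], [2, 3]]
At = [[4, 2], [1, 3]]  # transpose of A

print(eigenvalues_2x2(A))   # (2.0, 5.0)
print(eigenvalues_2x2(At))  # (2.0, 5.0)
```

A and its transpose share trace and determinant, which is why the two outputs agree in this 2 × 2 case.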

Task of search engines

Crawl

Build indices so that one can search keywords efficiently.

Rate the importance of pages.

One example is the simple algorithm named PageRank.

The basic idea

Mimic democracy!

Use the brains of all people collectively for the ranking.

What’s wrong with counting backlinks?

Should page 1 be ranked above page 4?


Voting using backlinks?

But then we don’t want an individual to cast more than one vote?

Normalize?


Normalized Voting?


Link Matrix (for the given web):


Most important node = 1?
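The slide's figure did not survive in this transcript, so the sketch below uses a hypothetical four-page web (an assumption, not necessarily the web on the slide). It builds the link matrix by normalized voting: page j splits its single vote equally among the pages it links to, which fills column j.

```python
from fractions import Fraction

def link_matrix(out_links):
    """Entry [i][j] is 1/(number of links on page j) if page j links to page i, else 0."""
    pages = sorted(out_links)
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    a = [[Fraction(0)] * n for _ in range(n)]
    for src, targets in out_links.items():
        for dst in targets:
            a[index[dst]][index[src]] = Fraction(1, len(targets))
    return a

# Hypothetical web: page -> pages it links to.
web = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
A = link_matrix(web)

# Candidate importance scores; they satisfy A x = x and sum to 1.
x = [Fraction(12, 31), Fraction(4, 31), Fraction(9, 31), Fraction(6, 31)]
Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
print(Ax == x)  # True
```

In this assumed web, page 1 gets the highest score (12/31) even though page 3 has more backlinks, because page 3 gives its entire vote to page 1.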

Definition

A square matrix is called column stochastic if all of its entries are non-negative and the entries in each column sum to 1.

Lemma: Every column stochastic matrix has 1 as an eigenvalue.

Proof: A and A’ = transpose of A have the same eigenvalues. Why?
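One way to fill in the proof, offered as a sketch rather than the lecture's own argument. Let e be the column vector of all ones. The columns of A sum to 1, so the rows of A^T sum to 1, and the transpose fact follows from |A^T| = |A|:

```latex
\begin{aligned}
(A^T e)_j &= \sum_{i=1}^{n} a_{ij} = 1 = e_j
  \quad\Longrightarrow\quad A^T e = e, \\
\det(A^T - \lambda I) &= \det\big((A - \lambda I)^T\big) = \det(A - \lambda I).
\end{aligned}
```

The first line shows 1 is an eigenvalue of A^T; the second shows A and A^T have the same characteristic polynomial, so 1 is also an eigenvalue of A.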

Two shortcomings

Nonunique Rankings.

Dangling nodes: Nodes with no outgoing edges. The matrix is no longer column stochastic. Can we transform it into one easily?


Nonunique Rankings

Not clear: Which linear combination should we pick for the ranking?
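A disconnected web shows the problem. The four-page web below (pages 1 and 2 link only to each other, as do pages 3 and 4) is a hypothetical example of our own; every mix of the two "sub-web" rankings is again a valid score vector.

```python
from fractions import Fraction as F

# Link matrix of the hypothetical web 1 <-> 2, 3 <-> 4.
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

def apply(m, x):
    """Matrix-vector product m x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

x = [F(1, 2), F(1, 2), 0, 0]  # ranks only the first sub-web
y = [0, 0, F(1, 2), F(1, 2)]  # ranks only the second sub-web

def mix(t):
    """Convex combination t*x + (1 - t)*y, which still sums to 1."""
    return [t * a + (1 - t) * b for a, b in zip(x, y)]

for t in (F(0), F(1, 4), F(1)):
    v = mix(t)
    print(v == apply(A, v))  # True each time: A v = v for every mix
```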


Modification of the Link Matrix

The value of m used by Google (1998) was 0.15.

For any m between 0 and 1, M is column stochastic.

M can be used to compute unambiguous importance scores (in the absence of dangling nodes).

For m = 1, the only normalized eigenvector with eigenvalue 1 is ?
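The formula for M is missing from this transcript. The sketch below assumes the usual form M = (1 − m)A + mS, where S is the n × n matrix with every entry 1/n; both that formula and the four-page link matrix A are assumptions made for illustration.

```python
from fractions import Fraction as F

def modified_matrix(a, m):
    """Assumed form M = (1 - m) * A + m * S, with S[i][j] = 1/n."""
    n = len(a)
    return [[(1 - m) * a[i][j] + m / n for j in range(n)] for i in range(n)]

# Link matrix of a hypothetical 4-page web (column stochastic).
A = [[0,       0,       1, F(1, 2)],
     [F(1, 3), 0,       0, 0],
     [F(1, 3), F(1, 2), 0, F(1, 2)],
     [F(1, 3), F(1, 2), 0, 0]]

M = modified_matrix(A, F(15, 100))
col_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]
print(col_sums == [1, 1, 1, 1])  # True: M is column stochastic

# With m = 1 every entry of M is 1/n, so M maps any vector whose entries
# sum to 1 to the uniform vector (1/n, ..., 1/n).
S = modified_matrix(A, F(1))
print(S[0][0], S[3][2])  # 1/4 1/4
```

Under this assumed form, every entry of M is at least m/n > 0, which is what rules out the disconnected-web ambiguity.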


Example 1

For our first example graph, m = 0.15.

Example 2

Still, m = 0.15.

Towards the proof

For real numbers

Proof by Contradiction? -> Let x be an eigenvector with mixed signs for the eigenvalue 1.


A punchline


The Algorithm (aka Power Method)
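The slide's statement of the algorithm is not in this transcript, so here is a generic power-method sketch: start from the uniform vector, repeatedly multiply by M, renormalize, and stop when the vector no longer changes. The four-page link matrix, the form M = (1 − m)A + mS, and the tolerance are all assumptions for illustration.

```python
def modified_matrix(a, m=0.15):
    """Assumed form M = (1 - m) * A + m * S, with S[i][j] = 1/n."""
    n = len(a)
    return [[(1 - m) * a[i][j] + m / n for j in range(n)] for i in range(n)]

def power_method(mat, tol=1e-12, max_iter=1000):
    """Iterate x <- M x (rescaled to sum 1) until the 1-norm change drops below tol."""
    n = len(mat)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [sum(mat[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        y = [v / total for v in y]
        if sum(abs(u - v) for u, v in zip(x, y)) < tol:
            return y
        x = y
    return x

# Link matrix of a hypothetical 4-page web.
A = [[0,     0,   1, 1 / 2],
     [1 / 3, 0,   0, 0],
     [1 / 3, 1 / 2, 0, 1 / 2],
     [1 / 3, 1 / 2, 0, 0]]

M = modified_matrix(A, m=0.15)
scores = power_method(M)
print([round(s, 3) for s in scores])  # roughly [0.368, 0.142, 0.288, 0.202]
```

Only matrix-vector products are needed, which is the appeal of the method for a web-sized matrix.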


c ?


One last lemma…


Why does it converge?


The main theorem

For figure 2:


First Example

Do we need any modifications to A?


Calculations


Another Example

Random Surfer Model

The 85-15 Rule:

Assume that 85 percent of the time the random surfer clicks a random link on the current page (each link chosen with equal probability).

15 percent of the time the random surfer goes directly to a random page (all pages on the web chosen with equal probability).
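The 85-15 rule can be simulated directly. A minimal sketch on a hypothetical four-page web (our own example); the step count, the seed, and the choice to jump to a random page from a page with no links are all assumptions.

```python
import random

def simulate_surfer(out_links, steps=200_000, follow=0.85, seed=0):
    """Fraction of steps the random surfer spends on each page."""
    rng = random.Random(seed)
    pages = sorted(out_links)
    visits = dict.fromkeys(pages, 0)
    page = rng.choice(pages)
    for _ in range(steps):
        if rng.random() < follow and out_links[page]:
            page = rng.choice(out_links[page])  # click a random link
        else:
            page = rng.choice(pages)            # jump to a random page
        visits[page] += 1
    return {p: visits[p] / steps for p in pages}

# Hypothetical web: page -> pages it links to.
web = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}
freq = simulate_surfer(web)
print(freq)
```

For a long enough walk the visit frequencies should approach the importance scores obtained from the modified link matrix with m = 0.15.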

Random Surfer Model

Cons

No one chooses links or pages with equal probability.

There is no real potential to surf directly to each page on the web.

The 85-15 (or any fixed) breakdown is just a guess. Back Button? Bookmarks?

Despite these flaws, the model is good enough that we have learnt a great deal about the web using it.

Related stuff to explore

Random walks and Markov Chains.

Random Graph construction using Random walks.

Absorbing Markov Chains.

Ranking with not too many similar items at the top.

Dynamical Systems point of view.

Equilibrium or Stationary Distributions.

Rate of convergence.

Perron-Frobenius Theorem.

Intentional Surfer model.

Markov Chain Slides:

http://www.math.dartmouth.edu/archive/m20x06/public_html/Lecture13.pdf

http://www.math.dartmouth.edu/archive/m20x06/public_html/Lecture14.pdf

http://www.math.dartmouth.edu/archive/m20x06/public_html/Lecture15.pdf

Homework 1

Implementation: Parse Wikipedia pages and find PageRanks of the top 1000 pages of the given input. (TBA)

Theory: Solve exercises in the given paper. (Online)

There are 24 questions in total in the paper (including subproblems marked with a filled disc; for example, problem 6 has 3 subproblems).

Pick the first two characters of your fsu.edu email address. Example: “pk” for [email protected]. (all lowercase)

Represent in hex: “706B” = x = Your hex number goes here.

Calculate f1 = ((x mod 3D) mod 18) + 1

Calculate f2 = (f1 + 12) mod 18

Solve those two exercises in the paper. Write the problems you solve (including problem numbers) and the solution in LaTeX. Submit.
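The exercise-number calculation can be scripted. This sketch assumes that 3D, 18 and 12 are all hexadecimal (61, 24 and 18 in decimal), which fits the 24 questions in the paper; if the instructor meant something else, adjust the constants. The function name is our own.

```python
def exercise_numbers(prefix):
    """Map the first two email characters to (f1, f2), reading all constants as hex."""
    x = int(prefix.encode("ascii").hex(), 16)  # "pk" -> 0x706B
    f1 = ((x % 0x3D) % 0x18) + 1
    f2 = (f1 + 0x12) % 0x18
    return f1, f2

print(exercise_numbers("pk"))  # (1, 19)
```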