Web Markov Skeleton Processes and their Applications

Zhi-Ming Ma, 18 April 2011, BNU

Email: mazm@amt.ac.cn http://www.amt.ac.cn/member/mazhiming/index.html

• Y. Liu, Z. M. Ma, C. Zhou: Web Markov Skeleton Processes and Their Applications, to appear in Tohoku Math. J.

• Y. Liu, Z. M. Ma, C. Zhou: Further Study on Web Markov Skeleton Processes

Web Markov Skeleton Process

Markov Chain

conditionally independent given

Define by:

WMSP

Simple WMSP:

Many simple WMSPs are non-Markov processes.

[LMZ2011a,b]

Mirror Semi-Markov Process

A mirror semi-Markov process is not a Markov skeleton process in Hou's sense, i.e. it does not satisfy

Time-Homogeneous WMSP

right-continuous, piecewise-constant functions

Stability of Time-Homogeneous WMSP

Theorem [LMZ2011a,b]

for all

WMSP

Multivariate Point Process associated with WMSP

Let

Consequently where

Define

We can prove that

where

Why is it called a Web Markov Skeleton Process?

How can Google rank 1,950,000 pages in 0.19 seconds?

Web page Ranking

Importance Ranking

Relevance Ranking

HITS: 1998, Jon Kleinberg, Cornell University

PageRank: 1998, Sergey Brin and Larry Page, Stanford University

The first major improvement in the history of Web search engines

Ranking Web pages by the mean frequency of visiting pages

From a probabilistic point of view, PageRank is the stationary distribution of a Markov chain.
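Reading a page's rank as its long-run visit frequency can be checked numerically. The sketch below is a minimal illustration under stated assumptions: a small hypothetical two-page chain `P`, a Monte Carlo simulation of the surfer, and an eigenvector solve for the stationary distribution. The helper names `visit_frequencies` and `stationary_distribution` are hypothetical and do not come from the slides.

```python
import numpy as np

def visit_frequencies(P, steps=100_000, start=0, seed=0):
    """Simulate a discrete-time Markov chain with transition matrix P and
    return the fraction of steps spent in each state (empirical visit frequency)."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(np.asarray(P, dtype=float), axis=1)  # row-wise CDFs for sampling
    n = cum.shape[0]
    counts = np.zeros(n)
    state = start
    for u in rng.random(steps):
        counts[state] += 1
        # Inverse-CDF sampling of the next state; min() guards against round-off in the last CDF entry.
        state = min(int(np.searchsorted(cum[state], u, side="right")), n - 1)
    return counts / steps

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 via the left eigenvector for eigenvalue 1."""
    vals, vecs = np.linalg.eig(np.asarray(P, dtype=float).T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()  # normalising also fixes the arbitrary sign of the eigenvector
```

For `P = [[0.9, 0.1], [0.5, 0.5]]` the stationary distribution is (5/6, 1/6), and `visit_frequencies(P)` should land close to it; the gap shrinks as `steps` grows.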

PageRank, a ranking algorithm used by the Google search engine (Sergey Brin and Larry Page, Stanford University, 1998).

Markov chain describing surfing behavior

Web surfers usually have two basic ways to access web pages:

1. with probability α, they visit a web page by clicking a hyperlink.

2. with probability 1-α, they visit a web page by typing its URL address.

where

More generally, we may consider a personalized distribution:

PageRank is defined as the stationary distribution:

By the strong ergodic theorem, PageRank equals the long-run mean frequency of visiting each page.
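The formulas on these slides were lost in extraction. The sketch below assumes the common formulation P̄ = αP + (1 - α)·1vᵀ, where P follows a uniformly chosen hyperlink, v is the uniform or personalized distribution used when a URL is typed, and the PageRank vector π solves π = πP̄; it computes π by power iteration. The function name `pagerank`, the default α = 0.85, and the rule that a page without out-links jumps according to v are assumptions of this sketch, not details taken from the slides.

```python
import numpy as np

def pagerank(adj, alpha=0.85, v=None, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary distribution of the surfing chain
    P_bar = alpha * P + (1 - alpha) * 1 v^T  (assumed formulation).

    adj[i][j] = 1 if page i links to page j; v is the (personalized)
    distribution used when the surfer types a URL (uniform by default)."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    out_deg = adj.sum(axis=1)
    # Hyperlink matrix P: follow one of the out-links uniformly at random.
    # Assumption: a page with no out-links ("dangling") jumps according to v.
    safe_deg = np.where(out_deg > 0, out_deg, 1.0)
    P = np.where(out_deg[:, None] > 0, adj / safe_deg[:, None], v)
    # Mix hyperlink clicks (prob. alpha) with typed URLs drawn from v (prob. 1 - alpha).
    P_bar = alpha * P + (1 - alpha) * np.outer(np.ones(n), v)
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = pi @ P_bar
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi
```

For a three-page cycle every page gets 1/3, and with α = 0 the result is v itself, which is a quick sanity check on the two surfing modes.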

Weak points of PageRank

• It uses only the static web graph structure.

• It reflects only the will of web managers and ignores the will of users, e.g. the staying time of users on a web page.

• It cannot effectively resist spam and junk pages.

Data Mining

Browsing Process

• Markov property

• Time-homogeneity

Computation of the Stationary Distribution

– Stationary distribution:

– is the mean of the staying time on page i. The more important a page is, the longer users stay on it.

– is the mean of the first re-visit time at page i. The more important a page is, the shorter the re-visit time and the higher the visit frequency.
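The two bullets above are consistent with the standard renewal identity for an ergodic continuous-time Markov chain, π_i = m_i / μ_i, where m_i is the mean staying time on page i and μ_i is the mean time between successive entries into i (one stay plus the time away). The slide's own symbols were lost, so `m_i` and `mu_i` are labels chosen here. The sketch below is a numerical check under assumptions, not the authors' computation: it simulates a hypothetical two-page chain with generator `Q` and compares m_i / μ_i with the observed share of time spent on page i. The function names are hypothetical.

```python
import numpy as np

def simulate_ctmc(Q, t_max, start=0, seed=0):
    """Simulate a continuous-time Markov chain with generator Q until time t_max.
    Returns the sequence of visited pages and the staying time of each visit."""
    rng = np.random.default_rng(seed)
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    states, stays = [], []
    state, t = start, 0.0
    while t < t_max:
        jump = Q[state].copy()
        jump[state] = 0.0                   # off-diagonal jump rates q_ij
        rate = jump.sum()                   # total rate q_i (assumed > 0)
        stay = rng.exponential(1.0 / rate)  # exponential staying time with mean 1/q_i
        states.append(state)
        stays.append(stay)
        t += stay
        state = int(rng.choice(n, p=jump / rate))
    return np.array(states), np.array(stays)

def staying_and_revisit_means(states, stays, i):
    """Estimate the mean staying time m_i and the mean re-visit time mu_i of page i.
    mu_i is measured between successive entries into i (one stay plus the time away)."""
    entries = np.concatenate(([0.0], np.cumsum(stays)[:-1]))  # entry time of each visit
    visits = np.flatnonzero(states == i)
    m_i = stays[visits].mean()
    mu_i = np.diff(entries[visits]).mean()
    return m_i, mu_i
```

For `Q = [[-1, 1], [2, -2]]` the exact values are m_0 = 1, μ_0 = 1.5 and π_0 = 2/3, so both m_0 / μ_0 and `stays[states == 0].sum() / stays.sum()` should come out near 2/3.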

P(t)

• Properties of the Q-process:

– The jumping probability is conditionally independent of the jumping time:

– Embedded Markov chain: the chain of visited pages is a Markov chain with the transition probability matrix
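For a Q-process on finitely many pages with a conservative generator Q (rows summing to zero, assumed here), the staying time at page i is exponential with rate q_i = -Q[i, i], and the next page is chosen with probability q_ij / q_i regardless of how long the stay lasted; this is the conditional independence stated above. The minimal sketch below extracts both ingredients; the name `embedded_chain` is hypothetical, and absorbing pages are deliberately not handled.

```python
import numpy as np

def embedded_chain(Q):
    """Split a Q-process generator into its embedded (jump) chain and staying times.

    Returns P_tilde with P_tilde[i, j] = q_ij / q_i for j != i (zero diagonal)
    and the mean staying times m_i = 1 / q_i, where q_i = -Q[i, i] > 0."""
    Q = np.asarray(Q, dtype=float)
    q = -np.diag(Q)                       # total jump rate out of each page
    if np.any(q <= 0):
        raise ValueError("absorbing states (q_i = 0) are not handled in this sketch")
    P_tilde = Q / q[:, None]              # q_ij / q_i off the diagonal
    np.fill_diagonal(P_tilde, 0.0)        # a jump always leaves the current page
    return P_tilde, 1.0 / q
```

For `Q = [[-3, 1, 2], [4, -4, 0], [1, 1, -2]]` the first row of the embedded chain is (0, 1/3, 2/3) and the mean staying times are (1/3, 1/4, 1/2).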

– is the stationary distribution of

– The stationary distribution of the discrete model is easy to compute:

• Power method for

• Log data for
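The two computational ingredients above can be combined as follows: take the stationary distribution π̃ of the embedded chain, weight it by the mean staying times m_i, and renormalize, π_i = π̃_i m_i / Σ_j π̃_j m_j. In the slides the m_i come from log data; here they are supplied as hypothetical numbers. The sketch uses power iteration as the slide says, but on the lazy matrix (I + P̃)/2, an assumption added here because embedded chains have a zero diagonal and can be periodic, in which case plain power iteration oscillates. The same weighting gives the long-run time fraction of a semi-Markov process with general staying-time distributions. Function names are hypothetical.

```python
import numpy as np

def embedded_stationary(P_tilde, tol=1e-12, max_iter=100_000):
    """Power method for the stationary distribution of the embedded chain.
    Iterates the lazy matrix (I + P_tilde) / 2, which has the same stationary
    distribution as P_tilde but is aperiodic."""
    P = np.asarray(P_tilde, dtype=float)
    n = P.shape[0]
    lazy = 0.5 * (np.eye(n) + P)
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = pi @ lazy
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

def browse_rank(P_tilde, mean_stay):
    """Continuous-time stationary distribution: pi_i proportional to pi_tilde_i * m_i."""
    w = embedded_stationary(P_tilde) * np.asarray(mean_stay, dtype=float)
    return w / w.sum()
```

With the embedded chain [[0, 1/3, 2/3], [1, 0, 0], [1/2, 1/2, 0]] and mean staying times (1/3, 1/4, 1/2), π̃ = (3/7, 2/7, 2/7) and the weighted result is (0.4, 0.2, 0.4).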

BrowseRank: Letting Web Users Vote for Page Importance

Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang, Zhiming Ma, Shuyuan He, and Hang Li

July 23, 2008, Singapore: the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Best student paper!

• BrowseRank: the next PageRank, says Microsoft

• Browsing processes will be a basic mathematical tool in Internet information retrieval.

Beyond:

--General framework of browsing processes?

--How about inhomogeneous processes?

--Marked point process

--Mobile Web: not really Markovian

ExtBrowseRank and semi-Markov processes

[10] B. Gao, T. Liu, Z. M. Ma, T. Wang, and H. Li: A General Markov Framework for Page Importance Computation, in Proceedings of CIKM 2009.

[11] B. Gao, T. Liu, Y. Liu, T. Wang, Z. M. Ma, and H. Li: Page Importance Computation Based on Markov Processes, to appear in Information Retrieval. Online first: http://www.springerlink.com/content/7mr7526x21671131

Web Markov Skeleton Process

Thank you!

The statistical properties of a time-homogeneous mirror semi-Markov process are completely determined by:

Reconstruction of Mirror Semi-Markov Processes

We can construct

such that

Given:

Theorem [LMZ2011b]

uniformly

Write

is expressed as

[LMZ2011b]
