Web Markov Skeleton Processes and their Applications
Zhi-Ming Ma 18 April, 2011, BNU.
Email: [email protected] http://www.amt.ac.cn/member/mazhiming/index.html
• Y. Liu, Z. M. Ma, C. Zhou: Web Markov Skeleton Processes and
Their Applications, to appear in Tohoku Math J.
• Y. Liu, Z. M. Ma, C. Zhou:
Further Study on Web Markov Skeleton Processes
Web Markov Skeleton Process
Markov Chain
conditionally independent given
Define by :
WMSP
Simple WMSP:
Many simple WMSPs are Non-Markov Processes
[LMZ2011a,b]
Mirror Semi-Markov Process
A mirror semi-Markov process is not a Markov skeleton process in Hou's sense, i.e. it does not satisfy
Time Homogeneous WMSP
right continuous, piecewise constant functions
Stability of Time homogeneous WMSP
Theorem [LMZ 2011a,b]
for all
WMSP
Multivariate Point Process associated with WMSP
Let
Consequently where
Define
We can prove that
where
Why is it called a Web Markov Skeleton Process?
How can Google rank 1,950,000 pages
in 0.19 seconds?
Web Page Ranking
Importance Ranking
Relevance Ranking
HITS: 1998, Jon Kleinberg, Cornell University
PageRank: 1998, Sergey Brin and Larry Page, Stanford University
The first major improvement
in the history of Web search engines
科学时报.pdf (Science Times)
Ranking Web pages by the mean frequency of visiting pages
From probabilistic point of view,
PageRank is the stationary distribution of a Markov chain.
PageRank is a ranking algorithm used by the Google search engine.
1998, Sergey Brin and Larry Page, Stanford University
Markov chain describing surfing behavior
Web surfers usually have two basic ways to access web pages:
1. with probability α, they visit a web page by clicking a hyperlink.
2. with probability 1-α, they visit a web page by inputting its URL address.
where
More generally, we may consider a personalized distribution:
PageRank is defined as the stationary distribution:
By the strong ergodic theorem: mean frequency of visiting pages
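The surfing model above can be sketched as a power iteration. This is a minimal illustration, not the deck's own formulation: the input format `links`, the value `alpha=0.85`, and the uniform jump for pages without hyperlinks are all assumptions made for the example.

```python
def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power method for the stationary distribution of the surfing chain.

    links[i] lists the pages that page i links to (hypothetical input format).
    With probability alpha the surfer clicks a hyperlink; with probability
    1 - alpha the surfer types a URL chosen uniformly at random.
    """
    n = len(links)
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Every page receives the "type a URL" mass (1 - alpha) / n.
        new_rank = [(1.0 - alpha) / n] * n
        for i, outgoing in enumerate(links):
            if outgoing:
                share = alpha * rank[i] / len(outgoing)
                for j in outgoing:
                    new_rank[j] += share
            else:
                # Assumption: a page without hyperlinks jumps uniformly.
                share = alpha * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy web graph: page 2 is linked from pages 0, 1 and 3; nothing links to page 3.
links = [[1, 2], [2], [0], [2]]
ranks = pagerank(links)
```

In this toy graph page 3 has no incoming hyperlinks, so it is reached only by typing its URL and its rank is exactly (1 - alpha) / n.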
Weak points of PageRank
• Uses only the static web graph structure.
• Reflects only the will of web managers and ignores the will of users, e.g. the staying time of users on a web page.
• Cannot effectively resist spam and junk pages.
BrowseRankSIGIR.ppt
Data Mining
Browsing Process
• Markov property
• Time-homogeneity
Computation of the Stationary Distribution
– Stationary distribution:
– is the mean of the staying time on page i.
The more important a page is, the longer users stay on it.
– is the mean of the first re-visit time at page i. The more important a page is, the smaller the re-visit time is, and the larger the visit frequency is.
P(t)
• Properties of the Q-process:
– The jumping probability is conditionally independent of the jumping time:
– Embedded Markov chain: is a Markov chain with the transition probability matrix
Computation of the Stationary Distribution
– is the stationary distribution of
– The stationary distribution of the discrete model is easy to compute
• Power method for
• Log data for
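The two-step computation outlined above can be sketched as follows. The sketch relies on a standard fact for continuous-time Markov processes: the stationary distribution is proportional to the stationary distribution of the embedded chain multiplied by the mean staying time in each state. The names `embedded_stationary`, `continuous_stationary`, `P` and `mean_stay` are hypothetical, and the numbers are made up; in practice the mean staying times would be estimated from log data.

```python
def embedded_stationary(P, tol=1e-12, max_iter=10000):
    """Power method for the stationary distribution of the embedded chain.

    P is a row-stochastic transition matrix given as a list of rows.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        change = sum(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if change < tol:
            break
    return pi


def continuous_stationary(pi_embedded, mean_stay):
    """Weight each page by its mean staying time, then renormalize."""
    weights = [p * m for p, m in zip(pi_embedded, mean_stay)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical embedded chain on three pages (zero diagonal: every jump
# leaves the current page) and made-up mean staying times.
P = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
mean_stay = [1.0, 3.0, 9.0]

pi_embedded = embedded_stationary(P)
pi = continuous_stationary(pi_embedded, mean_stay)
```

Here page 0 is visited most often by the embedded chain, yet page 2 ends up with the largest share of time because users stay on it longest, which is the effect the staying-time argument describes.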
BrowseRank: Letting Web Users Vote for Page Importance
Yuting Liu, Bin Gao, Tie-Yan Liu, Ying Zhang,
Zhiming Ma, Shuyuan He, and Hang Li
July 23, 2008, Singapore: the 31st Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval.
Best student paper!
• BrowseRank is the next PageRank,
says Microsoft
• jerbrowser.wmv
• Browsing Processes will be a
Basic Mathematical Tool in
Internet Information Retrieval
Beyond:
--General framework of Browsing Processes?
--How about inhomogeneous processes?
--Marked point process
--Mobile Web: not really Markovian
ExtBrowseRank and semi-Markov processes
[10] B. Gao, T. Liu, Z. M. Ma, T. Wang, and H. Li:
A General Markov Framework for Page Importance Computation, in Proceedings of CIKM 2009.
[11] B. Gao, T. Liu, Y. Liu, T. Wang, Z. M. Ma, and H. Li:
Page Importance Computation Based on Markov Processes, to appear in Information Retrieval,
online first: http://www.springerlink.com/content/7mr7526x21671131
Web Markov Skeleton Process
Thank you!
The statistical properties of a time-homogeneous mirror semi-Markov process are completely determined by:
Reconstruction of Mirror Semi-Markov Processes
We can construct
such that
Given: ,
,
Theorem [LMZ 2011b]
uniformly
Write
is expressed as
[LMZ2011b]