NETWORK ANALYSIS BASED UPON THE RENYI' ENTROPIES OF THE ASSOCIATED MARKOV MONOID TRANSFORMATION. USC CAS IMI Summer School on Network Science May 2013 Joseph E. Johnson, PhD Distinguished Professor Emeritus Physics Department USC May 23, 2013 © 2013

NETWORK ANALYSIS

BASED UPON THE RENYI' ENTROPIES OF THE

ASSOCIATED MARKOV MONOID TRANSFORMATION.

USC CAS IMI Summer School on

Network Science May 2013

Joseph E. Johnson, PhD Distinguished Professor Emeritus

Physics Department USC

May 23, 2013 © 2013

Networks Defined

• A network is a set of points 1, 2, … (nodes) with connections among pairs of nodes.
• C_ij is called the connection, connectivity, or adjacency matrix and defines the network.
• C_ij gives the strength of the connection between nodes i and j (a real non-negative number).
• C_ij ≥ 0 for i ≠ j: one does not normally consider a negative connectivity value.
• C_ii is not defined: the connection of a thing to itself does not have meaning.
• Without a defined diagonal, C is relatively incomplete mathematically.
• There is no meaning to an eigenvalue analysis without a diagonal.
• C can be extremely large, yet many of the C values are zero (a “sparse matrix”).
• C_ij may or may not be symmetric.

θ(C_ij − d)

• We define a “graph” to be a network where the C values are 1 or 0.
• Thus things are either connected or not, and the information is not very “rich”.
• This is a special, somewhat degenerate case of a network.
• A network of real numbers contains a trillion times the information of a graph.
• All of our network results are automatically applicable to graphs.
• Analysis, visualization, and comparison of topologies are very difficult.
• The same topology and structure can be realized in a vast number of indistinguishable ways.
• One of the fundamental problems is that there is no natural order to the nodes, giving rise to N! different C matrices that describe the exact same network.
• Thus N! different arrangements of the nodes give different C matrices.
• Consequently it is extremely difficult to compare networks.
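The N! ambiguity can be made concrete with a small sketch. The 3-node matrix is a hypothetical example and `relabel` is an illustrative helper, not something from the slides; the diagonal is held at 0 only as a placeholder, since C_ii is undefined.

```python
from itertools import permutations

# Hypothetical 3-node network; the diagonal is a placeholder (C_ii is undefined).
C = [[0, 1, 2],
     [3, 0, 4],
     [5, 8, 0]]

def relabel(C, perm):
    """Return the connection matrix after renaming node i as perm[i]."""
    n = len(C)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = C[i][j]
    return out

n = len(C)
# Every ordering of the nodes yields a matrix that describes the same network.
variants = {tuple(map(tuple, relabel(C, p))) for p in permutations(range(n))}
print(len(variants), "distinct matrices for one", n, "node network")
```

Because all off-diagonal weights in this example are different, every one of the 3! orderings produces a different matrix.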

Problems & Objectives

• How do we normally understand very complex physical systems?
• Consider the sound made by an instrument.
• The wave f(t) is a function we seek to understand.
• We can expand f(t) usefully using Fourier analysis.
• That is because columns of air and strings have standing waves that are multiples of the basic frequency.
• Each of these frequencies has a different amplitude, but the series consists of smaller and smaller terms.
• This allows one to see the dominant system features first, then increasingly see features of lesser importance.
• In complex physical phenomena, we seek expansions in some useful series of approximations.
• Often these are orthogonal functions, with terms that provide increasing accuracy of representation.

• We would like to have something similar for networks:
• A more solid mathematical foundation for networks generally.
• Metrics for expansion of topologies similar to Fourier analysis for sounds.
• Means to compare two topologies and thus also a change in a topology over time.
• Means for classifying topologies.
• Perhaps even a metric of “distance” between two topologies.
• Perhaps even a structure that would support a dynamical theory of network evolution.
• To describe what I have found, I need to give some brief background material.

Lie Groups and Lie Algebras in Physics

• A group is a set of elements A, B, … and a product with (a) closure AB = C, (b) associativity A(BC) = (AB)C, (c) an identity I with IA = AI = A, and (d) an inverse with A A⁻¹ = A⁻¹ A = I.
• But groups like the rotations R(θ) = (cos θ, −sin θ; sin θ, cos θ) have an infinite number of elements.
• Sophus Lie (1890s) used an exponentiation of an infinitesimal transformation.
• R(ε) = (cos ε, −sin ε; sin ε, cos ε) ≈ (1, −ε; ε, 1) = I + ε (0, −1; 1, 0) = I + εL, so R(θ) = e^(θL).
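The exponentiation step can be checked numerically. The sketch below sums the power series for exp(θL) with L = (0, −1; 1, 0) and compares it with the rotation matrix; `mat_exp` is an illustrative truncated series written for this note, not a library routine, and θ = 0.7 is arbitrary.

```python
import math

def mat_mul(A, B):
    """Plain matrix product of two square lists-of-lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    """Matrix exponential by its power series, truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # holds A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

theta = 0.7                                    # arbitrary angle in radians
L = [[0.0, -1.0], [1.0, 0.0]]                  # generator of plane rotations
R = mat_exp([[theta * x for x in row] for row in L])
expected = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
print(R)
print(expected)
```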

• Let a linear vector space (LVS) be defined by A + B = C and aA = D.
• An LVS with a metric A·B = Σ A_i B_i = |A||B| cos θ is a metric space.
• An LVS L = Σ a_i L_i with an antisymmetric product [L_i, L_j] = c_ijk L_k that satisfies the Jacobi identity is a Lie algebra.
• We normally seek a subgroup of GL(n, R) by requiring that something is invariant:
• x² + y² = r², c²t² − r², Ψ*Ψ = 1, translations, affine, conformal, or other requirements.
• All of relativity and quantum theory can be founded on Lie algebras and groups.
• This procedure also makes the concept of system invariants more obvious.

• If our reality is based upon the foundations of physics, and those laws state that the state of everything is described by a vector, then how does the concept of a network, whose state is described by a connection matrix, arise?

The General Linear Lie Group

• Consider the group of all continuous linear transformations GL(n,R).
• Seek continuous transformations M that preserve Σ x_i (Markov type).
• These are motions on a hyperplane perpendicular to (1, 1, 1, …, 1); <1|x> is invariant.
• One can show: GL(n,R) = Scaling Algebra (S) + Markov Type Algebra (M).
• But M can take one from positive to negative values (not true Markov).
• Choose a Lie basis L^ij with a 1 at the ij position and a −1 at the jj position:
• In 2 dimensions: L^12 = (0, 1; 0, −1) and L^21 = (−1, 0; 1, 0).
• The Lie algebra is then all elements of the form L = λ_ij L^ij (summed over i ≠ j).
• The commutator closes and the Jacobi identity is satisfied.
• (Note that each column of λ_ij L^ij sums to 0.)
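A minimal sketch of this basis, assuming 0-based indices and the helper names `basis`, `commutator`, and `column_sums` (all illustrative). The n² − n basis matrices span exactly the matrices whose columns sum to zero, so a commutator with zero column sums lies back in the algebra; the check below relies on that observation.

```python
def basis(n, i, j):
    """L^{ij}: +1 at row i, column j and -1 at row j, column j (0-based, i != j)."""
    L = [[0] * n for _ in range(n)]
    L[i][j] = 1
    L[j][j] = -1
    return L

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = mat_mul(A, B), mat_mul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

def column_sums(A):
    return [sum(row[j] for row in A) for j in range(len(A))]

# The two generators quoted for 2 dimensions.
L12 = basis(2, 0, 1)
L21 = basis(2, 1, 0)

# Closure check in 3 dimensions: every commutator keeps zero column sums.
n = 3
generators = [basis(n, i, j) for i in range(n) for j in range(n) if i != j]
closed = all(column_sums(commutator(A, B)) == [0] * n
             for A in generators for B in generators)
print(L12, L21, len(generators), closed)
```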

• The Lie group is then M(λ) = exp(λ_ij L^ij), which leaves Σ x_i invariant in x' = M x.
• With λ_ij non-negative, one stays in the positive hyperquadrant.
• That is required because all probabilities must be non-negative.
• But the inverse is lost, so this is a Markov monoid (a group without an inverse).
• The inverse transformations would take some states to negative probabilities.
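The loss of the inverse can be seen numerically. The sketch below assumes an illustrative 3×3 generator with zero column sums and non-negative off-diagonal entries, and uses a truncated power series (`mat_exp`, written for this note) for the exponential. Both the forward transformation exp(sL) and the formal inverse exp(−sL) conserve Σ x_i, but only the forward one is non-negative.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Matrix exponential by its power series, truncated after `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Illustrative generator: non-negative off-diagonals, every column sums to 0.
L = [[-8.0, 1.0, 2.0],
     [3.0, -9.0, 4.0],
     [5.0, 8.0, -6.0]]
s = 0.1

M = mat_exp([[s * x for x in row] for row in L])        # forward (Markov)
M_inv = mat_exp([[-s * x for x in row] for row in L])   # formal inverse

x = [10.0, 0.0, 5.0]                                    # arbitrary starting state
x_new = [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]

print(sum(x), sum(x_new))                    # the total is conserved
print(min(min(row) for row in M))            # non-negative
print(min(min(row) for row in M_inv))        # negative: not a Markov matrix
```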

The Lie Markov Monoid (MM)

• M(λ) = exp(λ_ij L^ij) gives all continuous Markov transformations that are continuously connected to the identity.
• These transformations work by giving a non-negative fraction of one x_j to another x_i.
• This is a “rob Peter to pay Paul” transformation.
• Valid Markov transformations represent irreversible diffusion and have no inverse.
• There are (n² − n) L^ij, one for each off-diagonal element (i, j).
• Note that each diagonal element is the negative sum of the off-diagonal elements of that column.
• The rest of GL(n,R) is spanned by the scaling group S(n) = exp(λ_ii L^ii),
• where each L^ii has a 1 at the ii position and 0 elsewhere: an n-parameter Abelian Lie group that scales the axes.

Networks are 1 to 1 with MM

• Consider that any network C_ij defines λ = C with L = λ_ij L^ij, where M = exp(λ_ij L^ij).
• Simply set λ_ij = C_ij for the off-diagonal terms, and the L^ij automatically define the diagonals.
• For example, in three dimensions, if C = (0, 1, 2; 3, 0, 4; 5, 8, 0) then L =
    −8    1    2
     3   −9    4
     5    8   −6
• Then M = e^(sL) is a true Markov transformation, where s is a continuous value.
• Thus every network C defines a Lie algebra element that generates a one-parameter Markov monoid transformation.
• One can study networks by studying the associated MM.
• The MM models any network as a transformation on a vector that gives a set of “entity flows” that conserve the imaginary entity ($, energy, water, probability, charge, ...).
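The construction of L from C is mechanical, as the following sketch shows for the three-node example. The function name `generator_from_network` is hypothetical; any diagonal values in the input are ignored, since C_ii is undefined.

```python
def generator_from_network(C):
    """Copy the off-diagonal entries of C and set each diagonal entry to the
    negative of the sum of the other entries in its column."""
    n = len(C)
    L = [[C[i][j] if i != j else 0 for j in range(n)] for i in range(n)]
    for j in range(n):
        L[j][j] = -sum(L[i][j] for i in range(n) if i != j)
    return L

C = [[0, 1, 2],
     [3, 0, 4],
     [5, 8, 0]]
L = generator_from_network(C)
for row in L:
    print(row)
```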

• An eigenvalue/eigenvector analysis can be done on the MM.
• The eigenvalues give the exponential rates of decrease of the associated eigenvectors, which are linear combinations of nodes.
• These are similar to the normal modes of oscillation for coupled harmonic oscillators.
• All are less than 1 (in magnitude) except for a single eigenvalue of 1 that gives equilibrium.
• It is possible to have complex eigenvalues that correspond to network ‘cycles’.
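The equilibrium eigenvector (eigenvalue 1) can be found without a full eigen-decomposition by applying M repeatedly to any starting state. This power-iteration sketch is a swapped-in technique for illustration, not necessarily how the author computes it; it assumes the three-node generator used in these slides, the first-order form M = I + sL, and s = 0.01.

```python
# Three-node generator (columns sum to zero) and first-order Markov matrix.
L = [[-8.0, 1.0, 2.0],
     [3.0, -9.0, 4.0],
     [5.0, 8.0, -6.0]]
s = 0.01
n = len(L)
M = [[(1.0 if i == j else 0.0) + s * L[i][j] for j in range(n)] for i in range(n)]

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

x = [1.0, 0.0, 0.0]            # all of the conserved quantity starts on node 1
for _ in range(2000):          # decaying modes shrink by a factor < 1 per step
    x = apply(M, x)

residual = max(abs(v) for v in apply(L, x))   # L x = 0 at equilibrium
print(x, sum(x), residual)
```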

M Supports Entropy Metrics

• The expansion of M = exp(λ_ij L^ij) gives the degrees of separation.
• Each term in the expansion maintains the Markov property: M = I + λL + (1/2)(λL)² + …
• But it would seem we have not made real progress:
• The model of conserved flows is intuitive but does not solve things.
• C and the associated L and M are often too large for a practical eigenvalue analysis.
• The MM Lie algebra has no Casimir operators and does not offer any deeper insight.
• However, the columns of M are non-negative and sum to unity.
• Thus the M_ij can be thought of as probabilities and thus support a definition of entropy.

• We define the entropy of each column as the Rényi entropy R_a.
• R^c_(j,a,b) = (1/(1 − a)) log2( Σ_i (M^(b)_ij)^a ) is defined for each column j, for Rényi order a, and for b degrees of separation (M^(b) being M expanded to b degrees of separation).
• The same can be defined for rows as for columns.
• These entropies measure the order/disorder structure of the network “flows”.
• The column and row entropies measure the incoming and outgoing flows for each node.
• They “distill” the information about the underlying topology.
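A minimal sketch of the column and row entropies, assuming a Markov matrix stored as a list of rows. The function names and the `base` parameter are illustrative; order a = 1 (the Shannon limit) is excluded because the formula divides by 1 − a.

```python
import math

def renyi_entropy(p, a, base=2.0):
    """Renyi entropy of order a (a != 1) of a non-negative vector p."""
    if a == 1:
        raise ValueError("order 1 is the Shannon limit and needs a separate formula")
    return math.log(sum(x ** a for x in p), base) / (1.0 - a)

def column_entropies(M, a, base=2.0):
    """One entropy per column j of M."""
    n = len(M)
    return [renyi_entropy([M[i][j] for i in range(n)], a, base)
            for j in range(len(M[0]))]

def row_entropies(M, a, base=2.0):
    """One entropy per row i of M."""
    return [renyi_entropy(row, a, base) for row in M]

# Sanity checks on simple vectors: a uniform distribution over 4 outcomes
# carries 2 bits at any order, a certain outcome carries 0 bits.
print(renyi_entropy([0.25, 0.25, 0.25, 0.25], 2))
print(renyi_entropy([1.0, 0.0, 0.0], 2))
```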

Sorted Entropies Distill The Topology

• There is no natural order for the nodes of a network, making it combinatorially impossible to compare network topologies or to monitor a topology over time.
• But if we sort the Rényi entropy values in order, we get an entropy spectrum,
• AND we get a unique node ordering.
• The entropy spectral curves distill the topology of the network.
• By choosing both row and column Rényi entropies of multiple orders with multiple degrees of separation, the network can be expanded in curves of ‘diminishing’ importance.
• Thus we propose that these entropy spectral curves provide a powerful representation of the underlying topology and provide practical tools for comparing networks as well as monitoring a network over time.
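The point of sorting is that the spectrum no longer depends on how the nodes happen to be numbered. The sketch below checks this on the three-node example, assuming the first-order form M = I + sL with s = 0.01 and second-order column entropies; `sorted_column_r2` and `relabel` are illustrative names.

```python
import math

def sorted_column_r2(C, s):
    """Sorted second-order Renyi column entropies of M = I + s L built from C."""
    n = len(C)
    L = [[float(C[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        L[j][j] = -sum(L[i][j] for i in range(n) if i != j)
    M = [[(1.0 if i == j else 0.0) + s * L[i][j] for j in range(n)]
         for i in range(n)]
    return sorted(-math.log2(sum(M[i][j] ** 2 for i in range(n)))
                  for j in range(n))

def relabel(C, perm):
    """Connection matrix after renaming node i as perm[i]."""
    n = len(C)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = C[i][j]
    return out

C = [[0, 1, 2], [3, 0, 4], [5, 8, 0]]
spectrum = sorted_column_r2(C, 0.01)
spectrum_relabelled = sorted_column_r2(relabel(C, (2, 0, 1)), 0.01)
print(spectrum)
print(spectrum_relabelled)    # the same curve, up to rounding
```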

The Topology is Expanded as a Series of Entropies and Degrees of Separation

• The collection of these multiple Rényi entropies for rows and columns of the Markov matrix, using Markov matrices that are expanded with different degrees of separation, can provide a complete description of any network (except for a possible degeneracy).

• We now have the expansion in a meaningful set of metrics for the topology

• Two networks that have different entropy spectra must be themselves different (when the s parameter is standardized with other factors).

Scalar product of D_AB for two networks (or for two times) -> metric

• By forming R^(c/r)_(a,b,i) for each of two networks, one can take the difference of the column (or row) values (for multiple a and b) squared:
• D_AB = √( Σ_(a,b,i) (R^(A,c)_(a,b,i) − R^(B,c)_(a,b,i))² ) as a measure of the “distance between the topologies A and B”.
• Here the a, b, i summation covers the Rényi orders a and the degrees of separation b, while i ranges over the nodes for each.
• Of course one could get a distance between the two topologies A and B by just taking the column entropies for the first degree of separation and using only the differences of the second-order Rényi entropies.
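The simplest case described above (column entropies, first degree of separation, second order only) can be sketched as follows. It assumes that the spectra are sorted before they are differenced and that both networks have the same number of nodes; network B is a hypothetical variant of the three-node example, and the function names are illustrative. For several orders or degrees of separation, the corresponding spectra would simply be concatenated before the call.

```python
import math

def sorted_column_r2(C, s):
    """Sorted second-order Renyi column entropies of M = I + s L built from C."""
    n = len(C)
    L = [[float(C[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        L[j][j] = -sum(L[i][j] for i in range(n) if i != j)
    M = [[(1.0 if i == j else 0.0) + s * L[i][j] for j in range(n)]
         for i in range(n)]
    return sorted(-math.log2(sum(M[i][j] ** 2 for i in range(n)))
                  for j in range(n))

def spectral_distance(spec_a, spec_b):
    """Square root of the summed squared differences of two entropy spectra."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))

A = [[0, 1, 2], [3, 0, 4], [5, 8, 0]]
B = [[0, 1, 2], [3, 0, 4], [5, 1, 0]]   # hypothetical: one connection weakened
s = 0.01
d_ab = spectral_distance(sorted_column_r2(A, s), sorted_column_r2(B, s))
d_aa = spectral_distance(sorted_column_r2(A, s), sorted_column_r2(A, s))
print(d_ab, d_aa)
```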

• By standardizing the parameter λ and the normalizations of the C and thus the L matrix, D_AB provides a distance metric between topologies.
• Then one could describe a given topology in terms of its distance from a set of “reference topologies”.

A Network of Networks

• Let us next imagine a network C_AB defined in terms of nodes which are themselves all possible networks, using a connection matrix C_AB = exp(−(D_AB)²).
• Note that this function (properly normalized) is maximal when the networks are close.
• This is exactly what we want for the C_AB connection matrix.
• This could provide a framework for partially classifying networks.
• A given network would be defined (partially) by its distance from a set of reference networks, much as we use the positions of masses in our three-dimensional space for regular objects.
• Having the “position” of a network that is changing now allows us to define its “velocity”.
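As a small illustration of C_AB = exp(−(D_AB)²), the sketch below turns a table of pairwise distances into a connection matrix. The distances are made-up numbers and `network_of_networks` is a hypothetical name; the diagonal entries come out as 1 and would be ignored, as for any connection matrix.

```python
import math

def network_of_networks(D):
    """Connection matrix C_AB = exp(-D_AB^2) from a symmetric distance table."""
    n = len(D)
    return [[math.exp(-D[a][b] ** 2) for b in range(n)] for a in range(n)]

# Made-up pairwise distances among three networks.
D = [[0.0, 0.1, 2.0],
     [0.1, 0.0, 1.5],
     [2.0, 1.5, 0.0]]
C_AB = network_of_networks(D)
for row in C_AB:
    print([round(v, 4) for v in row])   # close networks are strongly connected
```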

Past Results

• We have been able to identify system attacks and abuse using the Rényi entropy for network data from a SC university network. Using our software, clicking on the aberrant parts of the entropy spectra identifies the specific offending nodes.

• We have studied applications of this to networks of the U.S. economy (utilizing the Leontief Input-Output matrices from the BEA).

• We have also studied social networks among Physics students who work with each other – a heterogeneous network where some nodes are students, some are their grades, and some are parameters of their efforts.

• We are now investigating what we believe to be a new type of information network that has not previously been investigated.

The Algorithm – How it all works:

• Consider that any network C_ij defines λ = C with L = λ_ij L^ij, where M = exp(s λ_ij L^ij).
• Simply set λ_ij = C_ij for the off-diagonal terms, and the L^ij automatically define the diagonals.
• For example, in three dimensions, if C = (0, 1, 2; 3, 0, 4; 5, 8, 0) then L =
    −8    1    2
     3   −9    4
     5    8   −6
• But the C matrix is only defined to within an overall factor, and we need to make this matrix smaller so that when the first term is added to the unit matrix I in the expansion, the diagonal terms are not negative. Let's take the parameter s = 0.01 to get
• M = I + 0.01 L =
    0.92   0.01   0.02
    0.03   0.91   0.04
    0.05   0.08   0.94
• Notice that each column sums to unity and that all elements are non-negative.
• It is straightforward to add any number of further terms (or not) to include other degrees of separation in the M matrix.
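The first-order step, together with the largest s that keeps the diagonal non-negative, can be sketched as below. `first_order_markov` and `max_step` are illustrative names; the bound 1/max|L_jj| follows directly from requiring 1 + s L_jj ≥ 0 for every j.

```python
def first_order_markov(L, s):
    """M = I + s L (first degree of separation only)."""
    n = len(L)
    return [[(1.0 if i == j else 0.0) + s * L[i][j] for j in range(n)]
            for i in range(n)]

def max_step(L):
    """Largest s for which every diagonal element of I + s L stays >= 0."""
    return 1.0 / max(-L[j][j] for j in range(len(L)))

L = [[-8, 1, 2],
     [3, -9, 4],
     [5, 8, -6]]
M = first_order_markov(L, 0.01)
for row in M:
    print([round(v, 2) for v in row])
print("column sums:", [round(sum(row[j] for row in M), 12) for j in range(3)])
print("largest admissible s:", max_step(L))    # 1/9 for this example
```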

• Then the R2 values for each node (column) are R2 = −log(M_11² + M_21² + M_31²) for the first column, and likewise for the next two columns. The values listed here correspond to a base-10 logarithm; base-2 values are larger by the constant factor log2(10) ≈ 3.3219, which leaves the shape of the spectrum unchanged.
• 0.070683   0.078522   0.052762
• These are then sorted in numerical order to form a non-decreasing (or non-increasing, if you prefer) curve.
• That is the entropy spectral curve for the three-node network.
• When s = 0.02 (twice as great), the entropies are:
• 0.14315   0.155896   0.106571
• At higher values of s, one can include the s² term in the expansion (second-order degree of separation).
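The listed numbers can be reproduced with a few lines. The sketch assumes the first-order M = I + sL for the three-node example; the `base` argument is an illustrative parameter, and base 10 is the choice that matches the tabulated values.

```python
import math

L = [[-8, 1, 2],
     [3, -9, 4],
     [5, 8, -6]]

def column_r2(L, s, base=10.0):
    """Second-order Renyi entropy of each column of M = I + s L."""
    n = len(L)
    M = [[(1.0 if i == j else 0.0) + s * L[i][j] for j in range(n)]
         for i in range(n)]
    return [-math.log(sum(M[i][j] ** 2 for i in range(n)), base)
            for j in range(n)]

r2_s001 = column_r2(L, 0.01)
r2_s002 = column_r2(L, 0.02)
print([round(v, 6) for v in r2_s001])     # compare with the s = 0.01 values
print([round(v, 6) for v in r2_s002])     # compare with the s = 0.02 values
print(sorted(r2_s001))                    # the sorted entropy spectrum
print([round(v, 6) for v in column_r2(L, 0.01, base=2.0)])   # same curve in bits
```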

Here is a Larger C Matrix with a Full Calculation and Entropy Plot Using Excel:

Given a C matrix, determine the L matrix, then compute the M matrix using different s values.

The s value is here set and used only with the first degree of separation (first power of sL).

L matrix: this is the C network matrix with the diagonal set to the negative of the sum of each column.

    -20     1     1     2     1
      3   -12     3     2     2
      4     3   -15     2     3
      5     4     5   -10     4
      8     4     6     4   -10

s = 0.02. This is one example. s must be set so that 1 + s × (the most negative diagonal element of L) > 0.

M = I + sL. This matrix is a Markov matrix, as one can easily verify.

    0.60   0.02   0.02   0.04   0.02
    0.06   0.76   0.06   0.04   0.04
    0.08   0.06   0.70   0.04   0.06
    0.10   0.08   0.10   0.80   0.08
    0.16   0.08   0.12   0.08   0.80

Renyi 2 = −log (sum of squares of the elements in that column); the values below correspond to a base-10 logarithm.

    0.391902   0.225921   0.285335   0.186286   0.185752

Renyi 2 sorted in numerical order:

    0.391902   0.285335   0.225921   0.186286   0.185752

Sorted Renyi 2 values for several s values:

    Rank   s = 0.001   s = 0.005   s = 0.01   s = 0.02   s = 0.03
    1      0.01750     0.08999     0.18615    0.391902   0.58071
    2      0.01310     0.06682     0.13692    0.285335   0.43604
    3      0.01047     0.05323     0.10869    0.225921   0.34930
    4      0.00872     0.04422     0.09002    0.186286   0.28802
    5      0.00872     0.04419     0.08991    0.185752   0.28651
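The spreadsheet calculation can be mirrored in a short script. This is a sketch under the same assumptions as the worked example (first degree of separation only, second-order entropy, base-10 logarithm, values sorted from largest to smallest); `sorted_r2` is an illustrative name.

```python
import math

L = [[-20, 1, 1, 2, 1],
     [3, -12, 3, 2, 2],
     [4, 3, -15, 2, 3],
     [5, 4, 5, -10, 4],
     [8, 4, 6, 4, -10]]

def sorted_r2(L, s, base=10.0):
    """Sorted (largest first) second-order column entropies of M = I + s L."""
    n = len(L)
    M = [[(1.0 if i == j else 0.0) + s * L[i][j] for j in range(n)]
         for i in range(n)]
    values = [-math.log(sum(M[i][j] ** 2 for i in range(n)), base)
              for j in range(n)]
    return sorted(values, reverse=True)

s_values = (0.001, 0.005, 0.01, 0.02, 0.03)
table = {s: sorted_r2(L, s) for s in s_values}

print("rank  " + "  ".join(f"s={s:<7}" for s in s_values))
for rank in range(len(L)):
    print(f"{rank + 1:<5} " + "  ".join(f"{table[s][rank]:.5f}  " for s in s_values))
```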

Entropy curves for that example

[Plot: sorted Renyi 2 entropy (vertical axis, 0 to 0.7) against sorted node rank (horizontal axis, 0.5 to 5.5), one curve for each of s = 0.001, 0.005, 0.01, 0.02, and 0.03]

Higher Order Degrees of Separation

• This example only included the first degree of separation.
• By that we mean the first power of L in the expansion M = e^(sL).
• Each higher order will alter the entropy curve slightly.
• The amount of change will depend upon the s value.
• The M matrix will be Markov no matter how many or few degrees of separation one includes (provided s is small enough to keep all elements non-negative).
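To see the effect of one more degree of separation, the sketch below adds the (sL)²/2 term to the first-order matrix for the three-node generator used earlier, with s = 0.02 chosen for illustration. The columns still sum to unity, because every power of L has zero column sums, and for this s all entries stay non-negative.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

L = [[-8.0, 1.0, 2.0],
     [3.0, -9.0, 4.0],
     [5.0, 8.0, -6.0]]
s = 0.02
n = len(L)

sL = [[s * x for x in row] for row in L]
sL2 = mat_mul(sL, sL)

# First and second degrees of separation.
M1 = [[(1.0 if i == j else 0.0) + sL[i][j] for j in range(n)] for i in range(n)]
M2 = [[M1[i][j] + 0.5 * sL2[i][j] for j in range(n)] for i in range(n)]

col_sums = [sum(M2[i][j] for i in range(n)) for j in range(n)]
print(col_sums)                          # each is 1 up to rounding
print(min(min(row) for row in M2))       # still non-negative for this s
```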

Networks of Different Numbers of Nodes

• One of the fundamental aspects of networks in the real world is that the number of nodes is constantly changing.
• In social networks one is constantly adding new members and losing other members. The same phenomenon is true in all networks, thus presenting a difficult problem, namely:

• How can we compare network topologies as they acquire new nodes and lose some current nodes?

• We suggest the answer is to still use the comparison of the Renyi entropies as above but to smooth the values and scale the spectral curve between two fixed points – say 0 and 1.

• Thus a million node network would have the sorted entropies plotted every 1E-6 of the distance between 0 and 1.

• Then one is still comparing the entropy spectral shape as nodes come and go.
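One way to put spectra of different lengths on a common footing is sketched below: the sorted values are placed at evenly spaced positions between 0 and 1 and linearly interpolated onto a fixed grid. Linear interpolation is an assumption made for illustration (the slides only ask for smoothing and rescaling), and `resample_spectrum` is a hypothetical name.

```python
def resample_spectrum(values, points):
    """Interpolate a sorted spectrum onto `points` evenly spaced positions in [0, 1]."""
    n = len(values)
    if n == 1 or points == 1:
        return [float(values[0])] * points
    out = []
    for k in range(points):
        pos = k / (points - 1) * (n - 1)     # fractional index into `values`
        lo = min(int(pos), n - 2)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out

# Two hypothetical sorted spectra from networks with 3 and 6 nodes.
small = [0.05, 0.07, 0.08]
large = [0.04, 0.05, 0.06, 0.07, 0.08, 0.09]

grid = 5
a = resample_spectrum(small, grid)
b = resample_spectrum(large, grid)
print(a)
print(b)     # both curves now have 5 comparable points
```

After resampling, the two curves have the same length and can be compared point by point as nodes come and go.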

Open Questions of Interest

• Can one construct a dynamical theory based upon the lowest-order changes in the components of the expansion (low orders of Rényi entropy and low degrees of separation) using entropy spectral differences?
• Identify invariants as real networks evolve over time.
• What criteria will allow one to prove that the lower-order expansions contain the dominant components of the network and allow us to ignore higher-order terms?
• How are the expansion parameter (s value) and trace(L) values best standardized across different networks for these calculations?
• How are the Rényi row and Rényi column entropy spectra different for practical networks?
• How can fundamental topologies (random, scale free, rings, cliques, clusters, trees, etc.) be best classified using these entropy spectra?

Thank You for your interest.

• The author encourages partnering on research with him using these concepts.

• This research was supported by grants from DARPA with the author as PI.
• The IP of this research is protected by U.S. Patent 8271412B2, owned by USC.
• Email: [email protected]
• Web Site: www.ExaSphere.com
• Office: 803-777-6431
• Physics Department PSC Room 405
• University of South Carolina
• Columbia SC 29208

Some Plots From Previous Research

• Plots show (a) the entropy (by column or row) sorted by magnitude against (b) time and (c) sorted node number.
• We have performed the computations in real time here for internet traffic to identify anomalies.
• The software allows one to click on an entropy curve anomaly and identify the associated node in spite of the constantly changing sort order.

Column Entropy - Order 1

Column Entropy - Order 2

Column Entropy - Order 3

Order 1 – Order 2 Difference Plot

Order 2 – Order 3 Difference Plot

Column/Row Ratio Plot (Symmetry Plot) – Order 2