extracting hidden information from knowledge networks sergei maslov brookhaven national laboratory,...

Extracting hidden information from knowledge networks

Sergei Maslov

Brookhaven National Laboratory, New York, USA

Hanse Institute for Advanced Study, March 2002

Outline of the talk

What is a knowledge network and how is it different from an ordinary graph or network?

Knowledge networks on the internet: matching products to customers

Knowledge networks in biology: large ensembles of interacting biomolecules

Empirical study of correlations in the network of interacting proteins

Collaborators: Y.-C. Zhang and K. Sneppen


Networks in complex systems

A network is the backbone of a complex system

It answers the question: who interacts with whom?

Examples:

– Internet and WWW

– Interacting biomolecules (metabolic, physical, regulatory)

– Food webs in ecosystems

– Economics: customers and products; Social: people and their choice of partners


Predicting tastes of customers based on their opinions on products

Each of us has personal tastes

These tastes are sometimes unknown even to ourselves (hidden wants)

Information is contained in our opinions on products

Matchmaking: customers with similar tastes can be used to predict future opinions

The internet makes it possible to do this on a large scale


Types of networks

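[Figure, two panels. "Plain network": readers 1-3 linked to books 1-4. "Knowledge or opinion network": the same readers and books, where each reader carries a vector of tastes, each book carries a vector of features, and each link carries an opinion.]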


Storing opinions

X X X 2 9 ? ?

X X X ? 8 ? 8

X X X ? ? 1 ?

2 ? ? X X X X

9 8 ? X X X X

? ? 1 X X X X

? 8 ? X X X X

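[Figure, two panels. "Matrix of opinions $\Omega_{IJ}$": the matrix above, with readers 1-3 in the first three rows and columns and books 1-4 in the last four; X marks reader-reader and book-book pairs, ? marks an unknown opinion. "Network of opinions": the same data drawn as links between readers and books, labeled by the known opinions.]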
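As a rough illustration of how the matrix on this slide could be stored, the following Python sketch builds the symmetric 7 x 7 array from the five known reader-book opinions. The function name `opinion_matrix`, the 0-based indices and the use of NaN for both the X and ? entries are assumptions made for the example, not something taken from the talk.

```python
import numpy as np

def opinion_matrix(n_readers, n_books, ratings):
    """Build the symmetric (readers + books) x (readers + books) matrix of opinions.

    ratings maps (reader_index, book_index) -> opinion, both 0-based.
    Entries with no opinion (X or ? on the slide) are stored as NaN.
    """
    n = n_readers + n_books
    omega = np.full((n, n), np.nan)
    for (reader, book), value in ratings.items():
        row, col = reader, n_readers + book   # books follow readers in the index
        omega[row, col] = value
        omega[col, row] = value               # stored symmetrically
    return omega

# The five known opinions of the 3-reader, 4-book example.
ratings = {(0, 0): 2, (0, 1): 9, (1, 1): 8, (1, 3): 8, (2, 2): 1}
omega = opinion_matrix(3, 4, ratings)
known = ~np.isnan(omega)                      # mask of known opinions
```

A separate boolean mask such as `known` is convenient later, when only the unknown entries are to be predicted.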


Using correlations to reconstruct customers' tastes

Similar opinions imply similar tastes

Simplest model:

– Readers: $M$-dimensional vector of tastes $\mathbf{r}_I$

– Books: $M$-dimensional vector of features $\mathbf{b}_J$

– Opinions: scalar product $\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J$

[Figure: network of customers 1-3 and books 1-4, with links labeled by the known opinions.]
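The simplest model can be sketched in a few lines of Python: each reader and each book gets a random M-dimensional vector, and every opinion is the scalar product of the two. The sizes, the Gaussian distribution and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_readers, n_books, M = 3, 4, 2                  # illustrative sizes

tastes = rng.standard_normal((n_readers, M))     # row I is the taste vector r_I
features = rng.standard_normal((n_books, M))     # row J is the feature vector b_J

# Opinion of reader I on book J is the scalar product r_I . b_J.
opinions = tastes @ features.T                   # shape (n_readers, n_books)
```

Because every opinion is built from only M hidden components per vertex, the full table of opinions has rank at most M, which is what makes prediction from a partial table possible.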


Loop correlation

[Figure: network of customers and books in which a loop is formed by L known opinions and one unknown opinion.]

Predictive power ~ $1/M^{(L-1)/2}$

One needs many loops to completely freeze the mutual orientation of the vectors


Field Theory Approach

• If all components of vectors are Gaussian and uncorrelated:

• Generating functional is: det(1+i)-M/2

• All irreducible correlations are proportional to $M$

• All loop correlations $\langle \Omega_{12}\,\Omega_{23}\,\Omega_{34} \cdots \Omega_{L1} \rangle = M$

• Since each $\Omega_{IJ} \sim \sqrt{M}$, the sign correlation scales as $M^{-(L-1)/2}$
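The statement that a loop correlation equals M can be checked numerically. The sketch below is a Monte Carlo estimate under two assumptions: the vector components are independent Gaussians with unit variance, and every vertex on the loop carries a single vector, so that each opinion is the scalar product of the vectors at its two ends. The function name and sample size are hypothetical.

```python
import numpy as np

def loop_correlation(M, L, samples=200_000, seed=0):
    """Monte Carlo estimate of the average product of opinions around a loop of L vertices."""
    rng = np.random.default_rng(seed)
    vectors = rng.standard_normal((samples, L, M))   # L vectors per sample
    neighbours = np.roll(vectors, -1, axis=1)        # vector k+1 (wrapping back to 0)
    opinions = np.sum(vectors * neighbours, axis=2)  # scalar products along the loop
    return opinions.prod(axis=1).mean()

# Both estimates should be close to M = 4, up to sampling noise.
triangle = loop_correlation(M=4, L=3)
square = loop_correlation(M=4, L=4)
```

The average stays near M for any loop length, while the typical size of the product grows with L, which is why longer loops carry less predictive power.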


Main parameter: density of edges

The larger the density of edges $p$, the easier the prediction

At $p_1 \approx 1/N$ ($N = N_{\mathrm{readers}} + N_{\mathrm{books}}$) macroscopic prediction becomes possible. Nodes are connected but the vectors $\mathbf{r}_I$, $\mathbf{b}_J$ are not fixed: ordinary percolation threshold

At $p_2 \approx 2M/N > p_1$ all tastes and features ($\mathbf{r}_I$ and $\mathbf{b}_J$) can be uniquely reconstructed: rigidity percolation threshold


Spectral properties of $\Omega$

For $M < N$ the matrix $\Omega_{IJ}$ has $N-M$ zero eigenvalues and $M$ positive ones: $\Omega = R R^{+}$.

Using SVD one can "diagonalize" $R = U D V^{+}$ such that the matrices $V$ and $U$ are orthogonal, $V^{+} V = 1$, $U U^{+} = 1$, and $D$ is diagonal. Then $\Omega = U D^2 U^{+}$.

The amount of information contained in $\Omega$: $NM - M(M-1)/2 \ll N(N-1)/2$, the number of off-diagonal elements
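These spectral statements are easy to verify on a small random example. The sketch below assumes a matrix of opinions of the form R R^T with Gaussian R. It uses NumPy's thin SVD, in which U is N x M with orthonormal columns. All sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 3
R = rng.standard_normal((N, M))          # row I is the hidden vector of vertex I
omega = R @ R.T                          # matrix of opinions

eigenvalues = np.linalg.eigvalsh(omega)  # real eigenvalues, ascending order
n_positive = int(np.sum(eigenvalues > 1e-10))
n_zero = N - n_positive

# Thin SVD: U is N x M, d holds the M singular values of R.
U, d, Vt = np.linalg.svd(R, full_matrices=False)
omega_from_svd = U @ np.diag(d ** 2) @ U.T

information = N * M - M * (M - 1) // 2   # independent parameters in omega
off_diagonal = N * (N - 1) // 2          # number of off-diagonal elements
```

Here `n_positive` equals M and `omega_from_svd` reproduces `omega`, while `information` is already smaller than `off_diagonal` even at this small size.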


Practical recursive algorithm of prediction of unknown opinions

1. Start with $\Omega_0$, where all unknown elements are filled with $\langle \Omega \rangle$ (zero in our case)

2. Diagonalize $\Omega_0$ and keep only the $M$ largest eigenvalues and eigenvectors

3. In the resulting truncated matrix $\Omega'_0$ replace all known elements with their exact values, and return to step 2 with this new matrix
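The three steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code. It assumes a symmetric matrix of opinions built from one hidden vector per vertex, treats the diagonal as unknown, and uses hypothetical names (`truncate`, `predict_opinions`) and illustrative sizes.

```python
import numpy as np

def truncate(matrix, M):
    """Keep only the M largest eigenvalues (and their eigenvectors) of a symmetric matrix."""
    values, vectors = np.linalg.eigh(matrix)   # eigenvalues in ascending order
    top = vectors[:, -M:]
    return (top * values[-M:]) @ top.T

def predict_opinions(known_values, known_mask, M, n_iter=200):
    """Iteratively fill in the unknown entries of a symmetric opinion matrix."""
    current = np.where(known_mask, known_values, 0.0)             # step 1: unknowns set to 0
    for _ in range(n_iter):
        truncated = truncate(current, M)                          # step 2
        current = np.where(known_mask, known_values, truncated)   # step 3
    return current

# Synthetic example: N vertices, M hidden components, a fraction p of pairs known.
rng = np.random.default_rng(2)
N, M, p = 60, 2, 0.6
R = rng.standard_normal((N, M))
omega = R @ R.T

upper = np.triu(rng.random((N, N)) < p, k=1)   # random known pairs, diagonal unknown
known_mask = upper | upper.T
known_values = np.where(known_mask, omega, 0.0)

predicted = predict_opinions(known_values, known_mask, M)
unknown = ~known_mask
rmse_zero_fill = np.sqrt(np.mean(omega[unknown] ** 2))
rmse_predicted = np.sqrt(np.mean((predicted - omega)[unknown] ** 2))
```

Comparing `rmse_predicted` with `rmse_zero_fill` shows how much of the hidden information the iteration recovers at this density of known opinions.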


Convergence of the algorithm

• Above $p_2$ the algorithm converges exponentially to the exact values of the unknown elements

• The rate of convergence scales as $(p - p_2)^2$


Reality check: sources of errors

Customers are not rational! $\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J + \epsilon_{IJ}$ (idiosyncrasy)

Opinions are delivered to the matchmaker through a narrow channel:

– Binary channel $S_{IJ} = \mathrm{sign}(\Omega_{IJ})$: 1 or 0 (liked or not)

– Experience rated on a scale 1 to 5 or 1 to 10 at best

If the number of edges $K$ and the size $N$ are large while $M$ is small, these errors can be reduced


How to determine M?

In real systems M is not fixed: there are always finer and finer details of tastes

Given the number of known opinions $K$, one should choose $M_{\mathrm{eff}} \approx K/(N_{\mathrm{readers}} + N_{\mathrm{books}})$

Since systems are below the second transition $p_2$, tastes should be determined hierarchically
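As a quick arithmetic check of this estimate, the sketch below applies it to the two data sets whose sizes are quoted in this talk. The function name is hypothetical, and using the number of proteins as the vertex count for the protein network is an assumption.

```python
def effective_dimension(n_known_opinions, n_vertices):
    """Rough number of hidden components that the known opinions can support."""
    return n_known_opinions / n_vertices

# Movie ratings: 2811983 ratings, 72916 users and 1628 movies.
m_eff_movies = effective_dimension(2_811_983, 72_916 + 1_628)

# Yeast two-hybrid data: about 4400 interactions among about 6000 proteins.
m_eff_yeast = effective_dimension(4_400, 6_000)
```

The first value comes out a little below 40 and the second is of order 1, in line with the estimates given for these data sets.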


Avoid overfitting

Divide the known votes into training and test sets

Select $M_{\mathrm{eff}}$ so as to avoid overfitting!

[Figure, two panels: "Reasonable fit" and "Overfit".]
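One way to act on this advice is to hold out part of the known opinions, fit the rest for several candidate values of M, and keep the value with the smallest error on the held-out part. The sketch below does this on synthetic noisy data. The rank-M iterative fit, the function names, the noise level and all sizes are assumptions made for the example.

```python
import numpy as np

def fill_in(values, mask, M, n_iter=100):
    """Rank-M iterative completion of a symmetric matrix (unknown entries start at 0)."""
    current = np.where(mask, values, 0.0)
    for _ in range(n_iter):
        w, v = np.linalg.eigh(current)
        truncated = (v[:, -M:] * w[-M:]) @ v[:, -M:].T
        current = np.where(mask, values, truncated)
    return truncated

def select_m_eff(values, known_mask, candidates, test_fraction=0.2, seed=0):
    """Return the candidate M with the smallest error on held-out known opinions."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(np.triu(known_mask, k=1))   # each known pair once
    is_test = rng.random(rows.size) < test_fraction
    test_mask = np.zeros_like(known_mask)
    test_mask[rows[is_test], cols[is_test]] = True
    test_mask = test_mask | test_mask.T
    train_mask = known_mask & ~test_mask
    errors = {}
    for M in candidates:
        fitted = fill_in(values, train_mask, M)
        errors[M] = float(np.sqrt(np.mean((fitted - values)[test_mask] ** 2)))
    return min(errors, key=errors.get), errors

# Synthetic noisy opinions with two true hidden components.
rng = np.random.default_rng(4)
N, true_M = 40, 2
R = rng.standard_normal((N, true_M))
noise = rng.standard_normal((N, N))
noisy = R @ R.T + 0.3 * (noise + noise.T) / 2
upper = np.triu(rng.random((N, N)) < 0.5, k=1)
known_mask = upper | upper.T

best_m, errors = select_m_eff(noisy, known_mask, candidates=range(1, 7))
```

Plotting `errors` against M typically shows the held-out error falling and then rising again once extra components start fitting the noise.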


Knowledge networks in biology

Interacting biomolecules: key and lock principle

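Matrix of interactions (binding energies): $\Omega_{IJ} = \mathbf{k}_I \cdot \mathbf{l}_J + \mathbf{l}_I \cdot \mathbf{k}_J$

Matchmaker (bioinformatics researcher) tries to guess yet unknown interactions based on the pattern of known ones

Many experiments measure $S_{IJ} = \theta(\Omega_{IJ} - \Omega_{th})$

[Figure: two interacting molecules with keys k(1), k(2) and locks l(1), l(2).]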
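A minimal sketch of this key-and-lock model, assuming Gaussian key and lock vectors and reading the measured quantity as a binary indicator of whether the binding energy exceeds a threshold. The sizes, the threshold value and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_molecules, M = 5, 3                            # illustrative sizes
keys = rng.standard_normal((n_molecules, M))     # row I is the key vector k_I
locks = rng.standard_normal((n_molecules, M))    # row I is the lock vector l_I

# Binding energy: key of I in the lock of J plus lock of I around the key of J.
binding = keys @ locks.T + locks @ keys.T

threshold = 1.0                                  # hypothetical detection threshold
detected = (binding > threshold).astype(int)     # 1 = interaction seen, 0 = not seen
```

The matrix `binding` is symmetric by construction, so `detected` describes an undirected interaction network.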


Real systems

Internet commerce: the dataset of opinions on movies collected by the Compaq Systems Research Center:

– 72916 users entered a total of 2811983 numeric ratings (* to *****) for 1628 different movies: Meff~40

– Default set for collaborative filtering research

Biology: table of interactions between yeast proteins from the Ito et al. high-throughput two-hybrid experiment

– 6000 proteins (~3300 have at least one interaction partner) and 4400 known interactions

– Binary (interact or not)

– Meff~1: too small!


Yeast Protein Interaction Network

• Data from T. Ito et al., PNAS (2001)

• Full set contains 4549 interactions among 3278 yeast proteins

• Shown here are only nuclear proteins interacting with at least one other nuclear protein


Correlations in connectivities

Basic design principles of the network can be revealed by comparing the frequency of a pattern in real and random networks

P(k0,k1) – probability that nodes with connectivities k0 and k1 directly interact

Should be normalized by Pr(k0,k1) – the same property in a randomized network such that:

– Each node has the same number of neighbors (connectivity)

– These neighbors are randomly selected

– The whole ensemble of random networks can be generated
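One common way to generate such a randomized network is repeated edge swapping: pick two links at random and exchange their end points, rejecting swaps that would create a self-loop or a duplicate link. Whether this exact procedure was used for the results in this talk is an assumption. The function names and the small example graph are hypothetical.

```python
import random
from collections import Counter

def connectivities(edges):
    """Number of neighbors of every node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts

def randomize_network(edges, n_swaps, seed=0):
    """Rewire an undirected simple graph while keeping every node's connectivity."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c                      # pick one of the two possible swaps
        if a == d or c == b:
            continue                         # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue                         # would create a duplicate link
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5), (5, 6), (2, 6)]
randomized = randomize_network(edges, n_swaps=20)
```

Calling the function with different seeds gives different members of the random ensemble, all with the same connectivity for every node.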


Correlation profile of the protein interaction network

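[Figure, two panels: the ratio $P(k_0,k_1)/P_r(k_0,k_1)$ and the Z-score $Z(k_0,k_1) = (P(k_0,k_1) - P_r(k_0,k_1))/\sigma_r(k_0,k_1)$, where $\sigma_r$ is the standard deviation of $P_r$ over the randomized ensemble.]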
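Given a table of link counts between connectivity classes in the real network and the same table for each member of a randomized ensemble, the two quantities plotted here can be computed as below. This is a sketch with made-up toy numbers. The function name and the array layout are assumptions, and the spread in the denominator is taken to be the standard deviation over the ensemble.

```python
import numpy as np

def correlation_profile(real_counts, random_counts):
    """Ratio and Z-score of link counts relative to a randomized ensemble.

    real_counts:   (k, k) array, links between connectivity classes in the real network
    random_counts: (n_samples, k, k) array, the same table for each randomized network
    """
    real = np.asarray(real_counts, dtype=float)
    ensemble = np.asarray(random_counts, dtype=float)
    mean_r = ensemble.mean(axis=0)
    sigma_r = ensemble.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(mean_r > 0, real / mean_r, np.nan)
        z = np.where(sigma_r > 0, (real - mean_r) / sigma_r, np.nan)
    return ratio, z

# Toy numbers, invented purely to show the call.
real = [[2, 6], [6, 1]]
ensemble = [[[4, 4], [4, 2]], [[6, 2], [2, 2]]]
ratio, z = correlation_profile(real, ensemble)
```

Because the randomization keeps the total number of links fixed, the ratio of counts equals the ratio of probabilities.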


Correlation profile of the internet


What might it mean?

Hubs avoid each other (as in the internet: R. Pastor-Satorras et al., Phys. Rev. Lett. (2001))

Hubs prefer to connect to terminal ends (nodes with low connectivity)

Specificity: the network is organized in modules clustered around individual hubs

Stability: the number of second nearest neighbors is suppressed, so it is harder to propagate deleterious perturbations


Conclusion

Studies of networks are similar to paleontology: learning about an organism from its backbone

You can learn a lot about a complex system from its network !! But not everything…


THE END


Entropy of unknown opinions

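[Figure: entropy of unknown opinions plotted against the density of known opinions p, with the thresholds p1 and p2 marked on the density axis.]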


How to determine p2?

$K$ known elements of an $N \times N$ matrix $\Omega_{IJ} = \mathbf{r}_I \cdot \mathbf{b}_J$ ($N = N_r + N_b$)

Approximately $N \times M$ degrees of freedom (minus $M(M-1)/2$ gauge parameters)

For $K > MN$ all missing elements can be reconstructed, so $p_2 = K_2/(N(N-1)/2) \approx 2M/N$
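The counting argument can be turned into a small calculation. The sketch below takes the number of known opinions at the second threshold to be M times N and compares the resulting density with the approximation 2M/N, alongside the ordinary percolation estimate 1/N. The function names and the example sizes are hypothetical.

```python
def degrees_of_freedom(N, M):
    """N x M vector components minus M(M-1)/2 gauge parameters (rotations)."""
    return N * M - M * (M - 1) // 2

def thresholds(N, M):
    """Densities of known opinions at the two transitions."""
    max_edges = N * (N - 1) / 2
    p1 = 1 / N                    # ordinary percolation: about one link per vertex
    p2 = M * N / max_edges        # rigidity percolation: about M * N known opinions
    return p1, p2

N, M = 1000, 10                   # illustrative sizes
dof = degrees_of_freedom(N, M)
p1, p2 = thresholds(N, M)
approx_p2 = 2 * M / N
```

For large N the exact ratio and the approximation 2M/N differ only by a factor N/(N-1).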


What is a knowledge network?

Undirected graph with N vertices and K edges

Each vertex has a (hidden) M-dimensional vector of tastes/features

Each edge carries a scalar product (opinion) of the vectors on the vertices it connects

The centralized matchmaker is trying to guess the vectors (tastes) based on their scalar products (opinions) and to predict unknown opinions


Versions of knowledge networks

Regular graph: every link is allowed. Example: recommending people to other people according to their areas of interest

Bipartite graph: only links between the two groups are allowed. Example: matching customers to products

Non-reciprocal opinions: each vertex has two vectors $\mathbf{d}_I$, $\mathbf{q}_I$, so that $\Omega_{IJ} = \mathbf{d}_I \cdot \mathbf{q}_J$. Example: a real matchmaker recommending men to women.
