wikimania 2010-illesfarkas-v4

9
Tagging structure in a protein-protein interaction network, a co-authorship network and the (English) Wikipedia Protein s Math co- authors Wikipedia articles Network Nodes + Links Node tags Directed Acyclic Graph Gergely Palla, Illés J. Farka s , Péter Pollner, Imre Derényi, Tamás Vicsek Category tree FIFA World Cup FIFA World Cup Players Classification of co-authored papers Combinator ics Graph theory Biochemical functions Growth Cell growth Eötvös University and Hungarian Academy of Sciences (Budapest, Hungary) Interactions (“MIPS”) Co-authorships (“MathSciNet”) Hyperlinks in Wikipedia CFinder.org

Upload: illes-farkas

Post on 06-Aug-2015

229 views

Category:

Technology


0 download

TRANSCRIPT

Tagging structure in a protein-protein interaction network, a co-authorship network and the (English) Wikipedia

Proteins Math co-authors Wikipedia articles

Network

Nodes + Links

Node tags

DirectedAcyclicGraph

Gergely Palla, Illés J. Farkas, Péter Pollner, Imre Derényi, Tamás Vicsek

Category tree

FIFA World Cup

FIFA World Cup Players

Classification of co-authored papers

Combinatorics

Graph theory

Biochemical functions

Growth

Cell growth

Eötvös University and Hungarian Academy of Sciences (Budapest, Hungary)

Interactions (“MIPS”) Co-authorships (“MathSciNet”) Hyperlinks in Wikipedia

CFinder.org

G. Palla, I. J. Farkas, P. Pollner, I. Derényi, T. Vicsek Fundamental statistical features and self-similar properties of tagged networks New Journal of Physics 10, 123026 (2008) 

http://CFinder.org --> Publications

Full version:

There is no clearly separated group of “most popular tags”Transition between popular and less popular tags is continuous

Portion (x) of all nodes in the network

Probability that a given tag and its descendants label a portion x of all nodes

CFinder.org

There is no clear group of “most heavily tagged nodes”Transition between strongly and less strongly tagged nodes is continuous

Number of tags (n) on a node (protein, author, wiki article) of the network

Probability that a node has n tags

CFinder.org

On the large scale there is no “link saturation” within a topicIn fact, link density within a large topic is almost the same as outside

CFinder.org

Number of links among these nodes

Number of Wikipedia articles selected from those labeled with “Japan” and descendant terms

Maximum possible

Wikipedia average

CFinder.org

AB

C D

E

F G

H

J

Meaningful removal of loops from the category hierarchyHow to achieve tree structure (DAG) with the lowest number of removals

Example (Oct.2007):

Category:Urdu Category:Hindustani

Finding all loops is easy.

But which of their links should be removed?

Goals: - Remove lowest possible number of links- Smallest “damage” to existing category hierarchy

subcategory

subcategory

CFinder.org

Meaningful removal of loops from the category hierarchyHow to achieve tree structure (DAG) with the lowest number of removals

Example (Oct.2007):

(1) Identify “loop subgraph”by iteratively removing nodes with 1 link

(2) Iteratively remove “least important” linksfrom the loop subgraph

(3) Add non-loop links again

AB

C D

E

F G

H

JCategory:Urdu Category:Hindustani

subcategory

subcategory

Finding all loops is easy.

But which of their links should be removed?

Goals: - Remove lowest possible number of links- Smallest “damage” to existing category hierarchy

Gergely Palla Illés Farkas Péter Pollner Imre Derényi Tamás Vicsek

http://CFinder.org -- with network data and free analysis software

We thank forsupport from: