Connected Components in Software Networks

Miloš Savić, Mirjana Ivanović, Miloš Radovanović, Department of Mathematics and Informatics, Faculty of Science, University of Novi Sad

Page 1: Connected Components in Software Networks

Connected Components in Software Networks

Miloš Savić, Mirjana Ivanović, Miloš Radovanović

Department of Mathematics and Informatics, Faculty of Science

University of Novi Sad

Page 2: Connected Components in Software Networks

Content

• Introduction
• Data collection
• Experiments and results
• Conclusions

Page 3: Connected Components in Software Networks

Introduction - software networks -

• Two levels of software complexity:
  - internal complexity of software entities (classes, functions...)
  - structural complexity of dependencies between entities

• Class collaboration networks:
  - nodes: classes/interfaces
  - links: OO relationships

• Static call graphs:
  - nodes: functions/procedures
  - links: call-return relationships

Page 4: Connected Components in Software Networks

Introduction - connected components -

• Connected component:
  - set of mutually reachable nodes

• Giant connected component:
  - contains the vast majority of nodes

• Directed networks:
  - strongly connected components
  - weakly connected components
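
The two kinds of components in a directed network can be illustrated with a minimal sketch. It assumes the network is given as a node list and a list of `(source, target)` links; the function names and the toy graph are hypothetical, and Kosaraju's algorithm is used here only as one standard way to find strongly connected components, not necessarily the procedure used in the study.

```python
from collections import defaultdict


def weakly_connected_components(nodes, edges):
    """Components obtained when link direction is ignored."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: sets of nodes mutually reachable along directed links."""
    forward, backward = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)
    # Pass 1: iterative DFS on the original graph, recording finish order.
    seen, order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    # Pass 2: DFS on the reversed graph in decreasing finish order.
    seen, components = set(), []
    for start in reversed(order):
        if start in seen:
            continue
        seen.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in backward[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy network: a cycle A -> B -> C -> A, a link C -> D, and an isolated node E.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
toy_wccs = weakly_connected_components(toy_nodes, toy_edges)
toy_sccs = strongly_connected_components(toy_nodes, toy_edges)
```

In the toy network, D belongs to the largest weakly connected component but not to the strongly connected one, because no directed path leads from D back to the cycle.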

Page 5: Connected Components in Software Networks

Introduction - theory of complex networks -

• Random graphs:
  - Poisson degree distribution
  - ER model (static + uniform attachment)

• Scale-free networks:
  - power-law degree distribution
  - BA model (growth + preferential attachment)

• Exponential networks:
  - exponential degree distribution
  - Model A (growth + uniform attachment)
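
The difference between the two growing models can be sketched as follows. This is a simplified, undirected illustration under stated assumptions: the network starts from a small clique, every new node attaches to `links_per_node` distinct existing nodes, and the function and parameter names are hypothetical rather than taken from the study.

```python
import random
from collections import Counter


def grow_network(n_nodes, links_per_node, preferential, seed=0):
    """Grow an undirected network one node at a time.

    preferential=True  -> targets chosen proportionally to degree (BA-like)
    preferential=False -> targets chosen uniformly at random (Model A-like)
    """
    rng = random.Random(seed)
    m = links_per_node
    # Assumed seed graph: a clique on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `endpoints` once per incident link, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            if preferential:
                targets.add(rng.choice(endpoints))
            else:
                targets.add(rng.randrange(new))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degree_sequence(edges):
    """Degree of every node that appears in the edge list."""
    counts = Counter(node for edge in edges for node in edge)
    return list(counts.values())


ba_like = grow_network(200, 2, preferential=True)
model_a_like = grow_network(200, 2, preferential=False)
```

Both variants add the same number of links; only the rule for choosing link targets differs, and that rule is what separates a power-law from an exponential degree distribution in the theory summarized above.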

Page 6: Connected Components in Software Networks

Introduction - motivations -

• Model A: test the complementary cumulative in/out/total degree distributions of giant weakly connected components against a power-law and an exponential distribution

• “robust yet fragile”: investigate topological stability of giant weakly connected components

• “hierarchical small-worlds, scale-free networks from optimal design”: determine size of strongly connected components
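
A complementary cumulative degree distribution can be computed from a degree sequence as in the sketch below. The slides do not state which statistical test was used, so the comparison shown here (squared correlation of a straight-line fit on log-log axes versus semi-log axes) is only an assumed, rough stand-in; the convention P(K >= k) and all function names are likewise assumptions.

```python
import math
from collections import Counter


def ccdf(degrees):
    """Return sorted (k, P(K >= k)) pairs for a degree sequence."""
    total = len(degrees)
    counts = Counter(degrees)
    pairs, remaining = [], total
    for k in sorted(counts):
        pairs.append((k, remaining / total))
        remaining -= counts[k]
    return pairs


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def fit_quality(pairs):
    """Return (R^2 for a power law, R^2 for an exponential).

    A power law is a straight line in (log k, log P);
    an exponential is a straight line in (k, log P).
    """
    usable = [(k, p) for k, p in pairs if k > 0 and p > 0]
    ks = [k for k, _ in usable]
    log_ks = [math.log(k) for k in ks]
    log_ps = [math.log(p) for _, p in usable]
    return r_squared(log_ks, log_ps), r_squared(ks, log_ps)


example_ccdf = ccdf([1, 1, 2, 3])
# Synthetic, exactly exponential tail used only to exercise fit_quality.
synthetic = [(k, math.exp(-k / 3)) for k in range(1, 21)]
power_r2, exponential_r2 = fit_quality(synthetic)
```

On the synthetic exponential data the semi-log fit is essentially perfect while the log-log fit is not, which is the kind of contrast the degree-distribution test looks for.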

Page 7: Connected Components in Software Networks

Data collection

• Class collaboration networks:
  - Ant, Tomcat, Lucene, JavaCC, JDK
  - extractor: Yaccne

• Static call graphs:
  - gcc, the kernel component of the Linux kernel
  - extractor: Doxygen + our .dot aggregator

Page 8: Connected Components in Software Networks

Experiments and results - giant weakly connected components -

Network   Largest WCC size [%]   Second-largest WCC size [%]
Linux     93.57                  0.29
GCC       99.67                  0.33
JDK       98.40                  0.05
Ant       97.81                  0.25
Tomcat    94.17                  0.86
Lucene    96.89                  0.56
JavaCC    97.46                  1.26

Comparable networks sampled from the ER, BA and Model A models also contain a GWCC.

Page 9: Connected Components in Software Networks

Experiments and results - degree distribution of GWCCs -

Page 10: Connected Components in Software Networks

Experiments and results - Implications -

• Theoretical implications:
  - a model that can reproduce the connectivity pattern characteristic of software systems

• Related to software engineering:
  - in-degree = degree of class/function reuse
  - out-degree = degree of class/function aggregation

Page 11: Connected Components in Software Networks

Experiments and results - theoretical implications -

• Superposition model (growth + preferential attachment for out-going links + uniform attachment for in-coming links)

Page 12: Connected Components in Software Networks

Experiments and results - Analytical solution of the superposition model -

• Continuum approach: “Mean field theory for scale-free random networks” (Barabási et al., ’99)

$$CCD_{in}(k) = \left( D_{in} / k \right)^{(D_{in} + D_{out}) / D_{in}}$$

$$CCD_{out}(k) = \exp\left( \frac{D_{out} - k}{D_{in}} \right)$$

Din/Dout – number of in-coming/out-going links introduced by each node

Page 13: Connected Components in Software Networks

Experiments and results - Implications related to SE -

• First combinatorial principle of graph theory:
  Avg(reuse) = Avg(aggregation)

  But:
  Dispersion(reuse) → ∞ as N → ∞
  Dispersion(aggregation) ~ Avg(aggregation)^2

• Conclusions:

  1. Software systems exhibit a characteristic scale of code aggregation, but there is no characteristic scale of code reuse.
  2. Highly reused entities tend to be more reused.
  3. Predictability of code reuse and unpredictability of code aggregation as software systems evolve.
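
The equality of the two averages follows because every link contributes exactly one unit of in-degree and one unit of out-degree, while the dispersions are free to differ. A minimal sketch on a hypothetical four-node network (the function name, node names and links are illustrative assumptions only):

```python
from collections import Counter
from statistics import mean, pvariance


def reuse_and_aggregation(nodes, edges):
    """Return (in-degrees, out-degrees) in node order.

    In-degree is read as reuse and out-degree as aggregation,
    following the interpretation given on the slides.
    """
    in_degree = Counter(dst for _, dst in edges)
    out_degree = Counter(src for src, _ in edges)
    reuse = [in_degree[node] for node in nodes]
    aggregation = [out_degree[node] for node in nodes]
    return reuse, aggregation


# Hypothetical network in which D is referenced by every other node.
se_nodes = ["A", "B", "C", "D"]
se_edges = [("A", "D"), ("B", "D"), ("C", "D"), ("A", "B")]
reuse, aggregation = reuse_and_aggregation(se_nodes, se_edges)

avg_reuse, avg_aggregation = mean(reuse), mean(aggregation)
var_reuse, var_aggregation = pvariance(reuse), pvariance(aggregation)
```

Here both averages equal 1, yet the reuse values are spread more widely than the aggregation values because a single node collects most of the incoming links.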

Page 14: Connected Components in Software Networks

Experiments and results - Topological stability of GWCCs -

• Experiments:
  - removal of one node: to check for the existence of articulation points
  - successive removal of highest-degree nodes (preferential removal): to check fragility
  - successive removal of nodes at random: to check robustness

• After each removal, the size of the largest weakly connected component is measured

• fc-pref/fc-rnd: critical fraction of nodes that need to be removed in order to destroy the giant weakly connected component when the preferential/random node removal scheme is applied
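
The removal experiments can be sketched as below. Several details are assumptions, because the slides do not specify them: degrees are recomputed after every removal, ties are broken by node name, and the giant component counts as destroyed once the largest component falls below a hypothetical `threshold` share of the original network size.

```python
import random
from collections import defaultdict


def largest_component_size(adjacency, alive):
    """Size of the largest connected component among the surviving nodes."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        size, stack = 0, [start]
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adjacency[node]:
                if nxt in alive and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def removal_curve(nodes, edges, preferential, seed=0):
    """Largest-component size before any removal and after each removal."""
    rng = random.Random(seed)
    adjacency = defaultdict(set)
    for src, dst in edges:  # link direction is ignored (weak connectivity)
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    alive = set(nodes)
    sizes = [largest_component_size(adjacency, alive)]
    while alive:
        candidates = sorted(alive)
        if preferential:
            # Remove the node with the highest current degree.
            victim = max(candidates, key=lambda n: len(adjacency[n] & alive))
        else:
            victim = rng.choice(candidates)
        alive.remove(victim)
        sizes.append(largest_component_size(adjacency, alive))
    return sizes


def critical_fraction(sizes, threshold=0.5):
    """Smallest removed fraction after which the largest component is
    below `threshold` times the original node count (assumed criterion)."""
    n = len(sizes) - 1
    for removed, size in enumerate(sizes):
        if size < threshold * n:
            return removed / n


# Star network: one hub linked to five leaves.
star_nodes = ["hub", "l1", "l2", "l3", "l4", "l5"]
star_edges = [("hub", leaf) for leaf in star_nodes[1:]]
pref_sizes = removal_curve(star_nodes, star_edges, preferential=True)
rnd_sizes = removal_curve(star_nodes, star_edges, preferential=False)
```

On the star network a single preferential removal (the hub) is enough to break the network into isolated nodes, which is the extreme case of the fragility the experiment is designed to expose.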

Page 15: Connected Components in Software Networks

Experiments and results - Articulation points -

• Software networks contain APs: [2.91% - 15.50%] of network size

• BA model:
  - Dtotal: number of links introduced by each node
  - Dtotal = 1 → num(AP) in the range [31% - 35.4%]
  - Dtotal > 1 → num(AP) = 0

• BAU model:
  - Dtotal is not a constant value but a random variable such that P{Dtotal = 1} > 0
  - This modification does not affect the scale-free properties of the degree distributions and produces APs
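
The single-node removal check for articulation points can be sketched directly from its definition: a node is an articulation point if removing it increases the number of connected components. This brute-force version is an illustration under that definition, with hypothetical names; a linear-time DFS-based algorithm would normally be preferred for large networks.

```python
from collections import defaultdict


def count_components(adjacency, alive):
    """Number of connected components among the surviving nodes."""
    seen, count = set(), 0
    for start in alive:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adjacency[node]:
                if nxt in alive and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count


def articulation_points(nodes, edges):
    """Nodes whose removal increases the number of connected components."""
    adjacency = defaultdict(set)
    for src, dst in edges:  # link direction is ignored
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    alive = set(nodes)
    baseline = count_components(adjacency, alive)
    return {
        node
        for node in nodes
        if count_components(adjacency, alive - {node}) > baseline
    }


# Triangle A-B-C with a pendant node D attached to C.
ap_nodes = ["A", "B", "C", "D"]
ap_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
found_aps = articulation_points(ap_nodes, ap_edges)
```

The pendant node D is attached by a single link, so its only neighbour C is an articulation point. This mirrors why a growth model needs nodes that introduce exactly one link in order to produce APs.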

Page 16: Connected Components in Software Networks

Experiments and results - preferential node removal -

• Software networks are extremely vulnerable:
  fc(software network) < fc(BAU) < fc(EXP) < fc(RND)

Page 17: Connected Components in Software Networks

Experiments and results - random node removal -

• Software networks (except Linux) never lose their GWCCs

• The same holds for comparable networks generated by the theoretical models

• The Linux static call graph is a scale-free network that is sensitive to random errors:
  fc(Linux) < fc(RND) < fc(EXP) < fc(BAU)

• Large real-world networks: fc(RND) < fc(rw-net)

Page 18: Connected Components in Software Networks

Experiments and results - strongly connected components -

• Linux: SCCs as a minor effect

• Other networks: no GSCC, but relatively large SCCs → a topological sort cannot be made → there is no elegant systematic testing strategy

Network   NUM   MAX1 [%]    MAX2 [%]    MAX3 [%]
Linux     2     0.088633    0.044316    -
GCC       8     12.62242    0.652884    0.435256
JDK       48    19.80519    6.926407    2.32684
Ant       21    18.79106    1.839685    1.576873
Tomcat    33    7.918782    2.030457    1.725888
Lucene    8     13.99417    1.166181    0.874636
JavaCC    1     10.38961    -           -
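
Why a strongly connected component with more than one node rules out a topological sort can be seen in a short sketch. Kahn's algorithm is used here as one common way to attempt the sort; the function name and the two toy dependency graphs are hypothetical.

```python
from collections import defaultdict, deque


def topological_order(nodes, edges):
    """Return an order in which every link points forward,
    or None if the network contains a cycle."""
    successors = defaultdict(list)
    in_degree = {node: 0 for node in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1
    # Nodes without incoming links can be placed first.
    ready = deque(node for node in nodes if in_degree[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    # Nodes on a cycle never reach in-degree zero, so they are never placed.
    return order if len(order) == len(nodes) else None


layered = topological_order(["A", "B", "C"], [("A", "B"), ("B", "C")])
cyclic = topological_order(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")])
```

For the layered graph an order exists, so entities can be handled one at a time along it. For the cyclic graph no such order exists, and the entities in the cycle can only be treated together.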

Page 19: Connected Components in Software Networks

The largest strongly connected component in GCC’s giant weakly connected component, containing 116 mutually reachable nodes

Page 20: Connected Components in Software Networks

Conclusions

• Out-degree sequences of software networks can be better modeled with an exponential distribution than with a power law

• Scale-free software networks contain articulation points

• Software networks are extremely vulnerable to the removal of highest degree nodes, and (except Linux) share the same level of robustness as comparable networks generated by theoretical models

Page 21: Connected Components in Software Networks

Conclusions

• Linux static call graph is an interesting and intriguing example of a scale-free network which does not display tolerance against random errors

• Software networks contain relatively large cyclic dependencies: substructures that do not reflect optimal design and hierarchical small-worldliness
