[ieee 2010 ieee international workshop on: business applications of social network analysis (basna)...

Community discovery in a growing model of Social Networks

Sreedhar Bhukya Department of Computer and Information Sciences

University of Hyderabad Hyderabad, India

Email: [email protected]

Abstract— A number of recent studies on social networks are based on characteristics which include assortative mixing, high clustering, short average path lengths, broad degree distributions and the existence of community structure. Here, a model which satisfies all the above characteristics is developed. In addition, this model facilitates interaction between different communities. This model gives very high clustering coefficient by retaining the asymptotically scale-free degree distribution. Here the community structure is raised from a mixture of random attachment and implicit preferential attachment. A model for community discovery also has been discovered where strict community structure has been preserved.

Keywords-component; social networks; communiy network; community discovery.

I. INTRODUCTION A social network is made of nodes that are tied by one or more specific types of relationships. The vertex represents individuals or organizations. Social network has been intensively studied by Social scientists [2,3], for several decades in order to understand local phenomena such as local formation and their dynamics, as well as network wide process, like transmission of information, spreading disease, spreading rumor, sharing ideas etc. Various types of social networks, such as those related to professional collaboration [4,5,6], Internet dating [7], and opinion formation among people are studied. Social network involves Financial, Cultural, Educational, Families, Relations and so on. Social network creates relationship between vertices. Social network includes sociology, basic mathematics and graph theory. The basic mathematics structure for social network is a graph. The main social network properties includes hierarchical community structure [8], small world property [9], power law distribution of vertex degree [17] and the most basic is Barabasi Albert model of scale free networks [10]. The more online social network gains popularity the more scientific community is attracted by the research opportunities that these new fields give. Most popular online social network is Facebook, where user can add friends, send them messages, and update their personal profiles to notify friends about themselves. Essential characteristics for social networks are believed to include assortative mixing [11,12], high clustering, short average path lengths, broad degree distributions[13,14].

This paper address two issues 1. I discuss how to grow a social network with different communities 2. I discuss how to discover a person belonging to a community in a growing social network. Here the highest priority is given to preserve community structure. i.e., once a person joins a community initially, his/her priority will be preserved for ever even though he/she has more contacts in another community. Here in section II network growth algorithm has been discussed. In section III community discovery has been discussed and in section IV obtained results are explained, finally in section V conclusions are given.

Here we have considered an existing model [1] of social networks and developed it. The model is as follows:

The algorithm consists of two growth processes: (1) random attachment (2) implicit preferential attachment resulting from

following edges from the randomly chosen initial contacts.

The existing model lacks in the following two aspects: 1. There is no connection between initial to initial

contacts, if more than one initial contact is chosen. 2. There is no connection between initial contact and its

neighbour of neighbour vertices. These two aspects have considerable effect in the following

case. Let us consider a person contacting a person in a research group for his own purpose and suppose that he/she didn’t get adequate support from that person or from his neighbours, but he may get required support from some friend of friend for his/her initial contact. Then the only way a new person could get help is that his primary contact has to be updated or create a contact with his friend of friend for supporting his new contact and introduce his new contact to his friend of friend. The same thing will happen in our day to day life also. If a person contacts us for some purpose and we are unable to help him, we will try to help him by some contacts of our friends. The extreme case of this nature is that we may try to contact our friend of friend for this purpose. We have implemented the same thing in our new model. In the old model, information about friends only used to be updated, where as in our model information about friend of friend also has been updated. Of course this model creates a complex social network model but, sharing of information or data will be very fast. This fulfils the actual purpose of social

Administrator

Typewritten Text

978-1-4244-9000-4/10/$26.00 ©2010 IEEE

networking in an efficient way with a faster growth rate by keeping the community structure as it is.

II. NOVAL NETWORK GROTH ALGORITHM The algorithm includes three processes (1) Random

attachment (2) Implicit preferential contact with the neighbours of initial contact (3) Contact between the new vertex and a Neighbour of Neighbour of initial contact via the initial contact. The algorithm of the model is as follows [1]:

1. Start with a seed network of N0 vertices 2. Pick an average mr ≥ 1 preferential contact through

someone as initial contact 3. Pick on average mt ≥ 1 neighbors of each secondary contact as tertiary contact. 4. Pick an average ms ≥ 0 neighbors of each initial contact as secondary contact 5. Connect the new vertex to the initial, secondary and tertiary contacts 6. Repeat steps 2–5 until the network has grown to desired (fig.1)

I did analytical calculation for growing network and got expected result for mr , ms and mt. For implementation, any non negative distribution of mr, ms and mt are chosen with these expectations value. The distribution for number of secondary contact has a long tail and it happens that number of connected secondary contacts is higher than then the degree of initial contacts. For any community structure formation, it need edges to connect with the neighbor of initial contacts. Neighbor of neighbor vertices are always connected to new vertices and if initial contact is more than one chosen, the new vertex will form a bridge between the communities. Here I use discrete uniform distributions in the network, usually if one initial contact is chosen than the probability is high, if more than one initial contact is chosen than the probability is less.

Fig 1: Growing process of community network. The new

vertex ‘V’ initially connects through some one as initial contact (say i). Now i, updates its neighbor of neighbor contact list and hence connects to l. ‘V’ connects to ms number of neighbors (say j ,l ) and mt number of neighbor of neighbors of i (say l).

The major advantage of this model is that there is a possibility of establishing a contact between two initially chosen random contacts in a mean time, where as the earlier model strictly prohibits this contact. For example consider the graph in Fig: 2.

Here let us say a new node N1 selects R1, R2 as its random initial contacts which are the members of two exclusive communities. In the earlier model there is no way to establish a contact between R1 and R2 at any point of time in future, but in our model if any newly coming node, let us say Nk, chooses either of R1 or R2 as its random initial contact, then immediately there establishes a contact between R1 and R2 via N1.

Fig. 3 Visualization of a network graph with N= 30 vertices

indicates strong community structure with communities of size clearly visible. Number of each secondary contact from each initial contact n2nd ~ U[0,3) (uniformly distributed between 0 and 3), and initial contact is connecting its neighbor of neighbor vertices U[1-2], Each vertex represents a social actor and each edge between two vertices represents their social interactions. In this model we tried 30 sample vertices and prepare a strong growing network.

IIA. VERTEX DEGREE DISTRIBUTION We derive approximate value for the vertex degree

distribution for growing network model mixing random initial contact, neighbor of neighbor initial contact and neighbor of initial contacts. Power law degree distribution with p (k) ~ kγ with exponent 2<γ<∞ have derived [15-17]. Our model strongly supported to add edges in between new vertex to initial, its neighbor contacts and initial vertex to its neighbor of neighbor vertices to new vertex, in our model degree correlation are not presented, even low degree vertices are also growing into the network community. The rate equation which describes how the degree of a vertex changes on average during one time step of the network growth is constructed. The degree of vertex vi grows in 3 process [1].

1. When a new vertex directly links to vi, since at any time t, there will be on average ~ t vertices. Here we are selecting mr out of them with a probability mr /t.

2. When a vertex links to vi as secondary contact, the selection will give rise to preferential attachment. These

will be mr ms in number (for each mr there will be ms secondary contacts).

3. When a vertex links to vi as tertiary contact, this will also be a random preferential attachment. These will be 2mr ms mt (For each mr there will be ms secondary contacts and for each ms there will be mt tertiary contacts. In total mrmsmt number of edges will be created once between mr and mt and again between ms and mt.)

These three processes leads to the following rate equation for the degree of vertex vi [1]

The probability density distribution for the degree k is given by (See Appendix-A)

( ) ( ) -2 -3AP k = A B k + C m + 2 m ms s t (1)

Where

m +m m +2m m mr r s r s tA=2m m +2m m mr s r s t

⎛ ⎞⎜ ⎟⎝ ⎠

1B=A m + m m +m m mr r s r s t2

C=Amr

⎛ ⎞⎜ ⎟⎝ ⎠

II B. CLUSTERING The clustering coefficient on vertex degree can also be

found by the rate equation method [16]. Let us examine how the number of triangles Ei changes with time. The triangle around vi are mainly generated by three processes:

1. Vertex vi is chosen as one of the initial contact with probability mr/t and new vertex links to some of its neighbors, and neighbor of neighbors giving raise to triangles. There will be ms+3mt triangles.

2. The vertex vi is chosen as secondary contact and the new vertex links to it as its primary or tertiary contact giving raise to triangles. There will be 1+3mt triangles.

3. The vertex vi is chosen as tertiary contact and the new vertex links to it as its primary or secondary contact giving raise to triangles. There will be two such triangles one with its primary contact and one with its secondary contact.

These three processes will give the time evaluation of triangle around a vertex vi as

i i rs s t

r s t i

r r s r s t

r s t

r r s r s t

E (k ,t) m (m +3m m )t t

m m (1+3m )k2(m +m m +2m m m )t

m m m .42(m +m m +2m m m )t

ik

∂ =∂

+

+

( ) ( ) a+bk a+bkt i iE t = a+bk ln + ln +Ei i initt b a+bki init

⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠ ⎝ ⎠

(2)

From this we can obtain the clustering the coefficient as [1]

( ) ( )( )

2 E ki ic k =ii k k - 1i i

( ) ( )

( )

a+bkiE k =Abk ln k +C +k ln -bAk lnB+i i i i i ia+bkinit

a+bka iaAln k +C + ln +E -aAlnBi initb a+bkinit

⎛ ⎞⎜ ⎟⎜ ⎟⎝ ⎠

⎛ ⎞⎜ ⎟⎜ ⎟⎝ ⎠

( ) ( )

( )

a+bkiE k =Abk ln k +C +k ln -bAk lnBi i i i i ia+bkinit

a+bka i+aAln k +C + ln +E -aAlnBi initb a+bkinit

⎛ ⎞⎜ ⎟⎝ ⎠

⎛ ⎞⎜ ⎟⎝ ⎠

Where

( )

a=m m +3m m m -mr s r s rt5m m mr s tb=

2 m +m m +2m m mr r s r s ta

F=a+bkinit

Here A, B and C are above. For large values of degree k, the clustering coefficient thus depend on k as c(k)~ln k/k.

III. COMMUNITY DISCOVERY Community discovery is primarily proposed to provide a

sharing platform for people with common interests. Community members, whether friends or not, are encouraged to contribute and share more contents. Through this process, the chances of members to become friends will be promoted. Community activities change the social network and affect the social behavior tremendously. Community discovery is based on the contents of one’s personal interests and their social relationships [18]. In this paper we present a class of new algorithms for community discovery growing model of social network. OUR PROPOSED ALGORITHM Table1. Proposed Community Discovery Algorithm V [G] = list of vertices E [U] = list of neighhbours of vertex U

C [K] = community of vertex k LC [list] = list of communities Comm [ i ] = list of vertices in community i Community˙list ( G, LC [list] ) Total = V [G] for each i∈ LC [list] Comm [ i ] = NULL While (Total ≠ NULL) for each U∈ Total Community of U = C[U] for each UI∈ E[U] community of UI=C[UI] if C[UI] ≠ C [U] remove Edge(U, V) In this model, the community which a newly entering vertex joins is preserved after the formation of entrée network. For the division of community, the basic algorithm is shown table 1.

Explanation of algorithm: Here one begins with any random vertex and check its community. Now consider all its neighbors and their communities. If the community of any of the neighbors differs from the community of random vertex, the corresponding edge will be removed. The process is continued for all vertices. Finally will obtain a network where community structure is preserved. The final network for the network is Fig. 3 after community discovery is as shown in Fig. 4.

Fig.4 In this case there are three X, Y and Z communities, denoted by the circles, which have dense internal links.

IV. RESULTS

Here results have been presented by calculating the edge to vertex ratio and triangle to vertex ratio for 30 vertices community wise. The results are given in TABLE: 2

In our growing social network, secondary contact and neighbor of neighbor initial contacts are growing fast compared to initial contact. I compared the results with average initial, average secondary contact and average neighbor of neighbor

initial contacts, and community discovery network size is N= 30 vertices (Table 2).

SIMULATION RESULTS Fig. Display the results of growing communities in a social networks includes initial contacts, secondary contact and tertiary contacts are connected to vertex vi.

Fig.5. Display three different community discovery results of X, Y and Z communities.◊ indicates community X, ■ indicates community Y and ▲ indicates community Z.

Fig.6. Display three different community discovery results of X, Y and Z communities.◊ indicates community X, ■ indicates community Y and ▲ indicates community Z.

V. CONCLUSION A method for growing a social network and discovering

communities in growing social networks has been described in this paper. Here highest priority has been given to community structure while giving a facility of interaction among communities during network formation. And it keeps the community structure at ease which facilitates an easy community discovery within a short time cycle. TOOL We have used C programming Language, UciNet and NetDraw for simulation results. NOTATIONS

Size IC SC NNIC

X 10 3.3 2.0 2.14

Y 11 3.0 1.5 2.5 Z 9 3.3 2.5 2.72

Total 30 9.6 6.0 7.36

Comparing X,Y and Z Communities

1

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Vertices

Deg

rees

X Y Z

Comparing Results

02468

101214

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Vertices

Deg

rees

Total

Notation Description mr Random Initial Contact ms Secondary Contact mt Tertiary contact (Neighbor of Neighbor of

Initial Contact) IC Initial contact SC Secondary contact

NNIC Neighbor of neighbor initial contact

APPENDEX A. From the discussion in [19] one can write the total probability of a new node to be connected as primary or secondary or tertiary contact as

r i ir s r s t

j jj j

m k k+m m +2m m mt k k∑ ∑

Average initial degree of a vertex is given by

init r r s r s tk =m +m m +2m m m from this one can write[1]

j r r s r s tj

k =2(m +m m +2m m m )t∑

Now the rate of increase of degree of vertices given by the total probability as

r s r s tir i

r r s r s t

m m +2m m mk 1= m + kt t 2(m +m m +2m m m

⎛ ⎞∂⎜ ⎟∂ ⎝ ⎠

One can write this as

i ir

k k1= m +t t A

∂ ⎛ ⎞⎜ ⎟∂ ⎝ ⎠

Where

m +m m +2m m mr r s r s tA=2m m +2m m mr s r s t

⎛ ⎞⎜ ⎟⎝ ⎠

i

t k

iit kinit

r

1 1dt= dkkt m +A

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

∫ ∫

r i r itt ir i n i t r i n i t

i

1 1m + k m + k1 A A= l n = l n1 1A m + k m + kA Atl n t

⎛ ⎞⎜ ⎟⎜ ⎟ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎝ ⎠⎜ ⎟⎛ ⎞⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠

1r i

ir init

1m + kAtt 1m + kA

A⎛ ⎞= =⎜ ⎟⎝ ⎠

1A

ii

tk =B -Ct⎛ ⎞⎜ ⎟⎝ ⎠

From this ( )-AAi it =B k +C t from [1] we have

i i1F(k )= (t-t )t

and by replacing ki by k in this we get

( ) s s t-2 -3m +m mAP k = A B (k+ C )

REFERENCES [1] Riitta Toivonen, Jukka-Pekka Onnela, Jari Sarama¨ki, Jo¨rkki

Hyvo¨nen, Kimmo Kaski “A model for social networks”. Physica A 371 (2006) 851–860

[2] S. Milgram, Psychology Today 2 (1967) 60–67. [3] Mx. Granovetter, “The Strength of Weak Ties” Am. J. Soc. 78 (1973)

1360–1380. [4] Duncan J. Watts & Steven H. Strogatz, “Collective dynamics of `small -

world' networks, NATURE 393 (1998) 440. [5] M. Newman, “The structure of scientific collaboration networks”,

PNAS January 17, 2001, vol. 98. 404- 409. [6] M. Newman, “Coauthorship networks and patterns of cientific

collaboration”, PNAS, April6, 2004. Vol. 101. 5200–5205. [7] P. Holme, C R. Edling, Fredrik. Liljeros, “Structure and Time-

Evolution of an Internet D ating Community”, Soc. Networks (2004). Vol 26- 155–174.

[8] M. Girvan and M. E. J. Newman, Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821±7826 (2002).

[9] M. E. J. Newman, The structure and function of complex networks, SIAM Review 45, 167-256 (2003).

[10] A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science, 286:509-512 (1999).

[11] M.E.J Newman, ”Assortative Mixing in Networks”, Phys. Rev. Lett. 89 (2002) 208701.

[12] M.E.J. Newman, Juyong Park, “Why social networks are different from other types of networks”, Phys. Rev. E 68 (2003) 036122.

[13] L.A.N. Amaral, A. Scala, M. Barth and H.E. Stanley, Classes of small-world networks, PNAS.Oct 10, 2000.Vol.97,11149-11152.

[14] Marian Boguna, Romualdo Pastor-Satorras, Albert Dı´az- Guilera, Alex. Arenas, “Models of social networks based on social distance attachment”, Phys. Rev. E 70, 056122. (2004)

[15] T. Evans, J. Sarama¨ki, “Scale-free networks from self- organization”, Phys. Rev. E 72, 026138. (2005)

[16] Gabor Szabo, Mikko Alava, and Janos. Kertesz, Phys. “Structural transitions in scale-free networks”. Phys. Rev. E 67 056102. (2003)

[17] P. L. Krapivsky and S. Redner “Organization of growing random networks”. hys. Rev. E 63, 066123 (2001)

[18] Fei Yan, Jing Jiang, Yang Lu, Qingjun Luo, Ming Zhang “Community Discovery Based on Social Actors’, Interests and Social Relationships, Fourth International Conference on Semantics, Knowledge and Grid. Jeong978-0-7695-3401-5/08

[19] Albert-L aszl Barabasi, Reka Albert, Hawoong “Mean-_eld theory for scale-free random networks”, Physica A 272 (1999) 173-187.

[ieee 2010 ieee international workshop on: business applications of social network analysis (basna)...

Documents