© 2013, IJOURNALS All Rights Reserved
Page 116
Social Network Analysis – A Survey on
Privacy of personal Data
Author: Dushyant Tanna
(Department of Mathematics, Marwadi Engineering College, India)
Abstract
This paper discusses the use of social networking (SN) sites such as Facebook and Twitter. Social networking sites have changed the way people share information publicly by means of digital technology. People formed social networks long before digital technology existed, but the extensive use of social networking sites is a recent and widespread development. People create profiles on these sites to share private information, updates on their social lives and personal emotions with a limited or wide audience, which enables the creation of interconnected networks and groups. The main features of social networking sites include chatting, messaging, email, file sharing, video calling, voice chat, blogging and discussion groups. Some social networking sites, in order of their release, are SixDegrees.com, LiveJournal, BlackPlanet, Cyworld, Friendster, LinkedIn, MySpace, Hi5, Orkut, Flickr, Facebook (Harvard), Yahoo! 360°, YouTube, Facebook (corporate), Windows Live Spaces, Twitter and Facebook (everyone). There are many others, some of which were designed only as part of a marketing strategy for the re-launch of certain brands. Because most software for digital communities is free for end users, every user of a site can modify his or her own content.
Keywords: Social Network analysis, Node, Ego
1. INTRODUCTION
Social network analysis is a tool developed between the 1950s and the 1970s by researchers and sociologists in social psychology. It rests on the basic assumption that relationships among nodes are important. Borgatti and Foster [1] (2003) reported exponential growth in the literature on social network research. This growth points to the need for users to be fully aware of the proper way of using SN sites. The number of users of SN sites is increasing, and the largest group consists of teenagers, who are the least aware of the security and privacy of SN sites. There is therefore a great need to make them aware of the threats of SN sites and of how information posted there can be misused. The number of users has grown noticeably since 1997. Thanks to the efforts of many researchers, the pace of life has increased and it is possible for everyone to use SN sites to share feelings publicly with many people, with a single click and in a fraction of a second. A person who feels lonely may find relief simply by sharing some personal thoughts. Chatting with an unknown person can even be a pleasure. Making friends has become very easy for the current generation.
Methodology:
Questionnaire, Interviews, Chi-Square Test
2. DEFINITIONS
2.1. Node
A node represents an individual actor in the network.
2.2. Ego
An ego is an individual focal node. A network has as many egos as it has nodes. Egos can be persons, groups or entire societies.
2.3. Alters
Alters are the nodes to which an ego is directly connected. In ego-centric networks, the alters identified for one ego are generally a set that is unconnected with the alters identified for every other ego. Alters are useful because, for example, if each alter connected to an ego is identified by some relation, we can form a picture of the network of social associations around that ego.
3. SOME OF THE METHODS OF STUDYING SOCIAL NETWORK ANALYSIS
3.1 Full network method
In this method, one is required to collect each actor's ties with all other actors. In essence, this approach takes a census of ties in a population of actors rather than a sample. For example, we could collect data on shipments of steel between all pairs of nation states in the world system from International Monetary Fund records; we could examine the boards of directors of all public corporations for overlapping directors; we could count the number of vehicles moving between all pairs of cities; we could look at the flows of e-mail between all pairs of employees in a company; we could ask each child in a play group to identify their friends. Full network data is necessary to properly define and measure many of the structural concepts of network analysis.
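A full network census can be stored as a square actor-by-actor matrix. The sketch below is only an illustration: the four children and their friendship nominations are hypothetical data, and `adjacency_matrix` and `density` are helper names introduced here, not part of the paper.

```python
from itertools import permutations

# Hypothetical census of "names as a friend" ties among four children.
actors = ["Ann", "Bob", "Cal", "Dee"]
ties = {("Ann", "Bob"), ("Bob", "Ann"), ("Ann", "Cal"), ("Dee", "Ann")}


def adjacency_matrix(actors, ties):
    """Square actor-by-actor matrix: cell [i][j] is 1 if actor i names actor j."""
    return [[1 if (a, b) in ties else 0 for b in actors] for a in actors]


def density(actors, ties):
    """Share of all possible directed ties (self-ties excluded) that are present."""
    possible = len(actors) * (len(actors) - 1)
    present = sum(1 for pair in permutations(actors, 2) if pair in ties)
    return present / possible


matrix = adjacency_matrix(actors, ties)
network_density = density(actors, ties)
```

Because every pair of actors is covered, whole-network measures such as density can be computed directly, which is not possible from a sample of ties.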
3.2 Snowball method
This method begins with a focal actor or set of actors. Each of these actors is asked to name some or all of their ties to other actors. Then, all the actors named (who were not part of the original list) are tracked down and asked for some or all of their ties. The process continues until no new actors are identified, or until we decide to stop (usually for reasons of time and resources, or because the new actors being named are very marginal to the group we are trying to study). The snowball method can be particularly helpful for tracking down "special" populations (often numerically small sub-sets of people mixed in with large numbers of others). Business contact networks, community elites, deviant sub-cultures, avid stamp collectors, kinship networks, and many other structures can be pretty effectively located and described by snowball methods. It is sometimes not as difficult to achieve closure in snowball "samples" as one might think. The limitations on the numbers of strong ties that most actors have, and the tendency for ties to be reciprocated, often make it fairly easy to find the boundaries.
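The snowball procedure behaves like a breadth-first traversal that stops when no new actors are named or when a wave limit is reached. The following is a minimal sketch under that assumption; `reported_ties` is hypothetical data standing in for interview answers, and `snowball` and `max_waves` are names introduced for illustration.

```python
from collections import deque

# Hypothetical "who names whom" reports; in a real study each entry would
# come from interviewing that actor.
reported_ties = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": [],
    "X": ["Y"],  # never reached from seed "A"
    "Y": ["X"],
}


def snowball(seeds, reported_ties, max_waves=None):
    """Return actors in the order they are discovered, wave by wave."""
    found = list(seeds)
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        actor, wave = queue.popleft()
        if max_waves is not None and wave >= max_waves:
            continue  # wave limit reached: do not interview this actor
        for alter in reported_ties.get(actor, []):
            if alter not in seen:
                seen.add(alter)
                found.append(alter)
                queue.append((alter, wave + 1))
    return found


full_sample = snowball(["A"], reported_ties)
one_wave = snowball(["A"], reported_ties, max_waves=1)
```

Actors "X" and "Y" are never found from seed "A", which illustrates that a snowball sample only reaches actors connected to the starting points.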
3.3 Ego-Centric Network (with alter connections)
In many cases it will not be possible (or necessary) to track down the full networks beginning with focal nodes (as in the snowball method). An alternative approach is to begin with a selection of focal nodes (egos), and identify the nodes to which they are connected. Then, we determine which of the nodes identified in the first stage are connected to one another. This can be done by contacting each of the nodes; sometimes we can ask ego to report which of the nodes that it is tied to are tied to one another.
This kind of approach can be quite effective for collecting a form of relational data from very large populations, and can be combined with attribute-based approaches. For example, we might take a simple random sample of male college students and ask them to report who their close friends are, and which of these friends know one another. This kind of approach can give us a good and reliable picture of the kinds of networks (or at least the local neighborhoods) in which individuals are embedded. We can find out such things as how many connections nodes have, and the extent to which these nodes are close-knit groups. Such data can be very useful in helping to understand the opportunities and constraints that ego has as a result of the way they are embedded in their networks.
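How close-knit an ego's neighborhood is can be expressed as the fraction of alter pairs that are themselves tied. The sketch below assumes undirected ties; the tie list is hypothetical and `alters_of` and `alter_density` are illustrative helper names.

```python
from itertools import combinations

# Hypothetical undirected friendship ties reported by ego and its alters.
ties = {
    frozenset(("ego", "a")),
    frozenset(("ego", "b")),
    frozenset(("ego", "c")),
    frozenset(("a", "b")),
    frozenset(("c", "z")),  # z is not tied to ego, so z is not an alter
}


def alters_of(ego, ties):
    """Nodes directly tied to ego, in sorted order."""
    return sorted(next(iter(tie - {ego})) for tie in ties if ego in tie)


def alter_density(ego, ties):
    """Fraction of alter pairs that are tied to one another."""
    pairs = list(combinations(alters_of(ego, ties), 2))
    if not pairs:
        return 0.0  # fewer than two alters: no pairs to compare
    tied = sum(1 for pair in pairs if frozenset(pair) in ties)
    return tied / len(pairs)


ego_alters = alters_of("ego", ties)
ego_density = alter_density("ego", ties)
```

Here only one of the three possible alter pairs is tied, so the neighborhood is fairly loose.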
3.4 Ego-Centric Network (Ego only)
Ego-centric methods focus on the individual rather than on the network as a whole. By collecting information on the connections among the actors connected to each focal ego, we can still get a pretty good picture of the "local" networks or "neighborhoods" of individuals. Such information is useful for understanding how networks affect individuals, and it also gives an (incomplete) picture of the general texture of the network as a whole.
Suppose, however, that we only obtained information on ego's connections to alters, but not information on the connections among those alters. Data like these are not really "network" data at all. That is, they cannot be represented as a square actor-by-actor array of ties. But this does not mean that ego-centric data without connections among the alters are of no value for analysts seeking to take a structural or network approach to understanding actors. We can know, for example, that some actors have many close friends and kin, and others have few. Knowing this, we are able to understand something about the differences in the actors' places in social structure, and make some predictions about how these locations constrain their behavior. What we cannot know from ego-centric data with any certainty is the nature of the macro-structure or the whole network.
In ego-centric networks, the alters identified as connected to each ego are probably a set that is unconnected with those for each other ego. While we cannot assess the overall density or connectedness of the population, we can sometimes be a bit more general. If we have some good theoretical reason to think about alters in terms of their social roles, rather than as individual occupants of social roles, ego-centered networks can tell us a good bit about local social structures. For example, if we identify each of the alters connected to an ego by a friendship relation as "kin," "co-worker," "member of the same church," etc., we can build up a picture of the networks of social positions (rather than the networks of individuals) in which egos are embedded. Such an approach, of course, assumes that such categories as "kin" are real and meaningful determinants of patterns of interaction.
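Ego-only data of this kind can still be summarised by network size and by the social roles of the alters. The following sketch uses hypothetical respondents and role labels; `role_profile` is a name introduced for illustration.

```python
from collections import Counter

# Hypothetical ego-only data: each ego lists alters with a role label, and
# no information about ties among the alters is recorded.
ego_reports = {
    "ego1": [("p1", "kin"), ("p2", "kin"), ("p3", "co-worker")],
    "ego2": [("q1", "co-worker"), ("q2", "member of the same church")],
}


def role_profile(ego_reports):
    """Count alters by role for every ego."""
    return {
        ego: Counter(role for _, role in alters)
        for ego, alters in ego_reports.items()
    }


profiles = role_profile(ego_reports)
network_sizes = {ego: len(alters) for ego, alters in ego_reports.items()}
```

The result describes the social positions surrounding each ego, not ties among individuals, so it says nothing about the density of the population as a whole.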
4. DATA ON THE POPULARITY OF SN SITES AND THEIR NUMBER OF USERS
The list below ranks some of the best-known social networking sites by estimated unique monthly visitors. The size of these audiences indicates how popular social networking sites have become.
1 Facebook: 800,000,000 - Estimated Unique Monthly Visitors
2 Twitter: 250,000,000 - Estimated Unique Monthly Visitors
3 LinkedIn: 200,000,000 - Estimated Unique Monthly Visitors
4 Pinterest: 120,000,000 - Estimated Unique Monthly Visitors
5 MySpace: 70,500,000 - Estimated Unique Monthly Visitors
6 Google Plus+: 65,000,000 - Estimated Unique Monthly Visitors
7 Instagram: 50,000,000 - Estimated Unique Monthly Visitors
8 DeviantArt: 25,500,000 - Estimated Unique Monthly Visitors
9 LiveJournal: 20,500,000 - Estimated Unique Monthly Visitors
10 Tagged: 19,500,000 - Estimated Unique Monthly Visitors
11 Orkut: 17,500,000 - Estimated Unique Monthly Visitors
12 CafeMom: 12,500,000 - Estimated Unique Monthly Visitors
13 Ning: 12,000,000 - Estimated Unique Monthly Visitors
14 Meetup: 7,500,000 - Estimated Unique Monthly Visitors
15 myLife: 5,400,000 - Estimated Unique Monthly Visitors
5. SOME OF THE FACTS PRESENTED WITH SIMPLE PERCENTAGE TABLE
5.1 Time spent during leisure time
The following table shows how respondents spend their leisure time. Respondents were asked to categorise the time they spend during a day among their five most preferred activities.
Table 1: Time spent during leisure time
Activity                       Frequency   Percentage
FB, Twitter / SN site          20          40
Reading a book                 17          34
Referring to blogs             6           12
Hanging out                    5           10
Going to a place of worship    2           4
Total                          50          100
Figure 1: Leisure time spent
It is evident that 52% of the respondents prefer to spend their spare time on Facebook, Twitter or blogs.
5.2 Website mostly visited
Table 2: Website visited
Website mostly visited                     Frequency   Percentage
Facebook                                   24          48
Gmail, Yahoo Mail and other email sites    9           18
YouTube                                    2           4
Educational sites                          8           16
Others                                     7           14
Total                                      50          100
Figure 2: Website Visited
It is evident that 48% of the respondents visit Facebook most often.
5.3 Frequency of sharing status or thoughts on SNS
Table 3: Frequency of sharing
Frequency of sharing status or thoughts on SNS   Frequency   Percentage
Daily, 4 times or more                           8           16
Daily, 2 to 4 times                              20          40
Daily, once or twice                             10          20
Weekly                                           7           14
Fortnightly or monthly                           5           10
Total                                            50          100
Figure 3: Frequency of Sharing
It is evident that 56% of the respondents share their thoughts on SNS at least twice a day.
5.4 Preference of activity
Table 4: Preference of activity on SNS
Preference of activity on SN sites      Frequency   Percentage
Download/upload photos                  11          22
Chatting                                14          28
Browsing for unknown persons' data      2           4
Important work / reading                15          30
Others                                  8           16
Total                                   50          100
Figure 4: Preference of Activity on SNS
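54% of the respondents prefer to pass their time in non-useful activities (downloading or uploading photos, chatting and browsing for unknown persons' data).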
5.5 Membership in academic groups
Table 5: Membership in groups
Groups related to academics   Frequency   Percentage
Less than 10                  40          80
Between 10 and 30             7           14
Between 30 and 40             2           4
Between 40 and 50             0           0
More than 50                  1           2
Total                         50          100
Figure 5: Membership in groups related to academics
It is evident that 80% of the respondents hold membership in fewer than 10 education-related groups.
6. RESEARCH PROBLEM
To find out how teenagers use social networking sites and to analyze whether they balance this use with their study time.
6.1 RESEARCH METHODOLOGY
6.1.1 Population Size (N)
The total set of elements of the universe from which the sample is selected for the purpose of study is known as the population. The population includes teenagers of all ages and people up to age 40.
The population here is 200.
6.1.2 Sample Size (n)
All the items under consideration in any field of enquiry constitute a universe or population.
In this research only a few items can be selected from the population for study. The items selected constitute what is technically called a sample.
Here the sample size is 50, drawn from the total population, to conduct the study.
6.2 DATA COLLECTION
The data source: Primary and secondary
6.2.1 The research approach: Survey method
6.2.2 The research instrument: Questionnaire Method (Primary source)
6.2.3 Secondary sources: journals, magazines, articles
6.2.4 The respondents: teenagers of all ages and people up to age 40
6.3 TOOLS OF ANALYSIS
6.3.1 Simple Percentage Analysis
Simple percentage analysis is used to calculate the percentage of SN site usage among the total respondents.
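As an illustration, the percentages in Table 1 can be reproduced from the reported frequencies. This is a minimal sketch; `simple_percentages` is a helper name introduced here, and the category labels are spelled out for readability.

```python
# Frequencies reported in Table 1 (50 respondents).
leisure = {
    "FB, Twitter / SN site": 20,
    "Reading a book": 17,
    "Referring to blogs": 6,
    "Hang out": 5,
    "To worship place": 2,
}


def simple_percentages(frequencies):
    """Express each frequency as a percentage of the total."""
    total = sum(frequencies.values())
    return {label: 100 * count / total for label, count in frequencies.items()}


pct = simple_percentages(leisure)
# Combined share of SN sites and blogs, as quoted in Section 5.1.
online_share = pct["FB, Twitter / SN site"] + pct["Referring to blogs"]
```

The combined share of SN sites and blogs comes to 52%, the figure quoted in Section 5.1.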
6.3.2 CHI-Square Test
The chi-square test is applied to test the goodness of fit, that is, to verify the distribution of observed data against an assumed theoretical distribution. It is therefore a measure of the divergence between actual and expected frequencies. Karl Pearson developed this method to test the difference between the theoretical value and the observed value.
Chi-square statistic:
\[ \chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
Degrees of freedom: $df = (R-1)(C-1)$
where
$O_{ij}$ = observed frequency
$E_{ij}$ = expected frequency
$R$ = number of rows
$C$ = number of columns
For all chi-square tests, the table value is taken at the 5% significance level.
6.3.3 DATA ANALYSIS WITH CHI-SQUARE TEST
TEST: The chi-square test is conducted to find out the relationship between the age group and the usage of social networking sites.
HYPOTHESIS
H0 (Null): There is no significant relationship between the age group and the usage of social networking sites.
Ha (Alternate): There is a significant relationship between the age group and the usage of social networking sites.
Calculation of Observed Values
Time Spent   SN sites   Academic Act   Total
> 5          6          2              8
4 to 5       4          2              6
3 to 4       8          3              11
2 to 3       8          13             21
< 2          24         30             54
Total        50         50             100
Calculation of Expected Values
Formula for expected value = (Row total x Column total) / (Grand total)
Time Spent   SN sites   Academic Act
> 5          4          4
4 to 5       3          3
3 to 4       5.5        5.5
2 to 3       10.5       10.5
< 2          27         27
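The expected values follow mechanically from the row and column totals of the observed table. A minimal sketch, with `expected_values` as an illustrative helper name:

```python
# Observed counts from the paper: (SN sites, Academic Act) per time band.
observed = {
    "> 5": (6, 2),
    "4 to 5": (4, 2),
    "3 to 4": (8, 3),
    "2 to 3": (8, 13),
    "< 2": (24, 30),
}


def expected_values(observed):
    """Expected count per cell = row total * column total / grand total."""
    col_totals = [sum(col) for col in zip(*observed.values())]
    grand_total = sum(col_totals)
    return {
        row: tuple(sum(cells) * col / grand_total for col in col_totals)
        for row, cells in observed.items()
    }


expected = expected_values(observed)
```

Because both column totals equal 50, each row total is split evenly between the two columns.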
Chi-Square Test
Time Spent   Type   Oij   Eij    (Oij - Eij)^2   (Oij - Eij)^2 / Eij
> 5          SN     6     4      4               1.00
> 5          Acad   2     4      4               1.00
4 to 5       SN     4     3      1               0.33
4 to 5       Acad   2     3      1               0.33
3 to 4       SN     8     5.5    6.25            1.14
3 to 4       Acad   3     5.5    6.25            1.14
2 to 3       SN     8     10.5   6.25            0.60
2 to 3       Acad   13    10.5   6.25            0.60
< 2          SN     24    27     9               0.33
< 2          Acad   30    27     9               0.33
Total                                            6.80
Degrees of freedom, (r-1)*(c-1): 4
Table value at 5% significance: 9.488
Calculated value: 6.797
CALCULATIONS
Degrees of freedom (df) = (R-1)(C-1) = (5-1)(2-1) = 4, at the 5% significance level
Table value (X2tab) = 9.488
Calculated value (X2cal) = 6.797
Since the calculated value is less than the table value, the null hypothesis is not rejected.
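The test can be checked numerically from the observed counts alone. The sketch below uses plain Python; `chi_square` is an illustrative helper, and the critical value 9.488 is taken from the paper's table value for df = 4 at the 5% level and is not computed here.

```python
# Observed counts: rows are time bands, columns are (SN sites, Academic Act).
observed = [[6, 2], [4, 2], [8, 3], [8, 13], [24, 30]]


def chi_square(observed):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


stat, df = chi_square(observed)
CRITICAL_5PCT_DF4 = 9.488  # table value quoted in the paper
reject_null = stat > CRITICAL_5PCT_DF4
```

The statistic sums to about 6.797, matching the total of the Chi-Square Test table, with 4 degrees of freedom. It falls below 9.488, so the null hypothesis is not rejected.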
6.3.4 CONCLUSION
There is no significant relationship between the age group and the usage of social networking sites.
REFERENCES
[1] Borgatti, S. P. & Foster, P. C. (2003). The network paradigm in organizational research: A review and typology. Journal of Management, 29: 991-1013.
[2] Belson, W. (1981). The design and understanding of research questions. Hants, England: Garner Publishing.
[3] Hanneman, R. A. & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California, Riverside.
[4] Bradburn, N., Sudman, S. & Wansink, B. (2004). Asking questions: the definitive guide to questionnaire design. San Francisco: Jossey-Bass.
[5] Waller, J. L. & Johnson, M. H. (2013). Chi-Square and T-Tests Using SAS®: Performance and Interpretation. Georgia Regents University, Augusta, Georgia. SAS Global Forum 2013, Paper 430-2013.
[6] Moore, D. S. (2010). The Basic Practice of Statistics. Fifth edition. New York, NY, USA: W. H. Freeman and Company.
[7] Remenyi, D., Onofrei, G. & English, J. (2009). An Introduction to Statistics using Microsoft Excel. UK: Academic Publishing Limited.
[8] Fisher, R. A. & Yates, F. Statistical Tables for Biological, Agricultural and Medical Research, 6th ed., Table IV. Edinburgh: Oliver & Boyd, Ltd.