sylva workshop.gt that camp.2012

Post on 11-May-2015

Category:

Technology

Social Network Analysis with Sylva

Juan Luis Suárez & Anabel Quan-Haase

Western University

Overview of Workshop

• General overview of the social network approach
• Key terminology
• Uniqueness of collecting and analyzing social network data
• Entering data into Sylva
• Importing/exporting data into Sylva
• Example I:
• Example II:
• Understanding limitations and problems
• Future Work and Gephi.org

What is SNA?

Social network analysis is focused on uncovering the patterning of people’s interaction. … Network analysts believe that how an individual lives depends in large part on how that individual is tied into the larger web of social connections. Many believe, moreover, that the success or failure of societies and organizations often depends on the patterning of their internal structure (Freeman, 1998, November 11).

What is Unique about SNA?

Social science research and theory tends to focus on social actors’:

• attributes
• attitudes
• opinions
• behavior

The focus is on the individual level of analysis, less on the network-structural level.

A whole is not simply the sum of its parts.

Key Terminology

• 1. Social structure
• 2. Social network
• 3. Nodes
• 4. Linkages/relations
• 5. Additional terms of relevance:
  – Nodes & edges
  – Directed graphs vs. undirected graphs
  – Ego
  – Alter
  – Homophily

1. Social Structure

• Sociological inquiry consists of understanding the constraining influence of social structure on social action

• But how do we study social structure?

Figure 2: Social Structure as Social Network (diagram labels: Attributes, Networks, Social Actors, Ties)

2. Social Network

3. Nodes

• The actors considered in a social network are exclusively social (alternatively referred to as agents, nodes, or social entities).

• These include individuals, organizations, institutions, nations, or groups (Wasserman & Faust, 1994).

Blurred Nodes

• Social actors can therefore be distinguished from non-social actors – e.g., neurons comprising a neural network.

• On occasion, the distinction between a social and a non-social actor is not absolute. For example, computer networks represent a hybrid type of network.

Node Attributes

• Every single node can have one or more attributes.

• These attributes describe the nodes and allow researchers to conduct complex queries of the database.

• Node attributes can include the time of publication of a book, its length, the number of authors, etc.

One-mode vs. Two-mode

• Most social network analysis methods allow only one type of social actor (for instance, individuals or corporations) in their analysis; these are referred to as one-mode networks (Wasserman & Faust, 1994).

• However, methods exist which allow two different types of social actors in their analysis; these are referred to as two-mode networks. For instance, a study may simultaneously analyze corporations and their directors.

• Two-mode networks may also include social actors from distinct networks, for example, a network comprised of adults and a network comprised of children.

• Two-mode networks allow for comparison between different types and sets of social actors.
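
The corporations-and-directors example can be sketched as a two-mode edge list that is then projected onto a one-mode network of corporations sharing a director. This is a minimal sketch: the names are invented, and projection is one common way to handle two-mode data, not a step the slides prescribe.

```python
from collections import defaultdict
from itertools import combinations

# Two-mode data: each tie links a director (mode 1) to a corporation (mode 2).
# All names are hypothetical.
memberships = [
    ("Ana", "CorpA"), ("Ana", "CorpB"),
    ("Ben", "CorpB"), ("Ben", "CorpC"),
    ("Cleo", "CorpA"), ("Cleo", "CorpB"),
]

def project_onto_corporations(ties):
    """Return {(corp1, corp2): number of directors the two corporations share}."""
    boards = defaultdict(set)
    for director, corporation in ties:
        boards[director].add(corporation)
    shared = defaultdict(int)
    for corporations in boards.values():
        # Every pair of corporations on one director's list gains one shared tie.
        for a, b in combinations(sorted(corporations), 2):
            shared[(a, b)] += 1
    return dict(shared)

one_mode = project_onto_corporations(memberships)
```

The resulting dictionary is a weighted one-mode network: the keys are pairs of corporations and the values count the directors they have in common.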

4. Relationships

• Ties are links that connect social actors, and are the main focus of social network analysis. Ties are seen as “channels for transfer or “flow” of resources (either material or nonmaterial)” (Wasserman & Faust, 1994, p. 4).

Simple Relationships

• Naturally occurring ties among social actors are inherently complex and consist of numerous different interaction activities.

• However, unlike ethnographers, network analysts do not focus on the complexity of interactions among individuals (Burt, 1983).

• Instead, social network analysts focus on the pattern of relations among individuals. To do so, they simplify the inherent complexity of social relationships by categorizing interactions into broad types. The types can be manifold. For example, a pair of social actors may have friendship, working, cooperation, or citation ties.

5. Additional Terms

• Directed graphs vs. undirected graphs
• Ego
• Alter
• Homophily

Types of Network Analysis

• Ego-centered/Socio-centered Social Networks
• Community-centered social networks

Ego-centered/Socio-centered Social Networks

Actor-Level Centrality

• Actor level degree centrality: Degree centrality measures the extent to which an actor is linked to all of the other actors in the network. Three different measures can be distinguished: nodal degree, indegree, and outdegree.

• Actor level closeness centrality: Closeness measures the distance that an actor has to all of the other actors in the network.

• Actor level betweenness centrality: Betweenness measures the extent to which an actor lies between two other actors and thus facilitates/controls the flow of information.
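
The three actor-level measures above can be sketched in plain Python on a small invented network. This is an illustration under stated assumptions: the ties are hypothetical, closeness is taken as (n - 1) divided by the sum of shortest-path distances, betweenness counts shortest paths through an actor, and both are computed on the symmetrized (undirected) network.

```python
from collections import deque

# Hypothetical directed ties: (sender, receiver).
ties = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]
actors = sorted({actor for tie in ties for actor in tie})

# Indegree and outdegree use the direction of each tie.
indegree = {a: sum(1 for _, r in ties if r == a) for a in actors}
outdegree = {a: sum(1 for s, _ in ties if s == a) for a in actors}

# Nodal degree, closeness and betweenness use the symmetrized network.
neighbors = {a: set() for a in actors}
for s, r in ties:
    neighbors[s].add(r)
    neighbors[r].add(s)
nodal_degree = {a: len(neighbors[a]) for a in actors}

def distances_from(source):
    """Breadth-first search: shortest-path length from source to every actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Closeness: (n - 1) / sum of distances (assumes a connected network).
closeness = {
    a: (len(actors) - 1) / sum(distances_from(a).values()) for a in actors
}

def shortest_path_betweenness():
    """Brandes-style accumulation of shortest paths passing through each actor."""
    score = {a: 0.0 for a in actors}
    for s in actors:
        order, dist = [], {s: 0}
        preds = {a: [] for a in actors}
        sigma = {a: 0 for a in actors}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {a: 0.0 for a in actors}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was visited from both ends, so halve the totals.
    return {a: total / 2 for a, total in score.items()}

betweenness = shortest_path_betweenness()
```

In this toy network, B and D sit on every shortest path that links the periphery (A, E) to the rest, so they are the only actors with non-zero betweenness.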

[Figure: Face-to-face (1/week) CS]

Community-Centered Social Networks

Network Level Centralization

• Cohesion Distance: measures the degree of separation between actors in a network. It indicates how many other people lie between two actors, that is, how many intermediaries separate an actor from the person he or she needs to talk to.

• Network Centralization: measures the number of actors that are connected to each actor in the network. The more connections among actors, the greater the network centrality.

• Density: measures the degree of connection that exists in a network. The more actors talk to each other, the higher the density.
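
Two of these network-level measures can be sketched on a small invented network. The sketch assumes an undirected, connected network, takes density as the share of possible ties that are present, and takes cohesion distance as the average shortest-path length over all pairs of actors; the slides do not give exact formulas, so both readings are assumptions.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected ties.
ties = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]
actors = sorted({actor for tie in ties for actor in tie})

neighbors = {a: set() for a in actors}
for a, b in ties:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Density: ties present divided by ties possible among n actors.
n = len(actors)
density = len(ties) / (n * (n - 1) / 2)

def geodesic_distance(source, target):
    """Shortest-path length between two actors (assumes they are connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist[target]

# Cohesion distance: mean shortest-path length over all pairs of actors.
pairs = list(combinations(actors, 2))
average_distance = sum(geodesic_distance(a, b) for a, b in pairs) / len(pairs)
```

Here half of the ten possible ties are present, and two actors are on average 1.6 steps apart.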

Measures of Centrality and Assumptions

Measure                  Level    Data Type          Symmetry/Asymmetry
Nodal Degree Centrality  Actor    Dichotomized (>5)  Symmetric (Maximum)
Indegree Centrality      Actor    Valued             Asymmetric
Outdegree Centrality     Actor    Valued             Asymmetric
Closeness Centrality     Actor    Dichotomized (>5)  Symmetric (Maximum)
Betweenness Centrality   Actor    Dichotomized (>5)  Symmetric (Maximum)
Network Cohesion         Network  Valued             Asymmetric
Network Centrality       Network  Dichotomized (>5)  Asymmetric
Network Density          Network  Dichotomized (>5)  Symmetric (Maximum)

Uniqueness of Collecting and Analyzing Social Network Data

• Relational data
• Boundary specification and sampling
• Interdependence of data points
• Query search
• Complexity of data collection
  – Manually-harvested
  – Data set
  – Behavioral
  – Self-report

Internet Resources of Social Network Analysis

• Center for the Study of Group Processes
  http://lime.weeg.uiowa.edu/~grpproc/
• INSNA, International Network for Social Network Analysis
  http://www.heinz.cmu.edu/project/INSNA/
• Barry Wellman’s Homepage
  http://www.chass.utoronto.ca/~wellman/index.html
• CulturePlex
  http://cultureplex.ca/
• Gephi.org
• NodeXL
  http://nodexl.codeplex.com/

Limitations of Social Network Analysis

• Boundary specification

• Data source

• Definition of social actors

• No distinct method

What is Sylva?

• A database management system
• Graph databases
• NoSQL database
• Built on top of Neo4j

Whose Needs Does Sylva Serve?

• Sylva requires no programming skills
• On-the-go modification of the schema
• Storing data in a graph form
• Work from the nodes or from the edges
• Collaborative platform
• Easy-to-use interface thanks to forms, autocomplete, …
• Multiple visualizations
• Search and Query Engines

The Interface

The Dashboard

Creating a Database (Graph)

Schema vs Data

My First Schema

Creating a Schema on Sylva (manually)

• New Type of Node (person)
• (2nd) New Type of Node (work)
• Relation
  – Incoming or outgoing
  – Allowed relationships
• (3rd) New Type of Node (institution)
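
The idea of a schema with node types and allowed relationships can be sketched as a plain data structure. This is not Sylva's API or storage format; the relationship names (`wrote`, `member_of`) and the helper functions are hypothetical, chosen only to show how a schema constrains the data entered later.

```python
# Hypothetical in-memory schema: three node types and the relationships
# allowed between them, written as (source type, relation, target type).
schema = {
    "node_types": {"person", "work", "institution"},
    "allowed_relationships": {
        ("person", "wrote", "work"),
        ("person", "member_of", "institution"),
    },
}

def is_allowed(source_type, relation, target_type):
    """Check a proposed relationship against the schema (direction matters)."""
    return (source_type, relation, target_type) in schema["allowed_relationships"]

# Data lives separately from the schema: typed nodes plus validated edges.
nodes = {"n1": "person", "n2": "work"}
edges = []

def add_edge(source_id, relation, target_id):
    """Append an edge only if the schema allows it for these node types."""
    if not is_allowed(nodes[source_id], relation, nodes[target_id]):
        raise ValueError("relationship not allowed by the schema")
    edges.append((source_id, relation, target_id))

add_edge("n1", "wrote", "n2")
```

Because the relationship is outgoing from `person` and incoming to `work`, the reverse edge (`n2` wrote `n1`) would be rejected.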

Properties of Objects

• Data objects have properties
• A property is an attribute that defines certain operations that can be performed on the object
• We need properties to enter our data

Properties of “Person”

Entering Data (manually)

My First Graph

The Node Level: Selecting and Expanding

Collaboration in Sylva

Case of Collaboration

Searching

• Returns a list

Importing and Exporting

• Importing a Schema
• Exporting Data to Gephi
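
One way to hand graph data to Gephi is as a pair of CSV tables, since Gephi's spreadsheet import reads a nodes table with an `Id` column and an edges table with `Source` and `Target` columns. The sketch below writes both tables from invented data; it is not Sylva's export function, and the slides do not say which file format Sylva produces.

```python
import csv
import io

# Hypothetical graph data: (id, label) nodes and (source, target, type) edges.
nodes = [("n1", "Person A"), ("n2", "Work B")]
edges = [("n1", "n2", "Directed")]

def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

nodes_csv = to_csv(["Id", "Label"], nodes)
edges_csv = to_csv(["Source", "Target", "Type"], edges)
```

Saving the two strings as separate files gives a nodes table and an edges table that can be imported one after the other.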

Cuba’s Prominence: Modeling The Latin American Afro in Topic Maps

• Objectives:
  – locating the various nodes of bibliographic production associated with the generation of an image of the Latin-American Afro
  – evaluating the causes that make certain nodes, i.e., Cuba and various Cuban intellectuals, emerge as key nodes in the network of production of Afro-Latin American images

Cuba’s Prominence

• Methodology:
  – a combination of traditional close-reading of texts (extraction of nodes and relations) with
  – graph analysis of the emerging network with the PageRank algorithm

Measurements (Gephi)

• Closeness centrality: expresses how well connected an individual is to the whole network. A high value indicates better connectivity and thus expresses the importance of the individual with respect to other elements in the network.

• Betweenness centrality: indicates how important the individual is as a connection and transfer point within the network. A high value indicates that it is a topic that is passed through in the communications (relationships) between the other topics on the map.

• Modularity: a coefficient that enables us to group together nodes that share connections and zones of the network, dividing the map into zones whose nodes are densely related to one another.

• Influence between nodes: an analysis which we shall carry out in the second part of the article. It is based on the PageRank algorithm, the basic algorithm on which the Google search engine was originally based for calculating the importance of the pages returned by a search, and which it used to order the results. Its basic idea is that a given node within a network becomes important based on the importance of the nodes that relate to it or point to it.
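
The basic idea behind PageRank, that a node is important when important nodes point to it, can be sketched as a simple power iteration. The damping factor of 0.85, the fixed number of iterations, the even redistribution of rank from nodes without outgoing links, and the four-node example are all assumptions made for illustration; this is not necessarily how Gephi computes it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over {node: [nodes it points to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same small baseline share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                # A node passes its rank on in equal parts to the nodes it points to.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A node with no outgoing links spreads its rank over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank

# Hypothetical network: A and B point to C, and C points to D.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["D"], "D": []})
```

In this example D ends up ranked highest even though only one node points to it, because that node (C) is itself pointed to by two others.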

Betweenness Centrality

Modularity

Some numerical results

Sustaining a Global Community

• Henrich et al. [1] have shown that the existence of norms that sustain fairness in exchanges among strangers is connected with the diffusion of institutions such as market integration and participation in world religions.

• Their research supports the hypothesis that modern world religions may have contributed to the sustainability of large-scale societies and large-scale interactions. We propose that art is another institution that contributes to the emergence and sustainability of large-scale societies.

• We use the case of the formation of an artistic network of paintings, schools, themes, genres, and artists whose development goes along with the expansion and colonization of the Hispanic Monarchy across America to show that this artistic network has a presence in all political territories encompassing most ethnicities and religions of indigenous origin.

Methodology

• The data set comprising the paintings from the Baroque period is organized and stored in a web-based PostgreSQL database.

• The data includes more than 100,000 total topics (11,443 of them are artworks). A distinctive feature of the information is that it is organized around both text fields and ad-hoc descriptors that follow the model of a formal ontology.

• For our study we have decided to model the data as one of the possible networks: a network with artworks as nodes and common descriptors as weighted edges.

• Some pruning methods had to be applied to overcome the shortcomings resulting from the millions of edges and the large number of relational joins. We also split the dataset into 12 sections, each covering a 25-year period, from 1550 to 1850 [4].
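
The split into 12 sections of 25 years can be sketched as a small binning function. The slides do not say how boundary years are assigned, so the sketch assumes half-open intervals (1575 falls in 1575-1600, not 1550-1575) and leaves 1850 itself outside the range; the function name is hypothetical.

```python
START, END, WIDTH = 1550, 1850, 25

# The 12 sections as (first year, first year of the next section).
periods = [(year, year + WIDTH) for year in range(START, END, WIDTH)]

def period_of(year):
    """Return the 25-year section that contains the given year."""
    if not START <= year < END:
        raise ValueError("year outside the 1550-1850 range of the study")
    first = START + ((year - START) // WIDTH) * WIDTH
    return (first, first + WIDTH)
```

For example, an artwork dated 1612 is assigned to the section 1600-1625.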

Methodology

• Similarity Measure:
  – S(Art1,Art2) = #{common descriptors of Art1 and Art2}

[Figure: Artwork 1 and Artwork 2, each linked to some of Descriptors 1-7; the two artworks share two descriptors, so S = 2]
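
The similarity measure counts the descriptors two artworks have in common, which maps directly onto a set intersection. In the sketch below the assignment of descriptors to each artwork is invented (the figure only establishes that S = 2), and the `min_weight` threshold is one possible pruning rule, not necessarily the one used in the study.

```python
from itertools import combinations

# Hypothetical descriptor sets, chosen so that the two artworks share two.
descriptors = {
    "Artwork 1": {"Descriptor 1", "Descriptor 2", "Descriptor 3", "Descriptor 4"},
    "Artwork 2": {"Descriptor 3", "Descriptor 4", "Descriptor 5",
                  "Descriptor 6", "Descriptor 7"},
}

def similarity(art1, art2):
    """S(Art1, Art2): number of descriptors the two artworks share."""
    return len(descriptors[art1] & descriptors[art2])

def weighted_edges(min_weight=1):
    """Build {(art1, art2): S} for every pair with S >= min_weight."""
    edges = {}
    for art1, art2 in combinations(sorted(descriptors), 2):
        weight = similarity(art1, art2)
        if weight >= min_weight:
            edges[(art1, art2)] = weight
    return edges

network = weighted_edges()
```

Raising `min_weight` drops weakly related pairs, which is one simple way to keep the number of edges manageable.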

Research Questions

• Our research addresses the issue of the sustainability of communities through the existence of a flow of shared information.

• This question is of the utmost importance to understand the formation and dynamics of cultural groups and cultural areas.

• As important as the latter is the study of the spatial and temporal dimensions of any given political and cultural community, as this will shed light on the cultural processes resulting from previous and current waves of globalization.

Baroque Paintings in the Hispanic World: A Network.

• The graph shows, for the first two periods of our study, the growth of the saints-related paintings (red cluster) as compared to the decrease of the cluster with virgins (blue). Portraits’ size (brown cluster) remains more or less the same, but they get more connected to saints’.

[Figure: one panel per period: 1550-1575, 1575-1600, 1600-1625, 1625-1650, 1650-1675, 1675-1700, 1700-1725, 1725-1750, 1750-1775, 1775-1800, 1800-1825, 1825-1850]

Clustering & Visualizations: Raw Graphs

http://zoom.it/vJVw#full

Further Work with Sylva

• Visualization of Schema
• Two Visualizations of Data:
  – Node-centered
  – Community-centered
• Query System:
  – Pattern-matching
  – Traversals
• Need for multi-disciplinary teams
• Complexity of analysis

Thank you!

“With enough effort and perseverance: Anything is possible”
