building rich social network data


Upload: eamonn-oloughlin

Post on 13-Jul-2015



Building Rich Social Network Data

A schema to aid designing, collecting and evaluating social network data

Eamonn O’Loughlin ([email protected])

Big Data = Different Challenges

Our Approach

What is a Schema?

A Social Network Data Schema

Why use a Schema?

Privacy Concerns

Data Collection is Expensive

Many design decisions

Different Practitioners

Difficult to sample network data

Pervasive sensor technology

Reduced cost of data storage

Increase in ability to analyse data

“A schema allows us to represent in a particular way the structure and features of a particular object”

In this research, our aim is to ‘fill in the gaps’ by creating a standard way to describe the type and features of social network data.

Over the past few decades, the use of techniques from Social Network Analysis (SNA) has become significantly more pervasive among sociologists, statisticians and computer scientists. In addition, during this time the size, scope and complexity of analysed network data have grown substantially.

This growth has in part been driven by technological advances (and society’s response to those advances) that have reduced the cost of collecting and analysing information about social networks.

Compared to more traditional multi-dimensional data (including time series, panel and cross-sectional data), there are now significantly more methodological and design decisions that must be considered when creating a social network dataset. Furthermore, these decisions must be taken with care, as the features of a dataset determine whether or not it is suitable for particular types of analysis. Because these design decisions are more fundamental than simple implementation details (e.g. what data structure to use), they can easily be overlooked.

In this paper we propose a standard schema for social network data. A standard schema is a mechanism that allows us to define the structure, content, and to some extent, the semantics of a dataset. Our proposed schema defines the most common features that social network datasets may have in a consistent way, allowing the structure, content and scope of the social network data to be easily documented and communicated.

This work was based upon an analysis of over 150 social network datasets, prepared by the Dynamics Lab at University College Dublin. This repository of datasets has been made public, and is available on the Dynamics Lab website at http://dl.ucd.ie

Our Motivation

- Review the structure, size and features of over 100 publicly available social network datasets

- Create functional groups of key features

- Outline a schema describing the features of social network datasets

- Discuss the different components of the schema and their role in the possible types of subsequent analysis

- Outline how the schema assists in designing and implementing a social network data creation / collection strategy

Sourcing Data for Social Network Analysis Research

Option 1: Collect the data through direct observation or survey

Option 2: Retrieve the data by taking a subset of data from an existing electronic system

Option 3: Assess appropriateness / quality of an existing social network dataset

A schema is useful in the early stages of research when an approach is known or a hypothesis is under consideration. At this point, a schema will help in designing or locating appropriate data that can be used to test the hypothesis. In general, there are three distinct ways to access data; how our schema would help with each is summarised below.

The schema (summarised below) covers all of the types of features that a network dataset may contain. This allows the researcher to describe (or assess) in detail the scope, assumptions, and characteristics of their data.

Serves as a useful ‘checklist’ prior to commencement of data collection

Helps researchers identify potential additional avenues of research after initial analysis (i.e. aids in the creation of datasets that support multiple analyses)

Supports communication with the data owner to increase the quality of retrieved data

Enables prioritisation of desirable data features where constraints prevent all being met

Helps in identifying appropriate or desirable publicly available datasets

Supports cross-teaming across academic disciplines (where data is required for different purposes)
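
The ‘checklist’ use described above could be sketched in code roughly as follows. This is only a minimal illustration: the poster text does not enumerate the schema's fields, so every field name, the example dataset and the requirements table are hypothetical assumptions, not the proposed schema itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NetworkDatasetDescription:
    """Hypothetical descriptor of a social network dataset.

    The field names are illustrative assumptions, not the fields of the
    schema proposed in the poster.
    """
    name: str
    directed: bool
    weighted: bool
    timestamped: bool
    node_attributes: List[str] = field(default_factory=list)
    collection_method: str = "unknown"  # e.g. "survey", "system extract"


# Hypothetical mapping from an analysis type to the boolean dataset
# features it needs; real requirements depend on the analysis.
ANALYSIS_REQUIREMENTS = {
    "temporal_evolution": {"timestamped"},
    "tie_strength": {"weighted"},
}


def missing_features(description, analysis):
    """Return the required features the dataset lacks, sorted by name."""
    required = ANALYSIS_REQUIREMENTS[analysis]
    return sorted(f for f in required if not getattr(description, f))


# An invented example dataset, described before any analysis is attempted.
email_data = NetworkDatasetDescription(
    name="example-email-network",
    directed=True,
    weighted=False,
    timestamped=True,
    node_attributes=["department"],
    collection_method="system extract",
)

print(missing_features(email_data, "temporal_evolution"))  # []
print(missing_features(email_data, "tie_strength"))        # ['weighted']
```

Writing the description down before collection makes a gap such as the missing edge weights visible while there is still time to ask the data owner for them.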