tools and tips for analyzing social media data

Analyzing Social Media Systems Shelly Farnham, Emre Kiciman FUSE Labs & Internet Services Research Center, Microsoft Research CHI Course 2013

Upload: shelly-d-farnham-phd

Post on 15-Jul-2015


TRANSCRIPT

Page 1: Tools and Tips for Analyzing Social Media Data

Analyzing Social Media Systems

Shelly Farnham, Emre Kiciman

FUSE Labs & Internet Services Research Center, Microsoft Research

CHI Course 2013

Page 2: Tools and Tips for Analyzing Social Media Data

Agenda

• Introductions
• Overview
• Lesson scenarios with real data
  • Usage analysis: Predictors of coming back
  • Social network analysis: Finding who you like
  • Content analysis: Relationships, cliques, and their conversation
• Focus on tools and tips, with special consideration when examining social data

Page 3: Tools and Tips for Analyzing Social Media Data

MAKING MEANING OUT OF THE MESS

Page 4: Tools and Tips for Analyzing Social Media Data

SHELLY FARNHAM: INDUSTRY RESEARCH

Specialize in social technologies
• Social networks, community, identity, mobile social

Early stage innovation
• Extremely rapid R&D cycle: study, brainstorm, design, prototype, deploy, evaluate (repeat)
• Convergent evaluation methodologies: usage analysis, interviews, questionnaires

Career
• PhD in Social Psychology from UW
• 7 years Microsoft Research: Virtual Worlds, Social Computing, Community Technologies
• 4 years startup world: Waggle Labs (consulting), Pathable
• 2 years Yahoo!
• FUSE Labs, Microsoft Research

Personal Map

Page 5: Tools and Tips for Analyzing Social Media Data

EMRE KICIMAN

Specialize in social data analytics
• Social media, social networks, search

Methods
• Machine learning
• Information extraction, entity recognition from social data
• Prototyping

Career
• Ph.D. and M.S. in computer science from Stanford University
• B.S. in Electrical Engineering and Computer
• Currently at Internet Services Research Center, Microsoft Research

Page 6: Tools and Tips for Analyzing Social Media Data

ANALYSIS THROUGHOUT R&D CYCLE

[Chart: Importance of information in selecting chat partner. Axis ticks 0 to 7. Labels: Rank, Rating, Similarity, Interacts with friends, Ratings by friends]

Page 7: Tools and Tips for Analyzing Social Media Data

USER STUDIES

Page 8: Tools and Tips for Analyzing Social Media Data

PROTOTYPING

Page 9: Tools and Tips for Analyzing Social Media Data

USAGE ANALYSIS

Do social responses matter in driving engagement?

Page 10: Tools and Tips for Analyzing Social Media Data

SOCIAL MEDIA ANALYSIS

Common types
• Usage analysis: behaviors, interactions
• Network analysis: patterns in networks (sets of pair-wise connections)
• Content analysis: semantics, sentiment of conversational content

Common steps
• Step 1. Getting started: defining questions
• Step 2. Processing data: extraction, cleaning, summarization
• Step 3. High level analysis: inference

Page 11: Tools and Tips for Analyzing Social Media Data

CASE STUDY: USAGE ANALYSIS

So.cl is an experimental web site that allows people to connect around their interests by integrating search tools with social networking.

How important are social interactions in encouraging users to become engaged with an interest network?

The So.cl usage analysis serves as the case study scenario; the lessons learned apply to other forms of social media and other forms of analysis.

Page 12: Tools and Tips for Analyzing Social Media Data

search + sharing + networking = informal discovery and learning

SO.CL: reimagining search as social from the ground up

History:
• Oct 2011: Pre-release deployment study
• Dec 2011: Private, invitation-only beta
• May 2012: Removed invitation restrictions
• Nov 2012: Over 300K registered users, 13K active per month

Try it now! http://www.so.cl

Page 13: Tools and Tips for Analyzing Social Media Data

INTEREST NETWORK GOALS
• Find others around common interests
• Be inspired by new interests
• Learn from each other through these shared interests

Page 14: Tools and Tips for Analyzing Social Media Data

HOW IT WORKS

[Screenshot labels: Feed, Search & Post, Feed Filters, People]

Try it now! http://www.so.cl – use facsumm tag

Page 15: Tools and Tips for Analyzing Social Media Data

POST BUILDING

[Screenshot labels: Results, Search (Bing), Filter Results, Post Builder]

Experience:
• Step 1: Perform search
• Step 2: Click on items in results to add to post
• Step 3: Add a message
• Step 4: Tag

Try it now! http://www.so.cl – use facsumm tag

Page 16: Tools and Tips for Analyzing Social Media Data

USAGE ANALYSIS

Page 17: Tools and Tips for Analyzing Social Media Data

STEP 1: GETTING STARTED

Page 18: Tools and Tips for Analyzing Social Media Data

DEFINING RESEARCH QUESTION

• Amount of data overwhelming: the more defined your question, the easier the analysis
• What real world problem are you trying to explore?
• Avoid pitfall of technology for technology’s sake
• What argument do you want to be able to make?
• State your problem as a hypothesis

Page 19: Tools and Tips for Analyzing Social Media Data

CASE SCENARIO:

Real world problem: Help people learn online

Argument want to make: People are more motivated to explore new interests via social media than via search alone because of the opportunity to connect with others.

Hypothesis: If people receive a social response when they first join So.cl they are more likely to become engaged.

Page 20: Tools and Tips for Analyzing Social Media Data

OPERATIONALIZING CONSTRUCTS

• Operationalize = to make measurable
• Always review related literature for best practices
• How do you measure…
  Friendship? Similarity? Interest? Trend? Conversation? Community? Engagement?
• Can you operationalize with existing data, or do you need to generate more?

Page 21: Tools and Tips for Analyzing Social Media Data

CASE SCENARIO

Hypothesis: If people receive a social response when they first join So.cl they are more likely to become engaged.

Measuring social/behavioral constructs:
• When first join: first session = time of first action to time of last action prior to an hour of inactivity
• Social responses: follows user, likes user’s post(s), comments on user’s post(s)
• Engagement = coming back: a second session = any action that occurs 60 minutes or more after the first session

Restating hypothesis: If people receive follows, likes, and comments in their first session they are more likely to come back for a second session.
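The session definitions above can be expressed directly in code. This is a minimal sketch, assuming each user's actions are available as a list of timestamps; the function names and the sample log are hypothetical and are not taken from the So.cl instrumentation.

```python
from datetime import datetime, timedelta

# Inactivity gap that ends a session (60 minutes, per the slide)
SESSION_GAP = timedelta(minutes=60)

def split_sessions(timestamps, gap=SESSION_GAP):
    """Group one user's action timestamps into sessions.

    A new session starts whenever an action occurs `gap` or more
    after the previous action.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)   # still inside the current session
        else:
            sessions.append([ts])     # first action, or the gap was reached
    return sessions

def came_back(timestamps):
    """Engagement as operationalized above: the user has a second session."""
    return len(split_sessions(timestamps)) >= 2

# Hypothetical action log for a single user
actions = [
    datetime(2012, 9, 28, 10, 0),
    datetime(2012, 9, 28, 10, 5),
    datetime(2012, 9, 29, 21, 30),
]
sessions = split_sessions(actions)
print(len(sessions), came_back(actions))
```

Keeping the gap as a parameter makes it easy to check how sensitive the "came back" measure is to the 60 minute threshold.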

Page 22: Tools and Tips for Analyzing Social Media Data

STEP 2. PROCESSING DATA

Page 23: Tools and Tips for Analyzing Social Media Data

COLLECTING DATA

Existing tools
• APIs (Twitter, Foursquare, Yelp)
• Web analytics (Google Analytics)

Write crawlers

Write your own instrumentation system
• e.g. log each call to server, query parameters

Page 24: Tools and Tips for Analyzing Social Media Data

RAW INSTRUMENTATION

Tendency to collect everything

incomprehensible, incoherent mess

Prone towards bugs

Page 25: Tools and Tips for Analyzing Social Media Data
Page 26: Tools and Tips for Analyzing Social Media Data

INSTRUMENTATION

Convert to human readable

Page 27: Tools and Tips for Analyzing Social Media Data

Always look at your raw data: play with it, ask yourself if it makes sense, test!

Page 28: Tools and Tips for Analyzing Social Media Data

COMMON INSTRUMENTATION SCHEMA

Users table: one row per user

Page 29: Tools and Tips for Analyzing Social Media Data

COMMON INSTRUMENTATION SCHEMA

Actions table: one row per meaningful action
• Filter out non-meaningful, non-user-generated actions

Page 30: Tools and Tips for Analyzing Social Media Data

COMMON INSTRUMENTATION SCHEMA

Content table(s): one row per content item, with text, URL, etc. of that item (e.g. messages, pictures shared, likes, tags)

Page 31: Tools and Tips for Analyzing Social Media Data

COMMON INSTRUMENTATION SCHEMA

Across tables, with social systems:
• Instrument social target (PersonA responds to PersonB)
• Instrument parent item (e.g., Comment A, Comment B, Comment C are responses to parent item PostB)

In other words, instrument who is interacting with whom, and in what context
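One way to picture the three tables is as record types. The sketch below is an illustration only: every field name is an assumption chosen for the example, not the schema used by So.cl. It shows how recording the social target and the parent item on each action makes "who responded to whom, and on what" a simple filter.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class User:                      # Users table: one row per user
    user_id: int
    joined_at: datetime

@dataclass
class Content:                   # Content table: one row per content item
    content_id: int
    author_id: int
    kind: str                    # e.g. "post", "comment", "tag"
    text: str = ""
    url: str = ""

@dataclass
class Action:                    # Actions table: one row per meaningful action
    action_id: int
    actor_id: int
    action_type: str
    occurred_at: datetime
    target_user_id: Optional[int] = None     # social target (who is responded to)
    content_id: Optional[int] = None
    parent_content_id: Optional[int] = None  # parent item, e.g. the post commented on

def responses_received(actions, user_id):
    """Actions by other people that target the given user."""
    return [a for a in actions
            if a.target_user_id == user_id and a.actor_id != user_id]

# Hypothetical rows: user 10 posts, users 11 and 12 respond to that post
actions = [
    Action(1, actor_id=10, action_type="create_post",
           occurred_at=datetime(2012, 9, 28, 10, 0), content_id=100),
    Action(2, actor_id=11, action_type="comment",
           occurred_at=datetime(2012, 9, 28, 10, 3),
           target_user_id=10, content_id=101, parent_content_id=100),
    Action(3, actor_id=12, action_type="like",
           occurred_at=datetime(2012, 9, 28, 10, 4),
           target_user_id=10, parent_content_id=100),
]
received = responses_received(actions, user_id=10)
```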

Page 32: Tools and Tips for Analyzing Social Media Data

REDUCING LARGE DATA

Filters
• Time span, type of person, type of actions

Sampling
• Random selection
• Snowballing, so you get a complete picture of a person’s social experience

Consider your research questions and how you want to generalize
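The two sampling strategies differ in what they preserve. The following sketch contrasts them under the assumption that connections are stored as a dictionary from each user to the set of users they interact with; the function names and the tiny network are hypothetical.

```python
import random

def random_sample(users, k, seed=0):
    """Simple random selection of k users (seeded so the sample is repeatable)."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(users), k))

def snowball_sample(seeds, neighbors, hops=1):
    """Start from seed users and add everyone within `hops` connections,
    so each seed's social surroundings are captured in full."""
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        reached = set()
        for user in frontier:
            reached.update(neighbors.get(user, ()))
        frontier = reached - sampled   # only users not yet in the sample
        sampled |= frontier
    return sampled

# Hypothetical connection data
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}
one_hop = snowball_sample({"a"}, neighbors, hops=1)
two_hops = snowball_sample({"a"}, neighbors, hops=2)
picked = random_sample(neighbors.keys(), k=2)
```

A random sample generalizes more cleanly to the whole population, while a snowball sample keeps each person's interaction partners together at the cost of over-representing well-connected users.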

Page 33: Tools and Tips for Analyzing Social Media Data

FILTERING & SAMPLING

• Filtered out administrators/community managers
• New users only
• Date range: Sept 28 to Oct 13
• 100% sample for that time span: 2462 people

Page 34: Tools and Tips for Analyzing Social Media Data

SYSTEMATIC BIASES IN SOCIAL SYSTEMS

If you want to understand your “typical” users, keep in mind that you will generally find:
• A large percent never become active or return; these “lookiloos” can unduly bias averages

Common reporting format:
• X% performed Y behavior, of those averaged Z times each
• e.g. 5% commented on a post their first session, averaging 5 times each
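The reporting format above separates "how many people did it" from "how much did those people do". A minimal sketch, assuming per-user counts are already available as a dictionary (the data here is invented to mirror the 5% example):

```python
def summarize_behavior(counts_per_user):
    """Return (% of users who performed the behavior,
               mean count among those who did,
               naive mean over all users)."""
    n_users = len(counts_per_user)
    doers = [c for c in counts_per_user.values() if c > 0]
    pct_doers = 100.0 * len(doers) / n_users if n_users else 0.0
    mean_among_doers = sum(doers) / len(doers) if doers else 0.0
    naive_mean = sum(counts_per_user.values()) / n_users if n_users else 0.0
    return pct_doers, mean_among_doers, naive_mean

# Hypothetical data: 20 new users, one of whom commented 5 times
comment_counts = {f"user{i}": 0 for i in range(19)}
comment_counts["user19"] = 5

pct, per_doer, naive = summarize_behavior(comment_counts)
print(f"{pct:.0f}% commented on a post, averaging {per_doer:.1f} times each")
```

The naive mean over all users (0.25 comments here) describes nobody in the sample, which is why the two-part format is preferred.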

Page 35: Tools and Tips for Analyzing Social Media Data

OUTLIERS

Filtered out 13 outliers: people with z > 4 in number of actions (among those who did more than sign in)
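A z-score filter of this kind can be sketched as follows. The assumptions are mine: z-scores are computed only over users with at least one action, the population standard deviation is used, and users with no actions are kept untouched. The slide does not say which variant was applied.

```python
from statistics import mean, pstdev

def filter_outliers(action_counts, z_cutoff=4.0):
    """Drop users whose action count has a z-score above `z_cutoff`.

    z-scores are computed among active users only (count > 0);
    users with zero actions are always kept.
    """
    active = [c for c in action_counts.values() if c > 0]
    if len(active) < 2:
        return dict(action_counts)
    mu, sigma = mean(active), pstdev(active)
    if sigma == 0:
        return dict(action_counts)
    return {
        user: count
        for user, count in action_counts.items()
        if count == 0 or (count - mu) / sigma <= z_cutoff
    }

# Hypothetical counts: 29 ordinary users, one hyper-active user, one sign-in only
counts = {f"user{i}": 5 for i in range(29)}
counts["heavy"] = 500
counts["signin_only"] = 0

kept = filter_outliers(counts)
```

Note that with very few active users no count can reach z > 4, so a filter like this only bites on reasonably large samples.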

Page 36: Tools and Tips for Analyzing Social Media Data

SYSTEMATIC BIASES IN SOCIAL SYSTEMS

• A small percent are “hyper-active” users (avid users, spammers, trolls, administrators) who can unduly bias averages
  • Remove outliers
• A substantial percent are consumers but not producers (“lurkers”); often there is no signal for lurkers
  • Consult literature and related work for estimates (So.cl: about 75% lurkers)
  • Custom instrumentation, logging sign-ins
  • Web analytics for clicks

Page 37: Tools and Tips for Analyzing Social Media Data

PLAYING WITH YOUR DATA

Very important to spend time examining data
• Descriptives, frequencies, correlations, graphs
• Use a tool that easily generates graphs and correlations
• Does it make sense? If not, really chase it down. Often it is a bug or a misinterpretation of the data.

Page 38: Tools and Tips for Analyzing Social Media Data

AGGREGATIONS

Aggregation: merging down for summarization

What is your level of analysis?
• Person, group, network
• Content types

If person is the unit of analysis, aggregate measures to the person level
• E.g. in SPSS: one line per person
• Very important to have the appropriate unit of analysis, to avoid bias in statistics
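Aggregating to the person level means turning one row per action into one row per person. The sketch below does this in plain Python rather than SPSS; the row format and names are assumptions for illustration.

```python
from collections import Counter

def aggregate_to_person(action_rows, all_users):
    """Collapse (user, action_type) rows into one Counter per person.

    Users with no actions still get a row, so they are not silently
    dropped from later statistics.
    """
    per_person = {user: Counter() for user in all_users}
    for user, action_type in action_rows:
        per_person.setdefault(user, Counter())[action_type] += 1
    return per_person

# Hypothetical action-level data: one row per action
action_rows = [
    ("u1", "post"),
    ("u1", "comment"),
    ("u1", "comment"),
    ("u2", "like"),
]
table = aggregate_to_person(action_rows, all_users=["u1", "u2", "u3"])

for user, counts in table.items():
    print(user, dict(counts))
```

Passing the full user list matters: if "u3" were left out, every average would be computed over active users only.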

Page 39: Tools and Tips for Analyzing Social Media Data

AGGREGATIONS SPSS Syntax:

Page 40: Tools and Tips for Analyzing Social Media Data

DESCRIPTIVES OF ACTIVE SESSIONS

Active session = a time of activity (public), with 60 minute gap of no activity before or after

• 91% of users: only one active session
• On average, 34.6 hours apart
• First session: 1.6 minutes

Page 41: Tools and Tips for Analyzing Social Media Data


DESCRIPTIVES OF ACTIONS

8% created a post their first session; of those, averaged 1.5 times each

Actions in First Session

Page 42: Tools and Tips for Analyzing Social Media Data

DESCRIPTIVES OF COMING BACK

• 9.1% came back for another active session (~25% including inactive)
• On average, 35 hours later

Page 43: Tools and Tips for Analyzing Social Media Data

IN THE FIRST SESSION

How often is the user the target of social behavior?
• 23% received some response up to 2nd session
  • 3% if did not create a post, 37% if did create a post

[Chart panels: Response *During* First Session; Response *in Between* 1st and 2nd Sessions]

Page 44: Tools and Tips for Analyzing Social Media Data

STEP 3. HIGH LEVEL ANALYSIS

Page 45: Tools and Tips for Analyzing Social Media Data

PRELIMINARY CORRELATIONS

Always ask, does this pattern make sense?

Page 46: Tools and Tips for Analyzing Social Media Data

PREDICTORS OF COMING BACK

Social responses inspire people to return to the site, especially if occurring during the first session

Social responses to user: following, commenting on post, liking post, liking comment, riffing

[Chart group sizes: N = 2273, N = 179, N = 1942, N = 510]

Page 47: Tools and Tips for Analyzing Social Media Data

WHICH RESPONSE MATTERS

Logistic Regression, Which Predicts Coming Back

Predictor                      B      Sig.
Created post first session     .95    .000
Followed                       .92    .003
Commented On                   .38    ns
Post Liked                     .87    .02
Comment Liked                 -.09    ns
Messaged                      -.09    ns
Riffed                         .00    ns

Logistic Regression, Any Response Predicts Coming Back

Predictor                          B       S.E.    Sig.
Created post first session         .71     .20     .000
Response1: during first session    1.12    .21     .000
Response2: after first session     .60     .17     .000
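To read the B values in the "Any Response" regression above, it can help to convert them to odds ratios. This sketch assumes B is the usual log-odds coefficient reported for logistic regression, in which case exp(B) is the factor by which the odds of coming back are multiplied when the predictor is present.

```python
import math

# B coefficients copied from the "Any Response Predicts Coming Back" table
coefficients = {
    "Created post first session": 0.71,
    "Response1: during first session": 1.12,
    "Response2: after first session": 0.60,
}

# exp(B) converts a log-odds coefficient into an odds ratio
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds of coming back multiplied by {ratio:.2f}")
```

Under that assumption, a response during the first session roughly triples the odds of returning, holding the other two predictors constant.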

Page 48: Tools and Tips for Analyzing Social Media Data

IDENTIFYING SUBGROUPS

Factors about equally predict if user comes back

Factor Analysis for Associated Behaviors: Three types of usage (creating, socializing, browsing)

Principal components, varimax rotation [meaning forced to be orthogonal]

Component Matrix

Behavior              Creators   Socialites   Browsers
% Variance            32%        12%          9%
Created post          .86        .17          .10
Invited               .01        -.16         .63
Followed              -.03       .10          .37
Added item to post    .83        .08          -.06
Searched              .81        .03          .17
Commented             .36        .64          .09
Liked post            .15        .58          .32
Liked comment         .13        .80          .06
Messaged              -.09       .50          -.08
Viewed person         .22        .47          .48
Navigated to All      .51        .37          .53
Joined party          .17        .09          .68

Browsing stronger predictor of overall activity level

Regression Coefficients

               Beta    t       Sig
Creating       0.20    7.89    0.00
Socializing    0.17    6.58    0.00
Browsing       0.29    9.07    0.00

Regression Coefficients

               Beta    t       Sig
Creating       .14     5.28    .000
Socializing    .07     2.61    .000
Browsing       .19     7.20    .000

Page 49: Tools and Tips for Analyzing Social Media Data

NETWORK ANALYSIS

Page 50: Tools and Tips for Analyzing Social Media Data

Case Scenario 2: illustrating network analysis

Real world problem: help people find and learn from others who share their interests online

Argument want to make: people do not just care about content around their interests, they want to develop friendships with others who share their interests

Hypothesis: The more tags people have in common, the more they will interact with each other

Design implication: Recommendations based on common overlapping tags

Page 51: Tools and Tips for Analyzing Social Media Data

PROCESSING NETWORK DATA

Common format:

EntityA   EntityB   measure
EntityB   EntityC   measure
EntityB   EntityD   measure
EntityF   EntityG   measure

Units of analysis:
• Edges
• Nodes/vertices
• Clusters, networks
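The edge-list format above maps onto all three units of analysis. Here is a small sketch that parses such a list, then reports per-node degree and connected clusters; the numeric measures are invented, and treating edges as undirected is an assumption.

```python
from collections import defaultdict

def parse_edge_list(text):
    """Parse 'EntityA EntityB measure' lines into (a, b, weight) tuples."""
    edges = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            edges.append((parts[0], parts[1], float(parts[2])))
    return edges

def build_adjacency(edges):
    """Undirected adjacency map: node -> {neighbor: weight}."""
    adjacency = defaultdict(dict)
    for a, b, weight in edges:
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency

def clusters(adjacency):
    """Connected components, found with a depth-first search."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node])
        seen |= component
        result.append(component)
    return result

# Hypothetical measures filled into the format from the slide
edge_text = """\
EntityA EntityB 3
EntityB EntityC 1
EntityB EntityD 2
EntityF EntityG 5
"""
edges = parse_edge_list(edge_text)                     # edge level
adjacency = build_adjacency(edges)
degree = {n: len(nb) for n, nb in adjacency.items()}   # node level
found = clusters(adjacency)                            # cluster level
```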

Page 52: Tools and Tips for Analyzing Social Media Data

OPERATIONALIZING CONNECTION

How would you measure… Similar interests?

Friendship? Information flow?

Asymmetrical?

Often some form of co-occurrence

http://www.touchgraph.com/assets/navigator/help2/module_3_3.html

Page 53: Tools and Tips for Analyzing Social Media Data

NORMALIZATION

Adjusting values measured on different scales to a notionally common scale

Allows the comparison of corresponding normalized values for different datasets in a way that eliminates the effects of certain gross influences

• Mary has 400 friends
• Jim has 200 friends
• Bob has 50 friends
• Mary and Jim have 100 overlapping friends
• Mary and Bob have 50 overlapping friends
• How similar are they?
• Who’s more similar?

[Diagram: Mary, Jim, Bob]
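The Mary, Jim, and Bob question has different answers depending on how the overlap is normalized. The slide does not name a measure, so the two below (Jaccard similarity and the overlap coefficient) are common choices offered as an illustration rather than the authors' method.

```python
def jaccard(size_a, size_b, overlap):
    """Shared friends divided by all distinct friends of the pair."""
    union = size_a + size_b - overlap
    return overlap / union if union else 0.0

def overlap_coefficient(size_a, size_b, overlap):
    """Shared friends divided by the smaller friend list."""
    smaller = min(size_a, size_b)
    return overlap / smaller if smaller else 0.0

# Friend counts and overlaps from the slide
friends = {"Mary": 400, "Jim": 200, "Bob": 50}
shared = {("Mary", "Jim"): 100, ("Mary", "Bob"): 50}

scores = {}
for (a, b), overlap in shared.items():
    scores[(a, b)] = {
        "raw": overlap,
        "jaccard": jaccard(friends[a], friends[b], overlap),
        "overlap_coefficient": overlap_coefficient(friends[a], friends[b], overlap),
    }

for pair, values in scores.items():
    print(pair, values)
```

By raw count and by Jaccard, Jim is closer to Mary; by the overlap coefficient, Bob is, because every one of his friends is also Mary's friend. The choice of normalization should follow from what "similar" is meant to capture.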

Page 54: Tools and Tips for Analyzing Social Media Data

CASE STUDY:

Real world problem: Help people find people like them online

Argument want to make: Interests you share and tag online are good indicator of what you are like

Hypothesis: People are more interested in receiving recommendations of whom to befriend based on overlapping tags than recommendations of random others in the system

Page 55: Tools and Tips for Analyzing Social Media Data

CONNECTION VIA OVERLAPPING TAGS

Page 56: Tools and Tips for Analyzing Social Media Data
Page 57: Tools and Tips for Analyzing Social Media Data

NETWORK ANALYSIS (NODEXL)

Playing with data, learned:
• Not all tagging is a good indicator of what you are like; the tags on your posts are, whether or not you add them
• The most common tags are not very meaningful; unique overlapping tags are. Hence the importance of normalization
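One way to make rare shared tags count for more than common ones is to weight each tag by how few people use it. The sketch below uses an inverse-document-frequency style weight, log(N / users with the tag); this is a standard technique chosen for illustration, not necessarily the normalization used in the So.cl analysis, and the users and tags are invented.

```python
import math

def tag_weights(user_tags):
    """Weight each tag by log(number of users / users having the tag).

    A tag everyone uses gets weight 0; rarer tags get larger weights.
    """
    n_users = len(user_tags)
    usage = {}
    for tags in user_tags.values():
        for tag in tags:
            usage[tag] = usage.get(tag, 0) + 1
    return {tag: math.log(n_users / count) for tag, count in usage.items()}

def weighted_overlap(a, b, user_tags, weights):
    """Sum of weights over the tags two users share."""
    return sum(weights[tag] for tag in user_tags[a] & user_tags[b])

# Hypothetical users and the tags on their posts
user_tags = {
    "ann": {"news", "birding"},
    "ben": {"news", "birding"},
    "cat": {"news", "cooking"},
    "dan": {"news"},
}
weights = tag_weights(user_tags)
ann_ben = weighted_overlap("ann", "ben", user_tags, weights)
ann_cat = weighted_overlap("ann", "cat", user_tags, weights)
```

Ann and Cat share a tag, but it is one that every user has, so it contributes nothing; Ann and Ben share a tag that only they use, which is what drives their score.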

Page 58: Tools and Tips for Analyzing Social Media Data

CONTENT ANALYSIS

Page 59: Tools and Tips for Analyzing Social Media Data

Douglas Wray - http://instagr.am/p/nm695/ @ThreeShipsMedia

Page 60: Tools and Tips for Analyzing Social Media Data

Outline

What’s in social media? (donuts)

Extracting relationships and their context

Using context with higher-level analyses

Page 61: Tools and Tips for Analyzing Social Media Data

Do people really talk about donuts?

• 1 week of tweets mentioning “donut” or “doughnuts”
  • Week of Feb 6-12, 2012
  • Matched ~180k messages
• Train entity tagger for food and for restaurants (no disambiguation or canonicalization)

Let’s see what we find…

Page 62: Tools and Tips for Analyzing Social Media Data

Where do people get donuts?

Page 63: Tools and Tips for Analyzing Social Media Data

What do people drink with donuts?

Page 64: Tools and Tips for Analyzing Social Media Data

What kind of donuts do people eat?

Page 65: Tools and Tips for Analyzing Social Media Data

Beyond donuts…

• Drugs, diseases, and contagions
  Paul and Dredze 2011; Sadilek, Kautz and Silenzio 2012
• Crises, disasters, and wars
  Starbird et al. 2010; Al-Ani, Mark & Semaan 2010; Monroy-Hernandez et al. 2012
• Public sentiment
  Political and election indices, market insights
• Everyday life

Page 66: Tools and Tips for Analyzing Social Media Data

Relationships in Context

Page 67: Tools and Tips for Analyzing Social Media Data

Stage 1: Feature extraction

“I had fun hiking Tiger Mountain last weekend” – Alice said on Monday, at 10am

Domain           Value
Location         Tiger Mountain
Mood             Happy
Activity         Hiking
Name             Alice
Gender           Female
Post Time        Mon 10am
Activity Time    {Sat-Sun}

Page 68: Tools and Tips for Analyzing Social Media Data

Stage 2(A) Build a hyper-graph representation

“I had fun hiking Tiger Mountain last weekend” – Alice said on Monday, at 10am

[Hyper-graph nodes: Name: Alice; Location: Tiger Mountain; Gender: Female; Mood: Happy; Post Time: Mon 10am; Activity Time: {Sat-Sun}; Activity: Hiking]

Page 69: Tools and Tips for Analyzing Social Media Data

[Hyper-graph nodes: Name: Alice; Location: Tiger Mountain; Gender: Female; Mood: Happy; Post Time: Mon 10am; Activity Time: {Sat-Sun}; Activity: Hiking; Name: Bob; Gender: Male; Post Time: Fri 3pm]

Page 70: Tools and Tips for Analyzing Social Media Data

Stage 2(B) Projection

• Reduce graph to key domains
• Statistical distributions of other domains provide key context

[Graph nodes: Location: Tiger Mountain; Activity: Hiking]

Page 71: Tools and Tips for Analyzing Social Media Data

[Graph nodes: Location: Tiger Mountain; Activity: Hiking. Context: Gender: Female; Gender: Male]
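The two stages can be sketched with plain dictionaries. Each post's extracted features form one hyper-edge linking values from several domains; projection then keeps only two key domains and summarizes a third as context. Alice's features come from the slides; linking Bob's post to the same location and activity is an assumption made for the example, as are the function and variable names.

```python
from collections import Counter, defaultdict

# Stage 1 output: one feature record (hyper-edge) per post
posts = [
    {"Name": "Alice", "Gender": "Female", "Location": "Tiger Mountain",
     "Activity": "Hiking", "Mood": "Happy", "Post Time": "Mon 10am",
     "Activity Time": "{Sat-Sun}"},
    # Assumed for illustration: Bob also posted about hiking Tiger Mountain
    {"Name": "Bob", "Gender": "Male", "Location": "Tiger Mountain",
     "Activity": "Hiking", "Post Time": "Fri 3pm"},
]

def project(posts, domain_a, domain_b, context_domain):
    """Reduce the hyper-graph to edges between two key domains.

    Each edge keeps a distribution over `context_domain`, counting the
    posts that support the relationship.
    """
    edges = defaultdict(Counter)
    for post in posts:
        if domain_a in post and domain_b in post:
            key = (post[domain_a], post[domain_b])
            edges[key][post.get(context_domain, "unknown")] += 1
    return edges

edges = project(posts, "Location", "Activity", context_domain="Gender")
for (location, activity), context in edges.items():
    print(location, "-", activity, dict(context))
```

Swapping `context_domain` for "Mood" or "Post Time" gives a different view of the same Location and Activity relationship without rebuilding the graph.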

Page 72: Tools and Tips for Analyzing Social Media Data

Demo to show example relationships & contexts from several domains

Page 73: Tools and Tips for Analyzing Social Media Data

Using context with high-level analyses

Current Clustering Neighborhood discovery Network centrality

Context of discussion provides

Page 74: Tools and Tips for Analyzing Social Media Data

Demo to show example contexts for pseudo-cliques and network centrality

Page 75: Tools and Tips for Analyzing Social Media Data

CONCLUSIONS

• Define research questions early to help focus analysis
• Many special considerations with social media data
  • Operationalizing social constructs
  • Attention to lookiloos, hyperactives, lurkers who bias outcomes
  • Different types of users = different behaviors
  • Different contexts meaningfully impact conversation
• Processing data = simplification, getting meaningful measures summarized at the appropriate level of analysis
• Format your data and plug it into an appropriate tool to enable you to play with your data a *lot*
  • Important for debugging, finding patterns
• Great tools are available for leveraging social media to describe and predict behaviors

Page 76: Tools and Tips for Analyzing Social Media Data

CONTACT INFO

Shelly Farnham, Researcher (@shellyshelly; [email protected])

Emre Kiciman, Researcher ([email protected])

QUESTIONS