cse509 lecture 6

Arjumand Younus Web Science Research Group Institute of Business Administration (IBA) CSE509: Introduction to Web Science and Technology Lecture 6: Social Information Retrieval


DESCRIPTION

Lecture 6 of CSE509:Web Science and Technology Summer Course

TRANSCRIPT

Page 1: CSE509 Lecture 6

Arjumand Younus

Web Science Research Group

Institute of Business Administration (IBA)

CSE509: Introduction to Web Science and Technology

Lecture 6: Social Information Retrieval

Page 2: CSE509 Lecture 6


Last Time…

• Transition from Web 1.0 to Web 2.0
• Social Media Characteristics

Part I: Theoretical Aspects
  - Social Networks as a Graph
  - Properties of Social Networks

Part II: Getting Hands-On Experience on Social Media Analytics
  - Twitter Data Hacks

Part III: Example Research

August 13, 2011

Page 3: CSE509 Lecture 6


Today

Role of Today’s Web: Changing the way Information Needs are Satisfied

Social Search

Research Case by Microsoft Research: What do People Ask their Social Networks

Techniques for Influence Analysis in Social Networks

Page 4: CSE509 Lecture 6

Role of Today’s Web


Marketing Tool

Information Finding Tool

Media Tool

Page 5: CSE509 Lecture 6


New Dimensions in Search with The Social Web

• Information Overload
  - Search engines don’t always hold the answers that users are looking for

• Smart Search (CNN Money)
  “The Web, they say, is leaving the era of search and entering one of discovery. What’s the difference? Search is what you do when you’re looking for something. Discovery is when something wonderful that you didn’t know existed, or didn’t know how to ask for, finds you.”


What does that mean for search engines? Will they be left behind?


Page 7: CSE509 Lecture 6


Social Search

Takes into account the “social graph” of the person initiating the query

Search activity in which users pose a question to their social networks

Search systems using statistical analytics over traces left behind by others

Conducting a search over an existing database of content previously provided by other users, such as searching over the collection of public Twitter posts or searching through an archive of questions and answers

Page 8: CSE509 Lecture 6

Social Search Benefits

Reduced impact of link spam through less reliance on the link structure of Web pages

Increased relevance due to each result being selected by users

Web page relevance judged from the reader’s perspective rather than the author’s perspective

More current results through constant feedback


Improvements achieved by social search have not been quantified so far

Page 9: CSE509 Lecture 6

What Do People Ask Social Networks?

Meredith Ringel Morris, MSR

Jaime Teevan, MSR

Katrina Panovich, MIT

Page 10: CSE509 Lecture 6

Questions about People’s Questions

• What questions do people ask?
  - How are the questions phrased?
  - What are the question types and topics?
  - Who asks which questions and why?

• Which questions get answered?
  - How is answer speed and utility perceived?
  - What are people’s motivations for answering?

Page 11: CSE509 Lecture 6

Survey of Asking via Status Messages

• Survey content
  - Used a status message to ask a question?
      Frequency of asking, question type, responses received
      Provide an example
  - Answered a status message question?
      Why or why not?
      Provide an example

• 624 participants
  - Focus on Facebook and Twitter behavior

Page 12: CSE509 Lecture 6

Questions: Types

Type                %     Example
Recommendation      29%   Building a new playlist – any ideas for good running songs?
Opinion             22%   I am wondering if I should buy the Kitchen-Aid ice cream maker?
Factual             17%   Anyone know a way to put Excel charts into LaTeX?
Rhetorical          14%   Why are men so stupid?
Invitation           9%   Who wants to go to Navya Lounge this evening?
Favor                4%   Need a babysitter in a big way tonight… anyone??
Social connection    3%   I am hiring in my team. Do you know anyone who would be interested?
Offer                1%   Could any of my friends use boys size 4 jeans?

Page 13: CSE509 Lecture 6

Questions: Topics

Topic             %     Example
Technology        29%   Anyone know if WOW works on Windows 7?
Entertainment     17%   Was seeing Up in the theater worth the money?
Home & Family     12%   So what’s the going rate for the tooth fairy?
Professional      11%   Which university is better for Masters? Cornell or Georgia Tech?
Places             8%   Planning a trip to Whistler in the off-season. Recommendation on sites to see?
Restaurants        6%   Hanging in Ballard tonight. Dinner recs?
Current events     5%   What is your opinion on the recent proposition that was passed in California?
Shopping           5%   What’s a good Mother’s Day gift?
Philosophy         2%   What would you do if you had a week to live?

Missing: Health, Religion, Politics, Dating, and Finance

Page 14: CSE509 Lecture 6

Questions: Who Asks What

[Figure: question types (Recommendation, Opinion, Factual, Rhetorical, Invitation, Favor, Social connection, Offer) and topics (Technology, Entertainment, Home & Family, Professional, Places, Restaurants, Current events, Shopping, Philosophy) compared across askers: men vs. women, old vs. young, Twitter vs. Facebook]

Page 15: CSE509 Lecture 6


Questions: Motives for Asking

Motive                     %       Example
Trust                      24.8%   I trust my friends more than I trust strangers.
Subjective                 21.5%   Search engine can provide data but not an opinion.
Thinks search would fail   15.2%   I’m pretty sure a search engine couldn’t answer a question of that nature.
Audience                   14.9%   Friends with kids, first hand real experience.
Connect                    12.4%   I wanted my friends to know I was asking the question.
Speed                       6.6%   Quick response time, no formalities.
Context                     5.4%   Friends know my tastes.
Tried search                5.4%   I tried searching and didn’t get good results.
Easy                        5.4%   Didn’t want to look through multiple search results.
Quality                     4.1%   Human-vetted responses.


Page 17: CSE509 Lecture 6


Answers: Speed and Utility

• 94% of questions received an answer

• Answer speed
  - A quarter in 30 minutes, almost all in a day
  - People expected faster, but satisfied with speed
  - Shorter questions got more useful responses

• Answer utility
  - 69% of responses helpful

Page 18: CSE509 Lecture 6

Answers: Speed and Utility

[Figure: question types and topics (same categories as in the preceding tables) annotated by answer speed and utility; labels: Fast, Unhelpful, No correlation]

Page 19: CSE509 Lecture 6


Answers: Motives for Answering

Motive           %      Example
Altruism         37.0   Just trying to be helpful.
Expertise        31.9   If I’m an expert in the area.
Question         15.4   Interest in the topic.
Relationship     13.7   If I know and like the person.
Connect          13.5   Keeps my network alive.
Free time        12.3   Boredom/procrastination.
Social capital   10.5   I will get help when I need it myself.
Obligation        5.4   A tit-for-tat.
Humor             3.7   Thinking I might have a witty response.
Ego               3.4   Wish to seem knowledgeable.

Motives for Not Answering
- Don’t know the answer
- Private topic
- Question impersonal

Page 20: CSE509 Lecture 6

Answers About People’s Questions

• The questions people ask
  - Short, directed to “anyone”
  - Subjective questions on acceptable topics
  - Social relationships important motivators

• The questions that get answered
  - Fast, helpful responses, related to length and type
  - Answers motivated by altruism and expertise

Page 21: CSE509 Lecture 6

Enhancing Search using Social Network Features

• Recency Crawling and Ranking
  - Identification of Hot Topics on Social Web [YQG+11]
• News in the Making
  - Trend analysis
  - Event detection
• Real-Time Search
• Information Diffusion and Influence Analysis
• Community Detection
• Opinion Mining

Page 22: CSE509 Lecture 6

Nodes, Ties and Influence

Page 23: CSE509 Lecture 6


Importance of Nodes

Not all nodes are equally important

• Centrality Analysis
  - Find out the most important nodes in one network

• Commonly-used Measures
  - Degree Centrality
  - Closeness Centrality
  - Betweenness Centrality
  - Eigenvector Centrality

Page 24: CSE509 Lecture 6

Degree Centrality

• The importance of a node is determined by the number of nodes adjacent to it
• The larger the degree, the more important the node is
• Only a small number of nodes have high degrees in many real-life networks

Degree Centrality: $C_D(v_i) = d_i = \sum_j A_{ij}$

Normalized Degree Centrality: $C'_D(v_i) = d_i / (n - 1)$

For node 1, degree centrality is 3; normalized degree centrality is 3/(9-1) = 3/8.
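Degree centrality and its normalized form can be checked with a few lines of Python. This is a minimal sketch, assuming a hypothetical 9-node undirected graph: the slide's figure is not reproduced in this transcript, so the edge list `EDGES` is an assumption chosen so that node 1 has three neighbours, and the function names are illustrative.

```python
# Hypothetical 9-node undirected graph (an assumption, not the slide's figure);
# chosen so that node 1 has exactly three neighbours.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj, node):
    """Number of nodes adjacent to `node`."""
    return len(adj[node])


def normalized_degree_centrality(adj, node):
    """Degree divided by n - 1, the largest degree a node could have."""
    return len(adj[node]) / (len(adj) - 1)


adj = build_adjacency(EDGES)
print(degree_centrality(adj, 1))             # 3
print(normalized_degree_centrality(adj, 1))  # 0.375
```

With n = 9 nodes the normalized value for node 1 is 3/8, matching the arithmetic on the slide.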

Page 25: CSE509 Lecture 6

Closeness Centrality

“Central” nodes are important, as they can reach the whole network more quickly than non-central nodes

Importance measured by how close a node is to other nodes

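Average Distance: $D_{avg}(v_i) = \frac{1}{n-1} \sum_{j \neq i} g(v_i, v_j)$, where $g(v_i, v_j)$ is the shortest-path (geodesic) distance between $v_i$ and $v_j$

Closeness Centrality: $C_C(v_i) = \left[ \frac{1}{n-1} \sum_{j \neq i} g(v_i, v_j) \right]^{-1} = \frac{n-1}{\sum_{j \neq i} g(v_i, v_j)}$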

Page 26: CSE509 Lecture 6

Closeness Centrality Example

Node 4 is more central than node 3
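A comparison of this kind can be reproduced with breadth-first search. The sketch below assumes a hypothetical 9-node undirected graph (the example figure is not available in this transcript, so `EDGES` is an assumption), treats closeness as the inverse of the average shortest-path distance, and assumes the graph is connected.

```python
from collections import deque

# Hypothetical 9-node undirected graph (an assumption, not the slide's figure).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj, node):
    """(n - 1) divided by the sum of distances; assumes a connected graph."""
    dist = bfs_distances(adj, node)
    total = sum(d for v, d in dist.items() if v != node)
    return (len(adj) - 1) / total


adj = build_adjacency(EDGES)
print(round(closeness_centrality(adj, 3), 3))  # 0.471
print(round(closeness_centrality(adj, 4), 3))  # 0.615
```

In this assumed graph node 4 reaches the other eight nodes in 13 hops in total, node 3 in 17, so node 4 has the higher closeness.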

Page 27: CSE509 Lecture 6


Betweenness Centrality

Node betweenness counts the number of shortest paths that pass one node

Nodes with high betweenness are important in communication and information diffusion

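Betweenness Centrality: $C_B(v_i) = \sum_{s \neq v_i \neq t,\; s < t} \frac{\sigma_{st}(v_i)}{\sigma_{st}}$

  $\sigma_{st}$: the number of shortest paths between s and t

  $\sigma_{st}(v_i)$: the number of shortest paths between s and t that pass $v_i$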

Page 28: CSE509 Lecture 6


Betweenness Centrality Example

The number of shortest paths between s and t

The number of shortest paths between s and t that pass vi
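The two counts above can be computed with a breadth-first search that also counts shortest paths. This is a minimal sketch on a hypothetical 9-node undirected graph (the example figure is not in this transcript, so `EDGES` is an assumption). It uses the fact that, in an undirected graph, the number of shortest s-t paths through a node v is the number of shortest s-v paths times the number of shortest v-t paths whenever v lies on a shortest s-t path. It reruns the search for every pair, which is fine for a toy graph but not for large networks.

```python
from collections import deque
from itertools import combinations

# Hypothetical 9-node undirected graph (an assumption, not the slide's figure).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_counts(adj, source):
    """BFS returning (dist, sigma): hop distances and shortest-path counts."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness_centrality(adj, node):
    """Sum over unordered pairs {s, t} of the share of shortest paths via node."""
    dist_v, sigma_v = shortest_path_counts(adj, node)
    total = 0.0
    for s, t in combinations(sorted(adj), 2):
        if node in (s, t):
            continue
        dist_s, sigma_s = shortest_path_counts(adj, s)
        if t not in dist_s or node not in dist_s:
            continue  # s cannot reach t or node
        if dist_s[node] + dist_v[t] == dist_s[t]:
            total += sigma_s[node] * sigma_v[t] / sigma_s[t]
    return total


adj = build_adjacency(EDGES)
print(betweenness_centrality(adj, 4))  # 15.0
print(betweenness_centrality(adj, 9))  # 0.0
```

In this assumed graph node 4 is the only connection between nodes {1, 2, 3} and nodes {5, 6, 7, 8, 9}, so all 3 × 5 = 15 such pairs route every shortest path through it.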

Page 29: CSE509 Lecture 6

Eigenvector Centrality

• One’s importance is determined by his friends’
• If one has many important friends, he should be important as well.

The centrality corresponds to the top eigenvector of the adjacency matrix A.

A variant of this eigenvector centrality is the PageRank score.
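One common way to approximate the top eigenvector of A is power iteration: repeatedly replace each node's score with the sum of its neighbours' scores and renormalize. The sketch below is an illustration under stated assumptions, not the lecture's own procedure: `EDGES` is a hypothetical 9-node undirected graph, the iteration count is an arbitrary choice, and convergence relies on the graph being connected and not bipartite.

```python
import math

# Hypothetical 9-node undirected graph (an assumption, not the slide's figure).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eigenvector_centrality(adj, iterations=300):
    """Power iteration: x <- A x, rescaled to unit Euclidean length each round."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x


adj = build_adjacency(EDGES)
scores = eigenvector_centrality(adj)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

A node scores highly here when its neighbours score highly, which is the "important friends" idea above; node 9, whose only neighbour is node 7, necessarily scores below node 7.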

Page 30: CSE509 Lecture 6

Weak and Strong Ties

In practice, connections are not of the same strength

Interpersonal social networks are composed of strong ties (close friends) and weak ties (acquaintances)

Strong ties and weak ties play different roles for community formation and information diffusion

Strength of Weak Ties (Granovetter, 1973)
  - Occasional encounters with distant acquaintances can provide important information about new opportunities for job search

Page 31: CSE509 Lecture 6

Connections in Social Media

• Social Media allows users to connect to each other more easily than ever
  - One user might have thousands of friends online
  - Who are the most important ones among your 300 Facebook friends?

• Imperative to estimate the strengths of ties for advanced analysis
  - Analyze network topology
  - Learn from User Profiles and Attributes

Page 32: CSE509 Lecture 6

Learning from Network Topology

Bridges connecting two different communities are weak ties

An edge is a bridge if its removal results in disconnection of its terminal nodes

[Figure: two example networks. In the first, e(2,5) is a bridge; in the second, e(2,5) is NOT a bridge]
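The bridge definition translates directly into a reachability test: remove the edge, then check whether one endpoint can still reach the other. This is a minimal sketch; the two small graphs are hypothetical stand-ins for the slide's figures, which are not reproduced in this transcript.

```python
from collections import deque


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_bridge(edges, u, v):
    """True if removing edge (u, v) leaves u and v disconnected."""
    remaining = [e for e in edges if set(e) != {u, v}]
    adj = build_adjacency(remaining)
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return v not in seen


# Hypothetical example: two triangles joined only by edge (2, 5).
TWO_TRIANGLES = [(1, 2), (1, 3), (2, 3), (2, 5), (4, 5), (4, 6), (5, 6)]

print(is_bridge(TWO_TRIANGLES, 2, 5))             # True
print(is_bridge(TWO_TRIANGLES + [(3, 4)], 2, 5))  # False
```

Adding the extra edge (3, 4) gives the two triangles a second connection, so (2, 5) stops being a bridge.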

Page 33: CSE509 Lecture 6

“shortcut” Bridge

• Bridges are rare in real-life networks
• Alternatively, one can relax the definition by checking if the distance between two terminal nodes increases if the edge is removed
• The larger the distance, the weaker the tie is

d(2,5) = 4 if e(2,5) is removed
d(5,6) = 2 if e(5,6) is removed
e(5,6) is a stronger tie than e(2,5)
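The relaxed test can be sketched as "remove the edge, then measure how far apart its endpoints have become". The 7-node graph below is hypothetical: it was constructed so that the two distances come out as 4 and 2 like the values quoted above, and it is not the network drawn on the slide.

```python
import math
from collections import deque


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distance_without_edge(edges, u, v):
    """Shortest-path length from u to v once edge (u, v) is removed.

    Returns math.inf when the edge was a true bridge.
    """
    remaining = [e for e in edges if set(e) != {u, v}]
    adj = build_adjacency(remaining)
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return dist[x]
        for y in adj.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return math.inf


# Hypothetical graph (an assumption, not the slide's figure).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 5), (3, 4),
         (4, 6), (5, 6), (5, 7), (6, 7)]

print(distance_without_edge(EDGES, 2, 5))  # 4
print(distance_without_edge(EDGES, 5, 6))  # 2
```

The larger detour for (2, 5) marks it as the weaker tie, while (5, 6) has a two-hop alternative through node 7 and counts as the stronger one.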

Page 34: CSE509 Lecture 6

Neighborhood Overlap

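Tie Strength can be measured based on neighborhood overlap; the larger the overlap, the stronger the tie is

$\mathrm{overlap}(v_i, v_j) = \frac{|N_i \cap N_j|}{|N_i \cup N_j| - 2}$, where $N_i$ is the set of neighbors of $v_i$

-2 in the denominator is to exclude vi and vj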
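Neighborhood overlap for an edge can be sketched with set operations: shared neighbours divided by all neighbours of either endpoint, not counting the two endpoints themselves. The 9-node graph is hypothetical (the slide's figure is not in this transcript), and returning 0.0 when the two endpoints have no other neighbours is a convention chosen for this sketch.

```python
# Hypothetical 9-node undirected graph (an assumption, not the slide's figure).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighborhood_overlap(adj, u, v):
    """Overlap of the neighbourhoods of the endpoints of edge (u, v).

    The union of the two neighbourhoods contains u and v themselves,
    hence the -2 in the denominator.
    """
    shared = adj[u] & adj[v]
    union = adj[u] | adj[v]
    denominator = len(union) - 2
    return len(shared) / denominator if denominator > 0 else 0.0


adj = build_adjacency(EDGES)
print(neighborhood_overlap(adj, 5, 6))  # 1.0
print(neighborhood_overlap(adj, 4, 6))  # 0.2
print(neighborhood_overlap(adj, 7, 9))  # 0.0
```

In this assumed graph nodes 5 and 6 share all of their other neighbours (a strong tie), while nodes 7 and 9 share none (a weak tie).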


Page 36: CSE509 Lecture 6


Learning from Profiles and Interactions

• Twitter: one can follow others without the followee’s confirmation
  - The real friendship network is determined by the frequency two users talk to each other, rather than the follower-followee network
  - The real friendship network is more influential in driving Twitter usage

• Strengths of ties can be predicted accurately based on various information from Facebook
  - Friend-initiated posts, messages exchanged in wall posts, number of mutual friends, etc.

• Learning numeric link strength by maximum likelihood estimation
  - User profile similarity determines the strength
  - Link strength in turn determines user interaction
  - Maximize the likelihood based on observed profiles and interactions
