predicting communication intention in social media

Predicting Communication Intention in (Enterprise) Social Networks Charalampos “Harris” Chelmis Computer Science, University of Southern California Thanks to: Viktor K. Prasanna, Ming Hsieh Department of Electrical Engineering, USC Vikram Sorathia, Co-founder & CEO at Kensemble Tech Labs LLP All audio is muted. If you dialed in, you MUST enter your audio pin to be able to ask questions! We recommend that you keep your phone muted, and unmute yourself when you need to ask questions. You can view the upcoming seminar schedule at www.milibo.com/talent/events.aspx

Upload: charalampos-chelmis

Post on 23-Jun-2015


Category:

Technology



DESCRIPTION

In social networks, where users send messages to each other, the issue of what triggers communication between unrelated users arises: does communication between previously unrelated users depend on friend-of-a-friend type of relationships, common interests, or other factors? In this work, we study the problem of predicting directed communication intention between two users. Link prediction is similar to communication intention in that it uses network structure for prediction. However, these two problems exhibit fundamental differences that originate from their focus. Link prediction uses evidence to predict network structure evolution, whereas our focal point is directed communication initiation between users who are previously not structurally connected. To address this problem, we employ topological evidence in conjunction with transactional information in order to predict communication intention. It is not intuitive whether methods that work well for link prediction would work well in this case. In fact, we show in this work that network or content evidence, when considered separately, are not sufficiently accurate predictors. Our novel approach, which jointly considers local structural properties of users in a social network, in conjunction with their generated content, captures numerous interactions, direct and indirect, social and contextual, which have to date been considered independently. We performed an empirical study to evaluate our method using an extracted network of directed @-messages sent between users of a corporate microblogging service, which resembles Twitter. We find that our method outperforms state-of-the-art techniques for link prediction. Our findings have implications for a wide range of social web applications, such as contextual expert recommendation for Q&A, creation of new friendship relationships, and targeted content delivery.

TRANSCRIPT

Page 1: Predicting Communication Intention in Social Media

Predicting Communication Intention

in (Enterprise) Social Networks

Charalampos “Harris” Chelmis

Computer Science, University of Southern California

Thanks to: Viktor K. Prasanna, Ming Hsieh Department of Electrical Engineering, USC

Vikram Sorathia, Co-founder & CEO at Kensemble Tech Labs LLP

•All audio is muted.

•If you dialed in, you MUST enter your audio pin to be able to ask questions!

•We recommend that you keep your phone muted, and unmute yourself when you need to ask questions.

•You can view the upcoming seminar schedule at www.milibo.com/talent/events.aspx

Page 2: Predicting Communication Intention in Social Media

Social Networks are Everywhere

2


• Movie Networks

• Affiliation/co-authorship networks

• Professional networks

• Friendship networks

• Information networks

• Organizational Networks

• Q&A websites

• Even networks

Page 3: Predicting Communication Intention in Social Media

• Multiple applications

Targeted marketing

Personalization

− Content delivery

Recommendation

− People to connect, items to buy, movies to watch

Law enforcement

− Fraud detection

− Guilt by association

Epidemiology

Information dissemination/propagation

• Users interact with one another and with the content they create and consume

Rich interactions

− Friendships based on similarity

− Following based on interest

Noisy

Social Network Analysis

3

Page 4: Predicting Communication Intention in Social Media

• Collaboration Enabling Technologies

Multiple communication channels

Spread of timely and relevant information

Search for data and experts

Collaboration Technologies at the Workplace

4

Page 5: Predicting Communication Intention in Social Media

• Main focus on business perspective

Less noisy than online social networks

Q&A

Problem solving

Information seeking

• But also

Assist in breaking barriers

Team building

Knowledge propagation

• Opportunities

Expert identification

− Experts vs. Influencers

Information Flow

Trends

− Technology adoption

− Company focus

Collaboration at the Workplace

5

Page 6: Predicting Communication Intention in Social Media

• More Opportunities

Collective Knowledge

− Generation

− Sharing

Collaborative Knowledge Management

− How do employees work together to complete tasks?

− How does innovation happen?

− Best practices

• Difficulties

Informal interactions

Heterogeneous, unstructured data

How to formally model knowledge?

Collaboration at the Workplace

6

Page 7: Predicting Communication Intention in Social Media

• Descriptive Modeling

Social network analysis

• Predictive modeling

Link prediction

Attribute prediction

• Typically networked data are represented as graphs

Nodes (e.g., users)

Edges

− Social relations

− Interactions

− Information flow

− Similarity

Weight

− Communication frequency

− Communication cost (e.g., distance)

− Reciprocity

− Type of interaction (e.g., family member, friend, or officemate)

Networked Data Modeling

7

Page 8: Predicting Communication Intention in Social Media

• Heterogeneous object and link types

• Both nodes and edges may carry attributes

• Attribute dependencies

Correlation between attribute values and link structure

− e.g. link prediction based on auxiliary information

Correlation among attributes of related nodes

− e.g. collaborative filtering

• Node dependencies

e.g. groups/communities

• Partial observations

e.g. labels

But Networked Data are Very Different than Graphs

8

Page 9: Predicting Communication Intention in Social Media

• Big Data

Billions of users

Billions of connections

Billions of “documents”

• Temporality

Affiliations

Interests

Friendships

• Context

Spatial

Temporal

Topical

• Content multimodality

Text

Multimedia

Networked Data ≠ Graphs

9

Page 10: Predicting Communication Intention in Social Media

• Edges are more than links

Type

− e.g. like vs. comment vs. share

Trust

Sentiment

Strength

Time

Number

• Edges “reveal” something about the relation between nodes

Prior “interaction” to compute similarity

Networked Data ≠ Graphs

10

Page 11: Predicting Communication Intention in Social Media

Networked Data ≠ Graphs

11

Page 12: Predicting Communication Intention in Social Media

• Integrated informal communication

• Context sensitive

• Temporal

• External Sources

• Analysis of implicit relations

Holistic Modeling of Complex Networks

12

Multiple collaborative platforms

Multimodal, heterogeneous content from various sources

Meta-information about content

- Social Algebraic Operations

- Complex mining and analysis

- Correlation of different domains

- Temporal, semantic analysis

[Figure: diagram with the labels context, time, content, connection]

Page 13: Predicting Communication Intention in Social Media

• Directed communication graph G = (V,E)

Node u represents a user

Edge e = (u,v) exists iff user u has sent at least one message to user v

• Input

G0 = (V0,E0), subgraph of G consisting of all nodes in G and a subset of edges in G

• Output

Ranked list L of edges, not present in G0, such that $L \subseteq E \setminus E_0$

Predicting Intention of Communication

13

[Figure: input graph G0 and output graph G1, each showing user u]
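
The problem setup on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: messages are given as hypothetical `(sender, recipient)` pairs, self-addressed messages are ignored, and the observed subgraph G0 is formed by a seeded random split of the edges (the slide only says "a subset of edges", so the split strategy is an assumption).

```python
import random

def build_graph(messages):
    """Directed graph G = (V, E): edge (u, v) exists iff u sent at least one message to v."""
    edges = set()
    for sender, recipient in messages:
        if sender != recipient:  # assumption: self-addressed messages are ignored
            edges.add((sender, recipient))
    nodes = {user for edge in edges for user in edge}
    return nodes, edges

def split_edges(edges, keep_fraction, seed=0):
    """Return (E0, held_out): E0 is the observed subset, held_out = E minus E0."""
    rng = random.Random(seed)
    ordered = sorted(edges)  # sort first so the split is reproducible
    rng.shuffle(ordered)
    cut = int(len(ordered) * keep_fraction)
    return set(ordered[:cut]), set(ordered[cut:])

# Hypothetical message log; repeated messages collapse into one edge.
messages = [("ann", "bob"), ("ann", "bob"), ("bob", "ann"),
            ("cat", "ann"), ("dan", "cat")]
nodes, edges = build_graph(messages)
e0, held_out = split_edges(edges, keep_fraction=0.5)
```

G0 keeps every node of G and only the edges in `e0`; a predictor is then asked to rank candidate edges so that the ones in `held_out` appear near the top.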

Page 14: Predicting Communication Intention in Social Media

• Edge semantics:

Conversation between users rather than friendship

• “What makes people initiate conversations with strangers?”

• “With whom do individuals choose to collaborate and why?”

≠ Link Prediction

14

Contextual – Temporal Properties: m1(u1,u2,g1) ≠ m1(u1,u2,g2)

Directionality Matters: m1(u1,u2,g1) ≠ m1(u2,u1,g1)

Page 15: Predicting Communication Intention in Social Media

• The tendency to relate to people with similar characteristics

status, beliefs, etc.

• Fundamental concept underlying social theories (e.g. Blau 1977)

• Fundamental basis for links in many types of social networks

“Similar” nodes tend to cluster together

• How does this help us solve our problem?

Homophily

15

Page 16: Predicting Communication Intention in Social Media

• Machine learning

Probabilistic, supervised, computationally expensive

• Node attributes

No semantics

We instead exploit multiple features of variable types

• Network structure

How to Compute Similarity?

16

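Graph Distance: length of shortest path between u and v

Common Neighbors: $|\Gamma(u) \cap \Gamma(v)|$

Jaccard Coefficient: $\frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$

Adamic/Adar: $\sum_{z \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |\Gamma(z)|}$

Preferential Attachment: $|\Gamma(u)| \cdot |\Gamma(v)|$

Katz: $\sum_{\ell=1}^{\infty} \beta^{\ell} \, |\mathrm{paths}^{\langle \ell \rangle}_{u,v}|$

Random walks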
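
The neighborhood-based measures listed on this slide are easy to state in code. The sketch below assumes an undirected graph stored as a hypothetical dictionary `adj` mapping each node to its neighbor set Γ(·). The Katz score is truncated at `max_len` and counts walks of each length, which is a simplification of the infinite sum; `beta` and `max_len` are illustrative values, not ones taken from the talk.

```python
import math

def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

def jaccard_coefficient(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    # Shared neighbors with few connections contribute more.
    # Degree-1 nodes are skipped to avoid dividing by log(1) = 0.
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[u] & adj[v] if len(adj[z]) > 1)

def preferential_attachment(adj, u, v):
    return len(adj[u]) * len(adj[v])

def katz(adj, u, v, beta=0.1, max_len=4):
    # Truncated Katz: sum over l = 1..max_len of beta^l * (number of
    # length-l walks from u to v), computed by propagating walk counts.
    counts = {u: 1}
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, c in counts.items():
            for w in adj[node]:
                nxt[w] = nxt.get(w, 0) + c
        counts = nxt
        score += (beta ** length) * counts.get(v, 0)
    return score

# Toy graph: a and d are not connected but share both of their neighbors.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
```

For the unconnected pair (a, d) every measure except graph distance is driven by the two shared neighbors b and c, which is exactly the kind of structural evidence link prediction relies on.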

Page 17: Predicting Communication Intention in Social Media

• If there is a tie between x and y and one between y and z, then in a transitive network x and z will also be connected

• Such structural clues have been traditionally used for link prediction

• Consider what happens if edge semantics change

• Or if we further include context

Transitivity

17

[Figure: two triads over nodes x, y, z; the second is annotated "asks" and "?"]

Page 18: Predicting Communication Intention in Social Media

Communication Network

18

Threaded Discussion

Bipartite Graph

Post-Reply Network

Page 19: Predicting Communication Intention in Social Media

Augmented, Directed Post-Reply Network

19

Page 20: Predicting Communication Intention in Social Media

• We model a user as a union of her:

connections and

her content

• We characterize microblogs using a set of attributes

each feature according to its type

Textual Features

− raw textual content (bag-of-words)

− #hashtags

− Groups

Temporal Features

− Date

− Time

• WordNet: enrich concepts with conceptually, semantically and lexically related terms

Synonyms

Hypernyms

Hyponyms

User Representation

20

Page 21: Predicting Communication Intention in Social Media

• Semantic Similarity of textual concepts

Jaccard Index: $s(a,b) = s(S_a, S_b) = \frac{|S_a \cap S_b|}{|S_a \cup S_b|}$

Synonym-based similarity: $s_s(a,b) = s_s(S_a, S_b)$

Hypernym-based similarity: $s_h(a,b) = s_h(S_a, S_b)$

Hyponym-based similarity: $s_{hp}(a,b) = s_{hp}(S_a, S_b)$

• Calculate Semantic Similarity using weighted sum

Semantic Similarity of Textual Features

21

Page 22: Predicting Communication Intention in Social Media

• Caveat: concepts belong to the same subtree

Solution: compute similarity between the union of annotations

• Account for lexical similarity: Levenshtein similarity

• Select the highest similarity, either semantic or lexical

Semantic Similarity of Textual Features

22

Union of annotations: $s(S_a \cup H_a \cup Hp_a,\; S_b \cup H_b \cup Hp_b)$

$s_{tg}(a,b) = \max\{\, w_s\, s_s(a,b) + w_h\, s_h(a,b) + w_{hp}\, s_{hp}(a,b),\;\; \mathrm{LevenshteinSimilarity}(a,b) \,\}$
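
A minimal sketch of the tag similarity described on the last two slides follows. Several things here are assumptions rather than the authors' implementation: the `ANNOTATIONS` table is a hand-made stand-in for WordNet lookups, the weights are arbitrary, and Levenshtein similarity is normalized as 1 minus the edit distance divided by the longer string's length (the slides do not state the normalization).

```python
def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def levenshtein_similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0

# Hypothetical annotations standing in for WordNet synonym (S),
# hypernym (H) and hyponym (Hp) sets.
ANNOTATIONS = {
    "car":  {"S": {"auto", "automobile"}, "H": {"vehicle"}, "Hp": {"cab", "coupe"}},
    "auto": {"S": {"car", "automobile"},  "H": {"vehicle"}, "Hp": {"cab", "coupe"}},
    "cars": {"S": set(), "H": set(), "Hp": set()},
}

def union_similarity(a, b):
    # Jaccard over the union of all annotations of each tag.
    A, B = ANNOTATIONS[a], ANNOTATIONS[b]
    return jaccard(A["S"] | A["H"] | A["Hp"], B["S"] | B["H"] | B["Hp"])

def tag_similarity(a, b, w_s=0.5, w_h=0.3, w_hp=0.2):
    # Weighted semantic similarity, or lexical similarity if that is higher.
    A, B = ANNOTATIONS[a], ANNOTATIONS[b]
    semantic = (w_s * jaccard(A["S"], B["S"])
                + w_h * jaccard(A["H"], B["H"])
                + w_hp * jaccard(A["Hp"], B["Hp"]))
    return max(semantic, levenshtein_similarity(a, b))
```

With these toy annotations, "car" and "auto" score high on the semantic side although they share no characters, while "car" and "cars" are rescued by the lexical side, which is the reason for taking the maximum of the two.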

Page 23: Predicting Communication Intention in Social Media

• Textual Similarity between bag-of-words features:

tf.idf weight vector representation

Cosine similarity

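• Date Similarity: $s_d(d_1, d_2) = \begin{cases} 0, & |d_1 - d_2| > T_d \\ 1 - \frac{|d_1 - d_2|}{T_d}, & \text{otherwise} \end{cases}$

• Time Similarity: $s_t(t_1, t_2) = \begin{cases} 0, & |t_1 - t_2| > T_t \\ 1 - \frac{|t_1 - t_2|}{T_t}, & \text{otherwise} \end{cases}$

• Timestamp similarity: $s_{df}(x, y) = w_d\, s_d(x_d, y_d) + w_t\, s_t(x_t, y_t)$

Feature Similarity

23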
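
The feature similarities on this slide can be sketched as follows. The choices below are assumptions made for illustration: idf is computed as log(N / df), a timestamp is a hypothetical `(day_number, hour_of_day)` pair, and the thresholds `t_d`, `t_t` and the weights are arbitrary values.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse tf.idf dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(x, y):
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def window_similarity(a, b, threshold):
    # 0 beyond the threshold, otherwise decays linearly with the gap.
    gap = abs(a - b)
    return 0.0 if gap > threshold else 1.0 - gap / threshold

def timestamp_similarity(x, y, t_d=7, t_t=12, w_d=0.5, w_t=0.5):
    # x and y are (day_number, hour_of_day) pairs.
    return (w_d * window_similarity(x[0], y[0], t_d)
            + w_t * window_similarity(x[1], y[1], t_t))

docs = [["budget", "meeting", "budget"], ["budget", "review"], ["lunch", "menu"]]
vectors = tfidf_vectors(docs)
```

Here `cosine(vectors[0], vectors[1])` is positive because both posts mention "budget", while `cosine(vectors[0], vectors[2])` is zero because the posts share no terms.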

Page 24: Predicting Communication Intention in Social Media

• We use a variation of the Hausdorff point set distance measure: $S_H(A,B) = \frac{1}{|A|}\sum_{k=1}^{|A|} \max_{i}\, \mathrm{sim}(a_k, b_i)$

Average of the maximum similarity of features in set A with respect to features in set B

$\mathrm{sim}(a_k, b_i)$: any similarity measure on set elements $a_k$ and $b_i$

Measure is asymmetric with respect to the sets

Feature Set Similarity

24
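
The set similarity described on this slide (average, over the elements of A, of the best match found in B) can be written directly. The sketch below takes the element-level measure as a parameter; the exact-match function used in the example is only a placeholder.

```python
def hausdorff_similarity(A, B, sim):
    """Average over a in A of the maximum sim(a, b) over b in B."""
    if not A or not B:
        return 0.0
    return sum(max(sim(a, b) for b in B) for a in A) / len(A)

def exact_match(a, b):
    # Placeholder element similarity: 1 for identical elements, else 0.
    return 1.0 if a == b else 0.0

A = ["x", "y"]
B = ["x"]
forward = hausdorff_similarity(A, B, exact_match)   # only half of A is covered by B
backward = hausdorff_similarity(B, A, exact_match)  # all of B is covered by A
```

The two directions differ (`forward` is 0.5, `backward` is 1.0), which illustrates the asymmetry noted on the slide: the measure asks how well the second set covers the first, not the other way around.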

Page 25: Predicting Communication Intention in Social Media

• A weighted function of content and network proximity: $S(u,v) = \lambda\, S_C(u,v) + (1-\lambda)\, S_N(u,v)$

λ controls the tradeoff between content and network proximity

• Content Proximity

User similarity with respect to their microblogs: $S_C(u_1,u_2) = \frac{1}{|u_1|}\sum_{i=1}^{|u_1|} \max_{p_k \in u_2} S(p_i, p_k)$

Similarity of microblogs

− Combined weighted value of respective attributes similarities: $S(p_1,p_2) = w_g\, s_g(p_1,p_2) + w_{tg}\, S_H(p_1,p_2) + w_{tx}\, s_{tx}(p_1,p_2) + w_{df}\, s_{df}(p_1,p_2)$

• Network Proximity: $S_N(u,v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u)|}$

Asymmetric with respect to users

User Similarity

25
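
As a rough illustration of how content and network proximity combine, the sketch below reduces each post to a set of hashtags and uses Jaccard as the post similarity, which is a simplification of the multi-feature post similarity on this slide. Network proximity is taken to be the number of common neighbors normalized by the size of the first user's neighborhood; that normalization is an assumption, as are all names and data.

```python
def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def content_proximity(posts_u, posts_v, post_sim=jaccard):
    # For every post of u, take its best match among v's posts, then average.
    if not posts_u or not posts_v:
        return 0.0
    return sum(max(post_sim(p, q) for q in posts_v) for p in posts_u) / len(posts_u)

def network_proximity(neighbors, u, v):
    # Assumed form: shared neighbors as a fraction of u's neighborhood.
    if not neighbors[u]:
        return 0.0
    return len(neighbors[u] & neighbors[v]) / len(neighbors[u])

def user_similarity(posts, neighbors, u, v, lam=0.8):
    # lam = 1 uses content only, lam = 0 uses network structure only.
    s_c = content_proximity(posts[u], posts[v])
    s_n = network_proximity(neighbors, u, v)
    return lam * s_c + (1.0 - lam) * s_n

# Hypothetical data: posts as hashtag sets, neighbors as undirected contact sets.
posts = {"u": [{"ml", "nlp"}, {"cloud"}], "v": [{"ml"}]}
neighbors = {"u": {"a", "b"}, "v": {"a"}, "a": {"u", "v"}, "b": {"u"}}

s_uv = user_similarity(posts, neighbors, "u", "v")
s_vu = user_similarity(posts, neighbors, "v", "u")
```

In this toy example `s_uv` is about 0.3 and `s_vu` about 0.6: user v's single post and single contact are both well covered by u, but not the reverse, so the score depends on the direction being asked about.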

Page 26: Predicting Communication Intention in Social Media

• First construct the augmented communication graph G(V,E)

• Given a user u,

compute users similarity

− For all posts of user u with respect to all other users in the network

For all facets

Communication Intention Prediction

26
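
The prediction step can be sketched as a ranking over users that the given user has not messaged yet. The function below is an illustration, not the authors' procedure: `similarity` is any callable returning the combined user similarity, `out_edges` is a hypothetical map from each user to the set of users already messaged, and ties are broken alphabetically for reproducibility.

```python
def predict_recipients(u, users, out_edges, similarity, k=5):
    """Return the top-k users most likely to receive a first message from u."""
    already_contacted = out_edges.get(u, set())
    candidates = [v for v in users if v != u and v not in already_contacted]
    # Highest similarity first; alphabetical order breaks ties.
    ranked = sorted(candidates, key=lambda v: (-similarity(u, v), v))
    return ranked[:k]

# Hypothetical precomputed similarity scores for user "ann".
users = ["ann", "bob", "cat", "dan"]
out_edges = {"ann": {"bob"}}
scores = {("ann", "cat"): 0.3, ("ann", "dan"): 0.7}

def lookup(u, v):
    return scores.get((u, v), 0.0)

top = predict_recipients("ann", users, out_edges, lookup, k=2)
```

Here `top` is `["dan", "cat"]`: bob is excluded because ann has already messaged him, and the remaining users are ordered by score.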

Page 27: Predicting Communication Intention in Social Media

• Complete snapshot (June 2010 – August 2011) of a corporate microblogging service, which resembles Twitter

4,213 unique users

16,438 messages in total

− 8,174 thread starters

− 8,264 replies

8,139 threads

88 discussion groups

637 unique #hashtags

Dataset

27

Page 28: Predicting Communication Intention in Social Media

• In our evaluation we focus on the Largest Connected Component

582 users

3,773 directed edges

11,684 messages

Average degree = 12.97

• Clustering coefficient = 0.2311 >> ccrandom = 0.0223

• Clustering coefficient as a function of node degree

Average clustering coefficient decreases with increasing node degree

Higher for nodes of low degree: significant clustering among low-degree nodes

Dataset

28

Page 29: Predicting Communication Intention in Social Media

Number of Neighbors

• Directed messages received vs. directed messages sent

Scattered across the diagonal

Cumulative distribution of the out-degree to in-degree ratio exhibits high correlation between in-degree and out-degree

Tendency of users to reply back when they receive a message from other users?

29

Page 30: Predicting Communication Intention in Social Media

• Four-fold cross validation

• Randomly sample 100 users & recommend top-k links for each user

• Accuracy measures

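Precision@k $= \frac{1}{|S|}\sum_{p \in S} \frac{N_k(p)}{k}$

Recall@k $= \frac{1}{|S|}\sum_{p \in S} \frac{|F_p \cap R_p|}{|F_p|}$

MRR $= \frac{1}{|S|}\sum_{p \in S} \frac{1}{rank_p}$

• Baselines

Random

− Random selection

Shared Vocabulary

− Cosine similarity based on #hashtags vector

Shortest distance

− Length of the shortest path

Common neighbors

− $sim(u,v) = |\Gamma(u) \cap \Gamma(v)|$

Evaluation

30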
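
The three accuracy measures can be sketched as below. The slide does not spell out its symbols, so the reading used here is an assumption: for each sampled user, `recs` holds the ranked recommendation list and `truth` holds the set of users that were actually contacted in the held-out data; the rank in MRR is the position of the first correct recommendation.

```python
def precision_at_k(recommended, relevant, k):
    return sum(1 for v in recommended[:k] if v in relevant) / k

def recall_at_k(recommended, relevant, k):
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & relevant) / len(relevant)

def reciprocal_rank(recommended, relevant):
    for position, v in enumerate(recommended, 1):
        if v in relevant:
            return 1.0 / position
    return 0.0  # no correct recommendation in the list

def evaluate(recs, truth, k):
    """Average each measure over the sampled users."""
    n = len(recs)
    return {
        "precision": sum(precision_at_k(recs[u], truth[u], k) for u in recs) / n,
        "recall": sum(recall_at_k(recs[u], truth[u], k) for u in recs) / n,
        "mrr": sum(reciprocal_rank(recs[u], truth[u]) for u in recs) / n,
    }

# Hypothetical recommendations and held-out contacts for two users.
recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
truth = {"u1": {"b", "z"}, "u2": {"d"}}
result = evaluate(recs, truth, k=2)
```

For this toy input the result is precision 0.5, recall 0.75 and MRR 0.75.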

Page 31: Predicting Communication Intention in Social Media

Lexical and Topical Alignment

• Is there a global vocabulary in the corporate microblogging service?

Hashtags vocabulary

“Groups vocabulary”

• Select user pairs at random and measure number of shared tags

Average nst = 1.001

Most common case is the absence of shared tags

• However, adjacent users in social networks tend to share common interests due to homophily

We measure user homophily with respect to hashtags as a function of the distance of users in the network

• Select user pairs at random and measure number of shared groups

Average nsg = 1

Most common case is the absence of shared groups

31

Page 32: Predicting Communication Intention in Social Media

Lexical Alignment

• Average number of shared (distinct) hashtags for two users as a function of their distance d along the network

$\sigma_{tags}(u,v) = \frac{\sum_t f_u(t)\, f_v(t)}{\sqrt{\sum_t f_u(t)^2}\,\sqrt{\sum_t f_v(t)^2}}$

[Second equation over the hashtags of u and v: not recoverable]

• Shared hashtags vocabulary up to distance 6!

32

Page 33: Predicting Communication Intention in Social Media

• Bold indicates best performing baseline

• Percentage lift

the % improvement achieved over the best performing baseline

Methods Comparison

33

Page 34: Predicting Communication Intention in Social Media

• How to choose the best values of λ and the weighting factors?

• Different datasets may lead to different optimal values

Grid search over ranges of values for these parameters

Measure accuracy on the validation set for each configuration setting

Weight Scheme Selection

34
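
A grid search of this kind can be sketched generically. In the example below, `evaluate` stands for "accuracy on the validation set for one configuration"; the function `toy_validation_score` is a contrived stand-in with a known peak, not a measurement from the dataset, and the parameter names `lam` and `w_text` are hypothetical.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(params):
    # Contrived surface peaking at lam = 0.6, w_text = 0.5 (illustration only).
    return 1.0 - (params["lam"] - 0.6) ** 2 - 0.1 * abs(params["w_text"] - 0.5)

grid = {
    "lam": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "w_text": [0.25, 0.5, 0.75],
}
best_params, best_score = grid_search(grid, toy_validation_score)
```

The cost grows with the product of the grid sizes, so in practice the ranges are kept coarse, and the search has to be repeated for each new dataset since the optimum may shift.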

Page 35: Predicting Communication Intention in Social Media

• λ = 0 only considers network proximity

• λ = 1 only considers content similarity

• All schemes perform better than the baseline

• Good value for λ is approximately 0.8

Effect of Parameter λ

35

Page 36: Predicting Communication Intention in Social Media

• Effect of weighting schemes on accuracy per user

• Different weighting schemes perform better for different users

Feature importance is user-specific

• Need personalization to achieve better accuracy overall

Effect of Weighting Scheme

36

Page 37: Predicting Communication Intention in Social Media

• Average precision (measured @5) of users having k

(a) posts or

(b) neighbors in the communication network

The more statistical evidence the better the overall precision

Content Availability and Structural Proximity

37

Page 38: Predicting Communication Intention in Social Media

• MRR as a function of λ for various restrictions

• Greater statistical evidence results in more accurate predictions

Content Availability and Structural Proximity

38

Page 39: Predicting Communication Intention in Social Media

• Performed modeling and analysis of informal communication at the workplace

• We introduced the problem of communication intention prediction

• We addressed this problem by exploiting auxiliary information

Holistic modeling of structural clues and semantically enriched content

• We tested the efficiency of our approach on a real-world dataset

The more statistical evidence available, the more accurate the predictions

Need for personalization

• Potential applications

Contextual expert recommendation for Q&A

Search for “interesting” people to collaborate with

• Open problems

Scalability

Replication of results for online social media

Conclusion and Open Problems

39

Page 40: Predicting Communication Intention in Social Media

• Semantic Social Network Analysis for the Enterprise

Contextual Recommendation

40

Employee ID:

Page 41: Predicting Communication Intention in Social Media

• Semantic Social Network Analysis for the Enterprise

Instantiate our modeling in Ontology

Collaboration analytics at the workplace

Real-world data evaluation

Contextual Recommendation

41

Contextual ego-network analysis

Expert Identification

Semantic Analysis

Page 42: Predicting Communication Intention in Social Media

• Questions?

• Resources

http://www-scf.usc.edu/~chelmis/index.php

http://pgroup.usc.edu/wiki/CSS

• Please send all inquiries to [email protected]

Thank you!

42