Mining Social Network for Personalized Email Prioritization

Post on 31-Dec-2015


DESCRIPTION

Mining Social Network for Personalized Email Prioritization. Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon. Outline: Problem Description, Approaches, Experiments, Contributions.

TRANSCRIPT

Mining Social Network for Personalized Email Prioritization

Language Technologies Institute, School of Computer Science

Carnegie Mellon University

Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon

Outline

Problem Description
Approaches
Experiments
Contributions

Problem Description

Email overload is a severe problem
Identifying the importance of each email will alleviate email overload

Challenges (sparse training data)
  No access to other people's emails and labels
  Personalized labeling is time-consuming
  The same message may have different priority labels for different recipients

We want to compensate for the sparse training data by using each user's social network

Outline

Problem Description
Approaches
  Social Clustering
  Social Importance
  Semi-supervised Importance Propagation
Experiments
Conclusion and Future Work

Social Clustering – Motivation

Personal email inbox
  Lots of unlabeled emails
  No privacy issue

Observations
  The sender can be important
  Some senders do not appear in the training set at all, or appear in very few instances
  We need to generalize over senders
  Let's find similar senders from the social network

Social Clustering – Contact Network

Personal contact network G = (V, E)
The whole network is constructed from the personal inbox

[Figure: example contact network whose nodes are agents/persons]
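The slide states only that G = (V, E) is built from the personal inbox. As a minimal sketch of one way to do that, the code below assumes that each message contributes a directed edge from its sender to each recipient, weighted by message count. The input format, the function name `build_contact_network`, and the weighting are illustrative assumptions, not details from the slides.

```python
from collections import defaultdict

def build_contact_network(messages):
    """Build a weighted contact network G = (V, E) from one user's inbox.

    messages: iterable of (sender, recipients) pairs (hypothetical format).
    Returns (V, E): V is a set of addresses, E maps (sender, recipient)
    to the number of messages seen on that directed edge (an assumption).
    """
    edge_counts = defaultdict(int)
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:  # ignore self-addressed copies
                edge_counts[(sender, recipient)] += 1
    vertices = {person for edge in edge_counts for person in edge}
    return vertices, dict(edge_counts)

# Toy inbox with made-up addresses.
inbox = [
    ("alice@example.org", ["me@example.org", "bob@example.org"]),
    ("bob@example.org", ["me@example.org"]),
    ("alice@example.org", ["me@example.org"]),
]
V, E = build_contact_network(inbox)
```

Because only headers from the user's own inbox are read, this stays within the "no privacy issue" setting described earlier.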

Social Clustering – Newman Clustering

Newman Clustering Algorithm [Newman, 04]
  Find social cliques or cohesive social groups
  Based on edge betweenness
    The number of shortest paths that go through the edge / the total number of shortest paths
  Drop edges starting from the highest edge betweenness
  Hard clustering

[Figure: example graph with nodes 1 to 6 and edge labels 9 and 4, split into Group A and Group B]
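The procedure on this slide (compute edge betweenness, drop the edge with the highest value, repeat) can be sketched in plain Python. The example graph, two triangles joined by a bridge, is a made-up illustration, and stopping at a requested number of groups is an assumption made for brevity; the slides do not state the stopping rule.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness for an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate path fractions back to s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was counted from both of its endpoints.
    return {edge: value / 2 for edge, value in bet.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def split_into_groups(adj, n_groups):
    """Drop highest-betweenness edges until n_groups components exist."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while len(components(adj)) < n_groups:
        bet = edge_betweenness(adj)
        if not bet:
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical example: two triangles joined by the bridge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
         4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
bet = edge_betweenness(graph)
groups = split_into_groups(graph, n_groups=2)
```

In this toy graph the bridge carries the shortest paths of all nine cross-triangle pairs, so it is removed first and the two triangles become the two hard clusters.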

Social Clustering – Validations


Clusters are coherent!

Social Clustering – Feature Incorporation

Extended vector space
  text:
  social network:
  combined:

The combined vector space is used as an enriched feature set for the email prioritizer
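The formulas for the three vector spaces did not survive in this transcript. As a rough sketch only, the code below assumes that the social-network part is a one-hot vector over sender clusters and that the combined space is a plain concatenation of the text and social vectors. Both choices, and the names `social_cluster_vector` and `combined_vector`, are illustrative assumptions.

```python
def social_cluster_vector(sender, cluster_of, n_clusters):
    """One-hot cluster membership of the sender (assumed encoding)."""
    vec = [0.0] * n_clusters
    cluster_id = cluster_of.get(sender)
    if cluster_id is not None:   # unseen senders get an all-zero vector
        vec[cluster_id] = 1.0
    return vec

def combined_vector(text_vec, social_vec):
    """Concatenate the text features and the social features."""
    return list(text_vec) + list(social_vec)

# Made-up cluster assignment and text features for one email.
cluster_of = {"alice@example.org": 0, "bob@example.org": 1}
text_vec = [0.0, 1.5, 0.3]
social_vec = social_cluster_vector("bob@example.org", cluster_of, 2)
features = combined_vector(text_vec, social_vec)
```

With this encoding, two senders in the same cluster share a feature even if one of them never appears in the training set, which is the generalization the motivation slide asks for.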

Social Importance – Motivations

Social importance
  A person at the center of a cluster might be more important than others

Betweenness
  Edge betweenness: used for Newman clustering
  Vertex betweenness
    The degree to which a person is a communication bottleneck in the social network
    Contact points in the network
    Might be an important person

We may try other kinds of social importance metrics too

Social Importance – Metrics

Metrics
  Degree (in, out, total) [Wasserman and Faust, 94]
  Clique Counts (ClqCnt) [Wasserman and Faust, 94]
    The number of clique sub-graphs that contain a node v
  Betweenness (BetCent) [Freeman, 77]
  HITS Authority (Authority) [Kleinberg, 99]
    λ: the greatest eigenvalue
    r: the eigenvector, similar to PageRank scores
  Neighborhood Connectivity ("Clustering Coefficient", ClustCoef) [Boykin and Roychowdhury, 05]
    Measures the connectivity among the neighbors of a node v
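Two of the listed metrics are simple enough to sketch directly. The code below computes total degree and the standard local clustering coefficient (links among a node's neighbours divided by the number of possible links) on an undirected graph. The graph is a made-up example, and treating the network as undirected is a simplifying assumption.

```python
def degree(adj, v):
    """Total degree of node v in an undirected graph."""
    return len(adj[v])

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    neighbours = list(adj[v])
    k = len(neighbours)
    if k < 2:
        return 0.0  # no neighbour pairs to check
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in adj[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))

# Hypothetical example: two triangles joined by the bridge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
         4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
deg_3 = degree(graph, 3)
cc_1 = clustering_coefficient(graph, 1)
cc_3 = clustering_coefficient(graph, 3)
```

Node 1 sits inside a closed triangle, while node 3 also touches the bridge, so node 3 has the higher degree but the lower clustering coefficient.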

Social Importance – Validations

Correlation coefficients with priority levels: $PCC(f, y)$
  $f_i$: the $i$-th email sender's feature value
  $y_i \in [1..5]$: the priority level of the $i$-th email
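Assuming PCC here denotes the usual Pearson correlation coefficient, the validation can be sketched as follows. The feature values and priority levels are invented for illustration, and returning 0.0 for a constant input is a convention chosen for this sketch.

```python
import math

def pearson(f, y):
    """Pearson correlation between feature values f and priority levels y."""
    n = len(f)
    mean_f = sum(f) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_f) * (b - mean_y) for a, b in zip(f, y))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_f == 0 or var_y == 0:
        return 0.0  # correlation is undefined; 0.0 is this sketch's convention
    return cov / math.sqrt(var_f * var_y)

# Made-up data: one social-importance feature per email sender,
# paired with that email's priority level in [1..5].
feature_values = [0.1, 0.4, 0.4, 0.9]
priority_levels = [1, 3, 2, 5]
pcc = pearson(feature_values, priority_levels)
```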

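SIP – Motivations

Semi-supervised Importance Propagation (SIP)
  Can we propagate importance labels?
  Bipartite graph, with labels only on emails

[Figure: bipartite graph of agents/persons and emails; some emails carry importance labels (4, 3, 2), the other emails and all persons are unlabeled (?)]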

SIP – Email Network

$A$: sender to emails ($N \times M$)
$B^T$: email to recipients ($M \times N$)
$x_k$: $k$-th importance labels for emails ($M \times 1$)
$y_k = B x_k$ ($N \times 1$)

[Figure: bipartite graph of agents/persons and emails with partially labeled emails]

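SIP – Algorithm

Propagation: $y_k^{t+1} = C y_k^t = A B^T y_k^t = A B^T B x_k^t = C B x_k^t$

Problems of the above propagation
  $C = A B^T$ may not be irreducible
  $y_k^{t+1}$ is insensitive to $x_k$ (not personalized)

Apply Personalized PageRank with $y_k^1$
  Normalize $y_k^1 : y_k^{1\prime}$ and column-wise normalize $C : C'$
  $E_k = \alpha C' + (1 - \alpha) U_k$, where $U_k = y_k^{1\prime} \mathbf{1}^T$ and $\alpha \in [0, 1]$
  $y_k^{t+1} = E_k y_k^t$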
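The propagation on this slide can be sketched as a small power iteration. The code reads the update as $E_k = \alpha C' + (1 - \alpha) U_k$ with $U_k = y_k^{1\prime} \mathbf{1}^T$, which is an interpretation of the slide rather than a confirmed formula. Leaving all-zero columns of $C$ as zeros, renormalizing $y$ after every step, the default `alpha`, and the toy matrices are all assumptions of this sketch.

```python
def sip_scores(A, B, x_k, alpha=0.85, n_iter=100):
    """Personalized-PageRank-style propagation of importance level k.

    A:   N x M sender-to-email matrix (list of lists).
    B:   N x M recipient-to-email matrix, so B^T maps emails to recipients.
    x_k: length-M indicator of the emails labeled with level k.
    Assumes alpha < 1 so that every step keeps some probability mass.
    """
    n, m = len(A), len(A[0])
    # C = A B^T (N x N)
    C = [[sum(A[i][e] * B[j][e] for e in range(m)) for j in range(n)]
         for i in range(n)]
    # Column-wise normalization C -> C'; empty columns stay zero (assumption).
    col_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
    C_norm = [[C[i][j] / col_sums[j] if col_sums[j] else 0.0
               for j in range(n)] for i in range(n)]
    # y_k^1 = B x_k, normalized to sum to one.
    y1 = [sum(B[i][e] * x_k[e] for e in range(m)) for i in range(n)]
    total = sum(y1)
    if total == 0:
        raise ValueError("no labeled email reaches any person")
    y1 = [v / total for v in y1]
    y = y1[:]
    for _ in range(n_iter):
        mass = sum(y)
        # E_k y = alpha * C' y + (1 - alpha) * y1' * (1^T y)
        y = [alpha * sum(C_norm[i][j] * y[j] for j in range(n))
             + (1 - alpha) * y1[i] * mass
             for i in range(n)]
        norm = sum(y)
        y = [v / norm for v in y]  # renormalize (assumption of this sketch)
    return y

# Toy network: persons 0 and 1 each send one email to person 2;
# only email 0 carries the label for importance level k.
A = [[1, 0], [0, 1], [0, 0]]
B = [[0, 0], [0, 0], [1, 1]]
x_k = [1, 0]
y = sip_scores(A, B, x_k, alpha=0.5)
```

The teleport term keeps pulling mass back to the persons reached by the labeled emails, which is what makes the resulting scores depend on $x_k$.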

Outline

Problem Description
Approaches
Experiments
Contributions

Experiments – Data Collection

Collected data
  25 subjects were recruited from Carnegie Mellon University
  7 users submitted more than 200 emails
    1 faculty member, 2 staff members, 4 students

[Figure: timeline split into Training and Testing segments along the time axis]

Experiments – Metrics

Mean Absolute Error (MAE)
  An MAE of 1.0 means that, on average, the prediction deviates from the truth by one priority level
  MAE takes the size of each error into account
    It ranges from 0 to 4 when we use five importance levels
    For example, 1 vs. 5 is a larger error than 4 vs. 5

Micro-MAE
  Pool the test instances from all users to obtain a joint test set

Macro-MAE
  Compute each user's MAE first, then take the average of the per-user MAEs
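The three metrics can be written down in a few lines. This sketch assumes predictions and true labels are stored per user as two parallel lists; the data layout and the sample numbers are invented for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and predicted priority levels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def micro_mae(per_user):
    """Pool every user's test instances into one set, then compute MAE."""
    all_true = [t for y_true, _ in per_user.values() for t in y_true]
    all_pred = [p for _, y_pred in per_user.values() for p in y_pred]
    return mae(all_true, all_pred)

def macro_mae(per_user):
    """Compute MAE per user, then average the per-user values."""
    scores = [mae(y_true, y_pred) for y_true, y_pred in per_user.values()]
    return sum(scores) / len(scores)

# Made-up results: user -> (true levels, predicted levels).
results = {
    "user_a": ([1, 5], [2, 5]),
    "user_b": ([3], [1]),
}
micro = micro_mae(results)
macro = macro_mae(results)
```

Micro-MAE weights every email equally, so users with many test emails dominate it, whereas Macro-MAE weights every user equally.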

Experiments – Setups

Features: four subsets
  Basic Features (BF): from, to, cc, title, body
  Newman Clustering (NC)
  Social Importance (SI)
  Semi-supervised Importance Propagation (SIP)

Ten random shuffles of the training data

Linear SVM
  10-fold cross-validation for parameter tuning
  Regularization parameter tuned over [10^-3 .. 10^3]
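The tuning loop implied by this setup needs a parameter grid and a fold assignment. The sketch below assumes the grid steps through powers of ten and that folds are formed by shuffling indices; neither detail is stated on the slide, and the SVM training itself is left out.

```python
import random

def regularization_grid(low_exp=-3, high_exp=3):
    """Candidate regularization values 10^low_exp .. 10^high_exp.

    Stepping by whole powers of ten is an assumption.
    """
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]

def k_fold_indices(n_items, k=10, seed=0):
    """Return k (train_indices, validation_indices) pairs."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    return [(sorted(set(indices) - set(fold)), sorted(fold)) for fold in folds]

grid = regularization_grid()
splits = k_fold_indices(n_items=25, k=10)
```

Each candidate value in `grid` would be scored on the ten validation folds, and the best-scoring value would then be used to train on the full training set.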

Experiments – Results


Contributions

The first study on personalized email prioritization
  Using statistical classification and clustering
  Based on fine-grained personal judgments from multiple users

Enriched representation through the personal social network
  Social Clustering
  Social Importance Estimation
  Semi-supervised Importance Propagation

Fully personalized methodology
  Technical development and evaluation
