Mining Social Network for Personalized Email Prioritization

Language Technologies Institute, School of Computer Science

Carnegie Mellon University

Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon

Outline

- Problem Description
- Approaches
- Experiments
- Contributions

Problem Description

- Email overload is a severe problem
- Identifying the importance of each email will alleviate email overload
- Challenges
  - No access to other people's emails and labels
  - Personalized labeling is time consuming
  - The same message may have different priority labels for different recipients
- We want to leverage the sparse training data by using the social network of each user

Sparse Training Data

Outline

- Problem Description
- Approaches
  - Social Clustering
  - Social Importance
  - Semi-supervised Importance Propagation
- Experiments
- Conclusion and Future Work

Social Clustering – Motivation

- Personal email inbox
  - Lots of unlabeled emails
  - No privacy issue
- Observations
  - The sender can be important
  - Some senders do not appear in the training set at all, or appear in very few instances
  - We need to generalize over senders
  - Let's find similar senders from the social network

Social Clustering – Contact Network

- Personal contact network G = (V, E)
- All the network is constructed from personal

[Figure: example contact network with five numbered nodes; each node is an agent/person]
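
As a minimal sketch of how such a contact network could be assembled, the Python below builds an undirected graph from (sender, recipients) pairs. The message format, the `build_contact_network` name, and the choice to link a sender to every recipient are assumptions for illustration; the slides do not spell out the edge definition.

```python
from collections import defaultdict

def build_contact_network(messages):
    """Return an adjacency map {person: set(neighbors)} for G = (V, E).

    `messages` is an iterable of (sender, recipients) pairs (hypothetical format).
    Assumption: an undirected edge links the sender to every recipient.
    """
    graph = defaultdict(set)
    for sender, recipients in messages:
        graph[sender]  # touching the key makes the sender a vertex even with no edges
        for recipient in recipients:
            if recipient != sender:
                graph[sender].add(recipient)
                graph[recipient].add(sender)
    return dict(graph)

messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("dave", ["carol"]),
]
network = build_contact_network(messages)
vertices = sorted(network)
edges = sorted({tuple(sorted((u, v))) for u in network for v in network[u]})
print(vertices)
print(edges)
```

Repeated messages between the same pair collapse into one edge here; a weighted variant would count them instead.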

Social Clustering – Newman Clustering

- Newman clustering algorithm [Newman, 04]
  - Finds social cliques, or cohesive social groups
  - Based on edge betweenness
    - The number of shortest paths that go through the edge / the total number of shortest paths
  - Drops edges in order of highest edge betweenness
  - Hard clustering

[Figure: example network partitioned into Group A and Group B]
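
The edge-dropping idea above can be sketched in plain Python. This is an illustrative version, not the authors' implementation: `edge_betweenness` uses a Brandes-style breadth-first accumulation and returns unnormalized path counts (the ranking of edges is the same as for the normalized ratio), and `newman_split` removes the highest-scoring edge until the graph falls apart into more components. Both function names and the toy graph are assumptions.

```python
from collections import deque

def edge_betweenness(graph):
    """Unnormalized shortest-path edge betweenness (unweighted, undirected graph)."""
    score = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for source in graph:
        # Breadth-first search: count shortest paths and record predecessors.
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        preds = {v: [] for v in graph}
        sigma[source], dist[source] = 1, 0
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge on a shortest path.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair of endpoints was visited from both sides.
    return {edge: s / 2.0 for edge, s in score.items()}

def components(graph):
    """Connected components as a list of node sets."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in group:
                group.add(v)
                stack.extend(graph[v] - group)
        seen |= group
        groups.append(group)
    return groups

def newman_split(graph):
    """Drop highest-betweenness edges until the number of components grows."""
    graph = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    start = len(components(graph))
    while len(components(graph)) == start:
        scores = edge_betweenness(graph)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        graph[u].discard(v)
        graph[v].discard(u)
    return components(graph)

# Two triangles joined by the single bridge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
bridge_score = edge_betweenness(graph)[frozenset((3, 4))]
groups = newman_split(graph)
print(bridge_score, sorted(sorted(g) for g in groups))
```

Each resulting group is a hard cluster: every person belongs to exactly one group.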

Social Clustering – Validations

Clusters are coherent!

Social Clustering – Feature Incorporation

- Extended vector space
  - text:
  - social network:
  - combined:
- The combined vector space is used as an enriched feature set for the email prioritizer
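
The formulas for the three vector spaces did not survive on this slide, so the following is only a sketch under stated assumptions: the social-network part is taken to be a one-hot indicator of the sender's cluster, and the combined vector is taken to be a plain concatenation. The helper names `cluster_indicator` and `combine_features` are hypothetical.

```python
def cluster_indicator(sender, sender_to_cluster, num_clusters):
    """One-hot vector for the sender's cluster; all zeros for an unseen sender."""
    vec = [0.0] * num_clusters
    cluster = sender_to_cluster.get(sender)
    if cluster is not None:
        vec[cluster] = 1.0
    return vec

def combine_features(text_vector, social_vector):
    """Assumed combination rule: concatenate text and social-network features."""
    return list(text_vector) + list(social_vector)

sender_to_cluster = {"alice": 0, "bob": 1}   # e.g. output of a clustering step
text_vector = [0.5, 0.0, 1.2]                # e.g. term weights of one email
combined = combine_features(
    text_vector, cluster_indicator("bob", sender_to_cluster, num_clusters=2)
)
print(combined)
```

With this encoding, two senders in the same cluster share a feature even if one of them never appears in the training set.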

Social Importance – Motivations

- Social importance
  - A person in the center of a cluster might be more important than others
- Betweenness
  - Edge betweenness is used for Newman clustering
  - Vertex betweenness
    - The degree to which a person is a communication bottleneck in the social network
    - Contact points in the network
    - Might be an important person
- We may try other kinds of social importance metrics too

Social Importance – Metrics

Metrics

- Degree (in, out, total) [Wasserman and Faust, 94]
- Clique Counts (ClqCnt) [Wasserman and Faust, 94]
  - The number of clique sub-graphs that contain a node v
- Betweenness (BetCent) [Freeman, 77]
- HITS Authority (Authority) [Kleinberg, 99]
  - λ: the greatest eigenvalue
  - r: the corresponding eigenvector, similar to PageRank scores
- Neighborhood Connectivity ("Clustering Coefficient", ClustCoef) [Boykin and Roychowdhury, 05]
  - Measures the connectivity among the neighbors of a node v
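
Two of the metrics above are simple enough to sketch directly. The code below computes in-, out-, and total degree from directed (sender, recipient) edges, and the usual local clustering coefficient on an undirected neighbor map. The function names and toy data are assumptions, and the exact variants used in the study may differ.

```python
def degrees(edges):
    """In-, out-, and total degree per node from directed (sender, recipient) edges."""
    stats = {}
    for src, dst in edges:
        stats.setdefault(src, {"in": 0, "out": 0})
        stats.setdefault(dst, {"in": 0, "out": 0})
        stats[src]["out"] += 1
        stats[dst]["in"] += 1
    for counts in stats.values():
        counts["total"] = counts["in"] + counts["out"]
    return stats

def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are directly connected to each other."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
undirected = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

degree_stats = degrees(edges)
coef_a = clustering_coefficient(undirected, "a")  # both neighbors are linked
coef_c = clustering_coefficient(undirected, "c")  # only one of three pairs is linked
print(degree_stats["c"], coef_a, coef_c)
```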

Social Importance – Validations

Correlation coefficients with priority levels: $\mathrm{PCC}(f, y)$

- $y_i \in [1..5]$
- $f_i$: the feature value of the $i$-th email's sender
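
Assuming PCC stands for the Pearson correlation coefficient, the validation can be sketched as below: `f` holds one social-importance value per email (the value of that email's sender) and `y` holds the priority levels. The function name and the sample numbers are illustrative only.

```python
import math

def pearson(f, y):
    """Pearson correlation between feature values f and priority levels y."""
    n = len(f)
    mean_f = sum(f) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_f) * (b - mean_y) for a, b in zip(f, y))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_f == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series; report 0 here
    return cov / math.sqrt(var_f * var_y)

priority = [1, 2, 3, 4, 5]
rising_feature = [1, 2, 3, 4, 5]
falling_feature = [5, 4, 3, 2, 1]
print(pearson(rising_feature, priority), pearson(falling_feature, priority))
```

A value near 1 would suggest that senders with a high metric tend to send high-priority email; a value near 0 would suggest the metric carries little signal.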

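SIP – Motivations

Semi-supervised Importance Propagation (SIP)

- Can we propagate importance labels?
- Bipartite graph, with labels only on emails

[Figure: bipartite graph linking agents/persons to emails; a few emails carry priority labels (4, 3, 2) and the remaining nodes are unlabeled (?)]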

SIP – Email Network

- $A$: sender to emails ($N \times M$)
- $B^{T}$: email to recipients ($M \times N$)
- $x_k$: $k$-th importance labels for emails ($M \times 1$)
- $y_k = B x_k$ ($N \times 1$)
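
A small worked instance of $y_k = B x_k$ may help. The sketch assumes that $B$ is a 0/1 recipient-by-email matrix and that $x_k$ marks the emails labeled with importance level $k$; both encodings, and the toy numbers, are assumptions rather than details given on the slide.

```python
def matvec(matrix, vector):
    """Multiply an N x M matrix (list of rows) by a length-M vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# N = 3 persons, M = 4 emails; B[i][j] = 1 if person i received email j.
B = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
x_k = [1, 1, 0, 0]     # emails 0 and 1 carry level k; the others do not
y_k = matvec(B, x_k)   # per-person score for level k
print(y_k)
```

Under these assumptions each entry of `y_k` counts how many level-k emails the person received.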


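SIP – Algorithm

Propagation with $C = A B^{T}$ ($N \times N$):

$C\,y_k = A B^{T} y_k = A B^{T} B\,x_k = C B\,x_k$

Problems of the above propagation

- $C$ may not be irreducible
- The result is insensitive to $x_k$ (not personalized)

Apply Personalized PageRank with a normalized $y_k$ and a column-wise normalized $C$

- Normalize $y_k \rightarrow y'_k$ so that $\lVert y'_k \rVert_1 = 1$
- Column-wise normalize $C \rightarrow C'$
- $E_k = (1 - \alpha)\,C' + \alpha\,U_k$, where $U_k = y'_k \mathbf{1}^{T}$ and $\alpha \in [0, 1]$
- Iterate for $t$ steps: $y_k \leftarrow E_k^{t}\,y_k$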
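
The personalized propagation can be sketched as a short power iteration. This is a hedged illustration rather than the authors' code: it assumes the update $E_k = (1-\alpha)\,C' + \alpha\,y'_k \mathbf{1}^{T}$ with a column-normalized $C'$ and an L1-normalized $y'_k$, and the function names, the default `alpha`, and the toy matrix are all hypothetical.

```python
def l1_normalize(vec):
    """Scale a vector so that its absolute values sum to 1."""
    total = sum(abs(v) for v in vec)
    return [v / total for v in vec] if total else list(vec)

def column_normalize(matrix):
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    sums = [sum(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]

def propagate(C, y_k, alpha=0.15, iterations=50):
    """Repeatedly apply E_k = (1 - alpha) * C' + alpha * y'_k 1^T to y."""
    C_norm = column_normalize(C)
    y_prime = l1_normalize(y_k)
    y = list(y_prime)
    n = len(y)
    for _ in range(iterations):
        mass = sum(y)  # this is 1^T y, the weight of the personalized jump
        y = [
            (1 - alpha) * sum(C_norm[i][j] * y[j] for j in range(n))
            + alpha * y_prime[i] * mass
            for i in range(n)
        ]
    return y

# Toy person-to-person matrix C (N = 3) and initial scores y_k.
C = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
scores = propagate(C, [2, 1, 0])
print(scores)
```

Because the jump term pulls mass back toward $y'_k$, the final scores keep the ordering of the initial labels in this toy case instead of converging to a label-independent vector.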

Outline

- Problem Description
- Approaches
- Experiments
- Contributions

Experiments – Data Collection

- Collected data
  - 25 subjects were recruited from Carnegie Mellon University
  - 7 users submitted more than 200 emails: 1 faculty member, 2 staff members, 4 students

[Figure: time axis split into a training period followed by a testing period]

Experiments – Metrics

- Mean Absolute Error (MAE)
  - An MAE of 1.0 means that, on average, the prediction deviates from the truth by one priority level
  - MAE takes the size of each error into account
    - It ranges from 0 to 4 when we use five importance levels
    - Example: 1 vs. 5 (error of 4) and 4 vs. 5 (error of 1)
- Micro-MAE
  - Pool the test instances from all users to obtain a joint test set
- Macro-MAE
  - Compute each user's MAE first, then take the average of the per-user MAEs
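
The difference between the two averages is easy to see on toy data. The sketch below assumes each user's test set is a pair of lists (true levels, predicted levels); the function names and numbers are illustrative.

```python
def mae(truth, predicted):
    """Mean absolute error between two equally long lists of priority levels."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)

def micro_mae(per_user):
    """Pool every user's test instances, then compute a single MAE."""
    truth = [t for user_truth, _ in per_user.values() for t in user_truth]
    pred = [p for _, user_pred in per_user.values() for p in user_pred]
    return mae(truth, pred)

def macro_mae(per_user):
    """Compute one MAE per user, then average those values."""
    scores = [mae(t, p) for t, p in per_user.values()]
    return sum(scores) / len(scores)

per_user = {
    "user_a": ([5, 4, 3, 1], [4, 4, 3, 1]),  # errors 1, 0, 0, 0
    "user_b": ([5, 1], [1, 1]),              # errors 4, 0
}
micro = micro_mae(per_user)
macro = macro_mae(per_user)
print(micro, macro)
```

Micro-MAE weights every email equally, so users with many test emails dominate; Macro-MAE weights every user equally.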

Experiments – Setups

- Features: four subsets
  - Basic Features (BF): from, to, cc, title, body
  - Newman Clustering (NC)
  - Social Importance (SI)
  - Semi-supervised Importance Propagation (SIP)
- Ten rounds of random shuffling of the training data
- Linear SVM
  - 10-fold cross-validation for parameter tuning
  - Regularization parameter tuned over $[10^{-3}, 10^{3}]$

Experiments – Results

Contributions

- The first study on personalized email prioritization
  - Using statistical classification and clustering
  - Based on fine-grained personal judgments with multiple users
- Enriched representation through the personal social network
  - Social Clustering
  - Social Importance Estimation
  - Semi-supervised Importance Propagation
- Fully personalized methodology
- Technical development and evaluation
