Post on 31-Dec-2015
Mining Social Network for Personalized Email Prioritization
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon
Outline
  Problem Description
  Approaches
  Experiments
  Contributions
Problem Description
  Email overload is a severe problem
  Identifying the importance of each email will alleviate email overload
  Challenges
    No access to other people’s emails and labels
    Personalized labeling is time-consuming
    The same message may have different priority labels for different recipients
  We want to leverage the sparse training data by using the social network of each user
Sparse Training Data
Outline
  Problem Description
  Approaches
    Social Clustering
    Social Importance
    Semi-supervised Importance Propagation
  Experiments
  Conclusion and Future Work
Social Clustering – Motivation
  Personal email inbox
    Lots of unlabeled emails
    No privacy issue
  Observations
    The sender can be important
    Some senders do not appear in the training set at all, or appear in only a few instances
    We need to generalize across senders
    Let’s find similar senders from the social network
Social Clustering – Contact Network
  Personal contact network $G = (V, E)$
  The whole network is constructed from the personal inbox
  [Figure: example contact network; nodes are agents/persons]
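A minimal sketch of how such a contact network might be built from a single inbox. The message format (dicts with `from`, `to`, and `cc` keys), the function name `build_contact_network`, and the choice of one undirected edge per sender-recipient pair weighted by message count are assumptions for illustration; the slides do not specify them.

```python
from collections import defaultdict

def build_contact_network(messages):
    """Return (nodes, edges); edges maps frozenset({u, v}) to a message count."""
    nodes = set()
    edges = defaultdict(int)
    for msg in messages:
        sender = msg["from"]
        recipients = set(msg.get("to", [])) | set(msg.get("cc", []))
        nodes.add(sender)
        for r in recipients:
            nodes.add(r)
            if r != sender:  # ignore self-addressed copies
                edges[frozenset((sender, r))] += 1
    return nodes, dict(edges)

# Hypothetical toy inbox (all names are made up for this example).
inbox = [
    {"from": "alice", "to": ["me"], "cc": ["bob"]},
    {"from": "bob", "to": ["me", "alice"]},
    {"from": "carol", "to": ["me"]},
]
nodes, edges = build_contact_network(inbox)
```

Everything here comes from the headers of messages the user already holds, which matches the "no privacy issue" observation above.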
Social Clustering – Newman Clustering
  Newman clustering algorithm [Newman, 04]
    Finds social cliques or cohesive social groups
    Based on edge betweenness
      (the number of shortest paths that go through the edge) / (the total number of shortest paths)
    Drops edges in order of highest edge betweenness first
    Hard clustering
  [Figure: example graph split into Group A and Group B]
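The edge-dropping idea above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the graph is unweighted and undirected, edge betweenness is computed with a Brandes-style accumulation (the sum over node pairs of the fraction of shortest paths crossing the edge, left unnormalized because only the ranking matters), and the stopping rule `n_groups` is a hypothetical parameter.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path counts back from the farthest nodes toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every node pair was visited from both endpoints, so halve the totals.
    return {e: b / 2.0 for e, b in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, groups = set(), []
    for s in adj:
        if s in seen:
            continue
        group, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in group:
                group.add(v)
                stack.extend(adj[v] - group)
        seen |= group
        groups.append(group)
    return groups

def betweenness_clustering(adj, n_groups):
    """Drop the highest-betweenness edge until n_groups components remain."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while len(components(adj)) < n_groups:
        scores = edge_betweenness(adj)
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Two triangles joined by a single bridge edge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
groups = betweenness_clustering(adj, 2)
```

In this toy graph every shortest path between the two triangles crosses the bridge, so the bridge is removed first and each node ends up in exactly one group (hard clustering).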
Social Clustering – Validations
Clusters are coherent!
Social Clustering – Feature Incorporation
  Extended vector space
    Each email has a text feature vector and a social network feature vector, which are merged into a combined vector
  The combined vector space is used as an enriched feature set for the email prioritizer
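One plausible way to form the combined vector is simple concatenation, sketched below. The one-hot encoding of the sender's cluster and the function name `combine_features` are assumptions; the slides do not show how the social network part is encoded.

```python
def combine_features(text_vec, sender_cluster, n_clusters):
    """Concatenate text features with a one-hot indicator of the sender's cluster."""
    social_vec = [0.0] * n_clusters
    if sender_cluster is not None:  # senders outside the network get all zeros
        social_vec[sender_cluster] = 1.0
    return list(text_vec) + social_vec

# Hypothetical email: three text features, sender in cluster 1 of 3.
combined = combine_features([0.2, 0.0, 0.7], sender_cluster=1, n_clusters=3)
```

With this encoding, a sender with few or no training emails still shares a feature with other members of the same cluster, which is the generalization the motivation slide asks for.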
Social Importance – Motivations
  Social importance
    A person at the center of a cluster might be more important than others
  Betweenness
    Edge betweenness is used for Newman clustering
    Vertex betweenness
      Measures how much of a communication bottleneck a person is in the social network
      Such people are contact points in the network
      They might be important people
  We may try other kinds of social importance metrics too
Social Importance – Metrics
  Metrics
    Degree (in, out, total) [Wasserman and Faust, 94]
    Clique Counts (ClqCnt) [Wasserman and Faust, 94]
      The number of clique sub-graphs that contain a node v
    Betweenness (BetCent) [Freeman, 77]
    HITS Authority (Authority) [Kleinberg, 99]
      λ: the greatest eigenvalue
      r: the corresponding eigenvector, similar to PageRank scores
    Neighborhood Connectivity (“Clustering Coefficient”, ClustCoef) [Boykin and Roychowdhury, 05]
      Measures the connectivity among the neighbors of a node v
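Three of the metrics above can be sketched in a few lines. The graph representations (a directed edge list for degree and authority, undirected neighbor sets for the clustering coefficient) and all function names are assumptions for illustration. HITS authority is computed by power iteration, which converges toward the principal eigenvector of $A^T A$ for adjacency matrix $A$.

```python
def degrees(edges, nodes):
    """(in, out, total) degree per node for a directed edge list [(src, dst), ...]."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return {v: (indeg[v], outdeg[v], indeg[v] + outdeg[v]) for v in nodes}

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are directly connected (undirected)."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def hits_authority(edges, nodes, iters=100):
    """HITS authority scores by power iteration, L2-normalized each round."""
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        hub = {v: 0.0 for v in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]       # hubs point at good authorities
        new_auth = {v: 0.0 for v in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]   # authorities are pointed at by good hubs
        norm = sum(x * x for x in new_auth.values()) ** 0.5 or 1.0
        auth = {v: x / norm for v, x in new_auth.items()}
    return auth

# Hypothetical toy graphs.
nodes = ["a", "b", "c", "d"]
directed = [("a", "c"), ("b", "c"), ("c", "d")]
undirected = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

deg = degrees(directed, nodes)
auth = hits_authority(directed, nodes)
coef_c = clustering_coefficient(undirected, "c")
```

Clique counts and vertex betweenness are omitted here to keep the sketch short.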
Social Importance – Validations
  Correlation coefficients with priority levels: $\mathrm{PCC}(f, y)$
    $f_i$: the feature value of the $i$-th email’s sender
    $y_i \in [1..5]$: the priority level of the $i$-th email
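As a concrete reading of PCC(f, y), the sketch below computes the Pearson correlation coefficient between a list of sender feature values and a list of priority levels. The function name `pearson` and the convention of returning 0.0 for constant inputs are choices made for this example.

```python
from math import sqrt

def pearson(f, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(f)
    mean_f = sum(f) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_f) * (b - mean_y) for a, b in zip(f, y))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_f == 0 or var_y == 0:
        return 0.0  # convention chosen here: no variation means no correlation
    return cov / sqrt(var_f * var_y)

# Hypothetical data: one sender feature value and one priority level per email.
feature_values = [0.1, 0.4, 0.4, 0.9]
priority_levels = [1, 3, 2, 5]
r = pearson(feature_values, priority_levels)
```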
SIP – Motivations
  Semi-supervised Importance Propagation (SIP)
    Can we propagate importance labels?
    Bipartite graph, with labels only on emails
  [Figure: bipartite graph of agents/persons and emails; some emails are labeled (4, 3, 2) and the remaining nodes are unknown (?)]
SIP – Email Network
  $A$: sender to emails ($N \times M$)
  $B^T$: email to recipients ($M \times N$)
  $x_k$: $k$-th importance labels for emails ($M \times 1$)
  $y_k = B x_k$ ($N \times 1$)
  [Figure: the same bipartite graph of agents/persons and emails]
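A small worked example of these matrices, under assumptions the slides do not state: `x_k` is read as a 0/1 indicator of which emails carry label `k`, matrix entries are 0/1 incidences, and the email records and helper names are hypothetical.

```python
def incidence(people, emails, key):
    """N x M matrix: entry [i][j] is 1.0 if people[i] appears in emails[j][key]."""
    return [[1.0 if p in e[key] else 0.0 for e in emails] for p in people]

def matvec(M, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

people = ["alice", "bob", "me"]                      # N = 3
emails = [                                           # M = 3
    {"sender": ["alice"], "recipients": ["me", "bob"], "label": 4},
    {"sender": ["bob"], "recipients": ["me"], "label": 4},
    {"sender": ["alice"], "recipients": ["me"], "label": None},  # unlabeled
]

A = incidence(people, emails, "sender")       # sender to emails, N x M
B = incidence(people, emails, "recipients")   # N x M, so B^T is M x N

k = 4
x_k = [1.0 if e["label"] == k else 0.0 for e in emails]  # M x 1
y_k = matvec(B, x_k)                                      # N x 1
```

Here `y_k` counts, for each person, how many emails labeled `k` they received.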
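SIP – Algorithm
  Propagation
    $y_k^{t+1} = C y_k^t = A B^T y_k^t = A B^T B x_k^t = C B x_k^t$, where $C = A B^T$
  Problems of the above propagation
    $C$ may not be irreducible
    $y_k^{t+1}$ is insensitive to $y_k^1$ (not personalized)
  Apply Personalized PageRank
    Normalize $y_k^1$ to ${y'}_k^{1}$ and column-wise normalize $C$ to $C'$
    $E_k = (1 - \alpha) C' + \alpha U_k$, where $U_k = {y'}_k^{1} \mathbf{1}^T$ and $\alpha \in [0, 1]$
    $y_k^{t+1} = E_k y_k^t$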
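The personalized propagation can be sketched as a short power iteration. Several details are assumptions rather than facts from the slides: the mixing weight is called `alpha` with a default of 0.15, normalization means scaling to sum 1, empty columns of C are replaced by the personalization vector so that every column still sums to 1, and the iteration count is fixed. The initial vector must have a positive sum.

```python
def column_normalize(C, fallback):
    """Scale each column of C to sum to 1; empty columns become `fallback`."""
    n = len(C)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(C[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = C[i][j] / col_sum if col_sum > 0 else fallback[i]
    return out

def personalized_propagation(C, y1, alpha=0.15, iters=100):
    """Iterate y <- E y with E = (1 - alpha) * C' + alpha * y' 1^T."""
    total = sum(y1)                          # assumed > 0
    y_prime = [v / total for v in y1]        # normalized initial scores
    C_prime = column_normalize(C, y_prime)
    n = len(C)
    E = [[(1 - alpha) * C_prime[i][j] + alpha * y_prime[i] for j in range(n)]
         for i in range(n)]
    y = y_prime[:]
    for _ in range(iters):
        y = [sum(E[i][j] * y[j] for j in range(n)) for i in range(n)]
    return y

# Hypothetical 3-person propagation matrix and two different starting vectors.
C = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
y_a = personalized_propagation(C, [0.0, 1.0, 2.0])
y_b = personalized_propagation(C, [0.0, 2.0, 1.0])
```

Persons 1 and 2 are structurally identical in `C`, so any difference between their final scores comes only from the starting vector. That dependence on the start is what the personalization term adds.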
Outline
  Problem Description
  Approaches
  Experiments
  Contributions
Experiments – Data Collection
  Collected data
    25 subjects were recruited from Carnegie Mellon University
    7 users submitted more than 200 emails
      1 faculty member, 2 staff members, 4 students
  [Figure: time axis with a training period followed by a testing period]
Experiments – Metrics
  Mean Absolute Error (MAE)
    An MAE of 1.0 means that, on average, the prediction deviates from the truth by one priority level
    MAE takes the size of each error into account: predicting 1 when the truth is 5 costs more than predicting 4 when the truth is 5
    It ranges from 0 to 4 when we use five importance levels
  Micro-MAE
    Pool the test instances from all users to obtain a joint test set
  Macro-MAE
    Compute each user’s MAE first, then take the average of the per-user MAEs
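The difference between the two averages is easiest to see on a small example. The data layout (a dict from user to a list of (predicted, true) pairs) and the function names are assumptions made for this sketch.

```python
def mae(pairs):
    """Mean absolute error over (predicted, true) pairs."""
    return sum(abs(p - t) for p, t in pairs) / len(pairs)

def micro_mae(per_user):
    """Pool every user's test instances into one set, then compute MAE."""
    pooled = [pair for pairs in per_user.values() for pair in pairs]
    return mae(pooled)

def macro_mae(per_user):
    """Compute MAE per user, then average the per-user values."""
    return sum(mae(pairs) for pairs in per_user.values()) / len(per_user)

# Hypothetical predictions on a 1..5 scale.
per_user = {
    "u1": [(5, 5), (4, 5), (1, 5)],  # absolute errors 0, 1, 4
    "u2": [(3, 2)],                  # absolute error 1
}
micro = micro_mae(per_user)
macro = macro_mae(per_user)
```

Micro-MAE weights every email equally, so users with many test emails dominate; Macro-MAE weights every user equally.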
Experiments – Setups
  Features: four subsets
    Basic Features (BF): from, to, cc, title, body
    Newman Clustering (NC)
    Social Importance (SI)
    Semi-supervised Importance Propagation (SIP)
  Ten random shufflings of the training data
  Linear SVM
    10-fold cross-validation for parameter tuning
    Regularization parameter tuned over $[10^{-3}, 10^{3}]$
Experiments – Results
Contributions
  The first study on personalized email prioritization
    Using statistical classification and clustering
    Based on fine-grained personal judgments from multiple users
  Enriched representation through the personal social network
    Social Clustering
    Social Importance Estimation
    Semi-supervised Importance Propagation
  Fully personalized methodology
    Technical development and evaluation