Mehdi Kargar, Aijun An
York University, Toronto, Canada
Discovering Top-k Teams of Experts with/without a Leader in Social Networks
Overview
• Team Formation in Social Networks
• Communication Cost
• Challenges in Finding Teams
• Approximation Algorithm for Finding Teams
• Enumerating Top-k Teams in Polynomial Delay
• Finding Teams with Leader
• Empirical Results
• Conclusion
CIKM’11 Discovering Top-k Teams of Experts with/without a Leader in Social Networks
Team of Experts
• Given a social network, find top-k teams of experts that can effectively collaborate in order to complete a project.
• Each team might/might not have a leader.
Team of Experts
• Project: set of required skills
• Expert: an individual with a specific skill-set
• Social Network: represents the strength of relationships (the degree of collaboration between any two experts).
  • For example: LinkedIn, DBLP and …
Example
[Figure: example social network of experts; each edge is labeled with a communication cost]
Project = {AI, DB, DM, IR}
The number on each edge represents how easily two experts can communicate; smaller numbers represent better communication.
Are Jack and Thomas able to communicate effectively?
What about Jack, John and Susan?
Team of Experts
• Team of Experts: Given a set of experts and a project that requires a set of skills $\{s_1, s_2, \ldots, s_p\}$, a team of experts is a set of $p$ skill-expert pairs:
  $\{(s_1, c_{s_1}), (s_2, c_{s_2}), \ldots, (s_p, c_{s_p})\}$,
  where $c_{s_k}$ is an expert having skill $s_k$ for $k = 1, \ldots, p$.
• A skill-expert pair $(s_k, c_{s_k})$ means that expert $c_{s_k}$ is responsible for skill $s_k$ in the project.
• How can we make sure that the experts can communicate with each other?
Communication Cost
• For a team of experts without a leader:
• The sum of distances of a team of experts is the sum of the shortest distances between the experts responsible for each pair of skills.
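The definition above can be illustrated with a minimal sketch. The graph weights below are hypothetical (they are not the values from the slide's figure), and the helper names `dijkstra` and `sum_of_distances` are illustrative choices, not names from the paper.

```python
import heapq
from itertools import combinations

def dijkstra(graph, source):
    """Shortest-path distances from source in a weighted undirected graph."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def sum_of_distances(graph, team):
    """team maps each required skill to the expert responsible for it."""
    total = 0
    for a, b in combinations(sorted(team), 2):
        # Shortest distance between the experts responsible for skills a and b.
        total += dijkstra(graph, team[a]).get(team[b], float("inf"))
    return total

# Hypothetical toy network (weights are assumptions for illustration only).
graph = {
    "Jack": {"John": 1, "Thomas": 8},
    "John": {"Jack": 1, "Susan": 2},
    "Susan": {"John": 2, "Thomas": 6},
    "Thomas": {"Jack": 8, "Susan": 6},
}
team = {"AI": "Jack", "DB": "John", "DM": "Susan"}
print(sum_of_distances(graph, team))  # 1 (Jack-John) + 3 (Jack-Susan) + 2 (John-Susan) = 6
```

Because every pair of skills contributes a term, changing any connection that lies on a shortest path between two team members changes the cost.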
Benefit of Sum of Distances
• Previous work in this area defines two communication cost functions [Lappas et al.]:
  • Diameter of the sub-graph
    • The largest shortest path between any two nodes in the sub-graph
  • Cost of the minimum spanning tree
• The above measures have the following problems:
  1) They do not consider communication costs between each pair of skill holders.
  2) Instability:
    a) A slight change in the graph may result in a radical change in the solution.
    b) On the other hand, they may be insensitive to adding, deleting or changing a connection in the graph, since they only measure part of the communication cost.
Problem 1
• Team Formation without a Leader: Given a project P and a graph G representing the social network of a set of experts C, the problem of team formation without a leader is to find a team of experts T for P from G so that the communication cost of T, defined as the sum of distances of T, is minimized.
• What about finding top-k teams of experts?
  • A user might be interested in finding more than one team.
Challenges
• Theorem: Problem 1 is NP-hard.
  • Proved in the paper by reduction from 3-satisfiability (3-SAT).
  • Solution: an approximation algorithm with a guaranteed ratio.
• The total number of teams is exponential in the number of required skills.
  • It is not efficient to generate all teams and then sort them.
  • Solution: enumerate teams with polynomial delay.
Finding Best Approximate Team (without Leader)
• Step 1: For every expert (skill holder) n and every required skill $k_i$, find the expert closest to n that holds $k_i$.
• Step 2: For every expert n, calculate the sum of the distances from n to the closest holder of each required skill $k_i$.
• Step 3: Find the expert with the minimum sum of distances among all experts.
• Step 4: Return the set of experts associated with the minimum sum of distances.
• The approximation ratio of the algorithm is 2.
  • The weight of the answer is at most twice the weight of the optimal answer.
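The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy graph, the `holders` mapping (skill to the set of experts holding it), and the function names are hypothetical. The returned cost is the sum of distances from the chosen root expert to the selected skill holders, which is the quantity the steps minimize.

```python
import heapq

INF = float("inf")

def dijkstra(graph, source):
    """Shortest-path distances from source in a weighted undirected graph."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def best_approx_team(graph, holders):
    """Return (cost, root, team) for the root expert with the smallest
    sum of distances to the nearest holder of each required skill."""
    candidates = set().union(*holders.values())  # only skill holders are roots
    best = None
    for root in sorted(candidates):
        dist = dijkstra(graph, root)
        team, cost = {}, 0
        for skill in sorted(holders):
            # Step 1: nearest holder of this skill from the current root.
            nearest = min(sorted(holders[skill]), key=lambda e: dist.get(e, INF))
            team[skill] = nearest
            # Step 2: accumulate the distance from the root to that holder.
            cost += dist.get(nearest, INF)
        # Step 3: keep the root with the minimum accumulated distance.
        if best is None or cost < best[0]:
            best = (cost, root, team)
    return best  # Step 4: the team attached to the best root

# Hypothetical toy network and skill assignment (assumptions for illustration).
graph = {
    "Jack": {"John": 1, "Thomas": 8},
    "John": {"Jack": 1, "Susan": 2},
    "Susan": {"John": 2, "Thomas": 6},
    "Thomas": {"Jack": 8, "Susan": 6},
}
holders = {"AI": {"Jack"}, "DB": {"John", "Thomas"}, "DM": {"Susan"}}
cost, root, team = best_approx_team(graph, holders)
print(cost, root, team)
```

In this toy input the best root is John, whose distances to the nearest AI, DB and DM holders are 1, 0 and 2.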
Enumerating in Approximate Order
• Lawler's technique is used for finding the top-k teams.
• In each iteration, the next team is generated by finding the top team under constraints.
• Two problems must be solved:
  1. What are the constraints?
  2. How can the top answer be found efficiently under the constraints?
System Overview
[Flowchart, read as a sequence of steps]
1. Input: the required skills and the value of k.
2. Find the best team with no constraints.
3. Insert the best team, together with its search space, into the priority queue.
4. Fetch the best team from the priority queue and print it.
5. Divide the search space of the top answer into sub-spaces.
6. Find the best team in each sub-space under its associated constraints.
7. Insert each answer, together with its search space, into the priority queue.
8. Have the top-k teams already been printed, or is the priority queue empty? If yes, terminate; if no, return to step 4.
Constraints and Search Space
• Let's work through an example.
• Suppose that the required skills are $\{k_1, k_2, k_3, k_4\}$.
• $C_i$ = the set of experts that hold skill $k_i$.
• The whole search space, which contains the best team, can be represented as $C_1 \times C_2 \times C_3 \times C_4$.
• Assume that the best team is $(v_1, v_2, v_3, v_4)$, where $v_i$ is an expert holding skill $k_i$.
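The enumeration loop can be sketched as below. This transcript does not show how the search space is divided, so the sketch assumes the standard Lawler-style partition: sub-space i fixes the first i-1 positions to the best team's experts and excludes the best team's expert at position i. The function names, the exhaustive `best_in_space` stand-in (the talk uses its approximation algorithm under constraints instead), and the cost table are all hypothetical.

```python
import heapq
from itertools import product

def best_in_space(space, cost):
    """Stand-in for 'find best team under constraints': exhaustive search."""
    if any(len(s) == 0 for s in space):
        return None
    return min(product(*[sorted(s) for s in space]), key=cost)

def partition(space, best):
    """Split space minus {best} into disjoint sub-spaces (Lawler-style)."""
    subspaces = []
    for i, v in enumerate(best):
        fixed = tuple(frozenset([b]) for b in best[:i])  # positions before i are fixed
        reduced = frozenset(space[i]) - {v}              # position i excludes v_i
        if reduced:
            subspaces.append(fixed + (reduced,) + tuple(space[i + 1:]))
    return subspaces

def top_k(space, cost, k):
    """Return up to k (cost, team) pairs in increasing order of cost."""
    results, heap, counter = [], [], 0
    first = best_in_space(space, cost)
    if first is not None:
        heap.append((cost(first), counter, first, space))
    while heap and len(results) < k:
        c, _, team, sp = heapq.heappop(heap)  # fetch the best remaining team
        results.append((c, team))
        for sub in partition(sp, team):       # divide its search space
            cand = best_in_space(sub, cost)
            if cand is not None:
                counter += 1                  # unique tie-breaker for the heap
                heapq.heappush(heap, (cost(cand), counter, cand, sub))
    return results

# Hypothetical two-skill example with a made-up cost table.
space = (frozenset({"a1", "a2"}), frozenset({"b1", "b2"}))
table = {("a1", "b1"): 2, ("a1", "b2"): 5, ("a2", "b1"): 3, ("a2", "b2"): 4}
print(top_k(space, table.get, 3))
```

Because the sub-spaces are disjoint and together cover everything except the team just printed, no team is generated twice. With an exact `best_in_space` the output is in exact order; with an approximate one the order is approximate, as the slide title on enumeration indicates.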
Team of Experts with a Leader
• A project often has a leader who is responsible for monitoring and coordinating the project.
• Each expert in the team needs to communicate with the leader to report the progress and discuss issues related to the project.
• The communication cost of the team heavily depends on the distance between the leader and each of the project members.
Communication Cost
• For a team of experts with a leader:
• Assume that the team has a leader L, where L is an expert in the social network who may or may not belong to the team.
• The leader distance of a team of experts is the sum of the shortest distances between its leader and the expert for each required skill.
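Written as a formula, the definition above reads as follows. The function name $\mathrm{leaderDistance}$ and the symbol $d(u, v)$ for the shortest distance between experts $u$ and $v$ in the graph are notational assumptions; $c_{s_i}$ denotes the expert responsible for required skill $s_i$, and $p$ is the number of required skills.

```latex
\[
\mathrm{leaderDistance}(T, L) \;=\; \sum_{i=1}^{p} d\bigl(L,\, c_{s_i}\bigr)
\]
```

If the leader is also responsible for a skill, the corresponding term is $d(L, L) = 0$.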
Problem 2
• Team Formation with a Leader: Given a project P and a graph G representing the social network of a set of experts C, the problem of team formation with a leader is to find a team of experts T and an expert L from C as the leader of the team so that the communication cost, defined as the leader distance, is minimized.
• This problem can be solved in polynomial time.
Finding Best Team (with Leader)
• Step 1: For every individual i and every required skill $k_i$, find the expert closest to i that holds $k_i$.
• Step 2: For every individual i (leader candidate), calculate the leader distance as the sum of the distances from i to the closest holder of each required skill $k_i$.
• Step 3: Find the individual (leader) with the minimum leader distance among all individuals.
• Step 4: Return the leader together with the set of experts that has the minimum leader distance.
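A minimal sketch of these steps follows, assuming every expert in `holders` appears as a node of the graph. The star-shaped toy network, the individual "Mary" (who holds no required skill), and the function names are hypothetical. All-pairs shortest distances are computed once with Floyd-Warshall, which is an implementation choice for brevity rather than something stated in the slides.

```python
from itertools import product

INF = float("inf")

def all_pairs_shortest(graph):
    """Floyd-Warshall over a weighted undirected graph given as nested dicts."""
    nodes = sorted(graph)
    dist = {u: {v: (0 if u == v else graph[u].get(v, INF)) for v in nodes}
            for u in nodes}
    # product varies its last element fastest, so k is the outermost loop.
    for k, i, j in product(nodes, repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def best_team_with_leader(graph, holders):
    """Return (leader_distance, leader, team) minimizing the leader distance."""
    dist = all_pairs_shortest(graph)
    best = None
    for leader in sorted(graph):  # every individual is a leader candidate
        # Step 1: for each skill, the holder closest to this candidate.
        team = {skill: min(sorted(experts), key=lambda e: dist[leader][e])
                for skill, experts in holders.items()}
        # Step 2: leader distance = sum of distances from leader to each holder.
        cost = sum(dist[leader][e] for e in team.values())
        # Step 3: keep the candidate with the minimum leader distance.
        if best is None or cost < best[0]:
            best = (cost, leader, team)
    return best  # Step 4

# Hypothetical star network: Mary holds no required skill but sits in the middle.
graph = {
    "Mary": {"Jack": 1, "John": 1, "Susan": 1},
    "Jack": {"Mary": 1},
    "John": {"Mary": 1},
    "Susan": {"Mary": 1},
}
holders = {"AI": {"Jack"}, "DB": {"John"}, "DM": {"Susan"}}
print(best_team_with_leader(graph, holders))
```

Here the best leader is Mary, illustrating that the leader need not belong to the team. Since each skill's holder is chosen independently for a fixed leader, trying every leader candidate gives an exact answer in polynomial time.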
Experimental Results
• The algorithms proposed in this work (Best-SumDistance and Best-Leader) are compared with the following methods:
  • Rarest-First, which minimizes the diameter
  • Enhanced-Steiner, which minimizes the cost of the minimum spanning tree
• Two datasets are used: DBLP and IMDb.
  • DBLP contains 5,658 experts and 8,588 edges.
  • IMDb contains 6,784 experts and 35,875 edges.
• For the purpose of comparison, exact answers to the NP-hard problems are obtained by exhaustive search.
Communication Cost
[Figure: DBLP dataset]
Communication Cost
[Figure: IMDb dataset]
Other Quality Measures: Approximation Algorithms
[Figure: DBLP dataset]
Other Quality Measures: Exact Algorithms
[Figure: DBLP dataset]
Scalability
[Figure: DBLP dataset]
Conclusion
• Two problems are defined:
  • Finding top-k teams of experts with a leader.
  • Finding top-k teams of experts without a leader.
• An approximation algorithm with a bounded guarantee has been proposed for finding a team of experts without a leader.
• An exact polynomial-time algorithm has been proposed for finding a team of experts with a leader.
• A procedure for finding top-k teams of experts with polynomial delay is introduced.