Clique Finder by Ryan Lange, Thomas Dvornik, Wesley Hamilton, and Bill Hess
Outline
• Intro
  – Problem
  – Solution
• Implementation
  – Distance Algorithm
  – Clustering Algorithm
• Validation
  – Test Data Set
  – Real Data Set
• Demo
Introduction
How Can We Group Friends?
• How can your friends be grouped logically?
• What are the important factors in people joining cliques? Shared interests, high school, family, college, work, etc.
• What are the differences between Facebook and real life?
How We Define a Clique
Desired Results
• High school friends, family, or co-workers will be grouped together as expected.
• We may also find cliques or groups of people within your friends list that you had not considered before.
Implementation
• Gather data
• Distance algorithm
• Clustering algorithm
  – Input: distance matrix
  – Output: two-dimensional array of friends
• Test app
Output
Distance Algorithm
Problems
• Facebook limits
• Server limits
• Retrieving and processing over 30,000 photos can take 3 to 6 minutes
Important information
• What information should be processed?
• We used photo tags and wall-post counts
Data collected
• An average of 8,000 photos across all friends
Distance Algorithm (continued)
A survey of 50 users identified 5 useful pieces of information: personal information, wall posts, photos, groups, and events
Distance Algorithm (continued)
Facebook results
• One picture with 5 tags = 5 results
Process results
• Turn the results into a list of friends with their tagged photos
• Find the distance between each pair of friends
• Turn the distances into a distance matrix
Run time (worst case): (number of users)² × (number of photos)²
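The processing steps above can be sketched in Python. This is a minimal sketch, assuming the tag results arrive as hypothetical (photo_id, friend) pairs, one per tag, and using a hypothetical placeholder distance of 1 / (1 + number of shared photos); neither the input shape nor that distance is taken from the original app.

```python
from collections import defaultdict
from itertools import combinations


def group_tags(tag_rows):
    """Turn one-row-per-tag results into {friend: set of photo ids}."""
    photos_by_friend = defaultdict(set)
    for photo_id, friend in tag_rows:
        photos_by_friend[friend].add(photo_id)
    return dict(photos_by_friend)


def shared_count_distance(photos_a, photos_b):
    """Hypothetical placeholder: more shared photos -> smaller distance."""
    return 1.0 / (1 + len(photos_a & photos_b))


def distance_matrix(photos_by_friend, distance=shared_count_distance):
    """Build a symmetric matrix with zeros on the diagonal."""
    friends = sorted(photos_by_friend)
    n = len(friends)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = distance(photos_by_friend[friends[i]], photos_by_friend[friends[j]])
        matrix[i][j] = matrix[j][i] = d
    return friends, matrix


# Photo 1 has three tags, so it produces three result rows.
rows = [(1, "alice"), (1, "bob"), (1, "carol"), (2, "alice"), (2, "bob"), (3, "dave")]
friends, matrix = distance_matrix(group_tags(rows))
```

Storing each friend's photos as a set means each pair is compared with a hash-based intersection rather than by checking every photo of one friend against every photo of the other, which is presumably where the (number of photos)² factor in the worst-case estimate comes from.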
Improved Distance Equation
Distance: based on the percentage of tagged photos in which both users appear together
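One plausible form of this distance, assumed here rather than taken from the slides, is the Jaccard distance d(A, B) = 1 − |A ∩ B| / |A ∪ B|, where A and B are the sets of photos in which each user is tagged. The function name below is hypothetical.

```python
def co_tag_distance(photos_a: set, photos_b: set) -> float:
    """Hypothetical distance: 1 - (photos showing both users) / (photos showing either).

    0.0 means the two users are always tagged together; 1.0 means never.
    """
    either = photos_a | photos_b
    if not either:
        return 1.0  # assumption: users with no tagged photos are maximally distant
    both = photos_a & photos_b
    return 1.0 - len(both) / len(either)


d = co_tag_distance({1, 2, 3, 4}, {3, 4, 5})  # 2 shared photos out of 5 -> 0.6
```

One likely reason a percentage works better than a raw count is that it normalizes for friends who are tagged very often, so two heavily photographed people are not pulled together just because they each appear in many photos.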
Clustering Algorithm
• Hierarchical clustering
• Average-linkage clusters
• Generalized to work on any objects with a distance function
• Clustering stops when the two closest clusters are more than the threshold distance apart
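A minimal sketch of average-linkage hierarchical clustering that, as the slide describes, works on any objects given a distance function and stops once the two closest clusters are more than the threshold apart. The function names are hypothetical, and the usage runs it on 2-D points in the spirit of a point-based test driver.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def average_linkage(a: Sequence[T], b: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Average linkage: mean distance over every pair (x in a, y in b)."""
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))


def cluster(items: Sequence[T], dist: Callable[[T, T], float], threshold: float) -> List[List[T]]:
    """Agglomerative (bottom-up) hierarchical clustering with average linkage.

    Starts with one cluster per item and repeatedly merges the two closest
    clusters; stops when the closest pair is more than `threshold` apart.
    """
    clusters: List[List[T]] = [[item] for item in items]
    while len(clusters) > 1:
        candidates = (
            ((i, j), average_linkage(clusters[i], clusters[j], dist))
            for i, j in combinations(range(len(clusters)), 2)
        )
        (i, j), closest = min(candidates, key=lambda c: c[1])
        if closest > threshold:
            break  # the closest two clusters are too far apart: done
        clusters[i].extend(clusters[j])
        del clusters[j]  # combinations yields i < j, so index i stays valid
    return clusters


# Point-based usage: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = cluster(points, math.dist, threshold=3.0)
```

To cluster friends instead of points, the items can be row indices into a distance matrix, e.g. `cluster(range(n), lambda i, j: matrix[i][j], 6)`, which returns a list of cliques, each a list of friend indices. This naive version recomputes every linkage on each merge, roughly cubic in the number of items, which is acceptable for a single friends list.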
Point-Based Test Driver
Validation – Sample Data Set
Validation – Sample Data Set
How we measured correctness
Thresholds from 3 to 10 gave us the correct number of cliques; however, 5 was placed incorrectly
Error rate of 10%, because 1 of the 10 users was misplaced
We chose the midpoint value of 6 as our threshold
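A 10% error rate from 1 misplaced user out of 10 can be computed along the following lines. This is a minimal sketch of a hypothetical metric, not necessarily the authors' exact measure: each produced clique is labeled with the most common true group among its members, and every member from a different group counts as misplaced.

```python
from collections import Counter
from typing import Dict, Hashable, List


def error_rate(cliques: List[List[Hashable]], truth: Dict[Hashable, str]) -> float:
    """Fraction of users whose true group differs from their clique's majority group."""
    misplaced = 0
    total = 0
    for clique in cliques:
        if not clique:
            continue
        majority, _ = Counter(truth[user] for user in clique).most_common(1)[0]
        misplaced += sum(1 for user in clique if truth[user] != majority)
        total += len(clique)
    return misplaced / total if total else 0.0


# Ten users in three true groups; one member of group C lands in B's clique.
truth = {**{f"a{i}": "A" for i in range(4)},
         **{f"b{i}": "B" for i in range(3)},
         **{f"c{i}": "C" for i in range(3)}}
cliques = [["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "c0"], ["c1", "c2"]]
rate = error_rate(cliques, truth)  # 1 of 10 users misplaced -> 0.1
```

This metric does not penalize splitting one true group into several cliques, so it is best paired with a separate check that the number of cliques is correct, as the threshold sweep above does.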
Validation – Real Data Set
• We chose to use Thomas Dvornik's account
  – Moderate amount of data
  – His friends could be separated into well-defined cliques
• Threshold on real data
  – Accuracy was highest at a threshold of 3 and second highest at 6
Validation – Improvements
After improvements, again based on our accuracy measurement
Improvements/Future Work
• Caching
  – The number of queries and the amount of computation can get very large
  – Store the distance matrix for 24 hours
• Accuracy
  – Use all aspects of Facebook
    • Some activity is not even considered
  – Use weights for different data sources
    • Not all activity is equally important
  – Analyze the produced cliques
    • Survey users to see if the cliques are accurate
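Two of these future-work items, caching the distance matrix for 24 hours and weighting different data sources, can be sketched together. This is a minimal in-memory sketch; the class, function, and source names (`MatrixCache`, `weighted_distance`, "photos", "wall") and the weight values are hypothetical, and a real app would likely need a persistent store shared across server processes.

```python
import time
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
DAY_SECONDS = 24 * 60 * 60


def weighted_distance(distances: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-source distances (weights are hypothetical)."""
    total_weight = sum(weights[source] for source in distances)
    return sum(weights[source] * d for source, d in distances.items()) / total_weight


class MatrixCache:
    """Keeps each user's distance matrix for `ttl` seconds (24 hours by default)."""

    def __init__(self, ttl: float = DAY_SECONDS, clock: Callable[[], float] = time.time):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Matrix]] = {}

    def get_or_compute(self, user_id: str, compute: Callable[[], Matrix]) -> Matrix:
        now = self.clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: skip the expensive queries
        matrix = compute()
        self._entries[user_id] = (now, matrix)
        return matrix


# Photos weighted twice as heavily as wall posts (hypothetical weights).
combined = weighted_distance({"photos": 0.2, "wall": 0.6}, {"photos": 2.0, "wall": 1.0})

# Fake clock so expiry can be shown without waiting a day.
now = [0.0]
calls = []


def build_matrix() -> Matrix:
    calls.append(1)  # stands in for the Facebook queries and distance computation
    return [[0.0, 0.5], [0.5, 0.0]]


cache = MatrixCache(clock=lambda: now[0])
cache.get_or_compute("user-1", build_matrix)
now[0] = 60.0 * 60  # one hour later: served from the cache
cache.get_or_compute("user-1", build_matrix)
now[0] = DAY_SECONDS + 1  # past 24 hours: recomputed
cache.get_or_compute("user-1", build_matrix)
```

Injecting the clock keeps the expiry logic testable; the weighted average keeps the combined distance on the same 0 to 1 scale as the per-source distances when those are themselves in that range.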
Demo
http://apps.facebook.com/mine_cliques/
Questions?