CLIQUE FINDER By Ryan Lange, Thomas Dvornik, Wesley Hamilton, and Bill Hess

Post on 18-Jan-2018

Outline

• Intro
– Problem
– Solution
• Implementation
– Distance Algorithm
– Clustering Algorithm
• Validation
– Test data set
– Real data set
• Demo

Introduction

• How Can We Group Friends?
– How can your friends be grouped logically?
– What are the important factors of people joining cliques?
– Shared interests, high school, family, college, work, etc.
– Differences between Facebook and real life?
• How We Define A Clique
• Desired Results
– High school friends, family, or co-workers will be grouped together as expected.
– Possibly form cliques or groups of people within your friends list that may not have been considered before.

Implementation

• Gather Data
• Distance Algorithm
• Clustering Algorithm
– Input: distance matrix
– Output: two-dimensional array of friends
– Test app
• Output

Distance Algorithm

• Problems
– Facebook limits
– Server limits
– Retrieving and processing over 30,000 photos can take 3 to 6 minutes
• Important information
– What information should be processed?
– Used photo tags and wall counts
• Data collected
– Average of 8,000 photos across all friends

Distance Algorithm (continued)

• Survey of 50 users
– 5 useful pieces of information: personal information, wall posts, photos, groups, and events

Distance Algorithm (continued)

• Facebook results
– One picture with 5 tags = 5 results
• Process results
– Turn into a list of friends with tagged photos
– Find a distance between each pair of friends
– Turn into a distance matrix
• Run time, worst case
– (number of users)^2 * (number of photos)^2
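The first processing step above can be sketched in Python. This is a minimal illustration, not the authors' code: the `(photo_id, user_id)` row format and the function name `photos_by_friend` are assumptions, chosen to match the note that one picture with 5 tags comes back as 5 results.

```python
from collections import defaultdict

def photos_by_friend(tag_rows):
    """Group raw tag rows into {user_id: set of photo_ids}.

    tag_rows is assumed to be an iterable of (photo_id, user_id) pairs,
    one pair per tag, so a photo with 5 tags contributes 5 rows.
    """
    photos = defaultdict(set)
    for photo_id, user_id in tag_rows:
        photos[user_id].add(photo_id)
    return dict(photos)

# Hypothetical sample: photo "p1" has three tags, photo "p2" has two.
rows = [("p1", "ann"), ("p1", "bob"), ("p1", "cat"),
        ("p2", "ann"), ("p2", "bob")]
friends = photos_by_friend(rows)
```

Grouping the rows once up front lets each pair of friends be compared with a set intersection instead of rescanning every tag row for every pair.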

Improved Distance Equation

[Equation image not recovered. Surviving labels: Distance; percentage of tagged photos where users appear together]
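The equation itself did not survive in this transcript, so the following is only a sketch of one distance that fits the description "percentage of tagged photos where users appear together". The formula (1 minus shared photos over photos featuring either user), the 0-to-1 scale, and the names `pair_distance` and `distance_matrix` are assumptions, not the authors' definition.

```python
def pair_distance(photos_a, photos_b):
    """Assumed distance: 1 - (photos with both users / photos with either)."""
    either = photos_a | photos_b
    if not either:
        return 1.0  # no tagged photos at all: treat as maximally distant
    together = photos_a & photos_b
    return 1.0 - len(together) / len(either)

def distance_matrix(photos):
    """Build a symmetric matrix from {user_id: set of photo_ids}."""
    users = sorted(photos)
    n = len(users)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pair_distance(photos[users[i]], photos[users[j]])
            matrix[i][j] = matrix[j][i] = d
    return users, matrix

# Hypothetical data: ann and bob share both photos, cat appears in one.
photos = {"ann": {"p1", "p2"}, "bob": {"p1", "p2"}, "cat": {"p1"}}
users, matrix = distance_matrix(photos)
```

Under this assumed formula, friends who always appear together get distance 0 and friends who never appear together get distance 1.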

Clustering Algorithm

• Hierarchical Clustering
• Average Linkage Clusters
• Generalized to work on any objects with a distance function
• Clustering stops when the closest two clusters are > threshold distance apart
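The bullets above describe agglomerative clustering with average linkage and a distance cutoff. A minimal pure-Python sketch under those stated rules follows; the function names, the 1-D sample points, and the threshold value in the example are illustrative assumptions rather than the authors' implementation.

```python
def average_linkage(cluster_a, cluster_b, dist):
    """Mean distance over all cross-cluster pairs."""
    total = sum(dist(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def cluster(items, dist, threshold):
    """Merge the closest pair of clusters until they are > threshold apart."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # closest clusters are too far apart: stop merging
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 1-D points with absolute difference as the distance.
points = [1, 2, 3, 20, 21, 40]
groups = cluster(points, lambda a, b: abs(a - b), threshold=6)
# groups == [[1, 2, 3], [20, 21], [40]]
```

Because `dist` is passed in as a function, the same routine works on points, friends, or any other objects, and the result is a two-dimensional list with one inner list per clique.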

Point-Based Test Driver

Validation – Sample Data Set

Validation – Sample Data Set

• How we measured correctness
– Thresholds 3-10 gave us the correct number of cliques; however, 5 was placed incorrectly
– Error rate of 10% because 1 of 10 users was misplaced
• Chose the mid-point value of 6 for our threshold
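The 10% figure is simply misplaced users divided by total users. The slides do not spell out the measurement beyond that, so the sketch below is an assumption: it takes an expected and a predicted clique label per user (with labels already matched between the two) and reports the fraction that differ. The user names and labels are hypothetical.

```python
def error_rate(predicted, expected):
    """Fraction of users whose predicted clique differs from the expected one."""
    misplaced = sum(
        1 for user, label in expected.items() if predicted.get(user) != label
    )
    return misplaced / len(expected)

# Hypothetical sample: 10 users in two cliques, one user misplaced.
expected = {f"u{i}": ("A" if i <= 5 else "B") for i in range(1, 11)}
predicted = dict(expected)
predicted["u7"] = "A"  # the single misplaced user
rate = error_rate(predicted, expected)
```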

Validation – Real Data Set

• We chose to use Thomas Dvornik's account
– Moderate amount of data
– His friends could be separated into well-defined cliques
• Threshold on real data
• Threshold gave highest accuracy at 3 and second highest at 6

Validation – Improvements

• After improvements
• Again, based on our accuracy measurement

Improvements/Future Work

• Caching
– The number of queries and computation can get very large
– Store the distance matrix for 24 hours
• Accuracy
– Use all aspects of Facebook
  • Some activity is not even considered
– Using weights for different data sources
  • Not all activity is equally important
– Analysis of produced cliques
  • Survey to see if cliques are accurate
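The caching idea above could look roughly like the following. This is a sketch under assumptions: an in-memory dictionary stands in for whatever storage the app would really use, and the class name, method name, and injectable `clock` parameter are hypothetical. Only the 24-hour lifetime comes from the slide.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as proposed on the slide

class DistanceMatrixCache:
    """In-memory cache keyed by user id; entries expire after the TTL."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # user_id -> (stored_at, matrix)

    def get_or_compute(self, user_id, compute):
        """Return a fresh cached matrix, or call compute(user_id) and store it."""
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        matrix = compute(user_id)
        self._entries[user_id] = (now, matrix)
        return matrix
```

With this shape, the expensive photo queries and distance computation run at most once per user per day, and repeat visits within that window reuse the stored matrix.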

Demo

http://apps.facebook.com/mine_cliques/

Questions?