Generating and Tracking Communities Based on Implicit Affinities

Matthew Smith – smitty@byu.edu

BYU Data Mining Lab

April 2007

Brigham Young University - Data Mining Lab (http://dml.cs.byu.edu)

Introduction

Online Communities
  Continually emerging – many sites are adding this aspect
  Like offline communities, they are complex and dynamic

Examples
  USENET (1980), Google Groups, Wikipedia
  LinkedIn, Flickr, YouTube, MySpace, Facebook, etc.
  Medical Communities (e.g., DailyStrength, NAAF)
  Political Communities
  Blogosphere – focus of experiments

Motivation

Explicit Links

Explicit Social Network (ESN)
  Links: Friends, Web Links, etc.

Motivation

Explicit Links
Implicit Affinities

[Figure labels: smoke, cancer, bald]

ESN and Implicit Affinity Network (IAN)

Applications: Medical, Blogosphere, etc.

Implicit Affinity

Affinity: The overlap of attribute values for any common attribute

Community: Set of individuals
  Characterized by attributes
  Linked by affinities rather than explicit relationships

IAN Community Generation

Individuals – nodes characterized by attributes

Affinities – edges
  Unlike traditional social networks, where links represent explicit relationships, the links in our approach are based strictly on affinities
  Connections emerge naturally

Affinity Scoring

Affinity score for a particular attribute

Affinity score for all attributes
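
The formulas shown on this slide are not preserved in the transcript. As a rough sketch only, the two scores could be computed as below. It assumes that each individual is a mapping from attribute name to a set of values, that the per-attribute score is the Jaccard overlap of the two value sets, and that the overall score is the mean over shared attributes. These are illustrative choices, not necessarily the definitions used in the talk, and the names `attribute_affinity` and `overall_affinity` are hypothetical.

```python
from typing import Dict, Set

# Hypothetical profile layout: attribute name -> set of values.
Profile = Dict[str, Set[str]]


def attribute_affinity(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two value sets (assumed scoring rule); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overall_affinity(p: Profile, q: Profile) -> float:
    """Mean per-attribute affinity over the attributes both profiles define."""
    common = p.keys() & q.keys()
    if not common:
        return 0.0
    return sum(attribute_affinity(p[k], q[k]) for k in common) / len(common)


# Made-up example data.
alice = {"companies": {"Google", "Microsoft", "Apple"}, "keywords": {"bald"}}
bob = {"companies": {"Google", "Apple"}, "keywords": {"bald", "smoke"}}

print(attribute_affinity(alice["companies"], bob["companies"]))
print(overall_affinity(alice, bob))
```

Under these assumptions every score falls between 0 and 1, which is consistent with the score thresholds (0.5 and 1.0) mentioned on the later threshold slides.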

Affinity Network Building

IAN
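
The network-building step can be sketched as follows: every individual is a node, and an edge is added for each pair whose affinity score is nonzero and meets a minimum threshold. This is only an illustration. The single-attribute layout, the Jaccard scoring rule, and the names `jaccard`, `build_ian`, and `min_score` are assumptions, not the talk's implementation.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed affinity score: overlap of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_ian(features: Dict[str, Set[str]],
              min_score: float = 0.0) -> Dict[Tuple[str, str], float]:
    """Return weighted edges {(u, v): score} for pairs with score > 0 and >= min_score."""
    edges: Dict[Tuple[str, str], float] = {}
    for u, v in combinations(sorted(features), 2):
        score = jaccard(features[u], features[v])
        if score > 0 and score >= min_score:
            edges[(u, v)] = score
    return edges


# Made-up example: three bloggers and the companies they mention.
bloggers = {
    "ann": {"Google", "Apple"},
    "ben": {"Google"},
    "cy": {"Sun"},
}

print(build_ian(bloggers))                 # all nonzero affinities
print(build_ian(bloggers, min_score=0.6))  # stricter filter
```

Comparing all pairs is quadratic in the number of individuals, which is acceptable for a sketch but would need indexing by shared feature for larger communities.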

Social Capital for Community Tracking

Social Capital: The advantage available through connections between individuals within a particular network

Bonding and Bridging Metrics

Preliminary Experiments & Observations

Scobleizer’s Blog List

Robert Scoble (“Scobleizer”)
  Blogger and book author
  Technical evangelist (formerly with Microsoft)

Data Set Details:
  Scobleizer’s reading list at Bloglines.com
  570 blogs
  2380 bloggers

Data Set Statistics – Blog posts per day

We observe fewer posts during the weekend (Friday & Saturday)

Lack of data for all bloggers during first few days

Single Attribute: Companies

Motivation
  Many bloggers talk about various companies and what they are doing

Methodology
  Whenever a company is mentioned in a blogger’s post, it becomes a feature of the blogger
  Static company list used as attributes (1,914 company names)
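
A minimal sketch of this methodology, assuming posts arrive as (blogger, text) pairs and that a "mention" is a case-insensitive whole-word match against the static list. The matching rule and the name `extract_features` are assumptions; the talk does not say how mentions were detected.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def extract_features(posts: Iterable[Tuple[str, str]],
                     companies: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each blogger to the set of listed companies mentioned in their posts."""
    # Lookarounds instead of \b so names ending in punctuation still match.
    patterns = {
        name: re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        for name in companies
    }
    features: Dict[str, Set[str]] = defaultdict(set)
    for blogger, text in posts:
        for name, pattern in patterns.items():
            if pattern.search(text):
                features[blogger].add(name)
    return dict(features)


# Made-up example data; the real list had 1,914 company names.
companies = ["Microsoft", "Google", "Sun"]
posts = [
    ("scoble", "Microsoft and Google both shipped updates."),
    ("scoble", "More on google maps."),
    ("dana", "Nothing about companies here."),
]

features = extract_features(posts, companies)
print(features)
```

Plain string matching will confuse company names with ordinary words (for example "Sun"), so a real pipeline would likely need some disambiguation.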

Cyclic Feature Usage

Power-law Behavior – Features

Observations
  Few companies mentioned by many
  Many companies mentioned by few
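
One way to check for this pattern is to count how many bloggers mention each company and rank the companies by that count. The sketch below assumes features are stored as a mapping from blogger to a set of company names; `mention_counts` is a hypothetical helper and the data are made up.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def mention_counts(features: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Return (company, number of bloggers mentioning it), most mentioned first."""
    counts = Counter(
        company for companies in features.values() for company in companies
    )
    return counts.most_common()


features = {
    "a": {"Google", "Apple"},
    "b": {"Google"},
    "c": {"Google", "Sun"},
}

ranked = mention_counts(features)
for rank, (company, n) in enumerate(ranked, start=1):
    print(rank, company, n)
```

Plotting count against rank on log-log axes gives a roughly straight line when the distribution follows a power law.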

Blog Community Evolution

Observations
  Weekend bonding?
  Bridging indicates
    newly used features
    new bloggers
  Overall bonding (expected)
    static set of features
    no decay
    blogosphere is full of buzz

Blog-based IAN – Feb. 24

niche sub-communities exist

Conclusions

Blog posts were cyclic within this community
  Posted more during the week and less during the weekends
  Interestingly, bonding occurs during the weekends

Companies were mentioned in a power-law way
  Few companies are mentioned often
  Most companies are mentioned rarely

Niche sub-communities
  Bloggers focusing on long-tail companies were identified

Blog-based IAN
  Appears to follow power-law connectivity like ESNs

Future Work (In Progress)

Compare IAN and ESN of the same community
  Analyze evolution (social capital vs. density)
  Compare snapshots
  Identify and report similarities and differences
  Develop hybrid sub-community identification

Experiment on domain-specific communities
  Medical – patient communities
  Political – jump-start grass-roots campaigns

More Future Work

Refine implicit attribute extraction
  Allow for dynamic feature extraction
  Allow features to naturally decay with time
  Use LDA to extract “concepts”

Putnam’s puzzle
  Consider adapting Social Capital measures to allow for uncorrelated bonding and bridging

Questions?

Affinity Score Distribution

Blog-based IANs – Filtered by Threshold

Affinity Scores ≥ 0.5

Affinity Score of 1.0

Blog-based IAN – Filtered by Thresholds

Affinity Thresholds
  Score ≥ 0.5
  Count ≥ 3

2/15 – 3/15