Collating Social Network Profiles

Upload: jonah-bates

Post on 01-Jan-2016


TRANSCRIPT

Page 1: Collating Social Network Profiles. Objective 2 System

Collating Social Network Profiles

Page 2

Objective

<Company Name> → System → <Twitter Profile, Facebook Profile, G+ Profile, …>

Page 3

Objective

Input: Company Name → System → Output: Social Network Profiles <Twitter Profile, Facebook Profile, G+ Profile, …>

Page 4

Record Linkage + Identity

Page 5

Agenda

Introduction
    Objective
    Contrast to Existing Work

Work Done
    Baseline System
    Individual Network Approach
    Machine Learning Experiments

Next Steps, Q&A

Page 6

Baseline System

Page 7

Ground Truth

Two networks: Facebook and Twitter

Top seventy 2013 Fortune 500 companies

Page 8

Baseline Algorithm

1. Take company name.

2. Search Facebook/Twitter API using it.

3. Return first result from each.
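The three steps above can be sketched as follows. This is a minimal illustration, not the presenters' code: the slides do not show the API calls that were used, so `make_search`, `SEARCHERS`, and the `FAKE_INDEX` data are hypothetical stand-ins that let the example run offline.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory stand-in for the real Facebook/Twitter search APIs.
# Each network maps a lowercased query to a ranked list of profile ids.
FAKE_INDEX: Dict[str, Dict[str, List[str]]] = {
    "facebook": {"acme": ["fb:acme", "fb:acme-fans"]},
    "twitter": {"acme": ["tw:acme_fans", "tw:acme"]},
}


def make_search(network: str) -> Callable[[str], List[str]]:
    """Build a search function for one network (assumed interface)."""
    def search(query: str) -> List[str]:
        return FAKE_INDEX[network].get(query.lower(), [])
    return search


SEARCHERS = {name: make_search(name) for name in ("facebook", "twitter")}


def baseline_collate(
    company_name: str,
    searchers: Dict[str, Callable[[str], List[str]]] = SEARCHERS,
) -> Dict[str, Optional[str]]:
    """Steps 1 to 3: take the name, search each network, keep the first hit."""
    profiles: Dict[str, Optional[str]] = {}
    for network, search in searchers.items():
        results = search(company_name)                        # step 2
        profiles[network] = results[0] if results else None   # step 3
    return profiles
```

In the toy index the first Twitter hit is deliberately a fan account, which shows how trusting the first search result can return the wrong profile.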

Page 9

Baseline Performance

[Bar chart. Y-axis: Correct Matches (0 to 70). Facebook: 34, Twitter: 52, Both: 30]

Page 10

Individual Network Approach

Page 11

New Approach

Score profiles based on:

Edit Distance
    Company Name – Username
    Company Name – Display Name

Relative Popularity

Page 12

Display Name

Username


Page 14

Scoring

Edit Distance Score:

Popularity Score:
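The formulas for these two scores did not survive in the transcript. The sketch below shows one plausible way to compute them and should be read purely as an assumption: the edit-distance score is taken as one minus the Levenshtein distance divided by the longer string's length, and the popularity score as a candidate's follower count divided by the largest follower count among the candidates. The function names, the `followers` field, and the equal weighting of the two scores are all hypothetical.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_distance_score(company_name: str, candidate: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for no overlap."""
    a, b = company_name.lower(), candidate.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def popularity_score(followers: int, max_followers: int) -> float:
    """Assumed definition: popularity relative to the most popular candidate."""
    return followers / max_followers if max_followers else 0.0


def best_profile(company_name: str, candidates: List[Dict]) -> Dict:
    """Pick the candidate with the highest combined score (equal weights assumed)."""
    max_followers = max(c["followers"] for c in candidates)

    def combined(c: Dict) -> float:
        return (edit_distance_score(company_name, c["username"])
                + popularity_score(c["followers"], max_followers))

    return max(candidates, key=combined)
```

`best_profile` expects a non-empty candidate list; swapping `c["username"]` for a display-name field would give the Company Name – Display Name variant.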

Page 15

Best Performing Combination

[Bar chart. Y-axis: Correct Matches (0 to 70)]

Network     Baseline    Username Edit Distance + Popularity
Facebook    34          40
Twitter     52          50
Both        30          34

Page 16

Machine Learning Experiments

Page 17

Freebase Ground Truth

1,422 with a social media presence

917 with Facebook, 687 with Twitter

598 with both

553 with valid profiles

Page 18

Training Set

553 Correct

553 Incorrect

1,106 Total
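The slides do not say how the 553 incorrect examples were produced. One simple possibility, shown here only as an assumption, is to pair each company's Facebook profile with a different company's Twitter profile, which yields a balanced set. `build_training_set` and its input format are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (facebook_profile, twitter_profile)


def build_training_set(ground_truth: List[Pair], seed: int = 0) -> List[Tuple[Pair, int]]:
    """Return labeled pairs: 1 for a correct match, 0 for a mismatched pair.

    Needs at least two ground-truth pairs so a mismatch can be drawn.
    """
    rng = random.Random(seed)
    n = len(ground_truth)
    examples = [(pair, 1) for pair in ground_truth]
    for i, (facebook_profile, _) in enumerate(ground_truth):
        # Draw an index from the n - 1 other companies, skipping i itself.
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1
        examples.append(((facebook_profile, ground_truth[j][1]), 0))
    return examples
```

Random mismatches of this kind tend to be easy to tell apart from correct pairs, which fits the later note about providing harder examples.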

Page 19

Cross Validation Results

Classifier                 Test | Train    Train | Test
Linear Regression          0.734           0.707
Gaussian Naïve Bayes       0.972           0.956
Multinomial Naïve Bayes    0.511           0.506
Bernoulli Naïve Bayes      0.720           0.701
Decision Tree              0.954           0.935
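For readers unfamiliar with the procedure, here is a minimal pure-Python sketch of k-fold cross validation with a Gaussian Naïve Bayes classifier. The slides do not state which toolkit, fold count, or features were used, so the five folds, the two-feature synthetic data, and all function names below are assumptions for illustration only.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate each class's prior plus per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # Small constant keeps the variance strictly positive.
        variances = [sum((x - m) ** 2 for x in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (len(members) / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances))
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


def cross_validate(rows, labels, folds=5, seed=0):
    """Mean accuracy over `folds` held-out splits of shuffled data."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    accuracies = []
    for k in range(folds):
        test_idx = set(indices[k::folds])
        train_idx = [i for i in indices if i not in test_idx]
        model = fit_gaussian_nb([rows[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        correct = sum(predict_gaussian_nb(model, rows[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds


# Synthetic stand-in for (edit-distance score, popularity score) features.
rng = random.Random(1)
rows = ([[rng.gauss(0.9, 0.05), rng.gauss(0.8, 0.1)] for _ in range(50)]
        + [[rng.gauss(0.3, 0.05), rng.gauss(0.2, 0.1)] for _ in range(50)])
labels = [1] * 50 + [0] * 50
mean_accuracy = cross_validate(rows, labels)
```

The synthetic classes are well separated, so the accuracy here says nothing about the figures in the table; the point is only the train/hold-out mechanics.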

Pages 20 to 22

Next Steps

Improve training set: provide harder examples

Incorporate more profile data

Build system around classifiers
