classification of commercial and personal profiles on my space

16
Classification of Commercial and Personal Profiles on MySpace Pervasive Computing and Communications Workshops (PERCOM Workshops), 2011 IEEE International Conference on 1 Advisor : Yin-Fu Huang Student : Chen-Ju Lai

Upload: es712

Post on 24-Apr-2015

400 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Classification of commercial and personal profiles on my space

1

Classification of Commercialand Personal Profiles on MySpace

Pervasive Computing and Communications Workshops (PERCOM Workshops), 2011 IEEE International Conference on

Advisor : Yin-Fu Huang

Student : Chen-Ju Lai

Page 2: Classification of commercial and personal profiles on my space

2

Outline Introduction Related Work Data analysis Classifier Practical application Summary

Page 3: Classification of commercial and personal profiles on my space

3

Introduction This research focuses on the protection of privacy

and anonymity of individuals using online social networks.

This paper will consider the specific attributes of commercial sites as compared to personal sites. Ex: Membership Patterns, Age Patterns, Gender Patterns.

A C4.5 decision tree whose accuracy varies from 92.25% to 96.4% depending on the attributes used.

The algorithm is then applied to a Privacy-Preserving Data Publishing (PPDP) service to provide anonymity for online social network users.

Page 4: Classification of commercial and personal profiles on my space

4

Related Work Privacy-Preserving Data Publishing: A Survey of

Recent Developments[9] A large-scale study of MySpace:Observations and

implications for online social networks[3] User interactions in social networks and their

implications[2]

Page 5: Classification of commercial and personal profiles on my space

5

Data analysis We visually inspected approximately 5,000

randomly selected profiles to classify these profiles as commercial or personal.

Gender Distributions Almost Gender neutral profile are commercial profiles

Member Profile Patterns Most of the commercial profiles are located at age 0

Page 6: Classification of commercial and personal profiles on my space

6

Data analysis(cont.) Publishers versus Friends Profile Owner and Publisher Age Difference

Commercial Profile : age difference is within 15 years of age

Personal Profile : age difference is within 5 years of age

Neutral versus Male and Female

1.The number of friends and publishersNeutral > Male and Female

2. The frequency of publishersMale and Female > Neutral

Page 7: Classification of commercial and personal profiles on my space

7

Classifier Three major considerations were taken into

account during analysis They are the ease in collecting the attributes The actual classification technique used The ease of implementation

Several techniques are well known for data classification C4.5 decision tree Bayesian classifiers Neural Networks Support Vector Machines

Page 8: Classification of commercial and personal profiles on my space

8

Classifier – Classification Criteria Static attributes

Such as gender, account, age, friends and blogs Deep inspection

Publisher age distribution and publisher versus friends ratio

3 data-sets Gender, age and friends Gender, age, friends ,account and blog counts (static

attribute) All attribute ,both static and deep inspection

Using WEKA

Page 9: Classification of commercial and personal profiles on my space

9

Classifier – J48 Classifier J48 is a recursive algorithm for generating C4.5

pruned or unpruned decision trees. Decision trees are created within the J48

algorithm by using information entropy on a set of training data.

Data attributes are organized into subsets and the normalized information gain, measured by the difference in entropy, is used to measure these subsets to identify the optimum attributes used as nodes in the decision tree.

non-leaf node

branches

Leaf node

Page 10: Classification of commercial and personal profiles on my space

10

Classifier – J48 Classifier Two data sets were derived from the original

614,970 public profiles collected by randomly creating a list of “analysis” and “holdout” sets.

For these sets, each profile was manually inspected and classified to be either commercial or personal. Private、 Removed、 Reused、 Blocked、 Undecided

Page 11: Classification of commercial and personal profiles on my space

11

Classifier – J48 Classifier The result of the classification of 6,366 profiles

resulted in 5,153 profiles available for analysis

Dataset Commercial

Personal Total

Analysis 667 3238 3905

Holdout 121 1127 1248

Page 12: Classification of commercial and personal profiles on my space

12

Classifier – J48 Classifier The entropy i (N), information empurity, for each

was calculated using the formula below [7], [8], where P (wi) is the fraction of patterns at node N that are in the category wi.

age : 0.583 gender : 0.578 the number of friends : 0.531

Page 13: Classification of commercial and personal profiles on my space

13

Practical application To circumvent the loss of privacy and maintain

anonymity, the classifier is used to ascertain when commercial profiles are the target of a user publishing activity.

Redirects the publication to an anonymous privacy avatar, which post the desired content on behalf of the personal profile.

Page 14: Classification of commercial and personal profiles on my space

14

Practical application - Overview The main components of the system are the

online social network server, avatar service and client browser.

The avatar solution includes a transparent proxy, classifier and avatar engine, rendering engine, authentication, security filters.

Page 15: Classification of commercial and personal profiles on my space

15

Practical application(cont.) Avatar Masquerading and Impersonation

Avatar proxy Classifier Database Security violation Filters policy Re-distributes

Avatar Publishing Operations C# Web Browser control object Request object HTML Document DOM object JASON object Render engine

Commercial

Client

Publishing request

Response

Avatar

Page 16: Classification of commercial and personal profiles on my space

16

Summary The classification of personal and commercial

profiles is important. The classifier is a decision tree used to identify a

profile as being either commercial or personal. The decision tree uses profile attributes include

age, gender and publishing relationships. The result of the classifier yields a binary tree with

a degree of accuracy of 92.25% to 96.42. To circumvent the loss of privacy, an avatar

solution is presented.