Classification of Commercial and Personal Profiles on MySpace

Post on 24-Apr-2015


1

Classification of Commercial and Personal Profiles on MySpace

Pervasive Computing and Communications Workshops (PERCOM Workshops), 2011 IEEE International Conference on

Advisor : Yin-Fu Huang

Student : Chen-Ju Lai

2

Outline: Introduction, Related Work, Data Analysis, Classifier, Practical Application, Summary

3

Introduction

This research focuses on protecting the privacy and anonymity of individuals who use online social networks.

The paper compares the attributes of commercial profiles with those of personal profiles, for example membership patterns, age patterns, and gender patterns.

The classifier is a C4.5 decision tree whose accuracy varies from 92.25% to 96.42%, depending on the attributes used.

The algorithm is then applied to a Privacy-Preserving Data Publishing (PPDP) service to provide anonymity for online social network users.

4

Related Work

Privacy-Preserving Data Publishing: A Survey of Recent Developments [9]

A Large-Scale Study of MySpace: Observations and Implications for Online Social Networks [3]

User Interactions in Social Networks and Their Implications [2]

5

Data analysis

We visually inspected approximately 5,000 randomly selected profiles and classified each one as commercial or personal.

Gender distributions: almost all gender-neutral profiles are commercial profiles.

Member profile patterns: most commercial profiles list an age of 0.

6

Data analysis (cont.)

Publishers versus Friends

Profile owner and publisher age difference:

Commercial profile: the age difference is within 15 years.

Personal profile: the age difference is within 5 years.

Neutral versus Male and Female

1. Number of friends and publishers: Neutral > Male and Female

2. Frequency of publishers: Male and Female > Neutral
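The age-difference and publisher-versus-friends observations can be turned into numeric features. The following is a minimal Python sketch under stated assumptions: the function names, the inputs (owner age, list of publisher ages, friend count), and the use of a fixed age window as the feature are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def share_within_age_window(owner_age: int, publisher_ages: Sequence[int],
                            window: int = 5) -> float:
    """Fraction of publishers whose age is within `window` years of the owner's age."""
    if not publisher_ages:
        return 0.0
    close = sum(1 for age in publisher_ages if abs(age - owner_age) <= window)
    return close / len(publisher_ages)


def publisher_friend_ratio(publisher_count: int, friend_count: int) -> float:
    """Publishers divided by friends; 0.0 when the profile has no friends."""
    return publisher_count / friend_count if friend_count else 0.0


# Hypothetical profile: owner aged 25, four publishers, 40 friends.
ages = [23, 27, 31, 40]
print(share_within_age_window(25, ages))       # share within 5 years
print(share_within_age_window(25, ages, 15))   # share within 15 years
print(publisher_friend_ratio(len(ages), 40))
```

A profile whose publishers mostly fall inside the 5-year window would look more like the personal pattern described above, while one that only fills up at 15 years would look more like the commercial pattern.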

7

Classifier

Three major considerations were taken into account during analysis:

1. The ease of collecting the attributes

2. The actual classification technique used

3. The ease of implementation

Several techniques are well known for data classification: C4.5 decision trees, Bayesian classifiers, neural networks, and support vector machines.

8

Classifier – Classification Criteria

Static attributes: gender, account, age, friends, and blogs.

Deep inspection: publisher age distribution and publisher-versus-friends ratio.

Three data sets:

1. Gender, age, and friends

2. Gender, age, friends, account, and blog counts (static attributes)

3. All attributes, both static and deep inspection

The classification was performed using WEKA.
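One way to picture the three attribute sets is as three projections of the same profile record. The sketch below is illustrative only: the `Profile` class, its field names and types (in particular `account` as a string and the two deep-inspection fields), and the set names `basic`, `static`, and `all` are assumptions, not the paper's schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class Profile:
    # Static attributes, readable from the profile itself
    gender: str
    age: int
    friends: int
    account: str   # hypothetical representation of the "account" attribute
    blogs: int
    # Deep-inspection attributes (hypothetical names)
    publisher_age_spread: float
    publisher_friend_ratio: float


# The three attribute sets, expressed as lists of field names.
ATTRIBUTE_SETS = {
    "basic": ["gender", "age", "friends"],
    "static": ["gender", "age", "friends", "account", "blogs"],
    "all": ["gender", "age", "friends", "account", "blogs",
            "publisher_age_spread", "publisher_friend_ratio"],
}


def project(profile: Profile, set_name: str) -> dict:
    """Return only the attributes that belong to the named attribute set."""
    row = asdict(profile)
    return {name: row[name] for name in ATTRIBUTE_SETS[set_name]}


example = Profile("neutral", 0, 1200, "band", 3, 12.5, 0.4)
print(project(example, "basic"))
```

Keeping the sets as plain name lists makes it easy to train and compare one classifier per set on identical records.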

9

Classifier – J48 Classifier

J48 is a recursive algorithm for generating pruned or unpruned C4.5 decision trees.

The J48 algorithm builds a decision tree by applying information entropy to a set of training data.

The data are split into subsets by attribute, and the normalized information gain (the reduction in entropy produced by a split) is used to compare these subsets and select the best attribute for each node of the decision tree.

[Figure: decision tree with a non-leaf node, branches, and a leaf node labeled]
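The attribute-selection step can be sketched in a few lines of Python. This is a simplified illustration of normalized information gain (gain ratio) for nominal attributes only; it is not WEKA's J48 code, it omits pruning and the threshold search C4.5 uses for numeric attributes, and the toy data are invented.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on `values`, divided by the split entropy."""
    total = len(labels)
    groups: dict = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Invented toy data: six profiles with a gender attribute and a class label.
gender = ["neutral", "neutral", "male", "female", "male", "female"]
label = ["commercial", "commercial", "personal", "personal", "personal", "personal"]
print(round(gain_ratio(gender, label), 4))
```

At each node the tree builder would compute this score for every candidate attribute and split on the one with the highest value.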

10

Classifier – J48 Classifier

Two data sets, "analysis" and "holdout", were derived by random selection from the original 614,970 public profiles collected.

For these sets, each profile was manually inspected and classified as either commercial or personal.

Other labels: Private, Removed, Reused, Blocked, Undecided

11

Classifier – J48 Classifier

Manual classification of 6,366 profiles yielded 5,153 profiles available for analysis.

Dataset    Commercial  Personal  Total
Analysis   667         3238      3905
Holdout    121         1127      1248
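The table can be checked with a few lines of arithmetic, which also gives the share of personal profiles in each set. The sketch below uses only the counts from the table; the variable names are illustrative.

```python
# Class counts copied from the table above.
counts = {
    "Analysis": {"commercial": 667, "personal": 3238},
    "Holdout": {"commercial": 121, "personal": 1127},
}

totals = {name: sum(row.values()) for name, row in counts.items()}
personal_share = {name: row["personal"] / totals[name] for name, row in counts.items()}

for name in counts:
    print(f"{name}: total={totals[name]}, personal share={personal_share[name]:.2%}")
print("All profiles:", sum(totals.values()))
```

The two totals add up to the 5,153 usable profiles. Personal profiles make up roughly 83% of the analysis set and 90% of the holdout set, so a rule that always answers "personal" would already score that well; this is a useful baseline to keep in mind when reading the classifier's reported accuracy.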

12

Classifier – J48 Classifier

The entropy $i(N)$, the information impurity, was calculated for each attribute using the formula below [7], [8], where $P(\omega_i)$ is the fraction of patterns at node $N$ that are in the category $\omega_i$.

$$ i(N) = -\sum_{i} P(\omega_i) \log_2 P(\omega_i) $$

Age: 0.583

Gender: 0.578

Number of friends: 0.531
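As a worked illustration of entropy impurity, the sketch below assumes the standard base-2 definition and applies it to the class counts of the analysis set (667 commercial, 3,238 personal). It does not reproduce the per-attribute values listed above, because the per-attribute splits are not given here; the function name is hypothetical.

```python
import math


def entropy_impurity(class_counts):
    """i(N) = -sum_i P(w_i) * log2 P(w_i), skipping empty classes."""
    total = sum(class_counts)
    return -sum((n / total) * math.log2(n / total) for n in class_counts if n > 0)


# Unsplit analysis set: 667 commercial and 3,238 personal profiles.
print(round(entropy_impurity([667, 3238]), 3))

# An evenly mixed two-class node has the maximum impurity of 1 bit.
print(entropy_impurity([50, 50]))
```

A node that holds only one class has an impurity of 0, so lower values indicate purer nodes.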

13

Practical application

To circumvent the loss of privacy and maintain anonymity, the classifier is used to ascertain when a commercial profile is the target of a user's publishing activity.

The service then redirects the publication to an anonymous privacy avatar, which posts the desired content on behalf of the personal profile.
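The redirect decision can be sketched as a small routing function. Everything in this Python example is hypothetical: the function names, the dictionary-based profile, and the stand-in classifier rule are placeholders for illustration and do not describe the actual avatar service.

```python
from typing import Callable


def route_publication(target_profile: dict, content: str,
                      classify: Callable[[dict], str]) -> dict:
    """Decide who posts `content`: the user directly, or the anonymous avatar."""
    label = classify(target_profile)
    poster = "avatar" if label == "commercial" else "user"
    return {"poster": poster, "target": target_profile["id"], "content": content}


def toy_classifier(profile: dict) -> str:
    """Stand-in for the decision tree; this single rule is purely illustrative."""
    return "commercial" if profile.get("gender") == "neutral" else "personal"


print(route_publication({"id": "band-page", "gender": "neutral"}, "Great show!", toy_classifier))
print(route_publication({"id": "friend-42", "gender": "female"}, "Hi!", toy_classifier))
```

Passing the classifier in as a function keeps the routing logic independent of how the commercial-or-personal decision is made.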

14

Practical application - Overview

The main components of the system are the online social network server, the avatar service, and the client browser.

The avatar solution includes a transparent proxy, a classifier and avatar engine, a rendering engine, authentication, and security filters.

15

Practical application (cont.)

Avatar Masquerading and Impersonation: avatar proxy, classifier, database, security violation, filters policy, re-distributes

Avatar Publishing Operations: C# Web Browser control object, Request object, HTML Document DOM object, JSON object, render engine

[Figure: diagram with the labels Client, Avatar, Commercial, Publishing request, and Response]

16

Summary

The classification of personal and commercial profiles is important.

The classifier is a decision tree used to identify a profile as being either commercial or personal.

The decision tree uses profile attributes including age, gender, and publishing relationships.

The classifier yields a binary tree with an accuracy of 92.25% to 96.42%.

To circumvent the loss of privacy, an avatar solution is presented.
