classification of commercial and personal profiles on my space
DESCRIPTION
TRANSCRIPT
1
Classification of Commercialand Personal Profiles on MySpace
Pervasive Computing and Communications Workshops (PERCOM Workshops), 2011 IEEE International Conference on
Advisor : Yin-Fu Huang
Student : Chen-Ju Lai
2
Outline Introduction Related Work Data analysis Classifier Practical application Summary
3
Introduction This research focuses on the protection of privacy
and anonymity of individuals using online social networks.
This paper will consider the specific attributes of commercial sites as compared to personal sites. Ex: Membership Patterns, Age Patterns, Gender Patterns.
A C4.5 decision tree whose accuracy varies from 92.25% to 96.4% depending on the attributes used.
The algorithm is then applied to a Privacy-Preserving Data Publishing (PPDP) service to provide anonymity for online social network users.
4
Related Work Privacy-Preserving Data Publishing: A Survey of
Recent Developments[9] A large-scale study of MySpace:Observations and
implications for online social networks[3] User interactions in social networks and their
implications[2]
5
Data analysis We visually inspected approximately 5,000
randomly selected profiles to classify these profiles as commercial or personal.
Gender Distributions Almost Gender neutral profile are commercial profiles
Member Profile Patterns Most of the commercial profiles are located at age 0
6
Data analysis(cont.) Publishers versus Friends Profile Owner and Publisher Age Difference
Commercial Profile : age difference is within 15 years of age
Personal Profile : age difference is within 5 years of age
Neutral versus Male and Female
1.The number of friends and publishersNeutral > Male and Female
2. The frequency of publishersMale and Female > Neutral
7
Classifier Three major considerations were taken into
account during analysis They are the ease in collecting the attributes The actual classification technique used The ease of implementation
Several techniques are well known for data classification C4.5 decision tree Bayesian classifiers Neural Networks Support Vector Machines
8
Classifier – Classification Criteria Static attributes
Such as gender, account, age, friends and blogs Deep inspection
Publisher age distribution and publisher versus friends ratio
3 data-sets Gender, age and friends Gender, age, friends ,account and blog counts (static
attribute) All attribute ,both static and deep inspection
Using WEKA
9
Classifier – J48 Classifier J48 is a recursive algorithm for generating C4.5
pruned or unpruned decision trees. Decision trees are created within the J48
algorithm by using information entropy on a set of training data.
Data attributes are organized into subsets and the normalized information gain, measured by the difference in entropy, is used to measure these subsets to identify the optimum attributes used as nodes in the decision tree.
non-leaf node
branches
Leaf node
10
Classifier – J48 Classifier Two data sets were derived from the original
614,970 public profiles collected by randomly creating a list of “analysis” and “holdout” sets.
For these sets, each profile was manually inspected and classified to be either commercial or personal. Private、 Removed、 Reused、 Blocked、 Undecided
11
Classifier – J48 Classifier The result of the classification of 6,366 profiles
resulted in 5,153 profiles available for analysis
Dataset Commercial
Personal Total
Analysis 667 3238 3905
Holdout 121 1127 1248
12
Classifier – J48 Classifier The entropy i (N), information empurity, for each
was calculated using the formula below [7], [8], where P (wi) is the fraction of patterns at node N that are in the category wi.
age : 0.583 gender : 0.578 the number of friends : 0.531
13
Practical application To circumvent the loss of privacy and maintain
anonymity, the classifier is used to ascertain when commercial profiles are the target of a user publishing activity.
Redirects the publication to an anonymous privacy avatar, which post the desired content on behalf of the personal profile.
14
Practical application - Overview The main components of the system are the
online social network server, avatar service and client browser.
The avatar solution includes a transparent proxy, classifier and avatar engine, rendering engine, authentication, security filters.
15
Practical application(cont.) Avatar Masquerading and Impersonation
Avatar proxy Classifier Database Security violation Filters policy Re-distributes
Avatar Publishing Operations C# Web Browser control object Request object HTML Document DOM object JASON object Render engine
Commercial
Client
Publishing request
Response
Avatar
16
Summary The classification of personal and commercial
profiles is important. The classifier is a decision tree used to identify a
profile as being either commercial or personal. The decision tree uses profile attributes include
age, gender and publishing relationships. The result of the classifier yields a binary tree with
a degree of accuracy of 92.25% to 96.42. To circumvent the loss of privacy, an avatar
solution is presented.