itddm04 fr

A SYSTEM TO FILTER UNWANTED MESSAGES

FROM OSN USER WALLS

INTRODUCTION:

Online Social Networks (OSNs) are today one of the most popular interactive medium to

communicate, share, and disseminate a considerable amount of human life information. Daily

and continuous communications imply the exchange of several types of content, including free

text, image, audio, and video data. In the sharing In OSNs, information filtering can also be used

for a different, more sensitive, purpose. This is due to the fact that in OSNs there is the

possibility of posting or commenting other posts on particular public/private areas, called in

general walls. Information filtering can therefore be used to give users the ability to

automatically control the messages written on their own walls, by filtering out unwanted

messages.

SCOPE OF THE PROJECT:

The OSN have three layer there are graphical user interface, social network application and

social network managers. The social network managers handle the basic functionalities like

profile management, network based function etc. But in this project focused on other two layers

and apply some new condition. Application layer have short text classifier and content based

message filtering. Short text classifier classifying the messages based on the content. Content

based message filter have black list and filtering policies. First, find relationship between the

user and message senders and it will filter and calculate the probabilities using classifier. And the

send a empty message below the probabilities result to the user. So our proposed system will

give the direct control to the user that what kind of messages displays on their wall.

LITERATURE SURVEY:

Title: BoosTexter: A Boosting-based System for Text Categorization

Author: Robert E Schapire, Yoram Singer

Year: 2000

Description: We use adopt a different approach in which we use two extensions of AdaBoost that were

specifically intended for multiclass, multi-label data. In the first extension, the goal of the learning algorithm is to

predict all and only all of the correct labels. Thus, the learned classifier is evaluated in terms of its ability to predict a

good approximation of the set of labels associated with a given document. In the second extension, the goal is to

design a classifier that ranks the labels so that the correct labels will receive the highest ranks. We next describe

BoosTexter, a system which embodies four versions of boosting based on these extensions, and we discuss the

implementation issues that arise in multilabel text categorization.

Title:A Comparison of Classifiers and Document Representations for the Routing Problem

Author: H. Schutze, D.A. Hull,J.O. Pedersen

YEAR:1995

Description: We compare two approaches to document routing, relevance feedback via query

expansion and statistical classification with error minimization. We show that advanced

classification algorithms perform 10-15% better than relevance feedback on the Tipster

document collection. Since learning algorithms based on error minimization and numerical

optimization are computationally intensive and prone to over fitting in a high dimensional

feature space, it is necessary to apply some method of dimensionality reduction. We examine

two different approaches, latent semantic indexing and feature selection of terms using a x2 -test

of non independence.

Title: Content-Based Book Recommending Using Learning for Text Categorization

Author: Raymond J. Mooney, Loriene Roy

Year: 1999

Description: Recommender systems improve access to relevant products and information by

making personalized suggestions based on previous examples of a user's likes and dislikes. Most

existing recommender systems use social filtering methods that base recommendations on other

users' preferences. By contrast, content-based methods use information about an item itself to

make suggestions. This approach has the advantage of being able to recommended previously

unrated items to users with unique interests and to provide explanations for its recommendations.

We describe a content-based book recommending system that utilizes information extractionand

a machine-learning algorithm for text categorization. Initial experimental results demonstrate

that this approach can produce accurate recommendations. These experiments are based on

ratings from random samplings of items and we discuss problems with previous experiments that

employ skewed samples of user-selected examples to evaluate performance.

Title: Content-based Filtering in On-line Social Networks

Author: M. Vanetti, E. Binaghi, B. Carminati, M. Carullo and E. Ferrari

Year: 2010

Description: This work is the first step of a wider project. The early encouraging results we have

obtained on the classification procedure prompt us to continue with other work that will aim to

improve the quality of classification. Additionally, we plan to enhance our filtering rule system,

with a more sophisticated approach to manage those messages caught just for the tolerance and

to decide when a user should be inserted into a BL. For instance, the system can automatically

take a decision about the messages blocked because of the tolerance, on the basis of some

statistical data (e.g., number of blocked messages from the same author, number of times the

creator has been inserted in the BL) as well as data on creator profile (e.g., relationships with the

wall owner, age, sex). Further, we plan to test the robustness of our system against different

adversary models. The development of a GUI to make easier BL and filtering rule specification

is also a direction we plan to investigate.

Title: Inductive Learning Algorithms and Representations for Text Categorization

Author: Susan Dumais, John Platt, David Heckerman, Mehran Sahami

Year: 1998

Description: Here, we describe results from experiments using a collection of hand-tagged

financial newswire stories from Reuters. We use supervised learning methods to build our

classifiers, and evaluate the resulting models on new test cases. The focus of our work has been

on comparing the effectiveness of different inductive learning algorithms (Find Similar, Naïve

Bayes, Bayesian Networks, Decision Trees, and Support Vector Machines) in terms of learning

speed, real-time classification speed, and classification accuracy. We also explored alternative

document representations (words vs. syntactic phrases, and binary vs. non-binary features), and

training set size.

Title: Learning and Revising User Profiles: The Identification of Interesting Web SitesAuthor: M.J. Pazzani and D. Billsus

Year: 1997

Description: We discuss algorithms for learning and revising user profiles that can determine

which World Wide Web sites on a given topic would be interesting to a user. We describe the

use of a naive Bayesian classifier for this task, and demonstrate that it can incrementally learn

profiles from user feedback on the interestingness of Web sites. Furthermore, the Bayesian

classifier may easily be extended to revise user provided profiles. In an experimental evaluation

we compare the Bayesian classifier to computationally more intensive alternatives, and show that

it performs at least as well as these approaches throughout a range of different domains. In

addition, we empirically analyze the effects of providing the classifier with background

knowledge in form of user defined profiles and examine the use of lexical knowledge for feature

selection. We find that both approaches can substantially increase the prediction accuracy.

Title: Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible ExtensionsAuthor: Gediminas Adomavicius and Alexander Tuzhilin

Year: 2001

Description: The paper presents an overview of the field of recommender

systems and describes the current generation of recommendation methods

that are usually classified into the following three main categories: content-

based, collaborative, and hybrid recommendation approaches. The paper

also describes various limitations of current recommendation methods and

discusses possible extensions that can improve recommendation capabilities

and make recommender systems applicable to an even broader range of

applications. These extensions include, among others, improvement of

understanding of users and items, incorporation of the contextual

information into the recommendation process, support for multi-criteria

Database

ratings, and provision of more flexible and less intrusive types of

recommendations.

Title:Text Categorization with Support Vector Machines: Learning with Many Relevant

Features

Author: T. Joachims

Year: 1998

Description: They are introduces support vector machines for text

categorization. It provides both theoretical and empirical evidence that SVMs

are very well suited for text categorization. The theoretical analysis

concludes that SVMs acknowledge the particular properties of text In that

experimental results show that SVMs consistently achieve good performance

on text on text categorization tasks, outperforming existing methods

substantially and significantly. With their ability to generalized well in high

dimensional feature spaces, SVMs eliminate the need for feature selection,

making the application of text categorization considerably easier. Another

advantage of SVMs over the conventional methods in their robustness.

Modules:

Authentication

Profile Generation

Accept Friends Request

Send Request

Share Photos

Post Comment or Text

Block Unwanted Comments

Module Description and Module Diagram:

Authentication (login /Registration)

User Create Profile Data Base

User New Friends Data Base

View Friends

In this module User want to register the personal details in the database and get the

authentication processes to go forward. In this module User want to give the database to admin

all the registration process is done by a admin. After the registration process completed User can

get the authentication permission, by using username and password login website. If the user

enters a valid username/password combination they will be granted to access data. If the user

enter invalid username and password that user will be considered as unauthorized user and

denied access to that user.

Profile Generation

In this module user make our profile that details store in database the profile contains name,

contact no, and email address, photos, and other information. Logged users can see their details

and if they wish to change any of their information they can edit it.

Accept Friends Request

In this module user add new friends and view our friends and details. Logged users can see their

friend list and if they wish to add friends

User Select friend Data Base

Send request

User Add New Photos

Data Base

View Photos

Publish

Share Photo Post Comments

Known or Unknown person

Send Request

In this module user select friend to send request. logged user view request accept our friend request.

Share Photos

In this module user get the photo from the database and add new photo and publish on their wall.

Post comments

Post Comments

Calculate Probability based on content of comments

Data Base

Display the probability result on user private wall

In this module user post any photo in public wall, any one post a comments for that photo.

Unknown persons also post comments but we don’t know the character about the member in

case out of group.

Block the unwanted comments

In this module we have to calculate the probability of the message contents. That result will be

display on the user private wall with two options like accept and reject

Input Output Design:

Authentication (login /Registration)

Input: Register the user Details and give the username and Password to login

Output: They will be granted to access the data’s

Profile

Input: user enter our information like email id, contact no, and other information

Output: create profile and store information into database

Friends

Input: user selects our friend to add

Output: add friend in our profile

Send Request

Input: user select friend to request

Output: Send request

Post Photos

Input: User post some photos on their wall.

Output: Display the photos.

Post Comments

Input: Known or unknown persons are send a comment.

Output: The comments will display.


Input: The comments compare to the database data and calculate the probability.

Output: The probability results display on user private wall so we have to control directly.

TECHNIQUE USED OR ALGORITHM USED

Machine Learning (ML) Text Categorization Techniques

Step 1: SHORT TEXT CLASSIFIER

i)Text Representation

we consider three types of features, BoW, Document properties (Dp) andContextual Features

(CF). they are entirely derived from the information contained within the text of the message.

Text representation using endogenous knowledge has a good general applicability; however, in

operational settings, it is legitimate to use also exogenous knowledge, i.e., any source of

information outside the message body but directly or indirectly related to the message itself.

ii) Machine Learning-Based Classification

We address short text categorization as a hierarchical twolevel classification process. The first-

level classifier performs a binary hard categorization that labels messages as Neutral and

Nonneutral. The first-level filtering task facilitates thesubsequent second-level task in which a

finer-grained classification is performed. The second-level classifier performs a soft-partition of

Nonneutral messages assigning a given message a gradual membership to each of the nonneutral

classes.

iii) Radial Basis Function Networks (RBFN)

RFBNs have a single hidden layer of processing units with local, restricted activation domain: a

Gaussian function is commonly used, but any other locally tunable function can be used. They

were introduced as a neural network evolution of exact interpolation, and are demonstrated to

have the universal approximation property.

Step 2: FILTERING RULES AND BLACKLIST MANAGEMENT

i)Filtering Rules:

a) Creator specification

This implies to state conditions on type, depth, and trust values of the relationship( s)

creators should be involved in order to apply themthe specified rules. A creator

specification creatorSpec implicitly denotes a set of OSN users

A set of attribute constraints of the form an OP av,where an is a user profile

attribute name, av and OP are, respectively, a profile attribute value and a

comparison operator, compatible with an’s domain.

A set of relationship constraints of the form (m, rt, minDepth, maxTrust) denoting

all the OSN users participating with user m in a relationship of type rt, having a

depth greater than or equal to minDepth, and a trust value less than or equal to

maxTrust.

b)Filtering Rule

author is the user who specifies the rule;

creatorSpec is a creator specification, specified according to Definition 1;

contentSpec is a Boolean expression defined on content constraints of the form

(C, ml) where C is a class of the first or second level and ml is the minimum

membership level threshold required for class C to make the constraint satisfied;

action { fblock notify} denotes the action to be performed by the system on the

messages matching contentSpec and created by users identified by creatorSpec.

ii) Blacklists

A BL rule is a tuple (author, creatorSpec, creatorBehavior, T), where

author is the OSN user who specifies the rule, i.e., the wall owner;

creatorSpec is a creator specification, specified according to Definition 1;

creatorBehavior consists of two components RFBlocked and minBanned.

T denotes the time period the users identified by creatorSpec and creatorBehavior

have to be banned from author wall.

HARDWARE AND SOFTWARE REQUIREMENTSHARDWARE REQUIREMENTS:

Processor : Pentium Dual Core 2.00GHZ

Hard disk : 40 GB

Mouse : Logitech.

RAM : 2GB(minimum)

Keyboard : 110 keys enhanced.

SOFTWARE REQUIREMENTS:

Operating system : Windows7

IDE : Microsoft Visual Studio .Net 2010

Technology : Asp.Net

Coding Language : C#

Backend : SQL Server 2008

SYSTEM DESIGN

In our project user make our profile that details store in database. Logged users can see their

details and if they wish to change any of their information they can edit it.user add new friends

and view our friends and details. Logged users can see their friend list and if they wish to add

friends user add new friends and view our friends and details. Logged users can see their friend

list and if they wish to add friends user post new message and view our friend message. Logged

users post a photo on their wall at the time known or unknown persons are post comments about

that photo. Sometimes they are sending unwanted messages like vulgar, politics, Violence etc.

So we have to calculate the probability based on the comment contents. And those results will

send to user private wall. Based on the result user will take the decision.

USE CASE DIAGRAM & EXPLANATION

A use case diagram is a type of behavioral diagram created from a Use-case analysis. The

purpose of use case is to present overview of the functionality provided by the system in terms

of actors, their goals and any dependencies between those use cases.

In the below diagram eleven use cases are depicted. They are used to search result using CST

methods.

OSN User

Known Person

Unknown Person

Register/Login

Profile

Accept the Request

Send request

Post Photos and display

Comments

Calculate the Probability

OSN Managers

CLASS DIAGRAM & EXPLANATION

A class diagram in the UML, is a type of static structure diagram that describes the structure of

a system by showing the system’s classes, their attributes, and the relationships between the

classes.

Private visibility hides information from anything outside the class partition. Public

visibility allows all other classes to view the marked information.

Protected visibility allows child classes to access information they inherited from a parent class.

User

UsernamePassword

Login()Post Photos()Accept/Reject()Send Request()Receive Request()

Known/Unkown

UsernamePassword

Post Comments()Send request()Receive Request()

OSN Server

ProfileRelationship

Classifier()Black List()Calculate Probabilities()

OBJECT DIAGRAM & EXPLANATION

An object diagram in the Unified Modeling Language (UML) is a diagram that shows a

complete or partial view of the structure of a modeled system at a specific time.

An Object diagram focuses on some particular set of object instances and attributes, and the

links between the instances. A correlated set of object diagrams provides insight into how an

arbitrary view of a system is expected to evolve over time. Object diagrams are more concrete

Photos

Add photos=****

MessagePost message=****Probability=****

User LoginName=*****

Password=******

Profile

Category=*****Property=*****

Friends

Characters=****Book Information’s=****

Photos

Add photos=****

than class diagrams, and are often used to provide examples, or act as test cases for the class

diagrams. Only those aspects of a model that are of current interest need be shown on an object

diagram.

STATE DIAGRAM & EXPLANATION

A state diagram is a type of diagram used in computer science and related fields to describe the

behavior of systems. State diagrams require that the system described is composed of a finite

number of states; sometimes, this is indeed the case, while at other times this is a

reasonable abstraction. There are many forms of state diagrams, which differ slightly and have

different semantics.

User

Known/Unknown Person

Send Request

Accept the Request

Share the Photo

Post Comments


Login

ACTIVITY DIAGRAM & EXPLANATION

Activity diagram are a loosely defined diagram to show workflows of stepwise activities and

actions, with support for choice, iteration and concurrency. UML, activity diagrams can be used

to describe the business and operational step-by-step workflows of components in a system.

UML activity diagrams could potentially model the internal logic of a complex operation. In

many ways UML activity diagrams are the object-oriented equivalent of flow charts and data

flow diagrams(DFDs)from structural development.

Login

User Known/Unknown Persons

Send Request

Accept the Requests

Share Photos/Messages

Post the comments

SEQUENCE DIAGRAM & EXPLANATION

A sequence diagram in UML is a kind of interaction diagram that shows how processes operate

with one another and in what order.

It is a construct of a message sequence chart. Sequence diagrams are sometimes called

Event-trace diagrams, event scenarios, and timing diagrams.

Login User Known/Unknown Person

FirewallPublic Wall

Login

Login

Send request

Acccept the requests

Share Photos

Post Comments

Classification based on the content

Calculate the Probability based on the comments content

Direct Control Accept/Reject

COLLABORATION DIAGRAM & EXPLANATION

A collaboration diagram show the objects and relationships involved in an interaction,

and the sequence of messages exchanged among the objects during the interaction.

The collaboration diagram can be a decomposition of a class, class diagram, or part of

a class diagram. it can be the decomposition of a use case, use case diagram, or part of a use

case diagram.

The collaboration diagram shows messages being sent between classes and object

(instances). A diagram is created for each system operation that relates to the current

development cycle(iteration).

UserKnown/Unkno

wn Person

Login

FirewallPublic Wall

2: Login

6: Post Comments

7: Classification based on the content

8: Calculate the Probability based on the comments content

1: Login

3: Send request

4: Acccept the requests

5: Share Photos9: Direct Control Accept/Reject

COMPONENT DIAGRAM & EXPLANATION

A collaboration diagram show the objects and relationships involved in an interaction, and the

sequence of messages exchanged among the objects during the interaction.

The collaboration diagram can be a decomposition of a class, class diagram, or part of a

class diagram. It can be the decomposition of a use case, use case diagram, or part of a use case

diagram.

The collaboration diagram shows messages being sent between classes and object

(instances). A diagram is created for each system operation that relates to the current

development cycle (iteration).

Calculate Probability

User Known/Unknown Person

FireWall

Classifier

Login

Black List

DATA FLOW DIAGRAM & EXPLANATION

A data flow diagram(DFD) is a graphical representation of the “flow” of data through

an information system. It differs from the flowchart as it shows the data flow instead of the

control flow of the program. A data flow diagram can also be used for the visualization of data

processing. The DFD is designed to show how a system is divided into smaller portions and to

highlight the flow of data between those parts.

Level 0:

Level 1:

Level 2:

Level 3:

Level 4:

All Level Diagram:

Users Filter Wall

Having Account

Password

Username

Email ID

Profile

Known/UnknownPersons

Classifier

Black List

Password

Username Profile

Address

E-R DIAGRAM & EXPLANATION

In software engineering, an entity-relationship model (ERM) is an abstract and conceptual

representation of data. Entity-relationship modeling is a database modeling method, used to

produce a type of conceptual schema or semantic data model of a system, often a relational

database, and its requirements in a top-down fashion. Diagrams created by this process are

called entity-relationship diagrams, ER diagrams, or ERDs.

SYSTEM ARCHITECTURE & EXPLANATION

The OSN have three layers, there are graphical user interface, social network application and

social network managers. The social network managers handle the basic functionalities like

profile management, network based function etc. But in this project focused on other two layers

Send Friend Request

Accept the friend Request

Filter Wall Private Wall

Share Files

Known/Unknown

OSN User

Post Unwanted Comments

Label the comments based on the content

Analysis Creator Specification

Probability Result

and apply some new condition. Application layer have short text classifier and content based

message filtering. Short text classifier classifying the messages based on the content. Content

based message filter have black list and filtering policies. First, find relationship between the

user and message senders and it will filter and calculate the probabilities using classifier. And the

send a empty message below the probabilities result to the user. So our proposed system will

give the direct control to the user that what kind of messages displays on their wall.

FUTURE ENHANCEMENT

Future Enhancement Description

We plan to extend this work in several directions. In proposed system we have black

Comments temporarily based on the contents and create specification. But, in our future we have

to block the permanently based on the content probability and creator specifications.

Future Enhancement Module Diagram

Permanent Block

In this module the unknown persons post comments for the user share files. So we have plan to

black unknown creator comments permanently.

ADVANTAGES:

Proposed system give the control to what kind of messages post on own wall.

It is secure because filter wall act as Administrator.

The core components of the proposed system are the Content-Based Messages

Filtering (CBMF) and the Short Text Classifier modules.

Rule layer adopted for filtering unwanted messages. We start by describing

FRs, then we illustrate the use of BLs.

User

Known/Unknown Persons

Black Permanently

Share Photo

Comments

Creator Specification

Users

Data base

Known/Unknown

Black Permanently

Share Photo

BL mechanism to avoid messages from undesired creators, independent from

their contents.

APPLICATIONS

Online Social Network Application:

In online social networks application every account holder post the comments for the

user sharing files like messages or photos. In this process some known or unknown persons

post unwanted comments like politics, vulgar, violence and etc. So users have direct control

which kind of messages post on their wall. In this concept we have to implement all kind of

online social networks like facebook, twitter, orkut etc.

CONCLUSION:

We have presented system direct control to the user block unwanted messages on their

social network wall. The system using the machine language soft classifier to label the contents

is Neutral and Nonneutral. And then applying the Filter Rule based on the creators. Moreover,

the flexibility of the system in terms of filtering options is enhanced through the management of

BLs.

REFERENCE OR BIBLIOGRAPHY

1.M. Chau and H. Chen, “A Machine Learning Approach to Web Page Filtering Using Content

and Structure Analysis,” Decision Support Systems, vol. 44, no. 2, pp. 482-494, 2008.

2. R.J. Mooney and L. Roy, “Content-Based Book Recommending Using Learning for Text

Categorization,” Proc. Fifth ACM Conf. Digital Libraries, pp. 195-204, 2000.

3.F. Sebastiani, “Machine Learning in Automated Text Categorization,” ACM Computing

Surveys, vol. 34, no. 1, pp. 1-47, 2002.

4. M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari, “Content-Based Filtering in

On-Line Social Networks,” Proc. ECML/PKDD Workshop Privacy and Security Issues in Data

Mining and Machine Learning (PSDML ’10), 2010.

5. N.J. Belkin and W.B. Croft, “Information Filtering and Information Retrieval: Two Sides of

the Same Coin?” Comm. ACM, vol. 35, no. 12, pp. 29-38, 1992.

6. P.J. Denning, “Electronic Junk,” Comm. ACM, vol. 25, no. 3, pp. 163-165, 1982.

7. P.W. Foltz and S.T. Dumais, “Personalized Information Delivery: An Analysis of Information

Filtering Methods,” Comm. ACM, vol. 35, no. 12, pp. 51-60, 1992. [9] P.S. Jacobs and L.F. Rau,

“Scisor: Extracting Information from On- Line News,” Comm. ACM, vol. 33, no. 11, pp. 88 97,

1990.

8.S. Pollock, “A Rule-Based Message Filtering System,” ACM Trans. Office Information

Systems, vol. 6, no. 3, pp. 232-254, 1988.

9.P.E. Baclace, “Competitive Agents for Information Filtering,” Comm. ACM, vol. 35, no. 12, p.

50, 1992.

10. M.J. Pazzani and D. Billsus, “Learning and Revising User Profiles: The Identification of

Interesting Web Sites,” Machine Learning, vol. 27, no. 3, pp. 313-331, 1997.

itddm04 fr

Documents