itddm04 fr
DESCRIPTION
ADdsdasTRANSCRIPT
A SYSTEM TO FILTER UNWANTED MESSAGES
FROM OSN USER WALLS
INTRODUCTION:
Online Social Networks (OSNs) are today one of the most popular interactive medium to
communicate, share, and disseminate a considerable amount of human life information. Daily
and continuous communications imply the exchange of several types of content, including free
text, image, audio, and video data. In the sharing In OSNs, information filtering can also be used
for a different, more sensitive, purpose. This is due to the fact that in OSNs there is the
possibility of posting or commenting other posts on particular public/private areas, called in
general walls. Information filtering can therefore be used to give users the ability to
automatically control the messages written on their own walls, by filtering out unwanted
messages.
SCOPE OF THE PROJECT:
The OSN have three layer there are graphical user interface, social network application and
social network managers. The social network managers handle the basic functionalities like
profile management, network based function etc. But in this project focused on other two layers
and apply some new condition. Application layer have short text classifier and content based
message filtering. Short text classifier classifying the messages based on the content. Content
based message filter have black list and filtering policies. First, find relationship between the
user and message senders and it will filter and calculate the probabilities using classifier. And the
send a empty message below the probabilities result to the user. So our proposed system will
give the direct control to the user that what kind of messages displays on their wall.
LITERATURE SURVEY:
Title: BoosTexter: A Boosting-based System for Text Categorization
Author: Robert E Schapire, Yoram Singer
Year: 2000
Description: We use adopt a different approach in which we use two extensions of AdaBoost that were
specifically intended for multiclass, multi-label data. In the first extension, the goal of the learning algorithm is to
predict all and only all of the correct labels. Thus, the learned classifier is evaluated in terms of its ability to predict a
good approximation of the set of labels associated with a given document. In the second extension, the goal is to
design a classifier that ranks the labels so that the correct labels will receive the highest ranks. We next describe
BoosTexter, a system which embodies four versions of boosting based on these extensions, and we discuss the
implementation issues that arise in multilabel text categorization.
Title:A Comparison of Classifiers and Document Representations for the Routing Problem
Author: H. Schutze, D.A. Hull,J.O. Pedersen
YEAR:1995
Description: We compare two approaches to document routing, relevance feedback via query
expansion and statistical classification with error minimization. We show that advanced
classification algorithms perform 10-15% better than relevance feedback on the Tipster
document collection. Since learning algorithms based on error minimization and numerical
optimization are computationally intensive and prone to over fitting in a high dimensional
feature space, it is necessary to apply some method of dimensionality reduction. We examine
two different approaches, latent semantic indexing and feature selection of terms using a x2 -test
of non independence.
Title: Content-Based Book Recommending Using Learning for Text Categorization
Author: Raymond J. Mooney, Loriene Roy
Year: 1999
Description: Recommender systems improve access to relevant products and information by
making personalized suggestions based on previous examples of a user's likes and dislikes. Most
existing recommender systems use social filtering methods that base recommendations on other
users' preferences. By contrast, content-based methods use information about an item itself to
make suggestions. This approach has the advantage of being able to recommended previously
unrated items to users with unique interests and to provide explanations for its recommendations.
We describe a content-based book recommending system that utilizes information extractionand
a machine-learning algorithm for text categorization. Initial experimental results demonstrate
that this approach can produce accurate recommendations. These experiments are based on
ratings from random samplings of items and we discuss problems with previous experiments that
employ skewed samples of user-selected examples to evaluate performance.
Title: Content-based Filtering in On-line Social Networks
Author: M. Vanetti, E. Binaghi, B. Carminati, M. Carullo and E. Ferrari
Year: 2010
Description: This work is the first step of a wider project. The early encouraging results we have
obtained on the classification procedure prompt us to continue with other work that will aim to
improve the quality of classification. Additionally, we plan to enhance our filtering rule system,
with a more sophisticated approach to manage those messages caught just for the tolerance and
to decide when a user should be inserted into a BL. For instance, the system can automatically
take a decision about the messages blocked because of the tolerance, on the basis of some
statistical data (e.g., number of blocked messages from the same author, number of times the
creator has been inserted in the BL) as well as data on creator profile (e.g., relationships with the
wall owner, age, sex). Further, we plan to test the robustness of our system against different
adversary models. The development of a GUI to make easier BL and filtering rule specification
is also a direction we plan to investigate.
Title: Inductive Learning Algorithms and Representations for Text Categorization
Author: Susan Dumais, John Platt, David Heckerman, Mehran Sahami
Year: 1998
Description: Here, we describe results from experiments using a collection of hand-tagged
financial newswire stories from Reuters. We use supervised learning methods to build our
classifiers, and evaluate the resulting models on new test cases. The focus of our work has been
on comparing the effectiveness of different inductive learning algorithms (Find Similar, Naïve
Bayes, Bayesian Networks, Decision Trees, and Support Vector Machines) in terms of learning
speed, real-time classification speed, and classification accuracy. We also explored alternative
document representations (words vs. syntactic phrases, and binary vs. non-binary features), and
training set size.
Title: Learning and Revising User Profiles: The Identification of Interesting Web SitesAuthor: M.J. Pazzani and D. Billsus
Year: 1997
Description: We discuss algorithms for learning and revising user profiles that can determine
which World Wide Web sites on a given topic would be interesting to a user. We describe the
use of a naive Bayesian classifier for this task, and demonstrate that it can incrementally learn
profiles from user feedback on the interestingness of Web sites. Furthermore, the Bayesian
classifier may easily be extended to revise user provided profiles. In an experimental evaluation
we compare the Bayesian classifier to computationally more intensive alternatives, and show that
it performs at least as well as these approaches throughout a range of different domains. In
addition, we empirically analyze the effects of providing the classifier with background
knowledge in form of user defined profiles and examine the use of lexical knowledge for feature
selection. We find that both approaches can substantially increase the prediction accuracy.
Title: Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible ExtensionsAuthor: Gediminas Adomavicius and Alexander Tuzhilin
Year: 2001
Description: The paper presents an overview of the field of recommender
systems and describes the current generation of recommendation methods
that are usually classified into the following three main categories: content-
based, collaborative, and hybrid recommendation approaches. The paper
also describes various limitations of current recommendation methods and
discusses possible extensions that can improve recommendation capabilities
and make recommender systems applicable to an even broader range of
applications. These extensions include, among others, improvement of
understanding of users and items, incorporation of the contextual
information into the recommendation process, support for multi-criteria
Database
ratings, and provision of more flexible and less intrusive types of
recommendations.
Title:Text Categorization with Support Vector Machines: Learning with Many Relevant
Features
Author: T. Joachims
Year: 1998
Description: They are introduces support vector machines for text
categorization. It provides both theoretical and empirical evidence that SVMs
are very well suited for text categorization. The theoretical analysis
concludes that SVMs acknowledge the particular properties of text In that
experimental results show that SVMs consistently achieve good performance
on text on text categorization tasks, outperforming existing methods
substantially and significantly. With their ability to generalized well in high
dimensional feature spaces, SVMs eliminate the need for feature selection,
making the application of text categorization considerably easier. Another
advantage of SVMs over the conventional methods in their robustness.
Modules:
Authentication
Profile Generation
Accept Friends Request
Send Request
Share Photos
Post Comment or Text
Block Unwanted Comments
Module Description and Module Diagram:
Authentication (login /Registration)
User Create Profile Data Base
User New Friends Data Base
View Friends
In this module User want to register the personal details in the database and get the
authentication processes to go forward. In this module User want to give the database to admin
all the registration process is done by a admin. After the registration process completed User can
get the authentication permission, by using username and password login website. If the user
enters a valid username/password combination they will be granted to access data. If the user
enter invalid username and password that user will be considered as unauthorized user and
denied access to that user.
Profile Generation
In this module user make our profile that details store in database the profile contains name,
contact no, and email address, photos, and other information. Logged users can see their details
and if they wish to change any of their information they can edit it.
Accept Friends Request
In this module user add new friends and view our friends and details. Logged users can see their
friend list and if they wish to add friends
User Select friend Data Base
Send request
User Add New Photos
Data Base
View Photos
Publish
Share Photo Post Comments
Known or Unknown person
Send Request
In this module user select friend to send request. logged user view request accept our friend request.
Share Photos
In this module user get the photo from the database and add new photo and publish on their wall.
Post comments
Post Comments
Calculate Probability based on content of comments
Data Base
Display the probability result on user private wall
In this module user post any photo in public wall, any one post a comments for that photo.
Unknown persons also post comments but we don’t know the character about the member in
case out of group.
Block the unwanted comments
In this module we have to calculate the probability of the message contents. That result will be
display on the user private wall with two options like accept and reject
Input Output Design:
Authentication (login /Registration)
Input: Register the user Details and give the username and Password to login
Output: They will be granted to access the data’s
Profile
Input: user enter our information like email id, contact no, and other information
Output: create profile and store information into database
Friends
Input: user selects our friend to add
Output: add friend in our profile
Send Request
Input: user select friend to request
Output: Send request
Post Photos
Input: User post some photos on their wall.
Output: Display the photos.
Post Comments
Input: Known or unknown persons are send a comment.
Output: The comments will display.
Block the unwanted comments
Input: The comments compare to the database data and calculate the probability.
Output: The probability results display on user private wall so we have to control directly.
TECHNIQUE USED OR ALGORITHM USED
Machine Learning (ML) Text Categorization Techniques
Step 1: SHORT TEXT CLASSIFIER
i)Text Representation
we consider three types of features, BoW, Document properties (Dp) andContextual Features
(CF). they are entirely derived from the information contained within the text of the message.
Text representation using endogenous knowledge has a good general applicability; however, in
operational settings, it is legitimate to use also exogenous knowledge, i.e., any source of
information outside the message body but directly or indirectly related to the message itself.
ii) Machine Learning-Based Classification
We address short text categorization as a hierarchical twolevel classification process. The first-
level classifier performs a binary hard categorization that labels messages as Neutral and
Nonneutral. The first-level filtering task facilitates thesubsequent second-level task in which a
finer-grained classification is performed. The second-level classifier performs a soft-partition of
Nonneutral messages assigning a given message a gradual membership to each of the nonneutral
classes.
iii) Radial Basis Function Networks (RBFN)
RFBNs have a single hidden layer of processing units with local, restricted activation domain: a
Gaussian function is commonly used, but any other locally tunable function can be used. They
were introduced as a neural network evolution of exact interpolation, and are demonstrated to
have the universal approximation property.
Step 2: FILTERING RULES AND BLACKLIST MANAGEMENT
i)Filtering Rules:
a) Creator specification
This implies to state conditions on type, depth, and trust values of the relationship( s)
creators should be involved in order to apply themthe specified rules. A creator
specification creatorSpec implicitly denotes a set of OSN users
A set of attribute constraints of the form an OP av,where an is a user profile
attribute name, av and OP are, respectively, a profile attribute value and a
comparison operator, compatible with an’s domain.
A set of relationship constraints of the form (m, rt, minDepth, maxTrust) denoting
all the OSN users participating with user m in a relationship of type rt, having a
depth greater than or equal to minDepth, and a trust value less than or equal to
maxTrust.
b)Filtering Rule
author is the user who specifies the rule;
creatorSpec is a creator specification, specified according to Definition 1;
contentSpec is a Boolean expression defined on content constraints of the form
(C, ml) where C is a class of the first or second level and ml is the minimum
membership level threshold required for class C to make the constraint satisfied;
action { fblock notify} denotes the action to be performed by the system on the
messages matching contentSpec and created by users identified by creatorSpec.
ii) Blacklists
A BL rule is a tuple (author, creatorSpec, creatorBehavior, T), where
author is the OSN user who specifies the rule, i.e., the wall owner;
creatorSpec is a creator specification, specified according to Definition 1;
creatorBehavior consists of two components RFBlocked and minBanned.
T denotes the time period the users identified by creatorSpec and creatorBehavior
have to be banned from author wall.
HARDWARE AND SOFTWARE REQUIREMENTSHARDWARE REQUIREMENTS:
Processor : Pentium Dual Core 2.00GHZ
Hard disk : 40 GB
Mouse : Logitech.
RAM : 2GB(minimum)
Keyboard : 110 keys enhanced.
SOFTWARE REQUIREMENTS:
Operating system : Windows7
IDE : Microsoft Visual Studio .Net 2010
Technology : Asp.Net
Coding Language : C#
Backend : SQL Server 2008
SYSTEM DESIGN
In our project user make our profile that details store in database. Logged users can see their
details and if they wish to change any of their information they can edit it.user add new friends
and view our friends and details. Logged users can see their friend list and if they wish to add
friends user add new friends and view our friends and details. Logged users can see their friend
list and if they wish to add friends user post new message and view our friend message. Logged
users post a photo on their wall at the time known or unknown persons are post comments about
that photo. Sometimes they are sending unwanted messages like vulgar, politics, Violence etc.
So we have to calculate the probability based on the comment contents. And those results will
send to user private wall. Based on the result user will take the decision.
USE CASE DIAGRAM & EXPLANATION
A use case diagram is a type of behavioral diagram created from a Use-case analysis. The
purpose of use case is to present overview of the functionality provided by the system in terms
of actors, their goals and any dependencies between those use cases.
In the below diagram eleven use cases are depicted. They are used to search result using CST
methods.
OSN User
Known Person
Unknown Person
Register/Login
Profile
Accept the Request
Send request
Post Photos and display
Comments
Calculate the Probability
OSN Managers
CLASS DIAGRAM & EXPLANATION
A class diagram in the UML, is a type of static structure diagram that describes the structure of
a system by showing the system’s classes, their attributes, and the relationships between the
classes.
Private visibility hides information from anything outside the class partition. Public
visibility allows all other classes to view the marked information.
Protected visibility allows child classes to access information they inherited from a parent class.
User
UsernamePassword
Login()Post Photos()Accept/Reject()Send Request()Receive Request()
Known/Unkown
UsernamePassword
Post Comments()Send request()Receive Request()
OSN Server
ProfileRelationship
Classifier()Black List()Calculate Probabilities()
OBJECT DIAGRAM & EXPLANATION
An object diagram in the Unified Modeling Language (UML) is a diagram that shows a
complete or partial view of the structure of a modeled system at a specific time.
An Object diagram focuses on some particular set of object instances and attributes, and the
links between the instances. A correlated set of object diagrams provides insight into how an
arbitrary view of a system is expected to evolve over time. Object diagrams are more concrete
Photos
Add photos=****
MessagePost message=****Probability=****
User LoginName=*****
Password=******
Profile
Category=*****Property=*****
Friends
Characters=****Book Information’s=****
Photos
Add photos=****
than class diagrams, and are often used to provide examples, or act as test cases for the class
diagrams. Only those aspects of a model that are of current interest need be shown on an object
diagram.
STATE DIAGRAM & EXPLANATION
A state diagram is a type of diagram used in computer science and related fields to describe the
behavior of systems. State diagrams require that the system described is composed of a finite
number of states; sometimes, this is indeed the case, while at other times this is a
reasonable abstraction. There are many forms of state diagrams, which differ slightly and have
different semantics.
User
Known/Unknown Person
Send Request
Accept the Request
Share the Photo
Post Comments
Block the unwanted comments
Login
ACTIVITY DIAGRAM & EXPLANATION
Activity diagram are a loosely defined diagram to show workflows of stepwise activities and
actions, with support for choice, iteration and concurrency. UML, activity diagrams can be used
to describe the business and operational step-by-step workflows of components in a system.
UML activity diagrams could potentially model the internal logic of a complex operation. In
many ways UML activity diagrams are the object-oriented equivalent of flow charts and data
flow diagrams(DFDs)from structural development.
Login
User Known/Unknown Persons
Send Request
Accept the Requests
Share Photos/Messages
Post the comments
SEQUENCE DIAGRAM & EXPLANATION
A sequence diagram in UML is a kind of interaction diagram that shows how processes operate
with one another and in what order.
It is a construct of a message sequence chart. Sequence diagrams are sometimes called
Event-trace diagrams, event scenarios, and timing diagrams.
Login User Known/Unknown Person
FirewallPublic Wall
Login
Login
Send request
Acccept the requests
Share Photos
Post Comments
Classification based on the content
Calculate the Probability based on the comments content
Direct Control Accept/Reject
COLLABORATION DIAGRAM & EXPLANATION
A collaboration diagram show the objects and relationships involved in an interaction,
and the sequence of messages exchanged among the objects during the interaction.
The collaboration diagram can be a decomposition of a class, class diagram, or part of
a class diagram. it can be the decomposition of a use case, use case diagram, or part of a use
case diagram.
The collaboration diagram shows messages being sent between classes and object
(instances). A diagram is created for each system operation that relates to the current
development cycle(iteration).
UserKnown/Unkno
wn Person
Login
FirewallPublic Wall
2: Login
6: Post Comments
7: Classification based on the content
8: Calculate the Probability based on the comments content
1: Login
3: Send request
4: Acccept the requests
5: Share Photos9: Direct Control Accept/Reject
COMPONENT DIAGRAM & EXPLANATION
A collaboration diagram show the objects and relationships involved in an interaction, and the
sequence of messages exchanged among the objects during the interaction.
The collaboration diagram can be a decomposition of a class, class diagram, or part of a
class diagram. It can be the decomposition of a use case, use case diagram, or part of a use case
diagram.
The collaboration diagram shows messages being sent between classes and object
(instances). A diagram is created for each system operation that relates to the current
development cycle (iteration).
Calculate Probability
User Known/Unknown Person
FireWall
Classifier
Login
Black List
DATA FLOW DIAGRAM & EXPLANATION
A data flow diagram(DFD) is a graphical representation of the “flow” of data through
an information system. It differs from the flowchart as it shows the data flow instead of the
control flow of the program. A data flow diagram can also be used for the visualization of data
processing. The DFD is designed to show how a system is divided into smaller portions and to
highlight the flow of data between those parts.
Level 0:
Level 1:
Level 2:
Level 3:
Level 4:
All Level Diagram:
Users Filter Wall
Having Account
Password
Username
Email ID
Profile
Known/UnknownPersons
Classifier
Black List
Password
Username Profile
Address
E-R DIAGRAM & EXPLANATION
In software engineering, an entity-relationship model (ERM) is an abstract and conceptual
representation of data. Entity-relationship modeling is a database modeling method, used to
produce a type of conceptual schema or semantic data model of a system, often a relational
database, and its requirements in a top-down fashion. Diagrams created by this process are
called entity-relationship diagrams, ER diagrams, or ERDs.
SYSTEM ARCHITECTURE & EXPLANATION
The OSN have three layers, there are graphical user interface, social network application and
social network managers. The social network managers handle the basic functionalities like
profile management, network based function etc. But in this project focused on other two layers
Send Friend Request
Accept the friend Request
Filter Wall Private Wall
Share Files
Known/Unknown
OSN User
Post Unwanted Comments
Label the comments based on the content
Analysis Creator Specification
Probability Result
and apply some new condition. Application layer have short text classifier and content based
message filtering. Short text classifier classifying the messages based on the content. Content
based message filter have black list and filtering policies. First, find relationship between the
user and message senders and it will filter and calculate the probabilities using classifier. And the
send a empty message below the probabilities result to the user. So our proposed system will
give the direct control to the user that what kind of messages displays on their wall.
FUTURE ENHANCEMENT
Future Enhancement Description
We plan to extend this work in several directions. In proposed system we have black
Comments temporarily based on the contents and create specification. But, in our future we have
to block the permanently based on the content probability and creator specifications.
Future Enhancement Module Diagram
Permanent Block
In this module the unknown persons post comments for the user share files. So we have plan to
black unknown creator comments permanently.
ADVANTAGES:
Proposed system give the control to what kind of messages post on own wall.
It is secure because filter wall act as Administrator.
The core components of the proposed system are the Content-Based Messages
Filtering (CBMF) and the Short Text Classifier modules.
Rule layer adopted for filtering unwanted messages. We start by describing
FRs, then we illustrate the use of BLs.
User
Known/Unknown Persons
Black Permanently
Share Photo
Comments
Creator Specification
Users
Data base
Known/Unknown
Black Permanently
Share Photo
BL mechanism to avoid messages from undesired creators, independent from
their contents.
APPLICATIONS
Online Social Network Application:
In online social networks application every account holder post the comments for the
user sharing files like messages or photos. In this process some known or unknown persons
post unwanted comments like politics, vulgar, violence and etc. So users have direct control
which kind of messages post on their wall. In this concept we have to implement all kind of
online social networks like facebook, twitter, orkut etc.
CONCLUSION:
We have presented system direct control to the user block unwanted messages on their
social network wall. The system using the machine language soft classifier to label the contents
is Neutral and Nonneutral. And then applying the Filter Rule based on the creators. Moreover,
the flexibility of the system in terms of filtering options is enhanced through the management of
BLs.
REFERENCE OR BIBLIOGRAPHY
1.M. Chau and H. Chen, “A Machine Learning Approach to Web Page Filtering Using Content
and Structure Analysis,” Decision Support Systems, vol. 44, no. 2, pp. 482-494, 2008.
2. R.J. Mooney and L. Roy, “Content-Based Book Recommending Using Learning for Text
Categorization,” Proc. Fifth ACM Conf. Digital Libraries, pp. 195-204, 2000.
3.F. Sebastiani, “Machine Learning in Automated Text Categorization,” ACM Computing
Surveys, vol. 34, no. 1, pp. 1-47, 2002.
4. M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari, “Content-Based Filtering in
On-Line Social Networks,” Proc. ECML/PKDD Workshop Privacy and Security Issues in Data
Mining and Machine Learning (PSDML ’10), 2010.
5. N.J. Belkin and W.B. Croft, “Information Filtering and Information Retrieval: Two Sides of
the Same Coin?” Comm. ACM, vol. 35, no. 12, pp. 29-38, 1992.
6. P.J. Denning, “Electronic Junk,” Comm. ACM, vol. 25, no. 3, pp. 163-165, 1982.
7. P.W. Foltz and S.T. Dumais, “Personalized Information Delivery: An Analysis of Information
Filtering Methods,” Comm. ACM, vol. 35, no. 12, pp. 51-60, 1992. [9] P.S. Jacobs and L.F. Rau,
“Scisor: Extracting Information from On- Line News,” Comm. ACM, vol. 33, no. 11, pp. 88 97,
1990.
8.S. Pollock, “A Rule-Based Message Filtering System,” ACM Trans. Office Information
Systems, vol. 6, no. 3, pp. 232-254, 1988.
9.P.E. Baclace, “Competitive Agents for Information Filtering,” Comm. ACM, vol. 35, no. 12, p.
50, 1992.
10. M.J. Pazzani and D. Billsus, “Learning and Revising User Profiles: The Identification of
Interesting Web Sites,” Machine Learning, vol. 27, no. 3, pp. 313-331, 1997.