© 2019 ijrar august 2019, volume 6, issue 3 ...ijrar.org/papers/ijrar19k4870.pdf · a comparative...
© 2019 IJRAR August 2019, Volume 6, Issue 3 www.ijrar.org (E-ISSN 2348-1269, P-ISSN 2349-5138)
IJRAR19K4870 International Journal of Research and Analytical Reviews (IJRAR) www.ijrar.org 488
A COMPARATIVE STUDY AND SENTIMENT
ANALYSIS USING KONSTANZ INFORMATION
MINER IN SOCIAL NETWORKS
1 A. Anitha, 2 Dr. S. Sivakumar
1 M.Sc. & Research Scholar, 2 M.C.A., M.Phil., Ph.D. & Head and Asst. Professor
1 Department of Computer Science,
1 Thanthai Hans Roever College, Perambalur, India
Abstract: Sentiment analysis, which gives this work its name, is its main theme. Since the early 2000s, sentiment analysis has become one of the most active research areas for researchers working on natural language processing and social network analysis. Data mining, web mining, and text mining have also been studied extensively. Moreover, sentiment analysis has spread across many fields, from computer science to management science and from social science to economics, because of its importance to the business world and to society as a whole. In this study, the Konstanz Information Miner (KNIME), a powerful data mining tool with rich features and many visualization tools, was applied to Facebook data. Ten thousand Facebook posts were used. The sentiment analysis, which is in fact a classification task, was conducted using machine learning algorithms, and the results were interpreted through an accuracy analysis. It is anticipated that KNIME, with its rich visualization tools, will become widespread in sentiment analysis studies, making such work both easier and more reliable. In the same way, the text messages and contents in the users' datasets are collected and analyzed for positive and negative words. Users can encrypt their contents and control access to requests and to the messages exchanged between sender and receiver. Shared contents are defined as public or private: public posts can be viewed by all, whereas private contents can be viewed only with the permission of the owner. In addition, when a sender makes more than two or three attempts to send negative words, the attempts are detected and blocked for the sake of the sender's security.
IndexTerms - Sentiment Analysis, Opinion Mining, KNIME, Facebook, Social Media.
I. INTRODUCTION
Opinions are at the center of almost all human activities and are an important influence on our behavior. Our beliefs about reality, our perceptions, and our choices depend on how others see and evaluate the world. For this reason, we often turn to others' opinions when we need to make a decision. This applies not only to individuals but also to organizations. In the real world, companies and organizations always want to receive opinions and comments from consumers or the public about their products and services. Individual consumers want to know the views of current users before purchasing a product, or the opinions of others about political candidates before voting in an election. Gathering public opinion and consumer perspectives has long been a major workload for marketing, public relations, and political campaign companies. With the rise of social networks (for example reviews, forum discussions, blogs, microblogs, and Facebook comments and posts) and the growing influence of social media on decision-making, individuals and organizations can no longer avoid taking the content of these media into account. In recent years, industrial activity involving sentiment analysis has also developed rapidly, and a large number of new initiatives have emerged in this area. Many large corporations have developed their own sentiment analysis systems to measure the quality of on-site services, thereby creating awareness in the business and social environment.
Social media plays a vital role in marketing and in building relationships with customers. With a low barrier to entry, small businesses are beginning to use social media as a means of marketing. Unfortunately, many small businesses struggle to use social media and have no strategy going into it. As a result, without a basic understanding of the advantages of social media and how to use it to engage customers, countless opportunities are missed. The research aims to acquire an initial understanding of how a small business recognized for using social media to grow uses it to engage customers. In today's technology-driven world, social networking sites have become an avenue where retailers can extend their marketing campaigns to a wider range of consumers. Chi (2011, 46) defines social media marketing as a "connection between brands and consumers [while] offering a personal channel and currency for user centered networking and social interaction." The tools and approaches for communicating with customers have changed greatly with the emergence of social media; therefore, businesses must learn how to use social media in a way that is consistent with their business plan (Mangold and Faulds 2009). This is especially true for companies striving to gain a competitive advantage. This review examines current literature that focuses on a retailer's development and use of social media as an extension of their marketing strategy. This phenomenon has only developed within the last decade; thus social media research has largely focused on (1) defining what it is through the explanation of new terminology and concepts that make up its foundations, and (2) exploring the impact of a company's integration of social media on consumer behavior. This paper begins with an explanation of terminology that defines social media marketing, followed by a discussion of the four main themes found within current research studies: Virtual Brand Communities, Consumers' Attitudes and Motives, User Generated Content, and Viral Advertising. Although social media marketing is a well-researched topic, it has only been studied through experimental and theoretical research; studies never precisely describe the benefits retailers gain from this marketing tactic. In reviewing the rich plethora of multi-disciplinary literature, it has become clear that studies are focusing on describing what social media marketing is as well as examining what
factors affect consumer behavior relative to social networking. Despite the initial progress made by researchers, development in this area of study has been limited.
II. DOMAIN INTRODUCTION
Big Data
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to spot business trends, prevent diseases, combat crime, and so on.
Scientists regularly encounter limitations due to large data sets in many areas, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. The limitations also affect Internet search, finance, and business informatics. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes (2.5×10^18 bytes) of data were created every day. The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization.
Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers. What is considered big data varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain. Big data is a moving target; what is considered "big" today will not be so years ahead. For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time. Big data size is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are diverse, complex, and of a massive scale.
In a 2001 research report and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this "3Vs" model for describing big data. In 2012, Gartner updated its definition as follows: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." Additionally, a new V, "Veracity", is added by some organizations to describe it. Although Gartner's definition (the 3Vs) is still widely used, the growing maturity of the concept fosters a sounder distinction between big data and Business Intelligence regarding data and their use.
Business Intelligence uses descriptive statistics on data with high information density to measure things, detect trends, etc. Big data uses inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large sets of data with low information density, to reveal relationships and dependencies and to perform predictions of outcomes and behaviors. Big data can also be defined as a large volume of unstructured data which cannot be handled by standard database management systems such as DBMS, RDBMS, or ORDBMS.
Big data can be described by the following characteristics:
Volume - The quantity of data that is generated is very important in this context. It is the size of the data which determines the value and potential of the data under consideration, and whether it can actually be considered Big Data or not. The name 'Big Data' itself contains a term related to size, hence the characteristic.
Variety - The next aspect of Big Data is its variety. The category to which Big Data belongs is also an essential fact that data analysts need to know. This helps the people who closely analyze the data to use it effectively to their advantage, thus upholding the importance of the Big Data.
Velocity - The term 'velocity' in this context refers to the speed at which data is generated and processed to meet the demands and challenges that lie ahead in the path of growth and development.
Variability - This factor can be a problem for those who analyze the data. It refers to the inconsistency that the data can show at times, which hampers the process of handling and managing the data effectively.
Veracity - The quality of the data being captured can vary greatly. Accuracy of analysis depends on the veracity of the source data.
Complexity - Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected, and correlated in order to grasp the information they are supposed to convey. This situation is therefore termed the 'complexity' of Big Data.
Big data analytics enables organizations to analyze a mix of structured, semi-structured, and unstructured data in search of valuable business information and insights.
III. EXISTING SYSTEM
Long before awareness of the Internet became widespread, many of us asked friends to recommend a television, asked whom they planned to vote for in local elections, asked colleagues for reference letters about job applicants, or asked which dishwasher to buy. Today, however, the development of internet technologies lets us discover the views and experiences of both personal acquaintances and well-known professional critics. Studies show that more and more people are starting to share their opinions with strangers on the internet. Opinions, appraisals, evaluations, attitudes, interpretations, and related concepts are the subject of sentiment analysis and opinion mining. The rapid growth of online spaces has helped to increase the use of forums, discussion and dating pages, blogs, microblogs, and other social media tools among people. With the increasing use of the Internet, communication infrastructure has undergone radical changes. The social sharing sites that emerged with the development of information technologies have taken an important place in human life. Among the most popular social networking services are web sites and applications such as Facebook, Instagram, YouTube, Google Play, and Vine, along with blog, microblog, social networking, and social bookmarking services. All these services together make up the structure of social media.
In addition to real-life applications, research articles have also been published in the field of sentiment analysis. For example, Leilei and his team conducted a sentiment analysis study using Facebook data to predict election results. Different sentiment analysis studies were conducted by Bernardo, Mahesh, and Venetis using Facebook data, movie reviews, and blogs to estimate the box office revenue of films. Ozel conducted a survey using Limesurvey, a web-based survey tool; this study analyzed the effect of employees' Facebook use on company profile. The survey was tweeted and retweeted by 10 different Facebook accounts to reach two thousand Facebook users, and the data obtained were analyzed statistically. Sentiment analysis studies have also applied supervised machine learning approaches to social networks; in one such study, a data set consisting of comments on various products of some food companies on Facebook was obtained manually. Oguz and his team used Facebook messages and newspaper sites to investigate the detection of influenza-like illnesses through social media; the Facebook data were collected using the free Topsy real-time search engine developed for social media. Yazan and Uskudarli performed earthquake detection through social networks, using the Streaming API developed by Facebook to obtain the data.
Disadvantages of the existing system
Many applications are affected, including event detection and summarization, opinion mining, sentiment analysis, and others.
Because of the limited length of a tweet (i.e. 140 characters) and the absence of restrictions on writing style, tweets often contain grammatical errors, misspellings, and informal abbreviations.
On the other hand, despite the noisy nature of tweets, the core semantic information is well preserved in the form of named entities or semantic phrases.
IV. PROPOSED SYSTEM
In this study, sentiment analysis was performed on Facebook data. The first step is to collect data. It is known that collecting data and building data sets require the most time and effort from researchers who work on social media. The literature describes many tools and methods for collecting data on social networks; among these, the most commonly used are custom-designed APIs, web crawling, web scraping, and scripts. This study aims to obtain data sets in a meaningful and regular manner from Facebook data and to carry out sentiment analysis on them. The tagged Facebook data set named Sentiment140 was used. This set was created by Stanford University Computer Science graduate students Alec Go, Richa Bhayani, and Lei Huang, and contains about 1.6 million items tagged as positive or negative. When these data were collected and tagged, the emoticon contained in each item was used. For example, an item containing a smiley is given a positive tag because the smiley expresses happiness; likewise, an item containing a sad-face emoticon is classified as negative because it expresses sadness. Ten thousand Facebook posts were used in this study. Two different data sets were created, each containing 5 thousand posts, and the number of posts with positive and negative tags in each set was calculated.
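The emoticon-based tagging and the split into two sets described above can be sketched as follows. This is a minimal illustration only: the emoticon lists, the function names, and the sample posts are assumptions, and the study's actual labeling was taken from the existing tagged data set rather than from this code.

```python
import random

# Assumed emoticon lists; the text only says that smileys mark positive
# items and sad faces mark negative ones.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")

def label_by_emoticon(text):
    """Return 'positive', 'negative', or None when the signal is absent or mixed."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        return None
    return "positive" if has_pos else "negative"

def split_into_sets(labeled, n_sets=2, seed=0):
    """Shuffle labeled posts and deal them into n_sets equally sized sets."""
    items = list(labeled)
    random.Random(seed).shuffle(items)
    size = len(items) // n_sets
    return [items[i * size:(i + 1) * size] for i in range(n_sets)]

def count_labels(dataset):
    """Count positive and negative tags in one set."""
    counts = {"positive": 0, "negative": 0}
    for _, label in dataset:
        counts[label] += 1
    return counts

# Hypothetical sample posts.
posts = ["great day :)", "missed the bus :(", "so happy :D", "no emoticon here"]
labeled = [(p, label_by_emoticon(p)) for p in posts]
labeled = [(p, l) for p, l in labeled if l is not None]
set_a, set_b = split_into_sets(labeled)
```

With ten thousand labeled posts, the same split would yield two sets of five thousand, and `count_labels` would give the per-set tag counts mentioned in the text.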
Advantages
Reduces noisy and irrelevant words that are not associated with the users, and maintains their privacy.
Maintains data and information with better security, and dilutes negative words to an extent.
Provides security measures by avoiding or blocking negative words, and improves efficiency.
V. LITERATURE SURVEY
5.1 Improving Entity Resolution with Global Constraints
Author: Jim Gemmell
In this paper, the authors investigate another online socio-economic property that, to their knowledge, has never been exploited: sites listing entities have an incentive to avoid gratuitous duplicates. For instance, duplicate movies in IMDB would have reviews and corrections applied to one copy and not the other. If Netflix has one entry for a DVD and a duplicate for the Blu-ray version, then its customers might be looking at one and not realize the other is available. Hulu supports Facebook likes for its movies and could have the like counts diluted by duplicates.
Additional examples in other domains are easily constructed. The authors leverage this socio-economic property to resolve entities across different web sites by applying a global one-to-one constraint to the produced matchings. The resulting resolution has much better accuracy compared to matching without such a constraint. The framework for one-to-one entity resolution (ER) is generic in that it can constrain existing resolution methods using weighted graph matching. The goal is not to engineer features or tune high-performance domain-specific ER, but rather to develop generic algorithms that can be combined with existing methods to improve retrieval performance. The purpose of the paper is not to investigate alternate scoring functions, but rather to explore generic algorithms for constrained ER. The paper describes the abstract ER problem, the framework for including particular scoring approaches, and several generic algorithms for constrained ER.
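To make the one-to-one constraint concrete, the sketch below applies a simple greedy heuristic to a table of pairwise similarity scores. It is an illustration under stated assumptions, not the algorithm of the surveyed paper: the function name, the threshold, and the sample scores are hypothetical, and greedy selection is only one of several possible ways to approximate a weighted graph matching.

```python
def greedy_one_to_one(scores, threshold=0.5):
    """Pick matches in descending score order, using each entity at most once.

    scores maps (left_id, right_id) -> similarity in [0, 1].
    """
    matched_left, matched_right, matching = set(), set(), []
    for (left, right), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # remaining pairs score even lower
        if left in matched_left or right in matched_right:
            continue  # one-to-one constraint: this entity is already matched
        matching.append((left, right, score))
        matched_left.add(left)
        matched_right.add(right)
    return matching

# Hypothetical scores between two movie catalogs.
scores = {
    ("imdb:1", "netflix:A"): 0.9,
    ("imdb:1", "netflix:B"): 0.8,
    ("imdb:2", "netflix:A"): 0.7,
    ("imdb:2", "netflix:B"): 0.6,
}
matching = greedy_one_to_one(scores)
```

Without the constraint, `imdb:2` would be paired with its best-scoring candidate `netflix:A`, which is already the best match for `imdb:1`; with the constraint it falls back to `netflix:B`.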
5.2 Entity Resolution: Theory, Practice & Open Challenges
Author: Lise Getoor
In this paper, the authors begin by introducing a simple abstraction for the entity resolution problem. They categorize ER based on the type of input: single-entity ER, where all mentions correspond to a single entity type; relational ER, where real-world entities are linked (as in a social network); and multi-entity ER, representing the most general problem, with potentially linked mentions of different entity types (e.g. products, sellers, and reviews). They survey classical techniques for ER, which assume that there exists a distance function between pairs of mentions. These techniques can be broadly classified as pair-wise ER, where the decision to match a pair of mentions is made independently of other mentions, and cluster-based ER, where equivalence classes of entities are constructed via clustering. Pair-wise ER is well suited to the problem of aligning two databases of the same set of entities (e.g. lists of restaurants from two sites). The authors survey common algorithms for computing similarity functions between mentions, and rule-based and probabilistic methods for pair-wise and cluster-based ER. They also discuss techniques for computing cluster representatives, a.k.a. canonical entities, from the database and machine learning communities. They conclude by discussing state-of-the-art collective probabilistic inference techniques for multi-entity ER. These techniques are becoming popular because the Web contains an abundance of redundant mentions of entities that are also linked, and techniques that consider only one entity type and ignore links perform poorly. The approaches described are based on multi-relational clustering algorithms, probabilistic generative models, and probabilistic logical languages, e.g. Markov logic networks and probabilistic soft logic.
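The distinction between pair-wise and cluster-based ER can be illustrated with a small sketch. The choice of token-level Jaccard similarity, the 0.5 threshold, and the sample restaurant names are assumptions made for illustration; the surveyed tutorial covers many other similarity functions and clustering methods.

```python
from itertools import combinations

def jaccard(a, b):
    """Token-level Jaccard similarity between two mentions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def pairwise_matches(mentions, threshold=0.5):
    """Pair-wise ER: decide each pair independently of all other mentions."""
    return [(i, j) for i, j in combinations(range(len(mentions)), 2)
            if jaccard(mentions[i], mentions[j]) >= threshold]

def cluster(n, pairs):
    """Cluster-based ER: build equivalence classes from matched pairs (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical restaurant mentions from two sites.
mentions = ["Blue Moon Cafe", "blue moon cafe downtown", "Red Sun Diner", "Red Sun"]
pairs = pairwise_matches(mentions)
clusters = cluster(len(mentions), pairs)
```

Each resulting cluster stands for one real-world entity; a canonical representative could then be chosen per cluster.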
5.3 Matching Unstructured Product Offers to Structured Product Specifications
Author: Anitha Kannan
The setting assumes a large database of product specifications. Each product specification (interchangeably called 'product') consists of a set of attribute ⟨name, value⟩ pairs and is represented in the database as a structured record. Some of the attributes can be numeric while others are categorical. The unstructured offer descriptions (called 'offers' for short) consist of free text. The text has embedded in it some of the values, and possibly some attribute names, corresponding to one of the products. The text may also contain additional words. The attribute names and values in the text may not precisely match those found in the database. The text does not contain an identifier that uniquely identifies the corresponding product. Different textual descriptions may be provided for the same product. An offer may match more than one product, since offers provide only partial descriptions and the same real-world product might have multiple representations in the product database. The authors performed extensive experiments using the Bing Shopping catalog to understand the performance characteristics of the proposed solution. The experimental results show that the proposed approach scores high on F-measure and consistently beats baseline approaches for product categories that have reasonably rich attribute structure and good data. They also point to the desirability of hybrid solutions that additionally make use of classical text matching techniques for attribute-impoverished product categories. The methodology employed for analyzing the experimental results might also be of interest to those building and analyzing web-scale systems.
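The matching problem can be pictured with a deliberately naive baseline that scores an offer by the fraction of a product's attribute values found in the offer text. This is not the approach of the surveyed paper; the scoring rule, function names, and catalog entries are hypothetical and serve only to show why a partial description may match several products.

```python
def match_score(offer_text, product):
    """Fraction of the product's attribute values whose tokens all occur in the offer."""
    tokens = set(offer_text.lower().split())
    values = [str(v).lower() for v in product.values()]
    hits = sum(1 for v in values if set(v.split()) <= tokens)
    return hits / len(values) if values else 0.0

def best_products(offer_text, catalog):
    """Return the ids of all products tied for the highest non-zero score."""
    scored = {pid: match_score(offer_text, spec) for pid, spec in catalog.items()}
    top = max(scored.values(), default=0.0)
    return sorted(pid for pid, s in scored.items() if s == top and s > 0)

# Hypothetical structured records.
catalog = {
    "p1": {"brand": "Acme", "model": "X100", "color": "black"},
    "p2": {"brand": "Acme", "model": "X200", "color": "black"},
}
full_offer = "New Acme X100 camera black with free case"
partial_offer = "Acme black camera"
```

The full offer identifies `p1` alone, while the partial offer ties between `p1` and `p2`, which mirrors the observation that an offer may match more than one product.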
5.4 Frameworks for entity matching: A comparison
Author: Hanna Köpcke
The functional comparison reveals a number of further research directions. All frameworks focus on offline matching, i.e. they do not yet cover online matching. The definition of the blocking key is not yet derived (semi-)automatically from training data but has to be specified manually in all considered frameworks. While attribute value matchers are well supported, the combination of context and attribute matchers is not, and should be studied further. Training-based EM frameworks should provide more support for (semi-)automatic selection of suitable training data with low labeling effort. So far, training-based approaches have only helped to optimize some decisions, e.g. determining parameters for matchers (e.g. similarity thresholds) and combination functions (e.g. weights for matchers), while other decisions (e.g. selection of the similarity functions and attributes to be evaluated) still have to be made manually. The published framework evaluations used diverse methodologies, measures, and test problems, making it difficult to assess the effectiveness and efficiency of each single system. While the reported evaluation results are usually very positive, the tests so far mostly dealt with small match problems, so the scalability of most approaches is unclear. Hence scalability to large test cases needs to be better addressed in future frameworks. Some recent work regarding scalability has focused on computational aspects of string similarity computation and time-completeness trade-offs. Furthermore, there is a strong need for comparative performance evaluations of different frameworks and EM strategies. Standardized benchmarks for entity matching are needed for comparative investigations; first proposals exist but have not yet been implemented or applied. Published evaluation results should also be reproducible by other researchers, ideally by providing the prototype implementations and test data.
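The role of a manually specified blocking key can be shown with a short sketch. The key used here (the first three letters of a name field) and the sample records are assumptions chosen for illustration; real frameworks let the user define the key per data source.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    """Manually specified key (an assumption): first three letters of the name."""
    return record["name"].lower()[:3]

def candidate_pairs(records):
    """Compare only records that share a blocking key instead of all pairs."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(ids, 2))
    return sorted(pairs)

# Hypothetical product records.
records = [
    {"name": "Canon EOS 5D"},
    {"name": "canon eos-5d"},
    {"name": "Nikon D750"},
    {"name": "Nikon D-750"},
]
pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
```

Here blocking reduces six possible comparisons to two, which is the kind of saving that matters for the scalability concerns raised above. A poorly chosen key would instead separate true matches into different blocks.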
VI. ARCHITECTURE DIAGRAM
VII. MODULES
Data Acquisition
Preprocessing
Hybrid segmentation
Named Entity Recognition
Performance Evaluation
7.1 Modules Description
7.1.1 Data Acquisition
Facebook is an online social networking service that enables users to send and read messages, images, and video posts. Registered users can read and post, but unregistered users can only read, and only if the data owner concerned grants permission. Users access Facebook through the website interface or the mobile app. To form an opinion about a user, his posts have to be examined; therefore, all posts by the user are first crawled using the Facebook API. In this study, we tried to examine the user through not only his posts but also his friends' posts. However, crawling all friends' posts is a huge overhead and can be misleading, since the Facebook following mechanism does not always indicate actual interest. People sometimes follow users for a temporary occasion and then forget to unfollow them. Sometimes they follow users just to stay informed, although they are not actually interested. There are also friends who have not posted for a long time but are still followed by the user. In this module, the datasets are uploaded as a CSV file containing the following id, followers id, time stamp, user following, user followers, and posts. The data of all Facebook consumers are examined and their entities are analyzed for better processing. The data acquired include the posts made by users, the messages sent and received, and so on.
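Loading such a CSV file could look like the sketch below. The exact column headers are not given in the text, so the names used here (`following_id`, `followers_id`, `timestamp`, `user_following`, `user_followers`, `post`) and the two sample rows are assumptions.

```python
import csv
import io

# Hypothetical sample file; the header names are assumed, not taken from the study.
SAMPLE = """following_id,followers_id,timestamp,user_following,user_followers,post
101,202,2019-08-01T10:00:00,alice,bob,Loving the new phone :)
101,203,2019-08-01T10:05:00,alice,carol,"Service was slow, not happy"
"""

def load_posts(handle):
    """Read rows into dicts keyed by column name, skipping rows with an empty post."""
    return [row for row in csv.DictReader(handle) if row["post"].strip()]

rows = load_posts(io.StringIO(SAMPLE))
```

In practice the handle would come from `open(path, newline="", encoding="utf-8")` rather than from an in-memory string.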
7.1.2 Preprocessing
For named entities to be extracted successfully, the informal writing style of posts has to be handled. Before real-world data entered our lives, studies in this area were conducted on formal texts such as news articles. Named entities are generally assumed to be words written in uppercase, or mixed-case phrases where uppercase letters appear at the beginning and end, and almost all studies are based on this assumption. However, capitalization is not a strong indicator in informal texts such as posts, and is sometimes even misleading. As the example of capitalization shows, the approaches have to be changed. To extract named entities from posts, the effect of their informality has to be minimized as far as possible.
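A minimal preprocessing sketch is given below, covering the tokenization, stop-word removal, and stemming steps that the architecture lists for this module. The stop-word list and the suffix-stripping rule are small placeholders chosen for illustration, not the resources used in the study; a real pipeline would use a full stop-word list and a proper stemmer.

```python
import re

# Tiny illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "at"}
# Crude suffix stripping standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "s")

def tokenize(text):
    """Lowercase the text so that unreliable capitalization is ignored, then split."""
    return re.findall(r"[a-z0-9']+", text.lower())

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

tokens = preprocess("The players are MEETING at Wembley Stadium")
```

Lowercasing removes the capitalization cue entirely, which is acceptable for sentiment features but means that entity detection has to rely on other evidence, as the following modules describe.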
7.1.3 Hybrid segmentation
Hybrid Segmentation (HybridSeg) learns from both global and local contexts and can learn from pseudo feedback. HybridSeg is designed to learn iteratively from confident segments as pseudo feedback. Posts are published for information sharing and communication, and named entities and semantic phrases are well preserved in them. The global context derived from Web pages therefore helps identify the meaningful segments in posts. The well-preserved linguistic features in these posts facilitate named entity recognition with high accuracy, and each named entity is a valid segment. The method utilizing local linguistic features is denoted HybridSegNER; it obtains confident segments based on the voting results of multiple off-the-shelf NER tools. Another method, utilizing local collocation knowledge and denoted HybridSegNGram, is based on the observation that many posts published within a short time period are about the same topic. HybridSegNGram segments posts by estimating the term dependency within a batch of posts. The segments recognized with high confidence from local context serve as good feedback for extracting more meaningful segments. The learning from pseudo feedback is conducted iteratively, and the method implementing the iterative learning is named HybridSegIter.
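The idea of estimating term dependency within a batch of posts can be illustrated with a simple collocation measure. The sketch below uses the Dice coefficient over adjacent word pairs with a minimum pair count; this is a stand-in chosen for brevity and is not claimed to be the measure used by HybridSegNGram. The function names, threshold, and sample batch are hypothetical.

```python
from collections import Counter

def dice_scores(batch, min_count=2):
    """Dice coefficient for adjacent word pairs seen at least min_count times."""
    unigrams, bigrams = Counter(), Counter()
    for post in batch:
        tokens = post.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return {
        pair: 2 * count / (unigrams[pair[0]] + unigrams[pair[1]])
        for pair, count in bigrams.items()
        if count >= min_count
    }

def segment(post, scores, threshold=0.8):
    """Join adjacent words whose batch-level score reaches the threshold."""
    tokens = post.lower().split()
    if not tokens:
        return []
    segments = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if scores.get((prev, cur), 0.0) >= threshold:
            segments[-1].append(cur)
        else:
            segments.append([cur])
    return [" ".join(s) for s in segments]

# Hypothetical batch of posts published in the same time window.
batch = ["new york is busy", "i love new york", "new york wins again", "busy day again"]
scores = dice_scores(batch)
segments = segment("i love new york", scores)
```

In this batch only the pair ("new", "york") recurs, so it is kept together as one segment while the other words remain separate.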
7.1.4 Named Entity Recognition
Named Entity Recognition can be basically defined as identifying and categorizing certain types of data (i.e. person, location, and organization names, date-time and numeric expressions) in a certain type of text. Tweets, on the other hand, are characteristically short and noisy. Given the length of a post and its restriction-free writing style, named entity recognition on this type of data becomes challenging. After basic segmentation, a great number of named entities in the text, such as personal names, location names, and organization names, are not yet segmented and recognized properly. Part-of-speech tagging is applicable to a wide range of NLP tasks, including named entity segmentation and information extraction. Named Entity Recognition strategies vary on basically three factors: language, textual genre and domain, and entity type. Language is very important because language characteristics affect the approaches. A simple tagging baseline is to assign each word its most frequent tag and to assign each out-of-vocabulary (OOV) word the most common POS tag. Textual genre is another concept whose effects cannot be neglected.
[Figure: architecture diagram (Section VI). Recoverable labels: Data Acquisition; Datasets; Preprocessing; Stop Removal; Stemming words analysis; Tokenization; Hybrid Segmentation; Global Context; Local Context; Pseudo Feedback; POS tagger; Named Entity Recognition; Network; Content Features; Blog Features; KNN classifiers; Trained; Rumors.]
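The tagging baseline mentioned in the Named Entity Recognition discussion, where each known word receives its most frequent tag and each out-of-vocabulary word receives the most common tag overall, can be sketched as follows. The tag names and the tiny training sample are hypothetical.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag and the overall most common tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag

def tag(tokens, lexicon, default_tag):
    """Tag known words from the lexicon; OOV words get the default tag."""
    return [(t, lexicon.get(t.lower(), default_tag)) for t in tokens]

# Hypothetical training data.
train = [
    [("alice", "NOUN"), ("visits", "VERB"), ("paris", "NOUN")],
    [("bob", "NOUN"), ("visits", "VERB"), ("london", "NOUN")],
]
lexicon, default_tag = train_baseline(train)
tagged = tag(["Carol", "visits", "Rome"], lexicon, default_tag)
```

Here "Carol" and "Rome" are out of vocabulary and therefore receive the most common tag, while "visits" keeps the tag it had in training.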
7.1.5 Performance Evaluation
In this module, the system is evaluated using accuracy rate and normalized utility. Our proposed system provides improved accuracy rate and normalized utility. Once a message from the sender has been received by the receiver, the system analyzes it for negative and positive words. If negative words are present, it warns the user, and if this continues more than 3 or 4 times, it makes a suggestion and blocks the users associated with the negative content. In addition, posts shared by the data owner can be public or private. In public mode, posts can be viewed by all users in the data owner's profile, whereas in private mode only users permitted by the owner can access the posts.
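The warn-then-block behaviour and the public/private visibility rule described above can be sketched as follows. The negative-word lexicon, the limit of three warnings, and all class, function, and field names are assumptions made for illustration; the text gives the limit only as "more than 3 or 4 times".

```python
import re
from collections import defaultdict

NEGATIVE_WORDS = {"hate", "stupid", "awful"}  # placeholder lexicon (assumption)
MAX_WARNINGS = 3  # assumed limit; a sender is blocked on the next offence

class Moderator:
    def __init__(self):
        self.strikes = defaultdict(int)
        self.blocked = set()

    def check(self, sender, message):
        """Return 'blocked', 'warned', or 'ok' for one incoming message."""
        if sender in self.blocked:
            return "blocked"
        words = set(re.findall(r"[a-z']+", message.lower()))
        if not words & NEGATIVE_WORDS:
            return "ok"
        self.strikes[sender] += 1
        if self.strikes[sender] > MAX_WARNINGS:
            self.blocked.add(sender)
            return "blocked"
        return "warned"

def can_view(post, viewer):
    """Public posts are visible to all; private ones only to the owner and permitted users."""
    return (post["visibility"] == "public"
            or viewer == post["owner"]
            or viewer in post["allowed"])

moderator = Moderator()
outcomes = [moderator.check("u1", "I hate this") for _ in range(4)]
private_post = {"owner": "alice", "visibility": "private", "allowed": {"bob"}}
```

In this sketch the first three negative messages produce warnings and the fourth blocks the sender, after which every further message from that sender is rejected.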
VIII. CONCLUSION
The increase in the use of computers and the internet has led to a marked increase in methods of information extraction from social media. In this study, the information access and interpretation steps used in the literature were investigated in detail. The sentiment analysis was conducted on Facebook data: obtaining the data, cleaning it, transforming it into numerical form, extracting meaningful results, and interpreting them. Machine learning algorithms were applied using the KNIME software.
In this study Decision Tree Learner and K-NN algorithms showing the accuracy sensitivity sensitivity and selectivity
values have been examined in detail in two different experimental sets The results obtained were compared in detail with the help of
tables In the future studies it is aimed to carry out sentiment analysis studies on different sets using different machine learning and
intelligent optimization algorithms In order to increase value of accuracy it is foreseen to prepare more suitable data sets to increase
the accuracy rate of the studies Using intelligent search and optimization algorithms with optimized parameters may also be used
with integrated feature selection methods to increase the sentiment analysis performances We designed novel features for use in the
classification of posts in order to develop a system through which informational data may be filtered from the conversations which
are not of much value in the context of searching for immediate information for relief efforts or bystanders to utilize in order to
minimize damages The results of our experiments show that classifying tweets as ldquorumorrdquo vs ldquonon rumorrdquo can use solely the proposed
features if computing resources are concerned since the computing power required to process data into featured is immensely
decreased in comparison to a BOW feature set which contains a substantially larger number of features However if computing power and time necessary to process incoming Facebook data are not a concern a combined feature set of the proposed features and BOW-
presence approach will maximize overall accuracy
IX FUTURE ENHANCEMENT
In future work we can extend our approach implement various classification algorithm to predict the attackers and also
eliminate the attackers from facebook datasets And try this approach to implement in various languages in facebook At the same it
can be extended to analyze not only texts but also images videos and so on So that the exact scenario of entire users and their entities
are managed with proper efficiency and avoids inappropriate medias
REFERENCES
[1] John A. H. (2008). Online shopping. Pew Internet & American Life Project Report.
[2] Com S., Kelsey G. (2007). Online consumer-generated reviews have significant impact on offline purchase behavior. Press Release, November.
[3] Chen B., Leilei Z., Daniel K., Dongwon L. (2010). What is an opinion about? Exploring political standpoints using opinion scoring model. In Proceedings of the AAAI Conference on Artificial Intelligence.
[4] Asur S., Bernardo A. Huberman (2010). Predicting the future with social media. arXiv preprint arXiv:1003.5699.
[5] Joshi M., Dipanjan D., Kevin G., Noah A. S. (2010). Movie reviews and revenues: An experiment in text regression. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Conference (NAACL).
[6] Sadikov E., Parameswaran A., Petros V. (2009). Blogs as predictors of movie success. In Proceedings of the Third International Conference on Weblogs and Social Media (ICWSM).
[7] Nizam H. (2016). Sosyal Medyada Makine Öğrenmesi ile Duygu Analizinde Dengeli ve Dengesiz Veri Setlerinin Performanslarının Karşılaştırılması. International Artificial Intelligence and Data Processing Symposium (IDAP16).
[8] Bilge U., Bozkurt S., Oğuz Y. B., Özel D. (2011). Sosyal medya araçları Türkiye'deki grip benzeri hastalıkları saptayabilmek için kullanılabilir mi? XVI. Türkiye'de İnternet Konferansı, İzmir.
[9] Kıvanç Y. (2015). Sosyal Ağlar Üzerinden Deprem Tespiti. XVII. Akademik Bilişim Konferansı, Boğaziçi Üniversitesi.
[10] Sütcü S., Bayrakçı S. (2014). Sosyal Medya Gazeteleri Nasıl Etkiliyor? Haberlerin Twitter'da Yayılması Üzerine Bir Araştırma. The Turkish Online Journal of Design, Art and Communication (TOJDAC), April 2014, Volume 4, Issue 2, 40-52.
BIOGRAPHIES
A. Anitha, M.Sc.
Research Scholar
Department of Computer Science
Thanthai Hans Roever College
Perambalur
Dr. S. Sivakumar, M.C.A., M.Phil., Ph.D.
Head and Asst. Professor, Department of Computer Science
Thanthai Hans Roever College
Perambalur
factors affect consumer behavior relative to social networking. Despite the initial progress made by researchers, development in this area of study has been limited.
II. DOMAIN INTRODUCTION
Big Data
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications. The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization and privacy violations. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to spot business trends, prevent diseases, combat crime and so on.
Scientists regularly encounter limitations due to large data sets in many areas, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. The limitations also affect Internet search, finance and business informatics. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, every day 2.5 exabytes (2.5×10^18 bytes) of data were created. The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization.
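To make the growth figures above concrete, the following is a minimal arithmetic sketch. It assumes an idealized exponential with a fixed 40-month doubling period; the function name `growth_factor` is introduced here for illustration only.

```python
DOUBLING_PERIOD_MONTHS = 40      # doubling period quoted in the text
BYTES_PER_DAY_2012 = 2.5e18      # 2.5 exabytes per day, as quoted for 2012


def growth_factor(months, doubling_period=DOUBLING_PERIOD_MONTHS):
    """Factor by which capacity grows after `months`, assuming steady doubling."""
    return 2 ** (months / doubling_period)


# Ten years is 120 months, i.e. three doubling periods.
ten_year_factor = growth_factor(120)

# Data created over one 365-day year at the 2012 daily rate.
bytes_per_year_2012 = BYTES_PER_DAY_2012 * 365
```

Under these assumptions, capacity grows eightfold in ten years, and a year at the 2012 rate amounts to roughly 9×10^20 bytes.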
Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead massively parallel software running on tens, hundreds or even thousands of servers. What is considered big data varies depending on the capabilities of the organization managing the set and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain. Big data is a moving target: what is considered "big" today will not be so years ahead. For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage and process data within a tolerable elapsed time. Big data size is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are diverse, complex and of a massive scale.
In a 2001 research report and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out) and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this 3Vs model for describing big data. In 2012 Gartner updated its definition as follows: big data is high-volume, high-velocity and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization. Additionally, a new V, veracity, is added by some organizations to describe it. While Gartner's definition (the 3Vs) is still widely used, the growing maturity of the concept fosters a sounder distinction between big data and Business Intelligence regarding data and their use.
Business Intelligence uses descriptive statistics on data with high information density to measure things, detect trends, etc. Big data uses inductive statistics and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships and causal effects) from large sets of data with low information density, in order to reveal relationships and dependencies and to perform predictions of outcomes and behaviors. Big data can also be defined as a large volume of unstructured data which cannot be handled by standard database management systems such as DBMS, RDBMS or ORDBMS.
Big data can be described by the following characteristics:
Volume - The quantity of data that is generated is very important in this context. It is the size of the data which determines the value and potential of the data under consideration, and whether it can actually be considered big data or not. The name "Big Data" itself contains a term related to size, hence the characteristic.
Variety - The category to which big data belongs is also an essential fact that needs to be known by the data analysts. This helps the people who closely analyze the data to use it effectively to their advantage, thus upholding the importance of the big data.
Velocity - The term "velocity" in this context refers to the speed of generation of data, or how fast the data is generated and processed to meet the demands and challenges that lie ahead in the path of growth and development.
Variability - This factor can be a problem for those who analyze the data. It refers to the inconsistency which the data can show at times, hampering the process of handling and managing the data effectively.
Veracity - The quality of the data being captured can vary greatly. Accuracy of analysis depends on the veracity of the source data.
Complexity - Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected and correlated in order to grasp the information they are supposed to convey. This situation is therefore termed the "complexity" of big data.
Big data analytics enables organizations to analyze a mix of structured, semi-structured and unstructured data in search of valuable business information and insights.
III. EXISTING SYSTEM
Long before awareness of the Internet became widespread, many of us asked friends to recommend a television or to say who they planned to vote for in local elections, asked colleagues for reference letters for job applications, or asked which dishwasher to buy. Today, however, the development of internet technologies has given us the opportunity to discover the views and experiences of both personal acquaintances and well-known professional critics. Studies show that more and more people are starting to present their opinions to strangers on the internet. Opinions, measurements, evaluations, attitudes, interpretations and the concepts related to them are the areas of study of sentiment analysis and opinion mining. The rapid growth of these workspaces has helped to increase the use of forums, discussion and dating pages, blogs, microblogs and other social media tools among people. With the increasing use of the Internet, communication infrastructure has undergone radical changes. The social sharing sites that emerged with the development of information technologies have taken an important place in human life. Among the most popular social networking sites on the planet are web sites and applications such as Facebook, Instagram, YouTube, Google Play and Vine, along with blog, microblog, social networking and social bookmarking services. All these services come together to form the social media structure.
In addition to real-life applications, research articles have also been published in the field of sentiment analysis. For example, Leilei and his team conducted a sentiment analysis study using Facebook data to predict election results. Different sentiment analysis studies were conducted by Bernardo, Mahesh and Venetis using Facebook data, movie reviews and blogs to estimate the box office revenue of films. Ozel conducted a survey using Limesurvey, a web-based survey tool. In that study the effect of employees' use of Facebook on the company profile was analyzed. The survey was tweeted and retweeted by 10 different Facebook accounts to reach two thousand Facebook users, and the data obtained were analyzed statistically. Sentiment analysis studies have also been carried out in social networks using supervised machine learning approaches. In one such study, the data set, consisting of comments on various products of some food companies on Facebook, was obtained manually. Oguz and his team used Facebook messages and newspaper sites to investigate the detection of influenza-like illnesses through social media. The Facebook data were collected using Topsy, a free real-time search engine application developed for social media. Yazan and Uskudarli performed earthquake detection through social networks. The Streaming API developed by Facebook was used to get the data.
Disadvantages of the existing system
Event detection and summarization, opinion mining, sentiment analysis and many other applications suffer from the short and noisy nature of posts.
Because of the limited length of a tweet (i.e. 140 characters) and the absence of restrictions on writing style, tweets often contain grammatical errors, misspellings and informal abbreviations.
On the other hand, despite the noisy nature of tweets, the core semantic information is well preserved in tweets in the form of named entities or semantic phrases.
IV. PROPOSED SYSTEM
In this study, sentiment analysis was carried out on Facebook data. The first step of the study is to collect data. It is known that collecting the data and building the data sets take the most time and effort for researchers who work on social media. There are many different types of data collection on social networks, and previous studies offer many tools and methods for it. Among these, the most commonly used are custom-designed APIs, web crawling, web scraping operations and scripts. This study aims to obtain the data sets in a meaningful and regular manner based on Facebook data and to carry out sentiment analysis work on them.
In this study the tagged Facebook data set named Sentiment 140 was used. This set was created by Stanford University Computer Science graduate students Alec Go, Richa Bhayani and Lei Huang. The data set contains about 1.6 million Facebook data items tagged as positive or negative. When these data were collected and tagged, the emoticon contained in each item was used. For example, a smiley is given a positive tag because it expresses happiness. Likewise, a sad emoticon is given a negative tag because it expresses sadness. Ten thousand Facebook data items were used in this study. Two different data sets were created, each containing 5 thousand items. The number of items with positive and negative tags in each set was calculated.
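The emoticon-based tagging and the two-way split described above can be sketched as follows. This is only an illustration under stated assumptions: the emoticon lists, the rule of skipping posts with mixed or no emoticons, and the helper names `label_by_emoticon` and `split_into_sets` are hypothetical and are not taken from the Sentiment 140 tooling or from KNIME.

```python
import random

# Hypothetical emoticon lists; the lists used to build the real data set may differ.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def label_by_emoticon(text):
    """Return 'positive', 'negative', or None when the emoticons give no clear signal."""
    has_pos = any(e in text for e in POSITIVE_EMOTICONS)
    has_neg = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # no emoticon at all, or both kinds present
        return None
    return "positive" if has_pos else "negative"


def split_into_sets(items, n_sets=2, seed=0):
    """Shuffle the items reproducibly and cut them into equally sized, disjoint sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // n_sets
    return [shuffled[i * size:(i + 1) * size] for i in range(n_sets)]


# Ten thousand item ids split into two experimental sets of five thousand each.
experimental_sets = split_into_sets(range(10_000))
```

Counting the positive and negative tags in each resulting set then reduces to applying `label_by_emoticon` to every item and tallying the results.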
Advantages
Reduces noisy and irrelevant words that are not associated with the users and maintains their privacy.
Maintains data and information with better security and reduces negative words to an extent.
Provides security measures by avoiding or blocking negative words, and improves efficiency.
V. LITERATURE SURVEY
5.1 Improving Entity Resolution with Global Constraints
Author: Jim Gemmell
This paper investigates an online socio-economic property that, to the authors' knowledge, has never been exploited: sites listing entities have an incentive to avoid gratuitous duplicates. For instance, duplicate movies in IMDB would have reviews and corrections applied to one copy and not the other. If Netflix has one entry for a DVD and a duplicate for the Blu-ray version, then its customers might be looking at one and not realize the other is available. Hulu supports Facebook likes for its movies and could have the like counts diluted by duplicates. Additional examples in other domains are easily constructed. The authors leverage this socio-economic property to resolve entities across different web sites by applying a global one-to-one constraint to the produced matchings. The resulting resolution has much better accuracy compared to matching without such a constraint. The framework for one-to-one entity resolution (ER) is generic in that it can constrain existing resolution methods using weighted graph matching. The goal is not to engineer features or tune high-performance domain-specific ER, but rather to develop generic algorithms that can be combined with existing methods to improve retrieval performance. The purpose of the paper is not to investigate alternate scoring functions but rather to explore generic algorithms for constrained ER. It describes the abstract ER problem, the framework for including particular scoring approaches, and several generic algorithms for constrained ER.
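A one-to-one constraint on top of an arbitrary scoring function can be illustrated with a greedy matching. This is a simplified sketch, not the algorithm of the surveyed paper: the `difflib` string similarity stands in for whatever scoring approach is plugged in, and the greedy pass only approximates a maximum-weight graph matching. The names `similarity`, `one_to_one_match` and `min_score` are introduced here for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in scoring function: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def one_to_one_match(left, right, min_score=0.5):
    """Greedily pair entities so that each one appears in at most one match."""
    scored = sorted(
        ((similarity(a, b), a, b) for a in left for b in right),
        reverse=True,
    )
    used_left, used_right, matches = set(), set(), []
    for score, a, b in scored:
        if score < min_score:
            break  # remaining pairs score even lower
        if a not in used_left and b not in used_right:
            matches.append((a, b))
            used_left.add(a)
            used_right.add(b)
    return matches


site_a = ["Alien", "Aliens"]
site_b = ["aliens", "alien"]
matches = one_to_one_match(site_a, site_b)
```

Because every entity can be used only once, a near-duplicate title on one site cannot absorb several entries from the other site, which is the effect the one-to-one constraint is meant to achieve.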
5.2 Entity Resolution: Theory, Practice & Open Challenges
Author: Lise Getoor
The paper begins by introducing a simple abstraction for the entity resolution problem. ER is categorized by the type of input: single-entity ER, where all mentions correspond to a single entity type; relational ER, where real-world entities are linked (as in a social network); and multi-entity ER, the most general problem, with potentially linked mentions of different entity types (e.g. products, sellers and reviews). The authors survey classical techniques for ER, which assume that there exists a distance function between pairs of mentions. These techniques can be broadly classified as pair-wise ER, where the decision to match a pair of mentions is made independently of other mentions, and cluster-based ER, where equivalence classes of entities are constructed via clustering. Pair-wise ER is well suited to the problem of aligning two databases of the same set of entities (e.g. lists of restaurants from two sites). They survey common algorithms for computing similarity functions between mentions, and rule-based and probabilistic methods for pair-wise and cluster-based ER. They also discuss techniques from the database and machine learning communities for computing cluster representatives, also known as canonical entities. The section concludes by discussing state-of-the-art collective probabilistic inference techniques for multi-entity ER. These techniques are becoming popular because of the abundance of redundant, linked mentions of entities on the Web; techniques that consider only one entity type and ignore links perform poorly. The approaches described are based on multi-relational clustering algorithms, probabilistic generative models, and probabilistic logical languages, e.g. Markov logic networks and probabilistic soft logic.
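The difference between pair-wise and cluster-based ER can be shown with a small sketch. It assumes a token-level Jaccard similarity as the pairwise function and builds equivalence classes as the transitive closure of the pairwise decisions using union-find; both choices, the threshold and the function names are illustrative assumptions rather than the methods surveyed in the paper.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two mentions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta | tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_mentions(mentions, threshold=0.5):
    """Group mentions into equivalence classes via pairwise matches and union-find."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pair-wise step: each decision looks only at the two mentions involved.
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if jaccard(mentions[i], mentions[j]) >= threshold:
                parent[find(i)] = find(j)

    # Cluster step: mentions sharing a root form one entity.
    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


restaurants = ["Joe's Pizza", "joe's pizza nyc", "Blue Hill", "blue hill restaurant", "Katz Deli"]
entities = cluster_mentions(restaurants)
```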
5.3 Matching Unstructured Product Offers to Structured Product Specifications
Author: Anitha Kannan
The setting is a large database of product specifications. Each product specification (interchangeably called a "product") consists of a set of attribute ⟨name, value⟩ pairs and is represented in the database as a structured record. Some of the attributes can be numeric while the others can be categorical. The unstructured offer descriptions (called "offers" for short) consist of free text. The text has embedded in it some of the values, and possibly some attribute names, corresponding to one of the products. The text may also contain additional words. The attribute names and values in the text may not precisely match those found in the database. The text does not contain an identifier that uniquely identifies the corresponding product. Different textual descriptions may be provided for the same product. An offer may match more than one product, because only partial descriptions are provided in the offers and because the same real-world product might have multiple representations in the product database. The authors performed extensive experiments using the Bing Shopping catalog to understand the performance characteristics of the proposed solution. The experimental results show that the proposed approach scores high on F-measure and consistently beats baseline approaches for product categories that have reasonably rich attribute structure and good data. They also point to the desirability of hybrid solutions that additionally make use of classical text matching techniques for attribute-impoverished product categories. The methodology employed for analyzing the experimental results might also be of interest to those building and analyzing web-scale systems.
5.4 Frameworks for Entity Matching: A Comparison
Author: Hanna Köpcke
The functional comparison reveals a number of further research directions. All frameworks focus on offline matching, i.e. they do not yet cover online matching. The definition of the blocking key is not yet derived (semi-)automatically from training data but has to be specified manually in all considered frameworks. While attribute value matchers are well supported, the combination of context and attribute matchers is not and should be studied further. Training-based entity matching (EM) frameworks should provide more support for (semi-)automatic selection of suitable training data with low labeling effort. So far, training-based approaches have only helped to optimize some decisions, e.g. determining parameters for matchers (such as similarity thresholds) and combination functions (such as weights for matchers), while other decisions (e.g. selection of the similarity functions and attributes to be evaluated) still have to be made manually. The published framework evaluations used diverse methodologies, measures and test problems, making it difficult to assess the effectiveness and efficiency of each single system. While the reported evaluation results are usually very positive, the tests so far mostly dealt with small match problems, so the scalability of most approaches is unclear. Hence scalability to large test cases needs to be better addressed in future frameworks. Some recent work regarding scalability has focused on computational aspects of string similarity computation and on time-completeness trade-offs. Furthermore, there is a strong need for comparative performance evaluations of different frameworks and EM strategies. Standardized benchmarks for entity matching are needed for comparative investigations; first proposals exist but have not yet been implemented or applied. Published evaluation results should also be reproducible by other researchers, ideally through provision of the prototype implementations and test data.
VI. ARCHITECTURE DIAGRAM
VII. MODULES
Data Acquisition
Preprocessing
Hybrid Segmentation
Named Entity Recognition
Performance Evaluation
7.1 Modules Description
7.1.1 Data Acquisition
Facebook is an online social networking service that enables users to send and read messages, images and video posts. Registered users can read and post, whereas unregistered users can only read, and only if the data owner concerned grants permission. Users access Facebook through the website interface or a mobile device app. In order to form an opinion about a user, his posts have to be examined. Therefore, all posts by the user are first crawled using the Facebook API. In this study we tried to examine the user through not only his posts but also his friends' posts. However, crawling all friends' posts is a huge overhead and can be misleading, since the Facebook following mechanism does not always reflect actual interest. People sometimes follow users for a temporary occasion and then forget to unfollow them. Sometimes they follow users just to stay informed, although they are not actually interested in them. There are also friends who have not posted for a long time but are still followed by the user. In this module the data sets are uploaded as a CSV file containing the following id, followers id, time stamp, user following, user followers and posts. The data of the Facebook users are examined and their entities analyzed. The acquired data include the posts made by the users, the messages sent and received, and so on.
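Loading such a CSV file can be sketched with the standard library. The column names in `FIELDS` are assumptions derived from the field list above; the actual headers of the study's file are not given in the paper, and the sample rows are invented for illustration.

```python
import csv
import io

# Assumed header names for the six fields listed in the text.
FIELDS = ["following_id", "followers_id", "timestamp",
          "user_following", "user_followers", "post"]


def load_posts(csv_file):
    """Read the uploaded data set and return one dict per row, after checking the header."""
    reader = csv.DictReader(csv_file)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [dict(row) for row in reader]


# Invented sample data standing in for an uploaded file.
sample = io.StringIO(
    "following_id,followers_id,timestamp,user_following,user_followers,post\n"
    "u1,u2,2019-08-01T10:00:00,12,30,Great day at the beach\n"
    "u2,u1,2019-08-01T11:30:00,8,15,Stuck in traffic again\n"
)
rows = load_posts(sample)
```

In practice `csv_file` would be an open file handle, for example `open(path, newline="", encoding="utf-8")`.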
7.1.2 Preprocessing
For named entities to be extracted successfully, the informal writing style of posts has to be handled. Before such data entered our lives, studies in the area were conducted on formal texts such as news articles. Named entities are generally assumed to be words written in uppercase, or mixed-case phrases whose first and last words begin with uppercase letters, and almost all studies are based on this assumption. However, capitalization is not a strong indicator in informal texts such as posts, and is sometimes even misleading. As the example of capitalization shows, the approaches have to be changed. To extract named entities from posts, the effect of their informality has to be minimized as far as possible.
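The preprocessing steps named in the architecture (tokenization, stop word removal and stemming) can be sketched as below. This is a deliberately crude illustration: the stop word list and the suffix-stripping stemmer are assumptions made for this example, and a real pipeline would more likely rely on the text-processing nodes of KNIME or an established stemmer.

```python
import re

# Small illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]


tokens = preprocess("The users are POSTING angry comments!")
```

Lowercasing every token reflects the point made above that capitalization is an unreliable signal in informal posts.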
7.1.3 Hybrid Segmentation
Hybrid Segmentation (HybridSeg) learns from both global and local contexts and is able to learn from pseudo feedback. It is designed to learn iteratively from confident segments, which serve as the pseudo feedback. Posts are written for information sharing and communication, so named entities and semantic phrases are well preserved in them. The global context derived from Web pages therefore helps to identify the meaningful segments in posts. The well preserved linguistic features in these posts facilitate named entity recognition with high accuracy, and each named entity is a valid segment. The method that uses local linguistic features is denoted HybridSegNER. It obtains confident segments based on the voting results of multiple off-the-shelf NER tools. Another method, which uses local collocation knowledge and is denoted HybridSegNGram, is based on the observation that many posts published within a short time period are about the same topic. HybridSegNGram segments the posts by estimating the term dependency within a batch of posts. The segments recognized with high confidence from the local context serve as good feedback for extracting more meaningful segments. Learning from pseudo feedback is conducted iteratively, and the method implementing the iterative learning is named HybridSegIter.
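The idea of estimating term dependency within a batch of posts can be illustrated with a much simplified sketch. It is not the HybridSegNGram method itself: it merges only adjacent word pairs, scores them with pointwise mutual information (PMI) computed from the batch, and uses an arbitrary threshold. The function name and parameters are assumptions made for this example.

```python
import math
from collections import Counter


def segment_batch(posts, threshold=1.0, min_count=2):
    """Merge adjacent words whose PMI, estimated from the batch, exceeds `threshold`."""
    tokenized = [post.lower().split() for post in posts]
    unigrams = Counter(word for tokens in tokenized for word in tokens)
    bigrams = Counter(pair for tokens in tokenized for pair in zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    def pmi(a, b):
        joint = bigrams[(a, b)]
        if joint < min_count:  # too rare in this batch to trust
            return float("-inf")
        return math.log((joint * total) / (unigrams[a] * unigrams[b]))

    segmented = []
    for tokens in tokenized:
        segments, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and pmi(tokens[i], tokens[i + 1]) > threshold:
                segments.append(tokens[i] + " " + tokens[i + 1])
                i += 2
            else:
                segments.append(tokens[i])
                i += 1
        segmented.append(segments)
    return segmented


batch = [
    "new york is crowded",
    "i love new york",
    "the weather is nice",
    "i love the weather",
]
segments = segment_batch(batch)
```

Word pairs that recur together within the batch, such as "new york", are kept as one segment, while pairs seen only once are left split.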
[Figure: architecture diagram. Data Acquisition (datasets); Preprocessing (stop word removal, stemming, word analysis, tokenization); Hybrid Segmentation (global context, local context, pseudo feedback); POS tagger; Named Entity Recognition; network, content and blog features; KNN classifiers; trained rumors.]
7.1.4 Named Entity Recognition
Named Entity Recognition can be defined as identifying and categorizing certain types of data (i.e. person, location and organization names, and date-time and numeric expressions) in a certain type of text. Tweets, on the other hand, are characteristically short and noisy. Given the short length of posts and their unrestricted writing style, named entity recognition on this type of data becomes challenging. After basic segmentation, a great number of named entities in the text, such as personal names, location names and organization names, are not yet segmented and recognized properly. Part-of-speech tagging is applicable to a wide range of NLP tasks, including named entity segmentation and information extraction. Named Entity Recognition strategies vary on basically three factors: language, textual genre and domain, and entity type. Language is very important because language characteristics affect the approaches. A simple part-of-speech baseline assigns each word its most frequent tag and assigns each out-of-vocabulary (OOV) word the most common POS tag. Textual genre is another factor whose effects cannot be neglected.
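The baseline tagging rule mentioned above (most frequent tag per word, most common tag overall for out-of-vocabulary words) can be sketched as follows. The tiny training corpus and its Penn Treebank style tags are invented for illustration; the paper does not specify a tagged corpus or tag set.

```python
from collections import Counter, defaultdict


def train_baseline_tagger(tagged_sentences):
    """Learn each word's most frequent tag and the overall most common tag."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tags[word.lower()][tag] += 1
            all_tags[tag] += 1
    most_frequent = {w: counts.most_common(1)[0][0] for w, counts in word_tags.items()}
    default_tag = all_tags.most_common(1)[0][0]
    return most_frequent, default_tag


def tag_words(words, most_frequent, default_tag):
    """Tag known words with their most frequent tag and OOV words with the default tag."""
    return [(w, most_frequent.get(w.lower(), default_tag)) for w in words]


# Invented training data for illustration only.
training = [
    [("Anitha", "NNP"), ("posts", "VBZ"), ("photos", "NNS")],
    [("Kumar", "NNP"), ("likes", "VBZ"), ("posts", "NNS")],
    [("Anitha", "NNP"), ("posts", "VBZ"), ("Perambalur", "NNP")],
]
most_frequent, default_tag = train_baseline_tagger(training)
tagged = tag_words(["Sivakumar", "posts"], most_frequent, default_tag)
```

Here "posts" is seen more often as a verb than as a noun, and the unseen word "Sivakumar" falls back to the most common tag in the training data.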
7.1.5 Performance Evaluation
In this module the system is evaluated using accuracy rate and normalized utility. The proposed system provides an improved accuracy rate and normalized utility. Once a message from the sender has been received by the receiver, the system analyzes it for negative and positive words. If negative words are present it warns the user, and if this continues more than 3 or 4 times it makes a suggestion and blocks the users associated with the negative content. In addition, posts shared by the data owner can be shared as public or private. In public mode the posts can be viewed by all users in the data owner's profile, whereas in private mode only the users permitted by the owner can access the posts.
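The warn-then-block behaviour described above can be sketched as a small state machine. The negative word lexicon and the class name `MessageMonitor` are hypothetical, and the threshold of 3 is one reading of "more than 3 or 4 times" in the text.

```python
from collections import defaultdict

NEGATIVE_WORDS = {"hate", "stupid", "ugly"}  # hypothetical lexicon
BLOCK_AFTER = 3  # assumed threshold; the text says "more than 3 or 4 times"


class MessageMonitor:
    """Counts negative messages per sender, warning first and blocking on repetition."""

    def __init__(self):
        self.strikes = defaultdict(int)
        self.blocked = set()

    def check(self, sender, text):
        """Return 'ok', 'warning' or 'blocked' for an incoming message."""
        if sender in self.blocked:
            return "blocked"
        words = {w.strip(".,!?").lower() for w in text.split()}
        if not words & NEGATIVE_WORDS:
            return "ok"
        self.strikes[sender] += 1
        if self.strikes[sender] > BLOCK_AFTER:
            self.blocked.add(sender)
            return "blocked"
        return "warning"


monitor = MessageMonitor()
first = monitor.check("user1", "I hate this!")
```

A sender who keeps sending negative messages receives warnings up to the threshold and is blocked on the next offence, after which all of that sender's messages are rejected.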
VIII. CONCLUSION
The increase in the use of computers and the Internet has caused a marked increase in the methods of information extraction from social media. In this study, the information access and interpretation steps used in the literature have been investigated in detail. The sentiment analysis work was conducted on Facebook data. The steps performed were obtaining the Facebook data, cleaning the data, transforming the data into numerical form, extracting meaningful results and interpreting them. The machine learning algorithms were applied using the KNIME software.
In this study, the accuracy, sensitivity and selectivity values of the Decision Tree Learner and K-NN algorithms have been examined in detail in two different experimental sets. The results obtained were compared in detail with the help of tables. In future studies, it is aimed to carry out sentiment analysis on different data sets using different machine learning and intelligent optimization algorithms. To increase the accuracy rate of the studies, it is foreseen that more suitable data sets will be prepared. Intelligent search and optimization algorithms with optimized parameters may also be used with integrated feature selection methods to increase sentiment analysis performance. We designed novel features for use in the classification of posts in order to develop a system through which informational data may be filtered from conversations that are of little value when searching for immediate information that relief efforts or bystanders can use to minimize damages. The results of our experiments show that classifying tweets as "rumor" vs. "non-rumor" can rely solely on the proposed features if computing resources are a concern, since the computing power required to process data into features is greatly decreased in comparison to a BOW feature set, which contains a substantially larger number of features. However, if the computing power and time necessary to process incoming Facebook data are not a concern, a combined feature set of the proposed features and the BOW-presence approach will maximize overall accuracy.
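The evaluation measures named above can be illustrated with a minimal sketch. It assumes that "selectivity" corresponds to what is usually called specificity (the true negative rate) and that the task is binary (positive vs. negative); the function names and label strings are hypothetical and are not taken from the KNIME workflow used in the study.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive="positive"):
    """Return accuracy, sensitivity and specificity as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        # share of all predictions that are correct
        "accuracy": (tp + tn) / total if total else 0.0,
        # share of actual positives that were found
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # share of actual negatives that were found (assumed "selectivity")
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

Calling `evaluate` once per classifier and per experimental set would give the kind of values that the comparison tables report.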
IX. FUTURE ENHANCEMENT
In future work, we can extend our approach by implementing various classification algorithms to predict attackers and to eliminate them from Facebook data sets. We can also try to apply this approach to the various languages used on Facebook. In addition, it can be extended to analyze not only text but also images, videos and so on, so that the complete picture of all users and their entities is managed efficiently and inappropriate media are avoided.
REFERENCES
[1] John A. H. (2008). Online shopping. Pew Internet & American Life Project Report.
[2] Com S., Kelsey G. (2007). Online consumer-generated reviews have significant impact on offline purchase behavior. Press Release, November.
[3] Chen B., Leilei Z., Daniel K., Dongwon L. (2010). What is an opinion about? Exploring Political Standpoints Using Opinion Scoring Model. In Proceedings of AAAI Conference on Artificial Intelligence.
[4] Asur S., Bernardo A. Huberman (2010). Predicting the future with social media. arXiv preprint arXiv:1003.5699.
[5] Joshi M., Dipanjan D., Kevin G., Noah A. S. (2010). Movie reviews and revenues: An experiment in text regression. In Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL).
[6] Sadikov E., Parameswaran A., Petros V. (2009). Blogs as predictors of movie success. In Proceedings of the Third International Conference on Weblogs and Social Media (ICWSM).
[7] Nizam H. (2016). Sosyal Medyada Makine Öğrenmesi ile Duygu Analizinde Dengeli ve Dengesiz Veri Setlerinin Performanslarının Karşılaştırılması. International Artificial Intelligence and Data Processing Symposium (IDAP'16).
[8] Bilge U., Bozkurt S., Oğuz Y. B., Özel D. (2011). Sosyal medya araçları Türkiye'deki grip benzeri hastalıkları saptayabilmek için kullanılabilir mi? XVI. Türkiye'de İnternet Konferansı, İzmir.
[9] Kıvanç Y. (2015). Sosyal Ağlar Üzerinden Deprem Tespiti. XVII. Akademik Bilişim Konferansı, Boğaziçi Üniversitesi.
[10] Sütcü S., Bayrakçı S. (2014). Sosyal Medya Gazeteleri Nasıl Etkiliyor? Haberlerin Twitter'da Yayılması Üzerine Bir Araştırma. The Turkish Online Journal of Design, Art and Communication – TOJDAC, April 2014, Volume 4, Issue 2, 40-52.
BIOGRAPHIES
A. Anitha, M.Sc.
Research Scholar
Department of Computer Science
Thanthai Hans Roever College
Perambalur
Dr. S. Sivakumar, M.C.A., M.Phil., Ph.D.
Head and Asst. Professor, Department of Computer Science
Thanthai Hans Roever College
Perambalur
Veracity - The quality of the data being captured can vary greatly. The accuracy of analysis depends on the veracity of the source data.
Complexity - Data management can become a very complex process, especially when large volumes of data come from multiple sources. These data need to be linked, connected and correlated in order to grasp the information they are supposed to convey. This situation is therefore termed the 'complexity' of Big Data. Big data analytics enables organizations to analyze a mix of structured, semi-structured and unstructured data in search of valuable business information and insights.
III. EXISTING SYSTEM
Long before awareness of the Internet became widespread, many of us asked our friends to recommend a television, asked whom they planned to vote for in local elections, asked colleagues for reference letters for job applications, or asked which dishwasher to buy. Today, however, the development of Internet technologies has given us the opportunity to discover the views and experiences of both personal acquaintances and well-known professional critics. Studies show that more and more people are starting to present their opinions to strangers on the Internet. Opinions, measurements, evaluations, attitudes, interpretations and the concepts related to them are the areas of study of sentiment analysis and opinion mining. The rapid growth of workspaces has helped to increase the use of forums, discussion and dating pages, blogs, microblogs and other social media tools among people. With the increasing use of the Internet, communication infrastructure has undergone radical changes. The social sharing sites that emerged with the development of information technologies have taken an important place in human life. Among the most popular social networking sites on the planet are web sites and applications such as Facebook, Instagram, YouTube, Google Play and Vine, along with blog, microblog, social networking and social bookmarking services. All these services together make up the Social Media structure.
In addition to real-life applications, research articles have also been published in the field of sentiment analysis. For example, Leilei and his team conducted a sentiment analysis study using Facebook data to predict election results. Different sentiment analysis studies were conducted by Bernardo, Mahesh and Venetis using Facebook data, movie reviews and blogs to estimate the box office revenue of films. Ozel conducted a survey using a software tool called Limesurvey, a web-based survey interface. In that study, the effect of employees' Facebook use on the company profile was analyzed. The survey was tweeted and retweeted by 10 different Facebook accounts to reach two thousand Facebook users. The obtained data were analyzed statistically. Sentiment analysis studies using the supervised learning approach from machine learning have also been carried out in social networks. In one such study, a data set consisting of comments on various products of some food companies on Facebook was obtained manually. Oguz and his team used Facebook messages and newspaper sites to investigate the detection of influenza-like illnesses through social media. The Facebook data were collected using Topsy, a free real-time search engine application developed for social media. Yazan and Uskudarli performed earthquake detection through social networks. The Streaming API developed by Facebook was used to get the data.
Disadvantages of the existing system
Event detection and summarization, opinion mining, sentiment analysis and many others.
Because of the limited length of a tweet (i.e., 140 characters) and the lack of restrictions on writing style, tweets often contain grammatical errors, misspellings and informal abbreviations.
On the other hand, despite the noisy nature of tweets, the core semantic information is well preserved in tweets in the form of named entities or semantic phrases.
IV. PROPOSED SYSTEM
In this study, sentiment analysis was performed on Facebook data. The first step of the study is to collect data. It is known that collecting the data and preparing the data sets require the most time and effort for researchers who work on social media. There are many different types of data collection on social networks. The literature describes many tools and methods for collecting data; among these, the most commonly used are custom-designed APIs, web crawling, web scraping operations and scripts. In this study, it is aimed to obtain the data sets in a meaningful and regular manner based on Facebook data and to carry out sentiment analysis work. The tagged Facebook data set named Sentiment 140 was used. This set was created by Stanford University's Computer Science graduate students Alec Go, Richa Bhayani and Lei Huang. The data set contains about 1.6 million positive and negative tagged Facebook data. When these data were collected and tagged, the emoji contained in each item was used. For example, an item containing a smiley is given a positive tag because the smiley expresses happiness. Likewise, an item containing a sad face is classified as negative because it expresses sadness. Ten thousand Facebook data were used in this study. Two different data sets were created, each containing 5 thousand Facebook data. The number of Facebook data with positive and negative tags in each set was calculated.
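The labelling and splitting steps described above can be sketched as follows. This is an illustration only: the emoticon lists, the rule for ambiguous items and all function names are assumptions, not the procedure actually used to build the data set.

```python
POSITIVE_EMOTICONS = (":)", ":-)", ":D")  # assumed markers of a positive item
NEGATIVE_EMOTICONS = (":(", ":-(")        # assumed markers of a negative item


def label_by_emoticon(text):
    """Return 'positive', 'negative', or None when the signal is missing or mixed."""
    has_pos = any(e in text for e in POSITIVE_EMOTICONS)
    has_neg = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None


def split_in_two(records):
    """Split a list of records into two halves (e.g. 10,000 -> 5,000 + 5,000)."""
    half = len(records) // 2
    return records[:half], records[half:]


def class_counts(labelled):
    """Count positive and negative tags in a list of (text, label) pairs."""
    counts = {"positive": 0, "negative": 0}
    for _, label in labelled:
        counts[label] += 1
    return counts
```

Applying `class_counts` to each half gives the per-set numbers of positive and negative items that the text refers to.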
Advantages
Reduces noisy and irrelevant words that are not associated with the users and maintains their privacy.
Maintains data and information with better security and dilutes negative words to an extent.
Provides security measures by avoiding or blocking negative words and improves efficiency.
V. LITERATURE SURVEY
5.1 Improving Entity Resolution with Global Constraints
Author: Jim Gemmell
In this paper we investigate another online socio-economic property that, to our knowledge, has never been exploited: sites listing entities have an incentive to avoid gratuitous duplicates. For instance, duplicate movies in IMDB would have reviews and corrections applied to one copy and not the other. If Netflix has one entry for a DVD and a duplicate for the Blu-ray version, then their customers might be looking at one and not realize the other is available. Hulu supports Facebook likes for their movies and could have the like counts diluted by duplicates.
Additional examples in other domains are easily constructed. We leverage this socio-economic property to resolve entities across different web sites by applying a global one-to-one constraint to the produced matchings. The resulting resolution has much better accuracy than matching without such a constraint. Our framework for one-to-one entity resolution (ER) is generic in that it can constrain existing resolution methods using weighted graph matching. The goal is not to engineer features or tune high-performance domain-specific ER, but rather to develop generic algorithms that can be combined with existing methods for improving retrieval performance. The purpose of this paper is not to investigate alternate scoring functions but rather to explore generic algorithms for constrained ER. In this section we describe the abstract ER problem, the framework for including particular scoring approaches, and several generic algorithms for constrained ER.
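The effect of a global one-to-one constraint can be seen in a small sketch. It uses a simple greedy pass over the highest-scoring pairs, which is only an approximation of weighted graph matching and is not claimed to be the algorithm of the surveyed paper; the similarity scores and identifiers are made up for illustration.

```python
def best_per_left(scores):
    """Unconstrained matching: every left entity takes its best-scoring partner."""
    best = {}
    for (left, right), score in scores.items():
        if left not in best or score > best[left][1]:
            best[left] = (right, score)
    return {left: right for left, (right, _) in best.items()}


def greedy_one_to_one(scores, threshold=0.0):
    """Constrained matching: each left and each right entity is used at most once.

    scores maps (left_id, right_id) -> similarity; pairs are taken in
    descending score order and skipped if either side is already matched.
    """
    matched_left, matched_right, matching = set(), set(), {}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (left, right), score in ranked:
        if score < threshold:
            break
        if left in matched_left or right in matched_right:
            continue
        matching[left] = right
        matched_left.add(left)
        matched_right.add(right)
    return matching
```

Without the constraint, two entries from one site can both be mapped to the same entry on the other site; with it, the second entry is pushed to its next-best free partner or left unmatched when a threshold is set.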
5.2 Entity Resolution: Theory, Practice & Open Challenges
Author: Lise Getoor
In this paper we begin by introducing a simple abstraction for the entity resolution problem. We categorize ER based on the type of input: single-entity ER, where all mentions correspond to a single entity type; relational ER, where real-world entities are linked (as in a social network); and multi-entity ER, representing the most general problem, with potentially linked mentions of different entity types (e.g., products, sellers and reviews). We survey classical techniques for ER, which assume that there exists a distance function between pairs of mentions. These techniques can be broadly classified as pair-wise ER, where the decision to match a pair of mentions is made independently of other mentions, and cluster-based ER, where equivalence classes of entities are constructed via clustering. Pair-wise ER is well suited to the problem of aligning two databases of the same set of entities (e.g., lists of restaurants from two sites). We survey common algorithms for computing similarity functions between mentions, and rule-based and probabilistic methods for pair-wise and cluster-based ER. We also discuss techniques for computing cluster representatives, a.k.a. canonical entities, from the database and machine learning communities. We conclude this section by discussing state-of-the-art collective probabilistic inference techniques for multi-entity ER. These techniques are becoming popular because the Web contains an abundance of redundant mentions of entities that are also linked, and techniques that consider only one entity type and ignore links perform poorly. We describe approaches based on multi-relational clustering algorithms, probabilistic generative models and probabilistic logical languages, e.g., Markov logic networks and probabilistic soft logic.
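The difference between pair-wise and cluster-based ER can be sketched in a few lines. The example assumes Jaccard similarity over lowercase tokens as the similarity function and builds clusters as connected components of the pair-wise matches; both choices are illustrative assumptions and not techniques prescribed by the surveyed paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard similarity between two mention strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def pairwise_matches(mentions, threshold=0.5):
    """Pair-wise ER: each pair is judged independently of all other mentions."""
    return [(i, j) for i, j in combinations(range(len(mentions)), 2)
            if jaccard(mentions[i], mentions[j]) >= threshold]


def cluster(mentions, threshold=0.5):
    """Cluster-based ER: equivalence classes as connected components (union-find)."""
    parent = list(range(len(mentions)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in pairwise_matches(mentions, threshold):
        parent[find(i)] = find(j)

    groups = {}
    for i in range(len(mentions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In the clustered result, two mentions can end up in the same equivalence class even though their direct similarity is below the threshold, because a third mention links them.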
5.3 Matching Unstructured Product Offers to Structured Product Specifications
Author: Anitha Kannan
We have a large database of product specifications. Each product specification (which we shall interchangeably call 'product') consists of a set of attribute ⟨name, value⟩ pairs and is represented in the database as a structured record. Some of the attributes can be numeric, while the others can be categorical. The unstructured offer descriptions (which we shall call 'offer' for short) consist of free text. The text has embedded in it some of the values, and possibly some attribute names, corresponding to one of the products. The text may also contain additional words. The attribute names and values in the text may not precisely match those found in the database. The text does not contain an identifier that uniquely identifies the corresponding product. Different textual descriptions may be provided for the same product. An offer may match more than one product, as only partial descriptions are provided in the offers and because the same real-world product might have multiple representations in the product database. We performed extensive experiments using the Bing Shopping catalog to understand the performance characteristics of the proposed solution. The experimental results show that the proposed approach scores high on F-measure and consistently beats baseline approaches for product categories that have reasonably rich attribute structure and good data. They also point to the desirability of hybrid solutions that additionally make use of classical text matching techniques for attribute-impoverished product categories. The methodology we employed for analyzing the experimental results might also be of interest to those building and analyzing web-scale systems.
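The problem setting can be made concrete with a deliberately naive sketch: a product is scored by the fraction of its attribute values that appear in the offer text. This plain value-overlap score is an assumption made for illustration and is not the matching model of the surveyed paper; the catalog entries are invented.

```python
def match_score(offer_text, product):
    """Fraction of a product's attribute values whose tokens all occur in the offer."""
    tokens = set(offer_text.lower().split())
    values = [str(v).lower() for v in product.values()]
    if not values:
        return 0.0
    hits = sum(1 for v in values if all(part in tokens for part in v.split()))
    return hits / len(values)


def best_products(offer_text, catalog):
    """Return the ids of all products that share the highest non-zero score."""
    scored = {pid: match_score(offer_text, spec) for pid, spec in catalog.items()}
    top = max(scored.values(), default=0.0)
    return sorted(pid for pid, s in scored.items() if s == top and s > 0)


# Hypothetical structured records: attribute name -> value
CATALOG = {
    "p1": {"brand": "Canon", "model": "EOS 550D", "megapixels": 18},
    "p2": {"brand": "Canon", "model": "EOS 600D", "megapixels": 18},
}
```

A partial description such as "canon eos camera 18 mp" ties both catalog entries, which mirrors the observation that an offer may match more than one product.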
5.4 Frameworks for entity matching: A comparison
Author: Hanna Köpcke
The functional comparison reveals a number of further research directions. All frameworks focus on offline matching, i.e., they do not yet cover online matching. The definition of the blocking key is not yet derived (semi-)automatically from training data but has to be specified manually in all considered frameworks. While attribute value matchers are well supported, the combination of context and attribute matchers is not and should be further studied. Training-based EM frameworks should provide more support for (semi-)automatic selection of suitable training data with low labeling effort. So far, training-based approaches have only helped to optimize some decisions, e.g., determining parameters for matchers (e.g., similarity thresholds) and combination functions (e.g., weights for matchers), while other decisions (e.g., selection of the similarity functions and attributes to be evaluated) still have to be made manually. The published framework evaluations used diverse methodologies, measures and test problems, making it difficult to assess the effectiveness and efficiency of each single system. While the reported evaluation results are usually very positive, the tests so far mostly dealt with small match problems, so that the scalability of most approaches is unclear. Hence scalability to large test cases needs to be better addressed in future frameworks. Some recent work regarding scalability has focused on computational aspects of string similarity computation and time-completeness trade-offs. Furthermore, we see a strong need for comparative performance evaluations of different frameworks and EM strategies. Standardized benchmarks for entity matching are needed for comparative investigations; first proposals exist but have not yet been implemented or applied. Published evaluation results should also be reproducible by other researchers, ideally by providing the prototype implementations and test data.
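The decisions mentioned above (a manually specified blocking key, matcher weights and a similarity threshold) can be shown together in a short sketch. The record fields, the key definition, the two matchers and the numeric settings are all hypothetical choices made for the example, not values from any of the compared frameworks.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical manually specified blocking key: first three letters of the name."""
    return record["name"][:3].lower()


def candidate_pairs(records):
    """Only records that share a blocking key are compared with each other."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


def exact(a, b):
    """Attribute matcher: case-insensitive equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


def token_overlap(a, b):
    """Attribute matcher: Jaccard overlap of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def combined_similarity(r1, r2, weights):
    """Combination function: weighted sum of the two attribute matchers."""
    return (weights["name"] * token_overlap(r1["name"], r2["name"])
            + weights["city"] * exact(r1["city"], r2["city"]))


def match(records, weights, threshold):
    """Return candidate pairs whose combined similarity reaches the threshold."""
    return [(i, j) for i, j in candidate_pairs(records)
            if combined_similarity(records[i], records[j], weights) >= threshold]
```

In a training-based framework the weights and the threshold would be the parameters learned from labelled pairs, while the blocking key and the choice of matchers would still be set by hand.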
VI. ARCHITECTURE DIAGRAM
VII. MODULES
Data Acquisition
Preprocessing
Hybrid segmentation
Named Entity Recognition
Performance Evaluation
7.1 Modules Description
7.1.1 Data Acquisition
Facebook is an online social networking service that enables users to send and read messages and images as well as video posts. Registered users can read and post, but unregistered users can only read, and only if the data owner concerned grants permission. Users access Facebook through the website interface or the mobile device app. In order to form an opinion about a user, his posts have to be examined. Therefore, all posts by the user are first crawled using the Facebook API. In this study we tried to examine the user through not only his own posts but also his friends' posts. However, crawling all friends' posts is a huge overload and can be misleading, since the Facebook following mechanism does not always reflect actual interest. People sometimes follow some users for a temporary occasion and then forget to unfollow. Sometimes they follow some users just to stay informed, although they are not actually interested in them. There are also friends who have not posted for a long time but are still followed by the user. In this module we can upload the data sets as a CSV file. It contains the following id, followers id, time stamp, user following, user followers and posts. The data of all Facebook consumers are examined and their entities are analyzed for better processing. The data acquired include the posts made by the users, the messages sent and received, and so on.
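Loading such a CSV file can be sketched with the Python standard library. The column names below are assumed spellings of the fields listed in the text, the sample rows are invented, and grouping posts under the `user_following` column is an arbitrary choice made for the example.

```python
import csv
import io

# Hypothetical sample in the layout described above
SAMPLE_CSV = """following_id,followers_id,timestamp,user_following,user_followers,post
101,202,2019-08-01T10:00:00,alice,bob,Great service today
101,203,2019-08-01T11:30:00,alice,carol,This is terrible
"""


def load_posts(handle):
    """Read every row of the CSV into a dictionary keyed by column name."""
    return list(csv.DictReader(handle))


def posts_by_user(rows, user_column="user_following"):
    """Group post texts by the value of the chosen user column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[user_column], []).append(row["post"])
    return grouped


rows = load_posts(io.StringIO(SAMPLE_CSV))
```

For a real upload, `io.StringIO(SAMPLE_CSV)` would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`.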
7.1.2 Preprocessing
For named entities to be extracted successfully, the informal writing style in posts has to be handled. Before real data entered our lives, studies in the area were conducted on formal texts such as news articles. Generally, named entities are assumed to be words written in uppercase, or mixed-case phrases where uppercase letters are at the beginning and ending, and almost all studies are based on this assumption. However, capitalization is not a strong indicator in informal texts such as posts, and is sometimes even misleading. As the example of capitalization shows, the approaches have to be changed. To extract named entities from posts, the effect of the informality of the posts has to be minimized as far as possible.
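A minimal preprocessing sketch is given below. It covers tokenization, stop-word removal and stemming, and lowercases the text so that unreliable capitalization no longer influences later steps. The stop-word list and the suffix-stripping rule are toy assumptions; a real system would use a full stop-word list and a proper stemmer.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "at"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "s")  # crude suffix stripping, not a real stemmer


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]
```

Repeated punctuation and shouting in capitals, both common in informal posts, are neutralized by the tokenizer before any entity extraction takes place.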
7.1.3 Hybrid segmentation
Hybrid Segmentation (HybridSeg) learns from both global and local contexts and is also designed to learn iteratively from confident segments as pseudo feedback. Posts are published for information sharing and communication. The named entities and semantic phrases are well preserved in posts. The global context derived from Web pages therefore helps in identifying the meaningful segments in posts. The well-preserved linguistic features in these posts facilitate named entity recognition with high accuracy. Each named entity is a valid segment. The method utilizing local linguistic features is denoted by HybridSegNER. It obtains confident segments based on the voting results of multiple off-the-shelf NER tools. Another method, utilizing local collocation knowledge and denoted by HybridSegNGram, is proposed based on the observation that many posts published within a short time period are about the same topic. HybridSegNGram segments the posts by estimating the term dependency within a batch of posts. The segments recognized with high confidence based on local context serve as good feedback for extracting more meaningful segments. The learning from pseudo feedback is conducted iteratively, and the method implementing the iterative learning is named HybridSegIter.
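The idea of obtaining confident segments by voting can be sketched as follows. This is a strong simplification and not the HybridSeg algorithm itself: the outputs of the NER tools are given as plain sets, the vote threshold is an assumed parameter, and the segmentation is a greedy longest-match over the confident segments.

```python
from collections import Counter


def confident_segments(tool_outputs, min_votes=2):
    """Keep segments proposed by at least min_votes of the NER tools.

    tool_outputs is a list with one set of candidate segments per tool.
    """
    votes = Counter()
    for segments in tool_outputs:
        votes.update({s.lower() for s in segments})
    return {seg for seg, n in votes.items() if n >= min_votes}


def segment(tokens, known_segments, max_len=3):
    """Greedy longest-match segmentation using the confident segments."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            # a single token is always accepted as a fallback segment
            if n == 1 or candidate in known_segments:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
    return out
```

In an iterative scheme, segments produced in one round could be added to `known_segments` for the next round, which is the role pseudo feedback plays in the description above.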
7.1.4 Named Entity Recognition
Named Entity Recognition can basically be defined as identifying and categorizing certain types of data (i.e., person, location and organization names, and date-time and numeric expressions) in a certain type of text. On the other hand, tweets are characteristically short and noisy. Given the limited length of a post and the restriction-free writing style, named entity recognition on this type of data becomes challenging. After basic segmentation, a great number of named entities in the text, such as personal names, location names and organization names, are not yet segmented and recognized properly. Part-of-speech tagging is applicable to a wide range of NLP tasks, including named entity segmentation and information extraction. Named Entity Recognition strategies vary on basically three factors: language, textual genre and domain, and entity type. Language is very important because language characteristics affect the approaches. A simple baseline tagger assigns each word its most frequent tag and assigns each Out of Vocabulary (OOV) word the most common POS tag. Textual genre is another concept whose effects cannot be neglected.
[Figure: architecture diagram. Recoverable labels: Data Acquisition, Datasets; Preprocessing, Stop Removal, Stemming words analysis, Tokenization; Hybrid Segmentation, Global Context, Local Context, Pseudo Feedback, POS tagger; Named Entity Recognition; Network, Content Features, Blog Features; KNN classifiers, Trained; Rumors]
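The most-frequent-tag rule for POS tagging mentioned above can be written as a short baseline. The tag names and the tiny training sample used to exercise it are invented for illustration; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag and the overall most common tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag(tokens, lexicon, default_tag):
    """Known words get their most frequent tag; OOV words get the default tag."""
    return [(t, lexicon.get(t.lower(), default_tag)) for t in tokens]
```

The baseline ignores context entirely, so an ambiguous word always receives the same tag; it is useful mainly as a reference point for more elaborate taggers.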
7.1.5 Performance Evaluation
In this module we evaluate the system using accuracy rate and normalized utility. Our proposed system provides improved accuracy rate and normalized utility. Once the messages from the sender have been received by the receiver, the system analyzes the data for negative and positive words. If negative words are present, it warns the user, and if this continues more than 3 or 4 times, it makes a suggestion and blocks the users who are associated with the negative content. The posts shared by the data owner can also be shared as public or private. In public mode the posts can be viewed by all users present in the data owner's profile, whereas in private mode only users permitted by the owner can access the posts that have been posted by the data owner.
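The warn-then-block behaviour described above can be sketched as a small class. The negative-word lexicon is a placeholder, the limit of three strikes is one reading of "3 or 4 times", and the class and method names are hypothetical.

```python
NEGATIVE_WORDS = {"hate", "stupid", "ugly"}  # hypothetical lexicon
BLOCK_AFTER = 3                               # assumed; the text says "3 or 4 times"


class Moderator:
    """Warn senders of negative messages and block them after repeated offences."""

    def __init__(self, negative_words=NEGATIVE_WORDS, block_after=BLOCK_AFTER):
        self.negative_words = set(negative_words)
        self.block_after = block_after
        self.strikes = {}      # sender -> number of negative messages so far
        self.blocked = set()   # senders that are no longer allowed through

    def check(self, sender, message):
        """Return 'ok', 'warning' or 'blocked' for one incoming message."""
        if sender in self.blocked:
            return "blocked"
        words = {w.strip(".,!?").lower() for w in message.split()}
        if not words & self.negative_words:
            return "ok"
        self.strikes[sender] = self.strikes.get(sender, 0) + 1
        if self.strikes[sender] >= self.block_after:
            self.blocked.add(sender)
            return "blocked"
        return "warning"
```

A sender who reaches the limit stays blocked even for later harmless messages, which matches the description of blocking users associated with negative content.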
V LITERATURE SURVEY
51 Improving Entity Resolution with Global Constraints
Jim Gemmell
In this paper investigate another online socio-economic property that to our knowledge has never been exploited that site
listing entities have an incentive to avoid gratuitous duplicates For instance duplicate movies in IMDB would have reviews and
corrections applied to one copy and not the other If Netix has one entry for a DVD and a duplicate for the Blu-ray version then their
customers might be looking at one and not realize the other is available Hulu supports Face-book likes for their movies and could
have the like counts diluted by duplicates
Additional examples in other domains are easily constructed We leverage this socio-economic property to resolve entities across the
different web sites by applying a global one-to-one constraint to produced matchings The resulting resolution has much better
accuracy compared to matching without such a constraint Our framework for one-to-one entity resolution (ER) is generic in that it
can constrain existing resolution methods using weighted graph matching The goal is not to engineer features or tune high-
performance domain-specific ER but rather to develop generic algorithms that can be combined with existing methods for improving retrieval performance The purpose of this paper is not to investigate alternate scoring functions but rather to explore generic
algorithms for constrained ER In this section we describe the abstract ER problem the framework for including particular scoring
approaches and several generic algorithms for constrained ER
52 Entity Resolution Theory Practice amp Open Challenges
Author Lise Getoor
In this paper begin by introducing a simple abstraction for the entity resolution problem We categorize ER based on the type
of input ndash single-entity ER where all mentions correspond to a single entity type relational ER where real world entities are linked
(like in a social network) and multi-entity ER representing the most general problem with potentially linked mentions of different
entity types (eg products sellers and reviews) We survey classical techniques for ER which assume that there exists a distance
function between pairs of mentions These techniques can be broadly classified as pair-wise ER where the decision to match a pair
of mentions is made independent of other mentions and cluster-based ER where equivalence classes of entities are constructed via
clustering Pair-wise ER is well suited for the problem of aligning two databases of the same set of entities (eg lists of restaurants from two sites) We survey common algorithms for computing similarity functions between mentions and rule based and probabilistic
methods for pair-wise and cluster based ER We also discuss techniques for computing cluster representatives aka canonical entities
from database and machine learning communities We conclude this section by discussing the state of the art collective probabilistic
inference techniques for multi entity ER These techniques are becoming popular due to an abundance of redundant mentions of
entities on the Web that are also linked and techniques that only consider one entity type and that ignore links perform poorly We
describe approaches based on multi-relational clustering algorithms probabilistic generative models and probabilistic logical
languages eg Markov logic networks and probabilistic soft logic
5.3 Matching Unstructured Product Offers to Structured Product Specifications
Author: Anitha Kannan
We have a large database of product specifications. Each product specification (which we shall interchangeably call ‘product’) consists of a set of attribute ⟨name, value⟩ pairs and is represented in the database as a structured record. Some of the attributes are numeric, while others are categorical. The unstructured offer descriptions (which we shall call ‘offers’ for short) consist of free text. The text has embedded in it some of the values, and possibly some attribute names, corresponding to one of the products. The text may also contain additional words. The attribute names and values in the text may not precisely match those found in the database. The text does not contain an identifier that uniquely identifies the corresponding product. Different textual descriptions may be provided for the same product. An offer may match more than one product, because the offers provide only partial descriptions and because the same real-world product might have multiple representations in the product database. We performed extensive experiments using the Bing Shopping catalog to understand the performance characteristics of the proposed solution. The experimental results show that the proposed approach scores high on F-measure and consistently beats baseline approaches for product categories that have a reasonably rich attribute structure and good data. They also point to the desirability of hybrid solutions that additionally make use of classical text matching techniques for attribute-impoverished product categories. The methodology we employed for analyzing the experimental results might also be of interest to those building and analyzing web-scale systems.
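To make the matching task concrete, here is a hedged sketch, which is not the method of the surveyed paper: it scores each structured product by the fraction of its attribute values that occur in the offer text, and returns every product that reaches a cut-off, since an offer may match more than one product. The catalog records, the min_score value, and the scoring rule are assumptions for illustration.

```python
import re


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens of a free-text string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(offer: str, product: dict) -> float:
    """Fraction of the product's attribute values fully contained in the offer."""
    offer_tokens = tokens(offer)
    hits = sum(
        1 for value in product.values() if tokens(str(value)) <= offer_tokens
    )
    return hits / len(product) if product else 0.0


def match_offer(offer: str, catalog: dict, min_score: float = 0.5) -> list:
    """Return (product_id, score) pairs at or above min_score, best first."""
    scored = [(pid, score(offer, attrs)) for pid, attrs in catalog.items()]
    kept = [(pid, s) for pid, s in scored if s >= min_score]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical catalog of structured product specifications.
catalog = {
    "p1": {"brand": "Acme", "model": "X100", "color": "black", "storage": "64 GB"},
    "p2": {"brand": "Acme", "model": "X100", "color": "white", "storage": "64 GB"},
    "p3": {"brand": "Zeta", "model": "Q7", "color": "black", "storage": "128 GB"},
}

# Free-text offer that omits the color, so it is only a partial description.
offer = "Brand new ACME x100 smartphone 64 GB, free shipping"
```

Because the offer does not mention a color, match_offer(offer, catalog) returns both p1 and p2 with the same score, which mirrors the ambiguity described in the text. Exact token containment is deliberately naive; it would miss a value written as "64GB", which is the kind of imprecise match the text warns about.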
5.4 Title: Frameworks for entity matching: A comparison
Author: Hanna Köpcke
The functional comparison reveals a number of further research directions. All frameworks focus on offline matching, i.e., they do not yet cover online matching. The definition of the blocking key is not yet derived (semi-)automatically from training data but has to be specified manually in all considered frameworks. While attribute value matchers are well supported, the combination of context and attribute matchers is not and should be studied further. Training-based entity matching (EM) frameworks should provide more support for (semi-)automatic selection of suitable training data with low labeling effort. So far, training-based approaches have only helped to optimize some decisions, e.g., determining parameters for matchers (e.g., similarity thresholds) and combination functions (e.g., weights for matchers), while other decisions (e.g., selection of the similarity functions and attributes to be evaluated) still have to be made manually. The published framework evaluations used diverse methodologies, measures, and test problems, making it difficult to assess the effectiveness and efficiency of each single system. While the reported evaluation results are usually very positive, the tests so far have mostly dealt with small match problems, so the scalability of most approaches is unclear. Hence, scalability to large test cases needs to be better addressed in future frameworks. Some recent work on scalability has focused on computational aspects of string similarity computation and time-completeness trade-offs. Furthermore, we see a strong need for comparative performance evaluations of different frameworks and EM strategies. Standardized benchmarks for entity matching are needed for comparative investigations; first proposals exist but have not yet been implemented or applied. Published evaluation results should also be reproducible by other researchers, ideally by providing the prototype implementations and test data.
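The role of the blocking key mentioned above can be illustrated with a short sketch. Records are grouped by a manually specified key, and candidate pairs are generated only within each group, which reduces the number of comparisons. The key function (first three letters of the name plus the city) and the sample records are hypothetical; choosing such a key by hand is the manual step that the comparison identifies as a limitation.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    """Manually specified key: name prefix plus city (an assumed choice)."""
    return record["name"][:3].lower() + "|" + record["city"].lower()


def candidate_pairs(records: list) -> list:
    """Compare only records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[blocking_key(record)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


# Invented records for illustration.
records = [
    {"name": "Lotus Garden", "city": "Chennai"},
    {"name": "Lotus Gardens", "city": "chennai"},
    {"name": "Lorenzo Pizza", "city": "Chennai"},
    {"name": "Lotus Garden", "city": "Madurai"},
]

# Number of comparisons without blocking: n * (n - 1) / 2.
all_pairs = len(records) * (len(records) - 1) // 2
```

With this key only the pair (0, 1) is compared, instead of all six pairs. A badly chosen key can also separate true matches into different blocks, which is why deriving the key from training data is listed as an open research direction.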
VI. ARCHITECTURE DIAGRAM
VII. MODULES
Data Acquisition
Preprocessing
Hybrid Segmentation
Named Entity Recognition
Performance Evaluation
7.1 Modules Description
7.1.1 Data Acquisition
Facebook is an online social networking service that enables users to send and read messages, images, and video posts. Registered users can read and post; unregistered users can only read, and only if the data owner concerned grants permission. Users access Facebook through the website interface or a mobile app. To form an opinion about a user, the user's posts have to be examined; therefore, all posts by the user are first crawled using the Facebook API. In this study we tried to examine a user through not only his or her own posts but also the posts of friends. However, crawling all friends' posts is a heavy load and can be misleading, since the Facebook following mechanism does not always reflect actual interest. People sometimes follow a user for a temporary occasion and then forget to unfollow. Sometimes they follow a user just to stay informed, although they are not actually interested. There are also friends who have not posted for a long time but are still followed by the user. In this module we upload the dataset as a CSV file. It contains the following ID, follower ID, timestamp, user following, user followers, and posts. The data of all Facebook users in the dataset are examined and their entities are analyzed. The acquired data consist of the posts made by the users, the messages sent and received, and so on.
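A minimal sketch of loading such a CSV file with Python's standard csv module follows. The column names (following_id, follower_id, timestamp, user_following, user_followers, post) are assumed spellings of the fields listed above, and the two sample rows are invented purely for illustration.

```python
import csv
import io

# Assumed header names for the fields listed in the text.
FIELDS = ["following_id", "follower_id", "timestamp",
          "user_following", "user_followers", "post"]


def load_posts(handle) -> list:
    """Read rows from an open CSV file and convert the count columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        missing = [name for name in FIELDS if name not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        row["user_following"] = int(row["user_following"])
        row["user_followers"] = int(row["user_followers"])
        rows.append(row)
    return rows


# Invented sample data standing in for an uploaded file.
sample = io.StringIO(
    "following_id,follower_id,timestamp,user_following,user_followers,post\n"
    "u1,u2,2019-06-01T10:15:00,120,340,Great day at the beach!\n"
    'u3,u1,2019-06-01T11:02:00,15,8,"Traffic is terrible, avoid the highway"\n'
)
posts = load_posts(sample)
```

For a file on disk, the same function can be given a handle opened with newline="" and an explicit encoding, as the csv module documentation recommends. Quoted fields, such as the second post above, keep their embedded commas.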
7.1.2 Preprocessing
For named entities to be extracted successfully, the informal writing style of posts has to be handled. Before real-world social media data became widely available, studies in this area were conducted on formal texts such as news articles. Named entities are generally assumed to be words written in uppercase, or mixed-case phrases in which the uppercase letters appear at the beginning and the end, and almost all studies are based on this assumption. However, capitalization is not a strong indicator in informal texts such as posts, and is sometimes even misleading. As the example of capitalization shows, the approaches have to be changed. To extract named entities from posts, the effect of their informality has to be minimized as far as possible.
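The sketch below shows one way to reduce the effect of informal writing before entity extraction: lowercasing (so that unreliable capitalization is ignored), collapsing elongated characters, tokenization, stop-word removal, and a crude suffix-stripping stemmer. The stop-word list and suffix rules are small illustrative assumptions; a real pipeline would use a complete stop-word list and an established stemmer.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "at"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude rules, checked in this order


def tokenize(text: str) -> list:
    """Lowercase the post and split it into word tokens."""
    # Collapse runs of three or more identical characters: "soooo" -> "soo".
    text = re.sub(r"(.)\1{2,}", r"\1\1", text.lower())
    return re.findall(r"[a-z0-9']+", text)


def stem(token: str) -> str:
    """Strip one common suffix if a stem of at least 3 letters remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list:
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]
```

For example, preprocess("The fans are WAITING at the stadium!!!") yields the tokens fan, wait, and stadium. Lowercasing discards capitalization entirely, which is a deliberate simplification here and would have to be revisited if case is later used as one feature among several.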
7.1.3 Hybrid Segmentation
Hybrid segmentation (HybridSeg) learns from both global and local contexts and can iteratively learn from confident segments as pseudo feedback. Posts are published for information sharing and communication, so named entities and semantic phrases are well preserved in them. The global context derived from Web pages therefore helps to identify the meaningful segments in posts. The well-preserved linguistic features in these posts facilitate named entity recognition with high accuracy, and each named entity is a valid segment. The method that uses local linguistic features is denoted HybridSegNER; it obtains confident segments from the voting results of multiple off-the-shelf NER tools. Another method, which uses local collocation knowledge and is denoted HybridSegNGram, is based on the observation that many posts published within a short time period are about the same topic. HybridSegNGram segments the posts by estimating the term dependency within a batch of posts. The segments recognized with high confidence from the local context serve as good feedback for extracting more meaningful segments. Learning from pseudo feedback is conducted iteratively, and the method implementing the iterative learning is named HybridSegIter.
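The voting step of HybridSegNER can be illustrated as follows. This is a simplified sketch and not the published HybridSeg implementation: each off-the-shelf NER tool is represented only by the list of candidate segments it returned for a post, and a segment is kept as confident when at least min_votes tools agree on it. The tool outputs and the vote threshold are assumptions.

```python
from collections import Counter


def confident_segments(tool_outputs: list, min_votes: int = 2) -> list:
    """Keep segments proposed by at least min_votes of the NER tools."""
    votes = Counter()
    for segments in tool_outputs:
        # A tool votes at most once per segment, regardless of case.
        votes.update({segment.lower() for segment in segments})
    return sorted(seg for seg, count in votes.items() if count >= min_votes)


# Hypothetical outputs of three NER tools for the same post.
tool_outputs = [
    ["New Delhi", "Air India"],
    ["new delhi", "India"],
    ["New Delhi", "Air India", "Monday"],
]
```

Here confident_segments(tool_outputs) keeps "air india" and "new delhi", while segments proposed by a single tool are discarded. In an iterative scheme, such confident segments would be fed back as pseudo feedback for the next segmentation round.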
[Figure (architecture diagram, Section VI). Recoverable labels: Data Acquisition (Datasets); Preprocessing (Stop Removal, Stemming words analysis, Tokenization); Hybrid Segmentation (Global Context, Local Context, Pseudo Feedback); POS tagger; Named Entity Recognition; Network, Content Features, Blog Features; KNN classifiers; Trained Rumors]
7.1.4 Named Entity Recognition
Named Entity Recognition can be defined as identifying and categorizing certain types of data (i.e., person, location, and organization names, and date-time and numeric expressions) in a certain type of text. Posts, on the other hand, are characteristically short and noisy. Given the short length of posts and their unrestricted writing style, named entity recognition on this type of data becomes challenging. After basic segmentation, a great number of named entities in the text, such as personal names, location names, and organization names, are not yet segmented and recognized properly. Part-of-speech (POS) tagging is applicable to a wide range of NLP tasks, including named entity segmentation and information extraction. Named Entity Recognition strategies vary on three basic factors: language, textual genre and domain, and entity type. Language is very important because language characteristics affect the approaches. A simple tagging baseline assigns each word its most frequent tag and assigns each out-of-vocabulary (OOV) word the most common POS tag. Textual genre is another factor whose effects cannot be neglected.
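The most-frequent-tag rule mentioned above can be sketched as a small baseline tagger: each known word receives the tag it carried most often in the training data, and every OOV word receives the single most common tag overall. The toy tagged sentences and the tag names are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences: list):
    """Return (word -> most frequent tag, overall most common tag)."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag(words: list, lexicon: dict, default_tag: str) -> list:
    """Tag known words from the lexicon; OOV words get the default tag."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


# Toy training data (invented for illustration).
training = [
    [("Anna", "NOUN"), ("visits", "VERB"), ("Chennai", "NOUN")],
    [("the", "DET"), ("visit", "NOUN"), ("ends", "VERB")],
    [("Ravi", "NOUN"), ("ends", "VERB"), ("the", "DET"), ("visit", "NOUN")],
]
lexicon, default_tag = train_baseline(training)
```

In this toy data NOUN is the most common tag, so an unseen word such as "Madurai" is tagged NOUN. The baseline ignores context, which is why it is normally used only as a reference point for stronger taggers.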
7.1.5 Performance Evaluation
In this module we evaluate the system using the accuracy rate and normalized utility. Our proposed system provides an improved accuracy rate and normalized utility. Once a message from the sender has been received by the receiver, the system analyzes the message for negative and positive words. If negative words are present, it warns the user; if this happens more than 3 or 4 times, it makes a suggestion and blocks the users who are associated with the negative content. In addition, the posts of the data owner can be shared as public or private. In public mode, the posts can be viewed by all users in the data owner's profile, whereas in private mode only the users permitted by the owner can access the posts.
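The warning-and-blocking behaviour and the accuracy measure described above can be sketched as follows. The negative-word lexicon, the limit of three warnings (the text says 3 or 4), and the class and function names are assumptions. Normalized utility is not defined in this section and is therefore omitted.

```python
import re
from collections import defaultdict

NEGATIVE_WORDS = {"hate", "stupid", "ugly", "idiot"}  # illustrative lexicon
MAX_WARNINGS = 3  # assumed; the text allows "3 or 4"


class MessageMonitor:
    """Counts negative messages per sender and blocks repeat offenders."""

    def __init__(self):
        self.warnings = defaultdict(int)
        self.blocked = set()

    def receive(self, sender: str, message: str) -> str:
        """Return "ok", "warning", or "blocked" for an incoming message."""
        if sender in self.blocked:
            return "blocked"
        words = set(re.findall(r"[a-z']+", message.lower()))
        if not words & NEGATIVE_WORDS:
            return "ok"
        self.warnings[sender] += 1
        if self.warnings[sender] > MAX_WARNINGS:
            self.blocked.add(sender)
            return "blocked"
        return "warning"


def accuracy(predicted: list, actual: list) -> float:
    """Share of predictions that equal the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


monitor = MessageMonitor()
results = [monitor.receive("u7", "you are stupid") for _ in range(4)]
```

The first three negative messages from the same sender produce warnings and the fourth leads to a block, after which all further messages from that sender are rejected. A plain word lexicon cannot handle negation or sarcasm, so it should be read as a placeholder for the trained classifier.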
VIII. CONCLUSION
The growth in the use of computers and the internet has led to a marked increase in the methods of information extraction from social media. In this study, the information access and interpretation steps used in the literature have been investigated in detail. The sentiment analysis work was conducted on Facebook data: obtaining the Facebook data, cleaning the data, transforming the data into numerical form, extracting meaningful results, and interpreting them. Machine learning algorithms were applied using the KNIME software.
In this study, the Decision Tree Learner and K-NN algorithms were examined in detail in two different experimental sets in terms of their accuracy, sensitivity, and selectivity values. The results obtained were compared in detail with the help of tables. In future studies, we aim to carry out sentiment analysis on different data sets using different machine learning and intelligent optimization algorithms. To increase the accuracy rate, we plan to prepare more suitable data sets. Intelligent search and optimization algorithms with optimized parameters may also be used together with integrated feature selection methods to improve sentiment analysis performance. We designed novel features for the classification of posts in order to develop a system that filters informational data from conversations that are of little value to those searching for immediate information, such as relief efforts or bystanders seeking to minimize damage. The results of our experiments show that classifying posts as “rumor” vs. “non-rumor” can rely solely on the proposed features when computing resources are a concern, since the computing power required to process data into features is greatly reduced in comparison with a bag-of-words (BOW) feature set, which contains a substantially larger number of features. However, if the computing power and time needed to process incoming Facebook data are not a concern, a combined feature set of the proposed features and the BOW-presence approach will maximize overall accuracy.
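For illustration only, the following sketch shows what a BOW-presence feature vector and a K-NN classifier look like in plain Python. The study itself used KNIME, so this is not the authors' workflow; the training posts, the labels, and k = 3 are invented, and Hamming distance is one assumed choice of metric for binary vectors.

```python
from collections import Counter


def build_vocabulary(texts: list) -> list:
    """Sorted list of all distinct lowercase tokens in the training texts."""
    return sorted({tok for text in texts for tok in text.lower().split()})


def bow_presence(text: str, vocabulary: list) -> list:
    """Binary vector: 1 if the vocabulary word occurs in the text, else 0."""
    present = set(text.lower().split())
    return [1 if word in present else 0 for word in vocabulary]


def knn_predict(x: list, train_x: list, train_y: list, k: int = 3) -> str:
    """Majority label among the k training vectors nearest in Hamming distance."""
    distances = sorted(
        (sum(a != b for a, b in zip(x, row)), idx)
        for idx, row in enumerate(train_x)
    )
    nearest = [train_y[idx] for _, idx in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Invented training posts and labels.
train_texts = [
    "breaking bridge collapsed share now",
    "unconfirmed bridge collapsed they say",
    "share now before deleted",
    "official report road closed today",
    "police confirm road closed",
    "official statement released today",
]
train_labels = ["rumor", "rumor", "rumor", "non-rumor", "non-rumor", "non-rumor"]

vocabulary = build_vocabulary(train_texts)
train_x = [bow_presence(t, vocabulary) for t in train_texts]
```

Each vector has one position per vocabulary word, so its length grows with the corpus. This is the cost that a compact hand-designed feature set avoids, at the price of some accuracy according to the conclusion above.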
IX. FUTURE ENHANCEMENT
In future work, we can extend our approach by implementing various classification algorithms to predict attackers and to eliminate them from Facebook datasets. The approach can also be applied to the various languages used on Facebook. It can likewise be extended to analyze not only text but also images, videos, and so on, so that the content of all users and their entities is managed efficiently and inappropriate media are avoided.
BIOGRAPHIES
A. Anitha, M.Sc.
Research Scholar
Department of Computer Science
Thanthai Hans Roever College
Perambalur
Dr. S. Sivakumar, M.C.A., M.Phil., Ph.D.
Head and Asst. Professor, Department of Computer Science
Thanthai Hans Roever College
Perambalur