agent-based user-adapted filters forcategories of
TRANSCRIPT
Agent-based user-adapted Filters forCategories of Harassing CommunicationZenefa Rahaman
The University of Tulsa
Tulsa, Oklahoma
Sandip Sen
The University of Tulsa
Tulsa, Oklahoma
ABSTRACTOnline communication and information sharing platforms have
witnessed rapid growth in usage leading to peer-to-peer communi-
cation at unprecedented scale and diversity. Unfortunately, these
platforms also witness an abundance of hateful aggression and ha-
rassment towards individuals targeted because of their identities or
expressed opinions. Such online harassment can have a significant
negative impact on individual health and social relationships. Indi-
viduals vary in their perception and sensibility towards different
categories of harassing communication. Individual preferences can
be used to develop an adaptive agent-based interventionmechanism
to protect social media users from harassing communication.
We present the design and implementation of an user-adapted
agent-based harassment filter that adapts to user sensibilities and
filtering preferences continuously. In this paper, we introduce a
taxonomy of harassing communication types, describe the process
of collecting and analyzing a crowdsourced Twitter dataset high-
lighting the need for agent-based user-adapted filters, and present
comparative results from user-adapted filters trained with different
learning schemes that conform to user filtering preferences. Re-
sults show that user-adapted, user-specific filters can significantly
outperform general filters even with limited user input
KEYWORDSCyber harassment; detection; adaptive agent; agent-based filter;
1 INTRODUCTIONToday’s citizens are highly engaged in online social media: both
the number of users and the time spent online per user is on the
upswing. According to a report by the Pew research center, nearly
76% of American adults use social networking sites in 2017, which
was only 7% in 2005 [10]. These social media platforms become
valuable repositories of people’s opinions and sentiments about ser-
vices they use as well as their political and religious views. Hence,
these sites have key influence on user’s opinions and sentiments.
Correspondingly, the collected data serve as valuable data sources
for businesses, researchers, and policymakers. Whereas these new
communication channels, such as online social networks [19] and
news sharing sites [20], offer myriad opportunities for knowledge
sharing and opinion mobilization [6], they also reveal an abundance
of unfortunate intimidatory and hateful aggression [27] towards
individuals targeted [37] because of their expressed opinions or
identities. Cyber-harassment involves any aggressive and unwanted
online communication with the intent to target and intimidate a vic-
tim. A recent study [14] found that 40% of adult Internet users have
experienced online harassment with young women enduring par-
ticularly severe forms of it. 38% of women who had been harassed
online reported the experience could be described as extremely up-
setting. This victimization of individuals [9] have significant social
costs varying from social ostracism to opinion marginalization and
suppression, and can cause severe health detriments ranging from
anxiety [31] to depression [39] to suicide ideation [18, 29].
Victims of such online attacks are often minority groups or
individuals voicing dissent, professionals covering controversial
topics essential for informing a democratic society, and groups
raising awareness about important issues [4]. A study of offline and
online harassment of female journalists found that two-thirds of
the respondents reported acts of intimidation, threats, and abuse
related to their work [3].
While several computational studies have developed automated
mechanisms for detection of unwarranted victimization and harass-
ing attacks on social network and microblogging platforms [26, 32,
40], more comprehensive detection and intervention mechanisms
that are grounded in well-founded, interdisciplinary theory of hu-
man aggressive and predatory behavior is needed. In this paper,
we present a taxonomy of harassment categories to characterize
different types of hateful and abusive rhetoric that is common in
online social media platforms. This taxonomy will facilitate the
development of harassment filters, our research focus as elaborated
below.
The goal of this paper is to develop an agent-based user-adapted
harassment filters that continuously adapt to individual user’s
threat perception, tolerance, and sensitivity. As users may have
varying sensibility towards different types of harassment, our first
goal is to understand the need for agent-based user-adapted filtering
of harassing tweets.
To gather ground truth data necessary to identify varying ha-
rassment perceptions and to demonstrate the feasibility of training
user-adapted filters, we needed user-labeled data-sets. To obtain
this data, we first collected a set of ≈ 5230 tweets using the Twitter
API and matching keywords associated with the categories identi-
fied in the taxonomy. Thereafter, we used a crowdsourcing service,
Amazon Mechanical Turk, where MTurk workers were tasked to
label presented tweets based on their perceived harassment inten-
sities and if they wanted to filter them. We collected approximately
26,300 responses from 360 participants for this study (see Section 4
for further details). The goal of the survey is to understand the vari-
ation of sensibility among the users, the variation in the tolerance
or acceptance of harassment categories in our taxonomy, and to
build agent-based user-adapted filter mechanisms. The collected
data is used to find the answers to the following research questions:
(1) How user sensibility vary from category to category?
(2) How the perception and the acceptance of the different type
of harassment vary in a population?
(3) How the same data can have different impacts on users and
how their reactions vary?
We performed an in-depth analysis of the collected Survey Data
to identify the variation of harassment sensibility and tolerance
levels among the users over different harassment categories in our
taxonomy. Our analysis show how users’ perception of harassment
intensity and filtering preference varies depending on harassment
categories. Also, different users’ perceived intensity and acceptance
can vary significantly for the same tweet. We evaluated filtering
mechanisms for each user and found that agent-based, user-adapted
filters are more accurate in predicting user preferences when com-
pared with a general filter trained on the entire dataset. The general
filter failed to protect a sensitive user from exposure to unwanted
or unacceptable tweets as the general filter learns the filtering need
of the tweets based on the majority of filter/no-filter labels over the
population. The observations support the need for and benefit of
user-adapted filtering of harassing communication.
2 CATEGORIES OF CYBER-HARASSMENTMost cyber-harassment research that we have come across have
primarily focused on binary classification of harassment: a commu-
nication either contains harassing content or does not [11, 13, 35].
Though a standard definition of online harassment does not
exist, most definitions include the following key components: (1)
unwanted behavior that occurs through electronically mediated
communication, (2) behavior which violates the dignity of a person
by creating a hostile, degrading, or offensive environment [5]. Be-
haviors can include offensive name calling, attempts to embarrass,
physical threats, stalking [14], gender harassment, unwanted sexual
attention, sexual coercion, denigration (sending harmful or cruel
statement about a person to other people online), impersonation
(pretending to be another person in order to make that person look
bad), flaming (sending angry, vulgar or rude messages about an
individual through an online, public forum), and exclusion (the
exclusion of an individual from an online group) [33]. Definitions
vary in terms of whether the behavior must occur multiple times,
intentionally cause harm, and/or involve a perpetrator known to
the victim [14].
[25] have emphasized the need for clear definitions in research
for the field to progress. Much of the current literature examining
offline harassment require knowledge of a perpetrator’s intentions
which, while difficult to discern in offline environments, is almost
impossible to confirm in online environments such as social media
platforms. For this reason, online harassment as currently under-
stood by researchers is defined in terms of a victim or third party’s
understanding rather than a perpetrator’s motives. Researchers
believe there are three main reasons perpetrators may purposefully
engage in harassing behavior. Purposefully harassing behavior in-
cludes (1) rude comments used as a form of self-expression [12], (2)
intimidation strategically designed to (a) interrupt communication
on a topic or (b) retaliate for past reports or comments [4]; and
(3) acts with no strategic aims other than causing psychological
or physical harm [7]. Further complicating an understanding of
harassment is the variability in how harassment is perceived. Many
definitions require the victim to view the behavior as offensive or
threatening [16]. Understanding a victim’s reaction is equally as
difficult as discerning a perpetrator’s motive when researchers are
unable to directly communicate with the victim.
To understand different types of online harassment, a taxonomy
was prepared containing different categories of cyber-harassment
that was paired with an associated vocabulary. The initial category
list is generated from existing literature [14, 28, 33] :(i) Insults
and name calling, (ii) Sending harmful or cruel statement about
one user to others, (iii) Religious/racial/ethnic epitaphs, (iv) Sexual
orientation, (v) Sex/ gender, (vi) Threat, (vii) Multiple type (message
contain more than one harassing type), (viii) Revealing personal
information online about target, (ix) Tapping, computer viruses,
and other digital security threats, (x) General threat of physical
harm to self or others, (xi) Specific threats of physical harm with
individual, and (xii) Impersonation online.
After analyzing a stream of tweets over a period of time, we
observed that some of the categories that we initially listed did
not appear on Twitter streams. Accordingly, we paired the initial
list down to the following that was used in our subsequent data
gathering and experimentation:
General harassment: Communication is harassing, but does
not easily fit into any other identified category.
Cruel statement: Communication contains information that
is negative and personal. This information is directed at a
specific target and does not specifically address a person’s
religion, race, ethnicity, sexual orientation, or gender/sex.
Religious/racial/ethnic slurs: Communication designed to
highlight or attack a person’s race, religion, or ethnicity.
Harassment based sexual orientation: Communication de-
signed to highlight or attack a person’s sexual orientation.
Sexual harassment(Gender based harassment): Communication
contains a short, negative label designed to highlight or at-
tack a person’s sex or gender.
Threats of physical harm/violence: Communication contains
a direct threat, metaphorical or actual, against a person’s
property, family, self, or digital presence.
Multiple types: Communication contains more than one type
of harassment defined.
Non-harassment: Communication does not fit into any of the
discussed harassment categories.
We also identified keywords for each of the identified harassment
categories based on frequent words used in corresponding tweets.
3 HARASSMENT DATA SETWe chose the Twitter platform for collecting harassing communi-
cation. Using Twitter’s streaming API, we obtained a set of tweets
matching the keywords associated with the initially identified cat-
egories and stored them in a MySql database. As is the case with
many real-world data, the collected data had several issues which
required cleaning and pre-processing to facilitate further research
and analysis. After pre-processing, 5231 out of 8000 collected tweets
were found to be usable. For each tweet in this set, we had three
individuals label it according to one of the category types in our tax-
onomy. The majority label was then associated with the tweet and
used for the remainder of our experiments reported in this paper.
2
Table 1: Frequency of Tweet Categories
Category # of tweets
General harassment 79
Cruel statement 1054
Religious/racial/ethnic 89
Sexual orientation 11
Sex/ gender 656
Threat 236
Multiple types 106
Non-harassment 2382
Non-codable 618
Total data 5231
The result of the labeling is presented in Table 1 which contains the
number of tweets assigned to each of the aforementioned categories.
We refer to this dataset as the Twitter Categorical dataset.Almost 45% of this usable data are categorized as non-harassing or
do not fit into any given categories. We observed, that several data
points lacked in unanimous labels. This observation shows that
there is a pronounced variation in harassment perception among
the human labelers. This perception variability results in similar
tweets being assigned different category labels.
4 SURVEY DATA: USER-BASED SENSITIVITYDATA
We wanted to understand how different users have distinct sensi-
bilities towards different harassment categories. We used Amazon
Mechanical Turk [34] workers to rate harassment intensities and
the need for filtering of individual tweets1. The MTurk survey par-
ticipants were provided a collection of unlabeled tweets, drawn
from multiple harassment categories of the Twitter Categorical
dataset. Participants were asked to respond to the following two
questions about each tweet:
(1) What intensity of harassment is present in the following
tweet?
Options: (1) None, (2) Minimal, (3) Moderate, (4) High, (5)Extreme.
(2) Do you want to filter this tweet?
Options: (1) Yes, (2) No.
The survey is prepared using Django, a based web application
hosted on the Heroku website [23]. The link of the survey was
posted on Amazon Mechanical Turk website to reach MTurkers.
For each MTurk participant, the survey contained 75 tweets se-
lected from all categories of the Twitter Categorical data. Each
tweet was presented to 5 MTurk workers. Around 360 MTurkers
successfully completed this study over a two week period which
allowed us to collect around 26,500 responses. We used this data
set to understand how perception and acceptance of the different
type of harassment vary in a population. As described below, the
analysis of the dataset also provided the rationale for user-adapted
, agent-based harassment filters learnt from user filtering choices.
1Amazon Mechanical Turk (MTurk) is a crowdsourcing platform where researchers
can post human intelligence tasks to be performed by paid human workers (MTurkers).
Figure 1: Percentage of data selected by users to be filteredfor the different harassment categories.
5 NEED FOR USER-ADAPTED FILTERSWewanted to understand if different users have distinct harassment
perception and filtering sensibility/preference, with the goal of
determining if user-adapted agent based harassment filters are
warranted.
As mentioned above, each user was asked for each of the tweets
presented, if (s)he would prefer that tweet to be automatically
filtered from their tweet stream. The percentage of data in each
of the harassment categories that the users wanted to be filtered
is shown in Figure 1. In Figure 1, the X-axis represents different
harassment categories and the Y-axis represents the percentage of
tweets selected to be filtered. Each bar represents one category and
the ’red’ colored part of the bar represents the percentage of tweets
to be filtered for that category. We observe there are variations in
the percentage of data to be filtered for different categories.
Observation 1. User filtering preferences or acceptance of cyber-harassment vary by harassment categories.
Figure 2 depicts the percentage of tweets to be filtered for each
category given the perceived harassment intensity. In the figure, the
X-axis represents different intensity levels. The Y-axis represents
the percentage of tweets selected to be filtered. Each colored line
corresponds to a particular harassment category.
Observation 2. The percentage of tweets selected for filteringincreases with the increase in perceived harassment intensity.
This pattern holds for each of the harassment categories. This ob-
servation supports the hypothesis that users have lower acceptance
of higher intensity of cyber-harassment. The figure also shows that
the tolerance level does vary between categories for the same ha-
rassment intensity. That leads to the following critical observation:
Observation 3. Harassment tolerance depends both on the cate-gory of harassment as well as the perceived intensity of harassment.
5.1 Intensity level influence on filtering choiceIn this section, we present results from statistical tests to determine
the significance of the observed trends in the survey.
5.1.1 ANOVA test: We used one-way ANOVA tests to determine
if the choice to filter tweets varied significantly by the perceived ha-
rassment intensity level. ANOVA checks the impact of one or more
3
Figure 2: Percentage of tweets selected to be filtered for eachcategory and for different perceived harassment intensitylevels. Y-axis represents the percentage of tweets selected tobe filtered and the x-axis represents different intensity lev-els.
factors by comparing the means of different samples. Using this
test we can conclude if there is a statistically significant difference
between the percentage of tweets selected for filtering between
different perceived harassment intensity levels. The following two
hypothesis are considered in ANOVA testing:
Null hypothesis (H0): All intensity levels have the similar
effect on the population of users
Alternate hypothesis: Different intensity levels have the sig-
nificantly different effect on users
The ANOVA test for the collected data for different intensity levels
yielded an F-statistic value that measures the differences in the
means of different intensity levels and suggests whether the levels
are significantly different or not. α is considered the significance
level and the probability of rejecting the null hypothesis while it
is true. If the F-statistic is more than the F-critical value for the
chosen α value, then the null hypothesis (H0) can be rejected and
we can say that different intensity levels have significant effects
on users. For the data presented above, ANOVA test returned an F-
value of 304.7577 which is greater than the F-critical value, 2.64146,
for the selected alpha level of 0.05. The significance can also be
determined by comparing the p-value calculated with the α =0.05 value selected: if the P-value is less than α , we can reject the
Null Hypothesis. The ANOVA result is [F (4, 35) = 304.758,p =1.139e−26 < 0.05], and based on calculated p-value (p = 1.139e−26).Hence the null hypothesis (H0) can be rejected and we claim that
different intensity levels have significantly different effect on user
filtering selections.
5.1.2 Effect size: In addition to checking the statistical signifi-
cance, it is useful to report the effect size measure for an ANOVA
test, which reflects the size of the performance difference of the
alternatives being considered. A high effect size value signifies
that not only there are more than two groups which significantly
differ from each other but also suggests that the difference is sig-
nificantly high. For the ANOVA test we ran, the effect size value,
η2 = SSintensit iesSSTotal
, is 0.972. This suggests that some of the intensity
levels are different from others by a high margin.
The limitations of the one-way ANOVA testing is that while it
can suggest that at least two of the tested data groups were different
from each other, it cannot identify which groups were different.
To identify those groups, we needed to further test what are the
intensity levels that result in significant differences.
For this purpose we use Tukey test which is a statistical signifi-
cance test that is often used to identify the effect size with ANOVA
analysis.
5.1.3 Tukey Honest Significant Difference (HSD):. Tukey’s Hon-est Significant Difference (HSD) test or Tukey test is a post-hoc
test based on the studentized range distribution. Tukey’s HSD is
considered to be the strongest test to find out which groups are
significantly different and is widely used. This test calculates all
possible pairs of means and then calculate the significant difference.
Based on the computed q-statistics by Tukey test on our dataset,
we observe that Intensity level 1-3 are significantly different from
all the other intensity levels (1-5) with a 99% confidence. Intensity
4 is different from intensity 5 with 95% confidence value. All the
pairwise comparisons support the alternative hypothesis.
5.2 Influence of Harassment Categories onfiltering choice
We have so far presented statistical analysis to understand whether
user‘s filtering choice are affected by the perceived intensity level. In
this section, we discuss the statistical significance of user’s filtering
choice as influenced by the the particular category of harassment
contained in the received communication. In particular, for a given
perceived harassment intensity level, whether the percentage of
tweets marked for filtering varied for different harassment cate-
gories? It is possible that such differences can exist for only some,
but not for all, perceived intensity levels. As we have only one value,
percentage of tweets marked for filtering for each harassment cat-
egory, for each intensity level, we needed a different significance
testing mechanism than ANOVA. In the following, we present the
results from using the Confidence Interval Coefficient measure used
for this analysis.
5.2.1 Confidence Interval Coefficient: Confidence Interval Coef-ficient [2] compares proportions of samples through constructing
simultaneous confidence intervals and P-difference for the con-
fidence intervals. We constructed confidence intervals to make
pairwise comparisons of the percentage of tweets to be filtered
for different categories for a given intensity. The significance is
determined by comparing the p-difference with the α value (we
chose α = 0.5). If the p-difference calculated for confidence inter-
vals is greater than the α level selected, it suggests that for the same
intensity level there is a significant difference in the chosen cate-
gories. In Figure 2, we have presented the percentage of tweets to
be filtered for each category and for all the intensity levels. Results
of Confidence Interval Coefficient for intensity level ’2’ is shown
in Table 2. The rows that are highlighted in green indicates the
categories which are significantly different from each other for the
given intensity level. Results in Table 2 suggests that for intensity
level ’2’, users choose a different percentage of tweets for category
0 compared to category 3-7 and these differences are statistically
4
Table 2: Confidence Interval Coefficient for Intensity Level‘2’ (columns highlighted denote statistically significant dif-ference; 19 pairs are different).
Category Category P-Diff Lower Upper
0 1 0.0704 -0.2139 0.0759
0 2 0.0055 -0.1448 0.1339
0 3 0.133 -0.2791 0.0182
0 4 0.137 0.01433 0.2546
0 5 0.0898 -0.2342 0.0582
0 6 0.1142 -0.2597 0.0357
0 7 0.0698 -0.2133 0.0764
1 2 0.0649 -0.0818 0.2091
1 3 0.0626 -0.2157 0.0929
1 4 0.2075 0.0764 0.3307
1 5 0.0194 -0.1709 0.133
1 6 0.0438 -0.1963 0.1105
1 7 0.0006 -0.15007 0.1513
2 3 0.1275 -0.2742 0.02415
2 4 0.1427 0.01909 0.2607
2 5 0.0843 -0.2294 0.0641
2 6 0.1087 -0.2548 0.0416
2 7 0.0632 -0.0923 0.2163
3 4 0.2702 0.1335 0.3963
3 5 0.0432 -0.1132 0.1979
3 6 0.0188 -0.1385 0.1754
3 7 0.0632 -0.0923 0.2163
4 5 0.2269 -0.3511 -0.0938
4 6 0.2514 -0.3767 -0.1162
4 7 0.2069 -0.3299 -0.0758
5 6 0.0244 -0.1786 0.13073
5 7 0.01999 -0.1324 0.17157
6 7 0.0444 -0.1098 0.1969
significant. For space considerations, we are not presenting the ta-
bles for other perceived harassment intensities. We have observed
that user acceptance of different harassment categories vary signifi-
cantly at lower intensity levels. As the perceived intensity increases,
however, the difference in the acceptance of harassment of different
categories faded. This is primarily due to the fact that more and
more tweets, irrespective of categories, are selected to be filtered
by users at high perceived harassment intensity levels. Observation
from the statistical test on the user responses confirms that user
choice of tweets to filter is dependent on both the category and the
intensity level of a particular tweet. Results from ANOVA testing
and Tukey test shows that user filtering choice have significant
differences with varying intensity levels. The corresponding effect
sizes were also found to be high. Results of Confidence Interval Co-
efficient suggests that the user reaction varies for certain categories
of harassment for the same intensity levels. These observations
highlight the need for user-adaptiveness.
6 USER-ADAPT HARASSMENT FILTERSWe now focus on developing agent-based user-adapted filtering
of harassment content for the users. As we observed above, users
acceptance or tolerance for harassing communication varies based
Figure 3: Histogram of # users wanting to filter certain num-ber of tweets. X axis represents # of tweets (0-75), Y axis rep-resents # of users wanting to filter that many tweets out ofthe 75 presented.
on intensity level and harassment category. In addition, different
users have a different filtering preference for different intensity
levels and categories. So a general filter, trained on all labeled data,
is going to be ineffective to meet the filtering needs of individual
users.
To develop an adaptive personal agent for filtering harassing
communication, we further analyzed the Survey Data. In Figure 3,
we present a histogram of the counts for users with the number of
tweets they wanted to be filtered. Each bar in the histogram shows
how many people wanted to filter that number of tweets out of the
75 tweets they labeled. The x-axis of the histogram represents the
number of tweets (0-75) and the Y-axis represents the number of
people who wanted to filter that many tweets out of the 75 tweets
presented. The histogram demonstrates that there is considerable
variation in the population based on individual user’s acceptance of
harassment and hence the need for user-adapted filtering of harass-
ing communication. The results clearly demonstrate that there were
participants who were extremely tolerant about harassing content
and chose not to filter any of the tweets. On the other end, there
were participants who were extremely sensitive to harassment and
chose to filter all the tweets they receive based on the contents.
Variation in sensibility is reflected in the variations in the number
of tweets to be filtered as selected by the users.
We recall that each tweet from the Twitter Categorical data was
labeled by 5 MTurk users for harassment intensity and filtering
preference. In Figure 4 (a) we present a count of tweets that received
a certain number of votes, from 0-5, for filtering. In Figure 4 (b)
we show how many tweets were unanimously voted by all the five
users to filter/not-filter, how many got significant majority (4:1 or
1:4 ratio) of filter/not-filter selection, and how many tweets had
maximal disagreement (3 to 2 or 2 to 3 counts) for filter/no-filter
choices. Results show that only 18% of the tweets were unanimously
voted on for filter/no-filter. Approximately 38% and 45% of tweets
were marked for majority and maximal disagreement groups re-
spectively. These observations clearly demonstrates that different
users vary in their harassment sensibility and acceptance.
5
(a)
(b)
Figure 4: (a) Frequency plot of tweets labeled for filtering by0-5 users (X-axis represents howmany users, out of 5, choseto filter a tweet, Y-axis represents the # of tweets). (b) UserChoice variability plot.
6.1 A general harassment filterBefore developing agent-based user-adapted harassment filters, we
created a general filter that learns to filter based on the majority of
filter/no-filter value for each tweet. The filter-label of each tweet is
decided based on the majority of the 5 votes. Developing a single
filter for all users, rather than user-adapted, agent-based filters
learnt from interactions with each user, is computationally cheap.
We believe, however, that a single general filter will not be effective
for all users because of significant false positives and false negatives,
and hence precision and recall errors. In particular, such a general
filter will not be able to protect a sensitive user from exposure to
unwanted or unacceptable harassing tweets.
6.2 Agent-based user-adapted filtersIn the agent-based filter mechanism, the agents are not only ini-
tially trained by learning on the user filtering choices but can also
continuously monitor and adapt based on the users’ online behavior.
Based on these interaction experience, agents can adapt their filter-
ing behavior to best reflect changing user sensibilities and protect
them from harassing encounters on their social media accounts.
In our experiments, we allocated one agent for each user in the
system (Figure 5 presents the interaction between the agent and
Figure 5: An agent-based user-adapted harassment filter.
Figure 6: Accuracy of General Vs user-adapted Filters.
its associated user.). We provide training data from the filtering
choices on tweets to an agent from its associated user. That agent
then used effective supervised classification algorithms to learn
user-adapted filters for its user. With experience an agent learns
about user’s different interaction preferences, and classifiers trained
using this gathered knowledge helps the agent adapt to current user
preferences. While this modality of agent-based filter development
requires much more effort, as one agent has to be trained for each
user, we believe that performance improvements in intervention by
shielding users from unwanted harassing communication, justifies
this additional computational cost.
We usedNaive Bayes (NB) [22], Support VectorMachine (SVM) [17]
and Random forest classifier [21] to train user-adapted filters. We
also use the same classification mechanisms to build the general
filters. In Figure 6, we present a comparison of the 10-fold cross-
validation accuracy of a Naive Bayes based user-adapted filter for
each agent to that of the corresponding general filter when used for
each user. In Figure 6, the X-axis represent each user and the Y-axis
represents the accuracy of each user for general and agent-based
user-adapted filtering mechanism. Note that the users are sorted in
6
non-decreasing order of the number of tweets they chose to filter
out of the 75 presented to them, i.e., higher numbered users chose to
filter a higher percentage of the tweets they rated (the user number,
on the x-axis, does not directly map into the percentage of tweets
filtered).
We observe that for almost all the cases, user-adapted filter sig-
nificantly outperform the general filter. The V-shape of the accuracy
plot for the user-adapted filter reflects the fact that it is easier to
predict the choice of users who chose to filter almost all or almost
none of the tweets, compared to the user who were more discerning,
e.g., chose to filter half of the tweets rated. We also find the accuracy
rate of predicting the choice of the more discerning users to be more
dispersed: choices of some of these users are significantly easier to
predict than others. While the accuracy rates of filtering choices
for the users in the middle range is not stellar, an online agent
may be able improve its prediction with more interactions with
the associated user. There are also a few odd instances where the
general classifier do well; we believe those users may have filtering
preferences more aligned with the "population average". As trends
from the user-adapted and general filter trained using SVMs and
Random Forest classifiers are similar, we have not included them.
6.3 Comparison of user-adapted classifier withmajority filtering decision
We did further analysis with a Majority Class filter for each user
which uses the majority filtering decision of a user, i.e., if a user
chooses to filter a majority of the tweets presented then all tweets
are filtered and vice versa. The purpose of using the Majority filter
is to check whether an agent is able to leverage user-specific train-
ing data to build a user-adapted filtering mechanism and improve
on a baseline filtering scheme which also is based only on that
user’s choices (as compared to the general filter in the previous
section that is trained on data from all users). In Figure 7, the X-axis
represents each user, ordered in increasing number of the tweets
they chose to filter, and the Y-axis represents the accuracy of each
user for a user-adapted Majority Class filter and the user-adapted
filtering mechanisms. Figure 7 also shows the classification accu-
racy all three (NB, SVM and Random Forest) user-adapted learning
classifiers. Table 3 which shows the number cases the majority
classifier did better (win), worse, or tied with each of the learning
filters. Results show that the user-adapted classifiers using Naive
Bayes, Random Forest and SVM classifier does sligthly better than
the user-adapted majority filter, whereas the Naive Bayes filter
significantly outperforms it.
To further verify if the performance differences of the user-
adapted learning and the majority filters are statistically significant,
we use the Wilcoxon Signed Rank Test [38], which gives the fol-
lowing results: (a) the user-adapted filtering mechanisms based
on Naive Bayes, Support Vector Machine and Random Forest per-
formed significantly different from the user-adapted Majority class
classifier, (b) the Naive Bayes filter performs significantly different
than Random Forest and Support Vector Machine, (c) performance
of Random Forest and Support Vector Machines are not statistically
significantly different from each other.
Figure 7: Accuracy of Majority class Vs (NB, SVM, RF).
Table 3: Comparison of performance of the learning algo-rithm with majority Algorithm.
Majority Vs SVM RF NB
Win 149 132 52
Loose 166 143 136
tie 42 83 169
These results show that the user-adapted learning filtering mech-
anisms are able to leverage user-based training data to improve on
an user-adapted baseline classifier.
7 RELATEDWORKPrior work on harassment detection spans several web platforms
including Twitter, Facebook, Instagram, Yahoo!, YouTube, etc. Dif-
ferent platforms have unique communicationmodalities, user demo-
graphics and content and may, therefore, display different subtypes
of hateful communication. For instance, one should expect quite
different types of hate content on a platform catering to adolescents
than on a web-platform used by a wider cross-section of the general
public. Manual analysis of data and establishment of relationships
between multiple features are often error-prone. Machine learning
has been used to address this issue.
In [40], a supervised classification technique is used along with
local, sentimental and contextual features extracted from a post
using Term Frequency-Inverse Document Frequency (TF-IDF). The
classification technique is conjugated with n-grams and other fea-
tures, such as incorporating abusiveness, to train a model for detect-
ing harassment. A significant improvement over the general TF-IDF
scheme is observed while adding the sentimental and contextual fea-
tures. In [32], Support vector machines (SVMs) were used to learn a
model of profanity using the bag of words (BOW) approach to find
the optimal features. This approach surpasses the performance of
all previous list-based profanity detection techniques. A linguistic
and behavioral pattern based model [24] was proposed to filter
short texts, detect spam and abusive users in the network. It used
real-world SMS data set from a large telecommunications operator
7
from the US and a social media corpus. It also addressed different
ways to deal with short text message challenges such as tokeniza-
tion and entity detection by using text normalization and substring
clustering techniques. A comprehensive approach to detecting hate
speech was proposed in [35] which presents a plan that targets spe-
cific group characteristics, including ethnic origin, religion, gender,
and sexual orientation. The paragraph2vec approach [13] is used to
classify anti-Semitic speech on data collected over a 6-month period
from Yahoo Finance website. In [26], a comprehensive lists of slurs,
obtained from hate speech and an array of features for abusive
language detection. such as POS tags, the presence of blacklisted
words, n-gram features including token and character n-grams
and length features, are used. Their scheme outperformed a deep
learning approach by focusing on good annotation guidelines that
help detect specific abusive language. In [8], an exploratory sin-
gle blended model of cyber-hate that incorporates knowledge of
features across multiple types was used. The proposed method im-
proved classification for different types of cyber-hate beyond the
use of a BOW and known hateful terms. In [36], author analyzed
the impact of various extra-linguistic features in conjunction with
character n-grams for hate speech detection. It was observed that
though the gender feature is informative, differences in the geo-
graphic and word-length distribution do not improve performance
over character level features. A list of criteria based on critical race
theory to identify racist and sexist slurs was presented.
Most of the research studies that we have come across, including
the ones summarized above, have focused on binary classification
of communication. Cyber-harassment, however, is multi-faceted
and may convey diverse content including hostility, humiliation,
insults, threats, unwanted sexual advances, etc. to a target.
Human moderation efforts for curbing online abuse have proved
to be inadequate and automated agents can be effective tools for
mitigating cyber-harassment. Despite the research on automated
harassment detection, quality research on automated intervention
is sparse. Two recent studies, however, explore automated inter-
vention: (i) [15] uses bots to help maintain block-lists, but requires
manual user additions, and (ii) [1] uses a bot, imitating a personwith
similar demographic characteristics to the troll, to provide feedback
to dissuade hateful comments. These approaches do not consider in-
dividual user’s harassment perception or sensibility. In [30], author
discussed some research challenges that can be approached with
intelligent agents based solution. Author also sheds light on how an
user-based agent can be useful to learn user’s threat perception and
acceptance and filter different abusive incoming communications
directed towards the user.
As there is a gap between psychological theories and computa-
tional models and concepts of what constitutes aggression, training
computers for comprehensive harassment detection has proved to
be challenging. The goal of this research is to remove this discon-
nect and leverage computational and psychological approaches to
identify different aggression categories, crowdsourced feedback,
surveys and statistical techniques. Based on the identified cate-
gories, an effective agent-based detection tool has been built to
filter communication based on user’s sensibility and preferences.
8 CONCLUSIONS AND FUTUREWORKOur goal is to develop a better understanding of cyber harassment
types, the varying perception among the users to harassing com-
munication, as well as the variation in the tolerance or acceptance
of harassment categories, and hence the need for user-adapted fil-
tering of harassing communication. To achieve our goal, we used a
taxonomy of harassment categories grounded in the psychology
literature, collected a set of tweets that matched identified key-
words associated with different harassment categories, developed a
crowd-sourced data-set where MTurk workers were asked to rate
the harassment intensity of presented tweets and if they wanted to
filtered them from their input stream. We performed an in-depth
analysis of the labeled data to identify the variation of harassment
sensibility and tolerance level among the users over different ha-
rassment categories. These results show how user’s perspective
varies depending on harassment categories and the intensity of the
tweets. As different users’ perceived intensity and acceptance could
vary significantly for the same tweet, this highlighted the need
for an adaptive learning agent that can user-adapted intervention
mechanisms to mitigate the effects of cyber-harassment.
We trained agent-based user-adapted filters from labeled data
and evaluated their ability to filter harassing communication for
each user. Each agent learned to filter for a user using only about 60
tweets. The user-adapted filters performed better than one general
filter trained on labeled data from all users. We also found that the
learned user-adapted filters outperformed a base-line non-learning
filter using the majority filtering choice for any tweet.
These results reaffirmed the pressing need for user-adapted ha-
rassment filtering mechanisms rather than relying on a single,
generic filter for all. This work can be extended to provide support-
ive communication to the victims of cyber harassment/aggression.
The victims of aggressive communication can be heartened by re-
ceiving supportive communication. A user based agent can identify
the personality traits and the aggression tolerance levels and thus
can provide supportive communication that dilute the chain of ha-
rassing comments received, divert attention from these negative
inputs, and bolster the confidence and self-esteem of the target
of aggression. A different and possibly effective future research
avenue would be to develop an agent that sends responses to online
aggressors with the goal of disarming, discouraging, and eliminat-
ing future harassment. Based on the learned user perception and
tolerance, an agent can detect aggressive tweets and can identify
the source. Based on other attributes like the frequency, intensity
level, and the number of users attacked by such sources, an agent
can detect potential threat. An agent can send commensurate re-
sponses to the aggressor or harasser to mitigate the situation and
disarm the aggressor.
In the future, we can investigate unsupervised grouping of users
of similar harassing preference and training a limited number, one
per group, of user-adapted filters. An orthogonal research direction
would be for adaptive agents to track any changing preference of the
user and accordingly adapt the associated filter. This is particularly
relevant as users change their attitude towards particular categories
of harassment based on their personal life situation and interaction
with new acquaintances.
8
REFERENCES[1] 2016. Troll Hunters: The Twitterbots That Fight
Against Online Abuse. http://cacm.acm.org/news/
205680-troll-hunters-the-twitterbots-that-fight-against-online-abuse/fulltext.
(2016). New Scientist, Communications of the ACM.
[2] Alan Agresti, Matilde Bini, Bruno Bertaccini, and Euijung Ryu. 2008. Simultane-
ous confidence intervals for comparing binomial parameters. Biometrics 64, 4(2008), 1270–1275.
[3] Alana Barton and Hannah Storm. 2014. Violence and harassment against women
in the news media: a global picture. New York: WomenâĂŹs Media Foundationand the International News Safety Institute. Accessed November 6 (2014), 2014.
[4] Anita Bernstein. 2014. Abuse and Harassment Diminish Free Speech. Pace LawReview 35, 1 (2014).
[5] Adam M Bossler, Thomas J Holt, and David C May. 2012. Predicting online
harassment victimization among a juvenile population. Youth & Society 44, 4
(2012), 500–523.
[6] Danah Boyd, Scott Golder, and Gilad Lotan. 2010. Tweet, tweet, retweet: Conver-
sational aspects of retweeting on twitter. In System Sciences (HICSS), 2010 43rdHawaii International Conference on. IEEE, 1–10.
[7] Erin E Buckels, Paul D Trapnell, and Delroy L Paulhus. 2014. Trolls just want to
have fun. Personality and individual Differences 67 (2014), 97–102.[8] Pete Burnap and Matthew L Williams. 2016. Us and them: identifying cyber hate
on Twitter across multiple protected characteristics. EPJ Data Science 5, 1 (2016),11.
[9] Marilyn A Campbell. 2005. Cyber Bullying: An Old Problem in a New Guise?.
Australian journal of Guidance and Counselling 15, 01 (2005), 68–76.
[10] Pew Research Center. 2017. Social Media Usage: 2005-2017. http://www.
pewinternet.org/2015/10/08/social-networking-usage-2005-2017/. (2017).
[11] Ying Chen, Yilu Zhou, Sencun Zhu, and Heng Xu. 2012. Detecting offensive
language in social media to protect adolescent online safety. In Privacy, Security,Risk and Trust (PASSAT), 2012 International Conference on and 2012 InternationalConfernece on Social Computing (SocialCom). IEEE, 71–80.
[12] Kevin Coe, Kate Kenski, and Stephen A Rains. 2014. Online and uncivil? Patterns
and determinants of incivility in newspaper website comments. Journal ofCommunication 64, 4 (2014), 658–679.
[13] Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavl-
jevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment
embeddings. In Proceedings of the 24th International Conference on World WideWeb. ACM, 29–30.
[14] Maeve Duggan. 2014. Online harassment. Pew Research Center.
[15] R Stuart Geiger. 2016. Bot-based collective blocklists in Twitter: the counter-
public moderation of harassment in a networked public space. Information,Communication & Society 19, 6 (2016), 787–803.
[16] Romulus Gidro, Aurelia Gidro, et al. 2016. Aspects Concerning Sexual And Moral
Harassment In TheWorkplace. Curentul Juridic, The Juridical Current, Le CourantJuridique 64 (2016), 65–73.
[17] Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard
Scholkopf. 1998. Support vector machines. IEEE Intelligent Systems and theirapplications 13, 4 (1998), 18–28.
[18] Sameer Hinduja and Justin W Patchin. 2010. Bullying, cyberbullying, and suicide.
Archives of suicide research 14, 3 (2010), 206–221.
[19] Andreas M Kaplan and Michael Haenlein. 2010. Users of the world, unite! The
challenges and opportunities of Social Media. Business horizons 53, 1 (2010),
59–68.
[20] Chei Sian Lee and Long Ma. 2012. News sharing in social media: The effect of
gratifications and prior experience. Computers in Human Behavior 28, 2 (2012),331–339.
[21] Andy Liaw, Matthew Wiener, et al. 2002. Classification and regression by ran-
domForest. R news 2, 3 (2002), 18–22.[22] Andrew McCallum, Kamal Nigam, et al. 1998. A comparison of event models
for naive bayes text classification. In AAAI-98 workshop on learning for textcategorization, Vol. 752. Citeseer, 41–48.
[23] Neil Middleton and Richard Schneeman. 2013. Heroku: Up and Running: EffortlessApplication Deployment and Scaling. " O’Reilly Media, Inc.".
[24] Alejandro Mosquera, Lamine Aouad, Slawomir Grzonkowski, and Dylan Morss.
2014. On Detecting Messaging Abuse in Short Text Messages using Linguistic
and Behavioral patterns. arXiv preprint arXiv:1408.3934 (2014).[25] Elana Newman, Susan Drevo, Bradley Brummel, Gavin Rees, and Bruce Shapiro.
2016. Online abuse of women journalists: Towards an EvidenceâĂŘbased Ap-
proach to Prevention and Intervention. (2016), 46–52.
[26] Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang.
2016. Abusive Language Detection in Online User Content. In Proceedings of the25th International Conference on World Wide Web. International World Wide Web
Conferences Steering Committee, 145–153.
[27] Gwenn Schurgin O’Keeffe, Kathleen Clarke-Pearson, et al. 2011. The impact
of social media on children, adolescents, and families. Pediatrics 127, 4 (2011),800–804.
[28] Thomas E Page, Afroditi Pina, and Roger Giner-Sorolla. 2015. "It Was Only
Harmless Banter!" The development and preliminary validation of the moral
disengagement in sexual harassment scale. Aggressive behavior (2015).[29] Jay P Paul, Joseph Catania, Lance Pollack, Judith Moskowitz, Jesse Canchola,
Thomas Mills, Diane Binson, and Ron Stall. 2002. Suicide attempts among gay
and bisexual men: lifetime prevalence and antecedents. American journal of publichealth 92, 8 (2002), 1338–1345.
[30] Sandip Sen, Zenefa Rahaman, Chad Crawford, and Osman Yücel. 2018. Agents
for Social (Media) Change. In Proceedings of the 17th International Conferenceon Autonomous Agents and MultiAgent Systems. International Foundation for
Autonomous Agents and Multiagent Systems, 1198–1202.
[31] Lindsay H Shaw and Larry M Gant. 2002. In defense of the Internet: The relation-
ship between Internet communication and depression, loneliness, self-esteem,
and perceived social support. Cyberpsychology & behavior 5, 2 (2002), 157–171.[32] Sara Owsley Sood, Judd Antin, and Elizabeth F Churchill. 2012. Using Crowd-
sourcing to Improve Profanity Detection.. In AAAI Spring Symposium: Wisdomof the Crowd.
[33] Frithjof Staude-Müller, Britta Hansen, and Melanie Voss. 2012. How stressful is
online victimization? Effects of victim’s personality and properties of the incident.
European Journal of Developmental Psychology 9, 2 (2012), 260–274.
[34] Amazon Mechanical Turk. 2012. Amazon mechanical turk. Retrieved August 17(2012), 2012.
[35] William Warner and Julia Hirschberg. 2012. Detecting hate speech on the world
wide web. In Proceedings of the Second Workshop on Language in Social Media.Association for Computational Linguistics, 19–26.
[36] Zeerak Waseem and Dirk Hovy. 2016. Hateful symbols or hateful people? predic-
tive features for hate speech detection on twitter. In Proceedings of NAACL-HLT.88–93.
[37] Nancy Willard. 2006. Cyberbullying and cyberthreats. Eugene, OR: Center forSafe and Responsible Internet Use (2006).
[38] RF Woolson. 2007. Wilcoxon signed-rank test. Wiley encyclopedia of clinicaltrials (2007), 1–3.
[39] Michele L Ybarra. 2004. Linkages between depressive symptomatology and
Internet harassment among young regular Internet users. CyberPsychology &Behavior 7, 2 (2004), 247–257.
[40] Dawei Yin, Zhenzhen Xue, Liangjie Hong, Brian D Davison, April Kontostathis,
and Lynne Edwards. 2009. Detection of harassment on web 2.0. Proceedings ofthe Content Analysis in the WEB 2 (2009), 1–7.
9