

Learning to Decipher Hate Symbols

Jing Qian, Mai ElSherief, Elizabeth Belding, William Yang Wang
Department of Computer Science
University of California, Santa Barbara
Santa Barbara, CA 93106 USA

{jing qian,mayelsherif,ebelding,william}@cs.ucsb.edu

Abstract

Existing computational models to understand hate speech typically frame the problem as a simple classification task, bypassing the understanding of hate symbols (e.g., 14 words, kigy) and their secret connotations. In this paper, we propose a novel task of deciphering hate symbols. To do this, we leverage the Urban Dictionary and collect a new, symbol-rich Twitter corpus of hate speech. We investigate neural network latent context models for deciphering hate symbols. More specifically, we study Sequence-to-Sequence models and show how they are able to crack the ciphers based on context. Furthermore, we propose a novel Variational Decipher and show how it can generalize better to unseen hate symbols in a more challenging testing setting.

1 Introduction

The statistics are sobering. The Federal Bureau of Investigation of the United States¹ reported over 6,000 criminal incidents motivated by bias against race, ethnicity, ancestry, religion, sexual orientation, disability, gender, and gender identity during 2016. The most recent 2016 report shows an alarming 4.6% increase compared with 2015 data². In addition to these reported cases, thousands of Internet users, including celebrities, are forced out of social media due to abuse, hate speech, cyberbullying, and online threats. While such social media data is abundantly available, the broad question we are asking is: What can machine learning and natural language processing do to help prevent online hate speech?

The vast quantity of hate speech on social media can be analyzed to study online abuse. In recent years, there has been a growing trend of developing computational models of hate speech. However, the majority of the prior studies focus solely on modeling hate speech as a binary or multiclass classification task (Djuric et al., 2015; Waseem and Hovy, 2016; Burnap and Williams, 2016; Wulczyn et al., 2017; Pavlopoulos et al., 2017).

¹ https://www.fbi.gov/news/stories/2016-hate-crime-statistics

² https://www.fbi.gov/news/stories/2015-hate-crime-statistics-released

Figure 1: An example tweet with hate symbols.

While developing new features for hate speech detection certainly has merits, we believe that understanding hate speech requires us to design computational models that can decipher hate symbols that are commonly used by hate groups. Figure 1 shows an example usage of hate symbols in an otherwise seemingly harmless tweet that promotes hate. For example, Aryan Warrior is a longstanding racist prison gang based in the Nevada prison system. WPWW is the acronym for White Pride World Wide. The hate symbols 1488 and 2316 are more implicit. 14 symbolizes the 14 words: “WE MUST SECURE THE EXISTENCE OF OUR PEOPLE AND A FUTURE FOR WHITE CHILDREN”, spoken by members of the Order neo-Nazi movement. H is the 8th letter of the alphabet, so 88=HH=Heil Hitler. Similarly, W is the 23rd and P is the 16th letter of the alphabet, so 2316=WP=White Power.
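The letter-position arithmetic behind 88 and 2316 can be checked with a short sketch. This is only an illustration of the encoding described above, not part of the proposed models, which learn from context rather than from hand-written rules. The function name `decode_letter_positions` and the fixed chunk `width` are assumptions made for this example.

```python
import string


def decode_letter_positions(code: str, width: int = 2) -> str:
    """Split a digit string into fixed-width chunks and map each chunk
    (an alphabet position from 1 to 26) to its uppercase letter."""
    if len(code) % width != 0:
        raise ValueError("code length must be a multiple of width")
    letters = []
    for start in range(0, len(code), width):
        position = int(code[start:start + width])
        if not 1 <= position <= 26:
            raise ValueError(f"{position} is not an alphabet position")
        letters.append(string.ascii_uppercase[position - 1])
    return "".join(letters)


# Two-digit chunks: 23 and 16.
print(decode_letter_positions("2316"))
# One-digit chunks: 8 and 8.
print(decode_letter_positions("88", width=1))
```

A rule like this covers only one family of symbols; acronyms such as WPWW or the "14 words" reference need outside knowledge, which is the motivation for learning the mapping from data.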

arXiv:1904.02418v1 [cs.CL] 4 Apr 2019

In this work, we propose the first models for deciphering hate symbols. We investigate two families of neural network approaches: the Sequence-to-Sequence models (Sutskever et al., 2014; Cho et al., 2014) and a novel Variational Decipher based on the Conditional Variational Autoencoders (Kingma and Welling, 2014; Sohn et al., 2015; Larsen et al., 2016). We show how these neural network models are able to guess the meaning of hate symbols based on context embeddings and even generalize to unseen hate symbols during testing. Our contributions are three-fold:

• We propose a novel task of learning to decipher hate symbols, which moves beyond the standard formulation of hate speech classification settings.

• We introduce a new, symbol-rich tweet dataset for developing computational models of hate speech analysis, leveraging the Urban Dictionary.

• We investigate a sequence-to-sequence neural network model and show how it is able to encode context and crack the hate symbols. We also introduce a novel Variational Decipher, which generalizes better in a more challenging setting.

In the next section, we outline related work in text normalization, machine translation, conditional variational autoencoders, and hate speech analysis. In Section 3, we introduce our new dataset for deciphering hate speech. Next, in Section 4, we describe the design of two neural network models for the decipherment problem. Quantitative and qualitative experimental results are presented in Section 5. Finally, we conclude in Section 6.

2 Related Work

2.1 Text Normalization in Social Media

The proposed task is related to text normalization focusing on the problems presented by user-generated content in online sources, such as misspelling, rapidly changing out-of-vocabulary slang, short-forms and acronyms, punctuation errors or omissions, etc. These problems usually appear as out-of-vocabulary words. Extensive research has focused on this task (Beaufort et al., 2010; Liu et al., 2011; Gouws et al., 2011; Han and Baldwin, 2011; Han et al., 2012; Liu et al., 2012; Chrupała, 2014). However, our task is different from the general text normalization in social media in that instead of the out-of-vocabulary words, we focus on the symbols conveying hateful meaning. These hate symbols can go beyond lexical variants of the vocabulary and thus are more challenging to understand.

2.2 Machine Translation

An extensive body of work has been dedicated to machine translation. Knight et al. (2006) study a number of natural language decipherment problems using unsupervised learning. Ravi and Knight (2011) further frame the task of machine translation as decipherment and tackle it without parallel training data. Machine translation using deep learning (Neural Machine Translation) has been proposed in recent years. Sutskever et al. (2014) and Cho et al. (2014) use Sequence to Sequence (Seq2Seq) learning with Recurrent Neural Networks (RNN). Bahdanau et al. (2015) further improve translation performance using the attention mechanism. Google’s Neural Machine Translation System (GNMT) employs a deep attentional LSTM network with residual connections (Wu et al., 2016). Recently, machine translation techniques have also been applied to explain non-standard English expressions (Ni and Wang, 2017). However, our deciphering task is not the same as machine translation in that hate symbols are short and cannot be modeled as language.

Our task is more closely related to Hill et al. (2016) and Noraset et al. (2017). Hill et al. (2016) propose using neural language embedding models to map dictionary definitions to word representations, which is the inverse of our task. Noraset et al. (2017) propose the definition modeling task. However, in their task, each word to be defined requires its pre-trained word embedding as an input, which amounts to prior knowledge of the word. Such prior knowledge is not available in our decipherment task. Therefore, our task is more challenging and is not simply a definition modeling task.

2.3 Conditional Variational Autoencoder

Unlike the original Seq2Seq model that directly encodes the input into a latent space, the Variational Autoencoder (VAE) (Kingma and Welling, 2014) approximates the underlying probability distribution of data. VAE has shown promise in multiple generation tasks, such as handwritten digits (Kingma and Welling, 2014; Salimans et al., 2015), faces (Kingma and Welling, 2014; Rezende et al., 2014), and machine translation (Zhang et al., 2016). The Conditional Variational Autoencoder (CVAE) (Larsen et al., 2016; Sohn et al., 2015) extends the original VAE framework by incorporating conditions during generation. In addition to image generation, CVAE has been successfully applied to some NLP tasks. For example, Zhao et al. (2017) apply CVAE to dialog generation, while Guu et al. (2018) use CVAE for sentence generation.

2.4 Hate Speech Analysis

Closely related to our work are Pavlopoulos et al. (2017) and Gao et al. (2017). Pavlopoulos et al. (2017) build an RNN supplemented by an attention mechanism that outperforms the previous state-of-the-art system in user comment moderation (Wulczyn et al., 2017). Gao et al. (2017) propose a weakly-supervised approach that jointly trains a slur learner and a hate speech classifier. While their work contributes to the automation of harmful content detection and the highlighting of suspicious words, our work builds upon these contributions by providing a learning mechanism that deciphers suspicious hate symbols used by communities of hate to bypass automated content moderation systems.

3 Dataset

In this section, we describe the dataset we collected for hate symbol decipherment.

3.1 Hate Symbols

We first collect hate symbols and the corresponding definitions from the Urban Dictionary. Each term tagged with one of the following hashtags: #hate, #racism, #racist, #sexism, #sexist, #nazi is selected as a candidate and added to the set $S_0$. We collected a total of 1,590 terms. Next, we expand this set by different surface forms using the Urban Dictionary API. For each term $s_i$ in set $S_0$, we obtain a set of terms $R_i$ that have the same meaning as $s_i$ but with different surface forms. For example, for the term brown shirt, there are four terms with different surface forms: brown shirt, brown shirts, Brownshirts, brownshirt. Each term in $R_i$ has its own definition in Urban Dictionary, but since these terms have exactly the same meaning, we select the definition $d_i$ with the maximum upvote/downvote ratio and use it for all the terms in $R_i$. For example, for each term in the set $R_i$ = {brown shirt, brown shirts, Brownshirts, brownshirt}, the corresponding definition is “Soldiers in Hitler’s storm trooper army, SA during the Nazi regime...” After expanding, we obtain 2,105 distinct hate symbol terms and their corresponding definitions. On average, each symbol consists of 9.9 characters, 1.5 words. Each definition consists of 96.8 characters, 17.0 words.
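The definition-selection step can be sketched as follows. This is a minimal illustration under stated assumptions: the `Entry` record, the function names, and the vote counts are hypothetical, and adding 1 to the downvote count (to avoid division by zero) is a choice made for this example, not something the paper specifies.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    term: str
    definition: str
    upvotes: int
    downvotes: int


def vote_ratio(entry: Entry) -> float:
    # Assumption: smooth the denominator so zero downvotes does not divide by zero.
    return entry.upvotes / (entry.downvotes + 1)


def shared_definition(entries):
    """Pick the definition with the highest upvote/downvote ratio in R_i
    and assign it to every surface form in R_i."""
    best = max(entries, key=vote_ratio)
    return {entry.term: best.definition for entry in entries}


# Illustrative surface forms from the text; the vote counts are made up.
r_i = [
    Entry("brown shirt", "<definition A>", upvotes=40, downvotes=9),
    Entry("brown shirts", "<definition B>", upvotes=12, downvotes=11),
    Entry("Brownshirts", "<definition C>", upvotes=5, downvotes=0),
    Entry("brownshirt", "<definition D>", upvotes=3, downvotes=2),
]
definitions = shared_definition(r_i)
```

With these made-up counts, every surface form ends up with the definition attached to "Brownshirts", since 5/(0+1) is the largest ratio.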

3.2 Tweet Collection

For each of the hate symbols, we collect all tweets from 2011-01-01 to 2017-12-31 that contain exactly the same surface form of the hate symbol in the text. Since we only focus on hate speech, we train an SVM (Cortes and Vapnik, 1995) classifier to filter the collected tweets. The SVM model is trained on the dataset published by Waseem and Hovy (2016). Their original dataset contains three labels: Sexism, Racism, and None. Since the SVM model is used to filter the non-hate speech, we merge the instances labeled as sexism and racism, then train the SVM model to do binary classification. After filtering out all the tweets classified as non-hate, our final dataset consists of 18,667 (tweet, hate symbol, definition) tuples.

4 Our Approach

We formulate hate symbol deciphering as the following equation:

$$\mathrm{Obj} = \sum_{(u,s,d^*) \in X} \log p(d^* \mid (u,s)) \tag{1}$$

$X$ is the dataset, and $(u, s, d^*)$ is a (tweet, symbol, definition) tuple in the dataset. The inputs are the tweet and the hate symbol in this tweet. The output is the definition of the symbol. Our objective is to maximize the probability of the definition given the (tweet, symbol) pair. This objective function is very similar to that of machine translation. So we first try to tackle it based on the Sequence-to-Sequence model, which is commonly used in machine translation.

4.1 Sequence-to-Sequence Model

Figure 2: Our Seq2Seq model. $u$ and $s_w$ are the word embeddings of the tweet text and hate symbol. $s_c$ is the character embedding of the symbol. $c_u$ is the encoded tweet and $h$ is the concatenated hidden states. $d$ is the generated text. Detailed explanation is in section 4.1.

We implement an RNN Encoder-Decoder model with attention mechanism based on Bahdanau et al. (2015). We use GRU (Cho et al., 2014) for decoding. However, instead of also using GRU for encoding, we found that LSTM (Hochreiter and Schmidhuber, 1997) performs better on our task. Therefore, our Seq2Seq model uses LSTM encoders and GRU decoders. An overview of our Seq2Seq model is shown in Figure 2. The computation process is shown as the following equations:

$$c_u, h_u = f_u(u) \tag{2}$$

$$c_{sw}, h_{sw} = f_{sw}(s_w) \tag{3}$$

$$c_{sc}, h_{sc} = f_{sc}(s_c) \tag{4}$$

$u$ is the word embedding of the tweet text, $s_w$ is the word embedding of the hate symbol, and $s_c$ is the character embedding of the symbol. $f_u$, $f_{sw}$, and $f_{sc}$ are LSTM functions. $c_u$, $c_{sw}$, $c_{sc}$ are the outputs of the LSTMs at the last time step and $h_u$, $h_{sw}$, $h_{sc}$ are the hidden states of the LSTMs at all time steps. We use two RNN encoders to encode the symbol: one encodes at the word level and the other encodes at the character level. The character-level encoded hate symbol is used to provide the feature of the surface form of the hate symbol while the word-level encoded hate symbol is used to provide the semantic information of the hate symbol. The hidden states of the two RNN encoders for hate symbols are concatenated:

$$h = h_{sw} \oplus h_{sc} \tag{5}$$

$c_u$ is the vector of encoded tweet text. The tweet text is the context of the hate symbol, which provides additional information during decoding. Therefore, the encoded tweet text is also fed into the RNN decoder. The detailed attention mechanism and decoding process at time step $t$ are as follows:

$$w_t = \sigma(l_w(d_{t-1} \oplus e_{t-1})) \tag{6}$$

$$a_t = \sum_{i=1}^{T} w_{ti} h_i \tag{7}$$

$$b_t = \sigma(l_c(d_{t-1} \oplus a_t)) \tag{8}$$

$$o_t, e_t = k(c_u \oplus b_t, e_{t-1}) \tag{9}$$

$$p(d_t \mid u, s) = \sigma(l_o(o_t)) \tag{10}$$

$w_t$ is the vector of attention weights at time step $t$ and $w_{ti}$ is the $i$th weight of $w_t$. $d_{t-1}$ is the word generated at the previous time step and $e_{t-1}$ is the hidden state of the decoder at the previous time step. $h_i$ is the $i$th time step segment of $h$. $l_w$, $l_c$, and $l_o$ are linear functions. $\sigma$ is a nonlinear activation function. $k$ is the GRU function. $o_t$ is the output and $e_t$ is the hidden state of the GRU. $p(d_t \mid u, s)$ is the probability distribution over the vocabulary at time step $t$. The attention weights $w_t$ are computed based on the decoder’s hidden state and the generated word at time step $t-1$. Then the computed weights are applied to the concatenated hidden states $h$ of the encoders. The result $a_t$ is the context vector for the decoder at time step $t$. The context vector and the last generated word are combined by a linear function $l_c$ followed by a nonlinear activation function. The result $b_t$ is concatenated with the encoded tweet context $c_u$, and then fed into the GRU together with the decoder’s last hidden state $e_{t-1}$. Finally, the probability of each vocabulary word is computed from $o_t$.
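Equations 6 to 8 can be traced with a small dependency-free sketch. The text only says that σ is "a nonlinear activation function", so the choice of softmax for the attention weights and tanh for the combined vector is an assumption here, as are the bias-free linear maps, the toy dimensions, and all function names.

```python
import math
import random


def linear(weights, vec):
    """Apply a bias-free linear map given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(d_prev, e_prev, h, l_w, l_c):
    # Eq. 6: weights from the previous word embedding and decoder state
    # (list concatenation plays the role of the ⊕ operator).
    w_t = softmax(linear(l_w, d_prev + e_prev))
    # Eq. 7: context vector as the weighted sum of encoder states h_1..h_T.
    a_t = [sum(w_t[i] * h[i][j] for i in range(len(h))) for j in range(len(h[0]))]
    # Eq. 8: combine the previous word with the context vector.
    b_t = [math.tanh(v) for v in linear(l_c, d_prev + a_t)]
    return w_t, a_t, b_t


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, emb, hid, enc = 3, 4, 5, 6          # toy sizes, not the paper's settings
h = rand_matrix(T, enc, rng)           # encoder hidden states
d_prev, e_prev = [0.1] * emb, [0.2] * hid
l_w = rand_matrix(T, emb + hid, rng)   # one score per encoder position
l_c = rand_matrix(hid, emb + enc, rng)
w_t, a_t, b_t = attention_step(d_prev, e_prev, h, l_w, l_c)
```

Because $l_w$ maps directly to one score per encoder position, this form of attention presupposes a fixed number of positions $T$; how variable-length inputs are padded or truncated is not stated in the text.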

4.2 Variational Decipher

The Variational Decipher is based on the CVAE model, which is another model that can be used to parametrize the conditional probability $p(d^* \mid (u,s))$ in the objective function (Equation 1). Unlike the Seq2Seq model, which directly parametrizes $p(d^* \mid (u,s))$, our Variational Decipher formulates the task as follows:

$$\begin{aligned}
\mathrm{Obj} &= \sum_{(u,s,d^*) \in X} \log p(d^* \mid (u,s)) \\
&= \sum_{(u,s,d^*) \in X} \log \int_z p(d^* \mid z)\, p(z \mid (u,s))\, dz
\end{aligned} \tag{11}$$

where $z$ is the latent variable. $p(d^* \mid (u,s))$ is written as the marginalization of the product of two terms over the latent space. Since the integration over $z$ is intractable, we instead try to maximize the evidence lower bound (ELBO). Our variational lower bound objective is in the following form:

$$\mathrm{Obj} = E[\log p_\phi(d^* \mid z, u, s)] - D_{KL}[p_\alpha(z \mid d^*, u, s) \,\|\, p_\beta(z \mid u, s)] \tag{12}$$

where $p_\phi(d^* \mid z, u, s)$ is the likelihood, $p_\alpha(z \mid d^*, u, s)$ is the posterior, $p_\beta(z \mid u, s)$ is the prior, and $D_{KL}$ is the Kullback-Leibler (KL) divergence. We use three neural networks to model these three probability distributions. An overview of our Variational Decipher is shown in Figure 3.

Figure 3: The Variational Decipher. Note that this structure is used during training. During testing, the structure is slightly different. $d^*$ is the word embeddings of the definition. $x$ is the encoded definition. $c$ is the concatenation of the encoded tweet and hate symbol. $p$ and $p'$ are output distributions. $z$ is the latent variable. The definitions of other variables are the same as those in Figure 2. Detailed explanation is in section 4.2.

We first use four recurrent neural networks to encode the (tweet, symbol, definition) tuple in the dataset. Similar to what we do in the Seq2Seq model, there are two encoders for the hate symbol. One is at the word level and the other is at the character level. The encoding of symbols and tweets is exactly the same as in our Seq2Seq model (see Equations 2-4). The difference is that we also need to encode definitions for the Variational Decipher.

$$x, h_d = f_d(d^*) \tag{13}$$

Here, $f_d$ is the LSTM function. $x$ is the output of the LSTM at the last time step and $h_d$ is the hidden state of the LSTM at all time steps. The condition vector $c$ is the concatenation of the encoded symbol words, symbol characters, and the tweet text:

$$c = c_u \oplus c_{sw} \oplus c_{sc} \tag{14}$$

We use multi-layer perceptrons (MLPs) to model the posterior and the prior in the objective function. The posterior network and the prior network have the same structure and both output a probability distribution over the latent variable $z$. The only difference is that the input of the posterior network is the concatenation of the encoded definition $x$ and the condition vector $c$, while the input of the prior network is only the condition vector $c$. Therefore, the output of the posterior network is $p = p_\alpha(z \mid d^*, u, s)$ and the output of the prior network is $p' = p_\beta(z \mid u, s)$. By assuming the latent variable $z$ has a multivariate Gaussian distribution, the actual outputs of the posterior and prior networks are the mean and variance: $(\mu, \Sigma)$ for the posterior network and $(\mu', \Sigma')$ for the prior network.

$$\mu, \Sigma = g(x \oplus c) \tag{15}$$

$$\mu', \Sigma' = g'(c) \tag{16}$$

$g$ is the MLP function of the posterior network and $g'$ is that of the prior network. During training, the latent variable $z$ is randomly sampled from the Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ and fed into the likelihood network. During testing, the posterior network is replaced by the prior network, so $z$ is sampled from $\mathcal{N}(\mu', \Sigma')$. The likelihood network is modeled by an RNN decoder with attention mechanism, very similar to the decoder of our Seq2Seq model. The only difference lies in the input for the GRU. The decoder in our Variational Decipher models the likelihood $p_\phi(d^* \mid z, u, s)$, which is conditioned on the latent variable, tweet context, and the symbol. Therefore, for the Variational Decipher, the condition vector $c$ and the sampled latent variable $z$ are fed into the decoder.

$$o_t, e_t = k(z \oplus c \oplus b_t, e_{t-1}) \tag{17}$$

$e_{t-1}$ is the hidden state of the RNN decoder at the last time step. $k$ is the GRU function. $o_t$ is its output and $e_t$ is its hidden state. The detailed decoding process and explanations are in section 4.1.
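The sampling step and the KL term between the posterior and prior outputs can be sketched without any deep learning library. The sketch assumes a diagonal covariance, so that $\Sigma$ is represented as a list of per-dimension variances; the text does not state the covariance structure, and the function names are hypothetical.

```python
import math
import random


def reparameterize(mu, var, rng=random):
    """Draw z = mu + sqrt(var) * eps with eps ~ N(0, 1) per dimension,
    so the sample stays a differentiable function of mu and var."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]


def sample_latent(posterior, prior, training):
    # Training samples from the posterior N(mu, Sigma); testing has no
    # definition d* available, so it samples from the prior N(mu', Sigma').
    mu, var = posterior if training else prior
    return reparameterize(mu, var)


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """Closed-form KL[N(mu_q, var_q) || N(mu_p, var_p)] for diagonal Gaussians."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


posterior = ([0.3, -0.1], [0.5, 0.8])   # (mu, Sigma) from g(x ⊕ c), toy values
prior = ([0.0, 0.0], [1.0, 1.0])        # (mu', Sigma') from g'(c), toy values
z_train = sample_latent(posterior, prior, training=True)
z_test = sample_latent(posterior, prior, training=False)
kl = kl_diag_gaussians(*posterior, *prior)
```

The KL value is zero exactly when the two networks output the same mean and variance, which is the state the training objective pushes the prior network toward.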


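According to the objective function in Equation 12, the loss function of the Variational Decipher is as follows:

$$\begin{aligned}
L &= L_{REC} + L_{KL} \\
&= E_{z \sim p_\alpha(z \mid d^*, u, s)}[-\log p_\phi(d^* \mid z, u, s)] + D_{KL}[p_\alpha(z \mid d^*, u, s) \,\|\, p_\beta(z \mid u, s)]
\end{aligned} \tag{18}$$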

It consists of two parts. The first part $L_{REC}$ is called the reconstruction loss. Optimizing $L_{REC}$ can push the sentences generated by the posterior network and the likelihood network closer to the given definitions. The second part $L_{KL}$ is the KL divergence loss. Optimizing this loss can push the output Gaussian distributions of the prior network closer to those of the posterior network. This means we teach the prior network to learn the same knowledge learned by the posterior network, such that during testing time, when the referential definition $d^*$ is no longer available for generating the latent variable $z$, the prior network can still output a reasonable probability distribution over the latent variable $z$. The complete training and testing process for the Variational Decipher is shown in Algorithm 1. $M$ is the predefined maximum length of the generated text. BCE refers to the Binary Cross Entropy loss.
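For reference, the step from the marginal likelihood to the lower bound in Equation 12 can be written out with Jensen's inequality. The derivation below assumes that the likelihood term is conditioned on $(u, s)$ as well as $z$, as in Equation 12, and the closed form for the KL term additionally assumes diagonal covariances with per-dimension variances $\sigma_j^2$ and $\sigma_j'^2$; neither assumption is stated explicitly in the text.

```latex
\begin{aligned}
\log p(d^* \mid u, s)
  &= \log \int_z p_\phi(d^* \mid z, u, s)\, p_\beta(z \mid u, s)\, dz \\
  &= \log \mathbb{E}_{z \sim p_\alpha(z \mid d^*, u, s)}
     \left[ \frac{p_\phi(d^* \mid z, u, s)\, p_\beta(z \mid u, s)}
                 {p_\alpha(z \mid d^*, u, s)} \right] \\
  &\ge \mathbb{E}_{z \sim p_\alpha(z \mid d^*, u, s)}
     \left[ \log p_\phi(d^* \mid z, u, s) \right]
     - D_{KL}\left[ p_\alpha(z \mid d^*, u, s) \,\|\, p_\beta(z \mid u, s) \right]
\\[6pt]
D_{KL}\left[ \mathcal{N}(\mu, \Sigma) \,\|\, \mathcal{N}(\mu', \Sigma') \right]
  &= \frac{1}{2} \sum_j \left[ \log \frac{\sigma_j'^2}{\sigma_j^2}
     + \frac{\sigma_j^2 + (\mu_j - \mu'_j)^2}{\sigma_j'^2} - 1 \right]
\end{aligned}
```

Negating the right-hand side of the inequality gives the two terms of Equation 18, so minimizing $L$ maximizes the bound.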

5 Experiments

5.1 Experimental Settings

We use the dataset collected as described in section 3 for training and testing. We randomly select 2,440 tuples for testing and use the remaining 16,227 tuples for training. Note that there are no overlapping hate symbols between the training dataset $U$ and the testing dataset $D$.

We split the 2,440 tuples of the testing dataset $D$ into two separate parts, $D_s$ and $D_d$. $D_s$ consists of 1,681 examples and $D_d$ consists of 759 examples. In the first testing dataset $D_s$, although each hate symbol does not appear in the training dataset, the corresponding definition appears in the training dataset. In the second testing dataset $D_d$, neither the hate symbols nor the corresponding definitions appear in the training dataset. We do this split because deciphering hate symbols in these two cases has different levels of difficulty.
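The split criterion can be expressed as a short sketch. It assumes each example is a plain (tweet, symbol, definition) tuple and that definitions are compared by exact string match; the function name and the placeholder strings are hypothetical.

```python
def split_test_set(train, test):
    """Partition test tuples into D_s (definition seen in training)
    and D_d (definition unseen). Symbols must never overlap with training."""
    train_symbols = {symbol for _, symbol, _ in train}
    train_definitions = {definition for _, _, definition in train}
    d_s, d_d = [], []
    for tweet, symbol, definition in test:
        if symbol in train_symbols:
            raise ValueError(f"symbol {symbol!r} also appears in the training set")
        target = d_s if definition in train_definitions else d_d
        target.append((tweet, symbol, definition))
    return d_s, d_d


# Placeholder tweets and definitions; only the surface forms come from the text.
train = [("<tweet 1>", "Wig Wog", "<definition A>")]
test = [
    ("<tweet 2>", "wigwog", "<definition A>"),       # same meaning, new surface form
    ("<tweet 3>", "<new symbol>", "<definition B>"), # new symbol and new definition
]
d_s, d_d = split_test_set(train, test)
```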

This split criterion means that for each hate symbol in $D_s$, there exists some symbol in the training dataset that has the same meaning but a different surface form. For example, the hate symbols wigwog and Wig Wog have the same definition, but one is in the training dataset and the other is in the first testing dataset. We assume that such types of hate symbols share similar surface forms or similar tweet contexts. Therefore, the first testing dataset $D_s$ is used to evaluate how well the model captures the semantic similarities among the tweet contexts in different examples or the similarities among different surface forms of a hate symbol.

Algorithm 1 Train & Test Variational Decipher

function TRAIN(U)
    randomly initialize network parameters ϕ, α, β;
    for epoch = 1, E do
        for (tweet, symbol, definition) in U do
            get embeddings u, s_w, s_c, d*;
            compute x, c and h with RNN encoders;
            compute µ, Σ with the posterior network;
            compute µ′, Σ′ with the prior network;
            compute KL-divergence loss L_KL;
            sample z = reparameterize(µ, Σ);
            initialize the decoder state e_0 = c;
            L_REC = 0;
            for t = 1, M do
                compute attention weights w_t;
                compute o_t, e_t and p(d_t | z, u, s);
                d_t = indmax(p(d_t | z, u, s));
                L_REC += BCE(d_t, d*_t);
                if d_t == EOS then
                    break;
                end if
            end for
            update ϕ, α, β on L = L_REC + L_KL;
        end for
    end for
end function

function TEST(V)
    for (tweet, symbol, definition) in V do
        get embeddings u, s_w, s_c;
        compute c and h with RNN encoders;
        compute µ′, Σ′ with the prior network;
        sample z = reparameterize(µ′, Σ′);
        initialize the decoder state e_0 = c;
        for t = 1, M do
            compute attention weights w_t;
            compute o_t, e_t and p(d_t | z, u, s);
            d_t = indmax(p(d_t | z, u, s));
            if d_t == EOS then
                break;
            end if
        end for
    end for
end function

Deciphering the hate symbols in the second testing dataset $D_d$ is more challenging. Both the unseen hate symbols and definitions require the model to have the ability to accurately capture the semantic information in the tweet context and then make a reasonable prediction. The second testing dataset $D_d$ is used to evaluate how well the model generalizes to completely new hate symbols.


Dataset   Method    BLEU     ROUGE-L   METEOR
D_s       Seq2Seq   37.80*   41.05*    36.67*
D_s       VD        34.77    32.96     31.03
D_d       Seq2Seq   25.44    12.96     5.54*
D_d       VD        28.38*   14.01*    5.41
D         Seq2Seq   33.96*   32.32*    26.98*
D         VD        32.75    27.00     23.16

Table 1: The BLEU, ROUGE-L and METEOR scores on testing datasets. VD refers to the Variational Decipher. D is the entire testing dataset. D_s is the first part of D and D_d is the second part. The better result of each pair is marked with an asterisk.
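As a reminder of what the BLEU column measures, here is a minimal sketch of BLEU with equal weights over 1- to 4-grams, the configuration the evaluation uses. It is a sentence-level, single-reference, unsmoothed variant written for illustration; whether the reported scores are computed per sentence or over the corpus, and with which smoothing, is not stated, so this should not be read as the exact scoring script. The token lists are neutral placeholders.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions (equal weights 1/max_n)
    multiplied by the brevity penalty."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: an empty n-gram level zeroes the score
        log_precisions.append(math.log(overlap / total))
    if len(candidate) > len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1.0 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / max_n)


reference = "a b c d e".split()
exact = bleu(reference, reference)
shorter = bleu("a b c d".split(), reference)   # perfect precision, too short
disjoint = bleu("v w x y z".split(), reference)
```

The brevity penalty is what separates `shorter` from `exact`: every n-gram of the shorter candidate matches, but the output is penalized for omitting part of the reference.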

For the Seq2Seq model, we use negative log-likelihood loss for training. Both models are optimized using the Adam optimizer (Kingma and Ba, 2015). The hyper-parameters of the two models are exactly the same. We set the maximum generation length $M = 50$. The hidden size of the encoders is 64. The size of the word embedding is 200 and that of the character embedding is 100. The word embeddings and character embeddings are randomly initialized. Each model is trained for 50 epochs. We report the deciphering results of the two models on three testing datasets $D$, $D_s$ and $D_d$.

5.2 Experimental Results

Quantitative Results: We use equally weighted BLEU score for up to 4-grams (Papineni et al., 2002), ROUGE-L (Lin, 2004) and METEOR (Banerjee and Lavie, 2005) to evaluate the decipherment results. The results are shown in Table 1. Figure 4 shows the BLEU score achieved by the two models on three testing datasets $D$, $D_s$ and $D_d$ during the training process. Both our Seq2Seq model and Variational Decipher achieve reasonable BLEU scores on the testing datasets. The Seq2Seq model outperforms the Variational Decipher on $D_s$ while the Variational Decipher outperforms Seq2Seq on $D_d$. Note that $D_s$ is more than twice the size of $D_d$. Therefore, Seq2Seq outperforms the Variational Decipher on the entire testing dataset $D$. The different performance of the two models on $D_s$ and $D_d$ is more obvious in Figure 4. The gap between the performance of the Seq2Seq model on $D_s$ and $D_d$ is much larger than that between the performance of the Variational Decipher on these two datasets.
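The size argument can be checked numerically. If one assumes the score on $D$ behaves like an example-weighted average of the scores on $D_s$ (1,681 examples) and $D_d$ (759 examples), which the text does not state explicitly, the weighted averages land within roughly 0.1 of the reported $D$ scores. The sketch below only re-derives numbers already in Table 1; the names `SIZES`, `TABLE_1` and `size_weighted` are hypothetical.

```python
# Test-set sizes from the experimental settings.
SIZES = {"D_s": 1681, "D_d": 759}

# (score on D_s, score on D_d, reported score on D), copied from Table 1.
TABLE_1 = {
    ("Seq2Seq", "BLEU"): (37.80, 25.44, 33.96),
    ("VD", "BLEU"): (34.77, 28.38, 32.75),
    ("Seq2Seq", "ROUGE-L"): (41.05, 12.96, 32.32),
    ("VD", "ROUGE-L"): (32.96, 14.01, 27.00),
    ("Seq2Seq", "METEOR"): (36.67, 5.54, 26.98),
    ("VD", "METEOR"): (31.03, 5.41, 23.16),
}


def size_weighted(score_s, score_d):
    """Average two subset scores, weighting each by its number of examples."""
    total = SIZES["D_s"] + SIZES["D_d"]
    return (SIZES["D_s"] * score_s + SIZES["D_d"] * score_d) / total


for (method, metric), (score_s, score_d, reported) in TABLE_1.items():
    estimate = size_weighted(score_s, score_d)
    print(f"{method:8s} {metric:8s} weighted={estimate:6.2f} reported={reported:6.2f}")
```

Since $D_s$ contributes about 69% of the weight, the Seq2Seq advantage on $D_s$ outweighs the Variational Decipher's advantage on $D_d$ in the combined BLEU score.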

Figure 4: BLEU scores of two models on the testing dataset $D$, $D_s$ and $D_d$. The three dotted curves represent the performance of the Seq2Seq model while the three solid curves represent the performance of the Variational Decipher.

Human Evaluation: We employed crowd-sourced workers to evaluate the deciphering results of the two models. We randomly sampled 100 items of deciphering results from $D_s$ and another 100 items from $D_d$. Each item forms a choice question and each choice question is assigned to five workers on Amazon Mechanical Turk. In each choice question, the workers are given the hate symbol, the referential definition, the original tweet and two machine-generated plain texts from the Seq2Seq model and Variational Decipher. Workers are asked to select the more reasonable of the two results. In each choice question, the order of the results from the two models is permuted. Ties are permitted for answers. We batch five items in one assignment and insert an artificial item with two identical outputs as a sanity check. The workers who fail to choose “tie” for that item are rejected from our test. The human evaluation results are shown in Table 2, which agree with the results in Table 1 and Figure 4.

Dataset   Seq2Seq Lose   Seq2Seq Win   Tie
D_s       31.0%          32.0%         37.0%
D_d       30.5%          22.0%         47.5%

Table 2: The results of human evaluation on two separate testing datasets D_s and D_d.
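The sanity-check filter described in the human evaluation can be sketched as below. The text does not say how the five answers per item are aggregated into the percentages of Table 2, so counting every accepted vote equally is an assumption of this example, as are the data layout, the label strings, and the function name. The toy answers are made up and are not the study's data.

```python
from collections import Counter


def tally(assignments, sanity_id="sanity"):
    """assignments: one dict per worker batch, mapping item id to
    'seq2seq', 'vd' or 'tie'. Batches whose sanity item (two identical
    outputs) is not answered with 'tie' are rejected entirely."""
    votes = Counter()
    for answers in assignments:
        if answers.get(sanity_id) != "tie":
            continue  # worker failed the identical-outputs check
        for item_id, choice in answers.items():
            if item_id != sanity_id:
                votes[choice] += 1
    total = sum(votes.values())
    return {choice: 100.0 * count / total for choice, count in votes.items()} if total else {}


toy_assignments = [
    {"q1": "seq2seq", "q2": "tie", "sanity": "tie"},
    {"q1": "vd", "q2": "vd", "sanity": "tie"},
    {"q1": "vd", "q2": "vd", "sanity": "vd"},  # rejected: failed the check
]
shares = tally(toy_assignments)
```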

Discussion: When deciphering the hate symbols that have the same definitions as in the training dataset, the model can rely more on the surface forms of hate symbols than the tweet context to make a prediction, because usually the hate symbols that share the same definitions also have similar surface forms. However, when it comes to the hate symbols with unseen definitions, simply relying on the surface forms cannot lead to a reasonable deciphering result. Instead, the model should learn the relationships between the context information and the definition of the symbol. Therefore, the different performances of the two models on the two testing datasets $D_s$ and $D_d$ indicate that the Seq2Seq model is better at capturing the similarities among different surface forms of a hate symbol, while the Variational Decipher is better at capturing the semantic relationship between the tweet context and the hate symbol. The Sequence-to-Sequence model tries to capture such relationships by compressing all the context information into a fixed-length vector, so its deciphering strategy is actually behavior cloning. On the other hand, the Variational Decipher captures such relationships by explicitly modeling the posterior and likelihood distributions. The modeled distributions provide higher-level semantic information compared to the compressed context, which allows the Variational Decipher to generalize better to the symbols with unseen definitions. This explains why the gap between the performance of the Seq2Seq model on the two datasets is larger.

Figure 5: Some example errors in the generated results of our Seq2Seq model and Variational Decipher.

5.3 Error Analysis

Figure 5 shows some example errors in the deciphering results of our Seq2Seq model and Variational Decipher. One problem is that the generated sentences have poor grammatical structure, as shown in Figure 5. This is mainly because our dataset is small, and the models need a much larger corpus to learn the grammar. We anticipate that the generation performance will improve with a larger dataset.

For the hate symbols in Ds, the deciphering results are of high quality when the referential definitions are relatively short. An example is macaca, a French slur shown in Figure 5. The deciphering result of the Seq2Seq model is close to the referential definition. As for the Variational Decipher, although the result is not literally the same as the definition, the meaning is close. Another example in Figure 5 is closet homosexuals. However, when the length of the referential definition increases, the performance of both models tends to be unsatisfactory, as the third example, confederate flag, shows in Figure 5. Although the symbol Confederate Flag exists in the training set with the same definition, both models fail on this example. One possible reason is that the complexity of generating the referential definition grows substantially with its length, so when the tweet context and the symbol itself cannot provide enough information, the generation model cannot learn the relationship between the symbol and its definition.

Deciphering hate symbols in Dd is much more challenging. Even for humans, deciphering completely new hate symbols is not a simple task. The two examples in Figure 5 show that the models have some ability to capture semantic similarities. For the symbol niggering, the Variational Decipher generates the word nigger and the Seq2Seq model generates black. For Heil Hitler, the Variational Decipher generates leader person and Nazi, while Seq2Seq also generates Nazi. Although these generated words are not in the definition, they still make some sense.


6 Conclusion

We propose a new task of learning to decipher hate symbols and create a symbol-rich tweet dataset. We split the testing dataset into two parts to analyze the characteristics of the Seq2Seq model and the Variational Decipher. The different performance of these two models indicates that the models can be applied to different scenarios of hate symbol deciphering. The Seq2Seq model outperforms the Variational Decipher on hate symbols whose definitions are similar to those in the training dataset. This means the Seq2Seq model can better explain hate symbols when Twitter users intentionally misspell or abbreviate common slur terms. On the other hand, the Variational Decipher tends to be better at deciphering hate symbols with unseen definitions, so it can be applied to explain newly created hate symbols on Twitter. Although both models show promising deciphering results, there is still much room for improvement.
