predicting $gme stock price movement using sentiment from

Predicting $GME Stock Price Movement Using Sentiment from Redditr/wallstreetbets

Charlie Wang , Ben LuoDepartment of Computer Science, University of Illinois at Chicago

[email protected], [email protected]

AbstractIn early 2021, GameStop’s (NYSE: GME) stockunderwent a prolonged period of rapid, drastic fluc-tuations in price resulting from a short squeeze or-chestrated by the r/wallstreetbets community onReddit. In this paper, we present our experi-ments and findings for predicting the direction ofGME’s daily price movement using sentiment fromr/wallstreetbets posts. We use VADER for auto-matic sentiment analysis combined with word2vecand BERT semantic representations, and experimentwith a variety of classifier models, including bothtraditional ML and deep learning approaches.

1 IntroductionGameStop (NYSE: GME) is the world’s largest video gameretailer, best known in North America for its ubiquitous brick-and-mortar retail locations. Starting in the late 2000’s, the riseof online retail and digital video game sales contributed to thecompany’s steady decline over the following decade.

However, in late January of 2021, the company’s stockprice experienced a sudden and drastic reversal of fortune asamateur retail investors banded together online to buy GMEshares en masse in a manic rush, driving up the share priceand disrupting the short selling strategies of major Wall Streetplayers. By mid February, GME had risen in price from astarting point of under $20 USD to a pre-market peak of over$500 USD per share [Orland, 2021]. As of this writing, theprice continues to fluctuate wildly across this range.

According to Business Insider (2021), GME was the mostshorted stock on the New York Stock Exchange as of January2021. Short selling is an investment strategy that speculates onthe decline of a stock’s price — i.e., someone who is selling astock short stands to profit when the price of that stock declinesover a certain period of time. A short squeeze occurs when,fearing losses due to upward stock price movement, shortsellers rush to buy shares to cover their losses, thus furtherdriving up demand for the stock and forcing its price upwardeven more. In this case, GME was the target of a short squeezethat appeared to be orchestrated in large part by members ofReddit’s r/wallstreetbets community1 [Orland, 2021].

1https://www.reddit.com/r/wallstreetbets/

The r/wallstreetbets subreddit began as a meme-heavy off-shoot of more serious investment subreddits such as r/stocks,r/investing, or r/pennystocks. Compared to these other com-munities, r/wallstreetbets is distinguished by the high level ofirony that permeates its discourse. On a surface level, membersof this subreddit seem to approach investment with a some-what nihilistic gambling mindset as opposed to the rational,strategic attitude typical of financial contexts. For example,r/wallstreetbets members often post screenshots of their in-vestment portfolio performance showing the disastrous resultsof reckless speculation. In these threads, fellow users bothcommiserate and celebrate over these losses, terming them asloss porn.

Boasting a membership of over 10 million users as of May2021, this subreddit has become a formidable community ofsmall-time individual investors in its own right, earning oc-casional mentions in financial news and investment-relatedmedia even before its starring role in the GME frenzy. Thecommunity’s struggle against hedge funds, investment broker-ages, and even the SEC has helped draw widespread mediacoverage to these events, which often depicts the situation asa David vs. Goliath underdog story [Orland, 2021].

These historic events surrounding GME are, in many ways,unprecedented and highly irregular. Thus, conventional meth-ods for stock price prediction may generalize poorly to thisparticular situation. The principal novel contribution of this pa-per is our approach to data preprocessing, which is specificallyaimed at capturing sentiment information from r/wallstreetbetstext and was developed in response to linguistic peculiaritieswe observed in the data harvested from the subreddit.

Given the unique relationship between r/wallstreetbets andthe GME turmoil, we hypothesize that general sentiment onr/wallstreetbets is correlated with GME price movement. Weattempt to exploit this relationship by predicting the directionof GME’s daily net price movement using sentiment fromposts on r/wallstreetbets.

2 Related WorkSecurities and cryprocurrency market prediction is a popularproblem, which is perhaps not surprising given the obviousfinancial incentive. Prediction using textual sentiment as aninput feature is perhaps less explored than prediction primarilybased on financial data and market indicators. Li et al. (2014)predict price for individual stocks based on sentiment from

22Proceedings of the Third Workshop on Financial Technology and Natural Language Processing

(FinNLP@IJCAI 2021), pages 22-30, Online, August 19, 2021.

financial news posts. Similarly, Prosky et al. (2018) predictstock price movement based on sentiment from news, addi-tionally incorporating event extraction and entity recognitionto enhance sentiment analysis.

However, user-generated text on social media platforms hasbeen shown to differ significantly from standard English, suchas that found in news articles [Baldwin et al., 2013]. Thispresents different challenges to NLP that must be overcomein order to extract useful information from noisy social mediatext.

A number of past efforts predict stock and cryptocurrencyprice movement using sentiment information gathered fromTwitter. Pagolu et al. (2016) find a strong correlation betweenTwitter sentiment and the movement of the Dow Jones Indus-trial Average index. Bing et al. (2014) examine the effectof Twitter sentiment on individual stocks. Kim et al. (2016)predict movement in the value of cryptocurrencies based Twit-ter sentiment, and use statistical methods to demonstrate acausal relationship. Valencia et al. (2019) similarly predictcryptocurrency value using Twitter sentiment. These latter twopapers employ VADER for automatic sentiment analysis, andinclude additional non-linguistic features alongside sentiment,such topic count and reply count. We borrow extensively fromthis methodology.

To our knowledge, little prior work focuses on sentimentextracted from investment-related posts on Reddit in particular.Twitter text has been shown to differ stylistically in severalimportant ways from more long-form online text, such as thatfound in forums like Reddit [Baldwin et al., 2013]. Upon ana-lyzing the collected data, we notice a number of characteristicsof r/wallstreetbets language that present further obstacles tosentiment analysis compared to Twitter language (and evento language on Reddit outside of r/wallstreetbets and relatedsubreddits), which we discuss in detail in Section 3.3. Ourwork most significantly differs from the above prior effortsin the source of our textual sentiment (i.e., r/wallstreetbetsas opposed to news or Twitter) and the innovations in datapreprocessing that are necessitated thereby.

3 Data3.1 r/wallstreetbets postsWe leverage an existing corpus of r/wallstreetbets posts avail-able on Kaggle.com2, consisting of all 44,293 posts fromSeptember 2020 through March 2021. We initially inves-tigated the Pushshift API, but found that it was missing asignificant amount of data over our target time period due totechnical issues. We filter the Kaggle data by date to obtaina dataset totaling 35,726 posts from all weekdays (excludingtrading holidays) from January 4th, 2021 – March 31st, 2021.

Apart from standard text preprocessing (e.g., removal ofURLs, punctuation, user handle mentions, etc.), notable pro-cessing steps include concatenating the post title and bodytogether, as well as replacing emojis with descriptive text tagssandwiched by delimiter symbols. This approach enables usto take advantage of word embeddings, which we discuss laterin Section 4.2, to capture semantic information from emojis.

2https://www.kaggle.com/gpreda/reddit-wallstreetsbets-posts/

Table 1: Distribution of r/wallstreetbets posts by direction of dailyprice movement

Price movementdirection # of posts

up 7583 (21.2%)down 28152 (78.8%)

3.2 GME price dataWe extract stock price data for the same time period (Jan. 4th

– Mar. 31st) from the RapidAPI API for Yahoo! Finance3.We calculate net price movement for each trading day as(close_price− open_price), and from this we determine thedirection of price movement (up or down) for each day.

This begs the question: how should posts made outside oftrading hours be associated with stock price data? NYSE trad-ing hours are limited to Monday – Friday, 9:30AM – 4:00PMEastern Time. Limited trading and changes in stock price cantake place after-hours. Prior work in stock prediction employsa variety of heuristics for associating events to time seriesdata, dependent largely on the relationship between predictivefeatures and the aspect of stock performance being predicted.If a post is made after market close, is it indicative of thatday’s performance, or the price change between market closeand the next day’s market open? At its root, the question is es-sentially whether there is a causal relationship between GMEprice movement and general discussion on r/wallstreetbets.

In this paper, we wish not to assume any such causal rela-tionship, as any investigation of such is outside the scope ofour work. However, short of discarding all posts made outsidetrading hours, any choice would seem to imply some kindof causal assumption. Considering this, we choose the easi-est method to implement, which is to simply associate postswith the trading day on which they were made. This carries asoft assumption that on a given day, posts made after marketclose tend to discuss that day’s stock performance, while postsbefore market open tend to discuss the upcoming trading day.

Following the simple association scheme detailed above,Table 1 shows the distribution of r/wallstreetbets posts bydirection of price movement.

3.3 Qualitative analysisIn our dataset, 52% of posts contain no body text. These areusually image or video posts, where the only readily parsabletext is the post title. Images or videos may contain some text,but multimodal techniques would be needed in order to extractsuch text. We leave this as a possible direction for future work.

There are few explicit mentions of ’$GME’, ’GME,’ or’GameStop’ among post titles and bodies — filtering for thesethree keywords leaves us with only approximately 1.1k postsout of the original 35k. While the topic of GME and a fewother short squeeze targets (e.g., AMC, BB, and NOK) cer-tainly dominate the most popular threads on r/wallstreetbetsduring this time period, users also continue to, as usual, discussa wide variety of other securities in other posts.

Without sophisticated multimodal techniques combinedwith named entity recognition, it is difficult to identify the

3https://rapidapi.com/apidojo/api/yahoo-finance1

23

Post TitlesIt’s not about the money, it’s about sending a message |rocket| |rocket| |fullmoon| |fullmoon|If this isn’t raw talent, then I don’t know what is |gemstone| |raisedhand| bought GME at $493when its 52wk high was $483 |rocket| |rocket| If I can hodl, so can u |smilingfacewithsunglasses|ALL YOU $300+ BAGHOLDING RETARDS PUT YOUR AFFAIRS IN ORDER & BE SUREYOUR BAGS ARE PACKED CUZ THE ROCKET AIN’T STOPPING. BUCKLE THE F*** UP |rocket||rocket| |rocket| |gemstone| |rasinghands| |gorilla| |briefcase|Rocketing to the Moon tendies with mah ape brain |gorilla| |brain| |rocket| |rocket| |rocket| |crescentmoon|

Figure 1: Example post titles. Emojis are replaced with text tokens (e.g., |rocket|, |gorilla|)

many posts that are about GME but do not directly mentionGME or associated entities in the text. Figure 2 is an exampleof one such post. Thus, rather than attempting to isolate senti-ment toward GME or other particular entities, in this paper weexamine the general sentiment expressed on r/wallstreetbetsacross all posts, regardless of topic.

Some peculiar rhetorical habits of r/wallstreetbets users maycause difficulties for uninitiated human sentiment annotators,to say nothing of automatic analyzers. We find many examplesof esoteric slang and self-referential humor. In particular, thelanguage of r/wallstreetbets users is characterized by heavyuse of emojis, often in senses that are unique to the subreddit.These emojis often convey strong positive sentiment. Postersoften use profanity and what might normally be construed asaggressive tone to express positive emotions. Examples of theabove are illustrated in Figure 1.

We anticipate that these factors may present difficult obsta-cles to automatic sentiment analysis.

4 Methods4.1 Automatic sentiment analysisFor automatic sentiment analysis, we turn to VADER, a popu-lar general sentiment analyzer for social media text. VADERis a rule-based model that relies on a lexicon containing sen-timent information (both polarity and intensity) for a widerange of tokens, including slang and emoticons, in a socialmedia context. This approach lends itself well to lightweightimplementation and has been shown to work well on typicalsocial media text [Hutto and Gilbert, 2014]. Our choice touse VADER is inspired by related work [Kim et al., 2016][Valencia et al., 2019].

However, due to the observed peculiarities of the languageproduced by r/wallstreetbets users discussed previously (Sec-tion 3.3), we anticipate difficulties arising from using off-the-shelf VADER.

To gauge the reliability of VADER’s sentiment analysis re-sults on r/wallstreetbets text, we perform a preliminary annota-tion experiment wherein three human annotators, all of whompossess working familiarity with r/wallstreetbets content andinvestment lingo, each evaluate the same random sample of100 post titles for majority sentiment (positive, negative, orneutral).

We observe average agreement between the three human an-notators (Cohen’s Kappa) of κ = 0.7280, indicating middlingto high agreement — a higher than anticipated result, giventhe subjective nature of the sentiment annotation task. On

the other hand, average agreement between human annotatorsand VADER is low at κ = 0.078. This suggests rather poorperformance on the part of VADER, as we anticipated.

Caveats to this annotation experiment: First, this experi-ment solely examines post titles, as some posts contain longbodies, which would lead to a cumbersome annotation task. Inmany cases, this of course reduces the number of words andamount of context available. Second, due to time constraints,the data used in this evaluation is only partially preprocessed.Most notable here is the absence of emojis, which, as previ-ously discussed, can convey especially strong sentiment onr/wsb. These factors likely impact annotation performance,both human and automatic.

We observe that VADER tends to overpredict neutral sen-timent, especially for posts labeled as ’positive’ by humanannotators. Of the sample of 100 post titles, VADER scores93/100 examples as being highest in neutral sentiment. Ofthose 93, 30 are labeled by all three human annotators as’positive.’

We speculate that this is due to a high proportion of out-of-vocabulary words (recall that post titles are often terse), whichVADER generally scores as neutral, as well as in-vocabularywords with r/wallstreetbets-specific senses that convey differ-ent sentiment polarity or intensity than they would in a genericsocial media context. For example, in the post title in Figure2, "Diamond Hands" is a popular r/wallstreetbets idiom thatexpresses determination, encouraging the reader to avoid thetemptation to sell their GME shares, even in the face of de-clining share price or the allure of slight profit. Outside ofr/wallstreetbets and similar communities, where this usuallytranslates to positive sentiment, this phrase has little meaning.

The sentiment features we extract using VADER take theform of three normalized sentiment scores: positive, neutral,and negative. These are the features we use as inputs in ourclassification experiments.

4.2 Semantic embeddingsNoting poor classification performance from sentiment fea-tures alone, we opt to augment sentiment with semantic rep-resentations. In our experiments, we compare two differentsemantic representations as features for classification along-side sentiment: word2vec and pretrained BERT.

First, we generate word2vec embeddings4 of 100 dimen-sions from our training set, ignoring any words that appearfewer than two times in the data, since these are likely to be

4https://radimrehurek.com/gensim/models/word2vec.html

24

Figure 2: An example post. Pictured: Keith Gill a.k.a. RoaringKitty,widely regarded as a principal architect of the GME short squeeze.

typos. To produce a vector representation for the text of eachpost’s title + body, we simply average the word embeddingsfor all words in a post.

For comparison, we extract from a BERT model5, pretrainedon BookCorpus6 and Wikipedia text, the last dense layer. Fromthis, we generate the output of the last hidden layer before theoutput classification layers. This produces an output vector of768 dimensions to represent the text in each post.

4.3 Additional featuresBeyond sentiment and semantic representations, we exploreother features that could benefit the prediction of price move-ment. For additional textual features, we extract word count,stopword count, average word length, and emoji count. Next,we incorporate some metadata features not derived from text,namely, upvote score and number of comments. We suspectthat these properties may indirectly indicate the level of ex-citement of the r/wallstreetbets community. We experimentwith combining these additional features with the sentimentand semantic features.

5 ExperimentsFor our classification experiments, we employ a stratifiedtrain-test split of 80%/20% and a variety of classifier models:support vector machine (SVM or SVC), random forest (RF),decision tree (DT), and multilayer perceptron (MLP).

First, we perform a preliminary experiment using only senti-ment to predict the direction of price movement. In our secondexperiment, we compare the performance of two different se-mantic representations as input features, as detailed in Section4.2: word2vec and BERT. For the last experiment, we testcombinations of sentiment, semantic representations, and theadditional features from Section 4.3.

6 Results6.1 Sentiment onlyTable 2 shows the classification result using only VADERsentiment features. In the ’F1 up’ column, which is the F1

5https://huggingface.co/bert-base-uncased/6https://yknzhu.wixsite.com/mbweb/

score for the minority class, two classifiers have an F1 scoreof zero. These classifiers label all test examples as ’down,’showing that VADER sentiment alone is not suitable for theprediction of price movement.

6.2 Semantic representationsFollowing the classification with sentiment features only, werepeat the prior experiment with the addition of word2vec andBERT semantic representations as input features. The featuresare combined via simple vector concatenation. Table 3 dis-plays results across different classifier models. Word2vec withthe decision tree classifier has overall best accuracy, macroaverage F1 score, and minority class F1 score. However, theword2vec features also produce the worst minority class per-formance with the SVM classifier.

In addition to using BERT output as a feature, we alsoexperiment with pretrained BERT as a classifier model (Table3). On the BERT side, the pretrained BERT model ("Original")shows better performance than other classifiers in terms of F1score (0.6419) and minority class F1 score (0.4580).

When introducing BERT embeddings to supplementVADER sentiment features, we observe a general strengthen-ing of majority class performance and weakening of minorityclass performance. Overall, utilizing semantic representationsas features generally improves performance, although not asmuch as word2vec.

6.3 Sentiment with semantic representations andadditional features

Finally, we experiment with combining our two prior features(sentiment and semantic representations), again using simplevector concatenation, with additional features that might ben-efit performance (detailed previously in Section 4.3). Table4 shows the results of different feature combinations. Here,"All" refers to VADER sentiment, and the additional featuresdetailed in Section 4.3.

The first row of table 4 shows the result of different com-binations of word2vec, sentiment, and extra features. Weobserve a large improvement after the addition of VADER sen-timent. Here, we achieve the overall best result with 0.9905accuracy and 0.9857 macro average F1 score. The randomforest classifier also shows great improvement in this case.This improvement happens on both BERT and word2vec. Weprovide a visualization of the decision tree nodes (AppendixA) and find that the VADER vectors are included in everybranch, which indicates that the sentiment information is ac-tively involved in the decision process.

Adding all of the additional features did not yield significantimprovement over sentiment and semantic embeddings.

7 Discussion & Future WorkFrom our results, we are unable to definitively demonstrate astrong relationship between sentiment and price movement.However, our results do show that semantic embeddings,which may capture sentiment information to some degree, areuseful for this prediction task. Predicting based on a combina-tion of sentiment and semantic embeddings produces overall

25

Table 2: VADER sentiment: preliminary classification

Table 3: Semantic representations: word2vec vs. BERT. The "Original" row represents the result of pretrained BERT with the last dense layeroutput of one dimension as prediction.

Table 4: Word embeddings with sentiment and additional features.

best performance, so sentiment does appear to have somepositive impact.

One limitation of this work is dataset quality stemmingfrom the poor performance of automatic sentiment analy-

sis. As demonstrated previously, off-the-shelf VADER sen-timent analysis produces disappointing results on text fromr/wallstreetbets. From our brief subjective analysis, we specu-late that the major contributing factors to this are:

26

• frequent use of sarcasm or irony, especially pejorativeterms and profanity to convey positive sentiment

• a preponderance of out-of-vocabulary words, chieflyslang, investment jargon, and emojis

• non-standard usages of many in-vocabulary words spe-cific to the subreddit or the investment domain

We suspect that improved sentiment analysis may yieldfurther improvements in performance. We leave to futurework the task of customizing the VADER sentiment lexiconin order to better accommodate r/wallstreetbets language. Ourdecision to replace unicode emojis with descriptive text tagsis made with this future task in mind.

Alternatively, more sophisticated sentiment extraction ap-proaches may bear investigation, such as that explored by Liand Lu (2017), who introduce scoped entity awareness to sen-timent analysis. For reasons mentioned in Section 3.3, it isdifficult to isolate posts by topic in r/wallstreetbets. Becauseof this, in this paper we examine general sentiment from thesubreddit as a whole, with no regard to the specific object(s)of the sentiment expressed in a given post. Approaches thatincorporate entity awareness may be better suited for eval-uating sentiment across long texts, such as in the bodies ofsome r/wallstreetbets posts. For example, a user might expresspositive sentiment toward GME in part by using disparagingor defiant language directed toward Robinhood or the SEC,two entities that gained notoriety on the subreddit as the shortsqueeze progressed.

Our data is moderately imbalanced between the two classes,with only 21.2% of posts falling on days with upward net pricemovement. Resampling methods – undersampling, oversam-pling, and ensemble sampling – may improve performance onthe minority class while maintaining similar performance fornon-minority classes. [More, 2016].

In the intervening months since the peak of GME ma-nia, several "meme stock"-focused offshoot subreddits haveappeared, including r/gme and r/superstonk. Based on thepremise that these subreddits emerged largely as spin-offsof r/wallstreetbets and are likely to share significant portionsof r/wallstreetbets’s user base, we speculate that going for-ward, these subreddits may be another useful source of train-ing data for the task of predicting GME price movement asthe saga continues to unfold. At the same time, sentimentfrom r/wallstreetbets posts may become a worse predictor forGME price movement over time as GME-related discussionno longer dominates discourse on the subreddit as it did duringQ1 of 2021.

A more sophisticated heuristic for temporally associatingmarket data to social media posts (discussed in Section 3.2)may allow for additional powerful approaches to be employed,including feedback neural networks, such as the GRU andLSTM models used by Minh et al. (2018).

Finally, the direction of change in stock price is a simplisticindex for stock performance and likely fails to indicate muchnuance in stock market activity, especially in an unusual shortsqueeze situation. Rather than framing this prediction taskas a binary classification problem, future work might benefitfrom experimenting with more granular labels that correspondto different magnitudes of price movement. Additionally, the

findings of Prosky et al. (2018) suggest that sentiment fromsocial media may, under some conditions, be a better predictorfor other market indicators such as volume and liquidity thanfor price movement.

References[Baldwin et al., 2013] Timothy Baldwin, Paul Cook, Marco

Lui, Andrew MacKinlay, and Li Wang. How noisy socialmedia text, how diffrnt social media sources? In Proceed-ings of the Sixth International Joint Conference on NaturalLanguage Processing, pages 356–364, 2013.

[Bing et al., 2014] Li Bing, Keith C. C. Chan, and Carol Ou.Public sentiment analysis in twitter data for prediction of acompany’s stock price movements. In Proceedings of the2014 IEEE 11th International Conference on E-BusinessEngineering, ICEBE ’14, page 232–239, USA, 2014. IEEEComputer Society.

[bus, 2021] The 10 most heavily-shorted stocks of january2021, 2021.

[Hutto and Gilbert, 2014] Clayton Hutto and Eric Gilbert.Vader: A parsimonious rule-based model for sentimentanalysis of social media text. In Proceedings of the In-ternational AAAI Conference on Web and Social Media,volume 8, 2014.

[Kim et al., 2016] Young Bin Kim, Jun Gi Kim, WookKim, Jae Ho Im, Tae Hyeong Kim, Shin Jin Kang, andChang Hun Kim. Predicting fluctuations in cryptocurrencytransactions based on user comments and replies. PloS one,11(8):e0161197, 2016.

[Li and Lu, 2017] Hao Li and Wei Lu. Learning latent sen-timent scopes for entity-level sentiment analysis. Pro-ceedings of the AAAI Conference on Artificial Intelligence,31(1), Feb. 2017.

[Li et al., 2014] Xiaodong Li, Haoran Xie, Li Chen, JianpingWang, and Xiaotie Deng. News impact on stock pricereturn via sentiment analysis. Knowledge-Based Systems,69:14–23, 2014.

[Minh et al., 2018] Dang Lien Minh, Abolghasem Sadeghi-Niaraki, Huynh Duc Huy, Kyungbok Min, and HyeonjoonMoon. Deep learning approach for short-term stock trendsprediction based on two-stream gated recurrent unit net-work. Ieee Access, 6:55392–55404, 2018.

[More, 2016] Ajinkya More. Survey of resampling tech-niques for improving classification performance in unbal-anced datasets, 2016.

[Orland, 2021] Kyle Orland. The complete moron’s guide togamestop’s stock roller coaster, Jan 2021.

[Pagolu et al., 2016] Venkata Sasank Pagolu, Kamal NayanReddy, Ganapati Panda, and Babita Majhi. Sentiment anal-ysis of twitter data for predicting stock market movements.In 2016 International Conference on Signal Processing,Communication, Power and Embedded System (SCOPES),pages 1345–1350, 2016.

27

[Prosky et al., 2018] Jordan Prosky, Xingyou Song, AndrewTan, and Michael Zhao. Sentiment predictability for stocks,2018.

[Valencia et al., 2019] Franco Valencia, Alfonso Gómez-Espinosa, and Benjamín Valdés-Aguirre. Price movementprediction of cryptocurrencies using sentiment analysis andmachine learning. Entropy, 21(6):589, 2019.

A Appendix: Visualization of decision treeclassifier

The following pages contain a visualization of the decisiontree nodes from the experiment in Table 4.

28

Figure 3: visualization of decision tree part 1 of 2

29

Figure 4: visualization of decision tree part 2 of 2

30

predicting $gme stock price movement using sentiment from

Documents