

Making Trading Great Again: Trump-based Stock Predictions via doc2vec Embeddings

Rifath Rashid
Stanford University

[email protected]

Anton de Leon
Stanford University

[email protected]

1 Introduction

The question of how accurately Twitter posts can model the movement of financial securities is one that has had no lack of exploration in the past decade. Some of the first work on this topic was done in 2010, giving rise to an entire marketplace of algorithmic trading based on Twitter sentiment [1]. Notably, the glory of Twitter-based trading behavior met its doom on April 23rd, 2013 at 1:07 PM, when Syrian hackers posted a tweet via the Associated Press's account claiming that two explosions in the White House had left President Obama injured. Immediately after this tweet, the Dow Jones Industrial Average dropped 143.5 points and the S&P 500 lost $136 billion. For some, events like this have called into question the wisdom of basing financial decisions on Twitter sentiment. For others, these events have motivated further research into improved models to prevent a recurrence of such failures. The work presented in this paper is done in the spirit of the latter school of thought. We seek to understand how recent advances in distributed word representations can capture contextually rich sentences and phrases that pure sentiment analysis might miss. The first work in this space focused on categorical sentiment, yet our ability to model context has advanced considerably since then. Second, unlike three years ago, one of the most powerful people in the world now uses Twitter as his platform of choice. This paper addresses this new reality with the question: can contextually rich document representations of Trump tweets successfully predict stock change?

This work attempts to answer this question by computing document embeddings of tweets and feeding those embeddings into a neural network that predicts stock price change.

2 Literature Review

The first work in this space introduced the idea of using the Twitter platform as an estimator of public mood [1]. The researchers used a tool called Google-Profile of Mood States (GPOMS), which takes as input the text content of tweets and generates a daily time series of public mood along six predetermined dimensions. They found that two of these dimensions ('Calm' and 'Happiness') seem to have predictive power over the Dow Jones Industrial Average (DJIA) when added to existing prediction models of the DJIA. While this study provides evidence for the potential predictive power of public sentiment and makes strides in using Twitter as a measurement tool, categorical sentiment may be too coarse a lens for viewing the sensitivities of financial markets.

Recent efforts to address the obstacle of sentence-level sentiment analysis of financial news build on foundational work by Google with doc2vec [3, 4]. In this vein, researchers have demonstrated the success of distributed representations in identifying sentence-level sentiment, outperforming both bag-of-words and dictionary-based methods. Interestingly, work by the theory community has claimed that, in the context of transfer learning, methods like doc2vec are outperformed by more baseline approaches, such as taking a weighted average of the distributed representations of a sentence's individual words [5]. Thus, it is quite possible that the use of distributed representations of phrases longer than a single word does not generalize to other bodies of text. However, such generalization is beyond the scope of our work: the focus of this project is entirely on Trump tweets, and the domain success of sentence-level distributed representations is empirically supported [3].

3 Data

3.1 Twitter Data

For non-commercial applications, Twitter places limits on how far back developers can access users' tweets. We use the API to obtain 2019 tweets, but rely on a publicly available dataset for all tweets by Donald Trump from 2010 to 2018¹. Specifically, we make use of the raw text and dates of the tweets. In total, we obtained 38,471 tweets. Our predictive model treats all tweets posted on one day as a single training instance, leaving us with 2,139 business days of tweets; we ignore days with no corresponding financial data (i.e., weekends and holidays). The average number of tweets per day is 13.4, and the maximum number of tweets in one day is 160. Table 1 showcases some of the most common words in the tweet corpus (ignoring common stop words and considering only lowercase forms). For our training data, we consider all tweets from 1/1/2010 to 5/30/2015, giving us 1,136 days of tweets with matching price-change data. For our validation set, we consider all tweets from 5/31/2015 to 10/31/2016, giving us 360 days of tweets.
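As a concrete illustration of this preprocessing, the sketch below groups tweets into one instance per day and keeps only days with matching price data. The file and column names are hypothetical placeholders, not the actual layout of our dataset.

```python
# Sketch: aggregate tweets into one training instance per business day.
# 'trump_tweets.csv' and 'prices.csv' with columns 'date'/'text' are
# hypothetical placeholders for the Twitter archive and the price data.
import pandas as pd

tweets = pd.read_csv('trump_tweets.csv', parse_dates=['date'])
prices = pd.read_csv('prices.csv', parse_dates=['date'])

# Concatenate all tweets posted on the same calendar day.
daily = (tweets.assign(day=tweets['date'].dt.normalize())
               .groupby('day')['text']
               .apply(' '.join)
               .reset_index())

# Inner join drops weekends and holidays, which have no price data.
daily = daily.merge(prices, left_on='day', right_on='date', how='inner')
```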

3.2 Stock Price Data

End-of-day stock prices were obtained from Quandl, which provides financial data for various securities going back to their IPOs.

4 Experimental Models

4.1 doc2vec Embeddings

As Figure 1 shows, the original learning goal formulated by Le and Mikolov was to observe whether document vectors could be learned such that they contribute to predicting the next word in a sentence [4].

¹ https://github.com/bpb27/trump-tweet-archive/tree/master/data/realdonaldtrump

Word        Count
great       4,632
trump       3,409
thank       2,416
people      1,642
would       1,578
get         1,577
new         1,544
president   1,532
make        1,348
big         1,170
obama       1,165

Table 1: Most frequent words in the Trump Twitter corpus

Figure 1: Original diagram used in the seminal paper by Le & Mikolov

Here, the paragraph ID and a fixed window of words serve as features, and the subsequent word in the window is the target. This is known as the Distributed Memory approach. Le & Mikolov also proposed a distributed bag-of-words approach that samples a random window from a paragraph and then formulates the task of predicting a random word from the window given the paragraph ID. While the Distributed Memory approach has a longer training time, we focus on it because it preserves the ordering and context of words in ways that bag-of-words methods ignore. This ordering may be critical to the financial implications of a sentence.

We use the gensim library to build our doc2vec model. We remove stop words using the nltk library and convert all words to lowercase. We begin the learning rate at 0.025 and allow it to decay by 0.0002 in each epoch until it reaches 0.00025. We tune the vector size and the number of training epochs and present the results in the following section.
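The snippet below is a minimal sketch of this setup in gensim. The window size, minimum word count, and variable names are illustrative assumptions; the report does not specify them.

```python
# Sketch of the Distributed Memory doc2vec training described above.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.corpus import stopwords  # requires nltk.download('stopwords')

stop_words = set(stopwords.words('english'))

def preprocess(text):
    # Lowercase the text and drop stop words, as described in Section 4.1.
    return [w for w in text.lower().split() if w not in stop_words]

# daily_texts (one string of concatenated tweets per day) is a placeholder.
docs = [TaggedDocument(words=preprocess(t), tags=[str(i)])
        for i, t in enumerate(daily_texts)]

model = Doc2Vec(vector_size=10,    # best-performing size from Section 5
                dm=1,              # Distributed Memory variant
                window=5,          # assumption: window size is not reported
                min_count=1,       # assumption: threshold is not reported
                alpha=0.025,
                min_alpha=0.00025)
model.build_vocab(docs)

# Decay the learning rate by 0.0002 per epoch until it reaches 0.00025.
alpha = 0.025
for epoch in range(400):           # 400 epochs minimized validation loss
    model.alpha = model.min_alpha = max(alpha, 0.00025)
    model.train(docs, total_examples=model.corpus_count, epochs=1)
    alpha -= 0.0002
```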


Figure 2: α = 0.025, decayed at a rate of 0.002 per epoch; number of training epochs = 100

Figure 3: α = 0.025, decayed at a rate of 0.002 per epoch; vector size = 10

4.2 Four-Layer Neural Network

We use a four-layer neural network that takes a doc2vec embedding as input and outputs a stock price change. We train separately for each financial security. The input is the average embedding vector over all posts on the day for which we are predicting end-of-day stock change. Here, stock change is the difference between a stock's closing and opening prices relative to its opening price, i.e., (close - open) / open. The first hidden layer contains 128 units with a ReLU activation; the second hidden layer contains 16 units, also with a ReLU activation. Both layers are fully connected. We focus our parameter tuning on the doc2vec model and perform minimal tuning on the neural network. We use mean squared error as our cost function and a linear activation for our output. For regularization, we apply an L2 penalty of 0.04 and a dropout probability of 0.2 between the two hidden layers. These values were chosen by experimentation on the validation set. Finally, we normalize all training, validation, and test data using the means and standard deviations of the training data.
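A minimal tf.keras sketch of this architecture follows. The optimizer is not stated in the report, so Adam is an assumption, and the array names are placeholders.

```python
# Sketch of the four-layer regression network from Section 4.2.
import tensorflow as tf

l2 = tf.keras.regularizers.l2(0.04)        # L2 penalty from Section 4.2
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=l2, input_shape=(10,)),
    tf.keras.layers.Dropout(0.2),          # dropout between hidden layers
    tf.keras.layers.Dense(16, activation='relu', kernel_regularizer=l2),
    tf.keras.layers.Dense(1, activation='linear'),  # relative price change
])
model.compile(optimizer='adam', loss='mse')  # optimizer is an assumption

# Normalize with training-set statistics only; X_* and y_* are placeholders
# for daily average doc2vec embeddings and relative price changes.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
model.fit((X_train - mu) / sigma, y_train,
          validation_data=((X_val - mu) / sigma, y_val),
          epochs=100)
```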

4.3 Trading Simulation

We perform backtesting with a simple greedy strategy. Our goal here is to place all decision making on the model itself. For a given day, if there are tweets on that day, our model predicts the price change of the security for that day. If the predicted price change is positive, we enter a long position of 10 shares. If the predicted price change is negative, we exit our position in this security. Each simulation is done with respect to only one security, but we allude to more complex trading algorithms in our future directions. All portfolios begin with 30,000 USD. The start date of each simulation is 11/1/16 and the end date is 6/9/19, giving us 653 days of tweets for which we have end-of-day price-change data (our test set).
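The loop below sketches this strategy. The helper functions are hypothetical, and holding at most one 10-share position at a time is our reading of the rule; the report does not say whether positions accumulate across days.

```python
# Sketch of the greedy backtest from Section 4.3 for a single security.
cash, shares = 30_000.0, 0             # every portfolio starts with 30,000 USD

for day in trading_days:               # 11/1/16 through 6/9/19
    if not has_tweets(day):            # no tweets: the model makes no decision
        continue
    predicted_change = predict_change(day)  # model output for this day
    price = opening_price(day)
    if predicted_change > 0 and shares == 0:
        shares = 10                    # enter a long position of 10 shares
        cash -= 10 * price
    elif predicted_change <= 0 and shares > 0:
        cash += shares * price         # exit the position
        shares = 0

final_value = cash + shares * closing_price(trading_days[-1])
```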

5 Results

We begin with a search for the optimal parameters of our doc2vec model. Here, we train on the price change of the Russell 3000, an index representing the 3,000 largest companies by market capitalization incorporated in America. The motivation behind this choice was to use a security that represents the wide range of American companies that may be sensitive to Trump tweets. Figures 2 and 3 exhibit our experimental results when varying the embedding vector size and the number of training epochs. We observe that smaller vector sizes tend to yield lower validation loss, with the lowest validation loss belonging to a vector size of 10. We observe the opposite relation with the number of training epochs, where training our doc2vec model for 400 epochs resulted in the lowest validation error.

Using these optimal parameters, we then tested our model on various financial securities. To respect the brevity of this report, we restrict our search to two main sectors: technology companies and companies with government contracts. For each company, we present its final training and testing loss after 100 epochs. We simulate trading for the aforementioned sectors using the algorithm described in Section 4.3 and present the results in Figures 4 and 5.

6 Discussion

We observe several trends in our results. To begin, validation loss tends to increase with vector size, and this makes sense: since we are only considering one person's lexicon, increasing the dimension of our embedding space likely adds unnecessary noise and makes our model vulnerable to the curse of dimensionality.

Figure 4: Simulated trading for technology companies

Figure 5: Simulated trading for government-contracted companies

Financial Security        Train Loss   Test Loss
Facebook (FB)             3.6          4.01
Apple (AAPL)              1.64         1.46
Google (GOOGL)            1.19         1.69
Tesla (TSLA)              7.51         7.03
Lockheed Martin (LMT)     0.99         1.27
General Dynamics (GD)     1.16         1.46
Boeing (BA)               1.28         1.94

Table 2: Summary of results for technology companies and companies with government contracts

By varying the number of training epochs for our doc2vec model, we notice that training for more iterations results in lower validation loss. This, too, is unsurprising, yet future research should explore at what point the model begins to overfit. In this work, our exploration was limited by time and compute power.

When trained on specific companies, we notice that the final test loss is highly variable, yet the accuracy in the direction of price change is relatively constant across companies (around 50%). This suggests that our model is reasonable at predicting direction but struggles with the scale of the change. It is important to note that both the testing labels and the predictions are evenly distributed. For instance, in the case of Google, 51.8% of Google price changes in the testing period are non-negative, and 54.8% of our model's predictions are non-negative. These distributions are broadly similar across companies, which shows that our model is not naively predicting negative price changes for all instances and thereby trivially earning an accuracy of 0.5 on the direction of movement. Our original task is a regression task; we discuss sign accuracy here only to illustrate that much of the loss is due to the scale of the price change.
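For concreteness, the directional statistics above can be computed as follows; y_true and y_pred are placeholder arrays of actual and predicted daily changes.

```python
# Sketch: directional accuracy and prediction balance from Section 6.
import numpy as np

direction_accuracy = np.mean(np.sign(y_pred) == np.sign(y_true))  # ~0.5
frac_true_nonneg = np.mean(y_true >= 0)   # e.g., 51.8% for Google
frac_pred_nonneg = np.mean(y_pred >= 0)   # e.g., 54.8% for Google
```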

We notice that our model generalizes well to the testing data, even though for some companies the training loss is high to begin with. It should be noted that we were only able to achieve this after regularization and normalization of the features (adding L2 regularization and including dropout). Prior to regularization, our validation error remained constant across epochs. Interestingly, we notice that some companies exhibit lower testing loss than training loss. Discussion threads on TensorFlow suggest this is potentially because regularization and dropout are applied only at training time, and because the training loss TensorFlow reports for an epoch is averaged over the batches within that epoch, during which the model is still improving.

Finally, we notice a distinction between companies that are contracted by the government and ones that are not. Government-contracted companies consistently have lower loss, whereas general technology companies exhibit varying loss. In our trading simulation, this is showcased by the fact that government-contracted companies consistently earned more profit than general technology companies. This does not necessarily suggest that these companies had better stocks; it suggests that our model was a better decision maker when it came to government-contracted companies. It should be noted that in this time period, the return of the DJIA was 38.4%. Still, the relative trading success of government-contracted companies makes intuitive sense, as their success is intrinsically tied to the actions of the President. A good example of this came on December 12th, 2016, when President Trump tweeted "The F-35 program and cost is out of control. Billions of dollars can and will be saved on military (and other) purchases after January 20th." A statement like this had immediate effect, with Lockheed Martin shares dropping 4%, Boeing shares dropping 0.72%, and General Dynamics shares dropping 2.87%².

7 Conclusion and Future Research

The main accomplishment of this work is as a proof of concept that trading based on Trump Twitter posts is grounded in reasoning. This is supported by the consistent non-negative returns obtained with only a simple trading strategy across various public companies. We learn that our model has variable success with different companies, and future research might explore company-specific information extraction from tweets. Pre-trained distributed word representations, such as BERT and GloVe, have proven their success in various domains, and these claims are theoretically and empirically supported [5, 6]. We relied on document embeddings for their empirical success in the financial domain [3], but future work may strongly consider trying a weighted average of word embeddings. Finally, projects with access to micro-scale stock information should look deeper into lag times, in minutes and seconds, of stock price movements. For this paper, we were limited to end-of-day prices; minor explorations with lag times in days showed little change. Given the sensitivity of financial markets, future work should indeed focus on the very immediate effects of Trump tweets.

² https://www.fxcm.com/uk/insights/president-trumps-twitter-impact-forex-markets-stocks/footnote-7


8 Contributions

I, Rifath Rashid, am responsible for the design, coding, and reporting of this project. I thank Atharva Parulekar for advice and guidance along the way. Anton helped survey the literature and prepare the review. He also looked into ways of building neural nets via TensorFlow.

9 References

[1] Bollen, J., Mao, H. & Zeng, X. (2010) Twitter mood predicts the stock market. Journal of Computational Science 2(1), pp. 1-8.

[2] Nisar, T.M. & Yeung, M. (2018) Twitter as a tool for forecasting stock market movements: A short-window event study. Journal of Finance and Data Science 4(2), pp. 101-119.

[3] Lutz, B., Pröllochs, N. & Neumann, D. (2019) Sentence-level sentiment analysis of financial news using distributed text representations and multi-instance learning. doi:10.24251/HICSS.2019.137.

[4] Le, Q. & Mikolov, T. (2014) Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML), pp. 1188-1196.

[5] Arora, S., Liang, Y. & Ma, T. (2017) A simple but tough-to-beat baseline for sentence embeddings. In International Conference on Learning Representations (ICLR 2017).

[6] Lau, J.H. & Baldwin, T. (2016) An empirical evaluation of doc2vec with practical insights into document embedding generation. CoRR.

10 Code

https://github.com/RifathRashid/deep-stock-study