

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 8415–8426, November 16–20, 2020. ©2020 Association for Computational Linguistics


Deep Attentive Learning for Stock Movement Prediction From Social Media Text and Company Correlations

Ramit Sawhney*, Netaji Subhas Institute of Technology, [email protected]

Shivam Agarwal*, Manipal Institute of Technology, [email protected]

Arnav Wadhwa, MIDAS, IIIT Delhi, [email protected]

Rajiv Ratn Shah, IIIT Delhi, [email protected]

Abstract

In the financial domain, risk modeling and profit generation heavily rely on the sophisticated and intricate stock movement prediction task. Stock forecasting is complex, given the stochastic dynamics and non-stationary behavior of the market. Stock movements are influenced by varied factors beyond the conventionally studied historical prices, such as social media and correlations among stocks. The rising ubiquity of online content and knowledge mandates an exploration of models that factor in such multimodal signals for accurate stock forecasting. We introduce an architecture that achieves a potent blend of chaotic temporal signals from financial data, social media, and inter-stock relationships via a graph neural network in a hierarchical temporal fashion. Through experiments on real-world S&P 500 index data and English tweets, we show the practical applicability of our model as a tool for investment decision-making and trading.

1 Introduction

Stock prices have an intrinsically volatile and non-stationary nature, making their rise and fall hard to forecast (Adam et al., 2016). Investment in stock markets involves a high risk regarding profit-making. Prices are driven by diverse factors that include, but are not limited to, company performance (Anthony and Ramesh, 1992), historical trends (Kohara et al., 1997), and investor sentiment (Neal and Wheatley, 1998). Uninformed trading decisions can expose traders and investors to financial risk and monetary losses. On the contrary, careful investment choices can maximize profits (de Souza et al., 2018). Conventional research focused on time series and technical analysis of a stock, i.e., using patterns from historical price signals to forecast stock movements (B et al., 2013). However, price signals alone fail to capture market surprises and the impact of sudden, unexpected events. Social media texts like tweets can have a huge impact on the stock market. For instance, US President Donald Trump shared tweets expressing negative sentiment against Lockheed Martin, which led to a loss of around $5.8 billion in the company's market capitalization.¹

* Equal contribution.

The Efficient Market Hypothesis (EMH) (Malkiel, 1989) states that financial markets are informationally efficient, such that stock prices reflect all known information. Existing works (Sec. 2) mainly focus on subsets of stock-relevant data. Although useful, they do not jointly optimize learning over modalities like social media text and inter-stock relations, limiting their potential to capture a broader scope of data affecting stock movements, as we show in Sec. 6. Multimodal stock prediction involves multiple challenges (Hu et al., 2018). Both price signals and tweets exhibit sequential context dependencies, where singular samples may not be informative enough but can be considered as a sequence for a unified context. Tweets have a diverse influence on stock prices based on their intrinsic content, such as breaking news as opposed to noise like vague comments. Fusing multiple modalities of vast stock-related data generated with varying characteristics (frequency, noise, source) is complex and mandates the careful design of joint optimization over modality-specific components.

Building on the EMH and prior work (Sec. 2), we propose MAN-SF: a Multipronged Attention Network for Stock Forecasting that jointly learns from historical prices, social media, and inter-stock relations. Through hierarchical attention, MAN-SF captures relevant signals across diverse data to train a Graph Attention Network (GAT) for stock prediction (Sec. 3). MAN-SF (Sec. 4) jointly learns from prices and tweets over graph-based models for stock prediction. Through varied experiments (Sec. 5), we show the predictive power of MAN-SF, along with a profitability analysis (Sec. 6), and qualitatively analyze MAN-SF in high-risk scenarios (Sec. 7).

¹ https://medium.com/scoop-markets/7-tweets-which-wiped-40-billion-off-the-stock-market

2 Related Work

Predicting stock movements spans multiple domains (Jiang, 2020): 1) theoretical: quantitative models like Modern Portfolio Theory (Elton et al., 2009), the Black-Scholes model (Black and Scholes, 1973), etc.; and 2) practical: investment strategies (Blitz and Van Vliet, 2007), portfolio management (Hocquard et al., 2013), and applications beyond the world of finance (Erb et al., 1994; Rich and Tracy, 2004). Financial models conventionally focused on technical analysis (TA), relying only on numerical features like past prices (Ding and Qin, 2019; Nguyen et al., 2019) and macroeconomic indicators like GDP (Hoseinzade et al., 2019). Such TA methods include discrete approaches like GARCH (Bollerslev, 1986), continuous approaches (Andersen, 2007), and neural approaches (Nguyen and Yoon, 2019; Nikou et al., 2019).

Newer models based on the EMH, categorized under fundamental analysis (FA) (Dichev and Tang, 2006), account for stock-affecting factors beyond numerical ones, such as investor sentiment expressed through news. Work in natural language processing (NLP) over sources such as news (Hu et al., 2018), social media data (Xu and Cohen, 2018), and earnings calls (Qin and Yang, 2019; Sawhney et al., 2020b) shows the merit of FA in capturing the market sentiment, surprises, mergers, and acquisitions that traditional TA-based methods fail to account for. A limitation of existing NLP methods for stock prediction is that they assume stock movements to be independent of each other, contrary to true market function (Diebold and Yilmaz, 2014). This assumption hinders NLP-centric FA's ability to learn latent patterns for the study of interrelated stocks.

Another line of FA revolves around employing graph-based methods to improve TA (e.g., price-based models) by augmenting them with inter-stock relations (Feng et al., 2019b; Sawhney et al., 2020a). Matsunaga et al. (2019) combine historical prices with stock graphs through Graph Convolutional Networks (GCNs), outperforming price-only models. Similarly, Kim et al. (2019) further improve graph neural network methods by weighing stock relations through attention mechanisms, as not all stock movements are equally correlated.

Despite the popularity of NLP and graph-based stock prediction, multimodal methods that capture inter-stock relations and market sentiment through linguistic cues are seldom explored. Liu et al. (2019) combine feature extraction from news sentiment scores and financial information (price-earnings ratio, etc.) with knowledge graph embeddings learned through TransR. However, such existing approaches (Deng et al., 2019) are unable to represent textual signals from social media and prices temporally, as they only utilize sentiment scores and do not account for stock correlations. To cover this gap in prior research, MAN-SF captures a broader set of features, as opposed to both conventional TA and FA, which singularly focus on either text or graph modalities, but not both together.

3 Problem Formulation

MAN-SF's main objective is to learn temporally relevant information jointly from tweets and historical price signals, and to make use of corporate relations among stocks to predict movements. Following Xu and Cohen (2018), we formalize movement based on the difference between the adjusted closing prices of a stock s ∈ S on trading days d and d−1. We formulate stock movement prediction as a binary classification problem.

Problem Statement: Given a stock s ∈ S, and historical price data and tweets for stock s over a lookback window of T days spanning the range [t−T, t−1], we define the price movement of stock s from day t−1 to day t as:

\[
Y_t = \begin{cases} 0, & p^c_d < p^c_{d-1} \\ 1, & p^c_d \ge p^c_{d-1} \end{cases} \tag{1}
\]

where p^c_d represents the widely used (Yang et al., 2020; Qin and Yang, 2019) adjusted closing price² of a given stock on day d. Here, 0 represents a price fall, and 1 represents a rise in the price.

² Source: https://www.investopedia.com/terms/a/adjusted_closing_price.asp
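For concreteness, the labeling rule in Eq. (1) reduces to a one-line function; the array name below is illustrative, not from the authors' code.

```python
import numpy as np

def movement_labels(adj_close: np.ndarray) -> np.ndarray:
    """Binary movement labels per Eq. (1): 1 if the adjusted closing price
    rose (or stayed equal) relative to the previous trading day, else 0."""
    return (adj_close[1:] >= adj_close[:-1]).astype(int)

# movement_labels(np.array([10.0, 10.2, 10.1])) -> array([1, 0])
```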

4 MAN-SF: Components and Learning

In this section, we first give an overview of MAN-SF, followed by a detailed explanation of each component. As shown in Figure 1, MAN-SF first encodes market data for each stock over a fixed period. Formally, we encode stock features x_t ∈ R^w for each trading day t as x_t = B(c_t, q_t), where c_t ∈ R^u represents a social media feature that we obtain by encoding tweets over the lag window for each stock s ∈ S = {s_1, s_2, ..., s_S}. Similarly, q_t ∈ R^v are the features obtained from historical prices for a stock in the lag window. We detail these encoders first, and then explain the fusion B(·) over c_t and q_t that yields x_t ∈ R^w. We then describe the graph used to represent inter-stock relations. Lastly, we explain the GAT to which the fused feature vector x_t is passed to propagate features based on inter-stock relations, along with the joint optimization of MAN-SF.

Figure 1: An overview of MAN-SF: Encoding Mechanisms, GAT Mechanism, Joint Optimization.

Figure 2: An overview of the Price Encoder.

4.1 Price Encoder

Technical analysis shows that historical price information is a strong indicator of future trends (Jeanblanc et al., 2009). Therefore, price data from each day is a crucial input to MAN-SF. The Price Encoder, shown in Figure 2, encodes historical stock price movements to produce the price feature q_t. It takes in a per-day price feature over the lookback of T days and encodes the temporal trend in prices. To capture such sequential dependencies across trading days, we use a Gated Recurrent Unit (GRU) (Cho et al., 2014; Giles et al., 2001). The output of the GRU on day i is denoted by:

\[ h_i = \mathrm{GRU}_p(p_i, h_{i-1}), \quad t-T \le i \le t \tag{2} \]

where p_i ∈ R^{d_p} is the price vector on day i for each stock s in the lookback. The raw price vector p_i = [p^c_i, p^h_i, p^l_i] comprises a stock's adjusted closing price, highest price, and lowest price for trading day i. Since it is the price change that determines the stock movement, rather than the absolute price value, we normalize each price vector by the previous adjusted closing price: p_i = p_i / p^c_{i-1}.

It has been shown that the stock trend of each day has a different impact on stock trend prediction (Feng et al., 2019a). Towards this end, we employ temporal attention ζ(·) (Li et al., 2018) that learns to weigh critical days and forms an aggregated feature representation across all hidden states of the GRU (Qin et al., 2017). The temporal attention mechanism yields q_t = ζ(h_p), where h_p ∈ R^{d_p × T} is the concatenation of the hidden states of GRU_p for each stock s. This mechanism ζ(·) rewards days with more impactful information and aggregates information from all days in the lag window to produce the price features q_t ∈ R^v.

Temporal Attention. We use a temporal attention mechanism that is a form of additive attention (Bahdanau et al., 2014). The mechanism ζ(·) aggregates all the hidden representations of the GRU across different time steps into an overall representation with learned adaptive weights (Feng et al., 2019a). We formulate this mechanism ζ(·) as:

\[ \beta_i = \frac{\exp(h_i^T W h_z)}{\sum_{j=1}^{T} \exp(h_j^T W h_z)} \tag{3} \]

\[ \zeta(h_z) = \sum_i \beta_i h_i \tag{4} \]

where h_z ∈ R^{T × d_m} denotes the concatenated hidden states of the GRU, β_i represents the learned attention weight for trading day i, and W is a learnable parameter matrix.
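A minimal PyTorch sketch of the price encoder and its temporal attention follows. The query vector in Eq. (3) is underspecified once the hidden states are concatenated; this sketch assumes the last hidden state serves as the query, and all dimensions are illustrative rather than taken from the authors' code.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Additive temporal attention zeta(.) of Eqs. (3)-(4): scores each GRU
    hidden state against a learned transform of a query state and returns
    the attention-weighted sum over the lookback window."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, h):
        # h: (batch, T, hidden_dim) -- hidden states across the lookback window
        z = h[:, -1, :]                                 # assumed query h_z: last hidden state
        scores = torch.bmm(h, self.W(z).unsqueeze(-1))  # (batch, T, 1): h_i^T W h_z
        beta = torch.softmax(scores, dim=1)             # Eq. (3)
        return (beta * h).sum(dim=1)                    # Eq. (4): weighted aggregation

class PriceEncoder(nn.Module):
    """Sketch of the Price Encoder (Sec. 4.1): GRU over normalized daily
    price vectors, then temporal attention."""
    def __init__(self, price_dim=3, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(price_dim, hidden_dim, batch_first=True)  # Eq. (2)
        self.attn = TemporalAttention(hidden_dim)

    def forward(self, prices):
        # prices: (batch, T, 3) -- [adjusted close, high, low] / previous adjusted close
        h, _ = self.gru(prices)
        return self.attn(h)  # q_t: (batch, hidden_dim)
```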


Figure 3: Social Media Information Encoder.

4.2 Social Media Information Encoder (SMI)

Xu and Cohen (2018) suggest that tweets not only convey factual data but also portray user sentiment towards stocks, which influences financial prediction (Bollen et al., 2011). A variety of market factors beyond historical prices drive stock trends (Abu-Mostafa and Atiya, 1996). With the rising ubiquity of the Internet, social media platforms such as Twitter influence investors to follow market trends (Tetlock, 2007; Hu et al., 2018; Fung et al., 2002). To this end, MAN-SF uses the SMI encoder to extract a feature vector c_t from tweets. The encoder, shown in Figure 3, extracts social media features c_t by first encoding tweets within a day and then across multiple days, using a hierarchical attention mechanism (Yang et al., 2016).

Tweet Embedding. For any given tweet tw, we generate an embedding vector m ∈ R^d. We explored word- and sentence-level embedding methods to learn tweet representations: Global Vectors for Word Representation (GloVe) (Pennington et al., 2014), fastText (Joulin et al., 2017), and the Universal Sentence Encoder (USE) (Cer et al., 2018). Empirically, sentence-level embeddings generated using the deep averaging network encoder variant of USE³ gave us the most promising results. Thus, we encode each tweet tw using USE.

³ Implementation used: https://tfhub.dev/google/universal-sentence-encoder/2
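A sketch of this embedding step. The paper's footnote cites version 2 of the USE module; for a self-contained TF2 example, we load version 4, the TF2-compatible release of the same deep-averaging-network encoder, which also yields 512-dimensional sentence embeddings. The example tweet is hypothetical.

```python
import tensorflow_hub as hub

# Load the DAN-based Universal Sentence Encoder from TF Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
m = embed(["$AAPL breaks its all-time high"])  # hypothetical tweet -> (1, 512) embedding
```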

Learning Representations for one day. On any day i, a variable number of tweets [tw_1, tw_2, ..., tw_K] are posted for each stock s, and these capture and influence the stock trends (Fung et al., 2002). For each tweet, we obtain a representation using the Tweet Embedding layer (USE) as [m_1, m_2, ..., m_K], where m_j ∈ R^d and K is the number of tweets per stock on day i. To model the sequence of tweets within a day, we use a GRU. For stock s on each day i:

\[ h_j = \mathrm{GRU}_m(m_j, h_{j-1}), \quad j \in [1, K] \tag{5} \]

The influence of online tweets on the market can vary greatly (Hu et al., 2018). To identify tweets that are likely to have a more substantial influence on the market, we use intraday tweet-level attention. For each stock s on each day i, the mechanism can be summarized as:

\[ \gamma_j = \frac{\exp(h_j^T W h_m)}{\sum_{k=1}^{K} \exp(h_k^T W h_m)} \tag{6} \]

\[ r_i = \sum_j \gamma_j h_j \tag{7} \]

where h_m ∈ R^{K × d_m} denotes the concatenation of all hidden states from GRU_m, and d_m is the dimension of each hidden state. γ_j represents the attention weights, and r_i represents the features obtained from the tweets published on day i for each stock s. W is a learned linear transformation.

Learning Representations across days. Analyzing a temporal sequence of tweets and combining them can provide a more reliable assessment of market trends (Zhao et al., 2017). We learn a social media representation from the sequence of day-level tweet representations r_i. This feature vector encodes all the information in a lookback window. We feed the temporal day-level tweet vectors to a GRU for sequential modeling, given by:

\[ h_i = \mathrm{GRU}_s(r_i, h_{i-1}), \quad t-T \le i \le t \tag{8} \]

where h_i summarizes the tweets on day i for stock s, as well as tweets from preceding days, while focusing on day i. Like historical prices, tweets from each day have a different impact on stock movements. Hence, the temporal attention mechanism previously described for historical prices is also used for social media. This mechanism learns to aggregate impactful information to form the SMI features c_t over a lookback of T days for each stock s. The temporal attention mechanism yields c_t = ζ(h_s), where h_s ∈ R^{T × d_s} represents the concatenated hidden states of GRU_s and d_s is the size of the output space of the GRU. This temporal attention ζ(·), along with the intraday tweet-level attention, forms a hierarchical attention mechanism. It captures the fact that tweets are differently informative and have varied impacts during different market phases. The obtained SMI and price features for each stock are then blended to obtain a joint representation.
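The two-level SMI encoder can then be sketched by stacking the same GRU-plus-attention pattern twice, reusing the TemporalAttention module from the price encoder sketch above for both the intraday attention (Eqs. 6-7) and the across-day attention. Padding and masking for the variable tweet count K are omitted for brevity, and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SMIEncoder(nn.Module):
    """Sketch of the hierarchical SMI encoder (Figure 3). Assumes the
    TemporalAttention class from the price encoder sketch is in scope."""
    def __init__(self, emb_dim=512, hidden_dim=64):
        super().__init__()
        self.tweet_gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)   # Eq. (5)
        self.tweet_attn = TemporalAttention(hidden_dim)                  # Eqs. (6)-(7)
        self.day_gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # Eq. (8)
        self.day_attn = TemporalAttention(hidden_dim)                    # across-day zeta(.)

    def forward(self, tweet_embs):
        # tweet_embs: (batch, T, K, emb_dim) -- K USE-embedded tweets per day over T days
        B, T, K, D = tweet_embs.shape
        h, _ = self.tweet_gru(tweet_embs.reshape(B * T, K, D))
        r = self.tweet_attn(h).reshape(B, T, -1)   # r_i: one vector per day
        h_days, _ = self.day_gru(r)
        return self.day_attn(h_days)               # c_t: (batch, hidden_dim)
```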

4.3 Blending Multimodal Information

Signals from different modalities often carry complementary information about different events in the market (Schumaker and Chen, 2019). Direct concatenation treats information from the Price and SMI encoders equally (Li et al., 2016). Furthermore, it does not appropriately capture the interdependencies between prices and tweets, damping the framework's capacity to learn their correlations with market trends (Li et al., 2014). We use a bilinear transformation that learns pairwise feature interactions between historical price features and tweet features. Formally, q_t ∈ R^v and c_t ∈ R^u are obtained from the Price Encoder and SMI Encoder, respectively. The output x_t ∈ R^w is given by:

\[ x_t = B(c_t, q_t) = \mathrm{ReLU}(q_t^T W c_t + b) \tag{9} \]

where W ∈ R^{w×v×u} is the weight tensor and b ∈ R^w is the bias. Methods like direct averaging and attention-based aggregation (Bahdanau et al., 2014) do not account for pairwise interactions, as shown in the results (Sec. 6). Other methods, like factorized bilinear pooling (Yu et al., 2017), reduce computational complexity; however, we empirically find that the generalized bilinear layer outperforms these techniques. This layer learns an optimum blend of features from prices and tweets in a translationally invariant manner.
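Eq. (9) corresponds directly to PyTorch's built-in bilinear layer, which computes x = x1^T A x2 + b per output dimension. A sketch with assumed dimensions:

```python
import torch
import torch.nn as nn

u, v, w = 64, 64, 64             # dims of c_t, q_t, and the fused x_t (assumed)
blend = nn.Bilinear(v, u, w)     # weight: (w, v, u), bias: (w,) -- matches Eq. (9)

q_t = torch.randn(1, v)          # price features from the Price Encoder
c_t = torch.randn(1, u)          # social media features from the SMI encoder
x_t = torch.relu(blend(q_t, c_t))  # fused multimodal stock feature x_t
```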

4.4 Graph Attention Network (GAT)

Stocks are often interlinked with one another; thus, we model stocks and their relations as a graph.

Graph Creation. Following Feng et al. (2019b), we make use of Wiki company-based relations. Using Wikidata⁴, we extract first- and second-order relations between the company stocks in the S&P 500 index. A first-order relation is defined as $X \xrightarrow{R_1} Y$, where X and Y denote entities in Wikidata that correspond to the two stocks. A second-order relation is defined as $X \xrightarrow{R_2} Z \xleftarrow{R_3} Y$, where Z denotes another entity connecting the two entities X and Y. R1, R2, and R3, defined in Wikidata, are different types of entity relations. For instance, Wells Fargo and Bank of America are related to Berkshire Hathaway via the first-order company relation "owned by." Another example is Microsoft and Berkshire Hathaway, which are related through Bill Gates (second-order relation: "owned by" and "is a board member of"), since Bill Gates possesses ownership over Microsoft and is a board member of Berkshire Hathaway. We define the stock relation network as a graph G(S, E), where S denotes the set of nodes and E is the set of edges. Each node s ∈ S represents a stock, and two stocks s1, s2 ∈ S are joined by an edge e ∈ E if s1 and s2 are linked by a first- or second-order relation.

⁴ https://www.wikidata.org/wiki/Wikidata:List_of_properties/all
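A sketch of assembling G(S, E) with networkx from extracted relation triples; the ticker symbols and relations below are hypothetical placeholders standing in for the Wikidata extraction output.

```python
import networkx as nx

# Hypothetical extraction output: (stock_x, stock_y, relation(s)) triples.
first_order = [("WFC", "BRK", "owned by"), ("BAC", "BRK", "owned by")]
second_order = [("MSFT", "BRK", ("owned by", "board member of"))]  # via Bill Gates

G = nx.Graph()
G.add_nodes_from(["WFC", "BAC", "BRK", "MSFT"])  # one node per stock
for x, y, _rel in first_order:
    G.add_edge(x, y)                             # edge for a first-order relation
for x, y, _rels in second_order:
    G.add_edge(x, y)                             # edge for a second-order relation
```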

Graph Attention. Graph-based representation learning through graph neural networks can be considered as information exchange between related nodes (Gilmer et al., 2017). As each stock has a different degree of influence on another stock, it is essential that the graph encoding suitably weighs the more relevant relations between stocks. To this end, we use graph attention networks (GATs), which are graph neural networks with node-level attention (Velickovic et al., 2017).

We first describe a single GAT layer, used throughout the GAT component. The input to the GAT is a set of stock (node) features h = [x_1, x_2, ..., x_{|S|}], where x_i is the encoded multimodal market information (Sec. 4.3). The GAT layer produces an updated set of node features h' = [z_1, z_2, ..., z_{|S|}], z_i ∈ R^{w'}, based on the GAT mechanism (shown in Figure 1). We first apply a shared linear transform parameterized by W ∈ R^{w'×w} to all the nodes. Then, we apply a shared self-attention mechanism to each node i over its immediate neighborhood N_i. For each node j ∈ N_i, we compute normalized attention coefficients α_ij representing the importance of the relation between stocks i and j. Formally, α_ij is given as:

\[ \alpha_{ij} = \frac{\exp(\mathrm{LeakyReLU}(a_w^T [W x_i \oplus W x_j]))}{\sum_{k \in N_i} \exp(\mathrm{LeakyReLU}(a_w^T [W x_i \oplus W x_k]))} \tag{10} \]

where ·^T and ⊕ denote transposition and concatenation, respectively, and a_w ∈ R^{2w'} is the learnable weight vector of a single-layer feed-forward neural network. The learned attention coefficients α_ij are used to weigh and aggregate the feature vectors of neighboring nodes with a non-linearity σ. The updated node feature vector z_i is given as:

\[ z_i = \sigma\Big( \sum_{j \in N_i} \alpha_{ij} W x_j \Big) \tag{11} \]

We use multi-head attention to stabilise training (Vaswani et al., 2017). Formally, U independent executors apply the above attention mechanism, and their output features are concatenated to yield:

\[ z_i = \bigoplus_{k=1}^{U} \sigma\Big( \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} x_j \Big) \tag{12} \]

where α^k_ij and W^k denote the normalised attention coefficients and the linear transformation parameter matrix computed by the k-th attention mechanism.

We use a two-layer GAT: the first layer is followed by an Exponential Linear Unit (Clevert et al., 2015), and the second layer outputs a vector y_i for each stock i, which is then used to classify the stock's future price movement. MAN-SF is trained using the Adam optimiser by minimizing the cross-entropy loss, given as:

\[ L_{cse} = -\sum_{i=1}^{|S|} \Big( Y_i \ln(y_i) + (1 - Y_i) \ln(1 - y_i) \Big) \tag{13} \]

where Y_i is the true price movement of stock i.
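The GAT layer of Eqs. (10)-(12) and the two-layer classifier trained with Eq. (13) can be sketched as follows. This is a dense-adjacency toy implementation under the assumption that the adjacency matrix includes self-loops; it is not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """One multi-head GAT layer following Eqs. (10)-(12) (dense-adjacency sketch)."""
    def __init__(self, in_dim, out_dim, heads=8, concat=True):
        super().__init__()
        self.heads, self.out_dim, self.concat = heads, out_dim, concat
        self.W = nn.Parameter(torch.empty(heads, in_dim, out_dim))
        self.a = nn.Parameter(torch.empty(heads, 2 * out_dim))
        nn.init.xavier_uniform_(self.W)
        nn.init.xavier_uniform_(self.a)

    def forward(self, x, adj):
        # x: (N, in_dim) fused stock features; adj: (N, N) 0/1 relation matrix with self-loops
        h = torch.einsum('ni,hio->hno', x, self.W)                      # W^k x per head
        left = torch.einsum('hno,ho->hn', h, self.a[:, :self.out_dim])  # a^T split over [Wx_i ⊕ Wx_j]
        right = torch.einsum('hno,ho->hn', h, self.a[:, self.out_dim:])
        e = F.leaky_relu(left.unsqueeze(2) + right.unsqueeze(1), 0.2)   # pre-softmax logits
        e = e.masked_fill(adj.unsqueeze(0) == 0, float('-inf'))         # restrict to N_i
        alpha = torch.softmax(e, dim=-1)                                # Eq. (10)
        z = torch.einsum('hij,hjo->hio', alpha, h)                      # Eq. (11); sigma applied by caller
        if self.concat:                                                 # Eq. (12): concatenate heads
            return z.permute(1, 0, 2).reshape(x.size(0), -1)
        return z.mean(dim=0)

class StockGAT(nn.Module):
    """Two-layer GAT head: ELU after layer 1, per-stock movement probability out."""
    def __init__(self, in_dim, hid=16, heads=8):
        super().__init__()
        self.gat1 = GATLayer(in_dim, hid, heads=heads)
        self.gat2 = GATLayer(hid * heads, 1, heads=1, concat=False)

    def forward(self, x, adj):
        h = F.elu(self.gat1(x, adj))
        return torch.sigmoid(self.gat2(h, adj)).squeeze(-1)

# Training objective, Eq. (13), over all stocks:
#   probs = StockGAT(in_dim=64)(x, adj)
#   loss = F.binary_cross_entropy(probs, y_true.float())
```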

5 Experiments

5.1 Dataset and Training Setup

We adopt the StockNet dataset (Xu and Cohen, 2018) for training and evaluating MAN-SF. The dataset contains data for high-trade-volume stocks in the S&P 500 index on the NYSE and NASDAQ markets. Stock-specific tweets are extracted using regex queries built from NASDAQ ticker symbols, for instance, $AMZN for Amazon. The price data is obtained from Yahoo Finance⁵. We shift a 5-day lag window along the trading days to generate samples. We label the samples according to the movement percentage of the closing price, such that movements ≥ 0.55% and ≤ −0.5% are labeled as positive and negative samples, respectively. This leaves us with 26,614 samples divided as 49.78% and 50.22% between the two classes. We temporally split the dataset into Train:Validation:Test in a 70:10:20 ratio, leaving us with date ranges from 01/01/2014 to 31/07/2015 for training, 01/08/2015 to 30/09/2015 for validation, and 01/10/2015 to 01/01/2016 for testing. Following Xu and Cohen (2018), we align trading days by dropping samples that lack either prices or tweets, and further align the data across trading windows for related stocks to ensure data is available for all trading days in the window for all stocks. The hidden size of all GRUs is 64, and the USE embedding dimension is 512. We use U = 8 attention heads for both GAT layers. We use the Adam optimizer with a learning rate of 5e−4 and train MAN-SF for 10,000 epochs. It takes 3 hours to train and test MAN-SF on a Tesla K80 GPU. We use early stopping based on the Matthews Correlation Coefficient (MCC) over the validation set.

⁵ https://finance.yahoo.com/industries
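For reference, the stated setup can be collected into a single configuration sketch; the key names are ours, not from a released codebase.

```python
# Hypothetical consolidation of the training setup described in Sec. 5.1.
config = {
    "lookback_T": 5,             # 5-day lag window
    "gru_hidden": 64,            # hidden size of all GRUs
    "use_dim": 512,              # USE tweet embedding dimension
    "gat_heads": 8,              # U = 8 attention heads per GAT layer
    "optimizer": "adam",
    "learning_rate": 5e-4,
    "epochs": 10_000,
    "early_stopping_metric": "validation MCC",
    "label_thresholds": (0.0055, -0.005),  # >= +0.55% positive, <= -0.5% negative
}
```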

5.2 Evaluation

Following prior research on stock prediction (Ding et al., 2014; Xu and Cohen, 2018), we use accuracy, F1 score, and MCC (implementations from sklearn⁶) for classification performance. We use MCC because, unlike the F1 score, MCC avoids bias due to data skew, as it does not depend on the choice of the positive class and accounts for true negatives.

For a given confusion matrix $\begin{pmatrix} tp & fn \\ fp & tn \end{pmatrix}$:

\[ \mathrm{MCC} = \frac{tp \cdot tn - fp \cdot fn}{\sqrt{(tp+fp)(tp+fn)(tn+fp)(tn+fn)}} \tag{14} \]
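Since the sklearn implementations are used, evaluation reduces to standard calls; the label arrays below are hypothetical.

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [1, 0, 1, 1, 0]   # hypothetical ground-truth movements
y_pred = [1, 0, 0, 1, 0]   # hypothetical model predictions
print(accuracy_score(y_true, y_pred),
      f1_score(y_true, y_pred),
      matthews_corrcoef(y_true, y_pred))  # Eq. (14)
```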

Like prior work (Kim et al., 2019; Feng et al., 2019b), to evaluate MAN-SF's applicability to real-world trading, we assess its profitability on the test data of the S&P 500 index using two metrics: Cumulative Profit and the Sharpe Ratio (Sharpe, 1994). We follow a trading strategy where, if MAN-SF predicts a rise in a stock's value for the next day, one share of that stock is bought (long position) at the closing price of the current trading session and sold at the next day's closing price. Otherwise, if the strategy speculates a fall in price, a short sell⁷ is performed. We compute the cumulative profit (Fischer and Krauss, 2018) earned as:

\[ \mathrm{Profit}_t = \sum_{i \in S} \frac{p_i^t - p_i^{t-1}}{p_i^{t-1}} \, (-1)^{\mathrm{Action}_i^{t-1}} \tag{15} \]

where S denotes the set of stocks and p_i^t denotes the price of stock i on day t. Action_i^{t-1} is a binary value in {0, 1}: it is 0 if a long position is taken at time t for stock i, and 1 otherwise.

⁶ sklearn: https://scikit-learn.org
⁷ Short sell: https://en.wikipedia.org/wiki/Short_(finance)


| Model | F1 ↑ | Accuracy ↑ | MCC ↑ |
|---|---|---|---|
| *Technical Analysis (TA)* | | | |
| RAND | 0.502 ± 8e−4 | 0.509 ± 8e−4 | −0.002 ± 1e−3 |
| ARIMA (Brown, 2004) | 0.513 ± 1e−3 | 0.514 ± 1e−3 | −0.021 ± 2e−3 |
| Selvin et al. (2017) | 0.529 ± 5e−2 | 0.530 ± 5e−2 | −0.004 ± 7e−2 |
| *Fundamental Analysis (FA)* | | | |
| RandForest (Pagolu et al., 2016) | 0.527 ± 2e−3 | 0.531 ± 2e−3 | 0.013 ± 4e−3 |
| TSLDA (Nguyen and Shirai, 2015) | 0.539 ± 6e−3 | 0.541 ± 6e−3 | 0.065 ± 7e−3 |
| HAN (Hu et al., 2018) | 0.572 ± 4e−3 | 0.576 ± 4e−3 | 0.052 ± 5e−3 |
| StockNet - TechnicalAnalyst (Xu and Cohen, 2018) | 0.546 | 0.550 | 0.017 |
| StockNet - FundamentalAnalyst (Xu and Cohen, 2018) | 0.572 | 0.582 | 0.072 |
| StockNet - IndependentAnalyst (Xu and Cohen, 2018) | 0.573 | 0.575 | 0.037 |
| StockNet - DiscriminativeAnalyst (Xu and Cohen, 2018) | 0.559 | 0.562 | 0.056 |
| StockNet - HedgeFundAnalyst (Xu and Cohen, 2018) | 0.575 | 0.582 | 0.081 |
| HATS (Kim et al., 2019) | 0.560 ± 2e−3 | 0.562 ± 2e−3 | 0.117 ± 6e−3 |
| Chen et al. (2018) | 0.530 ± 7e−3 | 0.532 ± 7e−3 | 0.093 ± 9e−3 |
| Adversarial LSTM (Feng et al., 2019a) | 0.570 | 0.572 | 0.148 |
| **MAN-SF (This work)** | **0.605 ± 2e−4** | **0.608 ± 2e−4** | **0.195 ± 6e−4** |

Table 1: Results compared with baselines, grouped into Technical Analysis (TA) and Fundamental Analysis (FA) models. Bold shows the best results.

The Sharpe Ratio is a measure of the return of a portfolio compared to its risk. We calculate the Sharpe Ratio as the ratio of the expected return R_a of a portfolio to its standard deviation:

\[ \mathrm{Sharpe\ Ratio}_a = \frac{E[R_a]}{\mathrm{std}[R_a]} \tag{16} \]
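Under the stated long/short strategy, Eqs. (15) and (16) can be computed from a matrix of closing prices and predicted actions; a NumPy sketch with assumed array shapes:

```python
import numpy as np

def daily_profit(prices: np.ndarray, actions: np.ndarray) -> np.ndarray:
    # prices: (days, stocks) closing prices; actions: (days, stocks), 0 = long, 1 = short
    returns = (prices[1:] - prices[:-1]) / prices[:-1]   # per-stock daily returns
    signed = returns * (-1.0) ** actions[:-1]            # Eq. (15): a short flips the sign
    return signed.sum(axis=1)                            # Profit_t summed over stocks

def sharpe_ratio(portfolio_returns: np.ndarray) -> float:
    return portfolio_returns.mean() / portfolio_returns.std()  # Eq. (16)
```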

5.3 Baselines

We compare MAN-SF with the baselines below, spanning both technical and fundamental analysis.

Technical Analysis: These methods use only historical price information.

• RAND: A random guess of price rise or fall.

• ARIMA: Autoregressive Integrated Moving Average, which models historical prices as a non-stationary time series (Brown, 2004).

• Selvin et al. (2017): Three deep neural architectures (RNN, CNN, and LSTM) using prices. We compare with the best-performing LSTM.

Fundamental Analysis: These methods use other modalities, such as text information and company relationships, along with historical prices.

• RandForest: A Random Forest classifier trained over word2vec (Mikolov et al., 2013) embeddings of tweets (Pagolu et al., 2016).

• TSLDA: Topic Sentiment Latent Dirichlet Allocation, a generative model that uses sentiments and topic modeling on social media (Nguyen and Shirai, 2015).

• HAN: A hierarchical attention mechanism to encode textual information within a day and across multiple days (Hu et al., 2018).

• StockNet: A variational autoencoder (VAE) that uses price and text information. Text is encoded using hierarchical attention within and across days, and price features are modeled sequentially (Xu and Cohen, 2018). We compare with all five variants of StockNet.

• HATS: A hierarchical graph attention method that uses a multi-graph to weigh different relationships between stocks. It uses only historical price data (Kim et al., 2019).

• Chen et al. (2018): GCNs that model inter-stock relations using only historical price data.

6 Results and Analysis

We now discuss the experimental results and some findings, along with their financial implications.

Performance Comparison. Table 1 shows the performance of the compared methods on StockNet's test split from 01/10/2015 to 31/12/2015 on the S&P 500 index, averaged over ten different runs. Using a learned blend of historical prices and tweets over corporate relationships, MAN-SF achieves the best performance, outperforming the strongest baselines, StockNet and Adversarial LSTM. We also note that Fundamental Analysis (FA) techniques outperform numerical-only Technical Analysis (TA) methods, reiterating the effectiveness of factoring in social media signals and inter-stock relations.


| Model Component | F1 ↑ | MCC ↑ |
|---|---|---|
| LSTM + Historical Price | 0.521 | 0.002 |
| GRU + Social Media Text (BERT) | 0.539 | 0.077 |
| GCN + Historical Price | 0.532 | 0.093 |
| GRU + Social Media Text (USE) | 0.546 | 0.101 |
| GCN + Social Media Text (USE) | 0.555 | 0.102 |
| GAT + Historical Price | 0.562 | 0.117 |
| MAN-SF (Concatenation) | 0.588 | 0.156 |
| MAN-SF (Attention Fusion) | 0.594 | 0.173 |
| MAN-SF (Bilinear Transformation) | 0.605 | 0.195 |

Table 2: Ablation study over MAN-SF's components.

Figure 4: Feature weight heatmaps for MAN-SF: (a) feature fusion maps; (b) graph attention map.

These results empirically validate the effectiveness of multimodal signals, owing to a broader capture of stock price influencing information, including tweets and other related stocks.

Ablation Study. In Table 2, we observe that unimodal price and text models predict the market trend to an extent. Improvements over individual modalities are noted with the inclusion of a graph-based learning model, i.e., GCN and GAT, validating the premise of using inter-stock relations for enhanced forecasting. When the text and price signals are fused, and more relevant information is extracted using the attention mechanisms, a further performance gain is seen. The ablation study ties in with the EMH: as we add additional modalities, we note an increment in MAN-SF's stock prediction ability. Two critical observations from Table 2 are the substantial MCC gains when using GAT over GCN, and the contrast between fusing text and prices via concatenation versus bilinear transformation. We discuss these next.

Impact of Bilinear Transformations. Bilinear blending outperforms the concatenation and attention fusion variants, as seen in Table 2. We postulate that the bilinear transformation can better learn the interplay between the signals compared to the other variants. On examining Figure 4a, we observe that the bilinear layer blends highly non-linear relationships between the two signals, leading to a joint representation that captures more specific features, visible as areas of concentrated attention, as compared to simple concatenation-based fusion.

| Model | Sharpe Ratio ↑ |
|---|---|
| StockNet | 0.83 |
| HATS | 0.78 |
| MAN-SF | 1.05 |

Table 3: Annualized Sharpe Ratio comparison with baselines. Bold and italics denote the best and second-best results, respectively.

Figure 6: Cumulative profit trend over the test period (October to December 2015) for Adv-LSTM, StockNet, HATS, MAN-SF (Concat), and MAN-SF.

Analyzing Graph Attention. We notice that equally weighing all correlations using GCN-based models leads to smaller performance gains than GAT (the GAT and MAN-SF variants), as shown in Table 2. To analyze this difference, we first calculate each neighbor's attention score in the stock relations graph, as shown in Figure 4b. By analyzing the stock associations with the highest and lowest attention scores, we observe that some relations between stocks, such as being part of the same industry or having the same founder, are more critical than others, like having the same country of origin. For instance, C (CitiCorp) and JPM (JP Morgan) have a relatively high attention score and are part of the same investment and banking industry, whereas the attention score for JPM and CSCO (Cisco) is relatively low. We also observe that some stocks share hidden correlations captured by the GAT due to the market's temporal nature. We explain one such example in Section 7.

Profitability. We examine MAN-SF's practical applicability through a profitability analysis on real-world stock data. From Table 3 and Figure 6, we note that MAN-SF achieves higher risk-adjusted returns and an overall profit. MAN-SF outperforms the baselines over the common testing period of three months using the stock data in the S&P 500 index. These observations show the profitability of MAN-SF over models that do not capture stock correlations (StockNet) and models that do not use the impact of textual data (HATS). We tentatively attribute these improvements to MAN-SF's ability to learn a more concentrated blend of text and price features, as opposed to competitive models. We extend this analysis in the next section.

Figure 5: Graph sample showing attention weights for stock correlations (top left); stock price movement depicting inter-stock relationships (bottom left); tweets with hierarchical temporal attention weights (right).

7 Qualitative Analysis

We conduct an extended analysis across two high-risk scenarios, as shown in Figure 5, to study the applicability of MAN-SF to investors in the stock market. The study is based on Apple's (AAPL) trend during 12th Nov to 18th Nov. Figure 5 shows some of the tweets posted and AAPL's relations with relevant stocks, such as Alibaba (BABA) and Google (GOOG), among others, during that period.

12th Nov to 16th Nov: Failure of StockNet and models that do not capture inter-stock relations. From the price movement in Figure 5, we see that 12th to 16th November 2015 shows a decline in Apple's stock price. Here, we observe that StockNet predicts a further drop in Apple's price, and similar models that use only price and text are unable to correctly predict the price rise for Apple on 17th November. However, we discover that Apple shares a strong relationship with Alibaba and Google during that time, as indicated by the attention weights. MAN-SF incorporates inter-stock relations through graph attention to learn latent correlations between AAPL, BABA, and GOOG, as shown by the graph snippet in Figure 5. MAN-SF correctly predicts a rise in Apple's price and makes a profit, unlike StockNet. We attribute this prediction to MAN-SF likely having a broader context by blending multimodal signals.

14th Nov to 18th Nov: Failure of HATS and models that do not leverage social media data. Despite Apple's sharp fall on 18th November, we see tweets with positive sentiment receiving higher attention weights during the lookback window, indicating a possible increase in Apple's price. MAN-SF correctly uses its hierarchical attention mechanisms over tweets and inter-stock correlations, thereby likely predicting a rise in Apple's stock price, similar to models such as StockNet. In contrast, models such as HATS forecast a continual decrease in Apple's price, potentially because they do not factor in social media data.

8 Conclusion and Future Work

We study stock movement prediction using natural language, graph-based, and numeric features. We propose MAN-SF, a neural model that jointly learns temporally relevant signals from chaotic multimodal data spanning historical prices, tweets, and inter-stock correlations in a hierarchical fashion. Extensive quantitative and qualitative experiments on real market data demonstrate MAN-SF's applicability for neural stock forecasting. We plan to further use news articles, earnings calls, and other data sources to better capture market dynamics. Another interesting direction of future research is to explore the cold start problem, where MAN-SF could be leveraged to predict stock movements for new stocks. Lastly, we would also like to extend MAN-SF's architecture so that it is not limited to modeling all stocks together (a consequence of its GAT component), to increase scalability to cross-market scenarios.

References

Yaser S. Abu-Mostafa and Amir F. Atiya. 1996. Introduction to financial forecasting. Applied Intelligence, 6(3):205–213.

Klaus Adam, Albert Marcet, and Juan Pablo Nicolini. 2016. Stock market volatility and learning. The Journal of Finance, 71(1):33–82.

Leif B. G. Andersen. 2007. Efficient simulation of the Heston stochastic volatility model. SSRN Electronic Journal.

Joseph H. Anthony and K. Ramesh. 1992. Association between accounting performance measures and stock prices. Journal of Accounting and Economics, 15(2-3):203–227.

Uma Devi B, Sundar D, and Alli P. 2013. An effective time series analysis for stock trend prediction using ARIMA model for Nifty Midcap-50. International Journal of Data Mining & Knowledge Management Process, 3(1):65–78.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Fischer Black and Myron Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654.

David C. Blitz and Pim Van Vliet. 2007. The volatility effect. The Journal of Portfolio Management, 34(1):102–113.

J. Bollen, H. Mao, and X. Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8.

Tim Bollerslev. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.

Robert Goodell Brown. 2004. Smoothing, Forecasting and Prediction of Discrete Time Series. Courier Corporation.

Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 169–174.

Yingmei Chen, Zhongyu Wei, and Xuanjing Huang. 2018. Incorporating corporation relationship via graph convolutional neural networks for stock price prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM '18, pages 1655–1658, New York, NY, USA. Association for Computing Machinery.

Kyunghyun Cho, Bart Van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.

Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. 2015. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.

Shumin Deng, Ningyu Zhang, Wen Zhang, Jiaoyan Chen, Jeff Z. Pan, and Huajun Chen. 2019. Knowledge-driven stock trend prediction and explanation via temporal convolutional network. In Companion Proceedings of The 2019 World Wide Web Conference, WWW '19, pages 678–685, New York, NY, USA. Association for Computing Machinery.

Ilia D. Dichev and Vicki Wei Tang. 2006. Earnings volatility and earnings predictability. SSRN Electronic Journal.

F. X. Diebold and K. Yilmaz. 2014. On the network topology of variance decompositions: Measuring the connectedness of financial firms. Journal of Econometrics, 182(1):119–134.

Guangyu Ding and Liangxi Qin. 2019. Study on the prediction of stock price based on the associated network model of LSTM. International Journal of Machine Learning and Cybernetics.

Xiao Ding, Yue Zhang, Ting Liu, and Junwen Duan. 2014. Using structured events to predict stock price movement: An empirical investigation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1415–1425.

Edwin J. Elton, Martin J. Gruber, Stephen J. Brown, and William N. Goetzmann. 2009. Modern Portfolio Theory and Investment Analysis. John Wiley & Sons.

Claude B. Erb, Campbell R. Harvey, and Tadas E. Viskanta. 1994. Forecasting international equity correlations. Financial Analysts Journal, 50(6):32–45.

Fuli Feng, Huimin Chen, Xiangnan He, Ji Ding, Maosong Sun, and Tat-Seng Chua. 2019a. Enhancing stock movement prediction with adversarial training. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 5843–5849. International Joint Conferences on Artificial Intelligence Organization.

Fuli Feng, Xiangnan He, Xiang Wang, Cheng Luo, Yiqun Liu, and Tat-Seng Chua. 2019b. Temporal relational ranking for stock prediction. ACM Transactions on Information Systems, 37(2).

Gabriel Pui Cheong Fung, Jeffrey Xu Yu, and Wai Lam. 2002. News sensitive stock trend prediction. In Advances in Knowledge Discovery and Data Mining, pages 481–493. Springer Berlin Heidelberg.

C. Lee Giles, Steve Lawrence, and Ah Chung Tsoi. 2001. Noisy time series prediction using a recurrent neural network and grammatical inference. Machine Learning, 44(1/2):161–183.

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 1263–1272. JMLR.org.

Alexandre Hocquard, Sunny Ng, and Nicolas Papageorgiou. 2013. A constant-volatility framework for managing tail risk. The Journal of Portfolio Management, 39(2):28–40.

Ehsan Hoseinzade, Saman Haratizadeh, and Arash Khoeini. 2019. U-CNNpred: A universal CNN-based predictor for stock markets.

Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. 2018. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 261–269.

Monique Jeanblanc, Marc Yor, and Marc Chesney. 2009. Mathematical Methods for Financial Markets. Springer Science & Business Media.

Weiwei Jiang. 2020. Applications of deep learning in stock market prediction: recent progress.

Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2017. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431.

Jue Liu, Zhuocheng Lu, and Wei Du. 2019. Combining enterprise knowledge graph and news sentiment analysis for stock price volatility prediction. In Proceedings of the 52nd Hawaii International Conference on System Sciences.

Raehyun Kim, Chan Ho So, Minbyul Jeong, Sanghoon Lee, Jinkyu Kim, and Jaewoo Kang. 2019. HATS: A hierarchical graph attention network for stock movement prediction. arXiv preprint arXiv:1908.07999.

Kazuhiro Kohara, Tsutomu Ishikawa, Yoshimi Fukuhara, and Yukihiro Nakamura. 1997. Stock price prediction using prior knowledge and neural networks. Intelligent Systems in Accounting, Finance & Management, 6(1):11–22.

Thomas Fischer and Christopher Krauss. 2018. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research.

Hao Li, Yanyan Shen, and Yanmin Zhu. 2018. Stock price prediction using attention-based multi-input LSTM. In ACML, volume 95 of Proceedings of Machine Learning Research, pages 454–469. PMLR.

Xiaodong Li, Xiaodi Huang, Xiaotie Deng, and Shanfeng Zhu. 2014. Enhancing quantitative intra-day stock return prediction by integrating both market news and stock prices information. Neurocomputing, 142:228–238.

Xiaodong Li, Haoran Xie, Ran Wang, Yi Cai, Jingjing Cao, Feng Wang, Huaqing Min, and Xiaotie Deng. 2016. Empirical analysis: stock market prediction via extreme learning machine. Neural Computing and Applications, 27(1):67–78.

Burton G. Malkiel. 1989. Efficient market hypothesis. In Finance, pages 127–134. Palgrave Macmillan UK.

Daiki Matsunaga, Toyotaro Suzumura, and Toshihiro Takahashi. 2019. Exploring graph neural networks for stock market predictions with rolling window analysis. ArXiv, abs/1909.10660.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Robert Neal and Simon M. Wheatley. 1998. Do measures of investor sentiment predict returns? Journal of Financial and Quantitative Analysis, 33(4):523–547.

Duc Huu Dat Nguyen, Loc Phuoc Tran, and Vu Nguyen. 2019. Predicting stock prices using dynamic LSTM models. In Communications in Computer and Information Science, pages 199–212. Springer International Publishing.

Thi-Thu Nguyen and Seokhoon Yoon. 2019. A novel approach to short-term stock price movement prediction using transfer learning. Applied Sciences, 9(22):4745.

Thien Hai Nguyen and Kiyoaki Shirai. 2015. Topic modeling based sentiment analysis on social media for stock market prediction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1354–1364.

Mahla Nikou, Gholamreza Mansourfar, and Jamshid Bagherzadeh. 2019. Stock price prediction using deep learning algorithm and its comparison with machine learning algorithms. Intelligent Systems in Accounting, Finance and Management, 26.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng, Guofei Jiang, and Garrison Cottrell. 2017. A dual-stage attention-based recurrent neural network for time series prediction. arXiv preprint arXiv:1704.02971.

Yu Qin and Yi Yang. 2019. What you say and how you say it matters: Predicting financial risk using verbal and vocal cues. In 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), page 390.

Robert Rich and Joseph Tracy. 2004. Uncertainty and labor contract durations. Review of Economics and Statistics, 86(1):270–287.

Robert P. Schumaker and Hsinchun Chen. 2019. Textual analysis of stock market prediction using breaking financial news: The AZFinText system. ACM Transactions on Information Systems.

Ramit Sawhney, Shivam Agarwal, Arnav Wadhwa, and Rajiv Ratn Shah. 2020a. Spatiotemporal hypergraph convolution network for stock forecasting. In 2020 IEEE International Conference on Data Mining (ICDM).

Ramit Sawhney, Puneet Mathur, Ayush Mangal, Piyush Khanna, Rajiv Shah, and Roger Zimmermann. 2020b. Multimodal multi-task financial risk forecasting. In Proceedings of the 28th ACM International Conference on Multimedia, MM '20, New York, NY, USA. Association for Computing Machinery.

Sreelekshmy Selvin, R. Vinayakumar, E. A. Gopalakrishnan, Vijay Krishna Menon, and K. P. Soman. 2017. Stock price prediction using LSTM, RNN and CNN-sliding window model. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 1643–1647. IEEE.

William F. Sharpe. 1994. The Sharpe ratio. Journal of Portfolio Management, 21(1):49–58.

Matheus Jose Silva de Souza, Danilo Guimaraes Franco Ramos, Marina Garcia Pena, Vinicius Amorim Sobreiro, and Herbert Kimura. 2018. Examination of the profitability of technical analysis based on moving average strategies in BRICS. Financial Innovation, 4(1):3.

Paul C. Tetlock. 2007. Giving content to investor sentiment: The role of media in the stock market. Journal of Finance.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need.

Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.

Venkata Sasank Pagolu, Kamal Nayan Reddy, Ganapati Panda, and Babita Majhi. 2016. Sentiment analysis of Twitter data for predicting stock market movements. In SCOPES.

Yumo Xu and Shay B. Cohen. 2018. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1970–1979, Melbourne, Australia. Association for Computational Linguistics.

Linyi Yang, Tin Lok James Ng, Barry Smyth, and Riuhai Dong. 2020. HTML: Hierarchical transformer-based multi-task learning for volatility prediction. In Proceedings of The Web Conference 2020, WWW '20, pages 441–451, New York, NY, USA. Association for Computing Machinery.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489.

Zhou Yu, Jun Yu, Jianping Fan, and Dacheng Tao. 2017. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, pages 1821–1830.

Z. Zhao, R. Rao, S. Tu, and J. Shi. 2017. Time-weighted LSTM model with redefined labeling for stock trend prediction. In 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), pages 1210–1217.