The performance of sentiment features of MD&As for financial misstatement prediction: A comparison of deep
learning and “bag-of-words” approach
By Ting (Sophia) Sun, Yue Liu, and Miklos A. Vasarhelyi
Deep learning
Deep learning mimics how the human brain thinks. It makes a machine think like a human.
“The general idea of deep learning is to use neural networks to build multiple layers of abstraction to solve a complex semantic problem.”
-- Aaron Chavez, formerly chief scientist at Alchemy API
Biological Neurons
[Figure: a biological neuron, labeled with dendrites, soma, axon, terminal branches of the axon, and electrical impulse]
Deep neural network
[Figure: layers of a deep neural network: pixels → edges → object parts (combination of edges) → object models]
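The layered hierarchy on this slide (pixels, then edges, then object parts) can be illustrated with a minimal sketch of stacked layers, where each layer computes weighted sums of the previous layer's outputs. All weights and inputs below are hypothetical values chosen for illustration; they do not come from the presentation or from any trained model.

```python
def relu(values):
    """Rectified linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical input: three "pixel" intensities.
pixels = [1.0, 0.0, 1.0]

# Layer 1 (hypothetical weights): differences between neighboring pixels,
# a crude stand-in for edge detection.
edge_weights = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
edges = relu(dense(pixels, edge_weights, [0.0, 0.0]))

# Layer 2 (hypothetical weights): a combination of the edge responses,
# standing in for an "object part".
part_weights = [[0.5, 0.5]]
parts = relu(dense(edges, part_weights, [0.0]))

print("edges:", edges)
print("object parts:", parts)
```

In a real deep network the weights are learned from data rather than set by hand, and there are many more layers and units.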
Research questions
(1) Do sentiment features add information for financial misreporting prediction?
(2) If they do, are they effective only for fraud prediction or for misstatement including both fraud and error?
(3) How effective is the model using deep-learning-based sentiment features compared with the model using sentiment features obtained by the bag of words approach?
Sentiment analysis approaches
Aspect | Deep learning approach | Bag of words approach
Description of the technique | Emerging technique employing a deep hierarchical neural network trained with a large amount of text files | Prevalent technique using various word lists (dictionaries), with each one representing a particular sentiment feature
Rationale | “Understand” the meaning of a text file | Count the frequency of the words originating from a specific dictionary
Output sentiment feature | Sentiment scores | Sentiment scores (positive score - negative score)
Is there prior literature in the accounting and auditing domain? | No | Yes
Tool | Alchemy Language API | Loughran and McDonald (2011a)
Is it a finance-specific tool? | No | Yes
Required text document | HTML/text document and webpage | HTML/text document
Does it need data preprocessing? | No | Yes
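The bag of words rationale (count dictionary words, then take the positive score minus the negative score) can be sketched as follows. The two word lists are tiny hypothetical stand-ins, not the Loughran and McDonald (2011a) dictionaries, and dividing the counts by document length is an assumption made here so that long and short documents are comparable; the slides do not say how the scores are scaled.

```python
import re

# Hypothetical toy word lists for illustration only; they are NOT the
# Loughran and McDonald (2011a) dictionaries.
POSITIVE_WORDS = {"improve", "gain", "strong", "growth"}
NEGATIVE_WORDS = {"loss", "decline", "impairment", "weak"}


def tokenize(text):
    """Preprocessing step: lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words_sentiment(text):
    """Return (positive score, negative score, net score) for a text.

    Each score is a dictionary-word count divided by the total number of
    tokens (an assumed normalization); the net score is positive minus
    negative.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0, 0.0
    positive = sum(token in POSITIVE_WORDS for token in tokens) / len(tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens) / len(tokens)
    return positive, negative, positive - negative


sample = "Revenue growth was strong, but an impairment loss led to a decline."
pos, neg, net = bag_of_words_sentiment(sample)
print(f"positive={pos:.3f} negative={neg:.3f} net={net:.3f}")
```

The sample sentence has 12 tokens, 2 of them positive and 3 negative, so the net score is slightly below zero. The tokenization step is the kind of preprocessing the table refers to for this approach.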
Sentiment features and misstatements
• We analyzed 31,466 MD&As of 10-K filings for fiscal years 2006 to 2015, using the deep learning and “bag of words” approaches separately.
• With the deep learning approach, we obtained Sentiment_DL and Joy
• With the bag of words approach, we obtained Sentiment_TM
• Misstatement samples:
  • Restatements caused by financial misreporting for the fiscal years in our MD&A sample
  • Misstatement = 1 if there is a restatement as disclosed by Audit Analytics, and 0 otherwise
  • 321 out of 31,466 observations are identified as misstatements (a severe data imbalance issue)
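To make the data imbalance concrete, the short calculation below uses only the two counts reported on this slide. It also shows the accuracy of a trivial rule that labels every filing as not misstated, which is an illustration added here rather than a model from the presentation.

```python
total_observations = 31466  # MD&As in the sample (from the slide)
misstatements = 321         # observations with Misstatement = 1 (from the slide)

# Share of observations that are misstatements.
base_rate = misstatements / total_observations

# Accuracy of a trivial rule that labels every filing as "no misstatement".
trivial_accuracy = (total_observations - misstatements) / total_observations

print(f"Misstatement rate: {base_rate:.2%}")
print(f"Accuracy of always predicting 0: {trivial_accuracy:.2%}")
```

Roughly 1 filing in 100 is a misstatement, so a classifier can reach about 99% accuracy without detecting a single one. Accuracy therefore needs to be read alongside measures such as AUC and the false positive rate.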
Classification models: CHAID (Chi-square Automatic Interaction Detection) algorithm
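The core idea behind a CHAID split is to pick the predictor whose categories are most strongly associated with the target according to a chi-square test. The sketch below is a simplified illustration of that one step, not the authors' model: it computes the Pearson chi-square statistic for two hypothetical binned predictors and picks the larger one. A full CHAID implementation also merges similar categories and compares adjusted p-values; comparing raw statistics is only equivalent here because both toy predictors have the same number of categories. All variable names and data are made up.

```python
from collections import Counter


def chi_square_statistic(predictor_values, target_values):
    """Pearson chi-square statistic for a predictor/target contingency table."""
    n = len(target_values)
    observed = Counter(zip(predictor_values, target_values))
    row_totals = Counter(predictor_values)
    col_totals = Counter(target_values)
    statistic = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            # Expected count if predictor and target were independent.
            expected = row_total * col_total / n
            statistic += (observed[(row, col)] - expected) ** 2 / expected
    return statistic


# Hypothetical toy data: two binned predictors and a binary Misstatement label.
misstatement = [1, 1, 1, 0, 0, 0, 0, 0]
sentiment_bin = ["low", "low", "low", "high", "high", "high", "high", "low"]
size_bin = ["small", "large", "small", "large", "small", "large", "small", "large"]

scores = {
    "sentiment_bin": chi_square_statistic(sentiment_bin, misstatement),
    "size_bin": chi_square_statistic(size_bin, misstatement),
}
best_predictor = max(scores, key=scores.get)
print(scores, "-> split on", best_predictor)
```

In this toy data the sentiment bins separate the misstatement cases much better than the size bins, so the first split would be made on `sentiment_bin`.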
Results: Top 10 important predictors
Prediction results for testing data
•Answers to RQs:
(1) Do sentiment features add information for financial misreporting prediction?
• Yes
(2) If they do, are they effective only for fraud prediction or for misstatement including both fraud and error?
• Fraud prediction
(3) How effective is the model using deep-learning-based sentiment features compared with the model using sentiment features obtained by the bag of words approach?
• Improvement in effectiveness in terms of accuracy, AUC, and false positive rates
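For readers unfamiliar with the three measures named in the answer to RQ (3), the sketch below computes accuracy, false positive rate, and AUC from scratch. The labels, scores, and 0.5 threshold are hypothetical; they are not results from the presentation.

```python
def accuracy(labels, predictions):
    """Share of observations classified correctly."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def false_positive_rate(labels, predictions):
    """Share of true negatives (label 0) wrongly flagged as positive."""
    false_pos = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    negatives = sum(1 for y in labels if y == 0)
    return false_pos / negatives


def auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = misstatement) and model scores.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(labels, predictions))
print("false positive rate:", false_positive_rate(labels, predictions))
print("AUC:", auc(labels, scores))
```

Accuracy and the false positive rate depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why it is often reported for imbalanced samples like this one.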
•Conclusions:
• Considering its effectiveness and efficiency, deep-learning-based textual analysis is a promising technique for audit analytics
• Its predictive performance is expected to be improved if a finance-specific deep learning model is developed.
•Future work:
• Increase the sample of financial misstatements (currently the entire sample comes from the Audit Analytics database) by using AAERs (Accounting and Auditing Enforcement Releases)
• Decrease false positives
• Increase overall accuracy
• Use deep learning as the main classification model
Thank you