
© 2017 The MathWorks, Inc.

Using MATLAB for Sentiment Analysis and Text Analytics

Seth DeLand


Sentiment Analysis

Understand sentiment of documents, emails, social media

Applications
• Predict prices, inform trading strategies
• Risk analytics
• Economics research
• Litigation


Text Analytics Toolbox

Workflow: Access and Explore Data → Preprocess Data (Clean-up Text, Convert to Numeric) → Develop Predictive Models

Access and Explore Data
• Word Docs
• PDFs
• Text Files

Preprocess Data: Clean-up Text
• Stop Words
• Stemming
• Tokenization

Example: "Media reported two trees blown down along I-40 in the Old Fort area." becomes "media report two tree blown down i40 old fort area"

Preprocess Data: Convert to Numeric
• Bag of Words
• TF-IDF
• Word Embeddings

Example word-count matrix:

        cat  dog  run  two
doc1     1    0    1    0
doc2     1    1    0    1

Develop Predictive Models
• Latent Dirichlet Allocation
• Latent Semantic Analysis
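The clean-up stage of this workflow could be sketched roughly as follows. This is a minimal illustration, assuming Text Analytics Toolbox is installed; the order of the steps is one reasonable choice, not necessarily the one used in the talk, and the exact tokens produced may differ from the slide's example.

```matlab
% Sketch only: requires Text Analytics Toolbox.
str = "Media reported two trees blown down along I-40 in the Old Fort area.";

doc = tokenizedDocument(str);   % tokenization: split the text into words
doc = lower(doc);               % convert every token to lowercase
doc = erasePunctuation(doc);    % strip punctuation characters from tokens
doc = removeStopWords(doc);     % drop common words such as "the" and "in"
doc = normalizeWords(doc);      % stemming (the default for English text)

cleaned = joinWords(doc);       % rejoin tokens into one string for inspection
disp(cleaned)
```

Each function returns a new tokenizedDocument, so steps can be reordered or dropped to suit the data.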


Strings
The better way to work with text

• Manipulate, compare, and store text data efficiently

>> "image" + (1:3) + ".png"

  1×3 string array

    "image1.png"    "image2.png"    "image3.png"

• Simplified text manipulation functions
  – Example: Check if a string is contained within another string
    Previously: if ~isempty(strfind(textdata,"Dog"))
    Now: if contains(textdata,"Dog")

• Performance improvement
  – Up to 50x faster using contains with string than strfind with cellstr
  – Up to 2x memory savings using string over cellstr
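The difference between the two idioms shows up most clearly on an array of text. The following is a small sketch using only base MATLAB; the sample sentences are made up for illustration.

```matlab
% Sample data (hypothetical) as a string array
textdata = ["The Dog barked"; "A cat slept"; "Dog days of summer"];

% String idiom: one call, one logical result per element
hasDog = contains(textdata, "Dog");

% Older idiom on a cell array of character vectors:
% strfind returns a cell of index lists, which must be tested for emptiness
c = cellstr(textdata);
hasDogOld = ~cellfun(@isempty, strfind(c, 'Dog'));

% String arithmetic from the slide: builds three file names at once
files = "image" + (1:3) + ".png";
```

Both approaches flag the same elements; contains simply avoids the intermediate cell array and the isempty test.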


Demo: Sentiment Analysis of Tweets

Goal
• Analyze the sentiment of tweets for several tech companies and compare the sentiment scores to stock prices

Approach
• Access data from Twitter
• Preprocess to clean up text and deal with domain-specific terms
• Predict sentiment from word embedding
• Compare sentiment scores to prices
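One way the "predict sentiment from word embedding" step might look is sketched below. This is not the demo's actual code. It assumes Text Analytics Toolbox, Statistics and Machine Learning Toolbox, and the fastText English word embedding support package are installed; the seed word lists and the tweet are hypothetical, and a real analysis would train on a full opinion lexicon.

```matlab
% Sketch only: needs the fastText English embedding support package.
emb = fastTextWordEmbedding;

% Hypothetical seed lexicon (illustrative, far too small for real use)
posWords = ["good" "great" "excellent" "gain" "strong" "happy"];
negWords = ["bad" "terrible" "poor" "loss" "weak" "sad"];
words  = [posWords negWords]';
labels = categorical([repmat("positive", numel(posWords), 1); ...
                      repmat("negative", numel(negWords), 1)]);

% Keep words the embedding knows, map them to vectors, train a classifier
inVocab = isVocabularyWord(emb, words);
XTrain  = word2vec(emb, words(inVocab));
mdl     = fitcsvm(XTrain, labels(inVocab));

% Score a made-up tweet: classify each known word, then average the scores
tweet  = "Strong earnings and a great product launch";
doc    = removeStopWords(tokenizedDocument(lower(tweet)));
tokens = doc.Vocabulary';
tokens = tokens(isVocabularyWord(emb, tokens));
[~, scores] = predict(mdl, word2vec(emb, tokens));

posCol    = find(mdl.ClassNames == "positive");   % column of the positive class
sentiment = mean(scores(:, posCol));              % higher means more positive
```

Averaging per-word scores is a simple aggregation choice; the resulting per-tweet value can then be grouped by company and date before comparing against prices.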


Additional Text Analytics Capabilities

• Extract text from PDFs and Word documents
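In Text Analytics Toolbox, file extraction of this kind is handled by extractFileText. The sketch below writes a small temporary text file so that it is self-contained; the file name is hypothetical, and the same call is used with a path to a PDF or Word document.

```matlab
% Sketch only: requires Text Analytics Toolbox.
% Create a throwaway text file so the example runs on its own.
fname = fullfile(tempdir, 'sample_notes.txt');   % hypothetical file name
fid = fopen(fname, 'w');
fprintf(fid, '%s', 'Media reported two trees blown down.');
fclose(fid);

% Read the file contents into a string; the same function also accepts
% PDF and Microsoft Word files.
str = extractFileText(fname);
doc = tokenizedDocument(str);   % ready for the preprocessing steps

delete(fname)                   % clean up the temporary file
```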


Bag of Words Models and TF-IDF
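As a rough illustration of both representations, the sketch below builds a bag-of-words model for two tiny documents matching the cat/dog/run/two example earlier in the deck, then reweights the counts with TF-IDF. It assumes Text Analytics Toolbox; the document strings are reconstructed from that count table rather than taken from the talk.

```matlab
% Sketch only: requires Text Analytics Toolbox.
% Two tiny documents whose word counts match the doc1/doc2 table
str = ["cat run"; "cat dog two"];
doc = tokenizedDocument(str);

bag = bagOfWords(doc);          % word counts per document
vocab  = bag.Vocabulary;        % the unique words (matrix columns)
counts = full(bag.Counts);      % numDocs-by-numWords count matrix

% TF-IDF downweights words that appear in many documents, such as "cat"
M = full(tfidf(bag));
```

Counts and the TF-IDF matrix are stored as sparse matrices, which matters once the vocabulary grows to thousands of words; full is used here only to make the small example easy to inspect.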


LSA and LDA

• Unsupervised machine learning techniques
• Well-suited to wide feature matrices
• Both can be used for dimensionality reduction
• LDA is also useful for topic modeling
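Both models can be fit directly from a bag-of-words model. The following is a minimal sketch, assuming Text Analytics Toolbox; the six-sentence corpus and the choice of two components and two topics are made up for illustration.

```matlab
% Sketch only: requires Text Analytics Toolbox. The corpus is made up.
rng(0)   % LDA fitting is randomized; fix the seed for repeatable results
str = [
    "stock prices rise on strong earnings"
    "earnings beat forecasts and stock prices jump"
    "storm blows trees down along the highway"
    "high winds and storm damage close the highway"
    "investors expect earnings growth and higher prices"
    "trees and power lines down after the storm"];
bag = bagOfWords(removeStopWords(tokenizedDocument(str)));

% LSA: represent each document by 2 latent components
lsaMdl    = fitlsa(bag, 2);
lsaScores = lsaMdl.DocumentScores;               % numDocs-by-2

% LDA: fit 2 topics, then inspect the topic mixture and top words
ldaMdl   = fitlda(bag, 2, 'Verbose', 0);
topicMix = ldaMdl.DocumentTopicProbabilities;    % each row sums to 1
topWords = topkwords(ldaMdl, 5, 1);              % top 5 words of topic 1
```

Either lsaScores or topicMix can replace the wide word-count matrix as a compact set of features for a downstream model.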


Learn More

https://www.mathworks.com/help/textanalytics/examples.html


Learn More

https://blogs.mathworks.com/loren/2017/09/21/math-with-words-word-embeddings-with-matlab-and-text-analytics-toolbox/

Math with Words – Word Embeddings with MATLAB and Text Analytics Toolbox


Questions?