text analytics for unlocking the potential of big data
DESCRIPTION
Text Analytics for Unlocking the Potential of Big Data. 1. Text analytics & big data. 2. New opportunities with text analytics. 3. Challenges when mining text. 4. Solutions to overcome challenges. 5. Wrap-up. Bhavani Raskutti @ Pacific Brands.
1
Text Analytics for Unlocking the Potential of Big Data
Bhavani Raskutti @ Pacific Brands
1 Text analytics & big data
2 New opportunities with text analytics
3 Challenges when mining text
4 Solutions to overcome challenges
5 Wrap-up
3
Text Analytics & Big Data
Data used for analytics now (linear growth) vs. other data available (exponential growth)
Customer Data
• Demographics
• Usage summary
• Product Usage
Traditional customer feedback
• Surveys
• Customer complaints
• Inbound emails
Transactional data
• Usage records
• Sales receipts
• Outputs from sensors
• Service assurance
Social media data
• Facebook discussions
• Twitter feeds
• Blogs
• YouTube videos
Product Data
• Mix & usage
Access Device Data
• GPS & locale data
…
9
New Opportunities with Text Analytics
Mine freely available social media data for:
• Understanding customer sentiment
• Identifying major customer concerns
• Tracking sentiment/issues over time
Business implications:
• Ability to act on negative sentiments quickly
• Respond to customer concerns in a timely manner
• Target initiatives appropriately by continuous tracking
Superior market research & focus group outcomes
10
New Opportunities: Sentiment Analysis
Methodology:
• Score based on positive & negative sentiment words
• OR use supervised learning with labelled examples
No sarcasm detection
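The word-scoring approach can be sketched in a few lines of Python. This is only an illustration: the two word lists and the function name `sentiment_score` are hypothetical and not from the slides, and a real lexicon would be far larger.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "thanks", "love", "happy"}
NEGATIVE = {"ridiculous", "frustrating", "slow", "poor", "nightmare", "sad"}

def sentiment_score(text):
    """Return (# positive words) - (# negative words); below zero reads as negative."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

print(sentiment_score("Paying a bill is a nightmare"))  # one negative word
print(sentiment_score("Oh great, another outage"))      # sarcasm is scored as positive
```

The second call shows the limitation noted on the slide: a sarcastic "great" is counted as a positive word.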
11
New Opportunities: Topic Detection
Methodology:
1. Create term frequency matrix from text sequences
2. Use unsupervised learning to create clusters
3. Create cluster descriptions
Concerns, with examples of tweets in each cluster:
Change plan
• “@Telstra I don't normally do this but ridiculous service today. Can't change my plan instore.Also urgent service issue 1 full week no call.”
• “@Telstra if i sign up on the $50 Every Day Connect BYO plan, can I upgrade to a similarly priced plan when the next iPhone is released?”
• “@Telstra Hey guys, need to change my plan. Due to constant drop outs in CBD I keep going over my cap. Can you help?”
Tech Support
• “On the phone to @telstra (bigpond) support and I really think I'm being punked. Tech keeps asking the same question. #frustrating”
• “@Telstra Thanks Greg. Tech support didn't really have an answer for me :\ V frustrating!”
Bigpond
• “@telstra what's going on with bigpond tonight? It's terribly slow”
• “@Telstra Hi. I'm trying to access my bigpond music account but keep getting directed to the mog trial page. Is this a glitch?”
• “@Telstra Hi! I have emails stuck in my outbox... getting an error code. Receiving ok. Are there any problems with Bigpond at the moment?”
Call centre
• “Telstra sending its call centre offshore. Don't think i will renew my contract with them now. @telstra”
• “@Telstra Cannot connect to call centres now. Why close more?”
• “@Telstra what will Russ be able to do that your call centre can't, for a telecommunications company your customer service is pretty poor!”
Pay Bill
• “@Telstra - Paying a bill is a nightmare, you guys need to centralise things.. jeez, i have to login 3 times to pay one bill”
• “@Telstra Can't pay my bill if you didn't send me one. This is your mistake and not the first time it's happened.”
• “It is worth to pay the few extra dollars to stay away from @Vodafone_AU and go with @telstra. You get what you pay for.”
Job Cuts
• “Very sad day for the @Telstra staff losing their job, it's never good when people lose their job here at home”
• “Sending good vibes to the @Telstra folk impacted by the recent job cuts. DM me for details for #jobs @auspost”
• “So help me out can you please? You cut jobs move services off shore pay less in wages an somehow your plans are more expensive”
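The three methodology steps can be sketched end to end in plain Python. The slides do not name a clustering algorithm, so k-means with cosine similarity is an assumption here, as are the function names, the choice of the first k entries as starting centroids, and the four toy entries.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def term_frequency_matrix(docs):
    # Step 1: one row per entry, one column per term.
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[count[term] for term in vocab] for count in counts]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def kmeans(rows, k, iterations=10):
    # Step 2: assumption - seed the centroids with the first k entries.
    centroids = [list(rows[i]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(row, centroids[c])) for row in rows]
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def describe(centroid, vocab, top=2):
    # Step 3: describe a cluster by its highest-weighted terms.
    ranked = sorted(zip(vocab, centroid), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, weight in ranked[:top] if weight > 0]

docs = ["change my plan", "pay my bill", "change plan please", "cannot pay bill"]
vocab, rows = term_frequency_matrix(docs)
labels, centroids = kmeans(rows, k=2)
for c, centroid in enumerate(centroids):
    print(c, describe(centroid, vocab))
```

On real tweets the number of clusters and the starting centroids would need tuning; the fixed seeds here only keep the toy run repeatable.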
13
Challenges in Text Analytics
1. Creating term frequency matrix for machine learning
– One row for each entry
– One column for each term/feature describing the entries
• Treat non-alpha as white space
• Case-insensitive
• Term = word
Example matrix for two entries:
Terms (columns): 3, a, bill, can, cap, cbd, centralise, change, constant, drop, due, going, guys, have, help, hey, i, in, is, jeez, keep, my, need, nightmare, one, outs, over, pay, paying, plan, sign, things, times, to, you
Entry 1: 1 3 2 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 1 1 0 0 1 1 1 0 0 1 1 0 1 1 1 3 1
Entry 2: 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 2 1 0 0 1 1 0 0 1 0 0 0 1 1
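A minimal sketch of building such a matrix, following the three rules on the slide (non-alphabetic characters become white space, case is ignored, each word is a term). The function names and the two sample entries are illustrative assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    # Lower-case, then treat every non-alphabetic character as white space.
    return re.sub(r"[^a-z]+", " ", text.lower()).split()

def term_frequency_matrix(entries):
    """Return (terms, matrix): one row per entry, one column per term."""
    counts = [Counter(tokenize(entry)) for entry in entries]
    terms = sorted(set().union(*counts))
    matrix = [[count[term] for term in terms] for count in counts]
    return terms, matrix

terms, matrix = term_frequency_matrix(["Can't pay my bill", "Pay one bill, pay now"])
print(terms)
for row in matrix:
    print(row)
```

Note that under the "non-alpha as white space" rule the apostrophe splits "Can't" into the two terms "can" and "t".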
14
Challenges: 1. Term Frequency Matrix
• Presence of non-informative words
• Different forms of the same words
• Spelling error & typos
• Synonyms
• Homonyms
15
Challenges: 2. Very Large Feature Space
• Many different terms within a single entry
– 104 features with just 50 to 100 entries
– Sparse entries: many zeros in the matrix
• Unsupervised learning
– Hard to form cohesive clusters with sparse entries
• Supervised learning
– Traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature
17
Solutions: 1. Term Frequency Matrix
• Presence of non-informative words
– Create a list of stopwords
– Remove them from consideration
• Different forms of the same words
– Use rule-based stemming to remove suffixes
• Spelling errors & typos
– Use a spell-checker OR
– Use n-grams (character sequences) as features
– 5-grams for 'single bill': 'singl', 'ingle', 'ngle ', 'gle b', 'le bi', 'e bil', ' bill'
• Synonyms
– Use a thesaurus (manual or statistical)
• Homonyms
– Provide context by using word pairs or triplets as features
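Several of these fixes are simple enough to sketch directly. The stopword list, the three suffix rules, and all function names below are hypothetical placeholders; a production system would use a fuller stopword list and a proper stemmer.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "the", "is", "to", "i", "my", "you"}

def remove_stopwords(tokens):
    return [token for token in tokens if token not in STOPWORDS]

def stem(word):
    # Toy rule-based stemmer: strip one common suffix, keeping at least 3 letters.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def char_ngrams(text, n=5):
    # Overlapping character sequences tolerate small spelling errors.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_pairs(tokens):
    # Adjacent word pairs give context that helps separate homonyms.
    return list(zip(tokens, tokens[1:]))

print(remove_stopwords(["pay", "my", "bill"]))
print(stem("paying"), stem("bills"))
print(char_ngrams("single bill"))
print(word_pairs(["pay", "bill", "online"]))
```

`char_ngrams("single bill")` reproduces the seven 5-grams listed on the slide.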
18
Solutions: 2. Very Large Feature Space
• Use feature selection to identify significant features
• Features are of 3 types:
– Very frequent, low information content (e.g., stopwords)
– Infrequent, low information content (occur once or twice in the set)
– Significant middle-frequency features
• Many statistical techniques
– Inverse document frequency weight
– Signal-noise ratio
– Average discrimination value
– …
Unsupervised learning: hard to form cohesive clusters with sparse entries
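As one illustration of feature selection, the sketch below computes inverse document frequency as log(N / document frequency) and keeps only middle-frequency terms. The minimum document frequency of 2 and the function names are assumptions, not values from the slides.

```python
import math
from collections import Counter

def idf_weights(tokenised_entries):
    """Return (document frequency, IDF weight) per term, with IDF = log(N / df)."""
    n = len(tokenised_entries)
    doc_freq = Counter(term for tokens in tokenised_entries for term in set(tokens))
    idf = {term: math.log(n / df) for term, df in doc_freq.items()}
    return doc_freq, idf

def select_features(tokenised_entries, min_doc_freq=2):
    # Drop terms found in every entry (IDF of 0) and terms that are too rare.
    doc_freq, idf = idf_weights(tokenised_entries)
    return sorted(t for t in idf if doc_freq[t] >= min_doc_freq and idf[t] > 0)

entries = [["pay", "my", "bill"], ["change", "my", "plan"], ["my", "bill", "late"]]
doc_freq, idf = idf_weights(entries)
print(idf["my"], idf["bill"], idf["plan"])
print(select_features(entries))
```

IDF alone gives the rarest terms the highest weight, which is why the sketch pairs it with a minimum document frequency to remove the "occurs once or twice" features as well.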
19
Solutions: 2. Very Large Feature Space (Cont’d)
• Use new techniques based on maximal margin separators that can handle a large feature space
• Support Vector Machines
Supervised learning: traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature
20
Solutions: Support Vector Machines
Customers who churned to other providers
Customers who are loyal
Objective: to learn a separator to identify people likely to churn before they do
21
Solutions: Support Vector Machines
What is a good separator?
Maximises margin between two parallel supporting hyperplanes
Separator depends on support vectors
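For reference, the standard hard-margin formulation can be written as follows. The symbols are assumptions introduced here, not taken from the slides: x_i is the feature vector of entry i, y_i is its class label (+1 or -1), and (w, b) define the separator.

```latex
% Separator:               w^\top x + b = 0
% Supporting hyperplanes:  w^\top x + b = +1  and  w^\top x + b = -1
% Margin between them:     2 / \lVert w \rVert
\begin{aligned}
\min_{w,\,b} \quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to} \quad & y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, n
\end{aligned}
```

Minimising the norm of w maximises the margin 2/‖w‖. The support vectors are the entries for which the constraint holds with equality, and only they determine the separator.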
22
Solutions: Support Vector Machines
Why does maximising margins work?
• Small margin means more choice & overfits data
• Large margin means less choice & no overfitting
23
Solutions: 2. Very Large Feature Space (Cont’d)
• Use new techniques based on maximal margin separators that can handle a large feature space
• Support Vector Machines
– Maximises margin between two classes
– Separator depends only on support vectors
– Separator obtained using quadratic programming
• Available in some statistical packages
Supervised learning: traditional statistical learning techniques need at least 10 labelled examples for each uncorrelated feature
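A rough sketch of a linear maximal-margin classifier in plain Python. The slides state that the separator is obtained by quadratic programming; to stay dependency-free, this sketch swaps in sub-gradient descent on the regularised hinge loss instead. The toy customer data, the parameter values, and the function names are all hypothetical.

```python
def train_linear_svm(points, labels, epochs=200, lr=0.1, lam=0.01):
    """Sub-gradient descent on the regularised hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: move the separator.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Safely classified: only shrink w, which widens the margin.
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical toy data: two features per customer; +1 = churned, -1 = loyal.
customers = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
churned = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(customers, churned)
print([predict(w, b, x) for x in customers])
```

With a real term frequency matrix each row would be one customer's text features; a statistical package with a quadratic-programming SVM solver would be the more usual choice in practice.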
24
Wrap-up
• Text analytics creates new opportunities for businesses to understand their customers
– Understanding customer sentiment
– Identifying major customer concerns
– Tracking sentiment/issues over time
• A few challenges in implementing text analytics
– Creating term frequency matrix from text sequence
– Large number of features in matrix
• Many techniques to overcome these challenges
Now is the time to use text analytics to unlock the potential of big data in your business!!