sunz2013 annelies tjetjep
Post on 19-Oct-2014
ANALYTICS TO COMBAT GROWTH IN UNSTRUCTURED TEXT DATA
ANNELIES TJETJEP
BUSINESS SOLUTION MANAGER, ANALYTICS
21ST FEBRUARY 2013
TEXT ANALYTICS EXPLORATION, CATEGORISATION, SENTIMENT ANALYSIS & INSIGHT
• Analytics in a World of Big Data
• What is Text Analytics?
• The SAS® Text Analytics Suite
• Text Mining in Action
• SAS® Social Media Analytics
• Questions?
Where is the cat?
ANALYTICS IN A WORLD OF BIG DATA
BREAKDOWN OF DATA USAGE
Source: Economist Intelligence Unit 2011 Report, Sponsored by SAS, 2011
• We put nearly all of the data that is of real value to good use: 22%
• We probably leverage about half of our valuable data: 53%
• Vast quantities of useful data go untapped: 24%
ANALYTICS IN A WORLD OF BIG DATA
BREAKDOWN OF DATA COLLECTION & ANALYSIS
Based on 450 responses from 109 respondents who report practicing Big Data analytics; 4.1 responses per respondent on average.
Source: TDWI Big Data Analytics Report, 4th Quarter 2011, Philip Russom
• Structured data (tables, records)
• Semi-structured data (XML and similar standards)
• Complex data (hierarchical or legacy sources)
• Event data (messages, usually in real time)
• Unstructured data (human language, audio, video)
• Web logs and click streams
• Social media data (blogs, tweets, social networks)
• Other
• Spatial data (long/lat coordinates, GPS output)
• Machine-generated data (sensors, RFID, devices)
• Scientific data (astronomy, genomes, physics)
WHAT IS TEXT ANALYTICS?
HONG KONG EFFICIENCY UNIT
The 1823 Call Centre of the Hong Kong government's Efficiency Unit acts as a
single point of contact for handling public inquiries and complaints on
behalf of many government departments.
1823 operates round-the-clock, including Sundays and public holidays. Each
year, it answers about 2.65 million calls and 98,000 e-mails, including inquiries,
suggestions and complaints.
“By decoding the ‘messages’ through statistical and
root-cause analyses of complaints data, the
government can better understand the voice of the
people, and help government departments improve
service delivery, make informed decisions and
develop smart strategies. This in turn helps boost
public satisfaction with the government, and build a
quality city.”
- Efficiency Unit’s Assistant Director, W. F. Yuk
1823
HONG KONG EFFICIENCY UNIT
PUBLIC
BUSINESS ISSUE
Develop a Complaint Intelligence System that uncovers the trends, patterns and relationships inherent in the complaints
RESULTS
Hong Kong ICT Awards 2009 Grand Award, Best Public Service Application (Transformations)
“The news hits so fast that you have to be changing
things very quickly. You have to be aware of what
you're writing about and the content that you're
tagging it to. If an indexing mistake happens, you
have to change it very quickly because reputations
are at stake.”
- Keith DeWeese, Director of Information Semantics
Management
TRIBUNE COMPANY
MEDIA AND PUBLISHING
American news organization, reaching more than 80% of US households
BUSINESS ISSUE
Needed to quickly and accurately define and categorize online content relevant to readership
RESULTS
• Better ad targeting and increased ad revenue
THE SAS® TEXT ANALYTICS SUITE
BOTH BRAINS OF THE EQUATION
Natural Language Processing
Taxonomic classification
Entity and concept extraction
Sentiment identification
Contextual and pattern recognition
Linguistically-based classification models
Statistical Analysis
Singular value decomposition
Flat and hierarchical clustering
Word relationship strength profiling
Dominant word pairs identification
Algorithmically-based predictive models
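As a rough illustration of the statistical side, singular value decomposition can be applied to a document-by-term count matrix to find the dominant pattern of co-occurring terms. The sketch below is not how SAS implements it; it approximates only the leading singular value and vectors by power iteration, and the term list and counts are hypothetical.

```python
import math

def leading_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value and its vectors by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Initial guess for the right singular vector (term weights).
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v (project documents onto the current term weights)
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # w = A^T u = A^T A v (one power-iteration step)
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma:
        u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical counts: rows are documents, columns are terms.
terms = ["refund", "delay", "staff", "friendly"]
counts = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
]
sigma, doc_weights, term_weights = leading_singular_triplet(counts)
```

Here the strongest pattern loads on "refund" and "delay" and on the first two documents, which is the kind of grouping that clustering can then build on.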
Content Categorization
Text Mining
Sentiment Analysis
THE SAS® TEXT ANALYTICS SUITE
EXPLORING & DISCOVERING INSIGHTS
SAS® TEXT MINER
1. Input text messages – e.g. Twitter data, reports, email, news, forum messages
2. Parse & explore text data – break down text and explore relationships of key concepts such as persons, places, organizations…
3. Discover topics – cluster documents of similar content and describe them with important key words
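The parsing and key-word steps can be sketched in a few lines. This is only an illustration under simple assumptions, not SAS Text Miner's method: it tokenizes each message, drops a tiny assumed stop-word list, and ranks terms by a basic TF-IDF weight so each document is described by its most distinctive words. The sample messages are hypothetical, and clustering itself is left out.

```python
import math
import re
from collections import Counter

# Tiny assumed stop-word list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "my", "in", "for", "on"}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def top_keywords(documents, k=3):
    """Return the k highest TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: c * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results

messages = [
    "The flight was late and the flight crew was rude",
    "My refund for the late flight is still missing",
    "Great hotel pool and great staff",
]
keywords = top_keywords(messages)
```

Terms shared across messages ("flight", "late") get lower weights than terms unique to one message, which is why they rank below "crew" and "rude" for the first message.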
DISCOVERING PATTERNS FOR MODELLING
SAS® TEXT MINER
1. Input text messages – e.g. Twitter data, reports, email, news, forum messages
Customer data
2. Parse text data and discover topics – break down text into structured data, group messages of similar content
3. Predictive modeling with text data – text data input into models may provide reliable information to predict outcome & behavior
Predict customers that are likely to accept the offer…
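To make the predictive step concrete, here is a minimal sketch of a bag-of-words naive Bayes classifier that predicts whether a customer is likely to accept an offer from message text alone. It is an assumed stand-in, not the modelling approach used in SAS Enterprise Miner; the training messages and labels are hypothetical, and structured customer data is not included.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesText:
    """Multinomial naive Bayes over word counts, with Laplace smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents per label
        self.term_counts = defaultdict(Counter)  # term counts per label
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for term in tokenize(text):
                self.term_counts[label][term] += 1
                self.vocabulary.add(term)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_terms = sum(self.term_counts[label].values())
            for term in tokenize(text):
                if term not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.term_counts[label][term] + 1  # add-one smoothing
                score += math.log(count / (total_terms + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training messages and outcomes.
texts = [
    "interested in the upgrade please send details",
    "love the service would like the new plan",
    "stop calling me not interested",
    "too expensive cancel my account",
]
labels = ["accept", "accept", "decline", "decline"]
model = NaiveBayesText().fit(texts, labels)
prediction = model.predict("please send details about the new plan")
```

In practice the text-derived features would be combined with customer attributes before modelling; this sketch isolates the text part.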
TAXONOMIES
Hotel Brand
Categories and sub-categories    Related terms, phrases, linguistic logic
Service                          Check-in, check-out, staff, concierge, etc.
Accommodations                   Bed, shower, TV, room art, lighting, technology, etc.
Amenities                        Fitness, pools, spa, parking, etc.
Food and Bev                     Pool bar, restaurant, room service, etc.
Experience                       Nightlife, ambience, relaxation, romantic, etc.
Gaming                           Slots, tables, tournaments, etc.
Website                          Navigation, ease of reservations, etc.
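A taxonomy like the hotel example can be pictured as a mapping from each category to its related terms. The sketch below assumes plain phrase matching; real taxonomies also carry linguistic logic (for example Boolean rules), which is omitted here. The function name and sample review are hypothetical.

```python
import re
from collections import Counter

# Categories and related terms taken from the hotel taxonomy slide.
HOTEL_TAXONOMY = {
    "Service": ["check-in", "check out", "staff", "concierge"],
    "Accommodations": ["bed", "shower", "tv", "room art", "lighting", "technology"],
    "Amenities": ["fitness", "pools", "spa", "parking"],
    "Food and Bev": ["pool bar", "restaurant", "room service"],
    "Experience": ["nightlife", "ambience", "relaxation", "romantic"],
    "Gaming": ["slots", "tables", "tournaments"],
    "Website": ["navigation", "ease of reservations"],
}

def categorize(text, taxonomy=HOTEL_TAXONOMY):
    """Count whole-word term matches per category; return only categories with hits."""
    normalized = text.lower()
    hits = Counter()
    for category, terms in taxonomy.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            hits[category] += len(re.findall(pattern, normalized))
    return {category: n for category, n in hits.items() if n > 0}

review = ("The staff at check-in were lovely, but the shower was cold "
          "and the restaurant was closed.")
matches = categorize(review)
```

The hit counts act as a crude relevance score per category, so one review can be tagged with several categories at once.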
CONTENT CATEGORISATION
SAS® ENTERPRISE CONTENT CATEGORIZATION
Topic = Organized Crime
Categorization Taxonomy
1. Input text content – e.g. Twitter data, reports, email, news, forum messages
2. Parse content through categorization taxonomy – match and score messages/documents to relevant categories
3. Output results – e.g. each message/document is now associated with detailed category/subcategories
Results are indexed or fed into existing systems for search & analysis
CONCEPT EXTRACTION
SAS® ENTERPRISE CONTENT CATEGORIZATION
Concept Taxonomy
1. Input text content – e.g. Twitter data, reports, email, news, forum messages
2. Parse content through concepts taxonomy – match messages/documents to extract concepts
3. Output results – e.g. each message/document is now associated with a list of extracted concepts
Results are indexed or fed into existing systems for search & analysis
Concepts
• Locations – kitchen…
• Persons – John…
• Dates – Monday…
• Weapons – knife…
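Concept extraction can be sketched as matching text against one pattern per concept type. The example below is a simplified assumption using fixed word lists; apart from the slide's own examples (kitchen, John, Monday, knife), the list entries are hypothetical, and a production system would use far richer linguistic rules.

```python
import re

# One pattern per concept type; entries beyond the slide's examples are hypothetical.
CONCEPT_PATTERNS = {
    "Locations": r"\b(kitchen|garage|office)\b",
    "Persons": r"\b(John|Mary)\b",
    "Dates": r"\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b",
    "Weapons": r"\b(knife|gun)\b",
}

def extract_concepts(text):
    """Return the concepts found in the text, keyed by concept type."""
    found = {}
    for concept, pattern in CONCEPT_PATTERNS.items():
        matches = re.findall(pattern, text, flags=re.IGNORECASE)
        if matches:
            # Drop duplicates while keeping first-seen order.
            found[concept] = list(dict.fromkeys(matches))
    return found

concepts = extract_concepts("On Monday, John left the knife in the kitchen.")
```

Each document ends up with a list of extracted concepts that can be indexed alongside the original text.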
SENTIMENT EXTRACTION
SAS® SENTIMENT ANALYSIS
Sentiment Taxonomy
1. Input text content – e.g. Twitter data, reports, email, news, forum messages
2. Parse messages through sentiment taxonomy – match and score messages, and their details, for sentiment polarity (e.g. message is 80% positive)
3. Output results – e.g. each message/document and characteristics within the document are now associated with a sentiment polarity score
Example labels: This is positive / This is negative
Results are indexed or fed into existing systems for search & analysis
4. Sentiment reports – results are easily analyzed against time period and/or product features, drillable to see exact message
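A polarity score such as "80% positive" can be illustrated with a simple word-list approach. This is a hedged sketch, not the SAS Sentiment Analysis method: the positive and negative word lists are hypothetical, and the score is just the share of sentiment-bearing words that are positive.

```python
import re

# Hypothetical sentiment word lists, for illustration only.
POSITIVE = {"good", "great", "love", "friendly", "helpful"}
NEGATIVE = {"bad", "slow", "rude", "broken", "terrible"}

def sentiment_polarity(text):
    """Return the fraction of sentiment words that are positive, or None if there are none."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos + neg == 0:
        return None  # no sentiment-bearing words found
    return pos / (pos + neg)

score = sentiment_polarity(
    "Great room, friendly staff, helpful concierge and good food, but slow check-in"
)
```

With four positive words and one negative word, the message scores 0.8, i.e. 80% positive. Scoring per product feature would apply the same idea to the words near each feature mention.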
Taxonomies
WHOLE BRAIN PROCESS INTERACTIONS
A CALL CENTRE EXAMPLE
Initial taxonomy
Exploration of linkages
Topic categorisation
Predictive modelling
Caller1234:
i called them with a little issue that i
had on my car repair, and the original
representative blind transferred me
over to the second representative that
i spoke to, so when i got to the second
rep (John?), he had no idea who i am,
what my account was, what were the
reasons that i was calling.
i had to re-explain myself completely.
Concepts:
Call reason: car repair
Unhappy reasons: blind transfer; re-explain
Other related staff: John
Classification
Sentiment
TEXT MINING IN ACTION
TEXT MINING IN SAS® ENTERPRISE MINER
TEXT MINING PARSING
TEXT MINING SYNONYMS & CONCEPT LINKING
TEXT MINING CLUSTERING
TEXT MINING PREDICTIVE MODELLING
SOCIAL MEDIA ANALYTICS
A QUICK LOOK
Social Media is
everywhere – it’s not
just Facebook and
Twitter.
• Your customers are there
talking about your brand.
• What are customers saying
about you and what impact
could that have on your
business?
Source: The Conversation Prism, Brian Solis and JESS3
POWER SHIFT
THE EMPOWERED CONSUMER
COMPANIES
CONSUMERS
SOLUTION FRAMEWORK
Data Mining
Correlation & Forecasting
Text Mining
Natural Language Processing
Taxonomies
Influence & Engagement
Customizable Sentiment Analysis
Text Clusters & Segments
Collect
Clean
Integrate
Organize
Sample Online Sources
Classify & Segment
Mine & Forecast
Web Crawling
Data Stores
Blog Data
Web Data
Online Reviews
Media Data
Call logs
Survey Data
Listen, Engage, & Leverage
SAS Media Portal
SAS Conversation Center
SAS Media Workbench
iPad & Android apps
TEXT ANALYTICS
ANALYTICS TO COMBAT GROWTH IN UNSTRUCTURED TEXT DATA
• Data is “BIG” and growing
• Most data is in unstructured or semi-structured format
• Need for smarter ways of mining data: automation & analytics
• Need for whole-brained analysis of textual information
• SAS provides an end-to-end text analytics suite
• Power is now in the hands of the consumer
QUESTIONS?
THANK YOU