Fun with Text - Managing Text Analytics
TRANSCRIPT
Cohan Sujay Carlos, CEO, Aiaioo Labs
Fun with Text: Managing Text Analytics
What I am going to talk about.
Text Analytics
1. Examine 3 kinds of opportunities
2. Discuss 3 text analytics problems
3. Touch upon 3 things to watch out for and 3 things to embrace
What if we can master “text”? What do we get from it?
There are opportunities in every vertical:
1. Aerospace / Defense / Automotive: Filing of various routine documents / Technical specification standardization / Competitive intelligence and customer feedback management
2. Healthcare / Life sciences: Reporting / Storing relevant patents and publications / Analysis of research and competitive intelligence
3. Legal and Government: Legal and administrative filings / Case document and administrative record management / Analysis of legal and administrative documents (land records, case files)
What if we can master “text”? What do we get from it?
Do you observe a pattern?
In every vertical …
Output Text / Store and Transform Text / Ingest and Analyze Text
How do we unlock the value in “text”?
Output Text / Store and Transform Text / Ingest and Analyze Text
Natural Language Generation / Natural Language Understanding
Natural Language Processing (aka Text Analytics)
Use Case 1: Customer Service
Let’s say you have some text … and a database or spreadsheet with columns …
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
… and you have to fill in the database fields from the information in the text.
Columns: Reporter / Location (of Reporter) / Product
Use Case 1: Land Records
Let’s say you have some text … and a database or spreadsheet with columns …
“Property K45L234 (lot 23-24) in Wake County of 3000 sq ft was sold to James Fischer on 3-30-1997 …”
… and you have to fill in the database fields from the information in the text.
Columns: Title Number / Lot / County
Use Case 1: M&A Transactions
Let’s say you have some text … and a database or spreadsheet with columns …
“Acme Financials, a subsidiary of Lehman Sisters, was acquired by John Doe Corp on 5/26/2001.”
… and you have to fill in the database fields from the information in the text.
Columns: Acquirer / Acquired / Date
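A rule-based extractor for this single sentence pattern can be written as one regular expression. The sketch below makes several assumptions. Every record is assumed to read “X, a subsidiary of Y, was acquired by Z on M/D/YYYY”, and dates are assumed to be in US month/day order. The pattern and the name `extract_acquisition` are hypothetical. Real filings vary far more than this.

```python
import re
from datetime import date

# Hypothetical pattern for sentences shaped like:
# "<Acquired>, a subsidiary of <Parent>, was acquired by <Acquirer> on M/D/YYYY."
ACQUISITION = re.compile(
    r"(?P<acquired>[A-Z][\w&. ]*?), a subsidiary of (?P<parent>[A-Z][\w&. ]*?), "
    r"was acquired by (?P<acquirer>[A-Z][\w&. ]*?) on "
    r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})"
)

def extract_acquisition(text):
    """Return one row {Acquirer, Acquired, Date}, or None if the pattern does not match."""
    m = ACQUISITION.search(text)
    if m is None:
        return None
    # The parent company is matched so the pattern can skip over it,
    # but it is not stored because the table has no Parent column.
    return {
        "Acquirer": m["acquirer"],
        "Acquired": m["acquired"],
        "Date": date(int(m["year"]), int(m["month"]), int(m["day"])),
    }

print(extract_acquisition(
    "Acme Financials, a subsidiary of Lehman Sisters, was acquired by John Doe Corp on 5/26/2001."
))
```

A single pattern like this is precise on text that matches it exactly but misses any rephrasing, such as “John Doe Corp bought Acme Financials”. That gap is the recall problem that comes up later when rules are compared with machine learning.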
Use Case 1: Customer Service [Information Extraction]
Let’s say you have some text … and a database or spreadsheet with columns …
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Entities are pieces of text that could go into the fields in the database.
Identifying entities and the relations between them

| Reporter | Location | Product |
|---|---|---|
| John Chambers | Springfield, MA | Ford Ranger |

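Use Case 1: Customer Service [Information Extraction]
Relations tell you about the connections between entities.
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Entities are pieces of text that could go into the fields in the database.
Relations connect the entities that belong in a row. Boston, MA is also a location entity, but it is where the vehicle was purchased, not where the reporter is from. The relation is what tells you that Springfield, MA belongs in the Location column.
Identifying entities and the relations between them

| Reporter | Location | Product |
|---|---|---|
| John Chambers | Springfield, MA | Ford Ranger |

Relation: Springfield, MA is the Location of Reporter John Chambers.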
Use Case 1: Customer Service [Information Extraction]
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Information extraction converts unstructured information into structured information.
Identifying entities and the relations between them

| Reporter | Location | Product |
|---|---|---|
| John Chambers | Springfield, MA | Ford Ranger |

Use Case 1: Customer Service [Information Extraction]
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Information extraction can improve efficiency in processes where humans read text and copy fields into databases.
Identifying entities and the relations between them

| Reporter | Location | Product |
|---|---|---|
| John Chambers | Springfield, MA | Ford Ranger |

Use Case 1: Customer Service [Information Extraction]
How can text analytics methods be used to automate entity and relation extraction?
Rule-based methods / Machine learning methods
Aiaioo Labs aiaioo.com
Use Case 1: Customer Service [Information Extraction]
Rule-based frameworks for entity and relation extraction? For example, GATE’s ANNIE:
http://services.gate.ac.uk/annie/
Use Case 1: Customer Service [Information Extraction]
How does GATE/ANNIE identify entities and the relations?
It uses lists of first names and last names of persons, and names of places … and matches them in the text …
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
First names: “Jack”, “Jill”, “John”
Last names: “Chambers”, “Miller”, “Farnsworth”
Places: “Springfield”, “Boston”, “Cambridge”
States: “MA”, “CA”, “MD”
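The list-matching idea can be sketched in a few lines of Python. This is not how GATE is invoked: ANNIE runs its gazetteer lists and pattern-matching grammars inside the GATE framework. This is only a hypothetical illustration that uses the four short lists above. It adds two assumed combination rules: a first name followed by a last name is a Person, and a place followed by a comma and a state is a Location.

```python
import re

# Toy gazetteers copied from the slide's lists.
FIRST_NAMES = {"Jack", "Jill", "John"}
LAST_NAMES = {"Chambers", "Miller", "Farnsworth"}
PLACES = {"Springfield", "Boston", "Cambridge"}
STATES = {"MA", "CA", "MD"}

def find_entities(text):
    """Return (label, text) pairs found by list lookups plus two assumed rules."""
    tokens = re.findall(r"\w+|[^\w\s]", text)  # words and single punctuation marks
    entities = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Assumed rule: FirstName LastName -> Person
        if tok in FIRST_NAMES and nxt in LAST_NAMES:
            entities.append(("Person", f"{tok} {nxt}"))
            i += 2
            continue
        # Assumed rule: Place , State -> Location
        if tok in PLACES and nxt == "," and i + 2 < len(tokens) and tokens[i + 2] in STATES:
            entities.append(("Location", f"{tok}, {tokens[i + 2]}"))
            i += 3
            continue
        i += 1
    return entities

text = ("John Chambers of Springfield, MA reported a problem with the clutch "
        "on his Ford Ranger purchased in Boston, MA in 2005.")
print(find_entities(text))
# [('Person', 'John Chambers'), ('Location', 'Springfield, MA'), ('Location', 'Boston, MA')]
```

List matching alone finds both locations and cannot tell which one is the reporter’s. It also misses "Ford Ranger" entirely, because there is no product list. Relation rules (for example, "Person of Location") would be needed on top of this.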
Use Case 1: Customer Service [Information Extraction]
Machine learning frameworks for entity and relation extraction? For example, Apache OpenNLP:
https://opennlp.apache.org/
Use Case 1: Customer Service [Information Extraction]
Machine learning frameworks need training data.
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html
Use Case 1: Customer Service [Information Extraction]
How does OpenNLP identify entities?
From examples such as:
`<START:reporter> John Archer <END> of <START:location> Maryland <END> reported a problem with his <START:product> Figo <END> .`
`<START:reporter> Vince Chambers <END> of <START:location> Denver , CO <END> had trouble with his <START:product> Focus <END> .`
(OpenNLP’s name finder training format expects whitespace-separated tokens, so the markers, the names and the punctuation are all separated by spaces.)
It learns to recognize:
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
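To make clear what each annotated training line encodes, here is a small hypothetical Python helper. It is not part of OpenNLP, which reads this format itself when it trains its name finder. The helper turns one annotated line into plain text plus a list of labeled spans, and it accepts the markers with or without surrounding spaces.

```python
import re

# One annotated span: <START:label> entity text <END>, spaces around the text optional.
SPAN = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>")

def parse_annotated(line):
    """Return (plain_text, [(label, entity_text), ...]) for one training line."""
    spans = [(m.group(1), m.group(2)) for m in SPAN.finditer(line)]
    plain = SPAN.sub(lambda m: m.group(2), line)  # drop the markers, keep the names
    return plain, spans

plain, spans = parse_annotated(
    "<START:reporter> John Archer <END> of <START:location> Maryland <END> "
    "reported a problem with his <START:product> Figo <END> ."
)
print(plain)  # John Archer of Maryland reported a problem with his Figo .
print(spans)  # [('reporter', 'John Archer'), ('location', 'Maryland'), ('product', 'Figo')]
```

The labels (`reporter`, `location`, `product`) are whatever the annotator chose. The model learns to predict exactly those types, which is why the training data has to match the database columns you want to fill.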
Use Case 1: Customer Service [Information Extraction]
How do you choose between text analytics methods for entity and relation extraction?

| | Rule-based methods | Machine learning methods |
|---|---|---|
| Time to a reasonably performing model | 3 months | 1+ years |
| Precision | Typically higher | Typically lower |
| Flexibility | Typically less | Typically more |
| Recall | Typically lower | Typically higher, with better overall performance |

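Precision is the fraction of extracted entities that are correct. Recall is the fraction of correct entities that the system actually extracted. The minimal sketch below compares a hypothetical system output against the hand-labeled answers for the customer-service sentence. The system output is made up for illustration.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted items against the gold (correct) items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Convention used here: 0.0 when there is nothing to divide by.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Gold entities from the customer-service example; the predictions are hypothetical.
gold = {("Reporter", "John Chambers"), ("Location", "Springfield, MA"), ("Product", "Ford Ranger")}
predicted = {("Reporter", "John Chambers"), ("Location", "Boston, MA")}
print(precision_recall(predicted, gold))  # (0.5, 0.3333333333333333)
```

In this example, one of two extractions is right, which gives a precision of 0.5. One of the three correct entities was found, which gives a recall of about 0.33.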
Can you classify these door heights as Short or Tall?
5’11”, 5’8”, 5’8”, 5’11”, 6’2”, 6’6”, 5’2”, 6’8”, 6’9”, 6’10”
In analytics, an analyst comes up with a rule:
If door_height < 6’ then Short else Tall
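Written as code, the analyst’s rule is a single hand-chosen comparison. The sketch below assumes that heights are given as (feet, inches) pairs. The function name is hypothetical.

```python
def classify_door(feet, inches):
    """The analyst's hand-written rule: under 6 feet (72 inches) is Short, otherwise Tall."""
    height_in_inches = feet * 12 + inches
    return "Short" if height_in_inches < 72 else "Tall"

heights = [(5, 11), (5, 8), (6, 2), (6, 6), (5, 2), (6, 8), (6, 9), (6, 10)]
print([classify_door(f, i) for f, i in heights])
# ['Short', 'Short', 'Tall', 'Tall', 'Short', 'Tall', 'Tall', 'Tall']
```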
In machine learning, the computer comes up with a rule from examples.
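The slides do not name a learning algorithm. One of the simplest learners is a decision stump, which is a single learned threshold. The sketch below assumes hypothetical labels: someone has marked each door height from the slide as Short or Tall. From those examples, the computer picks the cut-off that makes the fewest mistakes.

```python
def to_inches(feet, inches):
    return feet * 12 + inches

def learn_threshold(examples):
    """Pick the cut-off t with the fewest errors for the rule 'Short if height < t'.

    examples: list of (height_in_inches, label), label "Short" or "Tall".
    Candidate cut-offs are midpoints between neighbouring distinct heights,
    so at least two distinct heights are needed.
    """
    heights = sorted({h for h, _ in examples})
    candidates = [(a + b) / 2 for a, b in zip(heights, heights[1:])]

    def errors(t):
        return sum((h < t) != (label == "Short") for h, label in examples)

    return min(candidates, key=errors)

# Hypothetical labels for the door heights on the slide.
examples = [
    (to_inches(5, 11), "Short"), (to_inches(5, 8), "Short"), (to_inches(5, 2), "Short"),
    (to_inches(6, 2), "Tall"), (to_inches(6, 6), "Tall"), (to_inches(6, 8), "Tall"),
    (to_inches(6, 9), "Tall"), (to_inches(6, 10), "Tall"),
]
print(learn_threshold(examples))  # 72.5
```

The learned cut-off of 72.5 inches is close to the analyst’s 72. Any value between 5’11” and 6’2” separates these examples equally well, so the exact number depends on how the learner breaks ties.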
How do we unlock the value in “text”?
The first use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Information Extraction: identifying entities and the relations between them
How do we unlock the value in “text”?
The second use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Text Categorization: labeling text with one or more category labels
Use Case 2: Organizing Text for Storage
Let’s say you have some text … and you want to mark it as one of …
“John Chambers of Springfield, MA reported a problem with the clutch on his Ford Ranger purchased in Boston, MA in 2005.”
Report / Inquiry
Use Case 2: Organizing Text [Text Categorization]
Start by collecting some samples of documents of each of your categories:

| Report | Inquiry |
|---|---|
| I have a problem | Where can I buy a |
| This complaint is about | Do you sell furniture |

Use Case 2: Organizing Text [Text Categorization]
Train a classifier with them.
Use Case 2: Organizing Text [Text Categorization]
Start by collecting some samples of documents of each of your categories:

| Politics | Sports |
|---|---|
| The United Nations | Manchester United |
| The United States and | Manchester and Barca |

Use Case 2: Organizing Text [Text Categorization]
Train a classifier with them.
Use Case 2: Organizing Text [Text Categorization]
Run the classifier on a new piece of text, such as “Nations and States”.
The classifier will return a label: Politics.
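The slides do not say which classifier is used. As one common choice, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, trained on the four sample phrases. The tokenizer and the function names `train` and `classify` are assumptions made for the illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(samples):
    """samples: dict mapping label -> list of texts. Returns (priors, word counts, vocabulary)."""
    priors, counts, vocab = {}, {}, set()
    total_docs = sum(len(texts) for texts in samples.values())
    for label, texts in samples.items():
        priors[label] = len(texts) / total_docs
        counts[label] = Counter(tok for t in texts for tok in tokenize(t))
        vocab.update(counts[label])
    return priors, counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    priors, counts, vocab = model
    scores = {}
    for label in priors:
        total = sum(counts[label].values())
        score = math.log(priors[label])
        for tok in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out a label.
            score += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train({
    "Politics": ["The United Nations", "The United States and"],
    "Sports": ["Manchester United", "Manchester and Barca"],
})
print(classify("Nations and States", model))  # Politics
```

"Nations" and "States" each appear only in the Politics samples, and "and" appears once in each category. Those two words tip the score toward Politics.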
Use Case 2: Organizing Text [Text Categorization]
How can text analytics methods be used to automate organization/categorization?
Rule-based methods / Machine learning methods
Use Case 2: Organizing Text [Text Categorization]
But rule-based methods work for classification too.
Rule-based text categorization is often used in social media sentiment classification.
Use Case 2: Organizing Text [Text Categorization]
How do we use rules to identify sentiment?
We use lists of negative and positive words (usually adjectives), such as those in the AFINN word list, and match them in the text.
“I am sad that Steve Jobs died.”
Negative: “sad”, “bad”, “evil”, “distraught”, “dead”, “died”
Positive: “thrilled”, “excited”, “amazed”, “happy”, “love”, “joy”
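Word-list matching can be sketched by counting positive and negative hits. The two sets below are the slide’s examples. Each match simply counts as +1 or -1. These are not the AFINN values: AFINN assigns each word an integer score rather than a plain positive or negative tag. The function name is hypothetical.

```python
import re

# Illustrative lists from the slide; each match counts +1 or -1 (not AFINN's scores).
NEGATIVE = {"sad", "bad", "evil", "distraught", "dead", "died"}
POSITIVE = {"thrilled", "excited", "amazed", "happy", "love", "joy"}

def sentiment(text):
    """Return (label, score), where score = #positive words - #negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

print(sentiment("I am sad that Steve Jobs died."))  # ('negative', -2)
```

The word-list rule labels the example sentence negative, because it matches both "sad" and "died". It cannot tell what the writer’s opinion of Steve Jobs actually is.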
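Use Case 2: Organizing Text [Text Categorization]
Can we use entity and relation extraction to do better?
“I am sad that [Steve Jobs died].”
The negative word ‘sad’ is related to the negative event ‘Steve Jobs died’.
Analysis: This person holds a positive opinion of Steve Jobs. Being sad that something bad happened to him implies liking him, even though a word-list approach would label the sentence negative.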
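One way to sketch this reasoning is to multiply polarities. A negative feeling about a negative event that affects someone implies a positive opinion of that person, and a positive feeling about that event implies a negative one. The sketch assumes that relation extraction has already linked ‘sad’ to the event ‘Steve Jobs died’. The sign rule and both polarity tables are hypothetical simplifications, not the author’s method.

```python
# Hypothetical polarity tables for this example only.
EMOTION_POLARITY = {"sad": -1, "distraught": -1, "happy": +1, "thrilled": +1}
EVENT_POLARITY = {"died": -1, "recovered": +1}

def opinion_of_target(emotion, event):
    """Sign rule: same-sign feeling and event -> positive opinion of the event's subject."""
    product = EMOTION_POLARITY[emotion] * EVENT_POLARITY[event]
    return "positive" if product > 0 else "negative"

print(opinion_of_target("sad", "died"))    # positive
print(opinion_of_target("happy", "died"))  # negative
```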
Use Case 2: Organizing Text [Text Categorization]
How do you choose between text analytics methods for text categorization?

| | Rule-based methods | Machine learning methods |
|---|---|---|
| Precision | Typically higher | Typically lower |
| Flexibility | Typically less | Typically more |
| Recall | Typically lower | Typically higher, with better overall performance |

How do we unlock the value in “text”?
The third use case …
Output Text / Store and Transform Text / Ingest and Analyze Text
Question Answering: generating a response to an inquiry
Use Case 3: Answering Questions
Let’s say you get a question … and you want the answer to be one of …
“Do you ship your cars to Boston, MA?”
Yes / No
Use Case 3: Answering Questions
First you classify the question into one of 3 types:
- Yes/No questions: “Do you ship your cars to Boston, MA?”
- Factoid questions: “Who is the CEO of Apple?”
- Non-factoid questions: “Why is the sky blue?”
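A crude rule-based version of this first step looks only at the opening words of the question. The word lists below are hypothetical, and real systems often train a classifier for this step instead. "What" questions in particular can be non-factoid ("What causes rain?"), and these rules do not handle that case.

```python
# Hypothetical opening-word lists.
YES_NO_STARTERS = {"do", "does", "did", "is", "are", "was", "were", "can", "will", "has", "have"}
FACTOID_STARTERS = {"who", "what", "when", "where", "which"}
NON_FACTOID_STARTERS = {"why", "how"}

def question_type(question):
    """Classify a question as 'yes/no', 'factoid', 'non-factoid' or 'unknown'."""
    words = question.lower().rstrip("?").split()
    if not words:
        return "unknown"
    first, first_two = words[0], " ".join(words[:2])
    # "How many" / "How much" ask for a number, so treat them as factoid.
    if first in FACTOID_STARTERS or first_two in {"how many", "how much"}:
        return "factoid"
    if first in NON_FACTOID_STARTERS:
        return "non-factoid"
    if first in YES_NO_STARTERS:
        return "yes/no"
    return "unknown"

for q in ["Do you ship your cars to Boston, MA?", "Who is the CEO of Apple?", "Why is the sky blue?"]:
    print(q, "->", question_type(q))
```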
Use Case 3: Answering Questions
Look for answers in databases that you created using entity/relationship extraction:
“Do you ship your cars to Boston, MA?”
“Who is the CEO of Apple?”
“Why is the sky blue?”

| Product | Ships To |
|---|---|
| Cars | USA |

| CEO | Firm |
|---|---|
| Tim Cook | Apple |

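Once the tables exist, answering the first two questions becomes a lookup. The sketch below assumes that each question has already been parsed into its arguments (product "Cars", destination "Boston, MA", firm "Apple"), and that parsing step is not shown. The `COUNTRY_OF` table and the function names are hypothetical. Something like that table is needed to know that Boston, MA is in the USA. The non-factoid question has no table to look up, so it is not handled here.

```python
# Tables as entity/relation extraction might have produced them (rows from the slide).
SHIPS_TO = {"Cars": "USA"}        # Product -> region it ships to
CEO_OF = {"Apple": "Tim Cook"}    # Firm -> CEO
# Hypothetical extra table: which country each place is in.
COUNTRY_OF = {"Boston, MA": "USA", "Springfield, MA": "USA"}

def do_you_ship(product, destination):
    """Yes/no question: Yes if the destination lies in the product's shipping region."""
    region = SHIPS_TO.get(product)
    if region is not None and COUNTRY_OF.get(destination) == region:
        return "Yes"
    return "No"  # a real system might answer "I don't know" when data is missing

def who_is_ceo(firm):
    """Factoid question: return the stored answer, or None if the firm is unknown."""
    return CEO_OF.get(firm)

print(do_you_ship("Cars", "Boston, MA"))  # Yes
print(who_is_ceo("Apple"))                # Tim Cook
```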
To watch out for:
Text Analytics Traps
1. Testing on Training Data
2. Using US Training Data for India
3. Treating all Data Sources as One
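The first trap is easy to demonstrate. A hypothetical “classifier” that simply memorizes its training examples scores perfectly when it is tested on those same examples. That score says nothing about how it handles new text. Only a held-out test set exposes the problem. The examples below are made up in the style of the Report/Inquiry samples.

```python
def train_memorizer(examples):
    """Remember each training text's label exactly; no generalization at all."""
    return dict(examples)

def accuracy(model, examples, default="Inquiry"):
    """Fraction of examples labeled correctly; unseen texts get a default label."""
    correct = sum(model.get(text, default) == label for text, label in examples)
    return correct / len(examples)

train_set = [("I have a problem with my clutch", "Report"),
             ("Where can I buy a Ford Ranger", "Inquiry"),
             ("This complaint is about my brakes", "Report")]
test_set = [("I have a problem with my brakes", "Report"),
            ("Do you sell furniture", "Inquiry")]

model = train_memorizer(train_set)
print(accuracy(model, train_set))  # 1.0
print(accuracy(model, test_set))   # 0.5
```

The score on the training data looks perfect, while the score on the held-out data shows the classifier is only guessing.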
To embrace:
Text Analytics Tricks
1. UI Compensation for AI Inaccuracy
2. Raising Precision at the Cost of Recall
3. Domain-Specific Rules
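The second trick can be sketched by keeping only predictions whose confidence clears a threshold. The confidence scores and correctness flags below are hypothetical. The example assumes the text contains 5 correct entities in total. Raising the threshold discards uncertain predictions, which here raises precision and lowers recall.

```python
def precision_recall_at(threshold, scored, total_relevant):
    """scored: list of (confidence, is_correct) for every prediction the model made."""
    kept = [correct for conf, correct in scored if conf >= threshold]
    true_positives = sum(kept)
    # Convention used here: precision is 1.0 when nothing is kept.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_relevant
    return precision, recall

# Hypothetical extractor output: (confidence, was the extracted entity correct?)
scored = [(0.95, True), (0.9, True), (0.8, False), (0.7, True), (0.6, False), (0.4, True)]
for t in (0.5, 0.85):
    print(t, precision_recall_at(t, scored, total_relevant=5))
# prints "0.5 (0.6, 0.6)" and then "0.85 (1.0, 0.4)"
```

In a user interface, the high-threshold answers can be shown automatically, and the rest can be routed to a human. This is one way the first and second tricks work together.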
About Aiaioo Labs
AI Research Lab
1. http://aiaioo.com
2. http://aiaioo.com/publications
3. http://aiaioo.wordpress.com
THANK YOU