Sifting Social Data: Word Sense Disambiguation Using Machine Learning
Slides prepared for “Social Media and the Defense Sector”: http://www.smi-online.co.uk/defence/uk/social-media-within-the-military-and-defence-sector
![Page 1: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/1.jpg)
Dr. Stuart Shulman
Founder & CEO, Texifter
“…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971
![Page 2: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/2.jpg)
Texifter is pronounced “tech-sifter”; the metaphor is of a sifter
![Page 3: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/3.jpg)
Text Classification
A 2,500-year-old problem
Plato argued it would be frustrating, and it still is…
![Page 4: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/4.jpg)
Grimmer & Stewart, “Text as Data,” Political Analysis (2013)
Volume is a problem for scholars
Coders are expensive
Groups struggle to accurately label text at scale
Validation of both humans and machines is “essential”
Some models are easier to validate than others
All models are wrong
Automated models enhance/amplify, but don’t replace humans
There is no one right way to do this
“Validate, validate, validate”
“What should be avoided, then, is the blind use of any method without a validation step.”
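Grimmer and Stewart's call to validate can be made concrete by scoring machine-assigned codes against a sample that humans have already coded. The sketch below is a minimal illustration, assuming one binary code stored as parallel lists of label strings; the function name `validate` and the label values are hypothetical and not taken from the talk or from DiscoverText.

```python
from typing import Dict, Sequence


def validate(human: Sequence[str], machine: Sequence[str], positive: str) -> Dict[str, float]:
    """Score machine labels against human ("gold") labels for one code."""
    if len(human) != len(machine):
        raise ValueError("label sequences must be the same length")
    pairs = list(zip(human, machine))
    tp = sum(1 for h, m in pairs if h == positive and m == positive)  # both say positive
    fp = sum(1 for h, m in pairs if h != positive and m == positive)  # machine over-labels
    fn = sum(1 for h, m in pairs if h == positive and m != positive)  # machine misses
    correct = sum(1 for h, m in pairs if h == m)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for four items coded by a human and by a classifier.
human = ["relevant", "relevant", "irrelevant", "irrelevant"]
machine = ["relevant", "irrelevant", "irrelevant", "relevant"]
print(validate(human, machine, positive="relevant"))
```

Reporting precision and recall alongside accuracy matters for relevance coding, where the irrelevant class often dominates and raw accuracy can look good even when most relevant items are missed.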
![Page 5: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/5.jpg)
Our free, open-source, web-based text analytics toolkit
![Page 6: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/6.jpg)
The original software kernel: tools for measurement
![Page 7: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/7.jpg)
A mission to avoid tennis elbow
Items load on screen and the coder applies a code with a single keystroke
![Page 8: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/8.jpg)
Keystroke human coding: alone or in groups
(Screenshot panels: Codes, Metadata, Data)
Human coding can be distributed to individuals, groups & crowds
![Page 9: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/9.jpg)
Computer science & NSF influences: measure everything
How fast?
How reliable?
How accurate?
Stuart Shulman – Texifter
![Page 10: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/10.jpg)
Inter-rater reliability is one critical measurement
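The slide does not name a reliability statistic. Cohen's kappa is one common choice for two coders labeling the same items, because it discounts the agreement expected by chance. A minimal sketch, assuming each coder's labels arrive as an equal-length list in the same item order (the function name and sample labels are hypothetical):

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two coders who labeled the same items."""
    n = len(coder_a)
    if n == 0 or n != len(coder_b):
        raise ValueError("need two non-empty label sequences of equal length")
    # Observed agreement: share of items where both coders chose the same code.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if each coder labeled at random with their own code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both coders used one identical code for every item; kappa is undefined, report 1.0.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(cohens_kappa(a, b))
```

Here the coders agree on 3 of 4 items (0.75), but the agreement expected by chance from their code frequencies is 0.5, so kappa comes out at 0.5 rather than 0.75.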
![Page 11: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/11.jpg)
![Page 12: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/12.jpg)
Plugged in to APIs & Government
![Page 13: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/13.jpg)
Import data directly via APIs or from your desktop
![Page 14: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/14.jpg)
Full historical Twitter access
![Page 15: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/15.jpg)
PowerTrack operators for more precise queries
![Page 16: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/16.jpg)
Store social data with survey responses and other data
![Page 17: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/17.jpg)
Private, 3rd party & free (rate limited) social data sources
![Page 18: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/18.jpg)
Unlimited “fire hose” premium data sources
![Page 19: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/19.jpg)
The Five Pillars of Text Analytics
Search
Filtering
De-duplication and Clustering
Human Coding
Machine-Learning
![Page 20: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/20.jpg)
Pillar #1: Search
![Page 21: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/21.jpg)
Pillar #1: Defined multi-term search
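The slides do not specify how DiscoverText parses multi-term queries. As a rough illustration of the idea, the sketch below matches whole-word terms case-insensitively and combines them with OR (`mode="any"`) or AND (`mode="all"`); the function name, parameters, and sample posts are hypothetical.

```python
import re
from typing import List, Sequence


def multi_term_search(docs: Sequence[str], terms: Sequence[str], mode: str = "any") -> List[str]:
    """Return docs matching any (OR) or all (AND) of the given whole-word terms."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    # One case-insensitive, word-bounded pattern per term; re.escape keeps terms literal.
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    combine = any if mode == "any" else all
    return [d for d in docs if combine(p.search(d) for p in patterns)]


docs = [
    "Drone strike reported near the border",
    "New drone delivery service launches",
    "Rail workers announce strike action",
]
print(multi_term_search(docs, ["drone", "strike"], mode="all"))
print(multi_term_search(docs, ["drone", "strike"], mode="any"))
```

The AND query keeps only the first post; the OR query keeps all three, including posts where "drone" or "strike" carries an unrelated sense. That residue is what the later filtering, coding, and disambiguation steps have to clean up.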
![Page 22: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/22.jpg)
Pillar #2: Filters
![Page 23: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/23.jpg)
Pillar #2: Filters
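Filters narrow a collection by metadata rather than by text. A minimal sketch, assuming each item carries hypothetical `lang`, `posted`, and `is_retweet` fields and that filters combine with AND; the sample items are made up:

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Sequence


@dataclass
class Item:
    # Hypothetical metadata fields; real imports carry whatever the source provides.
    text: str
    lang: str
    posted: date
    is_retweet: bool


Filter = Callable[[Item], bool]


def apply_filters(items: Sequence[Item], filters: Sequence[Filter]) -> List[Item]:
    """Keep only items that pass every filter (filters combine with AND)."""
    return [item for item in items if all(f(item) for f in filters)]


items = [
    Item("Exercise begins today", "en", date(2014, 3, 1), False),
    Item("RT Exercise begins today", "en", date(2014, 3, 1), True),
    Item("L'exercice commence", "fr", date(2014, 3, 2), False),
    Item("Exercise ends", "en", date(2014, 4, 15), False),
]
filters = [
    lambda it: it.lang == "en",                                    # language
    lambda it: not it.is_retweet,                                  # drop retweets
    lambda it: date(2014, 3, 1) <= it.posted <= date(2014, 3, 31), # date window
]
print([it.text for it in apply_filters(items, filters)])
```

Keeping each filter as a small predicate makes it easy to add, remove, or reorder conditions without touching the filtering loop.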
![Page 24: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/24.jpg)
Pillar #3: Deduplication & clustering
![Page 25: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/25.jpg)
Pillar #3: Deduplication & clustering
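The slides do not describe DiscoverText's deduplication algorithm. One common stand-in for near-duplicate detection in short social posts is Jaccard similarity over word shingles. The sketch below normalizes each post, then greedily groups posts whose shingle sets overlap above a threshold. The threshold of 0.6, the shingle size of 2, all function names, and the sample posts are illustrative assumptions.

```python
import re
from typing import List, Set


def normalize(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def shingles(text: str, k: int = 2) -> Set[str]:
    """Set of k-word shingles (overlapping word n-grams) of the normalized text."""
    words = normalize(text).split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(texts: List[str], threshold: float = 0.6) -> List[List[int]]:
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters: List[List[int]] = []
    seeds: List[Set[str]] = []  # shingles of each cluster's first member
    for i, text in enumerate(texts):
        s = shingles(text)
        for members, seed in zip(clusters, seeds):
            if jaccard(s, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
            seeds.append(s)
    return clusters


posts = [
    "RT Troops arrive in the region http://t.co/abc",
    "Troops arrive in the region!",
    "Budget talks stall in parliament",
]
print(cluster_near_duplicates(posts))
```

Exact duplicates (after normalization) score 1.0 and always cluster together. The greedy pass compares each post only against each cluster's first member, which keeps it simple but means results can depend on input order.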
![Page 26: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/26.jpg)
Pillar #4: Human coding (a.k.a. labeling or tagging)
![Page 27: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/27.jpg)
Pillar #4: Human coding
![Page 28: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/28.jpg)
Pillar #4: Human coding (adjudication)
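When several coders label the same item, disagreements have to be resolved. The slides do not describe the adjudication rules. As a hypothetical sketch, the function below accepts a code automatically only when it holds a strict, untied majority above `min_share`, and routes every other item to a human adjudicator; the names and sample codes are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def adjudicate(codes_by_item: Dict[str, List[str]],
               min_share: float = 0.5) -> Tuple[Dict[str, str], List[str]]:
    """Accept a clear majority code; route ties and weak majorities to a human adjudicator."""
    resolved: Dict[str, str] = {}
    needs_review: List[str] = []
    for item_id, codes in codes_by_item.items():
        if not codes:
            needs_review.append(item_id)
            continue
        ranked = Counter(codes).most_common()
        top_code, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if not tied and top_count / len(codes) > min_share:
            resolved[item_id] = top_code
        else:
            needs_review.append(item_id)
    return resolved, needs_review


codes = {
    "tweet-1": ["relevant", "relevant", "irrelevant"],
    "tweet-2": ["relevant", "irrelevant"],
    "tweet-3": ["irrelevant", "irrelevant", "irrelevant"],
}
resolved, review = adjudicate(codes)
print(resolved)
print(review)
```

Only the split item (`tweet-2`) lands in the review queue, so the adjudicator's time goes to the cases where coders actually disagree.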
![Page 29: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/29.jpg)
Pillar #5: Machine-learning
![Page 30: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/30.jpg)
Pillar #5: Machine-learning
![Page 31: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/31.jpg)
Our ActiveLearning engine and coding tools combine…
what humans do best… with what computers do best
Humans and machines learning together
Keep humans “in-the-loop” for more accurate results and better insights
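The slides do not say how the ActiveLearning engine chooses what humans should code next. A common textbook strategy is uncertainty sampling: send the items the current model is least sure about to human coders, retrain, and repeat. The sketch below shows only the selection step, assuming a binary relevance model exposed as a function that returns P(relevant); the function name, item IDs, and scores are hypothetical.

```python
from typing import Callable, List, Sequence


def uncertainty_sample(unlabeled: Sequence[str],
                       prob_relevant: Callable[[str], float],
                       k: int) -> List[str]:
    """Return the k items whose predicted P(relevant) is closest to 0.5."""
    # |p - 0.5| is 0 for a coin-flip prediction and 0.5 for a fully confident one.
    ranked = sorted(unlabeled, key=lambda doc: abs(prob_relevant(doc) - 0.5))
    return ranked[:k]


# Hypothetical model scores for four uncoded items.
scores = {"item-a": 0.95, "item-b": 0.52, "item-c": 0.10, "item-d": 0.45}
batch = uncertainty_sample(list(scores), scores.__getitem__, k=2)
print(batch)
```

In a full loop, coders would label `batch`, the model would be retrained on the enlarged training set, and selection would repeat. Items the model already scores confidently (here `item-a` and `item-c`) are skipped, which is how human effort gets concentrated where it teaches the model most.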
![Page 32: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/32.jpg)
Word sense disambiguation (relevance)
![Page 33: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/33.jpg)
Word sense disambiguation (relevance)
![Page 34: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/34.jpg)
Word sense disambiguation (relevance)
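The slides frame word sense disambiguation as a relevance decision but do not say which classifier DiscoverText uses. As a stand-in, the sketch below trains a multinomial naive Bayes classifier on human-coded examples of an ambiguous term and uses it to label new posts as relevant or irrelevant. The term "tank", the training sentences, the class names, and the `NaiveBayes` class are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.doc_counts: Counter = Counter()                          # documents per label
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # word counts per label
        self.vocab: Set[str] = set()

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayes":
        for text, label in examples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text: str) -> Dict[str, float]:
        total_docs = sum(self.doc_counts.values())
        scores: Dict[str, float] = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, hand-coded training set for the ambiguous term "tank" (hypothetical data).
training = [
    ("tank column crossed the border near the base", "relevant"),
    ("army tank battalion deployed armor", "relevant"),
    ("new tank rounds tested by the army", "relevant"),
    ("cleaned the fish tank and added new plants", "irrelevant"),
    ("my goldfish tank needs a bigger filter", "irrelevant"),
    ("filled the gas tank before the road trip", "irrelevant"),
]
model = NaiveBayes().fit(training)
print(model.predict("army tank deployed near the border"))
print(model.predict("the fish tank filter is dirty"))
```

The ambiguous word itself ("tank") appears equally often in both classes, so the decision rests on the surrounding context words, which is the essence of disambiguating by sense. A real deployment would need far more coded examples and a validation step like the one Grimmer and Stewart recommend.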
![Page 35: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/35.jpg)
![Page 36: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/36.jpg)
Human coding can be converted into machine classifiers
Accumulated human coding becomes training data via machine-learning
![Page 37: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/37.jpg)
Users can drill into interactive reporting displays
Use metadata to examine subsets of responses and create reports.
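A report of this kind is essentially a cross-tabulation of codes by a metadata field. A minimal sketch, assuming each coded item is a dict with a hypothetical `code` key plus metadata keys such as `country`; the function name and sample data are illustrative:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def code_report(items: Iterable[Mapping[str, str]], field: str) -> Dict[str, Dict[str, int]]:
    """Count codes within each value of one metadata field (e.g. country or source)."""
    report: Dict[str, Counter] = defaultdict(Counter)
    for item in items:
        report[item[field]][item["code"]] += 1
    return {value: dict(counts) for value, counts in report.items()}


coded = [
    {"country": "UK", "code": "relevant"},
    {"country": "UK", "code": "irrelevant"},
    {"country": "US", "code": "relevant"},
    {"country": "UK", "code": "relevant"},
]
print(code_report(coded, "country"))
```

Swapping `field` for another metadata key slices the same coded set a different way, which is the kind of drill-down the slide describes.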
![Page 38: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/38.jpg)
Slicing big piles of text into smaller, more focused sets is key
Ultimately all text analytics are filtering techniques
![Page 39: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/39.jpg)
Crowdsourcing accelerates the insight generation process through machine-learning
Distributed for synchronous & asynchronous collaboration
![Page 40: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/40.jpg)
CoderRank (patent pending) for enhanced machine-learning is our key innovation
![Page 41: Sifting Social Data: Word Sense Disambiguation Using Machine Learning](https://reader033.vdocuments.us/reader033/viewer/2022052601/5594444a1a28ab06308b4797/html5/thumbnails/41.jpg)
For more information, visit the Texifter table or discovertext.com
@discovertext
Thank you for listening!