![Page 1: Automated Text Coding: Humans and Machines Learning Togetherinsightinnovation.org/wp-content/uploads/2016/07/PDF/shulman.pdf · CoderRank for Enhanced Machine-learning CoderRank is](https://reader033.vdocuments.us/reader033/viewer/2022050519/5fa32628438f3d556252bd7e/html5/thumbnails/1.jpg)
Automated Text Coding: Humans and Machines Learning Together
Dr. Stuart W. Shulman
Founder & CEO, Texifter
@stuartwshulman
“…a wealth of information creates a poverty of attention.”
- Herbert Simon, 1971
Text Classification
A 2,500-year-old problem
Plato argued it would be frustrating. It still is.
Fall 1999
Grimmer & Stewart, “Text as Data,” Political Analysis (2013)
Volume is a problem for scholars
Coders are expensive
Groups struggle to accurately label text at scale
Validation of both humans and machines is “essential”
Some models are easier to validate than others
All models are wrong
Automated models enhance/amplify, but don’t replace humans
There is no one right way to do this
“Validate, validate, validate”
“What should be avoided then, is the blind use of any method without a validation step.”
Free, Open-Source, Web-based Text Analytics Toolkit
Original Software Kernel: Tools for Measurement
Avoid Tennis Elbow
Items load to the screen and the coder hits the keystroke
Keystroke Human Coding
Human coding can be distributed to individuals, groups & crowds
Computer Science & NSF Influence: Measure Everything
How fast?
How reliable?
How accurate?
Annotator Speed
Redacted
Interrater Reliability: A Critical Measurement
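
One common way to put a number on interrater reliability is Cohen's kappa, which corrects raw agreement between two coders for the agreement expected by chance. The slide does not say which statistic the toolkit reports, so the following is only a minimal sketch; the function name `cohen_kappa` and the sample labels are illustrative assumptions.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Share of items on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codings of six items by two coders.
coder_1 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "irrelevant"]
coder_2 = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant", "relevant"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders agree on four of six items, but half of that agreement would be expected by chance, which is why kappa comes out well below the raw agreement rate.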
Adjudication
CoderRank for enhanced machine-learning is our key innovation
Patent issued March 1, 2016
CoderRank for Enhanced Machine-learning
CoderRank is to text analytics what PageRank was to search. Just as Google said not all web pages are created equal, Texifter argues that not all humans are created equal. When training machines, it is best to rely most on the humans most likely to create a valid observation. We proposed a unique way to rank humans on trust and knowledge vectors.
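
The general idea of relying most on the most trustworthy coders can be illustrated with a simple weighted vote. This is not the patented CoderRank method, whose details the slides do not give. It is a sketch that assumes each coder's weight is their accuracy on a small set of adjudicated "gold" items; `coder_weights`, `weighted_label`, and all sample data are hypothetical.

```python
from collections import defaultdict

def coder_weights(gold, codings):
    """Score each coder by the share of gold items they labeled correctly."""
    weights = {}
    for coder, labels in codings.items():
        scored = [item for item in labels if item in gold]
        if not scored:
            weights[coder] = 0.0  # no evidence about this coder yet
            continue
        correct = sum(labels[item] == gold[item] for item in scored)
        weights[coder] = correct / len(scored)
    return weights

def weighted_label(item, codings, weights):
    """Pick the label with the largest total weight among coders who labeled the item."""
    totals = defaultdict(float)
    for coder, labels in codings.items():
        if item in labels:
            totals[labels[item]] += weights[coder]
    if not totals:
        return None
    return max(totals, key=totals.get)

# Hypothetical adjudicated items and three coders' labels.
gold = {"t1": "pos", "t2": "neg", "t3": "pos"}
codings = {
    "ann": {"t1": "pos", "t2": "neg", "t3": "pos", "t4": "neg"},
    "bob": {"t1": "neg", "t2": "neg", "t3": "neg", "t4": "pos"},
    "cy":  {"t1": "pos", "t2": "pos", "t3": "neg", "t4": "pos"},
}
weights = coder_weights(gold, codings)
label_t4 = weighted_label("t4", codings, weights)
print(weights, label_t4)
```

On item `t4` a simple majority would pick "pos", but the one coder with a perfect record on the gold items outweighs the two weaker coders, so the weighted vote picks "neg".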
The Five Pillars of Text Analytics
Search
Filtering
De-duplication and Clustering
Human Coding
Machine-Learning
Pillar #1: Search
Pillar #2: Filters
Pillar #3: Deduplication & clustering
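
A rough sketch of how near-duplicate grouping can work, assuming word-level shingles compared with Jaccard similarity. The slides do not describe the toolkit's own algorithm, so the function names, the threshold of 0.5, and the sample comments are all assumptions for illustration.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_near_duplicates(docs, threshold=0.5, k=3):
    """Greedy single pass: each document joins the first group whose
    representative (the group's first document) is similar enough."""
    groups = []  # list of (representative shingles, [document indexes])
    for i, doc in enumerate(docs):
        sh = shingles(doc, k)
        for rep, members in groups:
            if jaccard(sh, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((sh, [i]))
    return [members for _, members in groups]

# Hypothetical public comments; the first two differ by one word.
docs = [
    "please stop the new rule on clean water now",
    "please stop the new rule on clean water today",
    "I support the proposed clean water rule",
]
groups = group_near_duplicates(docs)
print(groups)
```

Grouping near-duplicates first means a human coder can label one representative item per group rather than every copy.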
Pillar #4: Human coding (a.k.a. labeling or tagging)
Pillar #5: Machine-learning
ActiveLearning engines and human coding tools combine…
what humans do best… with what computers do best
Humans and machines learning together
It is always good to keep humans “in-the-loop”
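
One widely used way to keep humans in the loop is uncertainty sampling: the machine scores every uncoded item, and the items it is least sure about go to human coders first. Whether the toolkit's engine selects items this way is not stated in the slides; the sketch below assumes a binary classifier that outputs a probability per item, and `pick_for_human_review` and the scores are hypothetical.

```python
def pick_for_human_review(scores, batch_size=2):
    """scores: dict of item id -> model probability for the positive class.
    Returns the ids whose probability is closest to 0.5 (most uncertain)."""
    by_uncertainty = sorted(scores, key=lambda item: abs(scores[item] - 0.5))
    return by_uncertainty[:batch_size]

# Hypothetical machine scores for five uncoded items.
machine_scores = {"a": 0.97, "b": 0.52, "c": 0.10, "d": 0.41, "e": 0.80}
queue = pick_for_human_review(machine_scores)
print(queue)
```

The human labels for the queued items are then added to the training data, the classifier is retrained, and the cycle repeats.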
Word sense disambiguation (relevance)
Human coding can be converted into machine classifiers
Accumulated human coding becomes training data via machine-learning
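
To make the step from human coding to a machine classifier concrete, here is a minimal sketch using a multinomial naive Bayes model with add-one smoothing. The slides do not name the learning algorithm the toolkit uses, so naive Bayes is a stand-in, and the "jaguar" items are invented examples of the word-sense (relevance) task.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(coded_items):
    """coded_items: list of (text, label) pairs produced by human coders."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in coded_items:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_items = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_items)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical human-coded items for an ambiguous search term.
coded = [
    ("jaguar spotted in the rainforest", "animal"),
    ("the jaguar hunts at night", "animal"),
    ("new jaguar sedan unveiled", "car"),
    ("jaguar dealership offers sedan discounts", "car"),
]
model = train_naive_bayes(coded)
prediction = classify("a jaguar sedan on sale", model)
print(prediction)
```

Four coded items are far too few for real use; the point is only that every additional human label becomes another training example.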
Crowdsourcing accelerates the insight generation process through machine-learning
Distributed for synchronous & asynchronous collaboration