TRANSCRIPT
![Page 1: National Security & Intelligence Applications of Text ...patricklam.org/talk/ucf_thresher.pdf · NationalSecurity&Intelligence ApplicationsofTextAnalytics PatrickLam LeadDataScientist,Thresher](https://reader034.vdocuments.us/reader034/viewer/2022043013/5fad6e005f761609512d358d/html5/thumbnails/1.jpg)
National Security & Intelligence Applications of Text Analytics
Patrick Lam
Lead Data Scientist, Thresher
Visiting Fellow, Harvard IQSS
March 26, 2015
What is text analytics?
"a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation"
-Wikipedia
What are some common text analysis methods?
Natural Language Processing (NLP)
Enable computers to understand human text input.
Machine Learning
Grouping, classifying, predicting
Applications
Who Authored the Federalist Papers?
(Mosteller and Wallace, 1963)
Hamilton: 43 papers
Madison: 14 papers
Jay: 5 papers
Disputed (H, M, or J?): 12 papers
Model the usage of high-frequency function words for each author.
also, and, by, of, on, there, …
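The idea of modeling function-word usage can be illustrated with a small sketch. Mosteller and Wallace fit Bayesian models to word rates; the nearest-profile rule below is a much simpler stand-in chosen only for illustration, and the function names and the six-word list (taken from the slide) are assumptions.

```python
from collections import Counter
import re

# Function words shown on the slide; a real study would use a longer list.
FUNCTION_WORDS = ["also", "and", "by", "of", "on", "there"]

def function_word_rates(text):
    """Return each function word's rate per 1,000 tokens in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return {w: 1000.0 * counts[w] / total for w in FUNCTION_WORDS}

def closest_author(disputed_text, known_texts):
    """Pick the author whose function-word profile is nearest (L1 distance).

    `known_texts` maps an author name to a sample of that author's writing.
    This is a hypothetical simplification, not the Mosteller-Wallace model.
    """
    target = function_word_rates(disputed_text)

    def distance(author):
        profile = function_word_rates(known_texts[author])
        return sum(abs(profile[w] - target[w]) for w in FUNCTION_WORDS)

    return min(known_texts, key=distance)
```

Function words are useful here because authors use them largely unconsciously and independently of topic, so their rates behave like a stylistic fingerprint.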
Hamilton: 43 papers
Madison: 14 papers
Jay: 5 papers
Disputed (Madison): 12 papers
Authorship attribution using text analysis has a large literature and can be useful for national security & intelligence purposes.
Harvard IQSS
Institute for Quantitative Social Science
iq.harvard.edu
Four recent projects using text analysis by Harvard IQSS affiliates with relevance to national security & intelligence
1. Reverse-Engineering Censorship In China
(King, Pan, and Roberts, 2013 and 2014)
What Gets Censored In China?
Monitor ∼1,400 Chinese social media sites over 6 months across 85 content areas.
Download posts the instant they appear.
Revisit each post later to check if it was censored.
Analyze with new methods of text analysis.
Experiment with writing social media posts.
Censorship program targets collective action rather than criticism of the government.
2. Jihadi Radicalization of Muslim Clerics
(Nielsen, 2014)
Why Do Some Clerics Preach Jihad While Others Do Not?
Download writings of a sample of Muslim clerics.
Use machine learning methods to score clerics on level of jihad by comparing writings to known Jihadi texts.
Analyze along with other data on clerics.
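Scoring documents by comparison with a reference corpus can be sketched as follows. This is a generic smoothed log-odds score under stated assumptions, not necessarily the scoring used in the cited study; the function names and the whitespace tokenizer are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive whitespace tokenizer; real Arabic text would need more care."""
    return text.lower().split()

def word_log_odds(reference_docs, background_docs):
    """Smoothed log-odds of each word under reference vs. background text."""
    ref = Counter(t for d in reference_docs for t in tokenize(d))
    bg = Counter(t for d in background_docs for t in tokenize(d))
    vocab = set(ref) | set(bg)
    # Add-one smoothing so unseen words never produce log(0).
    ref_total = sum(ref.values()) + len(vocab)
    bg_total = sum(bg.values()) + len(vocab)
    return {
        w: math.log((ref[w] + 1) / ref_total) - math.log((bg[w] + 1) / bg_total)
        for w in vocab
    }

def score_document(text, log_odds):
    """Average log-odds over tokens; words outside the vocabulary add 0."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(log_odds.get(t, 0.0) for t in tokens) / len(tokens)
```

A higher score means the document's vocabulary looks more like the reference texts than like the background texts; averaging by length keeps long and short documents comparable.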
Clerics with weak educational networks and connections often use Jihadi ideology to appeal to lay audiences and further their careers.
3. Predicting Crowd Behavior
(Kallus, 2014)
Can We Predict Major Protests?
Scan over 300,000 web sources (news, blogs, forums, Twitter) in 7 languages for mentions of past, current, or future events.
Extract type of event, entities involved, and timeframe using NLP methods.
Predict on each day whether a significant protest will occur over the next three days using machine learning methods.
Massive online public discourse data has the potential to predict crowd behavior using text analysis methods.
4. Anti-Americanism in the Middle East
(Jamal, Keohane, Romney, and Tingley, 2015)
Anti-Americanism on Twitter
Download Arabic Twitter posts by keywords.
Consider discourse about US in general and in reaction to specific events.
Use text analysis methods to classify proportion of posts in specified categories.
Anti-Americanism in the Middle East is directed toward the impingement of the US on other countries rather than toward American society.
All applications of text analysis have a common starting point.
How do we define our set of texts that we want to analyze?
How do we retrieve the relevant texts from the vast set of texts available?
How do we follow the relevant conversations as they evolve?
Define. Retrieve. Follow.
In some cases, the relevant set of texts is well-defined, static, and easily accessible.
Federalist Papers, complete works of Shakespeare
In most cases, we have to constantly search and retrieve the relevant texts, often using keywords and Boolean searches.
Twitter, news articles, blogs and forums
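A keyword-and-Boolean retrieval step can be sketched in a few lines. The `matches` helper and its parameter names are hypothetical, and the whitespace tokenization is an assumption that would miss punctuation and multi-word phrases in real data.

```python
def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean keyword filter: AND over all_of, OR over any_of, NOT over none_of."""
    tokens = set(text.lower().split())
    if any(k not in tokens for k in all_of):
        return False  # a required keyword is missing
    if any_of and not any(k in tokens for k in any_of):
        return False  # none of the optional keywords appears
    return not any(k in tokens for k in none_of)  # reject excluded keywords

def retrieve(posts, **query):
    """Return the posts that satisfy the Boolean query."""
    return [p for p in posts if matches(p, **query)]
```

For example, `retrieve(posts, all_of=["boston"], any_of=["explosion", "marathon"], none_of=["game"])` keeps posts that mention Boston together with the event while dropping sports chatter. Everything retrieved depends on the keyword lists passed in.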
DHS Analyst's Desktop Binder
Our analyses are only as good as our keywords!
What is our current most commonly used technology for defining the relevant keywords to retrieve our texts?
Example: Think of keywords you would use to follow the Twitter conversation around the Boston Marathon Bombings.
Keywords about the event
#bostonbombings, explosion, terrorism, attack, …
Keywords about the suspects
suspect, tsarnaev, dzhokhar, tamerlan, …
Keywords about the victims
innocent, victim, collier, …
Keywords about the reaction
tragedy, prayers, #prayforboston, …
Keywords about the politics
obama, #tcot, #benghazi, …
We ran a similar experiment with 43 Harvard undergrads.
59% of the words were suggested by only 1 out of 43 undergrads.
Median number of words per respondent was 7.
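The two summary statistics from the experiment (share of words suggested by a single respondent, and median list length) could be computed along these lines. The function name and input layout are assumptions, and no data from the actual experiment is reproduced here.

```python
from collections import Counter
from statistics import median

def keyword_overlap_stats(responses):
    """Summarize keyword lists, one list per respondent.

    Returns (share of distinct keywords suggested by exactly one
    respondent, median number of distinct keywords per respondent).
    Assumes `responses` is non-empty.
    """
    # Count each keyword once per respondent who suggested it.
    counts = Counter(k for r in responses for k in set(r))
    singletons = sum(1 for c in counts.values() if c == 1)
    share_unique = singletons / len(counts) if counts else 0.0
    return share_unique, median(len(set(r)) for r in responses)
```

A high singleton share combined with short lists is the pattern reported on the slides: each person recalls only a few keywords, and different people recall different ones.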
Humans are good at recalling a small list of good keywords and recognizing a good keyword when they see it.
Humans are bad at recalling a long list of keywords that capture different ways of representing a concept.
Some Existing Options for Automated Keyword Discovery
1. Mine search queries
Google Adwords
2. Thesaurus methods
reference books, WordNet
3. Co-occurrence methods
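As one illustration of a co-occurrence method, the sketch below suggests candidate keywords by counting how many documents contain a word together with a seed keyword. The function name is hypothetical, and raw document counts are only one of several possible co-occurrence measures.

```python
from collections import Counter

def cooccurring_words(docs, seed, top_n=5):
    """Rank words by the number of documents they share with `seed`."""
    counts = Counter()
    for doc in docs:
        tokens = set(doc.lower().split())
        if seed in tokens:
            # Count every other word in a document that mentions the seed.
            counts.update(tokens - {seed})
    return [word for word, _ in counts.most_common(top_n)]
```

Raw counts favor words that are common everywhere, so a practical version would normalize by each word's overall frequency.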
Enter Thresher.
(Based on King, Lam, and Roberts, 2014)
Thresher keeps humans in the loop and helps them find more and better words faster.
The Thresher Algorithm
Reference set R: texts about concept of interest
Search set S: broad set of texts
Target set T: texts in S about the same concept as in R
Goal: Estimate T and find keywords that define T.
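The R, S, T setup can be made concrete with a toy sketch. This is not the Thresher algorithm itself: the scoring rule (a smoothed log-ratio of document frequencies), the zero threshold, and the `ignore` set for dropping the query terms that defined R and S are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokens(doc, ignore=frozenset()):
    """Distinct lowercase tokens of a document, minus ignored query terms."""
    return set(doc.lower().split()) - ignore

def doc_freq(docs, ignore=frozenset()):
    """Number of documents in which each token appears."""
    return Counter(t for d in docs for t in tokens(d, ignore))

def estimate_target_set(R, S, ignore=frozenset(), threshold=0.0):
    """Split S into an estimated target set T and the remainder."""
    r_df, s_df = doc_freq(R, ignore), doc_freq(S, ignore)

    def weight(w):
        # Positive when a word is relatively more common in R than in S.
        return (math.log((r_df[w] + 1) / (len(R) + 2))
                - math.log((s_df[w] + 1) / (len(S) + 2)))

    target, rest = [], []
    for d in S:
        toks = tokens(d, ignore)
        score = sum(weight(w) for w in toks) / max(len(toks), 1)
        (target if score > threshold else rest).append(d)
    return target, rest

def rank_keywords(target, rest, ignore=frozenset(), top_n=5):
    """Rank words by how much more often they appear in T than outside it."""
    t_df, n_df = doc_freq(target, ignore), doc_freq(rest, ignore)

    def gap(w):
        return t_df[w] / max(len(target), 1) - n_df[w] / max(len(rest), 1)

    return sorted(t_df, key=lambda w: (-gap(w), w))[:top_n]
```

In the spirit of keeping humans in the loop, the ranked list would be shown to an analyst, who keeps the useful keywords rather than accepting the ranking automatically.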
Thresher Applications
The Boston Marathon Bombings on Twitter
R: #bostonbombings
S: boston
Thresher separates relevant words from irrelevant words.
Target set words
suspect, bomb, police, people, fbi, tsarnaev, arrest, die, terror, attack, kill, obama, custody, prayer, dzhokhar, god, false, #prayforboston, #tcot, #bostonmarathon, picture, identify, russia, #watertown, tamelan, islam, jihad, …
Non-target set words
game, red sox, bruins, celtics, back, tonight, fan, #mlb, night, chicago, new york, garnett, fenway, rondo, #job, playoff, yankees, blackhawks, stanley, pizza, #nhl, draft, …
The Bo Xilai Scandal on Chinese blogs and forums
R: 薄熙来 (Bo Xilai)
S: 重庆 (Chongqing)
王立军 Wang Lijun
政治 government
事件 event (euphemism for the scandal)
打黑 strike corruption
犯罪 commit a crime
民主 democracy
权力 power
文革 Cultural Revolution
领导 leader
改革 reform
群众 the masses
中央中共 Central Communist Party
社会主义 socialism
唱红 sing red songs
黑社会 black society
干部 cadre
路线 party line
Writings About Suicide Bombings in Arabic
R: عمليات الاستشهادية (martyrdom operations)
from "Haqibatu'l-Mujahid" (A Mujahid's Bookbag)
S: "Pulpit of Tawhid and Jihad"
A Jihadist web library
العدو enemy
قتل killing
والنكاية to vex or spite ("vex the infidels")
يعلمهم teach them
الخيل steed
وأعدوا fight
تظلمون wronged
ترهبون terrify
الغلام boy (the story of the boy and the king, relevant to jihadis)
"And prepare against them whatever you are able of power and of steeds of war by which you may terrify the enemy of Allah and your enemy and others besides them whom you do not know [but] whom Allah knows. And whatever you spend in the cause of Allah will be fully repaid to you, and you will not be wronged."
-Quran 8:60
Can we use Thresher to define different Arabic dialects by keywords and retrieve texts from them?
| Egyptian | Gulf | Levantine | MSA | Translation |
|---|---|---|---|---|
| ايه | وش | شو | ماذا | What |
| معرفش | مادري | ما بعرف | لا يعرف | Don't know |
| عايز | يبي | بدي | | To want/I want |
| عاوز | يبغى | بدك | | To want/You want |
| عايزين | يبون | بدكم | | To want/You want (pl.) |
Conversations can evolve quickly in response to certain actors and we need to be able to follow them.
Evading Censors in China
自由 Freedom
目田 Eye field
(homograph)
和谐 Harmonious [Society]
河蟹 River crab
(homophone)
Evading Authorities and the Distribution of Child Pornography
PatrickLam.org
ThresherVentures.com
iq.harvard.edu