DARPA Human Language Technology Programs
Boyan Onyshkevych Program Manager, I2O
Presented by Doug Jones MIT Lincoln Laboratory
28 October 2016
Approved for Public Release, Distribution Unlimited
DEFT, LORELEI, and a Potential New Program Space
Entities
Relations
Events
Topics
Multiple Hypothesis Generation
Sentiment
Relationship Network Analysis
Entity Analysis
Machine Translation
© ERSI
Hotspot Map
Analysis Tools
Analyst
Linking
Common Semantics
DEFT LORELEI Potential New Program Space
Uncertainty Management
Goals
Video Analysis
Image Analysis
Sensor Output Analysis
Language Analysis
Video
Images
Sensor Output
English Speech and Text
Foreign Language Speech and Text
Projection
Resource Optimization
Language Models
Analysis of Universals
User Interaction Data
Interaction Data Analysis
Image sources (top to bottom) Santa Cruz Sentinel Reuters: Amit Dave Dylan Vester PBS Wikipedia Dinozzo Dreamstime.com ERSI
Deep Exploration and Filtering of Text (DEFT)
Goal: Identify and aggregate explicit and implicit information from multiple unstructured English, Spanish, and Chinese text sources to support automated analytics and human analysts
Approach: Convert text to structured representation of situation
• Find and represent key information, including information on entities, relations, events, sentiment, and beliefs
• Aggregate information from multiple sources, detect inter-document relationships, and represent the information in a structured knowledge base (KB)
• Identify emergent situations and anomalies through KB analysis
• Explore and filter relations, events, and anomalies from formal and informal inputs
Analysts
Analytics
Who
What
Why
How
When Where
Opinion
Planner(A), Courier(B)
Meeting(1),Meeting(2) Superior(A,B)
TodayPM(2) Location(2,UncleHouse)
Cause(~Passage,Failed(1))
Negative(A,FailedMeeting)
DEFT Output
Via(Transfer(B,A,Stuff), Meeting(2))
DEFT Example English
DEFT Algorithms
A: Where were you? We waited all day for you and you never came. B: I couldn’t make it through, there was no way. They…they were everywhere. Not even a mouse could have gotten through. A: You should have found a way. You know we need the stuff for the…the party tomorrow. We need a new place to meet…tonight. How about the…uh…uh…the house? You know, the one where we met last time. B: You mean your uncle’s house? A: Yes, the same as last time. Don’t forget anything. We need all of the stuff. I already paid you, so you had better deliver. You had better not $%*! this up again.
Source Material
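The DEFT output for this dialogue can be read as a small set of typed assertions. The following is a minimal sketch in Python of one way to hold and query them. The `Assertion` record and the `about` helper are hypothetical names introduced for illustration and are not part of DEFT; only the predicate names and arguments are copied from the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Assertion:
    """One predicate from the slide, e.g. Location(2, UncleHouse)."""
    predicate: str
    args: Tuple[str, ...]


# Assertions shown as DEFT output for the example dialogue.
# Nested terms such as Transfer(B,A,Stuff) are kept as plain strings here.
ASSERTIONS = [
    Assertion("Planner", ("A",)),
    Assertion("Courier", ("B",)),
    Assertion("Meeting", ("1",)),
    Assertion("Meeting", ("2",)),
    Assertion("Superior", ("A", "B")),
    Assertion("TodayPM", ("2",)),
    Assertion("Location", ("2", "UncleHouse")),
    Assertion("Cause", ("~Passage", "Failed(1)")),
    Assertion("Negative", ("A", "FailedMeeting")),
    Assertion("Via", ("Transfer(B,A,Stuff)", "Meeting(2)")),
]


def about(term, assertions=ASSERTIONS):
    """Return the predicates that have `term` as a direct argument."""
    return [a.predicate for a in assertions if term in a.args]


print(about("A"))  # predicates that mention speaker A directly
print(about("2"))  # predicates that mention the second meeting directly
```

Because the same assertions are produced for the English, Spanish, and Chinese versions of the dialogue, a query like `about("2")` would not depend on the source language under this representation.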
Who
What
Why
How
When Where
Opinion
DEFT Algorithms
A: ¿Dónde estabas? Te esperamos todo el día y nunca llegaste. B: No pude venir, no había pasaje. Ellos ... ellos estaban por todas partes. Ni siquiera un ratón podría pasar. A: Debieras haber encontrado pasaje. Sabes que tenemos las cosas para la ... la fiesta mañana por la noche. Necesitamos otro lugar reunir... esta noche. ¿Y este ... este ... la casa? Ya sabes, donde nos reunimos la última vez. B: ¿Te refieres a la casa de tu tío? A: Sí, la misma que la última vez. No te olvides nada. Necesitamos todas las cosas. Ya te pagué, así que debieras cumplir. Que no $%*! esta vez.
Planner(A), Courier(B)
Meeting(1),Meeting(2) Superior(A,B)
TodayPM(2) Location(2,UncleHouse)
Cause(~Passage,Failed(1))
Negative(A,FailedMeeting)
Source Material DEFT Output
Via(Transfer(B,A,Stuff), Meeting(2))
DEFT Example Spanish
DEFT Example Chinese
Who
What
Why
How
When Where
Opinion
DEFT Algorithms
A: 你在哪?我们等了你一整天,你都没来。 B: 我没法过去,没办法。他们…他们无所不在。连一只老鼠都没法穿过去。 A: 你应该想个办法。你知道我们明天的…的派对需要这些东西。我们今晚需要一个新的地点见面。那… … …房子怎么样?你知道的,我们上次见面的那间。 B: 你是说你叔叔家? A: 对,跟上次一样。不要忘了任何东西。我们需要全部的东西。我已付了你钱,所以你最好交差。你这次最好别再搞砸了。
Planner(A), Courier(B)
Meeting(1),Meeting(2) Superior(A,B)
TodayPM(2) Location(2,UncleHouse)
Cause(~Passage,Failed(1))
Negative(A,FailedMeeting)
Source Material DEFT Output
Via(Transfer(B,A,Stuff), Meeting(2))
DEFT Research Areas
• Coreference Resolution
  • Inducing entity and event matches without direct explicit matches
  • Cross-document tracking of entities, relations, and events
• Entities, Relations, and Events
  • Learning of explicit and implicit relations and of new expressions
  • Automatic representation of temporal, enabling, and attributive aspects
  • Recovery and representation of implicit arguments
• Knowledge Bases
  • Dynamic push and pull from active knowledge bases
  • Can various elements like belief/sentiment/etc. be represented as confidence in KB entries?
• Opinions and Modality
  • Infer opinion/modality, topic, attribution and change over time
  • Tackle ground-truth challenges (probabilistic, many annotators?)
• Foreign Languages
  • Natural language data from English, Spanish, and Chinese converted to machine-readable structured language
• Algorithm Combination
  • How do DEFT algorithms add up to a knowledge base?
  • Linking and cross-document coreference
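The Knowledge Bases question above, whether elements such as belief and sentiment can be represented as confidence in KB entries, can be made concrete with a small sketch. The Python below merges per-document extractions into one entry per assertion and combines their confidences with a noisy-OR rule. This is an illustrative assumption, not the method used by DEFT performers: the `aggregate` function, the tuple layout, the noisy-OR choice, and the sample confidences are all hypothetical.

```python
from collections import defaultdict


def aggregate(extractions):
    """Merge (doc_id, assertion, confidence) triples into KB entries.

    Confidences for the same assertion are combined with noisy-OR,
    1 - prod(1 - c_i), which assumes the documents are independent.
    """
    doubt = defaultdict(lambda: 1.0)  # running product of (1 - c_i)
    sources = defaultdict(set)        # documents supporting each assertion
    for doc_id, assertion, confidence in extractions:
        doubt[assertion] *= 1.0 - confidence
        sources[assertion].add(doc_id)
    return {
        assertion: {
            "confidence": 1.0 - d,
            "sources": sorted(sources[assertion]),
        }
        for assertion, d in doubt.items()
    }


# Hypothetical extractions from two documents (confidences are made up).
kb = aggregate([
    ("doc1", "Location(Meeting2, UncleHouse)", 0.6),
    ("doc2", "Location(Meeting2, UncleHouse)", 0.5),
    ("doc1", "Negative(A, FailedMeeting)", 0.8),
])
for assertion, entry in kb.items():
    print(assertion, round(entry["confidence"], 2), entry["sources"])
```

Noisy-OR is only one option. It rewards corroboration across documents but overstates confidence when the documents repeat a single underlying source.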
DEFT – NIST Open Evaluation Task Alignment
Document-Level Annotations
• Part-of-Speech
• Parsing
• Named Entity
• Within-Document Coreference
• Relation extraction
• Event extraction
• Sentiment/Belief
• Many others
Relations
Linking
Machine-Readable Output
Inference
Uncertainty Management
Input: Multilingual Unstructured Text
Core KB
Assertions
Events
Sentiment
Entities
Downstream Analytics
Visualization
Entity Discovery & Linking
Belief & Sentiment
ColdStart Entity Discovery
ColdStart Slotfilling
ColdStart Full KB
Event Nugget and Argument
Slotfiller Validation
DEFT Program Participants
TA3 Data
U. Penn/LDC
TA2 Integration
BBN
SDL / LW
ISI
Colorado SAIC
NCC
TA1 Algorithm Developers
Columbia GWU
U. Washington
CMU-CS USC/ISI
U. Mass
Evaluation
Lincoln Lab
QC/CUNY
Stanford
Johns Hopkins
Cornell
U. Washington
U. Illinois Urbana-Champaign
U. Texas Austin
CMU-ML
NIST
IHMC SUNY/Albany
U. Florida
U. Texas Dallas
RPI U. North Texas U. Pittsburgh
Low Resource Languages for Emergent Incidents (LORELEI)
Rapid development of language technology to provide situational awareness based on information from ANY language, in support of emergent missions
LORELEI success will mean non-linguist mission planners can achieve results comparable to expert linguists’
Rapid Response Language Technology - Initial capability beginning within 24 hours of emergent need
Language Variation (Colors represent language groups)
© Muturzikin.com
LORELEI Motivation
• 7100+ active languages in the world; hard to predict which languages will be needed next
  • 44 in Boko Haram area (Hausa, Kanuri); 522 languages in all of Nigeria
  • 19 in Ebola outbreak areas in Liberia, Sierra Leone, and Guinea (Kissi, Kuranko)
  • 20+ Mayan languages spoken by Central American refugee children (Q’anjob’al, K’itche, Ixil)
• Developing language technologies by current methods requires approximately 3 years and tens of millions of dollars per language (mostly to construct translated or transcribed corpora)
  • By current methods, need $70B and 230,000 person-years of effort for all languages
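As a rough cross-check of the figures above, the totals can be divided back by the language count. The Python sketch below uses only the slide's round numbers. The derived per-language values are back-of-the-envelope estimates, not figures stated by the program, and the variable names are illustrative.

```python
# Round numbers taken from the slide.
LANGUAGES = 7100               # "7100+ active languages"
TOTAL_COST_USD = 70e9          # "$70B"
TOTAL_PERSON_YEARS = 230_000   # "230,000 person-years"
YEARS_PER_LANGUAGE = 3         # "approximately 3 years"

# Derived estimates (assumption: effort is spread evenly over languages).
cost_per_language = TOTAL_COST_USD / LANGUAGES
person_years_per_language = TOTAL_PERSON_YEARS / LANGUAGES
people_per_language = person_years_per_language / YEARS_PER_LANGUAGE

print(f"cost per language:         ${cost_per_language / 1e6:.1f}M")
print(f"person-years per language: {person_years_per_language:.1f}")
print(f"people per language team:  {people_per_language:.1f}")
```

Under these assumptions the totals imply roughly $10M and about 32 person-years per language, which is at the low end of the "tens of millions" per-language estimate.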
Language Requirements Example: Foreign Disaster Relief Language Needs, 1990-2014
Data sources: ReliefWeb, CRS, whitehouse.gov
Legend: 1 intervention; 2-4 interventions; 5-10 interventions; 11-14 interventions; 15+ interventions
266 interventions, 879 associated language needs
LORELEI Potential HLT Use Case
In 2010, a company called Ushahidi carried out a post-earthquake support effort in Haiti. Affected people sent text messages, which were translated by humans; aid personnel then used the translations to organize the data manually for visualization.
Example Output: Hotspot Map
www.ushahidi.com
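The manual organizing step in this use case amounts to grouping located reports by area and counting them. A minimal Python sketch of that idea follows, assuming each report already carries a latitude and longitude. The `hotspot_counts` function, the grid-cell scheme, and the sample reports are hypothetical illustrations, not Ushahidi's or LORELEI's implementation.

```python
from collections import Counter
from math import floor

# Hypothetical, already-translated reports: (latitude, longitude, text).
REPORTS = [
    (18.54, -72.34, "People trapped, need water"),
    (18.59, -72.31, "Clinic out of medicine"),
    (18.23, -72.53, "Road blocked by debris"),
]


def hotspot_counts(reports, cell_deg=0.1):
    """Count reports per grid cell measuring `cell_deg` degrees per side."""
    counts = Counter()
    for lat, lon, _text in reports:
        # Integer cell index; floor keeps negative longitudes consistent.
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        counts[cell] += 1
    return counts


counts = hotspot_counts(REPORTS)
busiest_cell, n_reports = counts.most_common(1)[0]
print(busiest_cell, n_reports)
```

A map layer would then shade each cell by its count. The cell size trades spatial detail against having enough reports per cell to be meaningful.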
LORELEI Program Goal
1 Day 1 Week 1 Month
Place Names
Topics in Text
Sentiment/Emotional State
Events
Relationships
Person Information
Topics in Speech
Scenario Types
• Humanitarian Assistance and Disaster Relief (HADR)
• Peacekeeping, Counterterrorism, Law Enforcement Assistance
• Medical Aid for Infectious Disease Outbreaks, Radiological Incidents
LORELEI Example Use Case: Mission Planning
LORELEI Example Use Case: Situation Awareness
LORELEI Hypothetical Mission Requirements and Answers
Mission Requirement, followed by Questions Answered by LORELEI (Ebola outbreak scenario)
Discover what locations are involved
  • Where are there confirmed and suspected cases?
  • Where are reactions to the outbreak occurring?
Rate the urgency at each location
  • How many cases are there at each location?
  • What level of violence is involved in reactions?
Ascertain the type of response needed for each location
  • What types of shortages are occurring?
  • What types of people are infected?
  • When did the infections occur?
Determine what personnel and supplies are required
  • What are the available local resources and facilities?
  • Which organizations are already on the ground?
  • What resources are needed? Medicine, food, water?
Plan for coordination with local people and organizations
  • Who is in charge of relevant organizations?
  • Who are the political officials or de facto leaders?
Map out routes for forces to get to each location
  • What transportation is available?
  • What routes are impeded by quarantines, roadblocks, or violence?
Determine what containment measures are required
  • Where are sick people and refugees moving?
  • Where is the outbreak under control/not under control?
Plan for adjustments in response to changes in circumstances
  • What is the reaction of the infected people? The uninfected?
  • Is the quarantine causing unrest? Violence?
  • Is there an increased level of crime?
Row groups: Limited Characterization of the Environment; Extended Characterization of the Environment; Full Characterization of the Environment
LORELEI
LORELEI Concept
Language Technology Development Environment
Linguistics Expert
Scenario Model
Knowledge Fusion Engines
Language Tools
Run-Time Models
Run-time Framework
Mission Plan
Analysis Tools
Analyst
Hotspots
Relations Entities
Translation Web Services
Real-time Incident Data
Language-Universal Resources
Known Language Data
Linguistic Universals
Language Analysis Algorithms
Language-Specific Resources
Dictionaries, Corpora, etc.
Resource Optimization Algorithms
Linguistic Units
Related-Language Resources
Language Packs
Projection Hypotheses
Projection Algorithms
© ERSI
DefenseImagery.mil
LORELEI Performer Teams
TA1.4
TA1.1 TA1.2 TA1.3
Data
TA2
Raytheon BBN Technologies
Carnegie Mellon University
Information Sciences Institute
Johns Hopkins University
University of Pennsylvania
University of Texas El Paso
University of Illinois
University of Washington
Center for Research in Computational Linguistics
Columbia University
University of Massachusetts
Linguistic Data Consortium
Appen
Next Century Corporation
Performer Main Focus Areas
Columns: Prime | TA | Language Families/Universals | Lexicon/dictionaries | Morphology & Syntax | Entities: NER, coreference, linking | Events and Relations | Text: Topics | Speech: phonemes, topics, sentiment | Text: Sentiment, Beliefs, Private States | Machine Translation
BBN 1.4 BBN BBN BBN BBN BBN
CMU 1.4 CMU CMU
USC 1.4 USC USC USC USC USC
JHU Yarowsky 1 JHU-DY JHU-DY
UIUC Roth 1 UIUC-DR UIUC-DR
JHU Khudanpur 1 JHU-SK
Columbia 1 Columbia Columbia
UPenn 1 UPenn
UW 1 UW UW UW
JHU Van Durme 1 JHU-BV
UMass Amherst 1 UMass UMass UMass
UTEP 1 UTEP
JHU Khudanpur Wrksp 1 JHU-SK
UIUC Schwartz 1 UIUC-LS
CRCL 1 CRCL CRCL
Broad Operational Language Translation (BOLT)
The aim of BOLT has been to enable identification of important information in foreign-language sources and communication with non-English-speaking populations by:
• Allowing English speakers to understand foreign-language sources of all genres, including chat, messaging, and informal conversation;
• Providing English speakers the ability to quickly identify targeted information in foreign-language sources using natural-language queries; and
• Enabling multi-turn communication in text and speech with non-English speakers.
The systems produced by the program are the results of three years of intensive focus on reducing the domain and genre limitations of human language technologies. They fall into three main categories:
• Machine translation of informal Arabic and Chinese text and speech into English
• Information retrieval from informal Arabic and Chinese text
• User-interactive speech-to-speech translation between Iraqi Arabic and English
Some technologies produced by the BOLT program are available to government users under a no-cost government license, while others have associated licensing fees. All have the potential to be adapted to other languages and types of data.
Available BOLT Software
Software | Languages | Status
BBN Machine Translation | Mandarin Chinese->English, Egyptian Arabic->English | Fully developed, Government Purpose Rights
BBN Information Retrieval | Mandarin Chinese, English, Egyptian Arabic | Prototype loosely integrated within BBN’s commercial Multimedia Monitoring System
BBN Error-Tolerant S2S Translation | Iraqi Arabic<->English | Commercial license required for BBN Byblos™ Speech Recognition and SVOX Text-to-Speech
BBN Customization Tools for S2S Translation | n/a | Fully developed, Government Purpose Rights
IBM S2S Translation | Iraqi Arabic<->English | Proof-of-concept, Government Purpose Rights
IBM Machine Translation | Mandarin Chinese->English, Egyptian Arabic->English | Fully developed, Government Purpose Rights
IBM Multi-lingual Question Answering | Mandarin Chinese, English, Egyptian Arabic | Research prototype, Government Purpose Rights
IBM Entity Resolution | | Fully developed, Commercial license required
SRI S2S Translation | Iraqi Arabic<->English | Fully developed, Commercial license required
New BOLT Transitions
• DoD demonstration of BOLT speech-to-speech translation on a handheld device
• Integration of BOLT machine translation into CYBERTRANS
• Adaptation of BOLT speech-to-speech translation for African French<->English for US Army Africa
Where were you last month?
Were you in Kabul?
Afghanistan and Iran.
Please tell me another word or phrase for [playback Kabul].
In the capital of Afghanistan.