TRANSCRIPT
Overview of Current and Planned ODNI-led Sensemaking Research Activities
USGIF Machine Learning and Artificial Intelligence Working Group
26 June 2018
Dr. David M. Isaacson, Program Manager, Machine Analytics Research
Acquisition, Technology, and Facilities
SCIENCE & TECHNOLOGY
Agenda
Motivation
In-VEST Program
– Xpress Challenge
– Xtend Challenge
– Xamine Challenge
Next Steps
– Xpect Challenge + Xplore Challenge
– Transitions: ADDS and AIDE programs
– Technical workshops
Motivation — Mitigating Bottlenecks in Decision Support
Sensemaking techniques offer a promising alternative for decision advantage by sidestepping the latency and production bandwidth issues associated with traditional analysis’ production, review, and editing processes.
In-VEST Xploratory Challenge Series
Launched in 2017 and continuing into 2018, ODNI’s In-VEST program is pursuing a series of prize challenges aimed at ascertaining and advancing the state of the art in natural language processing, cognitive computing, and other artificial intelligence approaches with the potential to revolutionize IC capabilities. Collectively, these efforts explore technical opportunities for accelerating and automating the production of intelligence.
ODNI’s Intelligence Ventures in Exploratory Science and Technology (In-VEST) program seeks to catalyze disruptive research approaches for addressing IC needs.
The ODNI-OUSD(I) Xpress Challenge
The Xpress Challenge explored opportunities for machine analytics to generate finished intelligence products to inform policymakers and warfighters.
Timeline:
6 April 2017: Xpress Challenge opened
5 July 2017: Xpress Challenge closed
8 September 2017: Evaluation complete
28 November 2017: Code validation complete
The Xpress Challenge strived for an “apples-to-apples” comparison to IC analytic production: i.e., used a policymaker-relevant “intelligence” question, used established IC evaluation criteria, etc.
Key details:
• Addressed a single family of questions
• Used a corpus of nearly 15,000 documents
• Evaluated based on ODNI/Analytic Integrity and Standards criteria (ICD 203)
Xpress Source Material
Xpress Challenge solvers had access to a .zip file containing .xml files of roughly 15,000 SIGNAL articles, columns, and blog entries going back several years. The use of SIGNAL solved a number of problems, including copyright issues, lack of control over source material, code validation, and avoiding irrelevant websites.
For the competition, solvers were asked to craft machine-generated responses to the following question: What developments related to artificial intelligence are most impactful to the national security of the United States?
Xpress Evaluation Methodology
AIS uses a 0-3 scale, with “2” representing the CIA/DA standard. Products are judged on whether they exceed that standard (3), meet it (2), partially meet it (1), or fall short (0).
• 3: Exceeds the IC standards in all aspects of a criterion.
• 2: Meets the IC standards in all aspects of a criterion. If a piece falls short in any aspect of a criterion, it should not receive a “2.”
• 1: Meets some aspects of the criterion’s standard but fails to meet other aspects of that criterion.
• 0: Falls below standards on all aspects of the criterion’s standard.
• NA: Numerical evaluation is not warranted.
Evaluation of all Xpress Challenge submissions was performed by ODNI’s Analytic Integrity and Standards (AIS) against the ODNI Rating Scale for Evaluating Analytic Tradecraft Standards. Importantly, AIS reviewers were NOT told they were evaluating machine-generated products!
About the ICD-203 Standards: the evaluation criteria are grouped into Literal, Inferential, and Evaluative categories, with a Not Used designation for criteria that are not applied.
Sample Product Provided to Solvers
All content in this sample document was extracted directly from SIGNAL articles and arranged by the Xpress team in the preferred format.
AIS Scoring of Sample Product against the Six Criteria
Additional information on how the provided Sample Product was scored for all six criteria is included in the Backup material.
Xpress Challenge Award Schedule
To be eligible for Overall Best Submission awards, submitted Analytic Products must receive a score of Fair (1) or above for each evaluation criterion. Winners in the category award areas of Literal, Inferential, and Evaluative are determined by the highest score for the criteria in the respective category, regardless of performance in the other categories.
Xpress Solver Base
387 registrants across 42 countries!
15 submissions!
AIS Evaluation of Xpress Submissions
Paper #   Literal 1   Literal 2   Inferential 1   Inferential 2   Evaluative 1   Evaluative 2   Total
1         0           1           2               1               0              NA             4
2         1           1           1               0               0              0              3
4         1           1           1               0               1              NA             4
5         1           1           1               1               0              NA             4
6         1           2           1               0               1              1              6
7         0           1           2               1               0              NA             5
8         1           2           2               1               1              NA             7
10        0           1           2               NA              0              NA             3
11        0           1           0               0               1              NA             2
13        1           1           1               0               1              0              4
14        0           1           1               0               0              0              2
15        0           1           0               0               0              NA             1
16        1           1           NA              NA              0              NA             2
Total     7           15          14              4               5              1
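As an illustrative sketch (an assumption about the arithmetic, not the official AIS tooling), the per-paper totals in the table can be computed by summing the six criterion scores while excluding NA entries, since NA indicates that a numerical evaluation was not warranted:

```python
# Illustrative sketch (assumed, not the official AIS scoring code):
# total a paper's criterion scores on the 0-3 scale, skipping "NA"
# entries, which mark criteria where numerical evaluation was not warranted.
def total_score(scores):
    """Sum the numeric criterion scores, excluding NA entries."""
    return sum(s for s in scores if s != "NA")

# Paper #8 from the table: Literal 1/2, Inferential 2/1, Evaluative 1/NA
print(total_score([1, 2, 2, 1, 1, "NA"]))  # -> 7
```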
Code validation on additional, related topics impacted these standings...
Code Validation of Xpress Submissions
The Xpress Challenge documentation explicitly required solutions to be able to address other questions posed in the format provided.
Based on AIS’ results, the performance of ten of the Xpress Challenge algorithms was explored: 1, 2, 4, 5, 6, 8, 11, 13, 14, and 16.
The algorithms were tasked with producing reports against 7 topics using Xpress’ corpus of SIGNAL articles: artificial intelligence, algorithms, machine learning, North Korea, software defined, social media, and UAV.
For code validation, solvers’ solutions were tasked with creating machine-generated responses to the following question: What developments related to <______________> are most impactful to the national security of the United States?
Precision, Recall, and Overfitting
Algorithms for report generation should generalize well across topics… not be overfitted to any one topic.
Applying cosine similarity allowed us to visualize these performance indicators.
[Chart: report similarity visualized on a spectrum from Overfitted to Generalizable]
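The cosine-similarity comparison described above can be sketched as follows. This is a minimal illustration, not the challenge's actual validation code: it uses a simple bag-of-words representation, and the report snippets are hypothetical. Reports on different topics that remain highly similar to one another suggest vague, unfocused output; low cross-topic similarity suggests the algorithm tailors each report to its topic.

```python
# Minimal sketch of cosine similarity between two generated reports,
# using bag-of-words term counts (illustrative only; the challenge's
# actual tooling is not described in detail in the briefing).
from collections import Counter
from math import sqrt

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity between two documents (0.0 to 1.0)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)          # shared-term weight
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical report snippets for two validation topics:
report_ai = "artificial intelligence systems reshape national security analysis"
report_nk = "north korea missile developments raise national security concerns"

print(round(cosine_similarity(report_ai, report_ai), 2))  # identical reports -> 1.0
print(cosine_similarity(report_ai, report_nk) < 1.0)      # distinct topics score lower
```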
Code Validation Metric 1: Topic Focus
This metric explored the similarity of content across topics used by a single algorithm (determining the focus or vagueness of content relative to the topic). Lighter colors reflect that algorithms were less focused on the topic, noisier, and more vague.
Code Validation Metric 2: # of Topics Generated
2 reports: #16 (AI, NK); #11 (AI, algorithms); #8 (AI, ML)
3 reports: #2 (AI, algorithms, social media)
5 reports: #1 (AI, ML, NK, software defined, social media); #5 (AI, algorithms, NK, software defined, UAV)
7 reports: #4, #6, #14 (all topics: artificial intelligence [AI], algorithms, machine learning [ML], North Korea [NK], software defined, social media, UAV)
This metric simply identifies if the algorithm generated a report on the 7 test topics. It does not measure the quality of the content.
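The topic-coverage count described above is simple to express in code. The sketch below is hypothetical (the data structure and example entries are assumptions for illustration, mirroring a subset of the slide's results); it counts only which of the 7 validation topics an algorithm produced a report for, not the quality of those reports.

```python
# Hypothetical sketch of Metric 2: count which of the 7 validation topics
# each algorithm produced a report on (coverage only, not content quality).
TOPICS = {"artificial intelligence", "algorithms", "machine learning",
          "north korea", "software defined", "social media", "uav"}

# Illustrative data (algorithm number -> topics it generated reports for),
# mirroring two entries from the briefing's results:
generated = {
    16: {"artificial intelligence", "north korea"},  # 2 reports
    4:  set(TOPICS),                                 # all 7 reports
}

def coverage(algo: int) -> int:
    """Number of the 7 test topics the algorithm produced a report on."""
    return len(generated.get(algo, set()) & TOPICS)

print(coverage(16))  # -> 2
print(coverage(4))   # -> 7
```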
And the Xpress Challenge Winners Are…
“ #4 ”
“ #2 ”
Final Thought on Xpress…
“[The Xpress Challenge] is an excellent opportunity for the IC to break new ground in ways we’ve never seen before, which could ultimately shape how we inform policymakers or enable the warfighter in the field—it just doesn’t get any better than that.”
— Sue Gordon, Principal Deputy Director for National Intelligence
The ODNI-OUSD(I) Xtend Challenge
The Xtend Challenge complements the Xpress Challenge by asking researchers to develop approaches for the machine evaluation of analytic products. It should also help improve the quality and consistency of traditional, human-generated analytic products before they are delivered to customers.
Initial submissions were evaluated based on:
– Overall scientific and technical merit
– Contribution and relevance to the Xtend Challenge objective
The Xtend Challenge closed on 15 January 2018.
A machine evaluation capability will be critical should the machine generation of high-quality analytic products prove possible.
Xtend Solver Base
186 registrants across 32 countries!
14 submissions!
And the Initial Xtend Challenge Winners Are…
Xtend Challenge winners continue to compete for an additional $50k in prizes.
ODNI is working with AIS to develop a follow-on research effort, the Artificial Intelligence-Derived Evaluation (AIDE) program, using thousands of evaluated and scored IC products.
The ODNI-OUSD(I) Xamine Challenge
The Xamine Challenge should also help ensure the accuracy and veracity of input information incorporated in traditional, human-generated analytic products.
The Xamine Challenge complements the Xpress Challenge and the Xtend Challenge by asking researchers to develop approaches for the machine inspection of information reports.
The Xamine Challenge closes on 2 July 2018. Register today for a chance to win prizes totaling $75,000!
Factors for assessing the trustworthiness of information before it is incorporated into an IC analytic product can include:
– Ensuring accuracy and completeness,
– Detecting possible denial and deception,
– Identifying unique, and possibly unverifiable, information,
– Determining the age and continued currency of information,
– Weighting the technical elements of collection, and
– Ascertaining source access, validation, motivation, possible bias, or level of expertise.
Xamine Challenge Solution Requirement
Proposed solutions should assess, using a proposed quantitative framework where possible, the credibility of the underlying sources and methodologies upon which information reports’ facts, opinions, or judgments are based, and should describe factors affecting source quality and credibility.
Xamine Challenge submissions require a written proposed solution describing novel technologies or improvements to existing technologies. Each submission should include:
– An executive summary (no longer than 1 page) of the proposed solution. By making a submission to this Challenge, Solvers agree to allow the executive summaries of their solutions to be posted on ODNI’s webpage and used in other publications reporting the results of this Challenge.
– A detailed description of the proposed solution relative to existing technologies that address the outlined Challenge. Proposed solution descriptions should not exceed 10 pages in length and should include discussion of how the solution meets the Challenge stated above.
– Drawings/sketches/visual aids of the proposed solution, if applicable.
– Optional (will not impact judging): a description of the resources, materials, budget, and proposed timeframe needed to develop a prototype capable of evaluating and numerically measuring the trustworthiness of ingested information.
Xamine Solver Base (as of 21 June 2018)
Next Step 1: The Xpect and Xplore Challenges
ODNI, again in partnership with the Office of the Under Secretary of Defense for Intelligence (OUSD[I]), in Fiscal Year 2019 will pursue two public prize competitions—“Xplore” and “Xpect”—to explore opportunities, using artificial intelligence (AI) techniques, to further catalyze enhancements to the IC’s finished intelligence production processes.
The Xpect Challenge will ask solvers to describe artificial intelligence-based approaches for automating model-based indications of change.
Through the Xplore Challenge, solvers will be asked to describe artificial intelligence-based approaches for enabling the automated and predictive discovery of information.
The Xpect and Xplore Challenges will likely follow the path of Xtend and Xamine (guaranteed $25k initial prize with $50k follow-up prize pool).
Next Step 2: Algorithm-Derived Decision Support (ADDS)
The Xpress Challenge showed that respectable results could be generated, often in seconds, but Solvers:
– had 90 days to prepare a single response,
– used 15,000 documents and blogs from only one source,
– only answered one family of questions, and
– only generated one product type.
ADDS will build on early Xpress Challenge successes by:
Using simulated crises* to exhibit the speed advantage of machine analytics,
Addressing a greater range of policymaker questions (political, economic, etc.),
Using classified, all-source reporting (SIGINT, HUMINT, imagery, press reporting, etc.), and
Through AIDE, leveraging Xtend Challenge results to employ automated scoring for real-time quality control.
Program Goal: The ADDS program will use a tournament to demonstrate the potential for machine-generated analytic products to provide timely and relevant decision advantage in a simulated crisis scenario.
* ADDS will use existing CIA scenarios, with proven reporting and taskings, to explore the potential for machine analytics.
Additional Approaches for Exploring Machine Analytics
Overview: A two-day, National Academies of Sciences (NAS)-led workshop leveraging expertise from industry, academia, and government to investigate issues around the application of artificial intelligence techniques to IC analytic tradecraft.
Questions addressed included:
– What are the technical objectives and metrics needed for success?
– What are the primary issues?
– What are the current and “next level” key performance metrics?
– What is the “after next level” of expected research and development performance?
– What is the research knowledge base?
– How can the U.S. Government best prepare the scientific workforce to enhance discovery in this area?
Held 9-10 August 2017
Workshop report available now!
https://www.nap.edu/catalog/24900/challenges-in-machine-generation-of-analytic-products-from-multi-source-data
An additional workshop focused on machine verification of uncertain data is scheduled for later this year in Silicon Valley.
Conclusion — Mitigating Bottlenecks in Decision Support
ODNI efforts to advance analysis with sensemaking techniques will continue to explore a promising parallel intelligence production pathway for decision advantage.
Rapid, AI-assisted provision of accurate and current intelligence products (within seconds or minutes of request), especially in the midst of a crisis, could enable crucial decision advantage for U.S. warfighters and policymakers.
Acknowledgments
ATIA set the stage for Xpress(+) by working with cleared industry to develop the initial technology roadmaps…
AFRL teamed with us to handle the contracting to access InnoCentive…
OUSD(I) co-sponsored Xpress, providing over half of the total project funding…
CEO of AFCEA agreed to let us use SIGNAL Magazine content …
ODNI/AIS evaluated the submissions based on their standards…
SIGNAL Magazine promoted the challenge in 3 articles…
ISSO helped with the source code validation…
Questions?
For more information:
[email protected] www.dni.gov/in-step www.dni.gov/in-vest