TRANSCRIPT
Overview of Current and Planned ODNI-led Sensemaking Research Activities
USGIF Machine Learning and Artificial Intelligence Working Group
26 June 2018
Dr. David M. Isaacson, Program Manager, Machine Analytics Research
Acquisition, Technology, and Facilities
SCIENCE & TECHNOLOGY
Agenda
Motivation
In-VEST Program
– Xpress Challenge
– Xtend Challenge
– Xamine Challenge
Next Steps
– Xpect Challenge + Xplore Challenge
– Transitions: ADDS and AIDE programs
– Technical workshops
Motivation — Mitigating Bottlenecks in Decision Support
Sensemaking techniques offer a promising alternative for decision advantage by sidestepping the latency and production bandwidth issues associated with traditional analysis’ production, review, and editing processes.
In-VEST Xploratory Challenge Series
Launched in 2017 and continuing into 2018, ODNI’s In-VEST program is pursuing a series of prize challenges aimed at ascertaining and advancing the state of the art in natural language processing, cognitive computing, and other artificial intelligence approaches with the potential to revolutionize IC capabilities. Collectively, these efforts explore technical opportunities for accelerating and automating the production of intelligence.
ODNI’s Intelligence Ventures in Exploratory Science and Technology (In-VEST) program seeks to catalyze disruptive research approaches for addressing IC needs.
The ODNI-OUSD(I) Xpress Challenge
The Xpress Challenge explored opportunities for machine analytics to generate finished intelligence products to inform policymakers and warfighters.
Timeline:
6 April 2017: Xpress Challenge opened
5 July 2017: Xpress Challenge closed
8 September 2017: Evaluation complete
28 November 2017: Code validation complete
The Xpress Challenge strived for an “apples-to-apples” comparison to IC analytic production: i.e., used a policymaker-relevant “intelligence” question, used established IC evaluation criteria, etc.
Key details:
• Addressed a single family of questions
• Used a corpus of nearly 15,000 documents
• Evaluated based on ODNI/Analytic Integrity and Standards criteria (ICD 203)
Xpress Source Material
Xpress Challenge solvers had access to a .zip file containing .xml files of roughly 15,000 SIGNAL articles, columns, and blog entries going back several years. The use of SIGNAL solved a number of problems, including copyright issues, lack of control over source material, code validation, and avoiding irrelevant websites.
For the competition, solvers were asked to craft machine-generated responses to the following question: What developments related to artificial intelligence are most impactful to the national security of the United States?
Xpress Evaluation Methodology
AIS uses a 0-3 scale, with “2” representing the CIA/DA standard. Products are judged on whether they exceed that standard (3), meet it (2), partially meet it (1), or fall short (0).
• 3: Exceeds the IC standards in all aspects of a criterion.
• 2: Meets the IC standards in all aspects of a criterion. If a piece falls short in any aspect of a criterion, it should not receive a “2.”
• 1: Meets some aspects of the criterion’s standard but fails to meet other aspects of that criterion.
• 0: Falls below standards on all aspects of the criterion’s standard.
• NA: Numerical evaluation is not warranted.
Evaluation of all Xpress Challenge submissions was performed by ODNI’s Analytic Integrity and Standards (AIS) against the ODNI Rating Scale for Evaluating Analytic Tradecraft Standards. Importantly, AIS reviewers were NOT told they were evaluating machine-generated products!
About the ICD-203 Standards: the evaluation criteria are grouped into Literal, Inferential, and Evaluative categories, with a Not Used designation for criteria that are not applied.
Sample Product Provided to Solvers
All content in this sample document was extracted directly from SIGNAL articles and arranged by the Xpress team in the preferred format.
AIS Scoring of Sample Product against the Six Criteria
Additional information on how the provided Sample Product was scored for all six criteria is included in the Backup material.
Xpress Challenge Award Schedule
To be eligible for Overall Best Submission awards, submitted Analytic Products must receive a score of Fair (1) or above for each evaluation criterion. Winners in the category award areas of Literal, Inferential, and Evaluative are determined by the highest score for the criteria in the respective category, regardless of performance in the other categories.
Xpress Solver Base
387 registrants across 42 countries!
15 submissions!
AIS Evaluation of Xpress Submissions
Paper #   Literal 1   Literal 2   Inferential 1   Inferential 2   Evaluative 1   Evaluative 2   Total
1         0           1           2               1               0              NA             4
2         1           1           1               0               0              0              3
4         1           1           1               0               1              NA             4
5         1           1           1               1               0              NA             4
6         1           2           1               0               1              1              6
7         0           1           2               1               0              NA             5
8         1           2           2               1               1              NA             7
10        0           1           2               NA              0              NA             3
11        0           1           0               0               1              NA             2
13        1           1           1               0               1              0              4
14        0           1           1               0               0              0              2
15        0           1           0               0               0              NA             1
16        1           1           NA              NA              0              NA             2
Total     7           15          14              4               5              1
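As an illustrative sketch (an assumption about the arithmetic, not the official AIS tooling), the per-paper totals in the table can be computed by summing the six criterion scores while excluding NA entries, since NA indicates that a numerical evaluation was not warranted:

```python
# Illustrative sketch (assumed, not the official AIS scoring code):
# total a paper's criterion scores on the 0-3 scale, skipping "NA"
# entries, which mark criteria where numerical evaluation was not warranted.
def total_score(scores):
    """Sum the numeric criterion scores, excluding NA entries."""
    return sum(s for s in scores if s != "NA")

# Paper #8 from the table: Literal 1/2, Inferential 2/1, Evaluative 1/NA
print(total_score([1, 2, 2, 1, 1, "NA"]))  # -> 7
```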
Code validation on additional, related topics impacted these standings...
Code Validation of Xpress Submissions
The Xpress Challenge documentation explicitly required solutions to be able to address other questions posed in the format provided.
Based on AIS’ results, the performance of ten of the Xpress Challenge algorithms was explored: 1, 2, 4, 5, 6, 8, 11, 13, 14, and 16.
The algorithms were tasked with producing reports against 7 topics using Xpress’ corpus of SIGNAL articles: artificial intelligence, algorithms, machine learning, North Korea, software defined, social media, and UAV.
For code validation, solvers’ solutions were tasked with creating machine-generated responses to the following question: What developments related to <______________> are most impactful to the national security of the United States?
Precision, Recall, and Overfitting
Algorithms for report generation should generalize well across topics… not be overfitted to any one topic.
Applying cosine similarity allowed us to visualize these performance indicators.
[Chart: report similarity visualized on a spectrum from Overfitted to Generalizable]
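The cosine-similarity comparison described above can be sketched as follows. This is a minimal illustration, not the challenge's actual validation code: it uses a simple bag-of-words representation, and the report snippets are hypothetical. Reports on different topics that remain highly similar to one another suggest vague, unfocused output; low cross-topic similarity suggests the algorithm tailors each report to its topic.

```python
# Minimal sketch of cosine similarity between two generated reports,
# using bag-of-words term counts (illustrative only; the challenge's
# actual tooling is not described in detail in the briefing).
from collections import Counter
from math import sqrt

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity between two documents (0.0 to 1.0)."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)          # shared-term weight
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical report snippets for two validation topics:
report_ai = "artificial intelligence systems reshape national security analysis"
report_nk = "north korea missile developments raise national security concerns"

print(round(cosine_similarity(report_ai, report_ai), 2))  # identical reports -> 1.0
print(cosine_similarity(report_ai, report_nk) < 1.0)      # distinct topics score lower
```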
Code Validation Metric 1: Topic Focus
This metric explored the similarity of content across topics used by a single algorithm (determining the focus or vagueness of content relative to the topic). Lighter colors reflect that algorithms were less focused on the topic, noisier, and more vague.
Code Validation Metric 2: # of Topics Generated
2 reports: #16 (AI, NK); #11 (AI, algorithms); #8 (AI, ML)
3 reports: #2 (AI, algorithms, social media)
5 reports: #1 (AI, ML, NK, software defined, social media); #5 (AI, algorithms, NK, software defined, UAV)
7 reports: #4, #6, #14 (all topics: artificial intelligence [AI], algorithms, machine learning [ML], North Korea [NK], software defined, social media, UAV)
This metric simply identifies if the algorithm generated a report on the 7 test topics. It does not measure the quality of the content.
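The topic-coverage count described above is simple to express in code. The sketch below is hypothetical (the data structure and example entries are assumptions for illustration, mirroring a subset of the slide's results); it counts only which of the 7 validation topics an algorithm produced a report for, not the quality of those reports.

```python
# Hypothetical sketch of Metric 2: count which of the 7 validation topics
# each algorithm produced a report on (coverage only, not content quality).
TOPICS = {"artificial intelligence", "algorithms", "machine learning",
          "north korea", "software defined", "social media", "uav"}

# Illustrative data (algorithm number -> topics it generated reports for),
# mirroring two entries from the briefing's results:
generated = {
    16: {"artificial intelligence", "north korea"},  # 2 reports
    4:  set(TOPICS),                                 # all 7 reports
}

def coverage(algo: int) -> int:
    """Number of the 7 test topics the algorithm produced a report on."""
    return len(generated.get(algo, set()) & TOPICS)

print(coverage(16))  # -> 2
print(coverage(4))   # -> 7
```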
And the Xpress Challenge Winners Are…
“ #4 ”
“ #2 ”
Final Thought on Xpress…
“[The Xpress Challenge] is an excellent opportunity for the IC to break new ground in ways we’ve never seen before, which could ultimately shape how we inform policymakers or enable the warfighter in the field—it just doesn’t get any better than that.”
— Sue Gordon, Principal Deputy Director for National Intelligence
The ODNI-OUSD(I) Xtend Challenge
The Xtend Challenge complements the Xpress Challenge by asking researchers to develop approaches for the machine evaluation of analytic products. It should also help improve the quality and consistency of traditional, human-generated analytic products before they are delivered to customers.
Initial submissions were evaluated based on:
– Overall scientific and technical merit
– Contribution and relevance to the Xtend Challenge objective
The Xtend Challenge closed on 15 January 2018.
A machine evaluation capability will be critical should the machine generation of high-quality analytic products prove possible.
Xtend Solver Base
186 registrants across 32 countries!
14 submissions!
And the Initial Xtend Challenge Winners Are…
Xtend Challenge winners continue to compete for an additional $50k in prizes.
ODNI is working with AIS to develop a follow-on research effort, the Artificial Intelligence-Derived Evaluation (AIDE) program, using thousands of evaluated and scored IC products.
The ODNI-OUSD(I) Xamine Challenge
The Xamine Challenge should also help ensure the accuracy and veracity of input information incorporated in traditional, human-generated analytic products.
The Xamine Challenge complements the Xpress Challenge and the Xtend Challenge by asking researchers to develop approaches for the machine inspection of information reports.
The Xamine Challenge closes on 2 July 2018. Register today for a chance to win prizes totaling $75,000!
Factors for assessing the trustworthiness of information before it is incorporated into an IC analytic product can include:
– Ensuring accuracy and completeness,
– Detecting possible denial and deception,
– Identifying unique, and possibly unverifiable, information,
– Determining the age and continued currency of information,
– Weighting the technical elements of collection, and
– Ascertaining source access, validation, motivation, possible bias, or level of expertise.
Xamine Challenge Solution Requirement
Proposed solutions should assess, using a proposed quantitative framework where possible, the credibility of the underlying sources and methodologies upon which information reports’ facts, opinions, or judgments are based, and should describe factors affecting source quality and credibility.
Xamine Challenge submissions require a written proposed solution describing novel technologies or improvements to existing technologies. Each submission should include:
– An executive summary (no longer than 1 page) of the proposed solution. By making a submission to this Challenge, Solvers agree to allow the executive summaries of their solutions to be posted on ODNI’s webpage and used in other publications reporting the results of this Challenge.
– A detailed description of the proposed solution relative to existing technologies that address the outlined Challenge. Proposed solution descriptions should not exceed 10 pages in length and should include discussion of how the solution meets the Challenge stated above.
– Drawings/sketches/visual aids of the proposed solution, if applicable.
– Optional (will not impact judging): a description of the resources, materials, budget, and proposed timeframe needed to develop a prototype capable of evaluating and numerically measuring the trustworthiness of ingested information.
Xamine Solver Base (as of 21 June 2018)
Next Step 1: The Xpect and Xplore Challenges
ODNI, again in partnership with the Office of the Under Secretary of Defense for Intelligence (OUSD[I]), in Fiscal Year 2019 will pursue two public prize competitions—“Xplore” and “Xpect”—to explore opportunities, using artificial intelligence (AI) techniques, to further catalyze enhancements to the IC’s finished intelligence production processes.
The Xpect Challenge will ask solvers to describe artificial intelligence-based approaches for automating model-based indications of change.
Through the Xplore Challenge, solvers will be asked to describe artificial intelligence-based approaches for enabling the automated and predictive discovery of information.
The Xpect and Xplore Challenges will likely follow the path of Xtend and Xamine (guaranteed $25k initial prize with $50k follow-up prize pool).
Next Step 2: Algorithm-Derived Decision Support (ADDS)
The Xpress Challenge showed that respectable results could be generated, often in seconds, but Solvers:
– had 90 days to prepare a single response,
– used 15,000 documents and blogs from only one source,
– only answered one family of questions, and
– only generated one product type.
ADDS will build on early Xpress Challenge successes by:
Using simulated crises* to exhibit the speed advantage of machine analytics,
Addressing a greater range of policymaker questions (political, economic, etc.),
Using classified, all-source reporting (SIGINT, HUMINT, imagery, press reporting, etc.), and
Through AIDE, leveraging Xtend Challenge results to employ automated scoring for real-time quality control.
Program Goal: The ADDS program will use a tournament to demonstrate the potential for machine-generated analytic products to provide timely and relevant decision advantage in a simulated crisis scenario.
* ADDS will use existing CIA scenarios, with proven reporting and taskings, to explore the potential for machine analytics.
Additional Approaches for Exploring Machine Analytics
Overview: A two-day, National Academies of Sciences (NAS)-led workshop leveraging expertise from industry, academia, and government to investigate issues around the application of artificial intelligence techniques to IC analytic tradecraft.
Questions addressed included:
– What are the technical objectives and metrics needed for success?
– What are the primary issues?
– What are the current and “next level” key performance metrics?
– What is the “after next level” of expected research and development performance?
– What is the research knowledge base?
– How can the U.S. Government best prepare the scientific workforce to enhance discovery in this area?
Held 9-10 August 2017
Workshop report available now!
https://www.nap.edu/catalog/24900/challenges-in-machine-generation-of-analytic-products-from-multi-source-data
An additional workshop focused on machine verification of uncertain data is scheduled for later this year in Silicon Valley.
Conclusion — Mitigating Bottlenecks in Decision Support
ODNI efforts to advance analysis with sensemaking techniques will continue to explore a promising parallel intelligence production pathway for decision advantage.
Rapid, AI-assisted provision of accurate and current intelligence products (within seconds or minutes of request), especially in the midst of a crisis, could enable crucial decision advantage for U.S. warfighters and policymakers.
Acknowledgments
ATIA set the stage for Xpress(+) by working with cleared industry to develop the initial technology roadmaps…
AFRL teamed with us to handle the contracting to access InnoCentive…
OUSD(I) co-sponsored Xpress, providing over half of the total project funding…
CEO of AFCEA agreed to let us use SIGNAL Magazine content …
ODNI/AIS evaluated the submissions based on their standards…
SIGNAL Magazine promoted the challenge in 3 articles…
ISSO helped with the source code validation…
Questions?
For more information:
[email protected] www.dni.gov/in-step www.dni.gov/in-vest