Searching for the Quantifiable, Scalable, Verifiable, and Understandable

Quantitative Methods in Defense of National Security, 25 May 2010. Dewey Murdick, Ph.D., Program Manager, Intelligence Advanced Research Projects Activity (IARPA).



Page 1

UNCLASSIFIED

Searching for the Quantifiable, Scalable, Verifiable, and Understandable

Quantitative Methods in Defense of National Security, 25 May 2010

Dewey Murdick, Ph.D.
Program Manager

Page 2

Intelligence Advanced Research Projects Activity (IARPA)

Dr. Lisa Porter, Director, IARPA
Dr. Peter Highnam, Office Director, Incisive Analysis
Dr. Pete Haaland, Office Director, Safe & Secure Operations
Dr. Ed Baranoski, Office Director, Smart Collection

Page 3

Overview

This is about taking real risk.
– This is NOT about “quick wins”, “low-hanging fruit”, “sure things”, etc.

CAVEAT: HIGH-RISK/HIGH-PAYOFF IS NOT A FREE PASS FOR STUPIDITY.
– Competent failure is acceptable; incompetence is not.

“Best and brightest”.
– World-class PMs.
  o IARPA will not start a program without a good idea and an exceptional person to lead its execution.
– Full and open competition to the greatest possible extent.

Cross-community focus.
– Address cross-community challenges
– Leverage agency expertise (both operational and R&D)
– Work transition strategies and plans

IARPA’s mission is to invest in high-risk/high-payoff research programs that have the potential to provide the U.S. with an overwhelming intelligence advantage over our future adversaries.

Page 4

The “P” in IARPA is very important

Technical and programmatic excellence are required

Each Program will have a clearly defined and measurable end-goal, typically 3-5 years out.
– Intermediate milestones to measure progress are also required
– Every Program has a beginning and an end
– A new program may be started that builds upon what has been accomplished in a previous program, but that new program must compete against all other new programs

This approach, coupled with rotational PM positions, ensures that…
– IARPA does not “institutionalize” programs
– Fresh ideas and perspectives are always coming in
– Status quo is always questioned
– Only the best ideas are pursued, and only the best performers are funded.

Page 5

The “Heilmeier Questions”

1. What are you trying to do?
2. How does this get done at present? Who does it? What are the limitations of the present approaches?
   – Are you aware of the state-of-the-art and have you thoroughly thought through all the options?
3. What is new about your approach? Why do you think you can be successful at this time?
   – Given that you’ve provided clear answers to 1 & 2, have you created a compelling option?
   – What does first-order analysis of your approach reveal?
4. If you succeed, what difference will it make?
   – Why should we care?
5. How long will it take? How much will it cost? What are your mid-term and final exams?
   – What is your program plan? How will you measure progress? What are your milestones/metrics? What is your transition strategy?

Page 6

The Three Strategic Thrusts (Offices)

Smart Collection: dramatically improve the value of collected data
– Innovative modeling and analysis approaches to identify where to look and what to collect.
– Novel approaches to access.
– Innovative methods to ensure the veracity of data collected from a variety of sources.

Incisive Analysis: maximizing insight from the information we collect, in a timely fashion
– Advanced tools and techniques that will enable effective use of large volumes of multiple and disparate sources of information.
– Innovative approaches (e.g., using virtual worlds, shared workspaces) that dramatically enhance insight and productivity.
– Methods that incorporate socio-cultural and linguistic factors into the analytic process.
– Estimation and communication of uncertainty and risk.

Safe and Secure Operations: countering new capabilities of our adversaries that could threaten our ability to operate effectively in a networked world
– Cybersecurity
  o Focus on future vulnerabilities
  o Approaches to advancing the "science" of cybersecurity, to include the development of fundamental laws and metrics
– Quantum information science & technology

Page 7

Program Manager Interest Areas by Office

[Figure: program manager interest areas grouped by office. Labels: smart collection; safe and secure operations; incisive analysis]

20 April 2010

Page 8

Concluding Thoughts on IARPA

Technical Excellence & Technical Truth
– Scientific Method
– Peer/independent review
– Full and open competition

We are looking for outstanding PMs.

How to find out more about IARPA: www.iarpa.gov

Page 9

Conference on Technical Information Discovery, Extraction & Organization
– Mark Heiligman, IARPA PM, Mile-wide, Mile-deep (M2) Exploration
– Held October 28-29, 2008; consisted of talks, breakout sessions, and open discussion
– Attended by 30+ researchers, business intelligence, and government participants

Facilitated an open and active discussion on current methods, challenges, and opportunities in:
– Information Retrieval
– Text Processing
– Knowledge Discovery
– Information Extraction
– Social Network Analysis
– Scientometrics
– Information Visualization
– Closely related research domains

Goal: Drive technical innovation and explore novel applications in the area of systematically mining the global technical literature for useful and non-obvious information and insights

This talk is a personal summary of the materials presented and discussed at the conference.

Page 10

M2 Information Content

Formal Presentations
– Mile-wide, Mile-deep, Mark Heiligman, IARPA
– Information Retrieval, Scientometrics/Text Mining, and Literature-related Discovery and Innovation, Ron Kostoff, MITRE
– From Knowledge Mapping to Innovation Evolution, Hsinchun Chen, University of Arizona
– Machine Learning for Extraction, Integration and Mining of Research Literature, Andrew McCallum, University of Massachusetts Amherst
– Information Retrieval: The Path Ahead, Jamie Callan, Carnegie Mellon University
– Sentiment Analysis from User Forums, Ronen Feldman, Hebrew University
– The Accuracy of a Map of Science: Measurement & Implications, Richard Klavans, SciTech Strategies, Inc
– Document Classification Using Nonnegative Matrix Factorization, Michael W. Berry, University of Tennessee, Knoxville

Breakout Sessions & Open Discussion – richest idea content, and biggest contribution to what follows

MITRE Summary:
– A Two-step Analytic-workshop Process For Identifying Promising Research Opportunities, by Ronald Kostoff et al.

Page 11

Problems

Too Much Data / Diversity
– Scale
– Textual / Multimedia
– Multilingual
– Multiple Sources

Too Complex
– Motivation (Create / Disseminate)
– Topics / Domains (# / Connectedness)
– Shared Intentionally or Not

Too Fast
– Streaming

Example for Technical Topics: Scientific Literature, Patents, Conference Proceedings, Talks, Technical Blogs, S&T News, Social Media, Experimental Data, Computational Models / Code, Forecasts, Corporate Filings, Government Funding, Policy, Public Opinion, etc.

Page 12

Weak Signals in Context

Find weak signals

Use weak signals within context for
– Finding connections
– Anomaly detection/rare events
– Cultural meaning / implications

Manage uncertainty

Develop new standards for “ground truth”

Page 13

Connecting Weak Signals

Automated Connection Making / Knowledge Discovery
Iterative information retrieval (IR), extraction (IE), and linkages identification
Leveraging previous relevancy judgments and feedback
Probabilistic linking of subjective qualities within text

Goal: find high-value, low-signature information in context

[Figure labels: Intriguing Rumors, Uncertain Source; “Material processing method X may be interesting for property Y”; Analyst; Analyst; Analyst w/ Quantitative System]
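One simple way to picture automated connection making is to look for terms that never appear together in any document but are joined by shared intermediate terms. The sketch below is only an illustration of that idea, not a method described in the talk: the function names, the toy corpus, and the choice to rank by the number of bridging terms are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Map each term to the set of terms it shares at least one document with."""
    neighbors = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def indirect_links(docs, start):
    """Rank terms never seen with `start` by how many intermediate terms bridge them."""
    neighbors = cooccurrence(docs)
    direct = neighbors.get(start, set())
    bridges = defaultdict(set)
    for mid in direct:
        for end in neighbors[mid]:
            # Keep only terms that have no direct co-occurrence with `start`.
            if end != start and end not in direct:
                bridges[end].add(mid)
    # Most bridging terms first; ties broken alphabetically for stable output.
    return sorted(bridges.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical toy corpus: each document reduced to a set of extracted terms.
docs = [
    {"method X", "grain refinement"},
    {"method X", "residual stress"},
    {"grain refinement", "property Y"},
    {"residual stress", "property Y"},
    {"residual stress", "corrosion"},
]

for term, via in indirect_links(docs, "method X"):
    print(term, "via", sorted(via))
```

In this toy corpus "method X" and "property Y" never share a document, yet two intermediate terms connect them, so "property Y" ranks first. A real system would need term extraction, weighting, and a way to carry uncertainty through each link.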

Page 14

Enhancing Contextual Awareness Automatically
– Leverage element characteristics in connection building process
– Focused information augmentation from secondary sources
– Characterize and apply to analogous situations
  o Network Behaviors and Features
  o Assessments of subjectivity (e.g., theme, sentiment)

Goal: rapidly inform non-experts with context about a given area/issue

[Figure labels: www; Context; S&T Literature; “Where does this nugget of information fit?”; Analyst]

Page 15

Identifying Outliers, Rare Events Automatically
– Measuring and analyzing low-frequency indicators in group trends
– Systematically identifying anomalies from records of interest and early-stage emerging technologies
– Identifying rare events based on non-technical phrase association patterns
– Extracting technical phrases of interest by targeting non-technical phrases such as sentiment, analysis, stylistics, etc.
– Intelligent clustering techniques

Goal: Identify significant rare events

[Figure labels: Bank statements; “Is Jim doing something illegal?”; Analyst]
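As a rough illustration of flagging an anomaly in a low-frequency indicator, the sketch below scores each observation with a modified z-score built from the median and the median absolute deviation (MAD), which a single spike distorts less than the mean and standard deviation. The technique is a stand-in chosen for this example, not one named in the talk; the function name, the 3.5 cutoff, and the monthly counts are assumptions.

```python
from statistics import median


def robust_outliers(counts, threshold=3.5):
    """Return (label, score) pairs whose modified z-score exceeds the threshold."""
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    flagged = []
    for label, value in counts.items():
        # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
        score = 0.6745 * (value - med) / mad
        if abs(score) > threshold:
            flagged.append((label, round(score, 2)))
    return flagged


# Hypothetical monthly counts of documents mentioning one technical phrase.
mentions = {
    "2009-01": 4, "2009-02": 5, "2009-03": 3, "2009-04": 4,
    "2009-05": 6, "2009-06": 5, "2009-07": 21,
}

print(robust_outliers(mentions))
```

Only the July count is flagged here. Whether a flagged point is a significant rare event or just noise still depends on context, which is the harder problem the slide points at.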

Page 16

Collaboration (Two Different Kinds)

Common playground facilitating:
– Large-scale data sharing
– Data discovery annotation
– Error corrections
– Multi-source integration
– Recall of what has been done in the past

Measure collaboration
– Recognize cultural differences
– Discover key players
– Process changes over time
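One very basic way to approach "discover key players" is to count how many distinct co-authors each person has across a set of papers (degree centrality in a co-authorship network). This is a minimal sketch under that assumption; the talk does not specify a measure, and the function name and author lists are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_degree(papers):
    """Rank authors by their number of distinct co-authors."""
    partners = defaultdict(set)
    for authors in papers:
        for a, b in combinations(set(authors), 2):
            partners[a].add(b)
            partners[b].add(a)
    # Highest degree first; ties broken alphabetically for stable output.
    return sorted(
        ((author, len(others)) for author, others in partners.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical author lists, one per paper.
papers = [
    ["A", "B"],
    ["A", "C"],
    ["A", "D", "B"],
    ["E", "D"],
]

for author, degree in coauthor_degree(papers):
    print(author, degree)
```

Authors who only publish alone do not appear in the ranking, and a raw co-author count ignores how collaboration changes over time, so this is a starting point rather than a full measure.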

Page 17

Multilingual Methods

Need algorithms that can process, filter, and analyze multilingual data

Leverage domain-specific machine translation

Compare and contrast translated and multilingual data for improvements in queries, trends, etc.

Language translation is high cost

Translation is not enough to understand meaning in non-English text

Cultural information helps to understand social landscape, motivation, and production of scientists in S&T


Page 18

No Black Boxes

No Algorithm black boxes
– Shared environment for algorithm development
– Success verifiable through indicator metrics
– Output must be humanly comprehensible

Human comprehension metrics:
  o Number of potential associations
  o Number of dimensions simultaneously analyzed
  o Steps to finding information
  o Amount of time to digest information
  o Amount of information at a time
  o Efficiency of user-driven tuning of level-of-detail

Algorithmic output exportable to interactive tools

Page 19

User-Friendly Displays for Data Analysis

Interactive and multifaceted views of scientific landscape
– Geo-location
– Entity Networks
– Topical Networks

Environments that provide both contextual awareness and visualizations
– Contextual information (Wikipedia style) provided when user encounters unfamiliar term or concept

Interactive interfaces to pull out information

Page 20

Metric Validation Processes

User studies and human labeling to verify data in information extraction (IE) and NLP are costly

Use hybrid methods (e.g., boosting)

Leverage automatically processed information from an external source to validate output

Automating identification of trusted sources to help validation process

Validate results with historical studies, knowledge of current state, and forecasts

Serious Need for Novel Thinking
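To make the validation step concrete, the sketch below compares a system's extracted items against a human-labeled gold standard using precision, recall, and F1. These are standard information extraction measures, offered here as one plausible reading of "validate output"; the function name and the example item sets are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Score a set of extracted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: entities a system extracted vs. entities humans labeled.
predicted = {"entity_1", "entity_2", "entity_3", "entity_9"}
gold = {"entity_1", "entity_2", "entity_3", "entity_4", "entity_5", "entity_6"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The scores are only as trustworthy as the gold standard itself. Because human labeling is costly, the gold set is often small, so the resulting numbers carry their own uncertainty.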

Page 21

Things to Remember

Track Uncertainty
– Indicator metrics
– Weak signals

No black boxes
– Human comprehensible output

Provide clear view of evaluation metrics
– Gold standards
– Ground truth

Page 22

Take Action

Respond to an open BAA

Chat with a Program Manager (PM)

Come up with new ideas for programs, become a PM

Provide information to open RFIs
