
Post on 21-Jan-2016


RAPID: Representation and Analysis of Probabilistic Intelligence Data

Carnegie Mellon University
PI: Prof. Jaime G. Carbonell / jgc@cs.cmu.edu / (412) 268-7279
Dr. Eugene Fink / e.fink@cs.cmu.edu / (412) 268-6593
Dr. Anatole Gershman / anatoleg@cs.cmu.edu / (412) 268-8259

DYNAMiX Technologies
POC: Dr. Ganesh Mani / gmani@dynamixtechnologies.com / (412) 401-0121
Mr. Dwight Dietrich / ddietrich@dynamixtechnologies.com / (724) 940-4304

PAINT

People

Carnegie Mellon
• Faculty: Jaime G. Carbonell, Eugene Fink, Anatole Gershman
• Students: Bin Fu, Diwakar Punjani, Andrew Yeager

DYNAMiX
• Principals: Dwight Dietrich, Ganesh Mani
• Engineers: Atul Bhandari, Jeremy Hermann, Veera Manda

Outline of the presentation

• RAPID functionality

• Preliminary demo

• Architecture and main components

• Integration with REALISM

• Current results and work plan

Analysis of uncertain intelligence

RAPID is a probabilistic reasoning engine for the analysis of dynamically evolving intelligence data.

[Figure: jigsaw analogy; panels labeled Initial knowledge and Intelligence results; axis labels: Available knowledge, Observable facts, Hidden facts]

RAPID will help:
• Identify important holes
• Locate most crucial missing pieces
• Insert these pieces

Knowledge sources:
• Public domain
• Intelligence
• Inferences

Analysis of uncertain intelligence

RAPID will help intelligence analysts to accomplish the following tasks.
• Draw probabilistic conclusions from available intelligence, including uncertain and missing data
• Identify potentially surprising developments
• Formulate and assess hypotheses
• Identify critical uncertainties
• Develop strategies for proactive collection of additional intelligence to resolve uncertainties, based on the analysis of cost / benefit trade-offs
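The cost / benefit trade-off in the last task can be pictured as ranking candidate probes by expected benefit per unit cost. The sketch below is only an illustration: the `Probe` class, the greedy strategy, and all numbers are assumptions made for this example, not the method used in RAPID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical intelligence-collection action (illustrative names)."""
    name: str
    cost: float              # cost of running the probe, arbitrary units
    expected_benefit: float  # expected reduction of uncertainty, arbitrary units


def select_probes(probes, budget):
    """Greedy selection: best benefit-to-cost ratio first, within the budget."""
    ranked = sorted(probes, key=lambda p: p.expected_benefit / p.cost, reverse=True)
    chosen, spent = [], 0.0
    for probe in ranked:
        if spent + probe.cost <= budget:
            chosen.append(probe)
            spent += probe.cost
    return chosen


# Made-up probes, used only to exercise the function.
probes = [
    Probe("interview source", cost=4.0, expected_benefit=6.0),
    Probe("search public records", cost=1.0, expected_benefit=3.0),
    Probe("commission imagery", cost=5.0, expected_benefit=5.5),
]
plan = select_probes(probes, budget=5.0)
print([p.name for p in plan])
```

A greedy ranking is a heuristic and does not guarantee the best plan for every budget; it is shown here because it makes the trade-off easy to see.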

[Figure labels: Massive new intelligence; Filtering and processing of new intelligence; Propagation of inferences; Analysis of key indicators; Development of intelligence-collection plans; Intelligence collection; Analysts]

Underlying functionality

• Representation of uncertainty: Novel representation of massive uncertain data, which supports fast matching and inferences
• Inferences from uncertain data: Scalable inference mechanism for reasoning about uncertain intelligence
• Analysis of critical uncertainties: Assessment of uncertain situations, evaluation of data utility, and identification of important missing data
• Proactive intelligence planning: Evaluation of available probes and construction of optimized intelligence-collection plans
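As a rough illustration of the first two items, uncertain facts can be stored with a probability, and an uncertain rule can pass a reduced probability to its conclusion. The slides do not describe the actual representation, so the `Rule` class, the independence assumption, and the example facts below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical uncertain rule: if all premises hold, the conclusion
    holds with probability `strength`."""
    premises: tuple
    conclusion: str
    strength: float


def apply_rule(facts, rule):
    """Probability the rule assigns to its conclusion, or None if a premise is unknown.

    Assumes the premises are independent, which is a simplification.
    """
    prob = rule.strength
    for premise in rule.premises:
        if premise not in facts:
            return None
        prob *= facts[premise]
    return prob


# Made-up facts: statement -> probability that it is true.
facts = {"lab imports precursor": 0.8, "lab hires specialists": 0.5}
rule = Rule(
    premises=("lab imports precursor", "lab hires specialists"),
    conclusion="lab scales up production",
    strength=0.9,
)
p = apply_rule(facts, rule)
print(rule.conclusion, round(p, 2))
```

Returning `None` for an unknown premise marks a piece of missing data, which is the kind of gap the analysis of critical uncertainties is meant to surface.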


Preliminary demo

Uncertainty analysis and probe evaluation, integrated into Excel.


Architecture

Uncertain situation assessment and data-collection planning
• Uncertainty calculus and proactive probe planning: Excel extension for the analysis of uncertainty, probes, and proactive data collection. Advanced analysis of incomplete data, identification of critical uncertainties, evaluation and selection of probes, what-if analysis, and visualization.
• Scalable assessment of uncertain intelligence: Relational database of uncertain data and inference rules. A large-scale database of incomplete and uncertain facts, uncertain inference rules, and hypotheses, which allows scalable planning of proactive data collection.
• An advanced API for integration with other systems.

Analyst interface
• Optional user interface for the integrated access to all system components, which extends the standard Excel interface.

Architecture

[Diagram] Components and link labels:
• General intelligence collection and proactive intelligence collection (massive new intelligence)
• Processing of data streams: Real-time matching of queries and inference rules against a massive stream of new data. Fast database operations on a stream of newly incoming data, and integration of this stream with the static database.
• Uncertain situation assessment and data-collection planning
• Uncertainty calculus and proactive probe planning: Excel extension for the analysis of uncertainty, probes, and proactive data collection
• Scalable assessment of uncertain intelligence: Relational database of uncertain data and inference rules
• Analyst interface
• Link labels: Hypotheses, conclusions, and data-collection plans; Approved plans for proactive data collection

Architecture

[Diagram] Components and link labels:
• General intelligence collection and proactive intelligence collection (massive new intelligence)
• Processing of data streams: Real-time matching of queries and inference rules against a massive stream of new data
• Value-added reasoning tools
• Uncertain situation assessment and data-collection planning
• Uncertainty calculus and proactive probe planning: Excel extension for the analysis of uncertainty, probes, and proactive data collection
• Scalable assessment of uncertain intelligence: Relational database of uncertain data and inference rules
• Analyst interface
• Link labels: Hypotheses, conclusions, and data-collection plans; Approved plans for proactive data collection

Uncertainty calculus and proactive probe planning (Microsoft Excel)

Uncertainty analysis
• Representation of probability distributions and qualitative uncertainty
• Uncertainty arithmetic

Situation assessment
• Representation of data utility
• Tracking utility changes during data collection
• Identification of critical uncertainties

Proactive probe planning
• Representation of probes
• Evaluation of probe utility
• Automated selection and launching of critical probes

Contingency planning
• What-if analysis of alternative future developments and data-collection plans based on an extension of Excel “scenarios”

[Diagram: connected components are Processing of data streams, Value-added reasoning tools, Uncertainty database, and Analyst interface]
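To make "uncertainty arithmetic" concrete, the sketch below adds two uncertain quantities that are each given as a discrete probability distribution. It assumes the quantities are independent; the dictionary representation and the example values are assumptions for illustration, not the calculus implemented in RAPID.

```python
from collections import defaultdict


def add_distributions(x, y):
    """Distribution of X + Y for independent discrete X and Y.

    Each distribution is a dict that maps a value to its probability.
    """
    result = defaultdict(float)
    for vx, px in x.items():
        for vy, py in y.items():
            result[vx + vy] += px * py
    return dict(result)


def mean(dist):
    """Expected value of a discrete distribution."""
    return sum(value * prob for value, prob in dist.items())


# Made-up estimates of staff numbers at two sites.
staff_site_a = {10: 0.5, 20: 0.5}
staff_site_b = {5: 0.25, 15: 0.75}
total = add_distributions(staff_site_a, staff_site_b)
print(total, mean(total))
```

The same pattern extends to other operations by replacing `vx + vy` with the operation of interest, as long as the independence assumption is acceptable.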

Scalable assessment of uncertain intelligence

[Diagram] Labels:
• Uncertain facts
• Uncertain inference rules
• Goals, queries, and hypotheses
• Semantic network
• Conflict detection
• Inferred facts
• Learned inference rules
• Query matches
• Evaluation of hypotheses
• Critical uncertainties
• Prioritized plans for proactive data collection
• Analyst interface: Manual entry, selection, and editing of knowledge

Value-added reasoning tools

These tools are not essential for the core functionality.

• ARGUS data explorer: Identification of patterns and their gradual changes in massive data streams
• Contingency analysis: What-if analysis of alternative hypotheses, data-collection plans, and possible future developments
• Markov reasoning: Selection of most likely hypotheses and possible future developments
• Adversarial search: Analysis of possible concealment and disinformation, and plans to prevent them
• Entity co-reference: Identification of syntactically different words that refer to the same objects

Part of uncertainty database: Known patterns; Alternative scenarios and their implications; Markov models; Adversarial goals and resources

Uncertainty calculus and proactive probe planning: Excel extension for the analysis of uncertainty, probes, and proactive data collection

The available intelligence data and inference rules are in Excel tables, and in the uncertainty database integrated with Excel.
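The entity co-reference tool is described only at a high level. As a minimal sketch of the idea, the code below groups names whose normalized spellings are similar, using string similarity from Python's standard `difflib` module. This is a stand-in technique chosen for illustration; the threshold, the helper names, and the example spellings are assumptions, and the actual tool may work quite differently.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_coreferent(names, threshold=0.8):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Made-up spelling variants, used only to exercise the function.
names = ["Carnegie Mellon University", "Carnegie-Mellon Univ.", "DYNAMiX Technologies"]
groups = group_coreferent(names)
print(groups)
```

String similarity catches spelling variants only; it cannot tell that two unrelated-looking names denote the same object, which would need additional context.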

Analyst interface

• Optional extension of the Excel interface
• Visualization and explanation of intelligence data, inferences, and data-collection plans


Integration goals

We will integrate the text-extraction system developed by HNC / Fair Isaac with the uncertainty-analysis system developed by CMU / DYNAMiX. The integrated system will support the following capabilities.
• Extraction of facts, relations, and causal links from natural-language documents
• Evaluation of given hypotheses
• Proactive information gathering
• Application to the analysis of Iranian nano-technology plans and capabilities

Inputs and outputs

REALISM

Input:
• Requirements and filters for the information extraction
• Natural-language documents
• World-wide web

Output:
• Large structured tables of relevant facts and entities, which include uncertainty
• Inference-rule representation of relations and causal links, also including uncertainty

RAPID

Input:
• Tables of uncertain facts
• Uncertain inference rules
• Queries for specific data
• Analyst hypotheses

Output:
• Inferences from uncertain data
• Exact and approximate matches for given queries
• Hypothesis assessment
• Proactive plans for collecting additional data
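Hypothesis assessment, listed among the RAPID outputs, can be illustrated with a Bayesian update over competing hypotheses. The slides do not say which method RAPID uses, so the sketch below is an assumption made for illustration, and the hypothesis names and numbers are made up.

```python
def posterior(prior, likelihood):
    """Bayes' rule over competing hypotheses.

    prior:      dict hypothesis -> P(H)
    likelihood: dict hypothesis -> P(evidence | H)
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: p / evidence for h, p in joint.items()}


# Two made-up hypotheses that start out equally likely.
prior = {"H1": 0.5, "H2": 0.5}
# A new uncertain fact is three times as likely under H1 as under H2.
likelihood = {"H1": 0.75, "H2": 0.25}
post = posterior(prior, likelihood)
print(post)
```

A fact whose likelihood differs sharply between hypotheses shifts the posterior the most, which gives one way to think about which missing data is worth collecting.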

Architecture

[Diagram] REALISM (HNC / Fair Isaac) and RAPID (CMU / DYNAMiX), with the following labeled links and components:
• Topic filters
• Information requests
• Structured facts and entities
• Structured relations and causal links
• Hypotheses, conclusions, and data-collection plans
• Analyst interface
• Uncertain situation assessment and data-collection planning, which includes the uncertainty calculus and proactive probe planning, and the scalable assessment of uncertain intelligence


Initial results

• Detailed technical plan of uncertain situation assessment and proactive probe planning: architecture, functionality, and algorithms
• Uncertain intelligence scenario based on public data about Iranian nano-technology
• Preliminary prototype of situation assessment tools integrated with a relational database
• Preliminary prototype of a tool for the resolution of entity co-references
• Application of DYNAMiX Data Explorer to the nano-tech conference data provided by PAINT

Current work

• Uncertainty calculus, integrated with Excel

• Proactive probe planning

• Scalable uncertainty assessment, integrated with a relational database

• Integration with REALISM

• Initial analyst interface

Short-term plan

• March: Prototype of uncertainty calculus
• March: Prototype of probe-planning tools
• May: Initial RAPID / REALISM integration
• June: Initial analyst interface (extended Excel)
• July: Prototype of uncertainty database

Long-term plan

• July 2008: Uncertain situation assessment and proactive probe planning
• July 2009: Discrimination among competing hypotheses and identification of critical uncertainties
• July 2009: Fully integrated deployable prototype
• July 2010: Advanced proactive-intelligence planning and learning of inference rules
• July 2011: Value-added tools, which may include data-stream processing, entity co-reference, adversarial search, and Markov reasoning
• Jan 2012: Fully integrated deliverable system

All versions of RAPID will demonstrate all main capabilities, with increasing functionality over time.

Evaluation

We expect that RAPID will provide a significant advantage over available off-the-shelf tools, such as standard spreadsheets and database systems.

To support this claim, we plan to compare the productivity of analysts using RAPID with that of analysts who perform the same tasks using commercially available tools.

• Experimental group: Use of RAPID
• Control group: Use of standard tools

We will view RAPID as a success if it consistently outperforms the standard tools and the analysts report an overall positive experience of using it.

Adjustment of the earlier plan

We need to adjust the plan to the new budget. We will deliver the full core functionality, but we propose to reduce the work on value-added tools.

Reduced work
• Processing of data streams
• Advanced contingency analysis
• Analyst interface

Suspended work
• Predictive Markov models
• Analysis of adversarial actions

Appendices

• Previous work

• Empirical evaluation

• PAINT contributions

ARGUS

ARGUS project sponsored by DTO/ARDA: Identification and tracking of novel patterns in massive databases and data streams.

[Diagram] Processing steps: Create background model; Detect novel events; Generate profiles; Re-cluster; Update profiles; Match. Data and results: Historical data; New data; Background model; Novel events; Novel clusters; Tracked events; New profiles; Profiles; Alerts; Analysts.

ARGUS

• Estimate the density function at t0
• Grow the cluster for a period of Δt while reducing the weight of old records
• Estimate the new density function at t0+Δt
• Compare the two estimates
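The four steps above can be sketched with a weighted histogram standing in for the density function. The slide does not specify the estimator, the weighting scheme, or the comparison measure, so the histogram bins, the constant decay factor, and the total-variation distance used below are all assumptions chosen to keep the example short.

```python
def weighted_histogram(records, bins):
    """Normalized weighted histogram; records are (value, weight) pairs.

    `bins` is a sorted list of bin edges; values outside the edges are ignored.
    """
    counts = [0.0] * (len(bins) - 1)
    for value, weight in records:
        for i in range(len(counts)):
            if bins[i] <= value < bins[i + 1]:
                counts[i] += weight
                break
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def density_change(old_records, new_values, bins, decay=0.5):
    """Total-variation distance between the estimates at t0 and at t0 + Δt."""
    before = weighted_histogram(old_records, bins)
    # Reduce the weight of old records, then add the new ones at full weight.
    grown = [(v, w * decay) for v, w in old_records] + [(v, 1.0) for v in new_values]
    after = weighted_histogram(grown, bins)
    change = 0.5 * sum(abs(a - b) for a, b in zip(before, after))
    return change, grown


# Made-up one-dimensional records.
bins = [0, 10, 20, 30]
old = [(5, 1.0), (6, 1.0), (15, 1.0), (25, 1.0)]
new = [26, 27, 28, 29]
change, grown = density_change(old, new, bins)
print(change)
```

A change close to zero means the new records follow the old distribution; a large change suggests that the cluster is shifting and may need re-clustering.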

ARGUS

[Figure: re-clustering at t0 + Δt; clusters labeled Respiratory Diseases and SARS; density change]

RADAR

RADAR project sponsored by DARPA: Analysis and management of volatile crisis situations based on uncertain data.

[Diagram] Components:
• Parser: Process new data
• Optimizer: Update crisis-management plans
• Data elicitor: Suggest data-collection strategies
• Top-level control and learning
• Analysts

RADAR

We have applied the system to repair a schedule of a conference after a crisis loss of rooms.

Manual and auto repair (schedule quality):
• After crisis: 0.50
• Manual repair: 0.61
• Auto without elicitation: 0.72
• Auto with elicitation: 0.93

[Plot: dependency of the quality on the number of questions; y-axis: schedule quality, with 0.72 and 0.93 marked; x-axis: number of questions (out of 1100), from 0 to 100]

RAPID

Unlike ARGUS…
• Represents and analyzes uncertainty
• Supports complex inferences

Unlike RADAR…
• Scales to massive intelligence datasets
• Analyzes complex “external” situations
• Develops intelligence-collection plans


Evaluation goals

We expect that RAPID will provide a significant advantage over available off-the-shelf tools, such as standard spreadsheets and database systems.

To support this claim, we plan to compare the productivity of analysts using RAPID with that of analysts who perform the same tasks using commercially available tools.

• Experimental group: Use of RAPID
• Control group: Use of standard tools

Experimental setup

We expect to recruit retired intelligence analysts for the system evaluation, and ask them to perform several tasks based on given uncertain data.

• Identify the data most relevant to given tasks

• Evaluate the validity of given hypotheses

• Find relevant hidden patterns

• Identify critical missing data and propose a cost-effective plan for collecting this data

Performance measurements

We will measure the following main factors to evaluate the performance of analysts:

• Number of high-level tasks completed within the experiment time frame

• Accuracy of hypothesis evaluation

• Number and relevance of identified patterns

• Effectiveness and costs of data-collection plans

We will also ask analysts to complete a questionnaire on their overall experience.

Expected results

We will view the proposed work as a success if
• RAPID consistently outperforms the off-the-shelf tools in all four performance factors,
• the performance difference for each factor is statistically significant, and
• analysts report an overall positive experience of using the system.
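The slides do not name the statistical test for the second criterion. One option that makes few assumptions about the data is a permutation test on the difference of group means, sketched below; the function name, the number of trials, and the task counts are made up for illustration.

```python
import random


def permutation_test(group_a, group_b, trials=10_000, seed=0):
    """One-sided p-value for mean(group_a) > mean(group_b) under random relabeling."""
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            hits += 1
    # Add one to both counts so that the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Made-up numbers of tasks completed by six analysts in each group.
experimental = [9, 8, 10, 9, 11, 10]  # use of RAPID
control = [6, 7, 5, 8, 6, 7]          # use of standard tools
p_value = permutation_test(experimental, control)
print(p_value < 0.05)
```

The same function would be applied separately to each of the four performance factors, with a correction for multiple comparisons if the four results are interpreted together.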

RAPID / REALISM evaluation

Component utility: We will also evaluate the utility of REALISM and RAPID by comparing the productivity of subjects under the following three conditions:
• Use of the integrated system
• Use of REALISM without RAPID
• Use of RAPID without REALISM

Component evaluation: We will measure the following performance factors:
• Accuracy and completeness of text extraction
• Accuracy of hypothesis evaluation
• Effectiveness of data-collection plans
• Speed of each system component


Main contributions

[Diagram labels: Data; Models; Dynamic Simulation; Strategy Generation and Exploration; Response Options; Feedback; numbered links 1 to 4]

• Representation of massive uncertain knowledge
• Automated discovery of causal relationships
• Fast probabilistic integration of all evidence
• Analysis of possible future developments
• Identification of critical uncertainties
• Planning of proactive intelligence gathering

Inputs and outputs

[Diagram] RAPID: Uncertain situation assessment

Inputs:
• Uncertain intelligence and analyst opinions: Massive stream of structured records; Specific hypotheses; Data-search queries
• Inference rules
• Domain knowledge

Outputs:
• Query matches
• Evaluation of hypotheses
• New learned rules
• Plans for proactive intelligence collection

Intelligence sources: General intelligence collection; Proactive intelligence collection

Inputs

From other PAINT components:
• Available intelligence data and its certainty
• Hypotheses about unknown factors
• Available domain knowledge

From analysts:
• Intelligence-analysis tasks and priorities
• Hypotheses and related opinions
• Responses to RAPID-generated probes
• Additional domain knowledge

From other sources:
• Databases with available intelligence
• Public databases with relevant data

Outputs

• Inferences from available uncertain data
• Evaluation of given hypotheses
• New hypotheses and their certainties
• Plans for proactive intelligence collection
• Learned inference rules