1
RAPID: Representation and Analysis of Probabilistic Intelligence Data
Carnegie Mellon University
PI: Prof. Jaime G. Carbonell / [email protected] / (412) 268-7279
Dr. Eugene Fink / [email protected] / (412) 268-6593
Dr. Anatole Gershman / [email protected] / (412) 268-8259
DYNAMiX Technologies
POC: Dr. Ganesh Mani / [email protected] / (412) 401-0121
Mr. Dwight Dietrich / [email protected] / (724) 940-4304
PAINT
2
People
Carnegie Mellon
Faculty: Jaime G. Carbonell, Eugene Fink, Anatole Gershman
Students: Bin Fu, Diwakar Punjani, Andrew Yeager
DYNAMiX
Principals: Dwight Dietrich, Ganesh Mani
Engineers: Atul Bhandari, Jeremy Hermann, Veera Manda
3
Outline of the presentation
• RAPID functionality
• Preliminary demo
• Architecture and main components
• Integration with REALISM
• Current results and work plan
4
Analysis of uncertain intelligence
RAPID is a probabilistic reasoning engine for the analysis of dynamically evolving intelligence data.
Jigsaw analogy (figure labels): initial knowledge, available knowledge, observable facts, hidden facts, intelligence results.
RAPID will help:
• Identify important holes
• Locate most crucial missing pieces
• Insert these pieces
Knowledge sources:
• Public domain
• Intelligence
• Inferences
5
Analysis of uncertain intelligence
RAPID will help intelligence analysts to accomplish the following tasks.
• Draw probabilistic conclusions from available intelligence, including uncertain and missing data
• Identify potentially surprising developments
• Formulate and assess hypotheses
• Identify critical uncertainties
• Develop strategies for proactive collection of additional intelligence to resolve uncertainties, based on the analysis of cost / benefit trade-offs
Analysis cycle (figure labels): massive new intelligence; filtering and processing of new intelligence; propagation of inferences; analysis of key indicators; development of intelligence-collection plans; intelligence collection; analysts.
6
Underlying functionality
• Representation of uncertainty: Novel representation of massive uncertain data, which supports fast matching and inferences
• Inferences from uncertain data: Scalable inference mechanism for reasoning about uncertain intelligence
• Analysis of critical uncertainties: Assessment of uncertain situations, evaluation of data utility, and identification of important missing data
• Proactive intelligence planning: Evaluation of available probes and construction of optimized intelligence-collection plans
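A minimal sketch of how an uncertain fact and an uncertain inference rule might be represented and combined. The slides do not describe RAPID's actual representation; the Fact and Rule classes, the example statements, and the assumption that premises are independent are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    statement: str
    probability: float  # degree of belief in [0, 1]


@dataclass(frozen=True)
class Rule:
    premises: tuple   # statements that must hold
    conclusion: str
    strength: float   # assumed P(conclusion | all premises hold)


def apply_rule(rule, facts):
    """Return the Fact supported by this rule, or None if a premise is unknown.

    Assumes the premises are independent (an illustrative simplification).
    """
    p = rule.strength
    for premise in rule.premises:
        if premise not in facts:
            return None
        p *= facts[premise].probability
    return Fact(rule.conclusion, p)


# Hypothetical facts and rule, for illustration only.
facts = {
    "lab_built": Fact("lab_built", 0.8),
    "staff_hired": Fact("staff_hired", 0.5),
}
rule = Rule(("lab_built", "staff_hired"), "program_active", 0.9)
inferred = apply_rule(rule, facts)
print(inferred)
```

The product gives only the support contributed by this one rule; a fuller engine would also combine evidence from several rules and handle dependent premises.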
7
Outline of the presentation
• RAPID functionality
• Preliminary demo
• Architecture and main components
• Integration with REALISM
• Current results and work plan
8
Preliminary demo
Uncertainty analysis and probe evaluation, integrated into Excel.
9
Outline of the presentation
• RAPID functionality
• Preliminary demo
• Architecture and main components
• Integration with REALISM
• Current results and work plan
10
Architecture
• Uncertain situation assessment and data-collection planning: An advanced API for integration with other systems.
• Uncertainty calculus and proactive probe planning (Excel extension for the analysis of uncertainty, probes, and proactive data collection): Advanced analysis of incomplete data, identification of critical uncertainties, evaluation and selection of probes, what-if analysis, and visualization.
• Scalable assessment of uncertain intelligence (relational database of uncertain data and inference rules): A large-scale database of incomplete and uncertain facts, uncertain inference rules, and hypotheses, which allows scalable planning of proactive data collection.
• Analyst interface: Optional user interface for integrated access to all system components, which extends the standard Excel interface.
11
Architecture
• General intelligence collection and proactive intelligence collection, each a source of massive new intelligence
• Processing of data streams: Real-time matching of queries and inference rules against a massive stream of new data. Fast database operations on a stream of newly incoming data, and integration of this stream with the static database.
• Uncertain situation assessment and data-collection planning
• Scalable assessment of uncertain intelligence: Relational database of uncertain data and inference rules
• Uncertainty calculus and proactive probe planning: Excel extension for the analysis of uncertainty, probes, and proactive data collection
• Analyst interface
• Hypotheses, conclusions, and data-collection plans
• Approved plans for proactive data collection
12
Architecture
• General intelligence collection and proactive intelligence collection, each a source of massive new intelligence
• Processing of data streams: Real-time matching of queries and inference rules against a massive stream of new data
• Value-added reasoning tools
• Uncertain situation assessment and data-collection planning
• Scalable assessment of uncertain intelligence: Relational database of uncertain data and inference rules
• Uncertainty calculus and proactive probe planning: Excel extension for the analysis of uncertainty, probes, and proactive data collection
• Analyst interface
• Hypotheses, conclusions, and data-collection plans
• Approved plans for proactive data collection
13
Uncertainty calculus and proactive probe planning
Figure labels: Microsoft Excel, processing of data streams, value-added reasoning tools, uncertainty database.
Uncertainty analysis
• Representation of probability distributions and qualitative uncertainty
• Uncertainty arithmetic
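One way to read "uncertainty arithmetic" is arithmetic on probability distributions rather than on single numbers. The sketch below is an illustration under that assumption, not RAPID's calculus: it represents discrete distributions as dictionaries, treats the two quantities as independent, and uses made-up values. The names combine and mean are hypothetical.

```python
from collections import defaultdict
from itertools import product


def combine(dist_a, dist_b, op):
    """Distribution of op(A, B) for independent discrete A and B."""
    result = defaultdict(float)
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        result[op(a, b)] += pa * pb
    return dict(result)


def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(value * p for value, p in dist.items())


# Made-up uncertain quantities, for illustration only.
labs = {1: 0.5, 2: 0.5}                 # number of labs
staff_per_lab = {10: 0.25, 20: 0.75}    # staff in each lab
total_staff = combine(labs, staff_per_lab, lambda a, b: a * b)
print(total_staff, mean(total_staff))
```

In a spreadsheet setting, a cell would hold such a distribution and formulas over cells would call an operation like combine in place of ordinary multiplication or addition.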
Situation assessment
• Representation of data utility
• Tracking utility changes during data collection
• Identification of critical uncertainties
Proactive probe planning
• Representation of probes
• Evaluation of probe utility
• Automated selection and launching of critical probes
Contingency planning
• What-if analysis of alternative future developments and data-collection plans based on an extension of Excel “scenarios”
Analyst interface
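The evaluation and selection of probes listed on this slide can be illustrated with a small sketch. The slides do not define probe utility; here it is assumed to be the expected reduction in entropy over competing hypotheses, divided by the probe cost. The hypotheses, probes, costs, and likelihoods are invented for the example.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a distribution {hypothesis: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_gain(prior, likelihood):
    """Expected entropy reduction from observing the probe outcome.

    likelihood[h][o] is the assumed P(outcome o | hypothesis h).
    """
    outcomes = {o for per_h in likelihood.values() for o in per_h}
    remaining = 0.0
    for o in outcomes:
        joint = {h: prior[h] * likelihood[h].get(o, 0.0) for h in prior}
        p_o = sum(joint.values())
        if p_o > 0:
            posterior = {h: j / p_o for h, j in joint.items()}
            remaining += p_o * entropy(posterior)
    return entropy(prior) - remaining


# Invented hypotheses and probes, for illustration only.
prior = {"civilian": 0.5, "military": 0.5}
probes = {
    "decisive_probe": {"cost": 4.0, "likelihood": {
        "civilian": {"yes": 1.0, "no": 0.0},
        "military": {"yes": 0.0, "no": 1.0}}},
    "useless_probe": {"cost": 1.0, "likelihood": {
        "civilian": {"yes": 0.5, "no": 0.5},
        "military": {"yes": 0.5, "no": 0.5}}},
}
scores = {name: expected_gain(prior, probe["likelihood"]) / probe["cost"]
          for name, probe in probes.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Gain per unit cost is only one possible cost / benefit rule; a planner could equally subtract a weighted cost or optimize a sequence of probes.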
14
Scalable assessment of uncertain intelligence
Figure labels:
• Uncertain facts
• Uncertain inference rules
• Goals, queries, and hypotheses
• Semantic network
• Conflict detection
• Inferred facts
• Learned inference rules
• Query matches
• Evaluation of hypotheses
• Critical uncertainties
• Prioritized plans for proactive data collection
• Manual entry, selection, and editing of knowledge
• Analyst interface
15
Value-added reasoning tools
These tools are not essential for the core functionality.
• ARGUS data explorer: Identification of patterns and their gradual changes in massive data streams
• Contingency analysis: What-if analysis of alternative hypotheses, data-collection plans, and possible future developments
• Markov reasoning: Selection of most likely hypotheses and possible future developments
• Adversarial search: Analysis of possible concealment and disinformation, and plans to prevent them
• Entity co-reference: Identification of syntactically different words that refer to the same objects
Part of uncertainty database: known patterns; alternative scenarios and their implications; Markov models; adversarial goals and resources.
Uncertainty calculus and proactive probe planning: Excel extension for the analysis of uncertainty, probes, and proactive data collection.
The available intelligence data and inference rules are in Excel tables, and in the uncertainty database integrated with Excel.
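The Markov reasoning tool named on this slide is not specified further. As a hedged illustration of what "selection of most likely future developments" could mean, the sketch below propagates a belief over states through an assumed transition table and picks the most probable state. The states and probabilities are invented.

```python
def step(distribution, transitions):
    """Advance a state distribution by one Markov step."""
    result = {state: 0.0 for state in transitions}
    for state, p in distribution.items():
        for nxt, q in transitions[state].items():
            result[nxt] += p * q
    return result


# Invented states and transition probabilities, for illustration only.
transitions = {
    "research":   {"research": 0.5, "prototype": 0.5, "production": 0.0},
    "prototype":  {"research": 0.0, "prototype": 0.5, "production": 0.5},
    "production": {"research": 0.0, "prototype": 0.0, "production": 1.0},
}
belief = {"research": 1.0, "prototype": 0.0, "production": 0.0}
for _ in range(2):  # look two steps ahead
    belief = step(belief, transitions)
most_likely = max(belief, key=belief.get)
print(belief, most_likely)
```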
16
Analyst interface
• Optional extension of the Excel interface
• Visualization and explanation of intelligence data, inferences, and data-collection plans
17
Outline of the presentation
• RAPID functionality
• Preliminary demo
• Architecture and main components
• Integration with REALISM
• Current results and work plan
18
Integration goals
We will integrate the text-extraction system developed by HNC / Fair Isaac with the uncertainty-analysis system developed by CMU / DYNAMiX. The integrated system will support the following capabilities.
• Extraction of facts, relations, and causal links from natural-language documents
• Evaluation of given hypotheses
• Proactive information gathering
• Application to the analysis of Iranian nano-technology plans and capabilities
19
Inputs and outputs
REALISM
Input:
• Requirements and filters for the information extraction
• Natural-language documents
• World-wide web
Output:
• Large structured tables of relevant facts and entities, which include uncertainty
• Inference-rule representation of relations and causal links, also including uncertainty
RAPID
Input:
• Tables of uncertain facts
• Uncertain inference rules
• Queries for specific data
• Analyst hypotheses
Output:
• Inferences from uncertain data
• Exact and approximate matches for given queries
• Hypothesis assessment
• Proactive plans for collecting additional data
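To make the RAPID inputs and outputs on this slide concrete, here is a small sketch of a table of uncertain facts queried for exact and approximate matches. The row layout, the certainty column, and the scoring rule (certainty scaled by the fraction of matched query fields) are assumptions made for illustration; the slides do not define the matching method.

```python
# Invented table of uncertain facts, for illustration only.
facts = [
    {"entity": "Institute A", "field": "nanotubes", "year": 2007, "certainty": 0.9},
    {"entity": "Institute B", "field": "nanotubes", "year": 2005, "certainty": 0.6},
    {"entity": "Institute C", "field": "optics", "year": 2007, "certainty": 0.8},
]


def match(query, rows):
    """Return (exact, approximate) lists of (score, row), best first."""
    exact, approximate = [], []
    for row in rows:
        hits = sum(1 for key, value in query.items() if row.get(key) == value)
        if hits == len(query):
            exact.append((row["certainty"], row))
        elif hits > 0:
            # Partial match: scale certainty by the fraction of fields matched.
            approximate.append((row["certainty"] * hits / len(query), row))

    def by_score(pair):
        return pair[0]

    return (sorted(exact, key=by_score, reverse=True),
            sorted(approximate, key=by_score, reverse=True))


exact, approximate = match({"field": "nanotubes", "year": 2007}, facts)
print([row["entity"] for _, row in exact])
print([row["entity"] for _, row in approximate])
```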
20
Architecture
Figure labels:
• REALISM (HNC / Fair Isaac)
• Topic filters
• Information requests
• Structured facts and entities
• Structured relations and causal links
• RAPID (CMU / DYNAMiX)
• Uncertain situation assessment and data-collection planning
• Scalable assessment of uncertain intelligence
• Uncertainty calculus and proactive probe planning
• Analyst interface
• Hypotheses, conclusions, and data-collection plans
21
Outline of the presentation
• RAPID functionality
• Preliminary demo
• Architecture and main components
• Integration with REALISM
• Current results and work plan
22
Initial results
• Detailed technical plan of uncertain situation assessment and proactive probe planning: architecture, functionality, and algorithms
• Uncertain intelligence scenario based on public data about Iranian nano-technology
• Preliminary prototype of situation assessment tools integrated with a relational database
• Preliminary prototype of a tool for the resolution of entity co-references
• Application of DYNAMiX Data Explorer to the nano-tech conference data provided by PAINT
23
Current work
• Uncertainty calculus, integrated with Excel
• Proactive probe planning
• Scalable uncertainty assessment, integrated with a relational database
• Integration with REALISM
• Initial analyst interface
24
Short-term plan
• Prototype of uncertainty calculus: March
• Prototype of probe-planning tools: March
• Initial RAPID / REALISM integration: May
• Initial analyst interface (extended Excel): June
• Prototype of uncertainty database: July
25
Long-term plan
• Uncertain situation assessment and proactive probe planning: July 2008
• Discrimination among competing hypotheses and identification of critical uncertainties: July 2009
• Fully integrated deployable prototype: July 2009
• Advanced proactive-intelligence planning and learning of inference rules: July 2010
• Value-added tools, which may include data-stream processing, entity co-reference, adversarial search, and Markov reasoning: July 2011
• Fully integrated deliverable system: Jan 2012
All versions of RAPID will demonstrate all main capabilities, with increasing functionality over time.
26
Evaluation
We expect that RAPID will provide a significant advantage over available off-the-shelf tools, such as standard spreadsheets and database systems.
To support this claim, we plan to compare the productivity of analysts using RAPID with that of analysts who perform the same tasks using commercially available tools.
Experimental group: Use of RAPID
Control group: Use of standard tools
27
Evaluation
We expect that RAPID will provide a significant advantage over available off-the-shelf tools, such as standard spreadsheets and database systems.
To support this claim, we plan to compare the productivity of analysts using RAPID with that of analysts who perform the same tasks using commercially available tools.
We will view RAPID as a success if it consistently outperforms the standard tools and the analysts report an overall positive experience of using it.
28
Adjustment of the earlier plan
We need to adjust the plan to the new budget. We will deliver the full core functionality, but we propose to reduce the work on value-added tools.
Reduced work
• Processing of data streams
• Advanced contingency analysis
• Analyst interface
Suspended work
• Predictive Markov models
• Analysis of adversarial actions
29
30
Appendices
• Previous work
• Empirical evaluation
• PAINT contributions
31
ARGUS
ARGUS project sponsored by DTO/ARDA: Identification and tracking of novel patterns in massive databases and data streams.
Processing steps (figure labels): create background model, detect novel events, generate profiles, re-cluster, update profiles, match.
Data flow (figure labels): historical data, background model, novel events, novel clusters, tracked events, new profiles, profiles, data, alerts, analysts.
32
ARGUS
• Estimate the density function at t0
• Grow the cluster for a period of Δt while reducing the weight of old records
• Estimate the new density function at t0+Δt
• Compare the two estimates
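The four steps above can be sketched as follows. The slides do not say which density estimator or weighting scheme ARGUS uses; this sketch assumes a one-dimensional Gaussian kernel estimate and a single multiplicative decay factor for old records, and the data points are invented.

```python
import math


def weighted_density(points, weights, x, bandwidth=1.0):
    """Weighted Gaussian kernel density estimate at x (one dimension)."""
    total = sum(weights)
    norm = bandwidth * math.sqrt(2 * math.pi)
    return sum(
        w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2) / norm
        for p, w in zip(points, weights)
    ) / total


# Invented records, for illustration only.
old_points = [0.0, 0.2, -0.1, 0.1]   # records seen up to t0
new_points = [3.0, 3.1, 2.9]         # records arriving during delta t
decay = 0.5                          # assumed weight multiplier for old records
grid = [0.0, 3.0]                    # locations where densities are compared

# Step 1: density at t0.
before = [weighted_density(old_points, [1.0] * len(old_points), x) for x in grid]

# Steps 2 and 3: add new records, down-weight old ones, re-estimate.
points = old_points + new_points
weights = [decay] * len(old_points) + [1.0] * len(new_points)
after = [weighted_density(points, weights, x) for x in grid]

# Step 4: compare the two estimates.
change = [a - b for a, b in zip(after, before)]
print(change)
```

A positive change at a location suggests an emerging cluster there, and a negative change suggests a fading one.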
33
ARGUS
Re-clustering at t0 + Δt (figure labels): respiratory diseases, SARS, density change.
34
RADAR
RADAR project sponsored by DARPA: Analysis and management of volatile crisis situations based on uncertain data.
Components (figure labels): parser, optimizer, data elicitor, top-level control and learning, analysts.
Functions (figure labels): process new data, update crisis-management plans, suggest data-collection strategies.
35
RADAR
We have applied the system to repair the schedule of a conference after a loss of rooms in a crisis.
Manual and auto repair (schedule quality):
• After crisis: 0.50
• Manual repair: 0.61
• Auto without elicitation: 0.72
• Auto with elicitation: 0.93
Dependency of the quality on the number of questions (figure axes): schedule quality, with marked values 0.72 and 0.93, against the number of questions, 0 to 100 out of 1100.
36
RAPID
Unlike ARGUS…
• Represents and analyzes uncertainty
• Supports complex inferences
Unlike RADAR…
• Scales to massive intelligence datasets
• Analyzes complex “external” situations
• Develops intelligence-collection plans
37
Appendices
• Previous work
• Empirical evaluation
• PAINT contributions
38
Evaluation goals
We expect that RAPID will provide a significant advantage over available off-the-shelf tools, such as standard spreadsheets and database systems.
To support this claim, we plan to compare the productivity of analysts using RAPID with that of analysts who perform the same tasks using commercially available tools.
Experimental group: Use of RAPID
Control group: Use of standard tools
39
Experimental setup
We expect to recruit retired intelligence analysts for the system evaluation, and ask them to perform several tasks based on given uncertain data.
• Identify the data most relevant to given tasks
• Evaluate the validity of given hypotheses
• Find relevant hidden patterns
• Identify critical missing data and propose a cost-effective plan for collecting this data
40
Performance measurements
We will measure the following main factors to evaluate the performance of analysts:
• Number of high-level tasks completed within the experiment time frame
• Accuracy of hypothesis evaluation
• Number and relevance of identified patterns
• Effectiveness and costs of data-collection plans
We will also ask analysts to complete a questionnaire on their overall experience.
41
Expected results
We will view the proposed work as a success if
• RAPID consistently outperforms the off-the-shelf tools in all four performance factors,
• the performance difference for each factor is statistically significant, and
• analysts report an overall positive experience of using the system.
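The slides do not name the significance test. As one possible way to check whether a performance difference between the experimental and control groups is statistically significant, the sketch below runs a one-sided permutation test on the difference of group means. The function name, the scores, and the 0.05 threshold are illustrative assumptions, not planned experimental data.

```python
import random


def permutation_p_value(treatment, control, trials=10000, seed=0):
    """One-sided p-value for mean(treatment) > mean(control)."""
    rng = random.Random(seed)
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(treatment) + list(control)
    n = len(treatment)
    count = 0
    for _ in range(trials):
        # Randomly reassign subjects to the two groups.
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (trials + 1)


# Made-up numbers of completed tasks per analyst, for illustration only.
rapid_scores = [9, 8, 9, 10, 8, 9]
standard_scores = [5, 6, 4, 6, 5, 5]
p = permutation_p_value(rapid_scores, standard_scores)
print(p, p < 0.05)
```

A permutation test makes few distributional assumptions, which suits the small groups likely in a study with recruited analysts; the same check would be repeated for each of the four performance factors.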
42
RAPID / REALISM evaluation
Component utility: We will also evaluate the utility of REALISM and RAPID by comparing the productivity of subjects under the following three conditions:
• Use of the integrated system
• Use of REALISM without RAPID
• Use of RAPID without REALISM
Component evaluation: We will measure the following performance factors:
• Accuracy and completeness of text extraction
• Accuracy of hypothesis evaluation
• Effectiveness of data-collection plans
• Speed of each system component
43
Appendices
• Previous work
• Empirical evaluation
• PAINT contributions
44
Main contributions
• Representation of massive uncertain knowledge
• Automated discovery of causal relationships
• Fast probabilistic integration of all evidence
• Analysis of possible future developments
• Identification of critical uncertainties
• Planning of proactive intelligence gathering
Figure labels (components numbered 1 to 4 in the figure): Data, Models, Dynamic Simulation, Strategy Generation and Exploration, Response Options, Feedback.
45
Inputs and outputs
Figure labels:
• General intelligence collection
• Proactive intelligence collection
• Uncertain intelligence and analyst opinions
• Massive stream of structured records
• Specific hypotheses
• Data-search queries
• Domain knowledge
• Inference rules
• RAPID
• Uncertain situation assessment
• Query matches
• Evaluation of hypotheses
• New learned rules
• Plans for proactive intelligence collection
46
Inputs
From other PAINT components:
• Available intelligence data and its certainty
• Hypotheses about unknown factors
• Available domain knowledge
From analysts:
• Intelligence-analysis tasks and priorities
• Hypotheses and related opinions
• Responses to RAPID-generated probes
• Additional domain knowledge
From other sources:
• Databases with available intelligence
• Public databases with relevant data
47
Outputs
• Inferences from available uncertain data
• Evaluation of given hypotheses
• New hypotheses and their certainties
• Plans for proactive intelligence collection
• Learned inference rules