
Towards situational awareness systems for disaster response

Naveen Ashish

Calit2@UC-Irvine

Bell Labs India, Bangalore,

04/23/07

Organization Introduction to

Selected research areas

Technology transition

Discussion

RESCUE NSF funded “large-ITR” project

Advance information technologies for disaster response 5 year project

Oct 2003 to Oct 2008 Institutions

6 universities (UCI, UCSD, UIUC, BYU, U-Colorado, U-Maryland) and 1 company (ImageCat)

Active and formal community partners City of LA, OCFA, Irvine Police, ….

People Director: Sharad Mehrotra ~ 25 researchers and staff, ~40 students

Web: http://www.itr-rescue.org

The SAMI TEAM

Students: Stella Chen, Chaitanya Desai, Vibhav Gogate, Jon Hutchinson, Ram Hariharan, Shengyue Ji, Yiming Ma, Rabia Nuray-Turan, Dawit Seid, Shankar Shivappa

Staff: Jay Lickfett, Chris Davison

Collaborators: Charles Huyck, Ron Eguchi, Shubharoop Ghosh

Faculty, Scientists and Post-docs: Dmitri Kalashnikov, Rajesh Hegde, Sharad Mehrotra, Sangho Park

Slide Aggregator (aka Project Leader): Naveen Ashish

RESCUE Mission

The mission of RESCUE is to enhance the ability of

emergency response organizations and the public to mitigate

crises, save lives, and prevent secondary and indirect human

and economic loss by radically transforming ways in which

these organizations gather, process, manage, use and

disseminate information during man-made and natural

catastrophes.

Motivation: Transform the Ability of First Responders to Mitigate Crisis

Observation: Right Information to the Right Person at the Right Time can result in dramatically better response

Response Effectiveness• lives & property saved • damage prevented• cascades avoided

Quality & Timeliness of

Information

Situational Awareness• incidences• resources• victims• needs

Quality of Decisions• first responders• consequence planners• public

RESCUE Objectives

Develop technologies to dramatically improve situational awareness of first-responders, response organizations, and the public by providing them with timely access to accurate, reliable and actionable information about the disaster.

Develop technologies that enable seamless information sharing and collective decision making across highly dynamic virtual organizations consisting of diverse entities (government, private sector, NGOs, individuals).

Develop robust communication systems that continue to operate in crisis situations despite partial/total failure of infrastructure and increased communication demands.

Develop technologies that can be used for timely and customized dissemination of crisis information that inform the public at large thus enhancing the abilities of the affected populations to take appropriate self-protective actions.

Explore the privacy challenges that emerge as a result of infusing technology to improve information flow in crisis response networks and the public.

Promote interdisciplinary education at all levels (graduate, undergraduate, K-12) and across diverse student groups to expose the future community of citizens to issues in emergency management and homeland security – an area of global and national importance.

RESCUE Research Projects SAMI: Situational Awareness from Multi-Modal

Input (Project Lead: N. Ashish, UCI)

PISA: Policy-driven Information Sharing Architecture (Project Lead: M. Winslett, UIUC)

Customized Dissemination in the Large (Project Leads: K. Tierney, UC-B & N. Venkatasubramanian, UCI)

Privacy Implications of Technology Adoption (Project Lead: S. Mehrotra, UCI)

Robust Networking and Information Collection (Project Lead: BS Manoj, UCSD)

A Situational Awareness Application

Reports Responders News Weather Traffic

Damage Assessment

Evacuation Planning

Situational Dashboard

Simulations Reconnaissance System

Information

Applications

Architecture

Situational data management

Analysis

Extraction and synthesis

Events as fundamental abstraction units

Situational awareness systems

Extraction and synthesis: semantic extraction from text, audio-visual extraction

Data management: event model, SAT-ware, spatial indexing

Analysis: graph analysis, geospatial, predictive modeling, damage assessment

Extraction and Synthesis

Semantic extraction from text

Audio event extraction

Visual event extraction

Why do we need “Data Cleaning”?

An actual excerpt from a person’s CV (sanitized for privacy):

“... In June 2004, I was listed as the 1000th most cited author in computer science (of 100,000 authors) by CiteSeer, available at http://citeseer.nj.nec.com/allcited.html. ...”

Such claims are quite common in CVs. This particular person argues he is good because his work is well cited.

But there is a problem with using the CiteSeer ranking: in general, it is not valid (in CVs). Let’s see why...

Suspicious entries

Let us go to the DBLP website, which stores bibliographic entries of many CS authors.

Let us check who “A. Gupta” and “L. Zhang” are.

What is the problem in the example?

CiteSeer: the top-k most cited authors DBLP DBLP

Comparing raw and cleaned CiteSeer

Rank            Author               Location       # citations
1  (100.00%)    douglas schmidt      cs@wustl       5608
2  (100.00%)    rakesh agrawal       almaden@ib
3  (100.00%)    hector garciamolina  @              4167
4  (100.00%)    sally floyd          @aciri         3902
5  (100.00%)    jennifer widom       @stanford      3835
6  (100.00%)    david culler         cs@berkeley    3619
6  (100.00%)    thomas henzinger     eecs@berkele
7  (100.00%)    rajeev motwani       @stanford      3570
8  (100.00%)    willy zwaenepoel     cs@rice        3624
9  (100.00%)    van jacobson         lbl@gov        3468
10 (100.00%)    rajeev alur          cis@upenn      3577
11 (100.00%)    john ousterhout      @pacbell       3290
12 (100.00%)    joseph halpern       cs@cornell     3364
13 (100.00%)    andrew kahng         @ucsd          3288
14 (100.00%)    peter stadler        tbi@univie     3187
15 (100.00%)    serge abiteboul      @inria         3060

CiteSeer top-k

Cleaned CiteSeer top-k

What is the lesson?

Data should be cleaned first, e.g., determine the (unique) real authors of publications.

Solving such challenges is not always “easy”; that explains the large body of work on data cleaning.

Note: CiteSeer is aware of the problem with its ranking. There are more issues with CiteSeer, many not related to data cleaning.

“Garbage in, garbage out” principle: making decisions based on bad data can lead to wrong results.

High-level view of the problem

"J. Smith"

Figure: a raw dataset (records mentioning “J. Smith”, “John Smith”, “Jane Smith”, “Intel Inc.”) goes through extraction (uncertainty, duplicates, ...) and becomes a normalized dataset to which data analysis techniques can now be applied (e.g. John Smith, Intel; Jane Smith, MIT). The open question is whether “J. Smith” is John Smith or Jane Smith.

Attributed Relational Graph (ARG)

The problem:

(nodes, edges can have labels)(for any objects, not only people)

Traditional Domain-Independent DC Methods

Feature-based similarity (FBS): object X and object Y are compared on their features (feature1, feature2, feature3).

With context: a new feature (feature4) is derived from the context of object X.

RelDC = Traditional FBS + Relationship Analysis (enhance the core)

Traditional techniques are FBS-based.

What is “Reference Disambiguation”?

Author table (clean)

A1, ‘Dave White’, ‘Intel’
A2, ‘Don White’, ‘CMU’
A3, ‘Susan Grey’, ‘MIT’
A4, ‘John Black’, ‘MIT’
A5, ‘Joe Brown’, unknown
A6, ‘Liz Pink’, unknown

Publication table (to be cleaned)

P1, ‘Databases . . . ’, ‘John Black’, ‘Don White’
P2, ‘Multimedia . . . ’, ‘Sue Grey’, ‘D. White’
P3, ‘Title3 . . .’, ‘Dave White’
P4, ‘Title5 . . .’, ‘Don White’, ‘Joe Brown’
P5, ‘Title6 . . .’, ‘Joe Brown’, ‘Liz Pink’
P6, ‘Title7 . . . ’, ‘Liz Pink’, ‘D. White’

Analysis (‘D. White’ in P2, our approach):

1. ‘Don White’ has a paper with ‘John Black’ @ MIT

2. ‘Dave White’ is not connected to MIT in any way

3. ‘Sue Grey’ is a coauthor of P2 too, and @ MIT

Thus: ‘D. White’ in P2 is probably Don (since we know he collaborates with MIT people).

Analysis (‘D. White’ in P6, our approach):

1. ‘Don White’ has a paper (P4) with Joe Brown; Joe has a paper (P5) with Liz Pink; Liz Pink is a coauthor of P6.

2. ‘Dave White’ does not have papers with Joe or Liz

Thus: ‘D. White’ in P6 is probably Don (since co-author networks often form clusters).
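The two analyses above can be reproduced with a small graph search. The sketch below is only an illustration, not the RelDC implementation: it builds an undirected graph from the two tables, links only the references that resolve unambiguously, and measures how many hops separate a paper from each candidate author. The alias map that resolves ‘Sue Grey’ to ‘Susan Grey’ and the function names are assumptions made for the example.

```python
from collections import deque

AUTHORS = {
    "Dave White": "Intel", "Don White": "CMU", "Susan Grey": "MIT",
    "John Black": "MIT", "Joe Brown": None, "Liz Pink": None,
}
PUBLICATIONS = {
    "P1": ["John Black", "Don White"], "P2": ["Sue Grey", "D. White"],
    "P3": ["Dave White"], "P4": ["Don White", "Joe Brown"],
    "P5": ["Joe Brown", "Liz Pink"], "P6": ["Liz Pink", "D. White"],
}
# Assumption for this sketch: 'Sue Grey' is already known to be 'Susan Grey'.
ALIASES = {"Sue Grey": "Susan Grey"}


def build_graph():
    """Undirected graph over papers, authors and organizations."""
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    for author, org in AUTHORS.items():
        graph.setdefault(author, set())
        if org is not None:
            link(author, org)
    for paper, refs in PUBLICATIONS.items():
        graph.setdefault(paper, set())
        for ref in refs:
            name = ALIASES.get(ref, ref)
            if name in AUTHORS:  # ambiguous 'D. White' is left unlinked
                link(paper, name)
    return graph


def hops(graph, start, goal):
    """Length of the shortest path from start to goal, or None if unreachable."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

With this data, both P2 and P6 reach Don White through a chain of relationships, while Dave White is unreachable from either paper, which matches the conclusions of the two analyses.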

Attributed Relational Graph (ARG)

View dataset as a graph

Nodes for entities: papers, authors, organizations (e.g., P2, Susan, MIT)

Edges for relationships: “writes”, “affiliated with” (e.g. Susan → P2, “writes”)

“Choice” nodes for uncertain relationships: mutual exclusion (“1” and “2” in the figure)

Analysis can be viewed as application of the “Context AP” to this graph (defined next...)

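Figure: ARG for the example, with author nodes (Dave White, Don White, Susan Grey, John Black, Joe Brown, Liz Pink), paper nodes (e.g. P4) and unknown option-weights (w1 = ?, w3 = ?).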

Q: How come domain-independent?

In designing the RelDC approach, our goal was to use CAP as an axiom and then solve the problem formally, without heuristics.

If reference r, made in the context of entity x, refers to an entity yj, but the description provided by r matches multiple entities y1, …, yj, …, yN,

then x and yj are likely to be more strongly connected to each other via chains of relationships

than x and yk (k = 1, 2, … , N; k ≠ j).

Context Attraction Principle (CAP)

Figure: the reference “J. Smith” in publication P1, with candidate entities such as John E. Smith (SSN = 123), Joe A. Smith and Jane Smith.

Analyzing paths: linking entities and contexts

D. White is a reference in the context of P2, P6

We can link P2, P6 to Don

We cannot link P2, P6 to Dave

More complex paths in general


Analysis (‘D. White’ in P2): path P2→Don

1. ‘Don White’ has a paper with ‘John Black’ @ MIT

2. ‘Dave White’ is not connected to MIT in any way

3. ‘Sue Grey’ is a coauthor of P2 too, and @ MIT

Thus: ‘D. White’ is probably Don White

Analysis (‘D. White’ in P6): path P6→Don

1. ‘Don White’ has a paper (P4) with Joe Brown; Joe has a paper (P5) with Liz Pink; Liz Pink is a coauthor of P6.

2. ‘Dave White’ does not have papers with Joe or Liz

Thus: ‘D. White’ is probably Don White

Questions to answer

1. Does the CAP principle hold over real datasets? That is, if we disambiguate references based on it, will the

references be correctly disambiguated?

2. Can we design a generic solution to exploiting relationships for disambiguation?

Problem formalization

Notation: Meaning

X = {x1, x2, ... , xN}: the set of all entities in the database

xi.rk: the k-th reference of entity xi. Example: the name of the k-th author of paper xi, e.g. ‘J. Smith’

a reference: a description of an object, multiple attributes

d[xi.rk]: the “answer” for xi.rk, i.e. the real entity xi.rk refers to (unknown, the goal is to find it). Example: the true k-th author of paper xi

CS[xi.rk]: the “choice set” for xi.rk, i.e. the set of all entities matching the description provided by xi.rk. Example: ‘John A. Smith’, ‘Jane B. Smith’, ...

y1, y2, ... , yN: the “options” for xi.rk, i.e. the elements in CS[xi.rk]

v[xi]: the node in the graph for entity xi

Handling References: Linking (references correspond to relationships)

if |CS[xi.rk]| = 1 then we know the answer d[xi.rk]: link xi and d[xi.rk] directly, w = 1

else the answer is uncertain for xi.rk: create a “choice” node and link it with “option-weights”, w1 + ... + wN = 1; the option-weights are variables

Entity-Relationship Graph

RelDC views the dataset as an undirected graph

Nodes for entities: don’t have weights

Edges for relationships: have weights, a real number in [0,1], the confidence that the relationship exists


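Figure: a choice node cho[xi.rk] with an edge of weight w0 = 1 and N option nodes for the entities in CS[xi.rk] (e.g. v[y2]). Example: “J. Smith” in P1, with options “Jane Smith” and “John Smith”.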

Definition: To resolve a reference xi .rk means

to pick one yj from CS[xi .rk] as d[xi .rk]. Graph interpretation

among w1, w2, ... , wN, assign wj = 1 to one wj

means yj is chosen as the answer d[xi .rk]

Definition: Reference xi .rk is resolved correctly, if the chosen yj = d[xi .rk].

Definition: Reference xi .rk is unresolved or uncertain, if not yet resolved...

Goal: Resolve all uncertain references as correctly as possible.

Objective of Reference Disambiguation

cho[xi.rk]

Formalizing the CAP

CAP is based on “connection strength” c(u,v) for entities u and v

It measures how strongly u and v are connected to each other via relationships

e.g. c(u,v) > c(u,z) in the figure; c(u,v) will be formalized later

Context Attraction Principle (CAP): if c(xi, yj) ≥ c(xi, yk) then wj ≥ wk (most of the time)

We use proportionality:

c(xi, yj) ∙ wk = c(xi, yk) ∙ wj
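Taken together with the constraint w1 + ... + wN = 1, the proportionality rule implies that each option-weight is its connection strength divided by the sum of all strengths, provided the strengths are treated as known numbers. The sketch below illustrates only that arithmetic; the function name and the equal-weight fallback for all-zero strengths are assumptions, not part of the slides.

```python
def option_weights(strengths):
    """Weights w_j with c_j * w_k == c_k * w_j for all j, k and sum(w) == 1."""
    total = sum(strengths)
    if total == 0:
        # Assumption: with no connection evidence at all, use equal weights.
        return [1.0 / len(strengths)] * len(strengths)
    return [c / total for c in strengths]


# Two options whose connection strengths are 3 and 1.
weights = option_weights([3.0, 1.0])
```

In RelDC the strengths themselves depend on other option-weights, so this closed form is only the innermost step of solving the full system.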

RelDC approach

Input: the ARG for the dataset

1. Computing connection strengths
− for each unresolved reference xi.rk, determine equations for all (i.e., N) c(xi, yj)’s
− c(xi, yj) = gij(w), a function of other option-weights

2. Determining equations for option-weights
− use CAP to relate all wj’s and connection strengths
− since c(xi, yj) = gij(w), hence wij = fij(w)

3. Computing option-weights
− solve the system of equations from Step 2

4. Resolving references
− use the interpretation procedure to resolve weights

Computing connection strength (Step 1)

Computation of c(u,v) consists of two phases

Phase 1: Discover connections: all L-short simple paths between u and v (bottleneck; optimizations, not in SDM05)

Phase 2: Measure the strength in the discovered connections: many c(u,v) models exist; we use the random walks in graphs model
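Phase 1 can be illustrated with a depth-first enumeration of simple paths whose length does not exceed L edges. This is a plain sketch without the optimizations mentioned above; the function name and the toy graph are assumptions for the example.

```python
def l_short_simple_paths(graph, u, v, max_len):
    """All simple paths from u to v that use at most max_len edges."""
    paths = []

    def extend(path):
        node = path[-1]
        if node == v:
            paths.append(list(path))
            return
        if len(path) - 1 == max_len:  # path already uses max_len edges
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:  # simple path: never revisit a node
                path.append(nxt)
                extend(path)
                path.pop()

    extend([u])
    return paths


# Toy undirected graph given as adjacency lists.
TOY = {
    "u": ["a", "b"], "a": ["u", "v"], "b": ["u", "c"],
    "c": ["b", "v"], "v": ["a", "c"],
}
```

The cost of this enumeration grows quickly with L, which is why path discovery is the bottleneck of the approach.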


Measuring connection strength


– c(u,v) returns an equation

– because paths can go via various option-edges

– cuv = c(u,v) = guv(w)

Equations for option-weights (Step 2)

CAP (proportionality):

System (over-constrained):

Add slack:

Solving the system (Steps 3 and 4)

Step 3: Solve the system of equations
1. use a math solver, or
2. iterative method (approx. solution), or
3. bounding-interval-based method (tech. report).

Step 4: Interpret option-weights to determine the answer for each reference: pick yj with the largest weight as the answer
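One way to picture the iterative option in Step 3 and the interpretation in Step 4 is a fixed-point loop: start from equal weights, recompute connection strengths from the current weights, renormalize, and repeat. The sketch below is a hypothetical illustration, not the method from the tech report. The strength functions are invented toy formulas in which each ‘D. White’ reference becomes more strongly connected to Don White as the other reference leans towards him.

```python
def solve_option_weights(strength_fns, choice_sets, iterations=50):
    """Fixed-point iteration w <- normalize(g(w)), starting from equal weights."""
    weights = {ref: [1.0 / len(opts)] * len(opts) for ref, opts in choice_sets.items()}
    for _ in range(iterations):
        updated = {}
        for ref, fn in strength_fns.items():
            strengths = fn(weights)
            total = sum(strengths)
            updated[ref] = [s / total for s in strengths] if total > 0 else weights[ref]
        weights = updated
    return weights


def interpret(weights, choice_sets):
    """Step 4: pick the option with the largest weight for each reference."""
    answers = {}
    for ref, w in weights.items():
        best = max(range(len(w)), key=lambda j: w[j])
        answers[ref] = choice_sets[ref][best]
    return answers


CHOICE_SETS = {"P2": ["Don White", "Dave White"], "P6": ["Don White", "Dave White"]}
# Hypothetical strength functions g(w); the numbers are made up for the example.
STRENGTH_FNS = {
    "P2": lambda w: [1.0 + w["P6"][0], 0.5],
    "P6": lambda w: [1.0 + w["P2"][0], 0.5],
}

final_weights = solve_option_weights(STRENGTH_FNS, CHOICE_SETS)
answers = interpret(final_weights, CHOICE_SETS)
```

For these toy functions the weights settle quickly, and both references are interpreted as Don White.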

Experimental Setup

Parameters: when looking for L-short simple paths, L = 7 (L is the path-length limit)

RealPub dataset: CiteSeer + HPSearch

publications (255K), authors (176K), organizations (13K), departments (25K)

ground truth is not known accuracy...

SynPub datasets: many datasets of two types, emulation of RealPub

publications (5K), authors (1K), organizations (25K), departments (125K)

ground truth is known

RealMov: movies (12K), people (22K): actors, directors, producers; studios (1K): producing, distributing

Sample Publication Data

CiteSeer: publication records

HPSearch: author records

Efficiency and Long paths

Non-exponential cost Longer paths do help

Web Disambiguation

Music Composer

Football Player

UCSD Professor

Comedian

Botany Professor @ Idaho

Web Disambiguation

Extract key information such as mentions of entities (persons, names, locations) and other information such as hyperlinks and email addresses from Web pages

Cast as a relationship analysis problem

Prototype at: http://opteron.calit2.uci.edu:1977/Diamond/people_search.jsp

Information extraction from text

Many systems and techniques

May benefit from semantics

Limitations: all or nothing extraction

Towards probabilistic extraction systems

Leads

Disambiguation and data cleaning: Dmitri Kalashnikov, Stella Chen, Rabia Nuray-Turan

Information extraction: Naveen Ashish, Sharad Mehrotra

Multi-microphone speech processing: speaker identification, noise reduction

Audio-visual speech recognition: combine visual features (visemes) with audio

Speech recognition on light-weight devices

Team: Rajesh Hegde, Bhaskar Rao, Shankar Shivappa (UCSD)

Combine views from multiple cameras: homomorphic transformations

Multi-perspective “view-binding”

Team: Sangho Park, Mohan Trivedi (UCSD)

Situational Data Management

Spatial Indexing Event data model SAT-Ware

Outline

Overall Goal

Use examples to illustrate: different approaches in modeling and querying; advantage of our approach

Extracting spatial expression

Building model for spatial expression

Experiments

Conclusion

Overall Goal

Goal: Situation Awareness from Textual Sources

Info about events that constitute a crisis is often available as text.

Textual data during crisis: transcribed 911 calls, first responder communications

Textual data after crisis: first responders’ reports, Internet sources for post factum analysis

Figure: reports are loaded into a database.

Motivating Examples Two reports filed by first responders after 9/11 attack:

“…the PAPD Mobile Command Post was located on West St. north of WTC …”

“…a PAPD Command Truck parked on the west side of Broadway St. and north of Vesey St….”

Query: Retrieve Events around WTC

Goal: Both events should be retrieved with high scores attached.

Approach 1: Using IR approach

Direct keyword retrieval: only one report mentions the keyword “WTC”

Query expansion based on nearby spatial objects (e.g. nearby streets and buildings…): ad hoc, and objects might not be bounded

Approach 2: Mapping Using Uncertain Region

Query: Near WTC

Report 1: West St. north of WTC

Report 2: west side of Broadway St. and north of Vesey St

Rank based on the ratio of intersection

Problem: the rank score is not accurate because it relies on uniformity assumptions

Our Approach Step 1: Converting Text to Spatial Expression

S-expression: has a well-defined function form

Near WTC → Near(WTC)

West St. north of WTC → On(West St.) ∧ North(WTC)

west side of Broadway St. and north of Vesey St → West(Broadway St.) ∧ North(Vesey St.)

Our Approach Step 2: Mapping S-expression to probabilistic density function

Examples: Near(A); West(Broadway St.) ∧ North(Vesey St.)

Consider location as a random variable

Answering Range Query: given a query region, retrieve objects based on the degree of belonging
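To make the “degree of belonging” concrete, one can compute the probability mass that an event’s location pdf places inside the query region. The sketch below assumes, purely for illustration, that the location is modeled by independent Gaussians in x and y and that the query region is an axis-aligned rectangle; the slides do not fix either choice, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a 1-D normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_in_rectangle(mean, std, rect):
    """Probability that a location with independent Gaussian x and y falls in rect.

    mean = (mx, my), std = (sx, sy), rect = (x_min, y_min, x_max, y_max).
    """
    (mx, my), (sx, sy) = mean, std
    x_min, y_min, x_max, y_max = rect
    px = normal_cdf(x_max, mx, sx) - normal_cdf(x_min, mx, sx)
    py = normal_cdf(y_max, my, sy) - normal_cdf(y_min, my, sy)
    return px * py


# Query rectangle of +/- one standard deviation around the event's mean location.
score_near = prob_in_rectangle((0.0, 0.0), (1.0, 1.0), (-1.0, -1.0, 1.0, 1.0))
# The same event scored against a query rectangle farther away.
score_far = prob_in_rectangle((0.0, 0.0), (1.0, 1.0), (2.0, 2.0, 4.0, 4.0))
```

Events can then be ranked by this score, so an event whose pdf overlaps the query region more heavily is retrieved with a higher score.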

Advantages of Our Approach

More explicit spatial mapping removes the need for keyword expansion (IR approach)

Probabilistic representation is more formal and accurate than the uncertain region (UR) approach

Decouples the extraction and modeling modules: better extraction and modeling modules can be easily plugged in

Extracting Spatial Expression

Step 1: Discovering landmarks: buildings, roads, intersections

Step 2: Generating s-descriptors: use spatial relations to connect the landmarks. Spatial relations: near, behind, between. In the format D(L1, L2, ... , Ln)

Step 3: Generating s-expressions: compositions of s-descriptors, e.g. near(A) ∧ near(B)
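The three steps can be sketched with a toy gazetteer and a small dictionary of relation phrases. This is only an illustration of the dictionary approach: the actual system uses GATE, and the landmark list, the phrase table and the function name below are assumptions.

```python
import re

# Toy gazetteer (Step 1) and relation dictionary (Step 2); both are assumptions.
LANDMARKS = ["West St.", "Broadway St.", "Vesey St.", "WTC"]
RELATIONS = {"north of": "North", "west side of": "West", "near": "Near", "on": "On"}


def extract_s_descriptors(text):
    """Return s-descriptors such as 'North(WTC)' in order of appearance."""
    found = []
    for phrase, name in RELATIONS.items():
        for landmark in LANDMARKS:
            # A relation phrase immediately followed by a known landmark.
            pattern = r"\b" + re.escape(phrase) + r"\s+" + re.escape(landmark)
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((m.start(), f"{name}({landmark})"))
    return [descriptor for _, descriptor in sorted(found)]


REPORT_1 = "the PAPD Mobile Command Post was located on West St. north of WTC"
REPORT_2 = ("a PAPD Command Truck parked on the west side of Broadway St. "
            "and north of Vesey St.")

# Step 3: an s-expression is the composition of the descriptors found in a report.
s_expression_1 = extract_s_descriptors(REPORT_1)
s_expression_2 = extract_s_descriptors(REPORT_2)
```

For the two 9/11 reports quoted earlier, this yields the same descriptors as the hand-written s-expressions in Step 1 of the approach.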

Step1: Discovering landmarks

Mark up the text with the landmarks, using gazetteers (incorporated into the information extractor, GATE)

Note: not only the “name” is marked up; features are also attached

Examples of Landmark

Step2: Generating s-descriptors

Discover spatial relations around the landmarks

Dictionary approach (convert spatial relations to potential words)

Machine learning techniques can also be used

Examples of s-descriptors

Modeling S-expression

Goal: generating a reasonable probabilistic representation for an s-expression

Step 1: Modeling s-descriptors

Step 2: Combining s-descriptors

Modeling S-descriptors

Modeling templates, e.g. Uniform, Normal distribution

Using parameter learning techniques

Generating s-expression

In an s-expression, we assume the s-descriptors are conditionally independent.

If an s-expression has 2 descriptors, S1, S2

It can be generalized to n descriptors, S1…Sn
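The combination formula itself did not survive on the slide. A common consequence of conditional independence, assumed here rather than taken from the source, is that the combined density is proportional to the product of the descriptor densities. The sketch below shows that product rule on a discretized space of cells; the function name and the cell values are illustrative.

```python
def combine_descriptors(*pdfs):
    """Cell-wise product of descriptor pdfs, renormalized to sum to 1.

    Assumption: the descriptors are conditionally independent and the prior
    over cells is uniform, so the combined pdf is proportional to the product.
    """
    product = [1.0] * len(pdfs[0])
    for pdf in pdfs:
        product = [p * q for p, q in zip(product, pdf)]
    total = sum(product)
    if total == 0:
        raise ValueError("descriptors have no common support")
    return [p / total for p in product]


# Three cells: the first descriptor covers cells 0-1, the second covers cells 1-2.
combined = combine_descriptors([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```

Only the cell that both descriptors support keeps any mass, which is the intended effect of composing descriptors. Because the function takes any number of pdfs, the same code covers n descriptors.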

Generating s-expression

Figure: pdfs for Near(A), for Outdoor(), and for the combined s-expression Outdoor() ∧ Near(WTC).

Experimental Setup

Domain: real geographic dataset; Manhattan, NY, near WTC; buildings, streets, roads; 4 × 4 km²

Data: based on 164 reports by police officers, participants of 9/11; s-expressions near(A), on(A), outdoor over intersections, buildings, streets; 2359 pdfs constructed

Queries: 50 range queries

Simulate the Errors

Extraction errors: with human supervision, the error is small.

Modeling errors: even with supervision, model parameters can still be away from the ideal settings, e.g. the mean and variance settings for the Gaussian model.

We simulate two types of modeling errors for the analysts:

Overly confident: the estimated model is too “tight” (by reducing the variance of the “ideal” Gaussian model)

Not confident: the estimated model is too “loose” (by increasing the variance of the “ideal” Gaussian model)

Results

Even with large errors, probabilistic models are still better than bounding region methods

Conclusions

Novel in this work: approach for mapping text to PDF; query requirements for SA apps; query design issues; representation of PDFs

Ongoing work: database aspects of the problem; more types of queries

Future work: spatio-temporal aspects; better modeling (text to PDF)

Spatial Awareness from Textual Sources

Lead Spatial awareness

Yiming Ma

Analysis

Analysis and Visualization

Graph analysis GIS Predictive modeling Damage assessment

Graph Analysis

SEMANTIC METADATA

DESCRIBED DATA

Semantic Graphs (Attributed graphs)

Entity-Relationship Schemas

Relations

Document Repositories

Taxonomies (“Reference Data”)

Ontologies (“Semantic Models”)

Graph Pattern-Based Querying

Ranked Graph Pattern Matching

Multi-dimensional Analysis [For Documents]

Relationship Summarization/Exploration [Relations]

Graph Data Model (Entity-Attribute-Value Model)

Graph (edge sets aka triple sets), e.g.

(&dawit ns:studentAt &UCI)
(&UCI ns:type &university)
(ns:university ns:subClassOf ns:organization)

Two kinds of nodes: object-ids and literals (e.g. integer, string, etc.)

Blank nodes, e.g. (&dawit :studentAt _)

Directed edges (aka predicates or properties): there exists only one edge with a given label between a pair of nodes

Symmetric representation of metadata + data

Nodes: object classes or link classes

Links: predicates on classes:

(:studentAt :domain :person)
(:studentAt :range :organization)
(:university :subclassOf :organization)

Object identity + relationship identity: objects and relationships have unique ids (called URIs)
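A triple set of this kind maps naturally onto a set of tuples, and a single triple pattern becomes a filter over it. The sketch below is a minimal in-memory illustration using the example triples above; the helper name `find` and the use of `None` as a wildcard are assumptions, not the API of any particular triple store.

```python
TRIPLES = {
    ("&dawit", "ns:studentAt", "&UCI"),
    ("&UCI", "ns:type", "&university"),
    ("ns:university", "ns:subClassOf", "ns:organization"),
    # Schema triples live in the same set as the data triples.
    (":studentAt", ":domain", ":person"),
    (":studentAt", ":range", ":organization"),
}


def find(triples, subject=None, predicate=None, obj=None):
    """Triples matching the given positions; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Data lookup: where does &dawit study?
studies_at = find(TRIPLES, subject="&dawit", predicate="ns:studentAt")
# Schema lookup with the same function: what do we know about :studentAt?
schema_of_student_at = find(TRIPLES, subject=":studentAt")
```

Because schema and data are stored symmetrically, the same lookup works for both.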


Graphs for actual data storage - beyond data modeling

Graphs are normally used for conceptual data modeling: the entity-relationship (ER) model

What is different? Using graphs for actual (minimally structured) data representation.

Why?

Store/represent and query data without schema

Symmetrically store/query both schema (ontology) and data

Graph traversal based query + reasoning (inference)

Multi-schema queries on the same graph

Query unstructured data annotated with taxonomies/ontologies using traditional (structured) query operators

Figure: example publication graph in three panels (a), (b), (c). It shows schema classes (organization, researcher, author, editor, publication, book, proceeding, article, chapter) linked by properties (affiliates, produces, writesBook, writesArticle, editsProc, editsBook, inProceeding, refersTo, org_name, name, title, year, list_price, rating), a topic ontology (Comp.Sc, Info. Sys., Data, Interfaces, DB, IR, Encrypt., DataStruct., Online services, D. Lib., Systems, Languages, Distributed DB, Multimedia DB), and INSTANCE data (e.g. John, Sara, &o2, &p1, year 2003, year 1998).

Legend: rdf:type; subClassOf/subPropertyOf; &o organization, &r researcher, &b book, &p proceeding, &a article.

Graph Pattern based Querying

SELECT *
WHERE { ?org :affiliates ?aut .
        ?aut :produces ?b .
        ?b :type :book .
        ?b :price ?p .
        ?b ?pred ?x . }

Annotations on the slide: variable (e.g. ?org); triple pattern (e.g. ?org :affiliates ?aut); :produces is the super-class of writesBook and queries schema (a); the query also uses schema (b); a variable on a predicate (?pred) matches all applicable predicates.

Enumerative Semantics: the result is a Relation

org   aut   book   price   year
&o1   &r1   &b1    90      2003
&o1   &r2   &b1    90      2003
&o1   &r2   &b2    110     1998
&o1   &r3   &b3    100     1998
&o2   &r2   &b1    90      2003
&o2   &r2   &b2    110     1998

Extractive Semantics: the result is a Graph (graph set)

CONSTRUCT *
WHERE { ?org :affiliates ?aut .

&o1 :affiliates &r1
&r1 :produces &b1
&b1 :price 90
&b1 :year 2003
&o1 :affiliates &r2
...

Enumerative Algebra

Enumerative algebra - algebra over sets of variable bindings

Triple patterns: ?org :affiliates ?aut, ?aut :produces ?b, …

Bindings (per triple pattern): over the variables (org, aut) for the first pattern and (aut, b) for the second

Joinable Bindings – same variable, same value.

Enumerative Algebra (ctd.)

Given two sets of bindings T1 and T2, and r denoting a binding:

T1 ∪ T2 = {r | r ∈ T1 or r ∈ T2}

T1 ⋈ T2 = {r1 ∪ r2 | r1 ∈ T1 and r2 ∈ T2 and r1 and r2 are joinable}

Enumerative Algebra (ctd.)

match[P](G) – matches the graph pattern P to graph G

Given P = {p1, p2, …, pm}

match[P](G) = match[p1] ⋈ match[p2] ⋈ … ⋈ match[pm]

The result is a set of sets (tuples) of bindings
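The enumerative operators can be prototyped directly over dictionaries of variable bindings. The sketch below is an illustrative, unoptimized rendering of match and join; the function names and the triples, which are chosen to mirror the organizations, researchers and books of the example relation, are assumptions for the example.

```python
GRAPH = [
    ("&o1", ":affiliates", "&r1"), ("&o1", ":affiliates", "&r2"),
    ("&o1", ":affiliates", "&r3"), ("&o2", ":affiliates", "&r2"),
    ("&r1", ":produces", "&b1"), ("&r2", ":produces", "&b1"),
    ("&r2", ":produces", "&b2"), ("&r3", ":produces", "&b3"),
]


def match_pattern(pattern, graph):
    """Bindings of one triple pattern; terms starting with '?' are variables."""
    bindings = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break  # constant does not match
        else:
            bindings.append(binding)
    return bindings


def joinable(r1, r2):
    """Same variable, same value (bindings with no shared variable also join)."""
    return all(r1[v] == r2[v] for v in r1.keys() & r2.keys())


def join(t1, t2):
    return [{**r1, **r2} for r1 in t1 for r2 in t2 if joinable(r1, r2)]


def match(patterns, graph):
    """match[P](G): join of the bindings of every triple pattern in P."""
    result = [{}]
    for pattern in patterns:
        result = join(result, match_pattern(pattern, graph))
    return result


rows = match([("?org", ":affiliates", "?aut"), ("?aut", ":produces", "?b")], GRAPH)
```

Each row is one tuple of bindings, so the output has the same shape as the relation shown under enumerative semantics.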

Enumerative Algebra (ctd.) Other operators:

Difference: T1 \ T2 = {r ∈ T1 | for all r’ ∈ T2, r and r’ are not joinable}

Filter: σφ(T) evaluates the Boolean condition φ on T. An example of φ is: ?p > 100.

Outer Join: T1 ⟕ T2 = (T1 ⋈ T2) ∪ (T1 \ T2)

Extractive Algebra

Given two graphs G1 and G2, and t denoting a triple:

G1 ∪ G2 = {t | t ∈ G1 or t ∈ G2}

Figure: the triple patterns ?org :affiliates ?aut and ?aut :produces ?b matched as sets of triples (e.g. &o1 :aff, &r1 :prod, &r2 :prod).

• Matching retains Structure

• More compact Representation during implementation

Extractive Algebra (ctd.)

G1 ⋈p G2 = {G1 ∪ G2 | conditions 1 and 2 hold}

where p = (p1, p2), i.e. a pair of triple patterns, and

1. For all t1 ∈ G1, either there exists t2 ∈ G2 such that t1 and t2 are joinable by p, or t1 does not match p1 ∈ p.

2. For all t2 ∈ G2, either there exists t1 ∈ G1 such that t2 and t1 are joinable by p, or t2 does not match p2 ∈ p.

Figure: worked examples of the extractive join on the query patterns (?org :affiliates ?aut . ?aut :produces ?b . ?b :price ?p . ?b ?pred ?x), applied to triple sets such as &o1 :aff, &r1 :prod, &r2 :prod, &b1 :price 90, &b1 :year 2003:

⋈((?org :affiliates ?aut),(?aut :produces ?b))

⋈((?aut :produces ?b),(?b :price ?p))

⋈((?aut :produces ?b),(?b ?pred ?x))

Extractive Algebra (ctd.)

extract[P](G) – matches the graph pattern P

Given P = {p1, p2, …, pm}

extract[P](G) = match[p1] ⋈ match[p2] ⋈ … ⋈ match[pm], where ⋈ is the extractive join

Extractive Algebra (ctd.) Other operations:

Difference: G1 \ G2 = {t | t ∈ G1 and t ∉ G2}

Filter: σφ(G) = G \ {t | φ(t) ≠ true}

Implementing Extract – Naïve/Join-split

As a post-process of enumerative matching:

Do enumerative matching (produces a joined relation)

Vertically split the join result into triples

IO cost, for a pair of triple-sets:

2 reads of triple sets
+ 1 write of joined result
+ 2 reads of join result (one for each split/projection)
+ 2 writes of projected result
+ 2 reads of the projected triple sets
+ 1 write of unioned result

Total: 6 reads and 4 writes (4 reads and 3 writes if no union).

Implementing Extract – 2-way semi-joins

Use 2-way semi-joins, given two joinable triple sets A and B

IO cost:

2 reads of triple sets (first semi-join)
+ 1 write of result to union (writes smaller table)
+ 2 reads to perform next semi-join (1 read is on smaller table)
+ 1 write of result to union

Total: 4 reads and 2 writes.

Implementing Extract – 2-stream operator

Scan each input and produce the triples that have at least one match in the other input: inputs A and B yield outputs A’ and B’

It is a high-level operator that can be implemented via hashing or sort-merge
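A hash-based version of the 2-stream operator can be sketched in a few lines: collect the join keys of each input, then keep from each input only the triples whose key also occurs on the other side. This in-memory sketch ignores the IO behaviour discussed above; the function name, the key extractors and the sample triples are assumptions for illustration.

```python
def two_stream(a, b, key_a, key_b):
    """Return (A', B'): the triples of each input with a match in the other."""
    keys_a = {key_a(t) for t in a}
    keys_b = {key_b(t) for t in b}
    a_out = [t for t in a if key_a(t) in keys_b]
    b_out = [t for t in b if key_b(t) in keys_a]
    return a_out, b_out


# A matches ?org :affiliates ?aut; B matches ?aut :produces ?b.
A = [("&o1", ":affiliates", "&r1"), ("&o1", ":affiliates", "&r4")]
B = [("&r1", ":produces", "&b1"), ("&r5", ":produces", "&b9")]

# Join on ?aut: the object of an A triple equals the subject of a B triple.
a_prime, b_prime = two_stream(A, B, key_a=lambda t: t[2], key_b=lambda t: t[0])
```

Unlike a relational join, the outputs stay as triples, so no vertical split of a joined relation is needed afterwards.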

Grouping and Aggregation : Flatten-and-Aggregate Approach

SELECT ?org, sum(?p) as totalPrice
WHERE { ?org :affiliates ?aut .
        ?aut :writesBook ?b .
        ?b :price ?p }
GROUP BY ?org

org   aut   book   price   year
&o1   &r1   &b1    90      2003
&o1   &r2   &b1    90      2003
&o1   &r2   &b2    110     1998
&o1   &r3   &b3    100     1998

This is how Oracle supports aggregation over graph data ! Also, [Hung, Deng, and Subrahmanian, ICDE 2005]

Group and Aggregate EnumerativeMatch Results

Result: 390. WRONG !
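The error is easy to reproduce: book &b1 appears in two rows of the flattened relation because it has two authors, so its price is added twice. The sketch below contrasts the flattened sum with a sum that counts each book once per organization. The second function is only a shortcut that shows the expected total; it is not the graph-based grouping developed in the following slides, and both function names are assumptions.

```python
from collections import defaultdict

# Enumerative match results for &o1: (org, aut, book, price).
ROWS = [
    ("&o1", "&r1", "&b1", 90),
    ("&o1", "&r2", "&b1", 90),
    ("&o1", "&r2", "&b2", 110),
    ("&o1", "&r3", "&b3", 100),
]


def flatten_and_aggregate(rows):
    """Sum the price column of the joined relation, grouped by organization."""
    totals = defaultdict(int)
    for org, _aut, _book, price in rows:
        totals[org] += price
    return dict(totals)


def aggregate_distinct_books(rows):
    """Sum each book's price once per organization."""
    prices = {}
    for org, _aut, book, price in rows:
        prices[(org, book)] = price
    totals = defaultdict(int)
    for (org, _book), price in prices.items():
        totals[org] += price
    return dict(totals)


wrong = flatten_and_aggregate(ROWS)
right = aggregate_distinct_books(ROWS)
```

The flattened sum gives 390 for &o1, while counting each book once gives 300.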

Group By

Should be based on extractive matching (graphs).

What should group by mean on graphs? Collapse a set of triples into a single triple.

Use Bag nodes.

Example: GROUP BY ?aut ON :writesBook (slide annotations: grouping target, grouping basis)

Grouping Target

Grouping Basis

Aggregation

Two types (modes) of aggregation on graphs

Branch-wise: aggregate a set of values adjacent to a node type

Path-wise: aggregate over a path in the graph (not discussed here)

Branch-wise example:

SELECT ?b, branch sum (:price) as totalPrice
WHERE { ?org :affiliates ?aut .

(slide annotations: anchor, mode, aggregation basis)

Aggregation – revisit example

SELECT ?org, branch sum (:price) as totalPrice
WHERE { ?org :affiliates ?aut .
GROUP BY ?org    (optional)

(slide annotations: anchor, mode, aggregation basis)

Anchor and aggregation basis are not adjacent!

Aggregation - solution

Figure: example graph with &o1, &r1, &r2, &b1, &b2 and &b3 connected by affiliates and writesBook edges, with Bag-typed nodes.

RULE: All nodes between anchor and aggregation basis should be bags!

If anchor and aggregation basis are adjacent, push aggregation into group by.

Otherwise, iteratively perform graph grouping with edge-propagation, making each intermediary node an aggregation target.

Result: &o1, 300; &o2, 200

Lead: Dawit Yimam Seid

Ram Hariharan (with Sharad Mehrotra and Chen Li): searching (open source) GIS data and datasets; metadata; compression

Vibhav Gogate and Jon Hutchinson (with Padhraic Smyth): activity monitoring and prediction; anomalous event detection

ImageCat Inc (Ron Eguchi, Charles Huyck): INLET, MetaSIM

Artifacts

Many Communities – Many Disaster Portals

Contents of sites are administered by the respective city emergency management.

Easily customized to meet the needs of different communities.

Regional summarization capabilities built in (e.g. county/state level summary view).

Objectives of the Disaster Portal project are to provide:

An integrated platform for RESCUE team members to develop, test, and demonstrate their research projects in real-life scenarios.

Next-generation capabilities to first responders and the public.

Key development partner: City of Ontario

The Disaster Portal is a suite of web applications for disseminating information and providing situational awareness to the general public during a disaster.

Disaster Portal

Community Deployment of Disaster Portal

Applications selected from Disaster Portal suite.

Portal framework providing situation summary page, custom look-and-feel

http://www.disasterportal.org:8380/Ontario/

Applications Available in Disaster Portal Suite (with related research topics)

Crisis Alerts: Key contacts at companies / organizations can sign up for customized information updates via web or phone. Research topic: scalable rapid dissemination.

Donation Management: Individuals and organizations post needs and donations; helps coordinate the matching process. Research topic: complex publish-subscribe systems.

Family Reunification: Search for contact info of a displaced family member. Research topic: information extraction & data cleaning.

Shelter Information: Announcements and status information for open emergency shelters.

Travel Planning: Current and predicted traffic conditions. Research topic: activity modeling algorithms.

Disaster-Oriented Web Search: Find information not already included in the site. Research topic: multidimensional analysis algorithms.

Included in Ontario Pilot Disaster Portal

Disaster Portal


Conclusions

Situational data management

Semantics

Synergies

Integrated demonstration

Thank you !

ashish@ics.uci.edu
