An Ontological Approach to the Document Access Problem of Insider Threat ISI 2005, (May 20) Boanerges Aleman-Meza 1 Phillip Burns 2 Matthew Eavenson 1 Devanand Palaniswami 1 Amit P. Sheth 1 (1) LSDIS Lab, Computer Science Dept., University of Georgia, USA (2) CTA – Computer Technology Associates USA


An Ontological Approach to the Document Access Problem of Insider Threat

ISI 2005 (May 20)

Boanerges Aleman-Meza (1), Phillip Burns (2), Matthew Eavenson (1), Devanand Palaniswami (1), Amit P. Sheth (1)

(1) LSDIS Lab, Computer Science Dept., University of Georgia, USA

(2) CTA – Computer Technology Associates, USA


Objective & Approach

Determine whether the (classified) documents reviewed by an IC analyst satisfy his/her “need to know”:

- Characterize “need to know” w.r.t. the ontology
- Characterize document content in terms of the ontology
- Discover weighted semantic relationships between document content and “need to know”


Characterizing “Need to Know” Using a Semantic Approach (Using Ontology)

- Requires a domain ontology that
  - models the important concepts & relationships of the domain (schema)
  - captures factual knowledge (instances)
- Relate the analyst’s need to know to concepts & relationships in the ontology
  - e.g. terrorist organization, funding sources, facilitators, members, methods


“Need to know” = context of investigation

- 26,489 entities
- 34,513 (explicit) relationships
- Add relationship to context


Characterizing Document Content in Terms of Ontology

- “Semantic Annotation”
  - Correlate words/phrases from the document with entities/relationships in the ontology
  - Entity identification
  - Meta-data added to the document (from associated ontological knowledge)
- Active area of research, but practically useful technology is now available
- Constrained to the content of the ontology


Semantic Relationships between Document & “Need to Know”

- Semantic associations: relationships between document concepts & “need to know” concepts are discovered and ranked
- Ranking is based on multiple factors: no. of links, types of links, location in ontology, …
- Ranking indicates the degree of semantic “closeness” and, therefore, how related the document is to “need to know”
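As a rough illustration of ranking by number and type of links, the sketch below finds a shortest association between two entities in a toy graph and scores it. The triples, the link-type weights, and the scoring rule (mean link weight divided by path length) are all assumptions made for the example; the slides do not give the actual ranking function, and the "location in ontology" factor is not modeled here.

```python
from collections import deque

# Toy ontology: (subject, relationship, object) triples. All names are illustrative.
TRIPLES = [
    ("e1", "friends with", "e3"),
    ("e3", "listed in", "e4"),
    ("e1", "lives in", "e6"),
    ("e5", "citizen of", "e2"),
]

# Hypothetical per-link-type weights (an assumption, not values from the slides).
LINK_WEIGHT = {"friends with": 0.5, "listed in": 1.0, "lives in": 0.2, "citizen of": 0.2}


def neighbors(node):
    """Yield (relationship, other_node), treating edges as undirected."""
    for s, r, o in TRIPLES:
        if s == node:
            yield r, o
        elif o == node:
            yield r, s


def shortest_association(start, goal):
    """Breadth-first search; returns the relationship labels on a shortest path, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None


def association_score(path):
    """Assumed scoring rule: fewer links and heavier link types give a higher score."""
    if path is None:
        return 0.0  # no association found
    if not path:
        return 1.0  # same entity: direct match
    mean_weight = sum(LINK_WEIGHT[r] for r in path) / len(path)
    return mean_weight / len(path)
```

For example, `association_score(shortest_association("e1", "e4"))` scores the two-link path through `e3`, while an unreachable pair such as `("e1", "e2")` scores zero.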


Documents Ranking

- Highly relevant
- Closely related
- Ambiguous
- Not relevant
- Undeterminable


Research Content

- Discovery & ranking of semantic associations
- Characterizing “need to know” in terms of ontological concepts & relationships
- Meta-data annotation of data and (semi-structured & unstructured) documents
  - correlation of document content & concepts in the ontology


Research Challenges

In this project we are addressing:

- Discovery of semantic associations per entity, per document
- Input/visualization/management of the context of investigation
- Scalability in the number of documents & ontology size
  - Performs well with thousand documents
- Ranking of documents


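Ranking of Documents Relevance

“Closely related entities are more relevant than distant entities”

\[ E = \{\, e \mid e \in \text{Document} \,\} \]

\[ E_k = \{\, f \mid \operatorname{distance}(f,\ e \in E) = k \,\} \]

\[ \text{Document Relevance} = \sum_{k=0}^{n} w_k \,\bigl[\, \text{classes\_Relevance}(E_k) + \text{relations\_Relevance}(E_k) + \text{entities\_Relevance}(E_k) \,\bigr] \]

Here \(w_k\) stands for the per-distance weighting factor, whose exact form is not legible in this transcript; in line with the quotation above, it gives entities at a small distance k more weight than distant ones.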
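The distance sets E_k can be computed with a multi-source breadth-first search starting from the document's entities. The sketch below is a minimal illustration under stated assumptions: the adjacency-list graph format, the function names, and the reading of distance as the shortest distance to any document entity are choices made for the example, and the decay `1 / (k + 1)` is only a placeholder because the slide's actual weighting is not recoverable.

```python
from collections import deque


def distance_sets(graph, document_entities, max_k):
    """Return {k: E_k}: entities at shortest distance k from any document entity."""
    dist = {e: 0 for e in document_entities}
    queue = deque(document_entities)
    while queue:
        node = queue.popleft()
        if dist[node] == max_k:
            continue  # do not expand beyond the requested radius
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    sets = {k: set() for k in range(max_k + 1)}
    for entity, k in dist.items():
        sets[k].add(entity)
    return sets


def document_relevance(sets, component_scores, weight=lambda k: 1.0 / (k + 1)):
    """Weighted sum over k of the component scores of E_k.

    component_scores: functions mapping a set of entities to a number
    (stand-ins for the classes/relations/entities relevance terms).
    weight: ASSUMED decay 1/(k+1); the slides do not show the actual weight.
    """
    return sum(weight(k) * sum(f(ek) for f in component_scores)
               for k, ek in sets.items())
```

A component score can be as simple as counting how many entities of E_k appear in a list of entities of interest, for example `lambda ek: len(ek & {"a", "c"})`.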


Components of Document Relevance

Context of Investigation:

1. Entities belong to classes in the Context: type(entity) ∈ Context
2. Relationship constraints: Relationship [Class]
3. Entities match a list of entities of interest (in the Context): entity ∈ Entities-List
   (specific entities) • Abu Abdallah • Turkmenistan • Konduz Province • …

[Figure, repeated once per component: example graph with nodes e1:Person, e2:Country, e3:Person, e4:Watch-List, e5:Person, e6:State, e6:Company, e7:Terror Organization, e8:Event, e9:Person and edges “works at”, “friends with”, “citizen of”, “listed in”, “lives in”, “claims responsibility for”]


Schematic of Ontological Approach to the Legitimate Access Problem

[Figure: schematic; the only surviving label is “Semagix Freedom”]


Conclusions

- New semantic approach to the challenging problem
- Viability demonstrated on a small scale
- Significant new research that builds upon the latest Semantic Platform
- Many applications of this approach: vendor vetting, knowledge discovery, ….


Acknowledgements

- Semagix provided technology to populate the ontology using knowledge extraction, and (semi-)automatic metadata extraction from documents (Freedom toolkit).
- NSF-funded projects provided core research: "Semantic Association Identification and Knowledge Discovery for National Security Applications" (Grant No. IIS-0219649) and "Semantic Discovery: Discovering Complex Relationships in Semantic Web" (Grant No. IIS-0325464).


Security and Terrorism Part of SWETO Ontology


Semantic Annotation

- The document is searched for entity names (or synonyms) contained in the ontology
- Document entities are then annotated with additional information from the corresponding entities in the ontology, including named relationships to other entities
- The following chart is an example
  - Highlighted text marks the entities found that correspond to concepts in the ontology
  - The XML is the corresponding meta-data annotation
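The name-and-synonym lookup described above can be sketched in a few lines. This is only an illustration: the ontology fragment, the entity identifiers, the “operates in” relation, and the dictionary-based annotation format are hypothetical, and the actual annotation in the project was produced with the Semagix Freedom toolkit and emitted as XML.

```python
import re

# Toy ontology fragment: entity id -> names/synonyms, class, and named relationships.
# All entries are illustrative and are not taken from SWETO.
ONTOLOGY = {
    "e7": {"names": ["Example Organization", "ExOrg"], "class": "Terror Organization",
           "relations": [("operates in", "e2")]},
    "e2": {"names": ["Turkmenistan"], "class": "Country", "relations": []},
}


def annotate(text):
    """Return one annotation per entity mention found in the text, in document order."""
    annotations = []
    for entity_id, info in ONTOLOGY.items():
        for name in info["names"]:
            # Whole-word, case-insensitive match on the name or synonym.
            pattern = r"\b" + re.escape(name) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                annotations.append({
                    "span": (m.start(), m.end()),
                    "text": m.group(0),
                    "entity": entity_id,
                    "class": info["class"],
                    "relations": info["relations"],
                })
    return sorted(annotations, key=lambda a: a["span"])
```

Each annotation carries the character span (useful for highlighting the entity in the original document) plus the class and relationships copied from the ontology.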


Relevance Measures for Documents (relating document content to IA “need to know”)

- Relevance engine input:
  - the set of semantically annotated documents
  - the context of investigation for the assignment
  - the ontology schema, represented in RDFS, and
  - the ontology instances, represented in RDF
- A relevance measure function is used to verify whether the entity annotations in the annotated document fit the entity classes, entity instances, and/or keywords specified in the context of investigation.


The Big Picture

[Figure: architecture diagram. Surviving labels: Ontology / knowledge base; Massive Metadata Store; API; Open/proprietary Heterogeneous Data Sources (documents, databases, HTML pages, emails, XML feeds); Trusted Sources (“populates”); Semi-structured data (“populates”); Knowledge Discovery Algorithms; SWETO Web Service; Browsing]


SWETO – Ontology Schema Visualization

See SemDis project of LSDIS Lab, University of Georgia


Relevance Measures for Documents (relating document content to IA “need to know”) (cont)

Documents are classified as:

- Highly relevant: document entities directly related
- Closely related: document entities related through strong semantic associations
- Ambiguous: document entities related through weak semantic associations
- Not relevant: document entities not related to “need to know”
- Undeterminable: document entities not found in ontology


IA Context of Investigation (characterization of “Need to Know”)

We define the context of investigation as a combination of the following:

- A set of entity classes and relationships, and/or a negation of a set of entity classes and relationships
- A set of entity instance names, and/or a negation of a set of entity instance names
- A set of keyword values that might appear at any attribute of the populated instance data, and/or a negation of a set of keyword values
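One way to picture this definition is as a record of included and negated sets. The sketch below is an assumption-laden illustration: the class name, field names, and the rule that a negated match overrides any included match are choices made for the example, and relationships are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ContextOfInvestigation:
    """Included and negated (excluded) classes, instance names, and keywords."""
    classes: set = field(default_factory=set)
    not_classes: set = field(default_factory=set)
    instances: set = field(default_factory=set)
    not_instances: set = field(default_factory=set)
    keywords: set = field(default_factory=set)
    not_keywords: set = field(default_factory=set)

    def matches(self, entity_class, name, attribute_values):
        """True if the entity hits any included set and no negated set (assumed semantics)."""
        values = {v.lower() for v in attribute_values}
        excluded = (
            entity_class in self.not_classes
            or name in self.not_instances
            or any(k.lower() in v for k in self.not_keywords for v in values)
        )
        if excluded:
            return False
        return (
            entity_class in self.classes
            or name in self.instances
            or any(k.lower() in v for k in self.keywords for v in values)
        )
```

Keyword matching here is a case-insensitive substring test over the entity's attribute values, which is one plausible reading of "might appear at any attribute".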


Context of Investigation (cont)

- The goal is to capture, at a high level, the types of entities (or relationships) that are considered important.
- Relationships can be constrained to be associated with specified class types
  - E.g., it can be specified that a relation ‘affiliated with’ is part of the context only when it is connected with an entity that belongs to a specific class, say, ‘Terror Organization’
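The ‘affiliated with’ example can be expressed as a small lookup. This is a hedged sketch: the table name, the function name, and the convention that an empty class set means "unconstrained" are assumptions introduced for illustration.

```python
# Relationship constraints: relation name -> classes it must be connected to.
# An empty set means the relation is in the context unconditionally (assumed convention).
RELATION_CONSTRAINTS = {
    "affiliated with": {"Terror Organization"},
    "listed in": set(),
}


def relation_in_context(relation, subject_class, object_class):
    """True if the relation is part of the context, given the classes at its two ends."""
    if relation not in RELATION_CONSTRAINTS:
        return False  # relation was never added to the context
    required = RELATION_CONSTRAINTS[relation]
    if not required:
        return True  # no class constraint on this relation
    return subject_class in required or object_class in required
```

With this table, ‘affiliated with’ counts toward relevance when it links a person to a ‘Terror Organization’ but not when it links a person to a company.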


Ranking of Documents Relevance

Four groups of document ranking:

- Not Related Documents
  - unable to determine relation to context
- Ambiguously Related Documents
  - some relationship exists to the context
- Somehow Related Documents
  - entities are closely related to the context
- Highly Related Documents
  - entities are a direct match to the context

Cut-off values determine the grouping of documents w.r.t. relevance

- These are customizable cut-off values (more control and more meaningful parameters compared to, say, automatic classification or statistical approaches)

“Inspection” of a document is possible via (a) the original document or (b) the original document with highlighted entities
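Grouping by customizable cut-off values amounts to thresholding a relevance score. The sketch below assumes a score normalized to the range 0 to 1 and uses made-up default cut-offs; neither the scale nor the values come from the slides.

```python
def classify(score, cutoffs=(0.25, 0.5, 0.75)):
    """Map a relevance score to one of the four groups using three ascending cut-offs.

    The default cut-off values are placeholders, not values from the slides.
    """
    ambiguous, somehow, highly = cutoffs
    if score >= highly:
        return "Highly Related"
    if score >= somehow:
        return "Somehow Related"
    if score >= ambiguous:
        return "Ambiguously Related"
    return "Not Related"
```

Passing a different `cutoffs` tuple changes the grouping without touching the scoring itself, which is the kind of control the slide contrasts with automatic classification.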
