Abstract

The study of complex social and technological systems, such as organizations, requires a sophisticated approach that accounts for the underlying psychological and sociological principles, communication patterns and the technologies within these systems.

Social Network Analysis and link analysis have since inception operated on the cutting edge bringing together mathematical analysis of social structures and qualitative reasoning and interpretation.

As available computing power grew, social network-based models have become not only an analysis tool, but also a methodology for building new theories of social behaviour and organizational evolution, frequently through the creation of simulation models.

This work examines the past approaches of creating Social Network-based semantically consistent and interpretable models of social structure and social networks, as well as social simulation tools.

I propose the creation of a multi-theory, multi-level simulation model of social structure that relies on social network theory and Artificial Intelligence algorithms. I elaborate by proposing a number of enabling technologies that facilitate storage, manipulation and interchange of social structure data.

I further propose the creation of a robust and scalable social structure semantic that facilitates interpretable reasoning about evolution of social structure.


Page 1: Abstract - Carnegie Mellon School of Computer Sciencemaksim/research/proposal2.pdf · 2004-07-15 · Robust, Scalable Object-Oriented Semantics for Reasoning and Simulating Social


Robust, Scalable Object-Oriented Semantics for Reasoning and Simulating Social Structure

Proposal for a Ph.D. Thesis

Maksim Tsvetovat

July 15, 2004


Table of Contents

1 Introduction
    1.1 Social Network Analysis
        1.1.1 Applications of Social Network Analysis
        1.1.2 Shortcomings of Pure Social Network Analysis
    1.2 Meta-Matrix - Expansion of Network Semantics
        1.2.1 MetaMatrix Measures
        1.2.2 Applications of MetaMatrix Analysis
        1.2.3 Caveats

2 Data Sources

3 Simulation Approaches to Analyzing Dynamic Networks
    3.1 Research Statement
        3.1.1 Performance Evaluation
        3.1.2 Replication and Third-party Usage
    3.2 Preliminary Results

4 Enabling Technologies: Storage, Manipulation and Interchange of Rich Social Structure Data
    4.1 Storage Requirements for Social Structure Data
    4.2 Requirements for Data Manipulation Tools
    4.3 Requirements for Data Interchange
    4.4 Preliminary Results
    4.5 Usability and Documentation

5 Semantic Reasoning and Social Structure
    5.1 Research Statement
        5.1.1 Performance Evaluation

6 Conclusion
    6.1 Timeline


Social Structure Semantics

A NetWatch: Simulating and Reasoning about Dynamic Covert Networks
    A.1 Background
    A.2 Covert Terrorist Networks - the Al Qaeda
    A.3 Modeling Dynamic Networks

B Technical Description of NetWatch
    B.1 Social Networks in NetWatch
    B.2 Agents in NetWatch
    B.3 Processes Governing Communication
    B.4 Inter-agent Knowledge Exchange
    B.5 Planning and Execution of Complex Tasks
    B.6 Classification Tasks

C Preliminary NetWatch Results
    C.1 Red Team
        C.1.1 Blue Team
        C.1.2 Results
    C.2 Conclusions

D DyNetML: Data Interchange for Object Oriented Networks
    D.1 Requirements for Data Interchange
    D.2 DyNetML: an XML-Derived Social Network Language
        D.2.1 DyNetML Format Overview
    D.3 Representing Multiple Node and Relation Types
        D.3.1 Specifying Individuals and Nodes
    D.4 Representing Relations in DyNetML
        D.4.1 Edges
    D.5 Representing Graph, Node and Edge Attributes
    D.6 Complex Social Networks in DyNetML
        D.6.1 Example

E Storage and Manipulation of Social Structure Data
    E.1 Storage Requirements for Social Structure Data
    E.2 Database Design
        E.2.1 Database Schema
        E.2.2 Thesaurus
    E.3 Data Manipulation
        E.3.1 Subsets
    E.4 WWW Interface to NetIntel dataset

F Beyond MetaMatrix: Robust Reasoning about Networks
    F.1 Toward a Social Network Semantics
    F.2 Edge Semantics - More than just edge labels
    F.3 Defining an Object-Oriented Network Semantics
        F.3.1 Nodes and Edges


        F.3.2 Inheritance
        F.3.3 Derivation of MetaMatrix Semantics
    F.4 Rules and Social Structure Inference
        F.4.1 How the rule system works
        F.4.2 Making Complex Statements in SSS


List of Figures

6.1 Thesis Completion Schedule

B.1 NetWatch Simulation Design
B.2 Planning in NetWatch Agents

C.1 Red Team: A Cellular Covert Network
C.2 Average Performance of Wiretapping Strategies

D.1 Dynamic Networks in DyNetML
D.2 Dynamic Networks in DyNetML
D.3 Specification of Vertices in DyNetML
D.4 Sample specification of a Vertex
D.5 Specification of Graphs in DyNetML
D.6 Specification of Graphs in DyNetML
D.7 Specification of Properties and Measures

E.1 NetIntel Database Schema
E.2 Screenshot of the WWW Interface to NetIntel Database

F.1 Node and Edge Semantics
F.2 Node Inheritance
F.3 Edge Inheritance
F.4 MetaMatrix Semantics
F.5 Network Inference
F.6 Making Network Inferences: a simple example
F.7 Complex Inferential Statement


Chapter 1

Introduction

The study of complex social and technological systems, such as organizations, requires a sophisticated approach that accounts for the underlying psychological and sociological principles, communication patterns and the technologies within these systems.

Social Network Analysis and link analysis have since inception operated on the cutting edge, bringing together mathematical analysis of social structures and qualitative reasoning and interpretation.

As available computing power grew, social network-based models have become not only an analysis tool, but also a methodology for building new theories of social behaviour and organizational evolution. This was frequently done through the creation of simulation models that allowed researchers to test theoretical constructs in a safe and ethical manner. Simulations also facilitate large-scale experiments and Monte-Carlo simulations, which were all but impossible in the qualitative analysis world due to cost, time and ethical constraints.

One facet of research methodology has largely remained unaddressed. Most mathematical analysis and social simulation tools operate on abstract numerical representations of social structures, such as graphs, matrices and time series. However, the concrete semantics behind these numbers was frequently only a part of the researcher's mental model. Its communication to the outside world was largely a function of the researcher's writing skills. This, and the level of abstraction required by early computer models, has resulted in datasets and models that are very difficult to interpret, especially by non-specialists.

This work examines the past approaches of creating interpretable and semantically consistent models of social structure and social networks, as well as social simulation tools. I further propose the creation of a robust and scalable social structure semantic that facilitates interpretable reasoning about evolution of social structure.

1.1 Social Network Analysis

Social network analysis (SNA) has been developed as a tool for mapping and measuring of relationships and flows between people, groups, organizations, computers or other information/knowledge processing entities.

The concept of social network analysis is based upon representation of social structure as a graph, with the nodes representing the people and groups while the links show relationships or flows between the nodes. Social Network Analysis builds upon graph theory to provide mathematical tools for analysis of social structure independently of agent attributes.

SNA provides both a visual and a mathematical analysis of human relationships.

Degree

Social network researchers measure network activity for a node by using the concept of degree[26]: the number of connections a node has. Degree provides a useful approximation of who has power in a network, but the approximation does not always hold. Power is not only a matter of who has the most connections, but also of which people those connections link together, and whether or not they would be connected otherwise.

Betweenness

Betweenness centrality[26] answers the question above: whom does a node connect, and how important is its network position to the preservation of the group's communication pathways? Mathematically, the betweenness of a node is the ratio of the number of paths between every pair of other nodes that lead through the node to the total number of such paths. Nodes with high betweenness are thought of as gatekeepers, or individuals that control information flows between subsets of the organization.

Closeness

Closeness centrality[26] calculates who has the most visibility of activities and information flows in the network. It is calculated from the normalized average distance from a given node to every other node: the smaller that average distance, the higher the closeness.
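The three centralities above can be sketched on a toy network. The following is a minimal illustration, not a reference implementation: it assumes an undirected, connected, unweighted graph stored as a dictionary of neighbor sets, counts only shortest paths for betweenness, and takes closeness as the inverse of the average distance. The example graph and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs(graph, source):
    """Return (distances, shortest-path counts) from source to each reachable node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                paths[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]
    return dist, paths


def degree(graph):
    """Number of connections of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness(graph):
    """Inverse of the average distance to every other node (connected graph assumed)."""
    scores = {}
    for v in graph:
        dist, _ = bfs(graph, v)
        scores[v] = (len(graph) - 1) / sum(dist.values())
    return scores


def betweenness(graph):
    """Sum, over pairs of other nodes, of the fraction of shortest paths through a node."""
    info = {v: bfs(graph, v) for v in graph}
    scores = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, paths_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in graph:
            if v in (s, t):
                continue
            dist_v, paths_v = info[v]
            on_path = v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]
            if on_path:
                scores[v] += paths_s[v] * paths_v[t] / paths_s[t]
    return scores


# Hypothetical example: two triangles (a, b, c) and (d, e, f) joined by the tie c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
```

In this toy graph, nodes c and d have only one more tie than the others, yet every shortest path between the two triangles runs through them, which is the gatekeeper position that degree alone does not reveal.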

Boundary Spanners

Nodes that connect their group to others usually are more central on a number of measures than their immediate neighbors whose connections are only local. Boundary spanners are well-positioned to be innovators, since they have access to ideas and information flowing in other clusters. They are in a position to combine different ideas and knowledge, found in various places, into new products and services. Mathematically, the quality of being a boundary spanner is calculated as a linear combination of degree, betweenness and closeness centralities.

Network Centralization

Individual network centralities provide insight into the individual's location in the network. The relationship between the centralities of all nodes can reveal much about the overall network structure. A very centralized network is dominated by one or a few central nodes. If these nodes are removed or damaged, the network quickly fragments into unconnected sub-networks. A highly central node can become a single point of failure. A network centralized around a well connected hub can fail abruptly if that hub is disabled or removed. A less centralized network has no single points of failure and is resilient in the face of many intentional attacks or random failures.
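The contrast between centralized and decentralized networks can be made concrete with a small sketch. The text does not name a specific centralization formula, so the example below assumes Freeman's degree centralization index (1.0 for a star, 0.0 when all degrees are equal) and measures fragmentation as the size of the largest connected component after one node is removed. The two example networks and the function names are hypothetical.

```python
def degree_centralization(graph):
    """Freeman degree centralization of an undirected graph (assumed index)."""
    n = len(graph)
    if n < 3:
        raise ValueError("centralization needs at least three nodes")
    degrees = [len(nbrs) for nbrs in graph.values()]
    top = max(degrees)
    # Normalize by the value attained by a star network of the same size.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


def largest_component_after_removal(graph, removed):
    """Size of the largest connected component once `removed` is deleted."""
    seen = {removed}
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


# Hypothetical networks: a star around "hub" and a five-node ring.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

Under these assumptions the star has centralization 1.0 and falls apart into isolated nodes when the hub is removed, while the ring has centralization 0.0 and stays in one piece after losing any single node.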

1.1.1 Applications of Social Network Analysis

Management and Corporate Applications

As social network analysis emerged into a field of its own, the idea of mapping informal relationships between company employees - or its customers - was recognized as having a profit-bearing potential.

Internal corporate applications of SNA are generally centered in the human resource departments. Applying SNA metrics to the informal networks formed by employees allows managers to recognize communication bottlenecks, find cliques and reorganize the corporate structure for greater efficiency.

Companies have also recognized that customer social networks are an important marketing tool. While some brands such as Apple Computer and Harley-Davidson [42] have for many years enjoyed a cult status due to the social network nature of their customer base and word-of-mouth marketing, the official recognition of network marketing came in the mid-1990s with the explosion of electronic commerce.

The term "Viral Marketing" came into existence as goods enabling communications (from e-mail accounts to fax machines) became common. In a viral marketing scenario, the value of a good for each customer increases with the square (or more) of the number of other customers that use it.

Six Degrees (www.sixdegrees.com) was an early attempt to capitalize on the marketing potential of social networks by prompting people to bring their networks of friends and acquaintances into online contact, shortly followed by a number of other sites. However, most explicit attempts to elicit and capitalize on the social network data of customers have so far failed as businesses, partially due to lack of analytical sophistication of the websites, as well as a certain stigma that came from floods of spam unleashed by some of the sites.

The more successful approach to the use of social networks in a business context comes on the Customer Relationship Management (CRM) front. Sophisticated CRM tools use SNA to transparently map relationships between customers and company staff and to recognize and bolster viral patterns.

Military and Law Enforcement Applications

A primitive form of social network analysis has been in use by law enforcement since the early days of Scotland Yard. However, the original SNA tool of law enforcement was merely a blackboard with names of actors involved in or suspected of criminal activity, connected with lines representing their relationships.


Modern applications such as Analyst's Notebook[21] combine visualization and analysis of criminal networks for investigation purposes. While lacking the mathematical sophistication of dedicated SNA tools, software targeted for investigative purposes is designed to link to and import data from specialized applications designed for gathering telephone logs and other investigative data.

1.1.2 Shortcomings of Pure Social Network Analysis

We live in a complex world, and our social relationships reflect that complexity. A majority of real-world social networks are multi-mode and multiplex, featuring a plurality of node and edge types. When this data is gathered for an SNA dataset, complex human relationships are flattened into a simple structure. At best, multiple graphs are collected, separated by semantics of edges (such as advice networks, friendship networks, money lending networks). However, collecting this kind of data through traditional means of questionnaires is difficult due to the amount of work that respondents have to do to fill out the surveys.

Since most social network metrics only operate on binary graphs, representing the strength of relationships or frequency of communication via a valued graph does not always result in meaningful data.

While social network analysis provides a simple and semantically consistent way to analyze and reason about social structure, it is limited to analyzing simple networks.

The measurement approaches in traditional social network analysis generally focus on social network measures such as individual degree and betweenness centrality [9] or conceptual extensions such as Burt's[11] structural holes, Granovetter's weak ties[28] and Krackhardt's Simmelian ties[36]. While individually and collectively insightful, such measures are dependent primarily on social, friendship, communication or advice networks obtained through survey instruments and personal interviews, inhibiting a more comprehensive view of other critical components in the meta-network of people such as education, skill and experience. Due to the reductionist approach, results of graph-theoretic analysis of human networks are often up for interpretation, and conflicting papers on the meaning of specific kinds of ties are published every year.

The other caveat in the use of social network measures lies in the scalability of SNA metrics. Early social network analysis was done only on small datasets (in part due to the difficulty of obtaining large quantities of relational data), thus the behaviour of SNA metrics on large quantities of data has not been extensively studied.

Studies on robustness of SNA metrics conducted by Borgatti, Carley and Krackhardt[48] suggest that many SNA measures become highly correlated with the size of the network, limiting usefulness of the metrics on networks over 2000 nodes.

While hand-collected datasets seldom have more than a few hundred nodes, with the advent of machine-driven data collection tools such as AutoMap[34], the quantity of available data is increasing rapidly.


Lack of studies of larger network datasets has obscured another problem: most social network measures scale badly in terms of computational complexity. It is not uncommon for an SNA metric to be of a high-order polynomial complexity; metrics that require finding particular paths (such as Hamiltonian paths) or solving optimization problems can be NP-complete.

Finally, social network analysis is sensitive to incomplete data. While this was less of a problem with the smaller datasets (where quality of data could be managed through methods of data collection), machine-collected datasets have a lower fidelity than their hand-collected counterparts.

1.2 Meta-Matrix - Expansion of Network Semantics

Placing sophisticated agents in social networks is insufficient to produce social behavior. Instead, the network conceptualization needs to encompass core entities such as people, resources, knowledge and tasks, and the relations among these entities. This concept is called the meta-matrix (Table 1.1). Originally proposed by Krackhardt and Carley [37] as a device for simultaneously reasoning about people, resources and tasks, the meta-matrix approach was extended to include knowledge and groups/organizations. By linking the social network (people to people) to other networks such as task flow (task to task), the agents in the social network are provided with a framework for accomplishing tasks. The user can thus track the impact of such behavior on the underlying networks. Linking social and knowledge networks allows observation of changes in networks linked to organizational performance. From a simulation perspective, social agents could be used in a multi-agent simulation to act out the impact of changes in these networks on organizational performance.
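One way to picture the meta-matrix is as a collection of networks indexed by the pair of entity types they connect. The sketch below is an illustrative data structure only, not the representation used by any tool named in this proposal. The class, the method names, the `missing_skills` helper and the sample data are all hypothetical. It shows how linking the people-to-knowledge network with the knowledge-to-task network lets one ask whether a person holds the skills a task requires.

```python
class MetaMatrix:
    """Networks keyed by (source entity type, target entity type)."""

    def __init__(self):
        self.networks = {}

    def link(self, src_type, src, dst_type, dst):
        """Add an edge from src to dst in the (src_type, dst_type) network."""
        self.networks.setdefault((src_type, dst_type), set()).add((src, dst))

    def targets(self, src_type, src, dst_type):
        """All dst_type entities that src points to."""
        edges = self.networks.get((src_type, dst_type), set())
        return {d for s, d in edges if s == src}

    def sources(self, src_type, dst_type, dst):
        """All src_type entities that point to dst."""
        edges = self.networks.get((src_type, dst_type), set())
        return {s for s, d in edges if d == dst}


def missing_skills(mm, person, task):
    """Skills the task requires that the person does not hold."""
    required = mm.sources("knowledge", "tasks", task)
    held = mm.targets("people", person, "knowledge")
    return required - held


# Hypothetical organization with two people, two skills and one task.
mm = MetaMatrix()
mm.link("people", "ann", "people", "bob")          # structural relationship
mm.link("people", "ann", "knowledge", "python")    # knowledge network
mm.link("people", "bob", "knowledge", "stats")
mm.link("knowledge", "python", "tasks", "report")  # skill requirements
mm.link("knowledge", "stats", "tasks", "report")
mm.link("people", "ann", "tasks", "report")        # task assignment
```

With this sample data, `missing_skills(mm, "ann", "report")` returns `{"stats"}`: the task assigned to ann requires a skill that only bob holds, a question that the people-to-people network alone cannot answer.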

1.2.1 MetaMatrix Measures

A number of metrics have been defined on the MetaMatrix models. These metrics can be used to estimate the likelihood of a new link being formed between two agents; find critical or redundant nodes in the network and locate emergent leaders based on their cognitive demand.

Homophily and Relative Expertise

The following two measures are used to estimate the probability of creation of a new communication link in the social network, or the motivation to communicate. Empirical studies of human communication behavior suggest that, without any external motivation, individuals will spend about 60% of the time interacting on the basis of homophily and 40% on the basis of need.

| | People | Knowledge and Skills | Resources | Tasks |
|---|---|---|---|---|
| People | Structural knowledge: command structures and relationships between agents | Knowledge Network: who has access to what knowledge | Resource Network: who can use what resources | Task Assignment: who does which tasks |
| Knowledge | | Knowledge Precedence: types of skills that go together | Resource Skills: skills needed to use a resource | Skill Requirements: skills needed to accomplish a task |
| Resource | | | Resource Precedence: which types of resources go together | Resource Requirements: which resources are needed to accomplish a task |
| Task | | | | Task Precedence: the sequencing and precedence of tasks |

Table 1.1: Meta-Matrix of Organizational Knowledge

Carley has defined homophily[15] to be based on a measure of relative similarity RS between agent i and agent j: the amount of knowledge that i and j have in common divided by the amount i shares with all other agents (including self), or

$$RS_{i,j} = \frac{\sum_{k=0}^{K} (S_{ik} \cdot S_{jk})}{\sum_{l=0}^{I} \sum_{k=0}^{K} (S_{ik} \cdot S_{lk})} \tag{1.1}$$

where $S_{ik}$ is 1 if agent i knows fact k and 0 otherwise.

In contrast, relative expertise is defined from a purely knowledge perspective: how much agent i thinks j knows that i does not know divided by how much i thinks all others know that i does not know, or

$$RE_{ij} = \frac{\sum_{k=0}^{K} ((1 - S_{ik}) \cdot S_{jk})}{\sum_{l=0}^{I} \sum_{k=0}^{K} ((1 - S_{ik}) \cdot S_{lk})} \tag{1.2}$$
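Equations 1.1 and 1.2 can be evaluated directly on a binary knowledge matrix. The following is a minimal sketch under two assumptions: the matrix S holds actual knowledge rather than agent i's perception of what others know, and a zero denominator is mapped to 0.0. The function names and the sample matrix are illustrative.

```python
def relative_similarity(S, i, j):
    """Equation 1.1: knowledge i shares with j over knowledge i shares with all agents."""
    shared = sum(a * b for a, b in zip(S[i], S[j]))
    # The sum over l runs over every agent, including i itself.
    total = sum(a * b for row in S for a, b in zip(S[i], row))
    return shared / total if total else 0.0


def relative_expertise(S, i, j):
    """Equation 1.2: what j knows that i lacks over what all agents know that i lacks."""
    novel = sum((1 - a) * b for a, b in zip(S[i], S[j]))
    total = sum((1 - a) * b for row in S for a, b in zip(S[i], row))
    return novel / total if total else 0.0


# Hypothetical knowledge matrix: S[i][k] is 1 if agent i knows fact k.
S = [
    [1, 1, 0, 0],  # agent 0
    [1, 0, 1, 0],  # agent 1
    [0, 0, 1, 1],  # agent 2
]
```

For agent 0 in this matrix, relative similarity is 2/3 with itself, 1/3 with agent 1 and 0 with agent 2, while relative expertise is 1/3 for agent 1 and 2/3 for agent 2. Both measures sum to one over all partners, so each can be read as a distribution over whom agent 0 would choose to interact with.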

Cognitive Demand: Finding emergent leaders

Cognitive demand, described by Carley [19], measures the amount of effort each person expends in performing actual tasks, using the knowledge, resource, task and communication networks of the MetaMatrix.

The cognitive demand measure combines static measures of centrality with dynamic measures of information flow, task performance and resource distribution. These measures are based on the meta-matrix knowledge about the organization and have been shown to accurately detect emergent leaders in an organization.


Key players: Task and Knowledge Exclusivity

Understanding the relative criticality of employees is important in managing turnover and security risks associated with human capital in organizations. Traditional social network analysis measures are based on static, survey-based assessments of centrality and other sociometric aspects of organizations. This limits their effectiveness in fully evaluating human capital criticality, particularly criticality that may be "hidden" in the non-social dimensions of an organization.

M. Ashworth and K. Carley [7][6] introduced new task- and knowledge-based measures based on the MetaMatrix[37] designed to overcome such limitations. Their results suggest that while each class of measures provides useful insight on criticality of organization actors, knowledge-based measures provide the most robust predictions of each actor's contribution to organizational performance.

Ashworth and Carley[6] proposed a number of new task- and knowledge-based measures, the first of which is a Task Exclusivity Index (TEI) that essentially measures the extent to which each actor is the only one who can do certain tasks.

The second proposed measure, the Knowledge Exclusivity Index (KEI), measures the extent to which each actor is the only one who possesses certain skills, knowledge or expertise.

The study found that no single measure or class of measures perfectly identifies all critical employees, but that a heuristic application of the proposed knowledge-based measures results in the highest overall accuracy.
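The idea of exclusivity can be illustrated with a small scoring sketch. The text does not give the formulas for TEI and KEI, so the weighting below is an assumption made for illustration: each item an actor holds contributes exp(1 - h), where h is the number of actors holding that item, so a uniquely held item counts 1 and widely shared items count for little. The function name and the sample data are hypothetical. The same function applies to task assignments (for a TEI-style score) and to knowledge (for a KEI-style score).

```python
import math


def exclusivity_index(holdings):
    """Score each actor by how exclusively it holds its items (assumed weighting).

    holdings maps each actor to the set of tasks or knowledge items it holds.
    """
    holders = {}
    for items in holdings.values():
        for item in items:
            holders[item] = holders.get(item, 0) + 1
    return {
        actor: sum(math.exp(1 - holders[item]) for item in items)
        for actor, items in holdings.items()
    }


# Hypothetical knowledge network: who possesses which expertise.
knowledge = {
    "ann": {"crypto", "finance"},
    "bob": {"finance"},
    "cat": {"finance", "logistics", "chemistry"},
}
scores = exclusivity_index(knowledge)
```

In this sample, cat scores highest because she alone holds two of the items, and bob scores lowest because his only expertise is shared by everyone. A ranking of this kind is what flags actors whose departure would remove knowledge from the organization entirely.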

1.2.2 Applications of MetaMatrix Analysis

The measures detailed above have found a number of uses in the analysis of both corporate structure and adversarial networks.

Schreiber and Carley[22] link the concepts of the MetaMatrix and measures based on MetaMatrix data with personnel data from NASA Jet Propulsion Laboratory. The paper shows that measures of cognitive demand and knowledge exclusivity are accurate predictors of turnover risk posed by key employees. The study used the collected data to inform simulation and project the impact of turnover within the subject teams.

Further, the Organizational Risk Analyzer (ORA) tool developed by J. Reminga and K. Carley[18] combines the multitude of social network and MetaMatrix measures into a consistent risk report. The risk reports issued by ORA point out critical personnel and measure turnover risk as well as incongruities in information and resource distribution.

While the main use of ORA is to improve performance of organizations and corporate networks, the tool can also be used to point out vulnerabilities in adversarial and covert networks, thus finding effective and convenient avenues of attack against them.

Another organizational analysis tool, DyNet[4], enables reasoning about dynamic networked organizations by adding a simulation component to MetaMatrix analysis. DyNet is a reasoning support tool for reasoning under varying levels of uncertainty about dynamic networked and cellular organizations, their vulnerabilities, and their ability to reconstitute themselves. Using DyNet the analyst would be able to see how the networked organization was likely to evolve if undisturbed, how its performance could be affected by various information warfare and isolation strategies, and how robust those strategies were in the face of varying levels of information assurance.

Finding key players in a dynamic organizational network is of top importance for analysis of covert networks. Covert networks often exhibit informal or highly spread out command structures (see appendix A for more detail) that make emergent leaders (such as local operational chiefs) pivotal for day-to-day operation of the organization. Elimination of these emergent leaders has proven an effective strategy of decreasing the operational capacity of terrorist organizations.

NetWatch (see chapter 3) has been implemented as a simulation testbed for detecting key players in dynamically evolving covert networks. One of the major findings in NetWatch was that as key players were eliminated using strategies described in this chapter, the dynamic network exhibited an emergent recovery behaviour[41] (see appendix C).

None of the static key player detection algorithms predicted this recovery, and while using the Knowledge Exclusivity Index to detect and eliminate emergent experts was effective, it suffered greatly from lack of data (knowledge data is more difficult to find through signal intelligence than simpler connection data).

Using domain knowledge and symbolic reasoning, as I propose, will enable researchers to define and search for patterns in the network structures, such as patterns of succession in the event of contingencies. Moreover, results obtained from a symbolic reasoner are more interpretable by human users than those from a statistical measure, and thus more effective in mixed initiative scenarios.

1.2.3 Caveats

Meta-matrix data for the entire organization affords a bird’s-eye view of the organization at a point in time, similar to a static snapshot. Moreover, if collected in the real world, it provides a picture of the entire organization which is quite distinct from what a single person is likely to know. A set of such matrices over time represents the change trajectory for an organization. Boundedly rational individuals make decisions and operate in a climate of uncertainty, with incomplete and inaccurate knowledge of the world. In essence, they make decisions using their perception of the meta-matrix: their personal knowledge and their knowledge of others’ relations (transactive memory [50]).

This distinction between the actual meta-matrix and the agent’s perception is at the heart of the difference between social network analysis, agent modelling, and multi-agent network simulation.

NetWatch is implemented on the basis of each agent’s perception of its world, not measures calculated on the actual network in a global fashion. Thus, an agent is only able to reason about what it knows, which is a more realistic simulation technique.


While the Social Structure Semantics is merely a tool for integration and reasoning about social structure data, using a specialized ruleset it can detect and reason about differences in network perception between different actors.

Social network and MetaMatrix analysis affords researchers considerable power in representing social structure. Social network data can be used to analyze organizations in terms of information flow and structural inconsistencies. Furthermore, findings in social network theory have some predictive power, allowing one to make estimates regarding further evolution of a network given its current state.

However, static social network analysis does not capture the dynamic trends in organizations, and does not provide a rigorous semantic framework for such analysis. To fully capture the dynamism inherent in human networks, one must turn to computer-enabled approaches such as simulation models. In the next chapter, I discuss creation of such models and their descriptive power.


Chapter 2

Data Sources

At the dawn of social network analysis, and until recently, social network datasets were extremely difficult to obtain, and limited in size and scope.

The prevailing methodology for collecting social network data was by survey, either administered to an entire group of people or collected in a snowball fashion. Collection of social network data was done in a way reminiscent of anthropological data collection: by a human observer embedded into the organization to be studied.

This presented a number of problems. First of all, it was very costly to collect all but the smallest of datasets. While a number of sampling strategies were investigated, it was generally not feasible to canvass a larger organization or population.

Furthermore, the presence of an observer or a survey instrument in an organization inevitably altered the behaviour of individuals in the organization. For example, due to the perception (and sometimes fact) that hiring and firing decisions would be made based on the results of the analysis, people would try to appear more gregarious and bolster their degree centrality figures.

Finally, for some networks, especially covert networks, it is physically impossible to collect a dataset via direct survey administration. The modus operandi of such networks is covertness, and this necessarily limits the data that can be collected on them.

Thus, for the study of terrorist organizations, one must obtain information via indirect means. One approach to gathering indirect social network data is via analysis of texts. Originally a manual coding technique, text analysis extracts network structure from corpora of text based on co-appearance of people, organizations and other entities.

Between September 14, 2001 and November 2001, Valdis Krebs [38] assembled a corpus of texts regarding events preceding the September 11th attacks. Manual analysis of these texts yielded a dataset (see figure 2.1) which became one of the definitive sources of data on terrorist organizations and the structure of a terrorist plot.

Since 2001, much larger datasets on covert networks have become available due to both increased interest in the research and improvements in tools for machine collection of network data.

Some of the newer, more complete datasets include those collected by IntelCenter [33], R. Renfro [43] and M. Sageman [46].

IntelCenter is a private think-tank that specializes in gathering intelligence data from news wires and declassified reports. They make available an extensive database of individuals and events related to terrorism in the Middle East, and have recently published a dataset mapping the structure of Al-Qaeda [33].

Table 2.1: Data on (a) September 11th hijackers collected by Valdis Krebs and (b) Hamas collected by AutoMap

AutoMap [34] analyzes corpora of text, identifying named entities (actors, organizations, resources, etc.) and linking them based on co-occurrence in sentences. After application of a thesaurus, the result is a MetaMatrix structure that is ready to be analyzed or simulated.

For my thesis work, I shall use the NetIntel database (described in chapter 4 and appendix E) to integrate a number of datasets. The datasets mentioned above will be joined with data on Hamas (see figure 2.1) and Middle-Eastern politicians collected by AutoMap from news articles. I shall separate the overall dataset into time slices to conduct time-series analysis and test the temporal reasoning capabilities of my software.

I posit that integration of multiple data sources, especially the combination of hand-collected and machine-collected data, is the most efficient way to obtain unbiased information on covert networks.

To further test the software systems, I shall create a number of artificial datasets designed to stress potential weaknesses in the data representation and reasoning algorithms. The software tools will be tested with random networks of varying size and density, as well as cellular networks generated from statistical profiles to resemble terrorist organizations as we know them.

Testing the software on machine-generated data will allow me to conduct repeatable tests that stress certain aspects of the software and optimize its performance. After testing is complete, I shall run a set of virtual experiments (see chapter 3) with the software systems and integrated real data.
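The sentence-level co-occurrence linking described earlier in this chapter can be illustrated with a minimal Python sketch. It is not AutoMap's actual algorithm: the function name `cooccurrence_edges`, the naive sentence splitter, the substring matching and the sample text with placeholder names are all assumptions made for illustration, and no thesaurus step is applied.

```python
import itertools
import re
from collections import Counter


def cooccurrence_edges(text, entities):
    """Count how often each pair of known entities shares a sentence."""
    edge_weights = Counter()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Simple case-insensitive substring match against the entity list.
        present = sorted(e for e in entities if e.lower() in lowered)
        # Link every unordered pair of entities found in this sentence.
        for a, b in itertools.combinations(present, 2):
            edge_weights[(a, b)] += 1
    return edge_weights


# Invented sample text with placeholder names (not real data).
sample = (
    "Alice met Bob at Acme. "
    "Bob later called Carol. "
    "Alice and Bob travelled together."
)
edges = cooccurrence_edges(sample, {"Alice", "Bob", "Carol", "Acme"})
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The resulting weighted edge list could then be loaded into a network structure; a real pipeline would need proper named-entity recognition and alias resolution, which this sketch omits.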


Chapter 3

Simulation Approaches to Analyzing Dynamic Networks

While shifts from analyzing single-mode networks to working with complex multi-mode and multi-plex networks such as the MetaMatrix resulted in a significant increase in the fidelity of representation of human relations, one important aspect remained unaddressed: the dimension of time.

Human networks evolve continuously and rapidly; thus, understanding the dynamics of network evolution under different circumstances is paramount to being able to predict the effects of policies and management strategies.

A number of approaches to introducing dynamic qualities to network analysis have been proposed, including analytical models, artificial-life based models and multi-agent network models.

Analytical models, such as System Dynamics and Blanche [20], combine modelling of social structure and individual attributes. However, application of analytical models to real-world problems has been shown to be computationally intractable, thus resulting in over-simplified models.

Artificial life and cellular automata models have been shown to reflect some of the complexity and dynamics exhibited by social systems. However, artificial life simulations generally do not operate on empirical data, are constrained to grid-based structure, and represent cognition at a high level of abstraction. This results in an oversimplified view of the phenomena.

I posit that the creation of high-fidelity models of socio-technical systems requires the combination of analytical models with empirically grounded simulation.

Multi-agent systems can serve as effective tools for reasoning about human and group behavior. Their effectiveness is enhanced when the algorithms lead the simulated agents to behave as humans behave, rather than doing what is optimal for the task. Such systems are even more effective when the model’s input is real data and the generated outputs are comparable to the actual data files in the real world. Such systems can be created by combining sophisticated planning and learning algorithms with extensive knowledge of human behavior and underlying networks.

In traditional social network analysis, behavioral interpretations are drawn from the actual network, which is viewed as constraining and enabling behavior. In agent modeling systems, agents are designed to act optimally for the task at hand.


In multi-agent network simulations, agents act in a boundedly rational fashion on the basis of their personal perception, emulating what people might do. In order to produce a high-fidelity model of a dynamic organization, it is necessary to simulate the world of the agents, including all imperfections. We enable this by affording every agent a belief structure, which is essentially the agent’s private meta-matrix structure populated as the agent interacts with others, learns and performs tasks. In the simplest case, the meta-matrix contains binary values whose meaning is simply the existence or lack of a connection between two nodes. However, this can be extended with weights assigned to every edge. For example, edge weights in the interpersonal network can be interpreted as trust or frequency of communication, as opposed to the mere existence of a connection.

While studying real-world networks, and specifically covert networks such as terrorist organizations, data can be collected for each cell of the meta-matrix, but it is very difficult to obtain a complete and accurate snapshot of the organization. A key difficulty is in discerning whether a change in the network is due to better intelligence or actual changes in the network. Thus, analysis algorithms and simulation systems must be able to operate under uncertainty, provide confidence estimates for results, and approach the domain from a satisficing rather than optimization perspective.
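The per-agent belief structure described above, a private and weighted view of the network that fills in as the agent interacts, might be sketched as follows. This is not the NetWatch implementation: the class `Agent`, its methods and the `trust` discount are hypothetical names and modelling choices introduced only to make the idea concrete.

```python
from collections import defaultdict


class Agent:
    """An agent holding a private, possibly incomplete view of the network."""

    def __init__(self, name):
        self.name = name
        # belief[(source, target)] -> perceived edge weight in [0, 1]
        self.belief = defaultdict(float)

    def observe(self, source, target, weight=1.0):
        """Record a directly observed tie, keeping the strongest evidence."""
        key = (source, target)
        self.belief[key] = max(self.belief[key], weight)

    def learn_from(self, other, trust=0.5):
        """Merge another agent's beliefs, discounted by trust in that agent."""
        for key, weight in list(other.belief.items()):
            self.belief[key] = max(self.belief[key], trust * weight)


a, b = Agent("a"), Agent("b")
a.observe("a", "b")                # a knows its own tie to b
b.observe("b", "c", weight=0.8)    # b knows about a tie that a has not seen
a.learn_from(b, trust=0.5)         # a hears about b-c second hand
print(dict(a.belief))
```

Each agent reasons only over its own `belief` mapping, so two agents can hold different pictures of the same organization, which is the property the text relies on.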

3.1 Research Statement

In my dissertation, I will build an extensive model of organizational behaviour based on the following principles:

• The system will consist of independent, autonomous agents designed with an ability to perceive and affect their surroundings via interaction with other agents and their environment.

• Agents will be endowed with a reasoning capability allowing them to make decisions and conduct actions in a consistent manner in relation to their perceived environment.

• Agents will be perceptually and cognitively limited, and thus forced to deal with incomplete and imperfect information in a satisficing manner.

• The design of the agents’ reasoning capabilities will be firmly grounded in literature on human cognition, social psychology and organizational theory.

• The system will enable simulations to be initialized from real social structure (MetaMatrix) data and thus will allow me to calibrate the simulation by replication of observed events and, after calibration, to project organizational development into predictions of future performance.

• The system will be instrumented to allow fine-grained control over input and output parameters, such as injection of exogenous events and extraction of multi-level data about the subject organization.


• The system will be specifically geared towards simulation of adversarial networks, and tested in the context of modelling terrorist organizations and the effectiveness of anti-terrorism strategies.

I will demonstrate the power of this methodology by creating a multi-agent network model of covert networks (such as terrorist organizations). The model will incorporate symbolic and statistical reasoning about social structure, resulting in a complex simulation-oriented multi-agent system that incorporates planning and learning algorithms, while built on an extensive model of social network phenomena and social-psychological findings.

In doing so, I will show that AI algorithms and multi-agent systems, combined with an analytic approach, create a generalizable and valuable simulation toolbox for studying complex social systems.

3.1.1 Performance Evaluation

NetWatch will be tested in a sequence of virtual experiments that compare the performance of different strategies for detection of key personnel in a covert network, and of network destabilization tactics.

Virtual experiments will be based on a number of datasets (see chapter 2), including Al Qaeda, an integrated Middle East dataset and one of Hamas. Key differences between strategies’ performance on these datasets will be correlated with inherent properties of the datasets, including structure, information distribution, and quality of data.

To analyze the performance of the software system, trial runs will be conducted with randomly generated (but structurally similar) large networks of up to 10,000 nodes. Losses to the correlation of network measures to network size and density will be assessed in the dynamic simulation environment.
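As a rough illustration of the randomly generated test networks mentioned above, the following Python sketch draws undirected random graphs of a chosen size and target density. It assumes a plain independent-tie (Erdős–Rényi style) model, which is my simplification; the structurally similar and cellular generators referred to in the text would need additional constraints. The names `random_network` and `observed_density` are hypothetical.

```python
import random


def random_network(n, density, seed=None):
    """Return an undirected random graph as a set of (i, j) edges with i < j."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Each possible tie is included independently with probability `density`.
            if rng.random() < density:
                edges.add((i, j))
    return edges


def observed_density(n, edges):
    """Fraction of the n*(n-1)/2 possible undirected ties that are present."""
    return len(edges) / (n * (n - 1) / 2)


# A fixed seed keeps the trial runs repeatable.
for size in (50, 100, 200):
    net = random_network(size, density=0.05, seed=42)
    print(size, len(net), round(observed_density(size, net), 3))
```

The double loop visits every node pair, so at the 10,000-node scale a sparse sampling scheme would be preferable to this direct version.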

3.1.2 Replication and Third-party Usage

I shall solicit outside users to use NetWatch and will make it available for download, along with a simple user’s manual. Third-party user experiences will be collected to further test the simulation engine and detect vulnerabilities in its algorithms.

3.2 Preliminary Results

Based on the conceptual framework of multi-agent simulations illustrated above, I have developed NetWatch, a multi-agent network model for reasoning about the destabilization of covert networks such as organized crime or terrorist organizations under conditions of uncertainty.

NetWatch is built to simulate the communication patterns, information and resource flows in a dynamic organizational network based on cognitive, technological and task-based principles. In addition, the model is grounded using information about surveillance technologies and intelligence operations (e.g. [1]) and the covert networks (e.g. [8]). The process of gathering intelligence on an organization is simulated, enabling the evaluation of diverse heuristics and technologies for data gathering. Using NetWatch the user can conduct a vulnerability analysis, examine potential emergent reactions of covert networks in response to attacks, and evaluate diverse destabilization strategies.

NetWatch agents are intelligent, adaptive information processing systems, constrained and enabled by the networks in which they are embedded. These networks evolve as individuals interact, learn and perform tasks. In greater detail, the multi-agent network paradigm is based on the following postulates:

• Agents are independent, autonomous entities endowed with some intelligence, though cognitively limited and boundedly rational.

• Agents and the networks in which they are embedded co-evolve.

• Agents do not have accurate information about the world or other agents, being limited by their perception.

• Agents can learn the state of the world through interaction.

• Agents can be strategic about their communication.

• Agents do not use predefined geometrical locations or neighborhoods.

Agents obtain information via interaction with other agents. The accuracy of an agent’s perception of another decreases with the distance between them in the social network. This corresponds with the empirical reality where people’s knowledge of each other decreases exponentially as the social distance between them increases [35].

A more complete technical description of NetWatch and preliminary results based on studies of the Al-Qaeda terrorist network and a set of anti-terrorism strategies can be found in appendices A, B and C.
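The distance-dependent perception described above can be made concrete with a short Python sketch. It assumes that social distance is measured in hops and that accuracy takes the form exp(-decay × distance); the decay constant and both function names are assumptions for illustration, not values or code taken from NetWatch.

```python
import math
from collections import deque


def social_distances(adjacency, start):
    """Breadth-first search: hop count from `start` to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def perception_accuracy(distance, decay=1.0):
    """Assumed functional form: accuracy falls off as exp(-decay * distance)."""
    return math.exp(-decay * distance)


# A four-person chain: a - b - c - d
network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
for node, dist in social_distances(network, "a").items():
    print(node, dist, round(perception_accuracy(dist), 3))
```

Under this assumption an agent's view of its direct contacts is far more reliable than its view of anyone three steps away, which is the qualitative behaviour the postulates call for.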


Chapter 4

Enabling Technologies: Storage, Manipulation and Interchange of Rich Social Structure Data

In the past, social network (and more complex social structure) data has been treated as distinct “datasets”. Each of these datasets represented a largely self-contained conceptual chunk: a snapshot of a group or organization, frozen in time. As one proceeded with analysis, new aspects of the dataset were born, again largely self-contained but, in a researcher’s mind, still connected with the original data.

However, as the quality (level of detail and granularity) of social structure data increased, its quantity rapidly increased as well. Thus, it is inevitable that ad-hoc approaches to storage and manipulation of such data will become obsolete. Existing ways of data representation did not fare well with multi-mode matrices or with rich data (where every node and edge carried multiple attributes).

As quantities of data increased (largely facilitated by automated text analysis tools [34]), it became apparent that new ways to query, manipulate and extract subsets of the data were required.

Furthermore, as research groups in the field joined in large-scale projects, there arose a need for a well-defined data interchange format.

In light of the problems outlined above, I propose to create:

1. a database-backed system for easy storage of large quantities of social structure data;

2. a set of data manipulation tools that enable powerful query capabilities;

3. a rigorously defined data interchange language that will tie together a heterogeneous set of storage, manipulation and analysis tools.

4.1 Storage Requirements for Social Structure Data

Social structure data will be stored in a relational or object-oriented database capable of manipulating large quantities of data. The database shall have advanced query capabilities (both SQL and procedural), as well as stored procedure and trigger capabilities.


The structure of the database shall be defined in a way that preserves the character and integrity of the data (i.e. is aware of the underlying graph properties of the data). The structure shall be designed in an extensible manner, allowing easy addition of new attributes, node types and edge types.

The database system shall keep track not only of the units of social structure data (such as nodes and edges) but also of their sources, enabling the creation of large-scale multi-source datasets while preserving the original data sources.

The database shall have an easy-to-use web-based interface, allowing users to enter, search and edit data as well as access the manipulation and query capabilities described below.
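One way to picture the storage requirements above is a small relational schema in which every edge records the source it came from. The sketch below uses Python's built-in `sqlite3` module purely so that it is self-contained; the table and column names are hypothetical and do not reflect the actual NetIntel schema, which the text states is built on PostgreSQL.

```python
import sqlite3

# Hypothetical three-table layout: sources, typed nodes, typed edges.
SCHEMA = """
CREATE TABLE source (
    id INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE node (
    id INTEGER PRIMARY KEY,
    node_type TEXT NOT NULL,      -- e.g. 'person', 'organization'
    label TEXT NOT NULL
);
CREATE TABLE edge (
    id INTEGER PRIMARY KEY,
    edge_type TEXT NOT NULL,
    from_node INTEGER NOT NULL REFERENCES node(id),
    to_node INTEGER NOT NULL REFERENCES node(id),
    source_id INTEGER NOT NULL REFERENCES source(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO source VALUES (1, 'news article'), (2, 'hand-coded report')")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)",
                 [(1, "person", "A"), (2, "person", "B"), (3, "organization", "Org")])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
                 [(1, "knows", 1, 2, 1), (2, "memberOf", 2, 3, 2)])

# Extract only the ties that came from one source, keeping provenance intact.
rows = conn.execute(
    """SELECT a.label, e.edge_type, b.label
       FROM edge e
       JOIN node a ON a.id = e.from_node
       JOIN node b ON b.id = e.to_node
       WHERE e.source_id = ?""", (1,)).fetchall()
print(rows)
```

Keeping `source_id` on each edge is what allows a merged multi-source dataset to be split back into its original contributions; per-node and per-edge attribute tables could be added in the same style.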

4.2 Requirements for Data Manipulation Tools

The data manipulation tools shall be closely coupled to the database system described above. The foremost requirement for the subsystem is the ability to extract subsets of the data based on:

• the source (or sources) of data (e.g. “Find all social structure data that came from the New York Times” or “Find all data that came from a New York Times article from 10/10/2003”)

• attributes of nodes and edges (e.g. “What is the network of people who were born in Syria?”)

The manipulation tools shall be able to extract subsets of the network based on graph-theoretic properties of the network such as graph distance (e.g. “Find all nodes at a graph distance of 2 or less from a given node”) and graph density (e.g. “Find all nodes embedded in subgraphs with given density”).

The manipulation tools shall allow easy completion of incomplete datasets (e.g. “Given a set of people, find all organizations and resources connected to them”).

The query tools shall enable the creation of time-slices from the complete dataset or any subset thereof, if time-dependent data is present.

Finally, the query tools shall be easily combined into scripts, resulting in an extremely powerful structure-aware data manipulation capability.
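Two of the queries listed above, the graph-distance query and the creation of time-slices, can be sketched in a few lines of Python. The functions `nodes_within` and `time_slices`, the integer time stamps and the in-memory adjacency representation are illustrative assumptions; the tools described in this chapter are stated to run inside the database instead.

```python
from collections import defaultdict, deque


def nodes_within(adjacency, start, max_distance):
    """Return {node: distance} for nodes at most `max_distance` hops from `start`."""
    found = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if found[node] == max_distance:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency.get(node, ()):
            if neighbour not in found:
                found[neighbour] = found[node] + 1
                queue.append(neighbour)
    return found


def time_slices(timed_edges, width):
    """Group (source, target, time) edges into consecutive windows of `width`."""
    slices = defaultdict(list)
    for source, target, time in timed_edges:
        slices[time // width].append((source, target))
    return dict(slices)


graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
print(nodes_within(graph, 1, 2))   # "all nodes at graph distance 2 or less from node 1"

timed = [(1, 2, 0), (2, 4, 3), (4, 5, 7)]
print(time_slices(timed, 5))       # edges bucketed into windows of 5 time units
```

Because both functions take and return plain data structures, they can be chained in a script, for example slicing by time first and then extracting a neighbourhood within each slice.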

4.3 Requirements for Data Interchange

1. The data interchange format shall be contained in human-readable text files that are at the same time easily parsable by computers.

2. The data interchange format shall allow an entire dataset, complete with all computed measurements, to be stored in one file.

3. The data interchange format shall provide maximum expressive power to its users, allowing:


• Typed nodes (types may include “person”, “resource”, “organization”, “knowledge”, etc.)

• Multiple sets of nodes of the same type (to express multiple units within the company, etc.)

• Multiple typed attributes per node

• Typed edges

• Multiple typed attributes per edge

• Multiple graphs (sets of edges) expressed within the same file

• Dynamic network data expressed in a single file

4. The data interchange format shall allow developers to extend it in a fashion that will not break existing software.

5. The data interchange format shall be flexible enough to be used as both input and output of analysis tools.
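To give a feel for how the requirements above might translate into a text-based format, the following Python sketch writes typed node sets, per-node attributes and typed edges into a single XML document and reads it back. The element and attribute names (`network`, `nodeset`, `node`, `attribute`, `graph`, `edge`) are invented for this illustration and should not be taken as the DyNetML schema.

```python
import xml.etree.ElementTree as ET


def build_document(nodesets, graphs):
    """Serialize typed node sets and typed edge sets into one XML string."""
    root = ET.Element("network")
    for set_id, (node_type, nodes) in nodesets.items():
        ns = ET.SubElement(root, "nodeset", id=set_id, type=node_type)
        for node_id, attributes in nodes.items():
            node = ET.SubElement(ns, "node", id=node_id)
            for name, value in attributes.items():
                ET.SubElement(node, "attribute", name=name, value=str(value))
    for graph_id, edges in graphs.items():
        g = ET.SubElement(root, "graph", id=graph_id)
        for source, target, edge_type in edges:
            ET.SubElement(g, "edge", source=source, target=target, type=edge_type)
    return ET.tostring(root, encoding="unicode")


xml_text = build_document(
    {"staff": ("person", {"p1": {"age": 30}, "p2": {}})},
    {"communication": [("p1", "p2", "talksTo")]},
)
print(xml_text)

# A consumer can read back only the parts it understands and skip the rest.
parsed = ET.fromstring(xml_text)
print([e.attrib for e in parsed.iter("edge")])
```

An XML-style container makes it straightforward for older tools to ignore elements they do not recognize, which is one way the extensibility requirement in item 4 could be met.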

4.4 Preliminary Results

I have developed a set of tools to facilitate the transmission, manipulation and storage of social structure data as outlined above.

NetIntel is a database system for the storage and manipulation of rich social structure data. The database is implemented using the PostgreSQL database engine, and includes an as yet incomplete, but rapidly growing, set of manipulation tools, implemented as complex SQL and PL-SQL procedures.

I have also developed an easy-to-use web interface allowing entry, browsing and querying of network data from any Internet-connected computer, and a set of command-line tools for interfacing with data manipulation and extraction tools within the database.

A complete technical description of NetIntel can be found in appendix E.

To enable interchange and transmittal of rich social structure data, I have designed DyNetML, an XML-based language that fits the requirements outlined above. More information on DyNetML can be found in appendix D.

While DyNetML is still a work in progress, it is slowly gaining industry acceptance as a data interchange format. It is supported by all tools developed at the CASOS laboratory at Carnegie Mellon University, and is a part of ORA, a MetaMatrix-based organizational network analysis tool.

As of August 2004, DyNetML will be natively supported by UCINET, the premier software package for social network analysis. Native support is also under development for a number of software tools at the Department of Defence. Through the use of translation tools, DyNetML is also used by Aptima, University of Connecticut and a number of other companies and institutions.

NetIntel is currently receiving its first tests with very large datasets, including up to 3.6 million nodes and potentially billions of edges.


4.5 Usability and Documentation

I have published a technical report detailing the use of DyNetML as a data interchange language, and provide one-on-one support to outside users of the language. Ongoing user experiences are collected and will be used as material for review of the structure of DyNetML, and in preparation for release of a new version of the standard. However, every effort will be made to preserve backward compatibility with the original version.

When the NetIntel database technology matures, I shall publish a technical report that will serve as a user’s and programmer’s manual for third-party users. I will also make the database structure, WWW interface and import/export tools available for download by interested parties, pending approval of the Intellectual Property Office.

A section in the final dissertation shall be devoted to user experiences and development of data interchange standards based on DyNetML.


Chapter 5

Semantic Reasoning and Social Structure

Traditional social network analysis, as described above, operates on a fairly simple semantic notion of nodes representing individuals and edges representing connections between them, thus forming a graph. However, the introduction of more detailed modelling paradigms such as the MetaMatrix [37][12] models increasingly overloads the semantics of social structure.

As a result, mere graph-theoretic analysis of the resultant networks is no longer a sufficient means of reasoning about social structure and must be supplemented with exogenous domain knowledge in order to derive meaning behind numeric measures. However, the graph semantics does not readily allow for machine-interpretable encoding of such domain knowledge.

A number of approaches to resolve this problem have been proposed. Relationship [32] is an RDF [23] schema that defines a vocabulary for describing social interactions and relationships between people. However, definition of a vocabulary falls short of rigorously specified social relationship semantics. While a vocabulary set can be negotiated and agreed to by a community of researchers, it will remain incomplete, as the richness of human relationships presents more nuances than is possible to express in a finite vocabulary.

However, a more serious complication of a purely vocabulary-based specification of relationships is that a social network defined using this vocabulary is merely a labelled graph. While such graphs are widely used to communicate relationship information to human users, it is not possible for computers to reason about such labelled graphs without an understanding of natural language.

Of even more importance is the semantic ambiguity of edge direction and value: does it mean the same thing if an Agent is connected to an Organization or an Organization is connected to an Agent? Since the MetaMatrix contains many submatrices with heterogeneous node types, and traditionally only included an upper triangular portion of the matrix, the directionality of heterogeneous edges was either lost completely (as the lower triangular portion went ignored) or potentially misinterpreted (does reversal of an edge’s direction change the meaning of an edge?).

We cannot overlook the fact that some edges have different properties than others. For example, subordination edges (“isSuperiorTo”) for people or inclusion edges (“isPartOf”) for organizations (as well as a number of others) are transitive, i.e. the boss of my boss is also my boss. A number of other special properties can be defined as well. Graph models to date do not offer sufficient reasoning capabilities to resolve transitivity of edges, especially in a context where nodes and edges of multiple types coexist.

A further problem with the ever-expanding MetaMatrix is the fact that the expansion process does not scale well. Let us suppose that a MetaMatrix includes $N$ types of nodes; expansion of the model to deal with $N + 1$ types of nodes will require definition of meaning and measures upon $(N + 1)^2 - N^2 = 2N + 1$ submatrices. I consider this trend to be counter-productive and prone to creation of semantic holes: areas in datasets where data exists but no adequate explanation or analysis can be performed.
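The transitivity issue raised above can be shown with a small Python sketch in which only edges of one declared type are closed under transitivity while other edge types are left untouched. The function `transitive_closure`, the sample roles and the choice to treat only “isSuperiorTo” as transitive are assumptions for illustration, not part of any existing graph model.

```python
def transitive_closure(edges):
    """Return all (a, c) pairs implied by chaining a transitive relation."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loops.
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Typed edges: only 'isSuperiorTo' is treated as transitive here.
typed_edges = [
    ("director", "isSuperiorTo", "manager"),
    ("manager", "isSuperiorTo", "clerk"),
    ("clerk", "talksTo", "courier"),
]
superior = {(s, t) for s, kind, t in typed_edges if kind == "isSuperiorTo"}
closure = transitive_closure(superior)
print(sorted(closure))
```

The derived pair (director, clerk) appears even though it was never stated, while the “talksTo” edge contributes nothing, which is the kind of type-aware inference a plain labelled graph cannot express on its own.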

5.1 Research Statement

I propose the creation of robust, scalable and machine-interpretable object-oriented semantics for reasoning about social structure. Based on robust knowledge representation techniques defined by Artificial Intelligence researchers, the new semantics allows for symbolic as well as numeric reasoning about social structure.

The root of the proposed methodology is in the definition of complex meaning of entities in terms of a small number of axioms and a rule-based language that allows machine interpretation and reasoning on social structure data.

This will enable a multi-layered approach to reasoning about social structure through the use of the object-oriented concepts of inheritance and polymorphism.

To ground the results in the existing body of social network analysis work, rigorous mappings will be defined between graph-based social network analysis and the proposed social structure semantics.

More semantically complex models such as the MetaMatrix can be defined as inheriting from simpler models; e.g. graph-level reasoning can still be applied to single-mode subgraphs of the MetaMatrix.

A more complex domain semantics model will reside on the next level of complexity. Note that I do not propose a specific domain-level semantics or vocabulary; these will be a function of the subject datasets. However, I do define a rigorous language that facilitates specification of domain semantics, and provides basic building blocks for doing so.

A more detailed description of Social Structure semantics and the accompanying language can be found in appendix F.

I proceed to define the reasoning algorithms that make full use of the semantic information encoded in the data, and design the software tools that implement these algorithms and apply them to real data. While this is still a work in progress, the current candidate algorithm is based on lazy-evaluation forward chaining. The worst-case computational complexity of the chaining algorithm is $O(e)$, where $e$ is the number of edges, or $O(n^2)$, where $n$ is the number of nodes. However, most social structure graphs are sparse, thereby making the average-case complexity much lower.
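Forward chaining over social structure facts can be illustrated with a minimal Python sketch. It is a plain, eager agenda-based chainer rather than the lazy-evaluation variant named above, and it makes no claim about the complexity figures in the text; the fact encoding as `(relation, subject, object)` triples, the function names and the example rule are all assumptions chosen for illustration.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be derived."""
    known = set(facts)
    agenda = list(known)
    while agenda:
        fact = agenda.pop()
        for rule in rules:
            # Each rule returns a fully built list, so `known` is safe to modify here.
            for derived in rule(fact, known):
                if derived not in known:
                    known.add(derived)
                    agenda.append(derived)
    return known


def membership_propagates(fact, known):
    """memberOf(x, o) and isPartOf(o, p) together imply memberOf(x, p)."""
    relation, left, right = fact
    if relation == "memberOf":
        return [("memberOf", left, p) for r, o, p in known
                if r == "isPartOf" and o == right]
    if relation == "isPartOf":
        return [("memberOf", x, right) for r, x, o in known
                if r == "memberOf" and o == left]
    return []


facts = {
    ("memberOf", "x", "cell"),
    ("isPartOf", "cell", "region"),
    ("isPartOf", "region", "org"),
}
result = forward_chain(facts, [membership_propagates])
print(sorted(result - facts))
```

Rules are ordinary functions here, so domain-specific rule sets could be swapped in without changing the chaining loop; a lazy variant would defer deriving a fact until a query actually needs it.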


5.1.1 Performance Evaluation

A number of performance evaluation studies will be conducted with Social Structure Semantics.

A first set of tests will involve backward compatibility of the Social Structure Semantics with existing social network and MetaMatrix datasets. This shall be done by porting major algorithms and measures from each of the two domains to the new language, and testing performance of these algorithms using standard datasets.

As reasoning algorithms for social structure semantics evolve, I shall test performance of each using a number of datasets designed to stress the algorithm’s weak points. I shall simulate networks of different sizes and densities, as well as dynamic datasets that cause oscillations in the reasoning system. A rigorous study of the alternative algorithms will be published as a separate paper or as a chapter of my dissertation.
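A minimal sketch of how test networks of different sizes and densities might be generated is given below. The proposal does not specify a generator; the uniform random (Erdős–Rényi style) model, the function names and the parameter grid are assumptions chosen for illustration.

```python
import random


def random_network(n, density, seed=0):
    """Directed edges (i, j), i != j, each included with probability `density`."""
    rng = random.Random(seed)  # seeded so that test datasets are reproducible
    return {(i, j)
            for i in range(n)
            for j in range(n)
            if i != j and rng.random() < density}


def observed_density(n, edges):
    """Fraction of the n * (n - 1) possible directed edges that are present."""
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


# Hypothetical parameter grid: two sizes crossed with two densities.
for n in (10, 100):
    for density in (0.05, 0.5):
        edges = random_network(n, density, seed=42)
        print(n, density, round(observed_density(n, edges), 3))
```

Sparse and dense settings of the same size can then be fed to a reasoning algorithm to compare its behaviour against the worst-case bound.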


Chapter 6

Conclusion

In my dissertation, I will advance the state of the art of reasoning about complex social systems by employing artificial intelligence algorithms and knowledge representation structures.

I will implement the proposed Social Structure Semantics language, interpreter and reasoning system. Extensive replication studies will be conducted to ensure that the social structure semantics is both powerful and backwards compatible with existing methodologies of social network analysis and MetaMatrix analysis. Further testing will also include performance testing of reasoning algorithms.

Enabling technologies of data storage, manipulation and interchange shall be enhanced to bridge the new symbolic reasoning system with data sources, manipulation tools and existing analysis and visualization tools.

I shall enhance the NetWatch simulation engine to take advantage of social structure semantics, and conduct extensive virtual experiments comparing strengths and weaknesses of symbolic and statistical approaches to analysis of dynamic covert networks.


6.1 Timeline

A large amount of work on simulation of covert networks, as well as data interchange and manipulation technologies, has already been completed. For this reason, I propose completion of my dissertation within the Spring semester of 2005.

The approximate schedule of thesis completion can be found in the Gantt chart in Figure 6.1.

The highlights of the schedule are as follows:

• August–October, 2004

– Major implementation work on social structure semantics and completion of the interpreter.

– Implementation, testing and performance analysis of reasoning algorithms.

– Ongoing testing and improvements in other tools and enabling technologies

– Integration of Social Structure Semantics, DyNetML and database technologies

– Writing: First publications on Social Structure Semantics submitted to journals. Technical report documenting design of the language and interpreter.

• November, 2004–January, 2005

– Symbolic reasoning about covert networks

– Design of domain semantics; testing with existing datasets

– Integration of symbolic reasoning with NetWatch

– Virtual experiments demonstrating the power of symbolic reasoning

– Implementation of Bayesian reasoning within Social Structure Semantics

– Writing: Use of social structure reasoning against covert networks (paper submitted to journal). Paper on Bayesian reasoning submitted to a journal. User manuals for DyNetML and database system; published as technical reports.

• January–February, 2005

– Integration of previous publications into a coherent structure

– Thesis writing 50% complete

– Revision process begins for completed thesis chapters

• March, 2005

– Thesis writing complete

– Full-time revision/editing process

• Projected Thesis Defence: April 2005


Aug-04 Sep-04 Oct-04 Nov-04 Dec-04 Jan-05 Feb-05 Mar-05 Apr-05

Social Structure Semantics
Implementation of language and interpreter XXXXX XXXXX XXXXX
Bridging of Semantic Interpreter with existing analysis tools, DyNetML, Database XXXXX XXXXX
Performance Testing of reasoning algorithms XXXXX XXXXX XXXXX
Bayesian Reasoning within Social Structure Semantics XXXXX XXXXX
Implementation of Social Network Semantics and replication of SNA results XXXXX XXXXX
Implementation of MetaMatrix Semantics and replication of MetaMatrix results XXXXX XXXXX
Implementation of Domain Semantics for reasoning on covert networks XXXXX XXXXX XXXXX

NetWatch
Bridging NetWatch with existing data via NetIntel database XXXXX
New virtual experiments using existing infrastructure XXXXX XXXXX XXXXX
Using Social Structure Semantics for reasoning about covert networks in NetWatch XXXXX XXXXX XXXXX

Enabling Technologies
Revision of database structure for compatibility with Social Structure Semantics XXXXX
Revision of DyNetML structure for compatibility with Social Structure Semantics XXXXX
Standard-setting, user support activities and 3rd-party adoption of DyNetML: Ongoing XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX XXXXX
Bridging NetIntel with 3rd-party databases; user support: ongoing XXXXX XXXXX XXXXX XXXXX XXXXX
Social Structure Query Language: Using Social Structure Semantics for targeted database retrieval XXXXX XXXXX

Writing
Using Social Structure Semantics to reason about missing network data XXXXX
Social Structure Semantics language documentation/reference XXXXX
Replication studies: Social Structure Semantics vs SNA XXXXX
Replication studies: Social Structure Semantics vs MetaMatrix XXXXX
Symbolic Reasoning Algorithms for Social Structure (results of testing) XXXXX
Bayesian Reasoning for Social Structure Semantics XXXXX
Use of Social Structure Semantics in reasoning about covert networks (results from NetWatch experiments) XXXXX XXXXX
DyNetML and Database Manual (Users Guide, Implementor's Guide) XXXXX XXXXX
Social Structure Semantics Software Manual XXXXX XXXXX
Social Structure Query Language: Using Social Structure Semantics for targeted database retrieval XXXXX XXXXX

Thesis Writing XXXXX XXXXX XXXXX XXXXX
Revision and Editing XXXXX XXXXX XXXXX

Projected Thesis Defence Date - April-May 2005 X

Figure 6.1: Thesis Completion Schedule


Appendix A

NetWatch: Simulating and Reasoning about Dynamic Covert Networks

A.1 Background

For reasons of national security it is important to understand the properties of terrorist organizations that make such organizations efficient and flexible. Based on this understanding, successful strategies can be devised to destabilize such organizations or curtail their efficiency, adaptability, and ability to move knowledge and resources. The assessment of destabilization strategies poses a number of key challenges. What does the underlying organization look like? Does it evolve? What strategies inhibit or affect the evolution so that the organization is destabilized? In this paper, we provide an approach to assessing destabilization strategies that draws on work in organization science, knowledge management and computer science.

Terrorist organizations are often characterized as cellular organizations composed of quasi-independent cells and distributed command. In a sense, this is a non-traditional organizational configuration; hence, much of the knowledge in traditional organizational theory, particularly that focused on hierarchies or markets, does not apply. Lessons can be learned from the work on distributed and decentralized organizations that provides some guidance. This work demonstrates that such structures are often adaptive, useful in a volatile environment, and capable of rapid response [39][3]. In other words, we should expect terrorist organizations to adapt, and adapt rapidly. This suggests that in general, they should be difficult to destabilize; however, the traditional organizational literature provides little guidance on how to destabilize the organization.

In general, the organization’s form or design profoundly influences its performance, adaptability, and ability to move information [5]. It follows that organizations can be destabilized by altering their design. One caveat here is that organizations, particularly more distributed and decentralized ones, evolve continuously [2]. Terrorist organizations are often characterized as such dynamic networks, in which the connections among personnel define the nature of that evolution. This suggests that social network analysis will be useful in characterizing the underlying structure and in locating vulnerabilities in terms of key actors. Unfortunately, the dynamic nature of these networks means that it is not clear whether the actors identified as key using standard network analysis will remain key long enough for the destabilization tactics based on standard network analysis to be effective.

In general, organizations evolve as they face unanticipated changes in their environment, rapidly evolving technologies, and intelligent and adaptive opponents. Over the past decade, progress has been made in understanding the set of factors that enable adaptation, and partially validated models of adaptive networks now exist [14]. A key result is that in the short run, there appears to be a tradeoff between adaptivity and extremely high performance in organizations [19]. This suggests that forcing an organization to adapt should reduce its performance. Thus, even if an actor is no longer key, the mere isolation of that actor may be sufficient to be disruptive. However, to assess this, a model of organizational change and network healing is needed.

Since the destabilization of terrorist networks could inhibit their ability to effect harm, there is a profound need for an approach that would allow researchers to reason about dynamic cellular networks and evaluate the potential effect of destabilization strategies. To be useful, such an approach must account for the natural evolution of cellular networks. This situation is further complicated by the fact that the information available on the terrorist network is liable to be incomplete and possibly erroneous. Hence, destabilization strategies need to be compared and contrasted in terms of their robustness under varying levels and types of information error. In other words, it would be misleading to judge destabilization strategies in terms of their impact on a static and unchanging network [17].

These problems suggest the need for a new methodological approach. In this paper, we provide an approach based on the use of a multi-agent network model of the co-evolution of the network of “observers” (the blue network) and the “terrorists” (the red network) in which the observers can capture only partial data on the underlying covert network and the covert network evolves both naturally and in response to attacks by the observers. This approach builds on organization theory and social network theory, as well as machine learning and dynamic network analysis. Specifically, we have developed a computational model of dynamic cellular organizations and used it to evaluate a number of alternative strategies for destabilization of cellular networks.

It is important at the outset to note that this examination of destabilization strategies is highly exploratory. We make no claims that the examination of destabilization strategies is comprehensive, nor that the types of “error” in the data that intelligence agencies can collect are completely described. Further, our estimate of the structure of the covert network is based on publicly available data, much of which is qualitative and requires interpretation. This work should therefore be read as a study in the power of an empirically grounded simulation approach and a call for future research. We restrict our analysis to a structural or network analysis and focus on what the covert network looks like, how its structure influences its performance and its ability to pass information, how it evolves, how its evolution can be altered (its behavior destabilized) through interventions focused on the nodes, and what interventions should be taken given the level of fidelity in the information that we have. Admittedly, in this complex arena there are many other factors that are critical, but they are beyond the scope of this study. Thus, from a straight social network perspective, this study suggests the types of methodological issues that will emerge when working with dynamic large-scale networks under uncertainty.

To ground this paper, a short case description is provided of Al Qaeda with the focus on the network structure. In these two descriptions we draw on both military and organizational theory. This is followed by a discussion of the intelligence agencies engaged in anti-terrorist activity and the possible data and errors in said data. Our intent here is to demonstrate, at a fairly high level, the context and the resultant information and modelling problems and not to provide a full analysis for intelligence or military operations. As good science often emerges from attacking hard real-world problems, we are trying to provide sufficient detail to understand the bases for the problems that research must address rather than simply provide a high theoretical description of general data problems. This is followed by a brief discussion of the applicability of traditional social network analysis and the need to take a dynamic network perspective. We then describe a computational model of terrorist organizations as dynamic evolving networks, and anti-terrorist bodies with emphasis on their information collection and destabilization strategies. A virtual experiment used to examine destabilization strategies and the results are then discussed.

A.2 Covert Terrorist Networks - the Al Qaeda

In order to be successful in combating covert networks, military theory suggests that extensive knowledge of the adversary is required. In “The Art of War” [49], Sun Tzu wrote:

Know your enemy and know yourself; in a hundred battles, you will never be defeated. When you are ignorant of the enemy but know yourself, your chances of winning or losing are equal. If ignorant both of your enemy and of yourself, you are sure to be defeated in every battle.

From an organizational theory perspective what this means is that success in a competitive environment requires knowing your strengths and weaknesses and those of your adversary.

Terrorism is a modus operandi through which targeted violence is used against non-combatants in order to achieve political objectives or strategic goals [45]. Terrorist organizations can be classified as state-sponsored or extra-national. State-sponsored terrorist organizations receive direct support from their host countries. This support can manifest in various ways, ranging from financial aid to terrorists or terrorist organizations, to training of terrorist operatives, and escalating to initiation or marshaling of terrorist attacks and direct involvement of governmental units of the state in terrorist attacks. Often such organizations serve as extensions of the intelligence or secret service agencies of the host countries.


Activity of such organizations can often be effectively curtailed by political or military pressure upon the sponsoring countries. State-sponsored organizations that receive direct assistance from the sponsor states also tend to be organized in a hierarchical fashion similar to the rank structure of the supporting army; they essentially comprise an extension of it and can be fought with traditional military techniques.

Extra-national terrorist groups generally serve to advance the interests of their leaders or direct backers (whether political, religious or commercial) and span multiple nations in their search for operatives and resources. Extra-national terrorist networks may enjoy support of one or several states whose political agendas coincide with the goals of the organization, but ultimately are not dependent on state support due to their ability to find independent financial backing from wealthy sympathizers. Generally such groups are structured in a way similar to organized crime syndicates and employ networks of quasi-independent cells scattered through the region of operation of the organization as well as other countries that could be used as resource bases, recruiting and training centers.

Al Qaeda, Arabic for “The Base”, is the largest known extra-national terrorist organization. It is a large dynamic network, estimated to have the support of 6-7 million radical Muslims worldwide, of which 120,000 are willing to take up arms [30]. Its reach is global, with outposts reported in Europe, Middle East, East Asia and both Americas. In the Islamic world, its task is to purify societies and governments according to a strict interpretation of the Koran and to use religion as a unification force for the creation of an Islamic superpower state.

In the non-Islamic world, its task is to compel governments to withdraw their cultural influences and military ties from the Islamic world. While Al Qaeda enjoys support of wealthy individuals in a number of countries, it does not have direct support of any government. The Taliban government of Afghanistan directly supported Al Qaeda by allowing it to create training centers and bases on Afghan territory; however, the involvement of the Afghan government was not crucial for the strength of the Al Qaeda organization. In fact, the relationship between Al Qaeda and the Taliban was more of an exchange, with the Taliban hosting the training bases and recruiting centers and Al Qaeda providing the Taliban with trained soldiers and officers as well as serving as a domestic security service within the country [8].

As Goolsby [27] stated, Al Qaeda extends its reach and recruits new member cells via the adoption of local Islamic insurgency groups. Beginning with provision of operational support and resources to facilitate growth, Al Qaeda representatives work to transform an insurgency group such as Jemaah Islamiyya (Indonesia) from a group seeking political change to a full-fledged terrorist organization executing multi-casualty attacks such as the Bali bombing in 2002 [29].

Al Qaeda’s global network, as we know it today, was created while it was based in Khartoum, from December 1991 to May 1996. To coordinate its overt and covert operations as Al Qaeda’s ambitions and resources increased, it developed a decentralized, regional structure. Al Qaeda pursues its objectives through a network of cells, associate terrorist and guerilla groups and other affiliated organizations. For instance, the Sudanese, Turkish and Spanish nodes ran clandestine military activities in Europe and North America.

The worldwide nodes appear to have no formal structure and no hierarchy. Assignments are often carried out by individuals and small groups designated for the purpose as “the person responsible”. The regional nodes appear not to have a fixed location and move quickly when dictated by the political situation in the region. Al Qaeda shares expertise, transfers resources, discusses strategy and sometimes conducts joint operations with regional terrorist groups.

Although the modus operandi of Al Qaeda is cellular, familial relationships play a key role. Because Al Qaeda is an Islamic cultural and social network, its members recruit from among their own nationalities, families and friends. What gives Al Qaeda its global reach is its ability to appeal to Muslims irrespective of their nationality, enabling it to function in East Asia, Russia, Western Europe, Sub-Saharan Africa and North America with equal facility.

Unlike conventional military forces, which are often hierarchical and centralized, terrorist militant units are often small, dispersed and seemingly disorganized. Nevertheless, they have been able to effectively counter much larger conventional armies. Large terrorist organizations operate in small, dispersed cells that can deploy anytime and anywhere [44]. Dispersed forms of organization allow these networks to operate elusively and secretly.

The apparent structure of Al Qaeda is not exclusive to such militant or terrorist groups. Indeed, it bears a familiar resemblance to the structure of other resistance groups. For example, a study published in 1970 by L. Gerlach and V. Hine [40] concluded that U.S. social movements, such as the environmental and anti-war movements in the 1960s, were structured as “segmented, polycentric, and ideologically integrated networks” (SPINs).

“By segmentary I mean that it is cellular, composed of many different groups... . By polycentric I mean that it has many different leaders or centers of direction... . By networked I mean that the segments and the leaders are integrated into reticulated systems or networks through various structural, personal, and ideological ties. Networks are usually unbounded and expanding... . This acronym [SPIN] helps us picture this organization as a fluid, dynamic, expanding one, spinning out into mainstream society.”

The dynamics exhibited by SPINs appear to exist both in these social movement groups and in various terrorist, criminal and fundamentalist networks around the world [44].

However, unlike many protest movements, terrorist and criminal networks often wish to remain covert. The need for security dictates that terrorist organizations must be structured in a way that minimizes damage to the organization from arrest or removal of one or more members [24]. This damage may be direct, making key expertise, knowledge or resources inaccessible for the organization, or indirect, exposing other members of the organization during interrogations. There are several factors that allow a terrorist organization to remain covert, including:


• Strong religious (in case of Islamic groups) or ideological (in case of Sendero Luminoso and other South American guerilla groups) views that allow members to form extremely strong bonds within a cell.

• Physical proximity among cell members, often to the extent of sharing living quarters, working and training together.

• Lack of rosters on who is in which cell.

• Cell members being given little knowledge of the organizational structure and the size of the organization.

• Little inter-cell message traffic.

• Information about tasks issued on a need-to-know basis, so very few people within the organization know about the operational plans in their entirety.

A need-to-know information policy can be counterproductive when an organization needs to complete a task that is larger than any one cell. Further, such policies tend to lead to duplication of effort and reduce the ability of one cell to learn from another. To fix these inefficiencies, terrorist organizations have been known to employ “sleeper links”, where a small number of members of each cell have non-operational ties (such as family ties, ties emerging from common training, etc.) to members of other cells [38]. These links are rarely activated and are used mainly for coordinating actions of multiple cells in preparation for a larger operation.

To remain covert, Al Qaeda has structured itself around a leaderless design characterized by its organic structure, horizontal coordination, and distributed decision making. However, the need to maintain a strong ideological foundation and resolve coordination issues has led to the need for strong leadership. One apparent solution has been to have multiple leaders diffused throughout the network and engaged in coordinating activities, without central control or a hierarchy among the cells. Whether the leaders are themselves hierarchically organized, even though the cells are not, is less clear.

Under constant pressure from various world governments, terrorist organizations have evolved a structure that appears to be resilient to attacks. However, information on these terrorist organizations, their membership, the connections among the members, and so on is, at best, incomplete. Available information is often obtained during post-factum investigations of terrorist acts, and may offer little insight into the “main body” of the organization or the way in which it is evolving.

Substantial intelligence effort is needed to piece together the massive amount of often misleading information, both post-factum and “logs” of activity, to generate a picture of the entire organization. Nevertheless, the picture that is emerging suggests that terrorist organizations are organized at the operational level as cellular networks rather than as hierarchies [16].

Cellular networks are different from traditional organizational forms as they replace a hierarchical structure and chain of command with sets of quasi-independent cells, distributed command, and rapid ability to build larger cells from sub-cells as the task or situation demands. In these networks, the cells are often small, only marginally connected to each other, distributed geographically, and may take on entirely different tasks.

Each cell is functionally self-sufficient and is capable of executing a task independently. Cells are loosely interconnected with each other, for purposes of exchanging information and resources. However, the information is usually distributed on a need-to-know basis and new cell members rarely have the same exact skills as current members. This essentially makes each individual cell expendable. The removal of a cell generally does not inflict permanent damage on the overall organization or convey significant information about other cells. Essentially, the cellular network appears to morph and evolve fluidly in response to anti-terrorist activity.

The fact that covert networks are often built up from self-similar and somewhat self-sufficient cells leads to a hypothesis that cells throughout the network contain structurally equivalent [25] and essential roles, such as ideological or charismatic leaders, strategic leaders, resource concentrators, and specialized experts as needed given the modus operandi of the cell or its environment.

Given this hypothesis, we can further reason that operations of a particular cell will be affected in a negative way by the removal of an individual filling one of these roles. Using this as a base for further exploration, we venture to show in this paper that cellular networks indeed contain vitally important and structurally equivalent roles, which can be detected through the use of dynamic social network analysis on the organizational MetaMatrix.

A.3 Modeling Dynamic Networks

Based on the conceptual framework of multi-agent simulations, we have developed NetWatch, a multi-agent network model for reasoning about the destabilization of covert networks such as organized crime or terrorist organizations under conditions of uncertainty.

NetWatch is built to simulate the communication patterns, information and resource flows in a dynamic organizational network based on cognitive, technological and task-based principles. In addition, the model is grounded using information about surveillance technologies and intelligence operations (e.g. [1]) and the covert networks (e.g. [8]). The process of gathering intelligence on an organization is simulated, enabling the evaluation of diverse heuristics and technologies for data gathering. Using NetWatch, the user can conduct a vulnerability analysis and examine potential emergent reactions of covert networks in response to attacks, as well as evaluate diverse destabilization strategies.

NetWatch agents are intelligent, adaptive information processing systems, constrained and enabled by the networks in which they are embedded. These networks evolve as individuals interact, learn and perform tasks. In greater detail, the multi-agent network paradigm is based on the following postulates:


• Agents are independent, autonomous entities endowed with some intelligence, though cognitively limited and boundedly rational.

• Agents and the networks in which they are embedded co-evolve.

• Agents do not have accurate information about the world or other agents,and are limited by their perception.

• Agents can learn the state of the world through interaction.

• Agents can be strategic about their communication.

• Agents do not use predefined geometrical locations or neighborhoods.

Agents obtain information via interaction with other agents. The accuracy of an agent’s perception of another decreases with the increase in distance between them in the social network. This corresponds with the empirical reality where people’s knowledge of each other decreases exponentially with an increase in social distance between them [35].
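The decay of perception accuracy with social distance can be sketched as follows. The text states only that knowledge decreases exponentially with social distance; the specific form $e^{-rate \cdot d}$, the `rate` parameter, the use of hop count as the distance measure and all function names are assumptions made for illustration, not NetWatch's actual implementation.

```python
import math
from collections import deque


def social_distance(adj, source):
    """Breadth-first search: hop count from `source` to every reachable agent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def perception_accuracy(distance, rate=1.0):
    """Assumed decay law: accuracy falls off as exp(-rate * distance)."""
    return math.exp(-rate * distance)


# Hypothetical three-agent chain: a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
dist = social_distance(adj, "a")
accuracy = {agent: perception_accuracy(d) for agent, d in dist.items()}
print(accuracy)
```

Under these assumptions an agent perceives itself perfectly, its direct contacts less accurately, and contacts of contacts less accurately still.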


Appendix B

Technical Description of NetWatch

B.1 Social Networks in NetWatch

In NetWatch, agents communicate on the basis of network membership. Agents learn of other agents outside their ego network via interaction with agents in their ego network and a process of introduction. Therefore, networks are represented as a directed graph structure representing the probability of communication (social proximity) $p_{ij}$ between two agents $a_i$ and $a_j$:

\[
\begin{aligned}
Net &= \langle AG, P \rangle \\
AG &= \{a_i\} \quad \text{(set of agents)} \\
P &= \{p_{ij} : \forall a_i, a_j \in AG\}
\end{aligned}
\tag{B.1}
\]

The directed nature of the graph $Net$ allows the user to specify one-way relationships and chain-of-command relationships. While the formal network is generally pre-specified at the start of the simulation, the informal network evolves through interaction.

The agents do not have access to full information about the network; every agent $a_k$ can access only a probability vector $P_k = \{p_{ki}\}$, where $p_{ki}$ is the probability of agent $a_k$ communicating with agent $a_i \in AG$. Hence, each agent may only know who it may interact with or is close to, but does not know the complete interaction patterns of other agents. Each agent possesses a belief matrix that it uses to store any information it learns about interrelationships of other agents within the network. However, this information is typically incomplete and inaccurate.
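The network definition and the restricted per-agent view can be sketched as a small data structure. The class and field names below (`Net`, `Agent`, `row`, `beliefs`, `learn`) are hypothetical and the probabilities are made-up illustration values; the sketch shows only the idea that agent $a_k$ reads its own row $P_k$ directly and keeps everything else as possibly inaccurate beliefs.

```python
from dataclasses import dataclass, field


@dataclass
class Net:
    agents: list  # AG: agent identifiers
    p: list       # P: p[i][j] = probability that agents[i] communicates with agents[j]

    def row(self, k):
        """P_k: the only part of P that agent k can read directly."""
        return tuple(self.p[k])


@dataclass
class Agent:
    index: int
    own_row: tuple
    beliefs: dict = field(default_factory=dict)  # (i, j) -> believed p_ij

    def learn(self, i, j, value):
        """Record hearsay about the tie i -> j; it need not match the true p_ij."""
        self.beliefs[(i, j)] = value


# Illustrative directed network: note that p[0][1] != p[1][0].
net = Net(agents=["a0", "a1", "a2"],
          p=[[0.0, 0.8, 0.1],
             [0.3, 0.0, 0.6],
             [0.0, 0.2, 0.0]])

a0 = Agent(index=0, own_row=net.row(0))
a0.learn(1, 2, 0.5)  # a0's belief about a1 -> a2, while the true value is 0.6
print(a0.own_row, a0.beliefs)
```

Keeping beliefs in a sparse dictionary rather than a full matrix is a design choice of this sketch; it makes the incompleteness of an agent's knowledge explicit, since unknown ties are simply absent.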

B.2 Agents in NetWatch

In keeping with cognitive science research, NetWatch agents representing humans are both cognitively and socially constrained [47]. Thus, their decision-making ability, actions, and performance depend on their knowledge, structural position, procedures and abilities to manage and traverse these networks. Each agent’s perception of the meta-matrix consists of the agent’s ego network (the set of agents it is directly connected to), its own knowledge, resources and task assignments, and is augmented by the agent’s perception of other agents’ ego networks, knowledge, resources and task assignments (Fig. B.1, 1), that is, the agent’s beliefs about them. An agent also tries to form beliefs about the networks of other agents (Fig. B.1, 2).

Figure B.1: NetWatch Simulation Design (components shown: self-perception and perception-of-others meta-matrices; communication layer with homophily and relative expertise; task planning behaviours with hierarchical decomposition planner, task execution, task delegation, knowledge seeking and resource seeking; execution monitor; strategic layer (Bayesian rules); classification tasks; attribute contagion model)

The agents implement a layered behavior model, inspired by Rodney Brooks’ subsumption architecture for robot control [10]. On the lower levels of control lie primitive communication behaviors (Fig. B.1, 3), based on cognitive models of human communication.

Intelligent task-directed behaviors are facilitated by a hierarchical decomposition planner (Fig. B.1, 4), adapted for goal-directed interaction and distributed task execution via delegation of subtasks. The planner is described in section B.5.

Execution of distributed tasks is monitored by an independent execution monitoring process (Fig. B.1, 6), which watches the results of delegated tasks, handles exceptions and triggers replanning in case of failure.

Finally, production rules (Fig. B.1, 7) govern the agent’s strategic reasoning, triggering tasks or high-level behaviors.

Such a layered architecture allows a social modeler to isolate strategic and tactical performance of the agents from their lower-level interaction, and build experiments where any one of the levels is manipulated.

B.3 Processes Governing Communication

The choice of a communication partner at every time period is based on two factors: social proximity of the agents and their motivation to communicate.

Social proximity is defined as closeness of a relationship between two agents, scaled between 0 and 1, where 0 means “no relationship” and 1 means “very close relationship”.

Motivation to communicate is computed on the basis of homophily (relative similarity) and need (relative expertise). Empirical studies of human communication behavior suggest that, without any external motivation, individuals will spend about 60% of the time interacting on the basis of homophily and 40% on the basis of need.

We defined homophily to be based on a measure of relative similarity $RS$ between agent $i$ and agent $j$: the amount of knowledge that $i$ and $j$ have in common divided by the amount $i$ shares with all other agents (including self), or

$$
RS_{i,j} = \frac{\sum_{k=0}^{K} (S_{ik} \cdot S_{jk})}{\sum_{l=0}^{I} \sum_{k=0}^{K} (S_{ik} \cdot S_{lk})}
\tag{B.2}
$$

where $S_{ik}$ is 1 if agent $i$ knows fact $k$ and 0 otherwise.

In contrast, we defined need from a purely knowledge perspective. Relative expertise $RE_{ij}$ is defined as how much agent $i$ thinks $j$ knows that $i$ does not know divided by how much $i$ thinks all others know that $i$ does not know, or

$$
RE_{ij} = \frac{\sum_{k=0}^{K} ((1 - S_{ik}) \cdot S_{jk})}{\sum_{l=0}^{I} \sum_{k=0}^{K} ((1 - S_{ik}) \cdot S_{lk})}
\tag{B.3}
$$

Agents operate on their beliefs about what the other agents know. Thus, their calculations can be inaccurate. However, as interaction progresses and agents learn more and more about each other, the accuracy of the agents’ perception of the world increases.
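As a worked illustration of Eqn. B.2 and Eqn. B.3, the sketch below computes both measures from a binary knowledge matrix. It assumes a single matrix `S` that stands for agent $i$'s beliefs about who knows what; the function names and the example data are hypothetical, and returning 0 when the denominator is zero is a choice made for the example.

```python
def relative_similarity(S, i, j):
    """Eqn. B.2: knowledge shared by i and j over all knowledge i shares."""
    shared = sum(a * b for a, b in zip(S[i], S[j]))
    total = sum(sum(a * b for a, b in zip(S[i], row)) for row in S)
    return shared / total if total else 0.0


def relative_expertise(S, i, j):
    """Eqn. B.3: what j knows that i lacks over what all agents know that i lacks."""
    gain = sum((1 - a) * b for a, b in zip(S[i], S[j]))
    total = sum(sum((1 - a) * b for a, b in zip(S[i], row)) for row in S)
    return gain / total if total else 0.0


# Rows are agents, columns are facts; S[i][k] == 1 means "agent i knows fact k".
S = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
```

For agent 0 in this example, `relative_similarity(S, 0, 1)` is 1/3 (one shared fact out of three shared overall, counting self) and `relative_expertise(S, 0, 2)` is 2/3 (agent 2 holds two of the three fact instances that agent 0 lacks). Both measures sum to 1 over all partners, so each can be read as a probability vector.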

B.4 Inter-agent Knowledge Exchange

Tracing its roots to the Construct model [13], the NetWatch model is a cognitive model focusing on knowledge manipulation and learning. Each agent’s knowledge is represented by a bit string. A value of 1 in position $n$ means that the agent knows fact $n$ and a value of 0 means that it does not.

Both homophily and need for information, as used in the knowledge exchange protocol, are abstract measures and do not weigh facts with regard to their importance to a task. More complex behaviors, such as task-directed information seeking, are accomplished using the planner (see section B.5).

At the start of the simulation, the agents are endowed with some initial knowledge (typically within the 2%-10% range), distributed randomly between agents or based on an empirical profile of the organization.

To learn new facts, the agents execute the Construct Knowledge Exchange Protocol. For ease of description, we shall refer to the parties in knowledge exchange as Alice (agent $a_a \in AG$) and Bob (agent $a_b \in AG$).


1. Determine who to communicate with: Alice does this by evaluating relative similarity (Eqn. B.2) or relative expertise (Eqn. B.3) of every agent accessible through her social network (i.e. every $a_i \in AG$ with $p_{a,i} > 0$, Eqn. B.1). Then, $a_a$ rolls a die weighted by the computed probability vector and picks an agent to communicate with (e.g. $a_b$).

2. Determine what to communicate: This is done by weighing information seeking vs. similarity-driven communication. If $a_a$ is in information seeking mode, it chooses at random a part of the knowledge string that is not known and queries the agent chosen in step 1. In similarity-based communication, $a_a$ chooses a part of the known knowledge string and sends it to that agent.

3. Determine proper response: On receipt of a query, $a_b$ determines if it should answer by checking whether the sender of the query is part of its network. If yes, $a_b$ sends a reply; otherwise, it discards the message. If $a_b$ does not know the facts requested, it may respond to $a_a$ with the name of another agent (“Clare”) that may be better suited to answer the question, known as referential data. On receipt of knowledge, $a_b$ determines if the knowledge is useful and whether it came from one of the agents in its network (and thus can be trusted), chooses some knowledge from its knowledge base and sends it in return.

4. Update internal knowledge base: On receipt of the reply, $a_a$ determines the usefulness of the answer and uses that to update its internal knowledge of $a_b$ (“Bob knows fact $n$”) as well as its knowledge base (“I now also know fact $n$”). If $a_a$ receives referential data, it uses that to update its knowledge of $a_b$ (“Bob does not know $n$” and “Bob knows Clare”) and $a_c$ (“Clare may know $n$”). This may be followed by a query to Clare ($a_c$), which may or may not be answered, depending on the strategic position of $a_c$.

Clare may not have been originally a part of Alice’s network, but now, through Bob, Alice has learned about her existence. Thus, agents within the organization use referential data about each other to form an informal network.
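The query, referral and update steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Construct or NetWatch implementation: the class and function names are hypothetical, partner choice is weighted by social proximity alone rather than by Eqn. B.2 or B.3, negative beliefs (“Bob does not know $n$”) are not recorded, and the weight given to a newly learned tie (`new_tie_weight`) is an arbitrary placeholder.

```python
import random


class Agent:
    def __init__(self, name, facts, n_facts):
        self.name = name
        # Knowledge bit string: knowledge[k] == 1 means "knows fact k".
        self.knowledge = [1 if k in facts else 0 for k in range(n_facts)]
        self.network = {}  # partner name -> social proximity (> 0)
        self.beliefs = {}  # agent name -> set of facts believed known

    def pick_partner(self, rng):
        """Step 1 (simplified): weighted draw over accessible agents."""
        names = sorted(self.network)
        weights = [self.network[n] for n in names]
        return rng.choices(names, weights=weights, k=1)[0]

    def answer(self, sender, fact):
        """Step 3: reply, refer the sender to someone else, or discard."""
        if sender not in self.network:
            return ("ignored", None)
        if self.knowledge[fact]:
            return ("fact", fact)
        for other in sorted(self.beliefs):
            if other != sender and fact in self.beliefs[other]:
                return ("referral", other)
        return ("unknown", None)


def query(asker, responder, fact, new_tie_weight=0.1):
    """Steps 2 and 4 for an information-seeking query about one fact."""
    kind, payload = responder.answer(asker.name, fact)
    if kind == "fact":
        asker.knowledge[fact] = 1  # "I now also know fact n"
        asker.beliefs.setdefault(responder.name, set()).add(fact)
    elif kind == "referral":
        asker.beliefs.setdefault(payload, set()).add(fact)  # "Clare may know n"
        asker.network.setdefault(payload, new_tie_weight)   # informal tie
    return kind, payload
```

With Alice connected only to Bob, and Bob believing that Clare knows fact 2, a query from Alice to Bob about fact 2 returns a referral and adds Clare to Alice's network. A follow-up query to Clare is discarded if Alice is not in Clare's network, which matches the "may or may not be answered" case above.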

B.5 Planning and Execution of Complex Tasks

The agents use a hierarchical decomposition planner to execute complex tasks that require coordination of knowledge and resources, as well as delegation of subtasks to other agents. In the meta-matrix representation, the agents possess a definition of their hierarchical task structures, a directed acyclic graph specifying the precedence of tasks. The skill requirements and the resource requirements are also specified in the meta-matrix.

The planner starts with a top-level task and finds its subtasks, resource and knowledge requirements. Then, the agent plans for each of the subtasks.


Figure B.2: Planning in NetWatch Agents

If one of the subtasks requires knowledge or a resource that the agent does not possess, the agent sends out request messages. The requests are handled by the Construct Knowledge Exchange Protocol (see section B.4).

Similarly, if the agent does not have the ability to execute a task, a task delegation message is sent to an agent determined likely to execute the task. The delegated knowledge requests or tasks are then passed to the execution monitor process. The execution monitor keeps track of the currently delegated tasks, and handles failures and other exceptions.

Performance in planning tasks is measured as (a) the time that it takes, (b) the amount of re-work or replanning needed to complete the task, and (c) the percentage of tasks that have failed.
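The decomposition step described above can be illustrated with a short recursive sketch. It is a minimal reading of the text, not the NetWatch planner: the `plan` function, its arguments and the example task names are hypothetical, and the sketch assumes that an agent which cannot execute a task delegates it whole rather than requesting the missing inputs.

```python
def plan(task, subtasks, needs, can_do, holdings, _seen=None):
    """Return an ordered list of (action, item) steps for `task`.

    subtasks: task -> list of subtasks (a directed acyclic graph)
    needs:    task -> set of knowledge/resources the task requires
    can_do:   set of tasks this agent is able to execute
    holdings: set of knowledge/resources this agent possesses
    """
    seen = set() if _seen is None else _seen
    if task in seen:  # shared subtasks are planned only once
        return []
    seen.add(task)

    steps = []
    for sub in subtasks.get(task, []):
        steps.extend(plan(sub, subtasks, needs, can_do, holdings, seen))

    if task not in can_do:
        steps.append(("delegate", task))
        return steps

    for item in sorted(needs.get(task, set()) - holdings):
        steps.append(("request", item))
    steps.append(("execute", task))
    return steps


# Hypothetical task structure for one agent.
SUBTASKS = {"deliver": ["assemble", "transport"]}
NEEDS = {"assemble": {"manual", "tools"}, "transport": {"vehicle"}}
STEPS = plan("deliver", SUBTASKS, NEEDS,
             can_do={"deliver", "assemble"}, holdings={"tools"})
```

In this example the agent requests the missing `manual`, executes `assemble`, delegates `transport`, and finally executes `deliver`. The `request` and `delegate` steps are the ones an execution monitor would track.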

B.6 Classification Tasks

One simple measure of organizational performance in NetWatch is based on the binary classification task. The task is represented by a vector of binary values. An agent can only access bits in the task vector that correspond to non-zero values in the agent’s knowledge vector. The task is then decided by a “majority rule”. An agent’s decision accuracy is computed by taking a series of classification tasks and comparing the agent’s decisions to “true answers”, computed by applying a majority rule to each task given complete information and access. Task performance is measured as the percentage of correctly decided tasks.

While appearing simplistic, performance in classification tasks has been shown [39] to correspond to organizational performance in real cases, thus making classification tasks a suitable substitute for more complex tasks for purposes of simulation modelling.
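A minimal sketch of the classification measure follows. The function names and example vectors are hypothetical, and the tie-breaking rule (a tie is decided as 0) is an assumption, since the text does not specify one.

```python
def decide(task, knowledge):
    """Majority vote over the task bits the agent can access."""
    visible = [bit for bit, known in zip(task, knowledge) if known]
    return 1 if 2 * sum(visible) > len(visible) else 0  # ties -> 0 (assumption)


def true_answer(task):
    """Majority vote with complete information and access."""
    return 1 if 2 * sum(task) > len(task) else 0


def accuracy(tasks, knowledge):
    """Fraction of tasks on which the agent's decision matches the true answer."""
    correct = sum(decide(t, knowledge) == true_answer(t) for t in tasks)
    return correct / len(tasks)


TASKS = [
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
]
KNOWLEDGE = [1, 0, 1, 0, 0]  # this agent can see bits 0 and 2 only
```

With these inputs the agent sees two bits per task. It misclassifies the first task (it sees a tie, while the true majority is 1) and decides the other two correctly, giving an accuracy of 2/3.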


Appendix C

Preliminary NetWatch Results

We have conducted a set of preliminary experiments using the NetWatch simulation described earlier. The experimental design builds upon the principle of Red Team-Blue Team military game simulation.

The Red Team simulates a terrorist organization such as the one described in appendix A.

The Blue Team simulates a number of agencies engaged in anti-terrorism activities. These agencies gather data using a number of signal intelligence techniques and attempt to destabilize the Red Team using a variety of tactics.

The experiments study the efficacy of signal intelligence in creating an accurate representation of a terrorist network and the overall effectiveness of destabilization measures upon the covert organization.

C.1 Red Team

Based on publicly available data, the following profile of the structure of covert networks has been derived [17]:

• The network consists of small cells (mean cell size of 6 members) with very low interconnection between cells.

• Internally, the cells exhibit all-to-all communication patterns.

• There is a very low probability of two individuals communicating by chance (0.007).

• The probability of triad closure (link from x to y being more likely if both x and y are linked to third party z) is 0.181.

• Senior members of each of the cells are often also parts of other cells and interact with other senior members on the network.

• Cell leaders are more knowledgeable than other members.

• Cell members share an ideological doctrine but also specialized knowledge (i.e. bombmakers, drivers, operatives).

• Cells use information technologies and electronic communication.


Figure C.1: Red Team: A Cellular Covert Network

The aforementioned parameters form a statistical profile from which we can generate simulated organizational networks. The plot in figure C.1 shows a covert network generated using the parameters specified above, based on the structure of terrorist networks as shown by Valdis Krebs [38]. The agents in the Red Team network execute a set of tasks modelled on tactical descriptions (i.e. sequences of tasks, and their skill and resource requirements) of the terrorist bombing of the U.S. embassy in Tanzania.
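To make the idea of generating a network from such a profile concrete, the sketch below builds a cellular network from three of the listed parameters. It is an illustration under strong assumptions and not the generator used for figure C.1: the function name is hypothetical, every cell has exactly `cell_size` members, the first member of each cell is treated as its leader, leaders are linked in a simple chain, and triad closure is not modelled.

```python
import random
from itertools import combinations


def generate_cellular_network(n_cells, cell_size=6, p_random=0.007, seed=0):
    """Return (cells, edges) for an undirected cellular network."""
    rng = random.Random(seed)
    cells = [list(range(c * cell_size, (c + 1) * cell_size))
             for c in range(n_cells)]
    edges = set()

    # All-to-all communication inside each cell.
    for cell in cells:
        edges.update(combinations(cell, 2))

    # Assumption: cell leaders (first member of each cell) form a chain.
    leaders = [cell[0] for cell in cells]
    edges.update(zip(leaders, leaders[1:]))

    # Low-probability chance links between any two individuals.
    n_agents = n_cells * cell_size
    for u, v in combinations(range(n_agents), 2):
        if rng.random() < p_random:
            edges.add((u, v))

    return cells, edges
```

Edges are stored as `(u, v)` pairs with `u < v`, so the set never contains the same undirected link twice. Fixing `seed` makes a generated network reproducible across runs.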

C.1.1 Blue Team

The Blue Team is an Anti-Terrorism organization consisting of a small number of fully interconnected law enforcement agents.

The goals of the Blue Team are:

• Learn the structure, task assignments and knowledge distribution of the Red Team.

• Remove or isolate Red Team members, aiming to maximally impair Red Team’s performance.

The Blue Team has no access to the actual information about the Red Team. Its only source of information is a set of wiretaps on the communication network of the Red Team. Based on the information it is able to collect, the Blue Team takes action to destabilize the Red Team by isolating agents on the Red Team. Further, like the Red Team, the initial meta-matrix for the Blue Team can be read from actual data or set up experimentally.

Wiretaps

A wiretap in NetWatch is an agent that selectively intercepts messages from the message stream and routes them to one or more Blue Team members. The Blue Team agent receives and records the origin and destination of the wiretapped message.


If a wiretap could capture all communication, the Blue Team would be able to create a full and accurate picture of the Red Team network. However, a wiretap is not able to capture all relevant messages. Based on empirical information on the use of wiretaps, the wiretap efficiency is set between 5% and 25%. In this paper, we study three different wiretap strategies:

• Random: any message in the message stream has an equal probability of being intercepted. Signal-to-noise ratio is an independent variable, and ranges from 5% useful signal to 25% useful signal, thus allowing us to study the effect of signal-to-noise ratio on the quality of collected data.

• Snowball: captures traffic originating from one agent, and sequentially targets every agent with which it communicated. This is essentially a breadth-first search of the network.

• Socially Intelligent Traffic Analysis

As the Blue Team agents receive messages from the wiretap agents, they use the messages’ address information to build a representation of the network of the Red Team, or the Learned Network. Then, the agents analyze the perceived network and move the wiretap to an agent that is the highest in one or more network measures, such as degree centrality [26] and cognitive load.

Cognitive load is a notion similar to the task load measure developed at NASA [31]. It measures the extent to which the person has to engage in mental activity to do the assigned tasks, defined as:

1. number of people person i interacts with / total number of people in the group

2. number of tasks person i is assigned to / total number of tasks

3. sum of number of people who do the same tasks person i does / (total number of tasks * total number of people)

The agents periodically reevaluate the target of wiretapping and switch if a better target is found or if the amount of new information about the current target becomes too low.
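The snowball and degree-based strategies can be sketched on top of a learned network built from intercepted origin and destination pairs. The sketch is illustrative only: the function names are hypothetical, ties are treated as undirected when counting degree, and cognitive load is left out because the text does not state how its three components are combined.

```python
from collections import defaultdict, deque


def learned_network(intercepts):
    """Build an adjacency map from intercepted (origin, destination) pairs."""
    adj = defaultdict(set)
    for origin, destination in intercepts:
        adj[origin].add(destination)
        adj[destination].add(origin)
    return adj


def snowball_order(adj, start):
    """Order in which a snowball wiretap would target agents (breadth-first)."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in sorted(adj[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def highest_degree(adj):
    """Next wiretap target under the degree-centrality strategy."""
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(adj), key=lambda node: len(adj[node]))


INTERCEPTS = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]
ADJ = learned_network(INTERCEPTS)
```

On this example the snowball strategy starting at `a` visits `a, b, c, d, e`, while the degree-based strategy moves the wiretap to `b`, the agent with the most observed contacts.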

Network Destabilization Tactics

In some of the experiments, the Blue Team collects information about the Red Team and attempts to influence Red Team performance by attacking its vulnerability points (e.g., by isolating or terminating key agents). We test a number of strategies for finding key agents, including:

• Random: a baseline strategy; isolate one random individual from the network.

• Highest degree centrality: isolate one agent from the covert network that, in the data collected by the Blue Team, has the highest degree centrality.

• Highest cognitive load: isolate one agent from the covert network that, in the data of the Blue Team, has the highest cognitive load.

C.1.2 Results

Figure C.2: Average Performance of Wiretapping Strategies (x-axis: time periods, 0 to 100; y-axis: performance of network predictions, 0.26 to 0.38; series: random_0.1, snowball, socint: cognitive load, socint: degree)

Following are some results from NetWatch. We include these results to illustrate the strength of this approach for addressing real world problems. Note that these results are preliminary and will become more complete as additional aspects of the situation are included, such as technological capabilities and alternative task structures.

For these results, we simulated 150 Red Team networks ranging in size from 100 to 400 agents. In each case we simulated the system for 100 time periods, which, in simulation time, is equivalent to approximately 25 days. The Blue Team began with no knowledge of the Red Team. The Blue Team acts as a single homogeneous unit where all actors or sub-agencies completely cooperate.

Figure C.2 presents results indicating the average accuracy of the Blue Team’s perception of the Red Team as it changes over time using different wiretapping strategies. The performance of a strategy is measured as the probability of correctly identifying the top actors within the group based on various network measures, averaged across all networks. We see that a socially intelligent wiretap strategy enables the collection of a more accurate picture of the opponent. However, the optimality of one socially intelligent strategy over another changes over time.

In further experiments we simulated a set of 150 networks, allowing members of the Blue Team to isolate one of the agents within the Red Team network. Each network was run 4 times using different destabilization strategies. The four strategies examined are:


• no attacks on the Red Team.

• isolate a member of the Red Team at random.

• isolate the Red Team agent with the highest degree centrality.

• isolate the Red Team agent with the highest cognitive load.

                      Wiretapping strategy
Attack        Random    Snowball    Degree    Cog.Load
Random         -5.4%      -13.4%     -3.9%      -18.3%
Degree        -21.2%      -24.0%    -21.5%      -21.1%
Cog.Load       -5.7%      -11.0%    -13.5%       -3.0%

Table C.1: Reduction in Organizational Performance of the Red Team due to Anti-Terrorist Activity

Table C.1 presents results indicating the change in performance for each of the analyzed strategies. The performance of the Red Team is measured as the ratio of the number of successfully completed tasks to the number of assigned tasks. Each cell in this table shows the percentage difference in performance between the 50 time periods prior to when the first agent is isolated and the 50 time periods after the isolation. This shows the immediate impact of the various destabilization strategies. First, note that, in general, any of these strategies leads to a performance reduction, indicating that there has been some destabilization. Second, there is an interaction between the type of wiretapping strategy and the type of destabilization strategy.

C.2 Conclusions

These admittedly preliminary results show the potential power of multi-agent network simulations for addressing real world issues. Agent-based simulation frameworks enable a more realistic and extendable architecture for addressing policy issues in a manner comparable to human behavior. Such work moves us from the realm of building agents to act more or less independently on behalf of people to the realm of using collections of agents to reason about how people as groups behave.

The advantage of an agent approach is that it enables the simulated actors to behave like humans: they are cognitively and socially bounded, with knowledge of themselves and others dependent on their personal history. When such agents are embedded in dynamically evolving networks, the entire simulated system takes on the social and technological constraints consistent with empirical findings. The advantage of using AI planning, reasoning and decision making techniques is that complex intelligent agents are extensible to multiple tasks and scenarios.


Such models enable the researcher to examine the nature, not just of cognition but of social cognition, and to explore policy and managerial issues. In doing so, the goal is not to “predict” specific events, but to decrease uncertainty in detecting trends. As such, these tools are valuable assets to decision makers.


Appendix D

DyNetML: Data Interchange for Object-Oriented Networks

This chapter specifies the DyNetML language for expressing and exchanging data in the object-oriented social network semantics.

D.1 Requirements for Data Interchange

In light of the problems outlined above, we proceed to define requirements for a universal data interchange format that would ease the task of exchanging rich social network data and improve compatibility of analysis and visualization tools.

1. The data interchange format shall be contained in human-readable text files that are at the same time easily parsable by computers.

2. The data interchange format shall allow an entire dataset, complete with all computed measurements, to be stored in one file.

3. The data interchange format shall provide maximum expressive power to its users, allowing:

• Typed nodes (types may include “person”, “resource”, “organization”, “knowledge”, etc.)

• Multiple sets of nodes of the same type (to express multiple units within the company, etc.)

• Multiple typed attributes per node

• Typed edges

• Multiple typed attributes per edge

• Multiple graphs (sets of edges) expressed within the same file

• Dynamic network data expressed in a single file

4. The data interchange format shall allow developers to extend it in a fashion that will not break existing software.

5. The data interchange format shall be flexible enough to be used as both input and output of analysis tools.


D.2 DyNetML: an XML-Derived Social Network Language

To address the needs of data interchange, we have designed DyNetML: an XML-derived language for expressing rich social network data.


Figure D.1: Dynamic Networks in DyNetML

The following simple example illustrates the use of DyNetML for representing simple social network datasets (also illustrated in figure D.1):

<?xml version = "1.0" encoding = "UTF-8"?>
<DynamicNetwork xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
                xsi:noNamespaceSchemaLocation = "DyNetML.xsd">
  <MetaMatrix>
    <nodes>
      <nodeset id = "actors" type = "agent">
        <node id = "actor1"/>
        <node id = "actor2"/>
        <node id = "actor3"/>
        <node id = "actor4"/>
        <node id = "actor5"/>
      </nodeset>
    </nodes>
    <networks>
      <graph id = "social_network" sourceType = "agent"
             targetType = "agent" isDirected = "true">
        <edge source = "actor1" target = "actor2" type = "binary"/>
        <edge source = "actor2" target = "actor3" type = "binary"/>
        <edge source = "actor3" target = "actor4" type = "binary"/>
        <edge source = "actor4" target = "actor1" type = "binary"/>
        <edge source = "actor1" target = "actor5" type = "binary"/>
        <edge source = "actor4" target = "actor5" type = "binary"/>
      </graph>
    </networks>
  </MetaMatrix>
</DynamicNetwork>
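Because DyNetML is plain XML, a file of this shape can be read with any standard XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module on an abbreviated copy of the example (XML declaration and schema attributes omitted for brevity); the function name `read_time_slices` and the returned data layout are assumptions made for illustration and are not part of DyNetML.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<DynamicNetwork>
  <MetaMatrix>
    <nodes>
      <nodeset id="actors" type="agent">
        <node id="actor1"/>
        <node id="actor2"/>
        <node id="actor3"/>
        <node id="actor4"/>
        <node id="actor5"/>
      </nodeset>
    </nodes>
    <networks>
      <graph id="social_network" sourceType="agent"
             targetType="agent" isDirected="true">
        <edge source="actor1" target="actor2" type="binary"/>
        <edge source="actor2" target="actor3" type="binary"/>
        <edge source="actor3" target="actor4" type="binary"/>
        <edge source="actor4" target="actor1" type="binary"/>
        <edge source="actor1" target="actor5" type="binary"/>
        <edge source="actor4" target="actor5" type="binary"/>
      </graph>
    </networks>
  </MetaMatrix>
</DynamicNetwork>
"""


def read_time_slices(xml_text):
    """Return one (nodesets, graphs) pair per MetaMatrix element."""
    root = ET.fromstring(xml_text.strip())
    slices = []
    for matrix in root.findall("MetaMatrix"):
        # nodeset id -> list of node ids
        nodesets = {
            ns.get("id"): [node.get("id") for node in ns.findall("node")]
            for ns in matrix.findall("./nodes/nodeset")
        }
        # graph id -> list of (source, target) pairs
        graphs = {
            g.get("id"): [(e.get("source"), e.get("target"))
                          for e in g.findall("edge")]
            for g in matrix.findall("./networks/graph")
        }
        slices.append((nodesets, graphs))
    return slices
```

Calling `read_time_slices(SAMPLE)` yields a single time-slice with one nodeset of five actors and one graph of six directed edges.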

D.2.1 DyNetML Format Overview

DyNetML represents dynamic network data as sets of time-slices. Each of the time-slices is a descriptive snapshot of the organization at a given time.


Figure D.2: Dynamic Networks in DyNetML

Figure D.2 shows the top-level hierarchy of DyNetML files. A DynamicNetwork element is defined as a sequence of MetaMatrix elements, each representing a snapshot of the organization for one time period.

Each of the MetaMatrix elements consists of:

• an optional TimePeriod attribute that allows clear identification of each timeslice.

• a set of properties and measures, representing data about the whole of the timeslice (see section D.5 for a complete definition).

• a nodes element, containing one or more nodesets (section D.3.1).

• a networks element, containing all networks in this timeslice (section D.4).

• an anthropac element that facilitates linking of network data to anthropological data.

D.3 Representing Multiple Node and Relation Types

While designed predominantly for dealing with social network data, the DyNetML format is shaped as a generalized graph data interchange framework.

DyNetML represents graphs as sets of nodes (vertices) and relationships (edges) between them. The node specification allows for detailed specification of each vertex, as well as addition of rich data related to it.

D.3.1 Specifying Individuals and Nodes

Nodes are organized into nodesets, which should be thought of as logical groupings of nodes (by type, by affiliation, etc.).


Each nodeset has to be identified with a unique id attribute (see figures D.3 and D.4), and a type attribute. For a more detailed description of node types, see section D.3.1.

DyNetML allows for an arbitrary number of nodesets (and an arbitrary number of nodesets of each type) and an arbitrary number of nodes in each nodeset.

Figure D.3: Specification of Vertices in DyNetML

<node id="test"
      title="test node"
      prototype="test nodes">
  <port name="in1" port_type="input"/>
  <port name="out1" port_type="output"/>
  <properties>
    <property name="test_property" type="double" value="3.14"/>
  </properties>
  <measures>
    <measure name="test_measure" type="string"
             value="string"/>
  </measures>
</node>

Figure D.4: Sample specification of a Vertex

A node specification consists of the following (see figure D.3):

• id: a unique ID. Note: for ease of searching, it is advisable to use node IDs that do not contain spaces or special characters and are limited in length to 32 characters.

• title: a human-readable title of the vertex (free of the restrictions posed on the node ID field).

• prototype: an optional attribute specifying a subclass of a node. The node prototype can be used to specify additional details about the node.

• Element port allows the user to specify inflows and outflows of each node by allowing multiple connection points within each node.

• Properties and Measures elements allow specification of arbitrary rich data for each node.

• Anthropac element provides a vehicle for connecting anthropological data with social network data.

Ports and Multiple Connection Points

In order to implement multiple types of connections within the same graph, and to enable use of graphs as nodes of other graphs, we have implemented a system of ports.

A port can be viewed as a point where an edge attaches to a vertex. Thus, a directed edge connecting a port specified as input to another node’s port specified as output represents a resource or information flow across the edge.

Since one can specify multiple input and output ports for every node, it is possible to represent a number of distinct flows along every edge while maintaining clear separation between different types of links.

A port is defined as follows (see figure D.3):

• attribute name specifies a port ID that is unique for this node

• attribute port_type is a multiple choice, with possible values “input”, “output” and “general”

Node Types in DyNetML

DyNetML has been designed to assist the flow of information between software tools by not only enforcing a consistent structured format upon the data, but also by specifying a constant vocabulary. Since the language has been designed in service of the Social Network Analysis community, we specify a set of standard node types that could be used to express a majority of rich social network data.

The standard node types are: agent, organization, knowledge, resource, task, location, and graph.

While the plugin architecture of DyNetML allows developers to easily add node types, we suggest that to ensure inter-operability of tools using DyNetML one should refrain from expanding the vocabulary unless absolutely required. To provide a more fine-grained node type mechanism, we suggest using the prototype attribute of nodes to specify arbitrary subtypes.

D.4 Representing Relations in DyNetML

The DyNetML format allows the user to specify multiple graphs within a single framework, including graphs that share vertices with other graphs.

An example of the use of such a system is the case where a number of individuals are engaged in multiple relationship types, such as the formal network, informal advice network, or familial ties network.

Each graph is specified as follows (see figure D.5):


<graph id="TestGraph"
       source="people"
       sourceType="agent"
       target="resources1"
       targetType="resource"
       isDirected="true">
  <properties> ... </properties>
  <measures> ... </measures>
  <edge ...... />
  .......
</graph>

Figure D.5: Specification of Graphs in DyNetML

• id attribute is the graph’s unique ID.

• source attribute specifies the nodeset from which the source nodes are taken.

• sourceType attribute specifies the type of nodes contained in the source nodeset.

• target attribute specifies the nodeset from which the target nodes are taken.

• targetType attribute specifies the type of nodes contained in the target nodeset.

• isDirected attribute specifies whether the edges of this graph are directed; the attribute can only take values of “true” or “false”.

The graph then includes properties and measures elements (see section D.5), followed by a set of edge elements that comprise the actual graph.


<edge source="test1"
      sourcePort="out1"
      target="test2"
      targetPort="in1"
      type="double"
      value="3.14"
      name="testEdge">
  <properties>...</properties>
  <measures>...</measures>
</edge>

Figure D.6: Specification of Edges in DyNetML

D.4.1 Edges

Edges (see figure D.6) of the graph include the following attributes:

• source and sourcePort attributes specify the source node and port that an edge originates from. The source node should be a part of the nodeset specified in the source attribute of the graph. The source attribute is required; sourcePort is optional if no ports have been defined for the source node.

• target and targetPort attributes specify the target node and port that an edge connects to. The target node should be a part of the nodeset specified in the target attribute of the graph. The target attribute is required; targetPort is optional if no ports have been defined for the target node.

• type attribute is required for every edge. If the edge is unweighted, the type attribute should be set to “binary”; other acceptable edge value types are “double” and “string”.

• value attribute specifies the edge weight or value; the type of the value should match the type specified in the type attribute.

• name attribute is an optional string that allows the user to add a human-readable title to an edge.
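As a minimal sketch of the attribute rules listed above, the following Python fragment checks a single edge element with the standard xml.etree.ElementTree module. The helper name check_edge, the wording of the messages, and the decision not to require a value attribute on “binary” edges are assumptions made for illustration; they are not part of DyNetML.

```python
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"binary", "double", "string"}

def check_edge(edge, source_ids, target_ids):
    """Return a list of problems found in one <edge> element (hypothetical helper)."""
    problems = []
    # source/target are required and must belong to the graph's nodesets
    if edge.get("source") not in source_ids:
        problems.append("source is missing or not in the source nodeset")
    if edge.get("target") not in target_ids:
        problems.append("target is missing or not in the target nodeset")
    # type is required; the value must agree with it
    edge_type = edge.get("type")
    if edge_type not in ALLOWED_TYPES:
        problems.append("type must be binary, double or string")
    elif edge_type == "double":
        try:
            float(edge.get("value", ""))
        except ValueError:
            problems.append("value does not parse as a double")
    return problems

good = ET.fromstring(
    '<edge source="test1" target="test2" type="double" value="3.14" name="testEdge"/>')
bad = ET.fromstring('<edge source="test1" target="nowhere" type="double" value="pi"/>')
print(check_edge(good, {"test1"}, {"test2"}))
print(check_edge(bad, {"test1"}, {"test2"}))
```

Port attributes are left unchecked here, since they are optional when no ports have been defined.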


<properties>
  <property name="test" type="double" value="3.14"/>
  <property name="test2" type="string" value="test"/>
</properties>

<measures>
  <measure name="test" type="double" value="3.14"/>
  <measure name="test2" type="string" value="test"/>
</measures>

Figure D.7: Specification of Properties and Measures

D.5 Representing Graph, Node and Edge Attributes

One of the important facilities of DyNetML is its ability to attach rich data or attributes to every element of the structure.

The rich data, specified as properties and measures, can be added to the MetaMatrix, node, graph and edge objects.

Properties and Measures objects are syntactically similar (see figure D.7) and consist of a set of name-value pairs. The main distinction between them is that Properties should be thought of as attributes inherent to the subject, such as information obtained from a questionnaire or otherwise known about the subjects.

Measures, on the other hand, are computed by analysis tools and inserted into the dataset during processing.

The guidelines for naming properties and measures are as follows:

• Names should be descriptive of the nature of data contained within.

• Measure names should include the name of the tool that generated them.

For example, the measure of Freeman centrality computed by the NetStat tool should look as follows:

<measure name="netstat_freeman_centrality" type="double" value="3.14"/>
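The naming guideline can be sketched in a few lines of Python. The helper make_measure and its rule for choosing the type are hypothetical; only the resulting element shape follows the example above.

```python
import xml.etree.ElementTree as ET

def make_measure(tool, name, value):
    """Build a <measure> element whose name is prefixed with the generating tool.

    Illustrative helper, not part of any DyNetML tool.
    """
    full_name = f"{tool}_{name}".lower().replace(" ", "_")
    # Assumption: floats are stored as "double", everything else as "string"
    mtype = "double" if isinstance(value, float) else "string"
    return ET.Element("measure", {"name": full_name, "type": mtype, "value": str(value)})

m = make_measure("NetStat", "Freeman centrality", 3.14)
print(ET.tostring(m, encoding="unicode"))
```

Prefixing the tool name keeps measures from different analysis tools apart when they are written into the same dataset.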


D.6 Complex Social Networks in DyNetML

The basic use of DyNetML is specification of rich social network data, including properties and measures attached to objects within the network. DyNetML also allows for an arbitrary number of networks superimposed upon each other, and specification of network data over time.

The example below is a heavily commented small dataset containing two types of nodes (people and facts), and three networks (friendship, advice and knowledge).

D.6.1 Example

<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE DynamicNetwork SYSTEM "DyNetML.dtd">
<DynamicNetwork>
  <!-- Define the metamatrix... watch comments in the XML for
       explanation of particular features of the format -->
  <MetaMatrix timePeriod = "1">
    <!-- A global measure on the entire metamatrix -->
    <measures>
      <measure name = "global" type = "double" value = "3.14"/>
    </measures>
    <!-- First, we specify the nodes -->
    <nodes>
      <!-- Nodes are broken up into nodesets by type (e.g. agent,
           knowledge, resource, task, etc) -->
      <nodeset id = "people" type = "agent">
        <!-- This is the simple node with no extended data -->
        <node id = "b"/>
        <!-- This is a more complex node with properties and
             attached measures -->
        <node id = "a">
          <!-- This is how to specify internal node properties -->
          <properties>
            <property name = "foo" type = "double" value = "3.14"/>
            <property name = "bar" type = "double" value = "3.14"/>
          </properties>
          <!-- This is how to specify node-level measures -->
          <measures>
            <!-- Each measure is named and accompanied by
                 type (double|string|binary) -->
            <measure name = "centrality" type = "double" value = "3.14"/>
            <measure name = "betweenness" type = "double" value = "3.14"/>
          </measures>


        </node>
      </nodeset>
      <!-- Another nodeset -->
      <nodeset id = "facts" type = "knowledge">
        <node id = "a1"/>
        <node id = "a2" title = "boss"/>
      </nodeset>
    </nodes>
    <!-- Now we specify the graphs that comprise the metamatrix -->
    <networks>
      <!--
        NOTE: source and target of each edge should be a valid node;
        however, it’s up to the software developer to ensure that, or
        to check consistency in any code that imports this data
      -->
      <!-- A very simple graph -->
      <graph id = "friendship" source="people" sourceType = "agent"
             target="people" targetType = "agent">
        <edge source = "a" target = "b" type = "binary"/>
        <edge source = "b" target = "a" type = "binary"/>
      </graph>
      <!-- A graph with some graph-level measures -->
      <graph id = "advice" source="people" sourceType = "agent"
             target="people" targetType = "agent">
        <measures>
          <!-- Just like node-level measures; nothing new here -->
          <measure name = "degree" type = "double" value = "3.14159"/>
          <measure name = "foo" type = "double" value = "3.14159"/>
          <measure name = "bar" type = "double" value = "3.14159"/>
        </measures>
        <edge source = "a" target = "b" type = "binary"/>
      </graph>
      <graph id = "knowledgeNetwork" isDirected = "true" source="people"
             sourceType = "agent" target="facts" targetType = "knowledge">
        <edge source = "a" target = "a1" type = "string" value = "foobar"/>
        <edge source = "b" target = "a2" type = "double" value = "3.14159"/>
      </graph>
    </networks>
  </MetaMatrix>
</DynamicNetwork>
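The note inside the example leaves endpoint checking to the importing software. A minimal sketch of such a check, written in Python with the standard xml.etree.ElementTree module, is shown here. The function name dangling_edges is an assumption, and the embedded sample is a cut-down file in which the second edge deliberately points at an undeclared node.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<DynamicNetwork>
  <MetaMatrix timePeriod="1">
    <nodes>
      <nodeset id="people" type="agent"><node id="a"/><node id="b"/></nodeset>
      <nodeset id="facts" type="knowledge"><node id="a1"/><node id="a2"/></nodeset>
    </nodes>
    <networks>
      <graph id="knowledgeNetwork" isDirected="true" source="people"
             sourceType="agent" target="facts" targetType="knowledge">
        <edge source="a" target="a1" type="string" value="foobar"/>
        <edge source="b" target="2" type="double" value="3.14159"/>
      </graph>
    </networks>
  </MetaMatrix>
</DynamicNetwork>
"""

def dangling_edges(root):
    """Yield (graph id, source, target) for edges whose endpoints are not declared nodes."""
    # Map each nodeset id to the set of node ids it contains
    nodesets = {ns.get("id"): {n.get("id") for n in ns.iter("node")}
                for ns in root.iter("nodeset")}
    for graph in root.iter("graph"):
        sources = nodesets.get(graph.get("source"), set())
        targets = nodesets.get(graph.get("target"), set())
        for edge in graph.iter("edge"):
            if edge.get("source") not in sources or edge.get("target") not in targets:
                yield graph.get("id"), edge.get("source"), edge.get("target")

print(list(dangling_edges(ET.fromstring(SAMPLE.strip()))))
```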


Appendix E

Storage and Manipulation of Social Structure Data

E.1 Storage Requirements for Social Structure Data

Social structure data will be stored in a relational or object-oriented database capable of manipulating large quantities of data. The database shall have advanced query capabilities (both SQL and procedural), as well as stored procedure and trigger capabilities.

The structure of the database shall be defined in a way that preserves the character and integrity of the data, i.e. it shall be aware of its underlying graph properties. The structure shall be designed in an extensible manner, allowing easy addition of new attributes, node and edge types.

The database system shall not only keep track of the units of social structure data (such as nodes and edges) but also the sources of such, thus enabling creation of large-scale multi-source datasets while preserving the original data sources.

The database shall have an easy-to-use web-based interface, allowing users to enter, search and edit data as well as access the manipulation and query capabilities described below.

E.2 Database Design

The NetIntel database has been created using the PostgreSQL database engine and the languages SQL (which provides pure query capabilities), PL-SQL (which is used to create and execute complex SQL queries on the fly), and C++ (which serves as a means to import raw data from data gathering tools such as AutoMap and export data into analysis tools such as ORA).

The database also sports a Web-based interface that allows easy navigation and editing of large bodies of data, as well as some access to data manipulation and query tools.

E.2.1 Database Schema

The database schema is designed to preserve flexibility inherent in the source data while enforcing some regularity upon the datasets.


Graph tables:
  Node: nodeID (PK, FK1), nodeType
  Edge: edgeID (PK), sourceID, targetID (FK1), edgeType, edgeStrength, edgeConfidence
  EdgeTypes: edgeType (PK), sourceNodeType, targetNodeType, description

Node type tables:
  Agent: nodeID (PK, FK1), name, ethnicity, religion, education, gender, maritalStatus, passportDate, dateOfBirth, placeOfDeath, dateOfDeath, causeOfDeath, criminalRecord, description
  Organization: nodeID (PK), name, type, description, HQlocationID, dateFounded, religious_affiliation
  Event: nodeID (PK), name, type, description, date, purpose, effect
  Resource: nodeID (PK), name, type, description
  Location: nodeID (PK), name, type, description, coordinates
  Task: nodeID (PK), name, type, description, purpose, effect
  Position: nodeID (PK), name, type, description
  Skill: nodeID (PK, FK1), name, type, description

Source tracking tables:
  Document: documentID (PK), url, filename, text, object, type, date
  Node2Document: nodeID (FK2), documentID (FK1)
  Edge2Document: edgeID (FK1), documentID (FK2)

Interface support tables:
  FieldOption: fieldOptionID (PK), tableID, fieldID, option
  Users: login (PK), password, admin

Annotations in the diagram:
  • The Node table provides a common interface to simplify entity-to-entity relations and entity-document relations.
  • A document is a place to store information about where the data came from; one can specify where information about any node or edge came from.
  • The EdgeTypes table specifies appropriate types of relations for each pair of nodes (i.e. 2 people can be brothers, events can happen IN places, etc).
  • The FieldOption table contains lists of values to present to users (to ease data entry).
  • The Users table contains user authentication information for website logins.

Figure E.1: NetIntel Database Schema


Two tables, Node and Edge, compose the basic graph structure. Two separate tables contain a set of Node Types and Edge Types, thus making the graph structure in the database semantically extendable.

A Document table stores data sources that contribute to the creation of the database and links them through many-to-many relations to the graph entities (Nodes and Edges).

A set of tables stores domain-dependent data on each of the node types. These tables are not static in the database schema, but rather are created automatically at the same time as a new Node Type, thus ensuring both flexibility and referential integrity of data.

E.2.2 Thesaurus

Due to the fact that data for the database comes from many disparate sources and includes many foreign names, alternative spellings of such names are inevitable.

The database uses a separate Thesaurus table to store alternative spellings of names of entities. When an entity (Node or Edge) is inserted, queried or updated, a Trigger Function checks the spelling of the entity’s name or ID and makes sure that the ID is spelled in a canonical way within the dataset.

Unfortunately, the data populating the Thesaurus table had to be compiled by hand. However, with a simple conversion tool, NetIntel can make use of thesauri written for use with AutoMap and can therefore capitalize on the manual work that was invested in their creation.

E.3 Data Manipulation

The data manipulation tools are closely coupled to the database system described above. The foremost requirement for the subsystem is the ability to extract subsets of the data based on:

• the source (or sources) of data (e.g. “Find all social structure data that came from New York Times” or “Find all data that came from New York Times article from 10/10/2003”).

• attributes of nodes and edges (e.g. “What is the network of people who were born in Syria?”).

The query tools enable the creation of time-slices from the complete dataset or any subset thereof if time-dependent data is present.

The above queries are easily accomplished using SQL and the Document tracking tables. This query has been implemented as a part of the db2dynetml export program and is usable with a single command-line option.
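A source-based query of this kind can be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, a reduced version of the Edge, Document and Edge2Document tables from figure E.1, and made-up rows and URLs; none of this is the actual NetIntel code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Edge (edgeID INTEGER PRIMARY KEY, sourceID TEXT, targetID TEXT, edgeType TEXT);
CREATE TABLE Document (documentID INTEGER PRIMARY KEY, url TEXT, date TEXT);
CREATE TABLE Edge2Document (edgeID INTEGER, documentID INTEGER);
INSERT INTO Edge VALUES (1, 'a', 'b', 'knows'), (2, 'b', 'c', 'knows');
INSERT INTO Document VALUES (10, 'nytimes.example/article1', '2003-10-10'),
                            (11, 'other.example/report', '2003-11-01');
INSERT INTO Edge2Document VALUES (1, 10), (2, 11);
""")

def edges_from_source(con, url_pattern):
    """Return edges supported by at least one document whose url matches the pattern."""
    return con.execute("""
        SELECT DISTINCT e.edgeID, e.sourceID, e.targetID, e.edgeType
        FROM Edge e
        JOIN Edge2Document ed ON ed.edgeID = e.edgeID
        JOIN Document d ON d.documentID = ed.documentID
        WHERE d.url LIKE ?
        ORDER BY e.edgeID
    """, (url_pattern,)).fetchall()

print(edges_from_source(con, "nytimes.example%"))
```

Restricting the query to one article or one date only requires a further condition on the Document columns.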


E.3.1 Subsets

The manipulation tools are designed to extract subsets of the network based on graph-theoretic properties of the network such as graph distance (e.g. “Find all nodes at a graph distance of 2 or less from a given node”) and graph density (e.g. “Find all nodes embedded in subgraphs with given density”).

Plain SQL queries do not allow graph-theoretic computations, as most of them require recursion, which basic SQL semantics does not provide. To implement graph traversals within the database, I had to resort to writing recursive functions in PL-SQL (a procedural language shared between Oracle and a number of other database engines) which call SQL queries and build up recursive views of the database.

Current subsetting tools are written as stored procedures, and can be accessed through either the Web interface or through a command-line program, db2dynetml.

The manipulation tools allow easy completion of incomplete datasets (e.g. “Given a set of people, find all organizations and resources connected to them”). To initiate dataset completion, the db2dynetml tool is launched with a DyNetML file containing the subject dataset. It then runs a set of graph-traversal expansions on the network and stores their results as a new network.
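The graph-distance subset can be illustrated outside the database with a breadth-first search. The Python sketch below is not the stored-procedure implementation described above; the function name within_distance and the choice to treat edges as undirected are assumptions.

```python
from collections import deque

def within_distance(edges, start, max_dist):
    """Return {node: distance} for nodes reachable from start in at most max_dist hops."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)  # assumption: edges are undirected
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the requested distance
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(within_distance(edges, "a", 2))
```

Dataset completion can be seen as the same expansion started from every node of the subject dataset at once.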

E.4 WWW Interface to NetIntel dataset

Figure E.2 shows the WWW interface for entry, editing and manipulation of data stored in the NetIntel database. The interface allows a user to enter new nodes and edges, search the database for occurrences of keywords, and build subsets of data based on attribute values as well as graph-theoretic measures.

In the future, the WWW interface will offer easy-to-use graphical controls for building complex queries against the database, as well as easy import and export of data.

The WWW interface is written as a set of PHP scripts and runs on an Apache server.


Figure E.2: Screenshot of the WWW Interface to NetIntel Database


Appendix F

Beyond MetaMatrix: Robust Reasoning about Networks

F.1 Toward a Social Network Semantics

Traditional social network analysis operates on a simple set of concepts: nodes of a social network are people or groups of people (generally a homogeneous set), and links between nodes represent the existence of a connection in the case of binary networks, or frequency of communications or closeness of relationships in real-valued networks. Thus, computing simple graph-theoretic measures upon the resultant graph produces interpretable results that allow detection of powerful or important nodes, communication gatekeepers, etc.

With the PCANS (and later MetaMatrix) models, there is a trend towards expansion of network analysis to operate on a heterogeneous set of concepts. PCANS was originally designed to operate on concepts of People, Knowledge and Tasks. Later, this model was expanded to include Resources and Organizations, and its expansion is continuing.

However, with the expansion of the number of node types that take part in the analysis, the concept of a node or an edge increasingly becomes overloaded with a plethora of meanings. To wit: an edge between nodes $A_1 \in Agent$ and $A_2 \in Agent$ preserves the original SNA meaning of “a connection exists”. Even at this point, a real-valued edge weight carries an ambiguous meaning (is it “strength of connection”, “distance”, or “frequency of communication”?). However, an edge between nodes $A_3 \in Agent$ and $O_1 \in Organization$ has an ambiguous meaning of “agent is connected to organization” or “agent is a part of the organization”.

Of even more importance is the semantic ambiguity of edge direction: does it mean the same thing if an Agent is connected to an Organization or an Organization is connected to an Agent? Since the MetaMatrix contains many submatrices with heterogeneous node types, and traditionally only included an Upper Triangular portion of the matrix, the directionality of heterogeneous edges was either lost completely (as the lower triangular portion went ignored) or potentially misinterpreted (does reversal of an edge’s direction change the meaning of an edge?).

We also cannot overlook the fact that some edges have different properties than others. For example, subordination edges (“isSuperiorTo”) for people or inclusion edges (“isPartOf”) for organizations (as well as a number of others) are transitive (i.e. the boss of my boss is also my boss). A number of other special properties can be defined as well. Graph models to date do not offer sufficient reasoning capabilities to resolve transitivity of edges, especially in a context where nodes and edges of multiple types coexist.

A further problem with the ever-expanding MetaMatrix is the fact that the expansion process does not scale well. Let us suppose that a MetaMatrix includes $N$ types of nodes; expansion of the model to deal with $N + 1$ types of nodes will require definition of meaning and measures upon $(N + 1)^2 - N^2 = 2N + 1$ submatrices. I consider this trend to be counter-productive and prone to the creation of semantic holes: areas in datasets where data exists but no adequate explanation or analysis can be performed.

The purpose of this chapter is to offer an alternative to graph-based models of social networks that would offer formidable expressive power over heterogeneous network models and enable automated reasoning and inference of network properties, while being at least partially backward-compatible with existing SNA models (thus allowing cross-validation with well-researched datasets).

The proposed network semantics is rooted in object-oriented knowledge representation, as well as frame-based reasoning and symbolic inference (chaining) algorithms.

F.2 Edge Semantics - More than just edge labels

Several attempts have been made to regularize the meaning of edges in social networks. RELATIONSHIP[32] is an RDF[23] schema that defines a vocabulary for describing social interactions and relationships between people. However, definition of a vocabulary falls short of rigorously specified social relationship semantics. While a vocabulary set can be negotiated and agreed to by a community of researchers, it will remain incomplete: the richness of human relationships presents more nuances than it is possible to express in a finite vocabulary.

However, a more serious complication of a purely vocabulary-based specification of relationships is that a social network defined using this vocabulary is merely a labelled graph. While such graphs are widely used to communicate relationship information to human users, it is not possible for computers to reason about such labelled graphs without an understanding of natural language.

F.3 Defining an Object-Oriented Network Semantics

The Social Structure Semantics is defined as a LISP-based language for specification of domain dependent, yet rigorously grounded, social structure data. The language is designed by borrowing design principles from Frame-Based reasoning, Object-Oriented design and design of expert systems.


[Diagram labels: Node: <nodeID>; Node: <sourceID>; Node: <targetID>; edgeVerb: <edgeID>]

(defnode <nodeID> [as <parent_nodeID>]
  [(with-edges [(<edgeVerb> <connectedTo_nodeID>)]*)]
  [(with-rules <rule definition>*)])

(defedge <edgeID> [as <parent_edgeID>]
  (from <sourceID>)
  (to <targetID>)
  [(with-rules <rule definition>*)])

or

(defedge <edgeID> [as <parent_edgeID>]
  (from (defnode <sourceID> ….))
  (to (defnode <targetID> ….))
  [(with-rules <rule definition>*)])

Figure F.1: Node and Edge Semantics

F.3.1 Nodes and Edges

The basic units of the social structure semantics are the graph-theoretic nodes and edges.

A node is an entity representing an actor, information or a resource: something that in natural language would be described as a noun.

An edge represents verbs, or actions that connect nodes into statements.

Note that the LISP heritage of the Social Semantics Language allows for nested definitions, thus allowing the code defining actors and relationships to be put into a compact statement.

Both nodes and edges can have attached attributes, specified in a with-attributes list and defined as lists of name-value pairs.

Of special importance is the with-rules statement inside node and edge definitions. This statement allows specification of parameter ranges for attributes, as well as specification of inferential rules that ultimately compose the reasoning component of the social structure semantics. The rule language is essentially LISP with several additional constructs and is described in detail in section F.4.

Note that the basic semantics described in this section is sufficient for representation of and reasoning on any datasets from the domain of traditional social network analysis. As the rule language is Turing-complete, any mathematical calculations and thus social network metrics can be implemented on this level.

F.3.2 Inheritance

The social structure semantics (SSS) would be overly simplistic if all it allowed for was basic nodes and edges. However, allowing ad-hoc definitions of entities would compromise the rigor of the underlying conceptual framework. Therefore, we resort to the object-oriented paradigm to implement flexible yet rigorous construct derivation within the semantic structure.


[Diagram labels: Node: Agent; Agent: Joe; Agent: Bill]

(defnode Person as Node
  (with-attributes
    (type "animate") (name "") (gender "") (age 0))
  (with-rules
    (defenum Agent.gender ("M"|"F"))
    (defrange Agent.age (0 120))))

(defnode Joe as Person
  (with-attributes
    (name "Joe") (gender "M")))

(defnode Bill as Person
  (with-attributes
    (name "Bill") (gender "M") (age -5)))  // INVALID RANGE

Figure F.2: Node Inheritance

[Diagram labels: Node: Agent; Edge: speaksTo; Agent: Joe; Agent: Bill, joined by a speaksTo edge]

(defedge speaksTo
  (from Agent)
  (to Agent)
  (with-attributes (frequency number 0))
  (with-rules (defrange speaksTo:frequency (0 5))))

(defedge (genID) as speaksTo  //genID function generates a unique ID
  (from Joe)
  (to Bill)
  (with-attributes (frequency number 3)))

Figure F.3: Edge Inheritance


Both node (figure F.2) and edge (figure F.3) inheritance operate in the same fashion as inheritance does in the object-oriented languages Java and C++ or in the Common Lisp Object System (CLOS). An object inherits all of its parent’s attributes, edges and rules, unless they are specifically overridden in the definition. In the case of multiple inheritance, the newly defined child node inherits all properties of its parents. Conflicts (i.e. if two or more parents specify the same attribute) are resolved by default in favour of the later parent (default inheritance can be overridden with special inheritance rules).

However, there is one important difference between SSS inheritance and standard object-oriented inheritance.

SSS does not separate the notion of a class from the notion of an instance. Any instance can thus inherit properties from one or more other instances. This allows the domain specification to operate with concrete (and therefore analyzable) terms, which later are used as abstractions for creating datasets. Free mixing of abstract and concrete instances is also allowed.

For example, let us define a terrorist as someone who trained in a terrorist camp and is a member of a terrorist organization:

(defnode terrorist as Agent
  (with-edges
    (defedge trainedIn (from this) (to (defnode terroristCamp as Location)))
    (defedge memberOf (from this) (to (defnode terroristOrganization as Organization)))
  )
)

While “terrorist” is still defined as an abstract concept, it can be operated with on a concrete level, essentially used as a stand-in for unknown individuals that share its traits but may not yet have known names.

This free mixing also allows one to specify a formal organizational chart of a corporation using abstract nodes, and then fill the organization chart with concrete names once they are known.

The final advantage of blurring the line between abstract and concrete concepts is that it enables the creation of evolutionary and heuristically driven optimization algorithms that operate directly on the social structure being examined.
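The inheritance behaviour described in this section (no class/instance split, later parents taking precedence, own values overriding inherited ones) can be mimicked in a few lines of Python. This is only an illustrative sketch; the class SSSNode, the Employee node and its attributes are hypothetical and do not come from the SSS language itself.

```python
class SSSNode:
    """Prototype-style node: any node can serve as a parent of another node."""

    def __init__(self, node_id, parents=(), **attributes):
        self.node_id = node_id
        self.parents = list(parents)
        self.own = dict(attributes)

    def attributes(self):
        merged = {}
        for parent in self.parents:      # later parents overwrite earlier ones
            merged.update(parent.attributes())
        merged.update(self.own)          # the node's own values override inherited ones
        return merged

agent = SSSNode("Agent", type="animate", name="", gender="")
employee = SSSNode("Employee", type="employee", salary=0)
joe = SSSNode("Joe", parents=[agent, employee], name="Joe", gender="M")
print(joe.attributes())
```

Because parents are ordinary nodes, an abstract node such as the terrorist definition above and a concrete node such as Joe are handled by exactly the same mechanism.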

F.3.3 Derivation of MetaMatrix Semantics

Given the inheritance constructs described in the previous section, it becomes relatively trivial to derive a domain specification corresponding to the MetaMatrix semantics (and thus capable of replicating MetaMatrix analysis algorithms and measures). A simple illustration of MetaMatrix semantics can be found in figure F.4.

F.4 Rules and Social Structure Inference

The key to the operation of Social Structure Semantics is the rule system. Each concept and entity in the environment is designed to carry a set of rules that govern its behaviour and the values that different components can take.

A number of different rule types are possible:


[Diagram labels: Node; Node: Agent; Node: Knowledge; Node: Resource; Node: Task; edge labels: knows, isConnectedTo, hasAccessTo, does]

Figure F.4: MetaMatrix Semantics

Ranges and Enumerations

Ranges and enumerations specify limits of values that node attributes can take. Range statements only apply to numeric attributes, and enumeration statements only apply to strings:

(defrange [parameter name] ([min_value] [max_value]))

(defenum [parameter name] (value|value|value|value))
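How such rules might be checked can be sketched in Python. The functions defrange, defenum and violations below merely borrow the names of the SSS statements; treating the range bounds as inclusive is an assumption, since the text does not say.

```python
def defrange(min_value, max_value):
    """Return a checker accepting numbers in [min_value, max_value] (inclusive by assumption)."""
    return lambda v: isinstance(v, (int, float)) and min_value <= v <= max_value

def defenum(*values):
    """Return a checker accepting only the listed strings."""
    allowed = set(values)
    return lambda v: v in allowed

# Rules comparable to those attached to the Person node in figure F.2
rules = {"age": defrange(0, 120), "gender": defenum("M", "F")}

def violations(attributes, rules):
    """List the attribute names whose values break their rule."""
    return [name for name, ok in rules.items()
            if name in attributes and not ok(attributes[name])]

print(violations({"name": "Joe", "gender": "M", "age": 30}, rules))
print(violations({"name": "Bill", "gender": "M", "age": -5}, rules))
```

The second call reports the age attribute, matching the invalid definition of Bill in figure F.2.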

Entity Creation

When an entity creation rule fires, it defines a new entity in the system based on the specification of the rule. Syntactically, entity creation rules are identical to defnode and defedge statements.

Figure F.5 illustrates an entity creation rule that builds up the concept of a transitive edge. A transitive edge is defined as follows:

$\exists E(A, B) \wedge \exists E(B, C) \Rightarrow \exists E(A, C)$

When the defedge statement inside the rule fires, it examines whether the object that the transitive edge refers to has other transitive edges emanating from it and creates an extra edge from its source to the destination of the transitive edge it found. To illustrate: if we have an office located in the Trump Tower, and the Trump Tower is located in New York, we can infer that the office is located in New York.
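The effect of the transitive-edge rule can be sketched in Python as a fixed-point computation over a set of (source, target) pairs. This is an illustration of the inference only, not of the SSS rule engine; the function name infer_transitive is an assumption.

```python
def infer_transitive(edges):
    """Return the input edges plus every edge implied by transitivity."""
    closed = set(edges)
    changed = True
    while changed:                      # repeat until no new edge can be inferred
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))  # E(A,B) and E(B,C) imply E(A,C)
                    changed = True
    return closed

is_in = {("office", "Trump Tower"), ("Trump Tower", "New York")}
print(infer_transitive(is_in) - is_in)
```

For the office example the only inferred edge runs from the office to New York.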


(defnode Location
  (with-attributes
    (latitude number 0) (longitude number 0)))

(defedge transitiveEdge
  (from Node)
  (to Node)
  (with-rule
    (defedge (genid) (from this.from) (to this.to.transitiveEdge.to))))

(defedge isIn as transitiveEdge
  (from Location)
  (to Location))

[Diagram: Location: office isIn Location: Trump Tower; Location: Trump Tower isIn Location: New York; inferred edge: office isIn New York]

Figure F.5: Network Inference

Conditional Rules

Conditional rules use the underlying LISP syntax to create complex statements. A LISP conditional expression is defined as follows:

(if (conditional_operator operand1 operand2)
    (then_expression)
    (else_expression))

or

(cond
  ((conditional_operator1 operand1 operand2) (then_expression1))
  ((conditional_operator2 operand1 operand2) (then_expression2))
  (t (else_expression)))

F.4.1 How the rule system works

Rule evaluation is based on the principle of “lazy evaluation”: a rule is only firedwhen an object is created or there is a need for the object to be re-evaluated.The need for re-evaluation is determined by the “dirty bit”. At the creation

of an object, all rules related to it are executed, and the dirty bit is set to “false”.If elsewhere in the system a rule is fired at some later time that changes someattribute of the object or attaches a new edge to the node, the dirty bit is setto “true”, and rules attached to the object fire.This results in forward propagation of inference, essentially filling in the

missing edges and parameter values.There is a number of unresolved issues in use of lazy evaluation for inference

of network properties. Most importantly, it is possible (and, indeed, simple) tocreate two rules that force the system into an infinite loop. At this point, the

66

Page 73: Abstract - Carnegie Mellon School of Computer Sciencemaksim/research/proposal2.pdf · 2004-07-15 · Robust, Scalable Object-Oriented Semantics for Reasoning and Simulating Social

Social Structure Semantics

[Figure F.6 diagram. Statement: “Bill met with Joe in New York”. Schema: Node: Agent, linked by participatedIn to Node: Event, linked by eventLocation to Location: <id>. Instances: Agent: Joe, Agent: Bill, Event: meeting, Location: New York. Two edges are labelled “inferred”.]

(defnode Event
  (with-edges
    ((defedge eventLocation (from Event) (to Location)) Location))
  (with-attributes (eventTime number 0)))

(defnode meeting as Event
  (with-edges
    (eventLocation (defnode NewYork as Location)))
  (with-attributes (eventTime number 1)))

(defedge agentLocatedIn (from Agent) (to Location)
  (with-attributes (time number 0)))

(defedge participatedIn (from Agent) (to Event)
  (with-rule
    (defedge (genID) as agentLocatedIn
      (from this.from)
      (to this.to.eventLocation.to)
      (with-attributes (time number this.to.eventTime)))))

---------------------------------------------------------------------------------------------

(defnode Joe as Agent)
(defnode Bill as Agent)
(defedge (genID) as participatedIn (from Joe) (to meeting))
(defedge (genID) as participatedIn (from Bill) (to meeting))

Figure F.6: Making Network Inferences: a simple example

language provides no protection against such loops, and it is unclear if it should. Some social systems oscillate, and terminating evaluation loops would prevent modelling of such systems.

A plausible but as yet untested solution is to split the evaluation loops into multiple time periods, and observe their progression over time. This, in conjunction with a number of rules governing assignment of timestamps to edges, would also allow modelling of dynamic systems without need for an external discrete simulation mechanism.
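The dirty-bit mechanism and the proposed split into time periods can be illustrated with a small Python sketch. It is not part of SSS: the names `Node`, `Network`, `set_attr`, `step`, `run` and `oscillator` are hypothetical, and the sketch assumes one possible reading of the proposal, in which rules of objects that are dirty at the start of a period fire once, and any dirty bits they set are handled in the next period. A new object simply starts dirty, which stands in for "all rules are executed at creation".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A network object with attributes, attached rules and a dirty bit."""
    name: str
    attrs: Dict[str, int] = field(default_factory=dict)
    rules: List[Callable[["Node", "Network"], None]] = field(default_factory=list)
    dirty: bool = True  # assumption: a new object is evaluated in the first period


class Network:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, node: Node) -> Node:
        self.nodes[node.name] = node
        return node

    def set_attr(self, name: str, key: str, value: int) -> None:
        """Change an attribute; set the dirty bit only if the value changed."""
        node = self.nodes[name]
        if node.attrs.get(key) != value:
            node.attrs[key] = value
            node.dirty = True

    def step(self) -> int:
        """One time period: fire rules of objects that were dirty at its start."""
        pending = [n for n in self.nodes.values() if n.dirty]
        for node in pending:
            node.dirty = False
        for node in pending:
            for rule in node.rules:
                rule(node, self)  # dirty bits set here are handled next period
        return len(pending)

    def run(self, max_periods: int) -> int:
        """Run until no object is dirty or the budget is spent.

        Returns the number of periods in which some object was evaluated.
        """
        for period in range(max_periods):
            if self.step() == 0:
                return period
        return max_periods


def oscillator() -> Network:
    """Two mutually triggering rules that never settle (illustrative only)."""
    net = Network()
    # Rule on a: b copies a's state. Rule on b: a takes the opposite of b's state.
    net.add(Node("a", {"state": 0},
                 [lambda n, g: g.set_attr("b", "state", n.attrs["state"])]))
    net.add(Node("b", {"state": 0},
                 [lambda n, g: g.set_attr("a", "state", 1 - n.attrs["state"])]))
    return net
```

Under these assumptions, `oscillator().run(6)` uses up its whole budget of six periods without reaching quiescence, yet the evaluation still returns, and the state can be inspected after each `step()`. A rule chain that converges stops as soon as no dirty bits remain.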

F.4.2 Making Complex Statements in SSS

Figure F.6 illustrates how one may build a complete and complex statement in SSS and reason about its implications.

Let us suppose that we have two actors, Bill and Joe, who are known to have met in New York on one sunny day. What else does the knowledge of our domain tell us?

A Meeting is a kind of Event which takes place at a particular Location and at a particular time.

We can also define an edge participatedIn that links an actor to an event. Rules attached to the edge dictate that in order for an actor to participate in an event, he must be at the place of the event at the time of the event.

Thus, saying that “Bill and Joe participated in a meeting that took place in New York” implies also that both Bill and Joe were in New York at the time of the meeting. The rule system will infer that and create appropriate edges.
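The inference described above can be mimicked in a few lines of Python. This is a minimal sketch and not the SSS implementation: the `Edge` and `Graph` classes and the `event_time` table are assumptions made for illustration, while the edge names (`participatedIn`, `eventLocation`, `agentLocatedIn`) and node names are taken from Figure F.6.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Edge:
    kind: str
    src: str
    dst: str
    time: Optional[int] = None


class Graph:
    def __init__(self) -> None:
        self.edges: List[Edge] = []
        self.event_time: Dict[str, int] = {}  # event name -> eventTime attribute

    def add(self, edge: Edge) -> None:
        """Store an edge; a new participatedIn edge fires the location rule."""
        if edge in self.edges:
            return
        self.edges.append(edge)
        if edge.kind == "participatedIn":
            self._infer_location(edge)

    def _infer_location(self, edge: Edge) -> None:
        # Mirrors: (from this.from) (to this.to.eventLocation.to)
        #          (time number this.to.eventTime)
        for e in list(self.edges):
            if e.kind == "eventLocation" and e.src == edge.dst:
                self.add(Edge("agentLocatedIn", edge.src, e.dst,
                              self.event_time.get(edge.dst)))


def figure_f6() -> Graph:
    """Rebuild the facts of Figure F.6 and let the rule fire."""
    g = Graph()
    g.event_time["meeting"] = 1
    g.add(Edge("eventLocation", "meeting", "NewYork"))
    g.add(Edge("participatedIn", "Joe", "meeting"))
    g.add(Edge("participatedIn", "Bill", "meeting"))
    return g
```

After `figure_f6()` runs, the graph holds two `agentLocatedIn` edges, one from each agent to `NewYork`, both stamped with the meeting's `eventTime`. Unlike the dirty-bit scheme, this sketch fires the rule only when a `participatedIn` edge is added, so an `eventLocation` attached afterwards would not trigger re-evaluation.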


[Figure F.7 diagram. Statement: “Bill met Joe in New York, then they went to Afghanistan and trained in a terrorist camp under direction of Bin Laden”. Nodes: Agent: Joe, Agent: Bill, Event: meeting, Location: New York, Training: trainingInAfganiTerroristCamp, Location: terroristCamp, Location: Afganistan, Agent: BinLaden, Skill: terroristOperations. Labelled edges: isIn, eventLocation, taughtBy.]

//Basic units that have not yet been defined
(defnode Skill)
(defedge knows (from Agent) (to Agent))
(defedge hasSkill (from Agent) (to Skill))

//Training inherits edge “eventLocation” and attribute “eventTime” from event
(defnode Training as Event
  (with-edges
    ((defedge taughtBy (from Training) (to Agent)) Agent)
    ((defedge skillTaught (from Training) (to Skill)) Skill)))

(defedge trainedIn as participatedIn (from Agent) (to Training)
  (with-rules
    (defedge (genID) as knows (from this.from) (to this.to.taughtBy.to))
    (defedge (genID) as hasSkill (from this.from) (to this.to.skillTaught.to))))

(defnode terroristCamp as Location
  (with-edges
    (isIn (defnode Afganistan as Location))))

(defnode trainingInAfganiTerroristCamp as Training
  (with-edges
    (eventLocation terroristCamp)
    (skillTaught (defnode terroristOperations as Skill))
    (taughtBy (defnode BinLaden as Agent))))

(defedge (genID) as trainedIn (from Joe) (to trainingInAfganiTerroristCamp))
(defedge (genID) as trainedIn (from Bill) (to trainingInAfganiTerroristCamp))

Figure F.7: Complex Inferential Statement

Now let us make the life of Bill and Joe a bit more interesting (see Figure F.7). Let us suppose that right after the Meeting in New York, they Trained in a Terrorist Camp in Afghanistan under the direction of Bin Laden.

Again, calling in the domain knowledge, let us define Training as a Meeting where a Teacher transfers some knowledge to the Students. Furthermore, a Training Camp is a location where such meetings occur.

A set of inference rules can also be defined:

• If a person participated in training, afterwards he knows at least some of the skills that were taught there.

• If a person participated in training, he has met the teacher.

• Students and teachers share a special bond of respect and subordination.

Also, Training inherits all rules we have defined for a Meeting.

Thus, from knowing that Bill and Joe participated in the training events, we can infer that they have met Bin Laden, acquired some of his knowledge of terrorist operations, have some respect for him as a teacher, and that they were in Afghanistan at a certain time.

While these examples (both the facts and the rules) are pure conjecture, they illustrate the power that Social Structure Semantics will have in the capable hands of a domain expert. Essentially, it will allow a domain expert to quantify and regularize his knowledge of how subject organizations evolve and project it further in creating inferential simulations of specific data.
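Rule inheritance between edge types, as used by trainedIn in Figure F.7, can be sketched in Python as follows. The `Store` class and the `follow` rule factory are hypothetical helpers introduced only for this illustration; edge and node identifiers are copied from the figure, and timestamps are omitted for brevity.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (edge type, from node, to node)
Rule = Callable[["Store", Edge], None]


class Store:
    def __init__(self) -> None:
        self.edges: Set[Edge] = set()
        self.parent: Dict[str, Optional[str]] = {}
        self.rules: Dict[str, List[Rule]] = {}

    def defedge(self, name: str, parent: Optional[str] = None,
                rules: Tuple[Rule, ...] = ()) -> None:
        """Declare an edge type, optionally derived from a parent type."""
        self.parent[name] = parent
        self.rules[name] = list(rules)

    def targets(self, kind: str, src: str) -> List[str]:
        return sorted(d for k, s, d in self.edges if k == kind and s == src)

    def add(self, kind: str, src: str, dst: str) -> None:
        edge = (kind, src, dst)
        if edge in self.edges:
            return
        self.edges.add(edge)
        # Fire the rules of this edge type, then those of every ancestor type.
        t: Optional[str] = kind
        while t is not None:
            for rule in self.rules.get(t, []):
                rule(self, edge)
            t = self.parent.get(t)


def follow(new_kind: str, via: str) -> Rule:
    """Rule factory: link this.from to this.to.<via>.to with a new_kind edge."""
    def rule(store: Store, edge: Edge) -> None:
        _, src, dst = edge
        for target in store.targets(via, dst):
            store.add(new_kind, src, target)
    return rule


def figure_f7() -> Store:
    s = Store()
    s.defedge("participatedIn",
              rules=(follow("agentLocatedIn", "eventLocation"),))
    s.defedge("trainedIn", parent="participatedIn",
              rules=(follow("knows", "taughtBy"),
                     follow("hasSkill", "skillTaught")))
    camp = "trainingInAfganiTerroristCamp"
    s.add("eventLocation", camp, "terroristCamp")
    s.add("taughtBy", camp, "BinLaden")
    s.add("skillTaught", camp, "terroristOperations")
    for agent in ("Joe", "Bill"):
        s.add("trainedIn", agent, camp)
    return s
```

In this sketch each trainedIn edge yields a knows edge, a hasSkill edge and, through the rule inherited from participatedIn, an agentLocatedIn edge to terroristCamp. Concluding that the agents were in Afghanistan would additionally require a rule that follows isIn edges, which neither the figure's listing nor this sketch defines.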


Bibliography

[1] D. Alberts, J. Garstka, and F. Stein, Network centric warfare: Developingand leveraging information superiority, CCRP Publication Series, 1999.

[2] H. Aldrich, Organizations evolving, Sage Publications, London, 1999.

[3] P. Lawrence J. Lorsch, Differentiation and integration in complex organi-zations, Administrative Science Quarterly (1967), no. 12, 1–47.

[4] Kathleen M. Carley Matthew Dombroski Maksim Tsvetovat JeffreyReminga Natasha Kamneva, Destabilizing dynamic covert networks, Pro-ceedings of the 8th International Command and Control Research andTechnology Symposium (2003).

[5] H.H.Baligh R.M. Burton B. Obel, Devising expert systems in organizationtheory: The organizational consultant., Organization, management, andexpert systems, vol. Michael Masuch (Ed.), pp. 35–57, Walter De Gruyter,Berlin, Germany, 1990.

[6] Michael Ashworth, Identifying key contributors to performance in organi-zations: The case for knowledge-based measures, CASOS Working Paper,2003.

[7] Michael Ashworth and Kathleen M. Carley, Identifying critical human cap-ital in organizations, CASOS Working Paper, 2002.

[8] N. Berry, The international islamic terrorist network, CDI TerrorismProject (2001).

[9] D. Brass, Being in the right place: A structural analysis of individual in-fluence in an organization, Administrative Science Quarterly 26 (1984),331–348.

[10] Rodney Brooks, A layered intelligent control system for a mobile robot,Proceedings of the Third International Symposium of Robotics Research,MIT Press, october 1985, p. 8.

[11] R. Burt, Structural holes: The structure of competition, Harvard UniversityPress, Cambridge, MA, 1992.

69

Page 76: Abstract - Carnegie Mellon School of Computer Sciencemaksim/research/proposal2.pdf · 2004-07-15 · Robust, Scalable Object-Oriented Semantics for Reasoning and Simulating Social

Social Structure Semantics

[12] Kathleen M. Carley, On the evolution of social and organizational networks,Special Issue of Research in the Sociology of Organizations on Networks Inand Around Organizations 16 (1999), 3–30.

[13] , On the evolution of social and organizational networks., Researchin the Sociology of Organizations 16 (1999), no. special issue on NetworksIn and Around Organizations, 3–30.

[14] , Inhibiting adaptation, Proceedings of Command and Control Re-search and Technology Symposium (2002).

[15] , Smart agents and organizations of the future, The Handbook ofNew Media (Leah Lievrouw and Sonia Livingstone, eds.), Sage, ThousandOaks, CA, 2002, pp. 206–220.

[16] , Dynamic network analysis, ronald breiger kathleen m. carleyphilippa pattison, (eds.) ed., Dynamic Social Network Modeling and Anal-ysis: Workshop Summary and Papers, pp. 133–145, Comittee on HumanFactors National Research Council, 2003.

[17] Kathleen M. Carley, Ju-Sung Lee, and David Krackhardt, Destabilizingnetworks, Connections 24 (2001), no. 3, 31–34.

[18] Kathleen M. Carley and Jeff Reminga, Ora: Organization risk analyzer,Tech. Report Technical Report CMU-ISRI-04-101, Carnegie Mellon Uni-versity, School of Computer Science, Institute for Software Research Inter-national, 2004.

[19] Kathleen M. Carley and Yuquing Ren, Tradeoffs between performance andadaptability for c3i architectures, Proceedings of the 2000 InternationalSymposium on Command and Control Research and Technology (2001).

[20] Noshir S. Contractor and Peter R. Monge, Using multi-theoretical multi-level (mtml) models to study adversarial networks, Summary of the NRCworkshop on Social Network Modeling and Analysis (Ron Breiger andKathleen M. Carley, eds.), National Research Council, forthcoming.

[21] I2 Corporation, I2: Investigative analysis software,http://www.i2inc.com/.

[22] Kathleen M. Carley Craig Schreiber, Key personnel: Identification andassessment of turnover risk, 2004.

[23] Dan Brickley Eric Miller, Ralph Swick, Rdf: Resource description frame-work, http://www.w3.org/RDF/, 2004.

[24] Bonnie H. Erickson, Secret societies and social structure, Social Forces 60(1981), no. 1, 188–210.

[25] F.Lorrain and H.C. White, Structural equivalence of individuals in socialnetworks, Journal of Mathematical Sociology 1 (1971).

70

Page 77: Abstract - Carnegie Mellon School of Computer Sciencemaksim/research/proposal2.pdf · 2004-07-15 · Robust, Scalable Object-Oriented Semantics for Reasoning and Simulating Social

Social Structure Semantics

[26] L. C. Freeman, Centrality in social networks: Conceptual clarication, SocialNetworks 1 (1979), 215–239.

[27] Rebecca Goolsby, Combating terrorist networks: An evolutionary approach,Proceedings of the 8th International Command and Control Research andTechnology Symposium, Conference held at National Defence War CollegeWashington DC, Evidence Based Research Vienna VA, 2003.

[28] Mark Granovetter, Strength of weak ties, a network theory revisited, Soci-ological Theory 1 (1983), 201–233.

[29] International Crisis Group, Indonesia backgrounder: How the jemaah is-lamiyaa terrorist network operates, Asia Paper (2002), no. 43.

[30] Rohan Gunaratna, Inside al quaeda: Global network of terror, New YorkUniversity Press, New York, NY, 2002.

[31] S.G. Hart and L. Staveland, Development of nasa-tlx (task load index):Results of empirical and theoretical research, Human mental workload, P.A.Hancock and N. Meshkati (Eds.), Amsterdam: Elsevier, 1988, pp. 139–183.

[32] Eric Vitiello Jr Ian Davis, Relationship: A vocabulary for describing rela-tionships between people, 2004, http://purl.org/vocab/relationship.

[33] IntelCenter.com, Mapping al-qaeda v1.0, www.intelcenter.com.

[34] E.T.Lewis K.M.Carley J. Diesner, Using automated text analysis to studyself-presentation strategies, 2001.

[35] David Krackhardt, Assessing the political landscape: Structure, cogni-tion, and power in organizations., Administrative Science Quarterly (1990),no. 35, 342–369.

[36] , The ties that torture: Simmelian tie analysis in organizations.,Research in the Sociology of Organizations 16 (1999), 183–210.

[37] David Krackhardt and Kathleen M. Carley, A pcans model of structure inorganizations, Proceedings of the 1998 International Symposium on Com-mand and Control Research and Technology (1998), 113–119.

[38] Valdis E. Krebs, Mapping networks of terrorist cells, Connections 24(2001), no. 3, 43–52.

[39] Zhiang Lin and Kathleen M. Carley, Designing stress resistant organiza-tions: Computational theorizing and crisis applications, Kluwer, Boston,MA, 2003.

[40] L.P.Gerlach and V.H.Hine, People, power, change: Movements of socialtransformation, Bobbs-Merrill, Indianapolis, IN, 1970.

71

Page 78: Abstract - Carnegie Mellon School of Computer Sciencemaksim/research/proposal2.pdf · 2004-07-15 · Robust, Scalable Object-Oriented Semantics for Reasoning and Simulating Social

Social Structure Semantics

[41] K.M. Carley M. Tsvetovat, Bouncing back: Recovery mechanisms of covertnetworks, 2003.

[42] Paul Magill, The crisis of branding and the theory needed to solve it, Pro-ceedings of Symposium on The Coevolution of Technology-Business Inno-vations (2003).

[43] Robert S. Renfro, Modelling and analysis of social networks, Ph.D. thesis,Department of Air Force, Air Force Institute of Technology, 2003.

[44] D. Ronfeldt and J. Arquilla, Networks, netwars and the fight for the future,First Monday 6, no. 10.

[45] C. Ruby, The definition of terrorism, Analyses of Social Issues and PublicPolicy (2002), 9–14.

[46] M. Sageman, Understanding terror networks, University of PennsylvaniaPress, 2004.

[47] H. Simon, A behavioral model of rational choice, Quarterly Journal of Eco-nomics 69 (1955), 99–118.

[48] Kathleen M. Carley David Krackhardt Stephen Borgatti, On the robustnessof centrality measures under conditions of imperfect data.

[49] Sun Tzu and S.Griffith(translator), The art of war, Oxford UniversityPress, Oxford, 6th cent. B.C. (translation 1963).

[50] Daniel M. Wegner, Transactive memory: A contemporary analysis of thegroup mind, Theories of group behavior, edited by B. Mullen and G. R.Goethals (1987), 185–208.

72