[ieee 2005 9th ifip/ieee international symposium on integrated network management, 2005. im 2005. -...

14
Real-time Views of Network Traffic using Decentralized Management K.-S. Lim and R. Stadler Dept. of Microelectronics and Information Technology (IMIT) KTH Royal Institute of Technology Stockholm Sweden {koonseng,stadler}@imit.kth.se Abstract The ability to create views of a network on a fast time scale becomes increasingly important as the complexity and diversity of networks increase. These views, which combine information from many distributed points in the network, can provide an administrator with a better understanding of the interdependencies and interactions between network elements and traffic conditions. Applications that could benefit from being able to compute such “near” real-time views of the network range from performance monitoring to fault management. In this paper, we present the architecture of a distributed management infrastructure that enables such views to be computed. Based on our earlier work on decentralized management, our architecture takes a novel database approach that combines the expressive power of SQL with distributed algorithms. We describe the implementation of the system on platform of embedded Linux devices attached to a network of routers. We provide specific examples of how the system can be used as a powerful distributed real- time monitoring platform. Finally, we derive a performance model of the system and validate it with a set of experiments. Keywords Distributed Management, Real-Time Monitoring, SQL 1. Introduction As networks grow larger and more dynamic, the ability to create network views in “near” real-time becomes increasingly important. Such views, which combine information from many distributed points in the network, allow interdependencies and interactions between network elements and network traffic to be better understood and analyzed. They also permit remedial actions, such as repairs and reconfigurations, to be initiated earlier, possibly before a problem deteriorates. Each computed view represents a “snapshot” of the distributed state of the network and a sequence of views generated on a fast time scale can be used capture and study transient phenomena, such as a route flap or service degradation. To compute such views, measurements must be taken from various points in a network, and statistical methods applied to infer state abstractions (we use here the terms state abstraction and view as synonyms). Since the quality of the inference depends on the 0-7803-9087-3/05/$20.00 ©2005 IEEE

Upload: r

Post on 07-Mar-2017

214 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

Real-time Views of Network Traffic using Decentralized Management K.-S. Lim and R. Stadler Dept. of Microelectronics and Information Technology (IMIT) KTH Royal Institute of Technology Stockholm Sweden {koonseng,stadler}@imit.kth.se

Abstract

The ability to create views of a network on a fast time scale becomes increasingly important as the complexity and diversity of networks increase. These views, which combine information from many distributed points in the network, can provide an administrator with a better understanding of the interdependencies and interactions between network elements and traffic conditions. Applications that could benefit from being able to compute such “near” real-time views of the network range from performance monitoring to fault management. In this paper, we present the architecture of a distributed management infrastructure that enables such views to be computed. Based on our earlier work on decentralized management, our architecture takes a novel database approach that combines the expressive power of SQL with distributed algorithms. We describe the implementation of the system on platform of embedded Linux devices attached to a network of routers. We provide specific examples of how the system can be used as a powerful distributed real-time monitoring platform. Finally, we derive a performance model of the system and validate it with a set of experiments.

Keywords Distributed Management, Real-Time Monitoring, SQL

1. Introduction

As networks grow larger and more dynamic, the ability to create network views in “near” real-time becomes increasingly important. Such views, which combine information from many distributed points in the network, allow interdependencies and interactions between network elements and network traffic to be better understood and analyzed. They also permit remedial actions, such as repairs and reconfigurations, to be initiated earlier, possibly before a problem deteriorates. Each computed view represents a “snapshot” of the distributed state of the network and a sequence of views generated on a fast time scale can be used capture and study transient phenomena, such as a route flap or service degradation. To compute such views, measurements must be taken from various points in a network, and statistical methods applied to infer state abstractions (we use here the terms state abstraction and view as synonyms). Since the quality of the inference depends on the

0-7803-9087-3/05/$20.00 ©2005 IEEE

Page 2: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

number of measurements available, the ability to collect and process large amounts of data quickly and efficiently is the key to computing real-time views in the context of decentralized management. We envision a range of applications to take advantage of real-time management information. Not only network administrators, but also network mechanisms [1], such as routing protocols or schemes that reconfigure the system in response to changing load patterns or perceived threats, should be able to access this data. Such a requirement mandates a universal and flexible model of network information and an infrastructure that enables the rapid collection and processing of distributed information. In our research on designing decentralized management systems, we have developed a decentralized management paradigm called pattern-based management [9]. Based on software abstractions of distributed algorithms that can be programmed to perform specific management tasks, the approach elegantly allows the construction of distributed management applications while hiding much of the complexity. In [8] we describe a realization of the paradigm through the use of small Linux-based single board computers programmed to collect management and configuration information from routers in a distributed but coordinated manner. These devices, which we call Weaver Active Nodes (WANs), autonomously create a management overlay network at boot time which is subsequently used to control, coordinate and propagate distributed information among WANs to cooperatively perform a management task. In this paper, we describe a novel application of the platform called, the Weaver Query System (WQS) that allows a user to create global views of traffic flowing through the physical network in near real-time. Specifically, we make the following contributions. First, we show how complex traffic monitoring tasks can be formulated as simple Structured Query Language (SQL) [6] queries on a global database of network traffic data. Second, we describe an approach that allows this global database to be realized “virtually” over a collection of relational databases. Third, we present a general framework that allows SQL data aggregation and computation tasks to be implemented in a distributed manner. Lastly, we derive a performance model of the system and validate it empirically with experiments on our testbed. We begin in section 2 by describing the data model of the system. In section 3, we present an overview of the system architecture and detail the dynamics of the system. Section 4 provides a description of the syntax of the Weaver Query Language and its implementation on the platform. In Section 5, we derive a performance model of the system and describe experiments to validate it in section 6. Finally in section 7, we discuss pertinent issues with regards to the performance and behavior of the system and present a list of related work.

2. Using tables to define network views

In our system, each WAN is associated with a physical router from which it collects data about the network. This data is stored on a WAN in the form of tables. Each table contains records of identical structure, and contains data gathered by the WAN from its associated

Session Three Traffic Monitoring120

Page 3: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

router via an access protocol—SNMP or via a command line interface. Examples of the data collected by a WAN may be traffic statistics such as the utilization of a port on a router or configuration information about the router. All WANs share the same schema, meaning that the type and the structure of information they collect and store are identical. Each WAN also contains a Relational Database Management System (RDBMS) that is used to store and retrieve records from the tables. The WQS operates as a layer over this RDBMS, abstracting the distributed collection of tables in the network as a set of global virtual tables. The network administrator interacts with the system by formulating queries to extract or process information stored in this set of global virtual tables using a declarative query language called the Weaver Query Language (WQL).

Schema of Device table Field Name Description DeviceIp IP address of management interface

of the router NumInterfaces Number of physical interfaces on

the router Make Make of router Model Model number of router UpSince Time since router was last brought

online

Schema of Interface table Field Name Description InterfaceNum Number of the physical interface InterfaceAddress IP address of the interface InterfaceSubnet Subnet address of the interface InterfaceType Type of interface

(10/100/1000BT) InterfaceSpeed Line speed of the interface in bits

per second

Schema of System table

Field Name Description WANIp IP address of Weaver Active Node Memory Amount of RAM on node FreeDisk Amount of disk space available on

node UpSince Time since node was last brought

online

Schema of Flow table Field Name Description SrcIp IP address of source node DstIp IP address of destination node SrcPort Port on source node DstPort Port on destination node Application Type of application generating

the traffic ByteCount Number of bytes forwarded in

the current sampling interval PacketCount Number of packets forwarded in

the current sampling interval Protocol Protocol field value of packets Timestamp Time when record was received

from Router SamplingInterval Duration of sampling interval

Figure 1: Schema of local tables on WAN that are implemented on Weaver

Figure 1 shows the schema of four tables on a WAN in our current implementation. The System table contains a single record holding configuration information about the WAN. The Device table contains a single record holding configuration information of the router associated with the WAN. The Interface table contains a record for each interface on the router and the Flow table contains records of TCP/IP flows traversing through the router. In addition to these tables, MIB variables of the router are also accessible. This is achieved by emulating a local virtual table named “MIB” on each WAN which contains columns corresponding to every OID in the associated router’s MIB. For example, the column “MIB.1.3.6.1.2.1.1.5.0” refers to the value of the “sysName” MIB variable of

121Real-time Views of Network Traffic using Decentralized Management

Page 4: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

the router. Similarly, the column “MIB.1.3.6.1.2.1.1.5.0.SYNTAX” returns the scalar type of the “sysName” variable, which, in this case would be the string “OCTECT STRING”. Using the WQS, records from these local tables can filtered, combined or aggregated across the network to create global views of applications and services running on the network.

3. System Architecture

The WQS consists of a set of WANs which cooperatively create a virtual overlay network through which queries and results are transported. The query process begins when a network administrator sends a query to the management console (called the Weaver console), which in turn dispatches it to a WAN in the network called the start node where it is decomposed into a number of sub-queries. The WQS utilizes a navigation pattern known as the Echo pattern (see [5] and [9] for a more detailed discussion on navigation patterns) for coordinating the distribution of these sub-queries among WANs within the overlay network. Specifically, the two phases of the echo pattern – the expansion and the contraction phases are used to propagate sub-queries to nodes and to aggregate results from nodes respectively. Processing a sub-query on a WAN involves retrieving and returning the data stored in the local tables of the WAN or using the data to compute results. A query is completed when the last echo returns to the start node. At this point, the result of the query is sent back to the console which forwards it to the administrator for display. Our WQS testbed currently consists of 16 Cisco 2621 modular routers and 16 WANs interconnected via a Cisco Catalyst 2900 fast Ethernet switch. Each router is equipped with 2 fast Ethernet ports one of which is connected directly to a WAN. Static routes have been set up from each router to the fast Ethernet switch, so that all WANs are able to communicate with each other. Each WAN is an Intel Xscale based Linux server with 64MB of SDRAM and a 100Mbps Ethernet interface. Each WAN runs the MySQL 4.1[13] RDBMS as well as the Apache [7] HTTP server.

4. The Query Language WQL

Queries submitted to WQS are expressed as statements in WQL, which is based on SQL. WQL includes extensions to SQL, such as the definition of the start node, which relate to the distributed nature of the network database and the real-time quality of network information. The syntax of a WQL query is given as follows: SELECT <columns> FROM <tables> [ ON <startnode> [ FOR <hops> ]] [ WHERE <conditions> ] [ GROUP BY <groups> [ HAVING <having> ]] [ ORDER BY <ordering> ] [ LIMIT <limit> ]

Session Three Traffic Monitoring122

Page 5: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

Here, <columns>,<groups> and <ordering> refer to columns of the virtual global tables (see below) or to operations on these columns, <tables> refers to the names of virtual global tables, <startnode> refers to the IP address of the start node of the query, <hops> restricts the execution of the query to a specific distance from the start node, <conditions> and <having> refer to boolean expressions that filter the rows returned by the query and <limit> specifies the maximum number of rows returned. A WQL query is executed against a set of global virtual tables, which consist of all data records that make up the local tables stored on the WANs. For each type of local table, there is a corresponding virtual global table with the same structure. Consequently, our current implementation includes five global virtual tables; namely, the Device, Interface, System and Flow tables as well as the MIB table which contains every MIB variable in routers associated with WANs. WQS translates each global query into three SQL sub-queries, which are executed on the WANs against their local tables. This translation process is explained in section 4.1. In order to demonstrate the flexibility and expressive power of WQL, we present the following three example queries based on the schema described in Figure 1, together with possible returned results. Query A: List the heaviest flows currently in the network Query A lists addresses and ports of the top 3 IP flows in the network, ordered by their bit rate in the three minute interval between 2004-04-18 05:23:00 and 2004-04-18 05:26:00 UTC. SELECT MAX((ByteCount*8)/SamplingInterval) as BitRate, SrcIp, DstIp,

DstPort

FROM Flow

GROUP BY SrcIp, DstIp, DstPort

WHERE Timestamp <= “2004-04-18 05:26:00” and Timestamp >=

“2004-04-18 05:23:00”

ORDER BY BitRate DESCENDING

LIMIT 3

Result: BitRate SrcIp DstIp DstPort 1245232.34 192.168.1.45 192.168.2.27 1400 1212442.22 192.168.2.56 192.168.3.42 5000 1022451.78 192.168.3.17 192.168.51.24 138

Query B: Find the gateway with most FTP traffic currently Query B identifies the subnet that generates the highest volume of FTP traffic through a single gateway during over the last 5 seconds. The result of the query identifies that gateway and provides the volume of the subnet’s traffic through the gateway router. The function SUBNET() takes two arguments, the first an IP address and the second a subnet address, and returns true, if the given address is in the specified subnet. SELECT DeviceIp, InterfaceSubnet, SUM(ByteCount) as Volume

FROM Device, Interface, Flow

WHERE SUBNET(SrcIp, InterfaceSubnet) = true and Application = “FTP”

123Real-time Views of Network Traffic using Decentralized Management

Page 6: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

and Timestamp <= “2004-04-18 05:23:05” and Timestamp >=

“2004-04-18 05:23:00”

ORDER BY Volume

LIMIT 1

Result: DeviceIp InterfaceSubnet Volume 192.168.53.4 192.168.53.0 1452525

Query C: Identify all flows currently traversing two given routers Query C identifies all flows traversing the two routers 192.168.1.1 and 192.168.4.1 during the past 5 seconds. Here, the aggregate function SET_CONCAT()coalesces all records in a group into a single record by concatenating the value of each record and removing duplicates. The function STRSTR() takes two strings and returns true, if the second string is found within the first. Note that the column PathSet returned by the query provides an unordered set of the addresses of routers traversed by each flow. SELECT SrcIp, DstIp, SrcPort, DstPort, SET_CONCAT(DeviceIp)

as PathSet

FROM Flow, Device

WHERE Timestamp <= “2004-04-18 05:23:05” and Timestamp >=

“2004-04-18 05:23:00”

GROUP BY SrcIp, DstIp, SrcPort, DstPort

HAVING STRSTR(PathSet,”192.168.1.1”) and

STRSTR(PathSet, “192.168.4.1”)

Result: SrcIp DstIp SrcPort DstPort PathSet 192.168.1.24 192.168.4.47 21 1452 192.168.1.1

192.168.2.1 192.168.4.1

192.168.21.24 192.168.6.21 80 5523 192.168.21.1 192.168.1.1 192.168.2.1 192.168.6.1

4.1 Translating WQL queries to SQL sub-queries

This subsection describes how WQS processes a global query. We begin by first outlining the process and then elaborating it with an example. The process is tightly coupled with the state machine of the echo pattern and utilizes it to accomplish two key tasks – data transport and incremental aggregation. In the former, the explorer messages are used to propagate SQL sub-queries while the echo messages are used to carry back the results. In the latter, the echo pattern state machine is used to trigger incremental aggregation when the results are returned from a WAN’s neighbors. The process begins when a WQL query G is submitted to the start node defined in the query. The start node translates G into three SQL sub-queries S1, S2, and S3. S1 and S2 are propagated via explorer messages to all WANs in the system. On each WAN, including the start node, S1 is executed against the local database to yield a temporary local table named TEMP_TABLE. If a WAN is a leaf node in the execution tree, the records

Session Three Traffic Monitoring124

Page 7: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

in the WAN’s TEMP_TABLE is carried back to its parent via echo messages and TEMP_TABLE is deleted. Otherwise, the records carried in each echo message received from its neighbors are appended to its local TEMP_TABLE. When the last echo message is received on the WAN, S2 is executed against its TEMP_TABLE and the result returned via an echo message to the WAN’s parent. Its local TEMP_TABLE is then deleted. Finally, when all echoes have returned to the start node, S3 is executed against the node’s TEMP_TABLE, producing the results for the global query G. The mapping from global query G to sub-queries S1, S2, and S3 is explained choosing query C for G: G: SELECT SrcIp, DstIp, SrcPort, DstPort, SET_CONCAT(DeviceIp)

as PathSet

FROM Flow, Device WHERE Timestamp <= “2004-04-18 05:23:05” and

Timestamp >= “2004-04-18 05:23:00”

GROUP BY SrcIp, DstIp, SrcPort, DstPort

HAVING STRSTR(PathSet,”192.168.1.1”) and

STRSTR(PathSet, “192.168.4.1”)

G is mapped into S1 as follows: G is pre-pended with a statement that creates a temporary table TEMP_TABLE and the “HAVING” clause in G is deleted. The “HAVING” clause in G is deleted because group-level filtering, which is specified by the clause, can only be performed at the start node after all results have returned. S1: CREATE TEMP_TABLE SELECT SrcIp, DstIp, SrcPort, DstPort,

SET_CONCAT(DeviceIp) as PathSet

FROM Flow, Device

WHERE Timestamp <= “2004-04-18 05:23:05” and

Timestamp >= “2004-04-18 05:23:00”

GROUP BY SrcIp, DstIp, SrcPort, DstPort

G is mapped into S2 as follows: The tables specified in the “FROM” clause are replaced with TEMP_TABLE, the “WHERE” and “HAVING” clauses in G are deleted and all aggregate functions (i.e. SET_CONCAT()in this case) in the “SELECT” clause are re-applied to their corresponding columns (i.e. PathSet in this case), in TEMP_TABLE. The “WHERE” clause is dropped because record filtering has already been performed by sub-query S1 while the “HAVING” clause is omitted for the same reason as in sub-query S1. S2: SELECT SrcIp, DstIp, SrcPort, DstPort, SET_CONCAT(PathSet)

FROM TEMP_TABLE

GROUP BY SrcIp, DstIp, SrcPort, DstPort

G is mapped into S3 as follows: The tables specified in the “FROM” clause are replaced with TEMP_TABLE, the “WHERE” clause is deleted and all aggregate functions specified in the “SELECT” clause are replaced by column names. The replacement of aggregate functions with column names is necessary because aggregation has been completed by S2. The primary purpose of S3 is to filter out groups of records which do

125Real-time Views of Network Traffic using Decentralized Management

Page 8: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

not satisfy the conditions specified in the “HAVING” clause in G. S3: SELECT SrcIp, DstIp, SrcPort, DstPort, PathSet

FROM TEMP_TABLE

GROUP BY SrcIp, DstIp, SrcPort, DstPort

HAVING STRSTR(PathSet,”192.168.1.54”) and

STRSTR(PathSet, “192.168.4.24”)

Note that S3 is not executed on the start node if G does not include the “HAVING” clause, since its purpose is to eliminate extraneous groups.

4.2 Aggregating results in a distributed manner

As seen previously, WQS distributes sub-queries and aggregates results in an asynchronous manner controlled by the state machine of the echo pattern. In general, the order in which sub-query S2 is executed among WANs is not deterministic. However, operations that are both commutative and associative do not depend on the ordering of their operands. WQS exploits this property by allowing these operations to be computed incrementally on each WAN driven by the state machine of the echo pattern. The reader is referred to [10] where a formal proof of the validity of this approach is established. As an example, the SET_CONCAT() function used to illustrate sub-query S2 in the previous section, is a commutative and associative string aggregate function that can be computed incrementally at each WAN. In this case, an echo carrying a list of IP addresses from a WAN’s neighbors is simply concatenated with the list from the previous echo. A number SQL numerical aggregate functions, such as MIN(), MAX(), SUM() have also these properties. However, other functions, such as the SQL AVG() aggregate function, do not. In such cases, distinct “distributed” versions of these functions have will have to be implemented that support the aggregation of partial results in an incremental distributed manner. Fortunately, most commercial RDBMS support the development of these user defined aggregate functions. For example, in our implementation which uses MySQL 4.1, only three aggregate functions (AVG(), STDDEV() and VARIANCE()) had to be re-implemented to support distributed WQL queries.

5. Estimating the execution times of WQL queries

The performance of the WQS is contingent upon on the performance of its different system components. As described above, WQS relies on the echo pattern to distribute sub queries and to transport partial results between WANs. The performance profile of the echo pattern thus affects in a significant way the performance metrics of a WQL query, such as its completion time. Furthermore, since the translated SQL sub queries are executed by the relational RDBMS on each WAN, the performance of these local queries has a direct impact on the performance of the global query. We have used two performance measures to characterize the echo pattern [9] – time

Session Three Traffic Monitoring126

Page 9: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

complexity, i.e., execution time, and message complexity, i.e., the number of messages transported. An asymptotic analysis of both measures was presented in [9]. Subsequently, in [8] we presented a performance model for the execution time of the echo pattern on the Weaver platform. Since the message complexity is the same for all WQL queries (equal to twice the number of links in the management overlay network), it is of less interest as performance measure for a given overlay topology. Therefore we focus in this paper on modeling the completion time of a WQL query. This is an important measure, since it determines the responsiveness of the system and its suitability for real-time monitoring tasks. As was established in our earlier work, the execution of the echo pattern on the management overlay network creates a spanning tree over which results of the query are returned. The completion time of such a system, thus depends on the height of the tree and the execution time of queries on each WAN. This latter quantity in turn, is dependent on the execution time of the local SQL sub queries and the transmission time to transport results between different levels of the tree. In general, the completion time for a local SQL query on a WAN depends on the way in which the query is processed, the amount of data that is processed in order complete the query and the performance of the local RDBMS. Consider a management overlay network of n identical WANs connected with via identical links. Let ic , ni …1= denote the connectivity of the node i, d denote the length of the longest path between the start node and every other node. Furthermore, let

iT1 and iT2 denote the time taken for a node i to perform and execute the sub-queries S1 and S2. Let qt denote the time required to transmit the WQL query to an adjacent

node and rit denote the time required to transmit the results back to the parent of node i. Finally, let nt denote the time required for a message to transit between two adjacent nodes in the network. We assume that the topology of the management overlay is stable and WANs do not fail in the midst of a distributed query. Let c denote the average value of ic along d. Then if we represent the average

values of iT1 , iT2 and rit by 1T , 2T and rt respectively, the average completion time of a query on the network is given by,

( ) ( )12 21 −++++= dTttTtcdC nrqtime (1)

Consider iT1 , the time required to complete the execution of the sub-query S1 on node i. This time is proportional to the number of disk accesses required to process the query, the average seek time of the disk and processing power of the WAN. The first quantity, in turn, is a function of the physical organization of the database as well as the number of records accessed in the process of executing the query. Hence,

ii lsT α=1 (2) where is denote the number of records accessed on node i in processing sub-query S1, l is the length of a record in bytes and α is some constant. If the number of records returned by node i in processing sub-query S1 and S2 is denoted by iR1 and iR2

127Real-time Views of Network Traffic using Decentralized Management

Page 10: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

respectively, then

( )

+= ∑

=ijparentjii RRlT 212 β (3)

where β is a constant and the summation jR2 is over all nodes, j, which are

children of node i. Since the operations performed in sub-query S2 is identical to that in S1, αβ = . Also, since rit , the amount of time required to transmit the result of a query is proportional to the number of bytes returned by the query,

iri lRt 2γ= (4)

whereγ is some constant. Note that isR ii ∀≤ ,1 and hence( )∑

=

+≤ijparent

jii RRR 212 .

Therefore, if we denote the average values of is , iR1 and iR2 by s , 1R and 2R respectively,

( ) 22121 ,, RltRRlTslT r γαα =+== (5) Queries that include a LIMIT clause place an upper bound on iR1 and iR2 . Thus, if U is the maximum number of rows restricted by the LIMIT clause, URUR ii ≤≤ 21 , and,

( ) lUtlUnUnUnlT r γαα ≤=+≤ ,22 (6)

In summary, for a query with a limit of U rows, ( ) ( )122 −++++≤ dlUtlUsltcdC nqtime αγα (7)

The following are several implications of equation (7). First, the completion time of a global query is independent of the number of nodes in the system but is instead dependent upon, d, the longest path from the start node to every other node (for any topology, d is bounded by the diameter of the network). We can thus expect WQS to perform well in large networks in contrast to a centralized system, where execution times typically scale proportional to the network size. Consequently, this means than a more highly connected network will yield short query completion times. However, high connectivity alone is not the answer, because as we established in [9], the message complexity of the system also increases proportionally with connectivity. In general therefore, a topology where the degree of connectivity of most nodes are low, with the exception for a small handful whose high connectivity provide a short path between any two nodes, yields short execution times with only modest message complexity. The constants α and γ are dependent upon the CPU performance and network interface speeds of a WAN respectively. In fact, for a given CPU, α/1 denotes the number of bytes of data that can be processed from the local RDBMS per second while

γ/1 denotes the average throughput of a WAN when transmitting query results. A ratio of αγ / can thus be used to compare the relative contribution of these terms to the completion time of query. For example, if 1/ <αγ then it follows that the CPU is a bottleneck and a faster processor would speed up query execution. Conversely, if

Session Three Traffic Monitoring128

Page 11: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

1/ >αγ then a faster interface would improve system performance. In our prototype, the ratio a/γ is about 0.3 (see Section 6), indicating that the processors on the WANs are the primary bottleneck of the system. If one is only interested in using WQL to determine the supremum of a performance measure (e.g. the heaviest flow in the system), equation (7) becomes an equality with U=1. In our prototype, s is typically 1000 records, d is in the order to tens of nodes, while qt and nt are on the average, approximately 50 and 400 microseconds

respectively. Thus, for supermum queries that return a single record (U=1), the most significant term in determining the completion time of a query is given by the product sl ⋅⋅α .

6. Validating the performance model on Weaver

A series of experiments were conducted on the WQS testbed to validate the performance model presented. First, the assumptions pertaining to the linearity of sub-query execution times (as modeled by equation (2)) and transmission times (as modeled by equation (4)) were tested through two sets of experiments. Subsequently, the values of α and γ derived from these two sets of experiments were used to predict the total execution time of a global query. Finally, two further sets of experiments were performed to measure this execution time experimentally and to determine the quality of fit of the model. We detail the experiments in the following paragraphs. Note that the aim of these experiments is to verify the accuracy of the performance model, i.e., whether the model captures all significant sources of time delays. Our intention with this model is not to predict the performance of WQS in a real network environment, which is the subject of future work. Since the model does not take into account interferences, such as cross traffic and other concurrent queries, our experiments do not include such effects. In the first set of experiments, the average completion time of sub-query S1 on a WAN was measured. Each trial involved loading the database on the WAN with a varying number of records (from 100 to 500) and then measuring execution time of query C. Figure 2 (a) plots the results of the experiment. Note that each point of the plot represents the average value computed from 100 repeated trials. Each record in the database had a length of 212 bytes. Noting the linearity of the plot agrees well with equation (2), we determine the value of α to be 3.812x10-7. In the second series of experiments, code on a WAN was instrumented to measure the time required to transmit query results and a series of WQL queries requesting 100 to 500 records of 212 bytes each was issued to the WAN. Each experiment was repeated 100 times and the average value compared against that predicted by equation (4). The value of γ was then determined to be 1.274x10-7. In Figure 2 (b), we present a plot that compares the completion time of a global WQL as predicted by equation of (7) with that obtained from two sets of experiments. In each set of experiments, the WANs were connected together in the topology of a chain

129Real-time Views of Network Traffic using Decentralized Management

Page 12: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

and WQL queries issued from the node at one end of the chain. Since each WAN has only two neighbors, the value of d is given by the length of the chain which also corresponds to the diameter of the network. In both sets of experiments, the queries were chosen to yield an upper bound on completion times by accessing and returning all records on each WAN. For the first set of experiments, each WAN was loaded with 50 records and 100 repetitions of the query “SELECT * FROM Flow LIMIT 50” were executed (i.e.

50== Us ). The average completion time of these repetitions was noted before an additional WAN was added to the topology and the experiment repeated. The topology of the system was varied from two nodes to 15 nodes (d = 1 to 14). In the second set of experiments, each WAN was loaded with 250 records and 100 repetitions of the query “SELECT * FROM Flow LIMIT 250” were executed (i.e. 250== Us ). As before, the average completion time of these repetitions was noted before an additional WAN was added to the topology and the experiment repeated. The solid lines represented completion time of the query as calculated from equation (7) for s=50 and s=250, while the dotted lines represent measurements of mean completion times from the two sets of experiments. As can be seen from the plot, the performance model yields a close fit with the experimental results (an error of at most 30 ms is noted in the worst case).

Figure 2: Comparison of measurements with equations (4) and (7)

7. Discussion and Related Work

Compared to a centralized or strictly hierarchical architecture, as found in traditional management approaches, WQS offers potentially significant benefits. First, the capability to increase the capacity of the system by adding more WANs leads to an adaptable and scalable monitoring system. Since aggregation is performed by WANs in the overlay, management stations will be less loaded than in a centralized system. In

(a)Average completion time of sub-query S1,

0

0.01

0.02

0.03

0.04

0.05

100 150 200 250 300 350 400 450 500Number of records accessed, s

Tim

e in

sec

onds

1T (b)Measured vs Predicted Completion Times, C time

0

0.2

0.4

0.6

0.8

1

1.2

0 2 4 6 8 10 12 14 16

Diameter of network, d

Tim

e in

sec

onds

s=250 Predicted s=250 Measured

s=50 Predicted s=50 Measured

Session Three Traffic Monitoring130

Page 13: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

large networks, we expect fast execution times for queries, since our system scales linearly with the network diameter, as opposed to a centralized system, where execution times typically scale proportional to the network size. Also, the decentralized manner in which aggregation is performed allows for a balanced load across links and WANs. Finally, our architecture can lead to a more robust system, since every WAN performs identical functions and, consequently, there is no single point of failure. Regarding execution of the sub-queries S1 and S2 of a global query G, different schemes than the one given in section 4.1 are possible. For example, instead of waiting for all the data from child nodes to be available before executing sub-query S2 once, S2 could be executed multiple-times, once for each child. Although this scheme would, in principle, require the same amount of execution time as the one presented in Section 4.1, in reality, the cost of executing multiple SQL queries is likely to impose a higher overhead. This is, in part, due to the need for the local RDBMS to parse, process and optimize the query on each invocation. In most situations, when specifying a query, one is only interested in retrieving a limited ordered set of records (e.g. the top 10 heaviest flows). The “LIMIT” clause used in such queries allows non-qualifying records to be filtered at each stage of aggregation. This places an upper limit on the execution time of a query regardless of the total number of records in the system. In contrast, queries that include a “HAVING” clause require records from every group to be transported to the start node before non-qualifying groups can be filtered out (by sub-query S3). This can impose a significantly large overhead on the system bounded only by the total number of records in the system. The upside is an increased expressiveness in query specification brought about by being able to filter out groups of records. In our implementation for example, WANs are programmed to purge old records periodically so that the total number of records in the system remains bounded. Also timeouts are allowed to be specified in queries so that queries with unacceptably long execution times can be terminated prematurely. One limitation of the system concerns the semantics of WQL – “join” operations can only be performed on tables on the same WAN. This was imposed in our design because joining tables on different WANs can potentially be very inefficient; requiring entire tables to be transported between WANs. The syntax of WQL reflects this limitation by not allowing individual nodes to be explicitly addressed. Although a number of researchers have tackled the issues of scalability and the use of database query languages to extract network states, to our knowledge, none have proposed a coherent approach that combines the benefits of scalability, query expressiveness and real-time performance. In [14], the authors focus on the design and evaluation of an efficient query layer for simple SQL queries to process information in sensor networks. The central contribution of the work revolves around the development of a distributed query plan designed to satisfy the constraints imposed by ad-hoc sensor networks. In [11], the authors propose the use of a SQL-like query language to express active network programs that collect and process network traffic. The issue of service management scalability was addressed in [2] where an object-oriented architecture was proposed for developing web-based management services. In [4], a management framework for mobile ad-hoc networks was proposed that utilizes policy-based

131Real-time Views of Network Traffic using Decentralized Management

Page 14: [IEEE 2005 9th IFIP/IEEE International Symposium on Integrated Network Management, 2005. IM 2005. - Nice, France (15-19 May 2005)] 2005 9th IFIP/IEEE International Symposium on Integrated

specification for the collection and computation of management information. Although the system utilizes a local database at each monitoring point to store monitored data, the system does not support incremental aggregation in a distributed manner to achieve near real-time performance. Finally in [12] the authors propose a peer-to-peer protocol that incorporates elements of an SQL-like language for implementing resource location services, sensor networks and application-level routing.

ACKNOWLEDGMENT

This work has been supported by VINNOVA, the Swedish Agency for Innovation Systems, under project number 20068-1. Part of this work has been performed in the context of the Ambient Networks, an Information Society Technologies (IST) 507134 project initiated by the European Commission.

References [1] C. Adam and R. Stadler, "Patterns for Routing and Self-Stabilization", 9th

IEEE/IFIP NOMS, Seoul, Korea, Apr. 2004. [2] N. Anerousis, “An Architecture for Building Scalable, Web-based Management

Services”, JNSM, vol. 7, no. 1, 1999. [3] J. Case, M. Fedor, M. Schoffstall and J. Davin, “A Simple Network Management

Protocol (SNMP)”, RFC1157, May 1990. [4] R. Chadha, H. Cheng, Y.-H. Cheng, J. Chiang, G. Levin and H. Tanna, “Policy-

Based Mobile Ad Hoc Network Management”, 5th IEEE Intern. Workshop on Policies for Dist. Syst. and Networks. (POLICY’04), Yorktown Heights, New York, June 2004.

[5] E. J. H. Chang, “Echo Algorithms: Depth Parallel Operations on General Graphs,” IEEE Trans. on Softw. Engr., vol. 8, no. 4, July 1982, pp. 391-401.

[6] C. J. Date, “An Introduction to Database Systems”, 8th Edition, Pearson Education, July 2003.

[7] B. Laurie and P. Laurie, “Apache: The Definitive Guide”, 3rd Edition, O'Reilly; December 2002.

[8] K.S. Lim, R. Stadler: “Weaver—Realizing a scalable management paradigm on commodity routers,” 8th IFIP/IEEE Inter. Symp. on Integr. Net. Mgmt. (IM’03), Colorado Springs, Colorado, USA, March 2003.

[9] K.S. Lim and R. Stadler, “A Navigation Pattern for Scalable Internet Management”, 7th IFIP/IEEE IM’01, Seattle, USA, May 2001, pp. 405-420.

[10] K.S. Lim and R. Stadler, “Real-Time Views of Network Traffic Using Decentralized Management”, KTH/IMIT/LCN Technical Report, Dec 2004.

[11] C. M. Rogers, “ANQL - An Active Networks Query Language”, IWAN 2002, Zurich, Switzerland, December 2002, pp. 99-110.

[12] R. Van Renesse, K. P. Birman, and W. Vogels, “Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining”, ACM Trans. on Comp. Syst., vol. 21, no. 2, May 2003.

[13] M. Widenius and D. Axmark, “MySQL Reference Manual”, O’ Reilly, June 2002. [14] Y. Yao and J. Gehrke, “The Cougar approach to in-network query processing in

sensor networks”, SIGMOD, 2002.

Session Three Traffic Monitoring132