
CERIAS Tech Report 2005-112

MONITORING AND CONTROLLING QOS NETWORK DOMAINS

by Ahsan Habib, Sonia Fahmy, and Bharat Bhargava

Center for Education and Research in Information Assurance and Security,

Purdue University, West Lafayette, IN 47907-2086


Copyright © 2004 John Wiley & Sons, Ltd.

Monitoring and controlling QoS network domains

By Ahsan Habib†, Sonia Fahmy*,† and Bharat Bhargava

Increased performance, fairness, and security remain important goals for service providers. In this work, we design an integrated distributed monitoring, traffic conditioning, and flow control system for higher performance and security of network domains. Edge routers monitor (using tomography techniques) a network domain to detect quality of service (QoS) violations—possibly caused by underprovisioning—as well as bandwidth theft attacks. To bound the monitoring overhead, a router only verifies service level agreement (SLA) parameters such as delay, loss, and throughput when anomalies are detected. The marking component of the edge router uses TCP flow characteristics to protect ‘fragile’ flows. Edge routers may also regulate unresponsive flows, and may propagate congestion information to upstream domains. Simulation results indicate that this design increases application-level throughput of data applications such as large FTP transfers; achieves low packet delays and response times for Telnet and WWW traffic; and detects bandwidth theft attacks and service violations. Copyright © 2004 John Wiley & Sons, Ltd.

Ahsan Habib is a Postdoctoral Researcher in the School of Information Management and Systems (SIMS), University of California at Berkeley. He works with Prof. John Chuang in the areas of Peer-to-Peer Networking, Network Economics, Multihoming, and Next Generation Network Architecture. His research interests also include Network Security, Network Monitoring, Quality of Service, and Distributed Systems. He received his Ph.D. from the Department of Computer Sciences at Purdue University in August 2003. He received an M.S. from the Department of Computer Science at Virginia Tech in 1999, and a B.Sc. from the Department of Computer Science and Engineering at Bangladesh University of Engineering and Technology (BUET) in 1996.

Sonia Fahmy received her PhD degree at the Ohio State University in August 1999. Since then, she has been an assistant professor in the Computer Sciences department at Purdue University. Her research interests are in the design and evaluation of network architectures and protocols. She is currently investigating Internet tomography, overlay networks, network security, and wireless sensor networks. Her work is published in over 50 papers, including publications in IEEE/ACM Transactions on Networking, Computer Networks, IEEE INFOCOM, IEEE ICNP, and ACM NOSSDAV. She received the National Science Foundation CAREER award in 2003, the Schlumberger foundation technical merit award in 2000 and 2001, and the OSU presidential fellowship for dissertation research in 1998. She is a member of the ACM, IEEE, Phi Kappa Phi, Sigma Xi, and Upsilon Pi Epsilon. Please see http://www.cs.purdue.edu/homes/fahmy/ for more information.

Bharat Bhargava’s research involves both theoretical and experimental studies in distributed systems. He has conducted experiments in large-scale distributed systems, communications, authentication, key management, fault-tolerance, and Quality of Service. He was the chairman of the IEEE Symposium on Reliable and Distributed Systems held at Purdue in October 1998. He is on the editorial board of three international journals. At the 1988 IEEE Data Engineering Conference, he and John Riedl received the best paper award for their work on ‘A Model for Adaptable Systems for Transaction Processing.’ He is a fellow of the Institute of Electrical and Electronics Engineers and the Institute of Electronics and Telecommunication Engineers. He has been awarded the charter Gold Core Member distinction by the IEEE Computer Society for his distinguished service. He received Outstanding Instructor Awards from the Purdue chapter of the ACM in 1996 and 1998. In 1999, he received the IEEE Technical Achievement award for the major impact of his decade-long contributions to the foundations of adaptability in communication and distributed systems. His students have received best paper awards at international conferences and have started a Nasdaq-listed company.

*Correspondence to: Sonia Fahmy, Department of Computer Sciences, Purdue University, West Lafayette, IN 47907-2066, USA
†E-mail: [email protected]

INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT
Int. J. Network Mgmt 2005; 15: 11–29
Published online 18 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nem.541


1. Introduction

The success of the Internet has increased its vulnerability to misuse and performance problems. Internet service providers are now faced with the challenging task of continuously monitoring their networks to ensure that security is maintained and customers are obtaining their agreed-upon service. Service providers must also identify when their networks need to be re-configured or re-provisioned, and must obtain the best performance they can out of these networks. Based upon these requirements, our primary objective in this work is twofold: (i) achieving higher user-perceivable quality of service (QoS) and overall resource utilization in a network domain, and (ii) flagging under-provisioning problems and network misuse. Towards this objective, we design edge routers that combine (a) low-overhead monitoring, together with (b) traffic conditioning at network domain edges, and (c) unresponsive flow control, in order to mitigate misuse, congestion, and unfairness problems in Internet domains. Monitoring network activity has the additional benefit of detecting denial of service (DoS) and bandwidth theft attacks, which have become an expensive problem in today’s Internet.


Our integrated monitoring, conditioning and control techniques will be illustrated on the differentiated services (diff-serv) architecture.1 Diff-serv is a simple approach to enhance quality of service (QoS) for data and multimedia applications in the Internet. In diff-serv, complexity is pushed to the boundary routers of a network domain to keep core routers simple. The edge routers at the boundary of an administrative domain shape, mark, and drop traffic if necessary. These operations are based on Service Level Agreements (SLAs) between adjacent domains. The SLA defines what levels of service a user can expect from a service provider. The SLA clarifies the goals of both parties, and ensures that both of them abide by the agreement.

Our design comprises three primary components. The monitoring component infers possible attacks and SLA violations. The traffic conditioning component marks TCP traffic, using basic knowledge of TCP operation. The unresponsive flow control component regulates traffic entering a domain, and conveys congestion information to upstream domains. These three edge router components, and the flow of data and control among them, are depicted in Figure 1. Our focus when designing each component will be on scalability and low overhead. We now outline the operation of each component.

—SLA Monitoring Component—

QoS-enabled networks (e.g., differentiated services networks) are vulnerable to different types of attacks from traditional IP network domains. For example, users may inject or re-mark traffic with high QoS requirements, which may cause other users to have lower throughput, or higher delay and packet loss. We define an attack to be an incident in which a user violates his/her SLA or re-marks packets to steal bandwidth. We need to flag such SLA violations or bandwidth theft attacks. Towards this end, we will extend and exploit network tomography techniques that use end-to-end measurements to infer internal domain behavior. Although network tomography has witnessed a flurry of research activity in the past few years, these new tomography results have not been integrated with the more mature research on monitoring and control. In contrast, we will use network tomography techniques such as per-segment loss inference mechanisms2 to monitor and control network domains.

—Traffic Conditioning Component—

At the edge of a network domain, traffic conditioners can utilize knowledge of TCP characteristics to give priority to ‘critical’ packets, and mitigate TCP’s bias toward flows with short round trip times (RTTs). Such intelligent conditioning functions, however, have traditionally required maintaining per-flow state information. While edge routers between a stub domain and a transit domain do not typically handle very large numbers of flows, many edge routers, such as Internet Exchange points among peering domains,




are highly loaded. To address this problem, we will design a conditioner that only uses packet header information instead of stored state when possible, and employs replacement policies to control the amount of state maintained.

—Congestion Control Component—

Congestion collapse from undelivered packets is an important problem in the Internet.3 Congestion collapse occurs when upstream bandwidth is consumed by packets that are eventually dropped downstream. This can be caused by unresponsive flows that do not reduce their transmission rates in response to congestion. The congestion collapse problem can be mitigated using improved packet scheduling or active queue management,4,5 but such open loop techniques do not affect congestion caused by unresponsive flows in upstream domains. To address this important problem, we will design a mechanism that matches the rate at which packets enter a network domain to the rate at which packets leave the domain. Congestion is detected when many high priority packets are being dropped.6 Edge routers which detect or infer such drops can therefore regulate unresponsive flows. This results in better performance for users, and better overall resource utilization in the network domain.

We conduct a series of simulation experiments to study the behavior of all three components of this framework. Our simulation results show that TCP-aware edge router marking improves throughput of greedy applications like large FTP transfers, and achieves low packet delays and response times for Telnet and WWW traffic. We also demonstrate how attacks and unresponsive flows alter network delay and loss characteristics, and hence can be detected by our monitoring component and controlled by the congestion control component.

The remainder of this paper is organized as follows. Section 2 gives an overview of the differentiated services architecture—which we use as an underlying quality of service (QoS) framework—and discusses previous results related to the components of our proposed edge routers. Our network monitoring and loss inference techniques for attack detection are discussed in Section 3. Section 4 discusses the design of the TCP-aware traffic conditioning component. Section 5 explains how to detect and control unresponsive flows during congestion. Our simulation set-up for performance evaluation is described in Section 6. Section 7 discusses our simulation results, and conclusions are given in Section 8.

2. Background and Related Work

As previously mentioned, we use the diff-serv framework as the underlying QoS approach.



Figure 1. Monitoring, conditioning, and flow control components in an edge router


In diff-serv networks, traffic enters a domain at an ingress router and leaves a domain at an egress router. An ingress router is responsible for ensuring that the traffic entering the domain conforms to the SLA with the upstream domain. An egress router may perform traffic conditioning functions on traffic forwarded to a peering domain. In the core of the network, Per Hop Behaviors (PHBs) achieve service differentiation. The current diff-serv specification defines two PHB types: Expedited Forwarding7 and Assured Forwarding (AF).8

AF provides four classes (queues) of delivery with three levels of drop precedence (DP0, DP1, and DP2) per class. The Differentiated Services Code Point (DSCP), contained in the IP header DSFIELD/ToS field, is set to mark the drop precedence. When congestion occurs, packets marked with higher precedence (e.g., DP2) must be dropped first.
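The AF codepoints follow a simple arithmetic pattern. The helper below is our own illustration (not from the paper) of the standard RFC 2597 mapping from class and drop-precedence indices to a DSCP value; the paper's DP0/DP1/DP2 correspond to the low/medium/high drop precedences within a class:

```python
def af_dscp(af_class, drop_precedence):
    """DSCP value for AFxy (RFC 2597): class x in 1..4, drop precedence y in 1..3.
    E.g., AF11 -> 10, AF13 -> 14, AF41 -> 34."""
    assert 1 <= af_class <= 4 and 1 <= drop_precedence <= 3
    # Each class occupies a block of eight codepoints; the drop precedence
    # selects even offsets 2, 4, 6 within the block.
    return 8 * af_class + 2 * drop_precedence

# The low drop precedence of class 1 (DP0 in the text) is AF11 = 10;
# the high drop precedence of class 1 (DP2) is AF13 = 14.
af_dscp(1, 1), af_dscp(1, 3)
```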

Providing QoS in diff-serv networks has been extensively studied in the literature. Clark and Fang introduced RIO in 1998,9 and developed the Time Sliding Window (TSW) tagger. They show that sources with different target rates can achieve their targets using RIO even for different Round Trip Times (RTTs), whereas simple RED routers cannot. Assured Forwarding is studied by Ibanez and Nichols in Reference 10. They use a token bucket marker and show that target rates and TCP/UDP interaction are key factors in determining the throughput of flows. Seddigh, Nandy and Pieda11 also show that the distribution of excess bandwidth in an over-provisioned network is sensitive to UDP/TCP interactions. Lin, Zheng and Hou12 propose an enhanced TSW profiler, but their solution requires state information to be maintained at core routers.

In the next three subsections, we discuss work related to network monitoring and tomography, traffic conditioning, and congestion collapse solutions, which comprise the three components of our proposed design.

—2.1. Network Tomography and Violation Detection—

Since bottleneck bandwidth inference techniques such as packet pairs were proposed in the early 1990s, there has been increased interest in inference of internal network characteristics (e.g., per-segment delay, loss, bandwidth, and jitter) using correlations among end-to-end measurements. This problem is called network tomography. Recently, Duffield et al.2 have used unicast packet ‘stripes’ (back-to-back probe packets) to infer link-level loss by computing packet loss correlation for a stripe at different destinations. This work is an extension of loss inference with multicast traffic, e.g., References 13, 14. We develop a tomography-based, low-overhead method to infer delay, loss, and throughput and detect problems that alter the internal characteristics of a network domain. The main contribution of the monitoring component of our work is to utilize existing network tomography techniques in novel ways. We also extend these tomography techniques for QoS networks.

In addition to tomography work, a number of network monitoring techniques have been recently proposed in the literature. In efficient reactive monitoring,15 global polling is combined with local event driven reporting to monitor IP networks. Breitbart et al.16 use probing-based techniques where path latencies and bandwidth are measured by transmitting probes from a single point of control. They find the optimal number of probes using vertex cover solutions. Recent work on SLA validation17 uses a histogram aggregation algorithm to detect violations. The algorithm measures network characteristics like loss ratio and delay on a hop-by-hop basis and uses them to compute end-to-end measurements. These are then used in validating the end-to-end SLA requirements. All of these techniques involve core routers in their monitoring. In contrast, our work pushes monitoring responsibilities to the edge routers. We use an Exponential Weighted Moving Average (EWMA) for delay, and an average of several samples for loss as in RON,18 since this approach is both flexible and accurate.

—2.2. Traffic Conditioning—

Edge routers perform traffic conditioning and control functions. The edge router may alter the temporal characteristics of a stream to bring it into compliance with a traffic profile specified by the network administrator.1 A traffic meter measures and sorts packets into precedence levels. Marking, shaping, or dropping decisions are based upon the measurement result.




Marking—Markers can mark packets deterministically or probabilistically. A probabilistic packet marker, such as the Time Sliding Window marker,19 obtains the current flow rate, measuredRate, of a user from the meter. The marker tags each packet based on the targetRate from the SLA and the current flow rate. An incoming packet is marked as IN profile (low drop probability) if the corresponding flow has not reached the target rate; otherwise, the packet is marked with high drop precedence with probability p (and left IN profile with probability 1 - p), where p is given by equation (1):

p = (measuredRate - targetRate) / measuredRate    (1)
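A minimal sketch of such a probabilistic marking decision (function and variable names are ours, for illustration): marking an over-target packet OUT with probability p makes the IN-marked share of the traffic converge to the target rate on average.

```python
import random

def mark_packet(measured_rate, target_rate, rng=random.random):
    """Tag one packet 'IN' (low drop precedence) or 'OUT' (high drop
    precedence) from the flow's measured and target rates."""
    if measured_rate <= target_rate:
        return 'IN'  # flow is under its target: always protected
    # Equation (1): the fraction of the flow's traffic that exceeds target.
    p = (measured_rate - target_rate) / measured_rate
    # Mark OUT with probability p, so the expected IN-marked rate is
    # measured_rate * (1 - p) = target_rate.
    return 'OUT' if rng() < p else 'IN'
```

For example, a flow sending at twice its target has p = 0.5, so about half of its packets receive the high drop precedence.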

Shaping/dropping—Shaping reduces traffic variation and provides an upper bound for the rate at which the flow traffic is admitted into the network. A shaper usually has a finite-size buffer. Packets may be discarded if there is insufficient space to hold the delayed packets. Droppers drop some or all of the packets in a traffic stream in order to bring the stream into compliance with the traffic profile. This process is known as policing the stream.
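This shape-then-police behavior can be sketched as a token bucket with a finite delay buffer. The sketch below is our own illustration (names, units, and parameters are ours); for brevity it only classifies each arriving packet, omitting the loop that releases queued packets as tokens accrue:

```python
class TokenBucketShaper:
    """Admit traffic at no more than `rate` bytes/s; delay bursts in a
    finite buffer and drop (police) when the buffer overflows."""

    def __init__(self, rate, burst, buffer_size):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0
        self.queue, self.buffer_size = [], buffer_size

    def offer(self, now, size):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return 'send'
        if len(self.queue) < self.buffer_size:
            self.queue.append(size)  # shape: hold until tokens accumulate
            return 'delayed'
        return 'dropped'             # police: no room to delay the packet
```

With `rate=100`, `burst=200`, and a one-packet buffer, a 150-byte packet at t=0 is sent, a second packet is delayed, and a third is dropped.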

A number of recent studies explore the design of more sophisticated traffic conditioners. Fang et al.19 proposed the Time Sliding Window Three Color Marker (TSW3CM), which we use as a standard traffic conditioner. Adaptive packet marking20 uses a Packet Marking Engine (PME), which can be a passive observer under normal conditions, but becomes an active marker at the time of congestion. Yeom and Reddy21 also convey marking information to the sender, so that it can slow down its sending rate in the case of congestion. This requires modifying the host TCP implementation, which is difficult to deploy. Feroz et al.22 propose a TCP-Friendly marker. The marker protects small-window flows from packet loss by marking their traffic as IN profile. In this paper, we develop similar intelligent conditioning techniques with lower overhead.

An important problem with TCP is its bias toward connections with short round trip times. Nandy et al. design RTT-aware traffic conditioners23 which adjust packet marking based on RTTs, to mitigate TCP’s RTT bias. Their conditioner is based on the steady state TCP behavior reported by Mathis et al.24 Their model, however, does not consider time-outs, which we consider in this paper.

—2.3. Congestion Collapse—

As previously discussed, congestion collapse occurs when upstream bandwidth is consumed by packets that are eventually dropped downstream. A number of solutions have been proposed to mitigate this problem in Internet domains. Seddigh et al.25 propose separating TCP (responsive to congestion) and UDP (may be unresponsive) traffic to control congestion collapse caused by UDP. Albuquerque et al.26 propose a mechanism, Network Border Patrol, where border routers monitor all flows, measure ingress and egress rates, and exchange per-flow information with all edge routers periodically. The scheme is elegant, but its overhead is high. Chow et al.27 propose a similar framework, where edge routers periodically obtain information from core routers, and adjust conditioner parameters accordingly. Aggregate-based Congestion Control (ACC) detects and controls high bandwidth aggregate flows.28

In the Direct Congestion Control Scheme (DCCS),6 core routers track drops of packets with the lowest drop priority. Dropping packets with the lowest drop priority is an indication of severe congestion in the network. The core routers send the load information to the edge routers only during congestion. We employ a similar methodology for detecting congestion and controlling unresponsive flows. However, our proposed approach has lower storage overhead and includes a mechanism to avoid congestion collapse. Core router participation is optional.

In the next three sections, we discuss the threecomponents of our proposed edge router design.

3. Tomography-based Violation Detection Component

An attacker can impersonate a legitimate customer of a service provider by spoofing its identity. Network filtering29 can detect spoofing if the attacker and the impersonated customer are in different domains, but the attacks may proceed unnoticed otherwise. QoS domains support




low-priority classes, such as best effort, which are not controlled by edge routers. The service provider should ensure that high-priority customers are getting their agreed-upon service, so that the network can be re-configured or re-provisioned if needed, and that attackers who bypass or fool edge controls are stopped. In the case of distributed DoS attacks, flows from various ingress points are aggregated as they approach their victim. Monitoring can control such high bandwidth aggregates at the edges, and propagate attack information to upstream domains.30*

We employ network tomography—an approach to infer the internal behavior of a network purely based on end-to-end measurements. We use an edge-to-edge measurement-based loss inference technique to detect service violations and attacks in a QoS domain. The measurements do not involve any core router, in order to scale well. We measure SLA parameters such as delay, packet loss, and throughput to ensure that users are obtaining their agreed-upon service. Delay is defined as the edge-to-edge latency; packet loss is the ratio of total flow packets dropped in the domain† to the total packets of the same flow which entered the domain; and throughput is the total bandwidth consumed by a flow inside a domain. If a network domain is properly provisioned and no user is misbehaving, the flows traversing the domain should not experience excessive delay or loss. Although jitter (delay variation) is another important SLA parameter, it is flow-specific and therefore not suitable for use in network monitoring. In this section, we describe edge-to-edge inference of delay, loss and throughput, and a violation detection mechanism.

—3.1. Delay Measurements—

Delay bound guarantees made by a provider network to customer flows are for the delays experienced by the flows between the ingress and egress edges of the provider domain. For each packet traversing an ingress router, the ingress copies the packet IP header into a new packet with a certain pre-configured probability pprobe. The ingress encodes the current time into the payload and marks the protocol identifier field of the IP header with a new value. The egress router recognizes such packets and removes them from the network. Additionally, the egress router computes the packet delay for flow i by subtracting the ingress time from the egress time. (We assume NTP is used to synchronize the clocks.) The egress then sends the packet details and the measured delay to an entity we call the SLA monitor, which typically resides at an edge router. At the monitor, the packets are classified as belonging to customer j, and the average packet delay of the customer traffic is updated using an exponential weighted moving average (EWMA) (we use a current sample weight of 0.2). If this average packet delay exceeds the delay guarantee in the SLA, we conclude that this may be an indication of an SLA violation.
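The monitor-side bookkeeping above reduces to a few lines. This sketch (class and parameter names are ours) keeps the EWMA delay per customer with the 0.2 current-sample weight mentioned in the text and flags when the average exceeds the SLA delay bound:

```python
class DelayMonitor:
    """Track edge-to-edge delay for one customer with an EWMA and flag
    a possible SLA violation when the average exceeds the delay bound."""

    def __init__(self, sla_delay_bound, weight=0.2):
        self.bound = sla_delay_bound  # delay guarantee from the SLA (seconds)
        self.w = weight               # weight of the current sample
        self.avg = None

    def update(self, ingress_ts, egress_ts):
        # Probe delay: egress timestamp minus the ingress timestamp that the
        # ingress router encoded in the probe payload (clocks NTP-synchronized).
        sample = egress_ts - ingress_ts
        self.avg = sample if self.avg is None else (
            self.w * sample + (1 - self.w) * self.avg)
        return self.avg > self.bound  # True -> possible SLA violation

m = DelayMonitor(sla_delay_bound=0.050)
m.update(0.000, 0.030)  # 30 ms sample: average 30 ms, under the bound
```

Because the EWMA smooths isolated spikes, a single slow probe does not raise an alarm; a sustained delay increase does.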

—3.2. Loss Inference—

Packet loss guarantees made by a provider network to a customer are for the packet losses experienced by its conforming traffic inside the provider domain. Measuring loss by observing packet drops at all core routers and communicating them to the SLA monitor at the edge imposes significant overhead. We use packet stripes2 to infer link-level loss characteristics inside the domain. A series of probe packets with no delay between the transmission of successive packets, or what is known as a ‘stripe’, is periodically transmitted. For a two-leaf tree spanned by nodes 0, k, R1, R2, stripes are sent from the root 0 to the leaves to estimate the characteristics of three links (Figure 2). For example, the first two packets of a 3-packet stripe are sent to R2 and the last one to R1. If a packet reaches a receiver, we can deduce that the packet has reached the branch point k. By monitoring the packet arrivals at R1, R2 and both, we can write equations with three known quantities and estimate the three unknown quantities (loss rates of links 0 → k, k → R1 and k → R2) by applying conditional probability definitions, as discussed in Reference 2. We combine estimates of several



*As with any detection mechanism, the attackers can attack the mechanism itself, but the cost to attack our distributed monitoring mechanism is higher than the cost to inject or spoof traffic, or bypass a single edge router.
†A flow can be a micro-flow identified by source and destination addresses and ports and protocol identifier, or an aggregate of several micro-flows.


stripes to limit the effect of non-perfect correlation among the packets in a stripe. This inference technique extends to trees with more than 2 leaves and more than 2 levels.2
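Under the standard assumption that the two branch links lose packets independently once a probe reaches k, the three-known/three-unknown system for the two-leaf tree of Figure 2 has a closed-form solution. The sketch below is our simplified illustration of that estimator; it works from per-stripe arrival indicators and omits the stripe-level correlation machinery of Reference 2:

```python
def infer_link_losses(stripes):
    """stripes: list of (reached_R1, reached_R2) booleans, one per stripe.
    Returns estimated loss rates of links 0->k, k->R1, and k->R2."""
    n = len(stripes)
    p1 = sum(r1 for r1, _ in stripes) / n            # P(R1 got its probe)
    p2 = sum(r2 for _, r2 in stripes) / n            # P(R2 got its probe)
    p12 = sum(r1 and r2 for r1, r2 in stripes) / n   # P(both got probes)
    # With p1 = A_k*a1, p2 = A_k*a2, p12 = A_k*a1*a2 (independent branches),
    # the probability of reaching the branch point k is A_k = p1*p2/p12.
    a_k = p1 * p2 / p12
    return (1 - a_k,        # loss on 0 -> k
            1 - p1 / a_k,   # loss on k -> R1
            1 - p2 / a_k)   # loss on k -> R2
```

For instance, if every stripe reaches R2 but only half reach R1, the estimator attributes all of the loss to the k → R1 link.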

We extend this end-to-end unicast probing scheme to routers with multiple active queue management instances, e.g., 3-color RED,31 and develop heuristics for the probing frequency and the particular receivers to probe to ensure good domain coverage. Assured forwarding queues use three drop precedences, referred to as green, yellow, and red. Suppose Pred is the percentage of red packets accepted (not dropped) by the active queue. We define percentages for yellow and green traffic similarly, and show how these percentages are computed in the Appendix. Link loss can be inferred by subtracting the transmission probability from 1. If Lg, Ly, and Lr are the inferred losses of green, yellow and red traffic, respectively, loss can be expressed as:

Lclass = (ng Pgreen Lg + ny Pyellow Ly + nr Pred Lr) / (ng + ny + nr)    (2)

where ni is the number of samples taken from i-colored packets. However, when the loss of green traffic is zero, we take the average of the yellow and red losses. When the loss of yellow traffic is also zero, we report only the loss of red probes. We reduce the overhead of loss inference by probing only the domain links with high delay, as determined by the delay measurement procedure. We also measure throughput by probing egress routers only when delay and loss are excessive. This helps pinpoint the user or aggregate which is consuming excessive bandwidth, and causing other flows to receive lower quality than their SLAs.
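A direct transcription of this weighted loss computation, with the special cases from the text (variable names are ours; ni, the per-color acceptance percentages, and Li follow the definitions above):

```python
def class_loss(n_g, L_g, n_y, L_y, n_r, L_r, P_green, P_yellow, P_red):
    """Aggregate edge-to-edge loss for a traffic class across the three
    drop precedences, weighting each color's inferred loss."""
    if L_g == 0 and L_y == 0:
        return L_r                 # only red probes saw loss: report it alone
    if L_g == 0:
        return (L_y + L_r) / 2     # average the yellow and red losses
    num = n_g * P_green * L_g + n_y * P_yellow * L_y + n_r * P_red * L_r
    return num / (n_g + n_y + n_r)
```

With equal sample counts and acceptance percentages, the result reduces to the plain average of the three per-color losses, as expected.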

—3.3. Violation and Attack Detection—

When delay, loss, and bandwidth consumption exceed the pre-defined thresholds, the monitor concludes there may be an SLA violation or attack. Excessive delay is an indication of abnormal conditions inside the network domain. If there are losses for the premium traffic class, or if the loss ratios of assured forwarding traffic classes exceed certain levels, a possible SLA violation is flagged. The violation can be caused by aggressive or unresponsive flows, denial of service attacks, flash crowds, or network under-provisioning. To detect distributed DoS attacks, the set of links with high loss are identified. If high bandwidth aggregates traversing these high-loss links have the same destination IP prefix, there is either a DoS attack or a flash crowd, as discussed in Reference 28.
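These threshold checks can be summarized in a small decision routine. The sketch below is our own illustration: the SLA field names and the alarm labels are invented for clarity, but the rules match the text (any premium loss is a violation; AF classes have per-class loss bounds):

```python
def check_violation(avg_delay, loss_by_class, throughput, sla):
    """Return the list of SLA checks that failed (empty list = all clear)."""
    alarms = []
    if avg_delay > sla['delay_bound']:
        alarms.append('delay')
    if loss_by_class.get('premium', 0) > 0:
        alarms.append('premium_loss')      # any premium-class loss is flagged
    for cls, loss in loss_by_class.items():
        if cls != 'premium' and loss > sla['loss_bound'][cls]:
            alarms.append(f'loss:{cls}')   # AF class exceeded its loss bound
    if throughput > sla['rate']:
        alarms.append('bandwidth')         # possible bandwidth theft
    return alarms
```

Any non-empty result would trigger the follow-up steps in the text: inspecting high-loss links and, if an attack is confirmed, throttling the offending traffic at the ingress.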


Decisions are taken by consulting the destination entity. Jung et al. analyze the characteristics of flash crowds and DoS attacks in Reference 32. Their work reveals several distinguishing features between the two. For example, the client distribution of a flash crowd event follows the popularity distribution among ISPs and networks; this is not true for a DoS attack. Other distinguishing features are the per-client request rate, the overlap of clusters a site sees before and after the event, and the popularity distribution of the files accessed by the clients. Using these characteristics, the monitor can decide whether it is a DoS attack or a flash crowd. If it is determined to be an attack, the appropriate ingress routers are notified and the offending user traffic is throttled, as discussed in Section 5.
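A toy illustration of how those features might be combined (the thresholds and the majority-vote rule are our own inventions, not from Jung et al.; a real monitor would calibrate each feature against measured distributions):

```python
def classify_event(cluster_overlap, follows_popularity, per_client_rate,
                   rate_threshold=10.0):
    """Heuristic flash-crowd vs. DoS separation using the three features
    from the text: returning client clusters, popularity-shaped client
    distribution, and modest per-client request rates."""
    flash_signals = 0
    flash_signals += cluster_overlap > 0.5        # known clusters come back
    flash_signals += bool(follows_popularity)     # clients mirror ISP popularity
    flash_signals += per_client_rate < rate_threshold
    # Majority vote over the three indicators.
    return 'flash crowd' if flash_signals >= 2 else 'DoS attack'
```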

4. Traffic Conditioning Component

We incorporate several techniques in the traffic conditioning component to improve performance



Figure 2. Binary tree to infer per-segment loss from source 0 to receivers R1 and R2


of applications running on top of TCP. The key idea behind these techniques is to protect critical TCP packets from drops, without requiring large state tables to be maintained. We protect SYN packets (as indicated in the TCP header) by giving them low drop priority (DP0).‡ Since TCP grows the congestion window exponentially until it reaches the slow start threshold, ssthresh, and the congestion window is reduced to 1 or half of ssthresh for time-outs or packet loss, we also protect small window flows from packet losses by marking them with DP0, as proposed in Reference 22. Edge routers use sequence number information in packet headers in both directions to determine if the window is small. ECN-capable TCP may reduce its congestion window due to a time-out, triple duplicate ACKs, or in response to explicit congestion notification (ECN).33 In this case, TCP sets the CWR flag in the TCP header of the first data packet sent after the window reduction. Therefore, we give low drop priority to a packet if the CWR or ECN bit is set. This avoids consecutive ssthresh reductions that lead to poor performance with TCP Reno.34 We also mark packets inversely proportionally to the square of the flow requested rates if proportional sharing of excess bandwidth is required.23 The marker avoids marking high drop priority in bursts, in order to work well with TCP Reno, as proposed in Reference 22.
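The protection rules above can be sketched as a simple precedence selector (our own illustration; the function and argument names are invented, and the ECN bit in the TCP header is the ECE flag):

```python
def precedence_for(tcp_flags, window_small, under_target):
    """Choose a drop precedence for a TCP packet, following the rules in
    the text: SYN, CWR/ECE-flagged, and small-window packets get DP0."""
    if 'SYN' in tcp_flags:
        return 'DP0'   # protect connection setup
    if 'CWR' in tcp_flags or 'ECE' in tcp_flags:
        return 'DP0'   # window already reduced; avoid a second ssthresh cut
    if window_small:
        return 'DP0'   # flow in or near slow start is fragile
    # Otherwise fall back to profile-based marking: packets within the
    # target rate stay at DP0, the rest get a higher drop precedence.
    return 'DP0' if under_target else 'DP1'
```

Note the first three rules need only the packet header (and a small-window estimate from sequence numbers), so they require no per-flow state table.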

An optional feature to mitigate the TCP short-RTT bias is to mark based on RTT and RTO, if such information is available to the edge router. To understand how RTT and RTO information can be used to increase fairness, we review two TCP throughput models. Equation (3) shows that, in a simple TCP model that considers only duplicate ACKs,35 bandwidth is inversely proportional to RTT, where MSS is the maximum segment size and p is the packet loss probability:

$$BW \propto \frac{MSS}{RTT\,\sqrt{p}} \qquad (3)$$

An RTT-aware marking algorithm based on this model (proposed in Reference 23) works well for a small number of flows, because equation (3) accurately represents the fast retransmit and recovery behavior when p is small. We have observed that for a large number of flows, short-RTT flows time out because only long-RTT flows are protected by the conditioner after the target rates are satisfied. To mitigate this unfairness, we use the throughput approximation by Padhye et al.:35

$$BW \propto \frac{MSS}{RTT\sqrt{\dfrac{2bp}{3}} + T_o \min\!\left(1,\ 3\sqrt{\dfrac{3bp}{8}}\right) p\,(1 + 32p^2)} \qquad (4)$$

where b is the number of packets acknowledged by a received ACK, and To is the time-out length. Designing an RTT-aware traffic conditioner using equation (4) is more accurate than using equation (3) because it considers time-outs. Simplifying this equation, we compute the packet drop ratio between two flows, r, as:

$$r = \left(\frac{RTT_2}{RTT_1}\right)^{2} \times \frac{T_{o2}}{T_{o1}} \qquad (5)$$

where $RTT_i$ and $T_{oi}$ are the RTT and time-out, respectively, of flow i.36 If the measured rate is beyond the target rate of a flow, the marker marks the packets as DP0 with probability $(1 - p^2)$, where p is defined in equation (1). The unmarked packets are out-of-profile (DP1 and DP2) packets. These packets are directly related to the packet drop probabilities at the core: packet drops at the core are proportional to the out-of-profile marked packets. Equation (5) is used to mark out-of-profile packets as DP1 or DP2, where such packets will be dropped before DP0 packets during congestion. The RTT and RTO are estimated at the edge routers using an algorithm similar to that specified in Reference 23.

Each of the techniques discussed in this section has advantages and limitations. Protecting SYN, ECN, and CWR packets, and marking according to the target rate, do not need to store per-flow information and are simple to implement. On the other hand, protecting small-window flows and marking according to the RTT and RTO values require maintaining and processing per-flow information. To bound state overhead at the edge routers, we store per-flow information at the edge only for a certain number of flows, based on available memory. The edge router uses a least recently used (LRU) state replacement policy when the number of flows exceeds the maximum number that can be maintained. Therefore, for every flow, conditioning is based on state information if it is present. If no state is present, conditioning uses only techniques that rely on header information. The conditioner pseudo-code is given in Figure 3.

‡ With SYN DoS attacks, this may be unfavorable. Therefore, if a mechanism to detect SYN attacks is included in the router, this option can be turned off as soon as a SYN attack is detected.
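For illustration, equations (4) and (5) above can be coded directly; the following Python sketch (names, units, and defaults are ours, not the authors' implementation) computes the Padhye throughput approximation and the drop ratio r between two flows:

```python
import math

def padhye_throughput(mss, rtt, to, p, b=1):
    """Throughput approximation of equation (4) (Padhye et al.).

    mss: maximum segment size (bytes), rtt: round-trip time (s),
    to: retransmission time-out (s), p: loss probability,
    b: packets acknowledged per received ACK.
    """
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + to * min(1.0, 3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return mss / denom

def drop_ratio(rtt1, to1, rtt2, to2):
    """Equation (5): packet drop ratio r between flows 1 and 2."""
    return (rtt2 / rtt1) ** 2 * (to2 / to1)
```

The ratio from `drop_ratio` would then drive how out-of-profile packets of the two flows are split between DP1 and DP2.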

5. Congestion Control Component

This section describes the congestion control component of the edge router. This component detects and controls unresponsive flows, i.e., flows that do not reduce their rates in response to congestion. As previously discussed, SLA monitors can inform edge routers of congestion inside a domain. A shaping algorithm at the edge routers can therefore control unresponsive flows at the time of congestion. In addition, ingress routers of a domain may propagate congestion information to the egress routers of upstream domains. A stub domain that is connected only to end-users does not propagate this information to the users; instead, it controls the incoming flows to conform with the service level agreements.

5.1. Optional Core Router Detection

In Section 3, we have shown how tomography-based loss inference techniques can be applied to detect per-segment losses using edge-to-edge probes. An alternative strategy is to track excessive drops of only the high-priority (i.e., green or DP0) packets at core routers, as proposed in Reference 6. The core router maintains the tuple {source address, destination address, source port, destination port, protocol identifier, timestamp, btlnkbw} for dropped DP0 packets. The outgoing link bandwidth at the core, btlnkbw, helps regulate the flow: edge routers shape more aggressively if the core has a 'thin' outgoing link. The core sends this drop information to the ingress routers only when the total drop exceeds a local threshold (thus the flow appears to be non-adaptive). The information is sent only during congestion, and only for the flows that are violating SLAs. Thus, this design does not significantly reduce the scalability of the differentiated services framework.
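A minimal sketch of this bookkeeping, assuming a simple per-tuple counter and a locally configured threshold (both the class and the threshold value are illustrative, not from Reference 6):

```python
from collections import defaultdict

DROP_THRESHOLD = 20  # local threshold on DP0 drops per flow (assumed value)

class CoreDropMonitor:
    """Sketch of the optional core-router detection: count dropped DP0
    packets per flow tuple and report to the ingress only once a local
    threshold is exceeded, so state and signaling stay bounded."""

    def __init__(self, threshold=DROP_THRESHOLD):
        self.drops = defaultdict(int)
        self.threshold = threshold

    def record_drop(self, flow_tuple, timestamp, btlnkbw):
        """Called on each dropped DP0 packet; returns a report for the
        ingress router when the flow appears non-adaptive, else None."""
        self.drops[flow_tuple] += 1
        if self.drops[flow_tuple] > self.threshold:
            # include the bottleneck bandwidth so the edge can shape
            # more aggressively when the outgoing link is 'thin'
            return {"flow": flow_tuple, "ts": timestamp, "btlnkbw": btlnkbw}
        return None
```

Reports would only be emitted during congestion and for SLA-violating flows, per the design above.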


for each incoming flow do
    if there is a complete state entry for this flow then
        statePresent = TRUE
        Update the state table
    else
        statePresent = FALSE
        Add the flow to the state table (replace an entry if needed)
    end if
    if statePresent is TRUE then
        Use the standard conditioner plus SYN, ECN, CWR, small window, burst, and RTT-RTO based marking
    else
        Use the standard conditioner plus SYN, ECN, and CWR based marking
    end if
end for

Figure 3. Algorithm for the adaptive traffic conditioner
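For illustration, the Figure 3 logic with a bounded, LRU-replaced state table might look like the following Python sketch (the class, the table size, and the abstract per-flow record are our assumptions):

```python
from collections import OrderedDict

MAX_FLOWS = 50  # bound on per-flow state, sized to available memory (assumed)

class AdaptiveConditioner:
    """Sketch of the Figure 3 algorithm with an LRU-replaced state table."""

    def __init__(self, max_flows=MAX_FLOWS):
        self.table = OrderedDict()  # flow_id -> per-flow record
        self.max_flows = max_flows

    def condition(self, flow_id, record):
        """Return which marking mode applies to this flow's packets."""
        if flow_id in self.table:
            self.table.move_to_end(flow_id)   # refresh LRU position
            self.table[flow_id].update(record)
            # state present: header-based plus small-window and
            # RTT-RTO based marking
            return "full"
        # no state: admit the flow, evicting the least recently used entry
        if len(self.table) >= self.max_flows:
            self.table.popitem(last=False)
        self.table[flow_id] = dict(record)
        # only SYN/ECN/CWR marking from header information
        return "header-only"
```

The "full" and "header-only" modes correspond to the two branches of the pseudo-code; the per-flow record would hold window and RTT/RTO estimates in a real conditioner.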


5.2. Metering and Shaping

At the egress routers, we distinguish two types of drops: drops due to metering and shaping at downstream routers (sdrop), and drops due to congestion (cdrop), either obtained via inference as discussed in Section 3 or communicated from core to edge routers as in Section 5.1. For a particular flow, assume the bottleneck bandwidth is btlnkbw (as given above); the bandwidth of the outgoing link of the flow at the ingress router is linkbw; the flow has an original profile (target rate) of targetrate; and the current weighted average rate for this flow is wavg. In the case of cdrop, the profile of the flow is updated temporarily (to yield rate newprofile) using equations (6) and (7), where 0 < γ < 1 is the congestion control aggressiveness parameter:

$$decrement = \gamma \times cdrop \times packet\_size \times \max\!\left(\frac{linkbw}{btlnkbw},\ 1\right) \qquad (6)$$

$$newprofile = \max\!\left(0,\ \min(newprofile - decrement,\ wavg - decrement)\right) \qquad (7)$$

A higher value of γ speeds up convergence, but application QoS may deteriorate. A lower value makes traffic smoother, but it takes longer to readjust the rate. The 'max' term in equation (6) can be ignored if the bottleneck bandwidth information cannot be obtained (i.e., tools like pathchar or Nettimer cannot be used), or if core router detection (Section 5.1) is unavailable. In equation (7), the weighted average of the arrival rate, wavg, is computed using the Time Sliding Window [10] algorithm.

For sdrop, the profile is adjusted as follows:

$$newprofile = \max(0,\ newprofile - sdrop \times packet\_size) \qquad (8)$$

The newprofile is initialized to targetrate. In the absence of drops, the router increases the adjusted profile periodically by a certain rate increment. The rate increment is initialized to a constant number of packets each time the router receives drop information, and is doubled when there is no drop until it reaches a threshold wavg/f, after which it is increased linearly. Thus, the rate adjustment algorithm follows the spirit of TCP congestion control. At the edge, shaping is based on the current average rate and the adjusted profile. For each incoming flow, if the current average rate is greater than the adjusted profile, unresponsive flow packets are dropped.
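Equations (6)-(8) can be combined into a single profile-update routine; the following Python sketch uses assumed defaults (packet size, link bandwidths, and γ) purely for illustration:

```python
def adjust_profile(newprofile, wavg, cdrop=0, sdrop=0,
                   packet_size=1024, linkbw=10e6, btlnkbw=None, gamma=0.5):
    """Sketch of equations (6)-(8): shrink a flow's adjusted profile on drops.

    cdrop/sdrop are congestion and shaping drop counts, wavg is the
    time-sliding-window average arrival rate, and gamma is the
    aggressiveness parameter (0 < gamma < 1). All defaults are assumptions.
    """
    if cdrop:
        # equation (6); the 'max' term is skipped when btlnkbw is unknown
        scale = max(linkbw / btlnkbw, 1.0) if btlnkbw else 1.0
        decrement = gamma * cdrop * packet_size * scale
        # equation (7)
        newprofile = max(0.0, min(newprofile - decrement, wavg - decrement))
    if sdrop:
        # equation (8)
        newprofile = max(0.0, newprofile - sdrop * packet_size)
    return newprofile
```

Between drop reports, the router would grow the adjusted profile again (exponentially up to wavg/f, then linearly), in the spirit of TCP congestion control.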

6. Simulation Set-up

We use simulations to study the effectiveness of our edge router design. The ns-2 simulator37 with the differentiated services implementation of Nortel Networks38 is used. We use the following RED parameters {minth, maxth, Pmax}: {40, 55, 0.02} for DP0; {25, 40, 0.05} for DP1; and {10, 25, 0.1} for DP2 (as suggested in Reference 23). wq is 0.002 for all RED queues. TCP New Reno is used with a packet size of 1024 bytes and a maximum window of 64 packets. We vary the number of micro-flows (where a micro-flow represents a single TCP/UDP connection) per aggregate from 10 to 200. We compute the following performance metrics:

Throughput.—This denotes the average bytes received by the receiver application over the simulation time. A higher throughput usually means better service for the application (e.g., shorter completion time for an FTP flow). For the ISP, higher throughput implies that links are well utilized.

Packet drop ratio.—This is the ratio of total packets dropped to total packets sent. A user can specify for certain applications that the packet drop should not exceed a certain threshold.

Packet delay.—For delay-sensitive applications like Telnet, the packet delay is a user metric.

Response time.—This is the time between sending a request to a Web server and receiving the response back from the server.

7. Simulation Results

The objective of this preliminary set of experiments is to evaluate the effectiveness of the three components of our edge router. In the next few sections, we study the performance of each component under various conditions.


7.1. Detecting Attacks and SLA Violations

In this section, we investigate the accuracy and effectiveness of the delay, loss, and throughput approximation methods for detecting violations discussed in Section 3. We use a network topology similar to that used in Reference 2, as depicted in Figure 4. The violation detection mechanism also works for higher link capacities and for larger network topologies than the one shown in Figure 4; links with capacity higher than 10 Mbps simply require more traffic to simulate an attack. Multiple domains are connected to all edge routers, through which flows enter the network domain. The flows are created from domains attached to E1, E2, and E3, and destined to the domains connected to edge router E6, so that the link C4 → E6 is congested. An attack is simulated on C4 → E6 by flooding this link with an excessive amount of traffic entering through different ingress routers. The purpose of this simulation is to show that the edge routers can detect service violations and attacks due to flow aggregation towards a downstream domain. Many other flows are created to ensure that all links carry a significant number of flows.

We first measure delay when the network is correctly provisioned or over-provisioned (and thus experiences little delay and loss). The delay of E1 → E6 is 100 ms; the E1 → E7 delay is 100 ms; and the E5 → E4 delay is 160 ms. Attacks are simulated on router E6 through links C3 → C4 and C4 → E6. With the attack traffic, the average delay of the E1 → E6 path increases from 100 ms to approximately 180 ms. Since all the core links have a higher capacity than the other links, C4 → E6 becomes the most congested link.

Figure 5 shows the cumulative distribution function (CDF) of edge-to-edge delay for the path E1 → E6 under various traffic loads and in the presence of attacks. When there is no attack, the end-to-end delay is close to the link transmission delay. If the network path E1 → E6 is lightly loaded, for example with a 30% load, the delay does not go significantly higher than the link transmission delay. Even when the path is 60% loaded (medium load in Figure 5), the edge-to-edge delay of the path E1 → E6 increases by less than 30%. Some instantaneous values of delay increase to as high as 50% of the link transmission delay, but the average value does not fluctuate much. In both


Figure 4. Topology used to infer loss and detect service violations. All edge routers are connected to multiple domains, and each domain has multiple hosts. (Core links: 20 Mbps, 30 ms; other links: 10 Mbps, 20 ms.)

Figure 5. Cumulative distribution function (CDF) of one-way delay from E1 to E6 (light load, medium load, mild attack, severe attack)

cases, the network is properly provisioned, i.e., the flows do not violate their SLAs. On the other hand, excess traffic introduced by an attacker increases the edge-to-edge delay, and most packets of the attack traffic experience a delay 40–70% higher than the link transmission delay (Figure 5). Delay measurement is thus a good indication of the presence of excess traffic inside a network domain.

The frequency of delay probing is a critical parameter that affects the accuracy of the estimation.


Sending fewer probes reduces overhead, but using only a few probes can produce an inaccurate estimation, especially when some of the probes are lost in the presence of excess traffic due to an attack. We have found that sending probes at a rate of 10–15 per second is a good choice in this experiment.

We demonstrate detection of such abnormal conditions using delay measurements in three scenarios labeled 'No attack', 'Attack 1', and 'Attack 2' in Figure 6. 'No attack' indicates no significant traffic in excess of capacity. This is the baseline case of proper network provisioning and traffic conditioning at the edge routers. Attacks 1 and 2 inject more traffic into the network domain from different ingress points. The intensity of the attack is increased from time t = 15 seconds to t = 45 seconds. Loss is inferred when high delay is experienced inside the network domain. To infer loss inside a QoS network, green, yellow, and red probes are used. We use equation (2) to compute the overall traffic loss per class in a QoS network. The loss measurement results are depicted in Figure 7. The loss fluctuates with time, but the attack causes packet drops of 15% to 25% in the case of Attack 1 and more than 35% with Attack 2. We find that it takes approximately 10 seconds for the inferred loss to converge to the same value as the real loss in the network. Approximately 20 stripes per second are required to infer a loss ratio close to the actual value. For more details on the probing frequencies and convergence of the estimations, see Reference 30.

Delay and loss estimation, together with appropriate thresholds, can thus indicate the presence of abnormal conditions, such as distributed DoS attacks and flash crowds. When the SLA monitor detects such an anomaly, it polls edge devices for the throughputs of flows. Using these outgoing rates at egress routers, the monitor computes the total bandwidth consumption of any particular user. This bandwidth consumption is then compared to the SLA bandwidth. By identifying the congested links and the egress routers connected to the congested links, the downstream domain where an attack or crowd is headed is identified. Using IP prefix matching, we determine whether many of these flows are being aggregated towards a specific network or host. If the destination confirms this is an attack, we control these flows at the ingress routers.
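For illustration, the prefix-aggregation and SLA-comparison steps might be sketched as follows (the flow representation, the /24 prefix length, and the SLA table are assumptions for this example):

```python
from collections import defaultdict
from ipaddress import ip_network

def aggregate_by_prefix(flow_rates, prefix_len=24):
    """Sum egress rates per destination prefix to see whether many flows
    are converging on one network. flow_rates is a sequence of
    (destination_ip, rate) pairs polled from the egress routers."""
    totals = defaultdict(float)
    for dst, rate in flow_rates:
        net = ip_network(f"{dst}/{prefix_len}", strict=False)
        totals[str(net)] += rate
    return dict(totals)

def sla_violations(totals, sla_bw):
    """Return prefixes whose aggregate rate exceeds the contracted SLA
    bandwidth (the sla_bw mapping is illustrative)."""
    return [p for p, rate in totals.items()
            if rate > sla_bw.get(p, float("inf"))]
```

Prefixes flagged here would then be confirmed with the destination before the ingress routers throttle the corresponding flows.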

7.2. Adaptive Conditioning

As discussed in Section 4, TCP-aware marking can improve application QoS. We first perform several experiments to study each marking technique separately, and then study all combinations. We find that protecting SYN packets is useful for short-lived connections and very high degrees of multiplexing. Protecting connections with small window sizes (SW) contributes the most to the total bandwidth gain, followed by protecting CWR packets and SYN packets. SW favors short-RTT connections, but it also reduces the packet drop ratio and time-outs for long-RTT connections, compared


Figure 6. Observed delay on the path E1 → E6 at the time of an attack. 'Attack 1' results in packet loss in excess of 15–30%. 'Attack 2' increases packet loss to more than 35%

Figure 7. Packet loss ratio on link C4 → E6 follows the same pattern as the edge-to-edge delay of the path E1 → E6


to a standard traffic conditioner. Not marking in bursts is effective for short-RTT connections. If SW is not used, Burst + CWR achieves higher bandwidth than any other combination. The RTT-RTO based conditioner mitigates the RTT bias among short and long RTT flows. This is because when the congestion window is small, there is a higher probability of time-outs in the case of packet drops. Protecting packets (via DP0 marking) when the window is small reduces time-outs, especially back-to-back time-outs. A micro-flow also recovers from time-outs when RTO as well as RTT is used to mark packets. All these marking principles are integrated together with an adaptive state replacement policy, as given in Figure 3. We now evaluate the performance of this adaptive traffic conditioner with FTP and CBR traffic, and with Telnet and WWW applications. The network hosts and routers are ECN-enabled for all experiments in this section, since we use the ECN and CWR packet protection mechanism. Additional results can be found in Reference 39.

Figure 9(a) compares the bandwidth with the standard conditioner and with the adaptive (Figure 3) conditioner for the simple topology shown in Figure 8(a). The total throughput is measured over the entire simulation time at the receiving end. The curve labeled 'Max' denotes the bandwidth when the standard conditioner is combined with all marking techniques and stores per-flow information for all flows. We find that the adaptive conditioner outperforms the standard one for all aggregate flows. The adaptive conditioner is also more fair (not shown), in the sense that short-RTT flows do not steal bandwidth from long-RTT flows, and the total achieved bandwidth is close to 10 Mbps (the bottleneck link speed). Therefore, with simple processing (Figure 3), performance can be increased.

Figure 8. Simulation topologies: (a) simple topology; (b) multiple domain topology. All links are 10 Mbps. The capacity of the bottleneck links is altered in some experiments

Figure 9. Achieved bandwidth with the standard conditioner and the adaptive conditioner: (a) simple topology, adaptive state table size = 20 micro-flows; (b) multi-domain topology, adaptive state table size = 50 micro-flows. 'Max' is the bandwidth when the standard conditioner is combined with all marking techniques and stores per-flow information for all flows

Figure 8(b) depicts a more complex simulation topology where three domains are interconnected (all links are 10 Mbps). The link delay between a host and the edge is varied from 1 to 10 ms for different hosts connected to a domain, to simulate users at variable distances from the same edge routers. Aggregate flows are created between nodes n1 → n8, n2 → n9, n3 → n4, n5 → n6, and n7 → n9. Thus, flows have different RTTs and experience bottlenecks at different links. Not all flows start/stop transmission at the same time: flows last from less than a second to a few seconds. C2 → E4, E5 → C4 and C4 → E7 are the most congested links. Figure 9(b) shows the total bandwidth gain for this topology with different conditioners. From the figure, the adaptive conditioner performs better than the standard one, and achieves performance close to the maximum capacity. We also analyzed the bandwidth gain of each individual aggregate flow. The flows achieve similar bandwidth gains, except for flows with extremely short RTTs. Thus, the adaptive conditioner improves fairness between short and long RTT flows, without requiring large state tables.

When each aggregate flow contains 200 micro-flows, the soft state table for the adaptive conditioner covers only a small percentage (4.16%) of the flows passing through it. We use a table for the 50 most recent micro-flows. Table 1 shows that the bandwidth achieved with the adaptive conditioner always exceeds that of the standard conditioner. Note that when critical TCP packets are protected, they are charged to the user profile to ensure that UDP traffic is not adversely affected.

We also study performance with Telnet (delay-sensitive) and WWW (response-time-sensitive) applications. For the Telnet experiments, the performance metric used is the average delay for each Telnet packet. We use the topology in Figure 8(b), but the capacity of the C1 → E4 and E5 → E7 links is changed to 0.5 Mbps and all other link capacities are 1 Mbps, to introduce congestion. We simulate 100 Telnet sessions among hosts n1 → n8, n2 → n9, n3 → n4, n5 → n6, and n7 → n9. A session transfers between 10 and 35 TCP packets. Results show that the adaptive conditioner reduces packet delay compared to the standard conditioner for short-RTT flows.

Since Web traffic constitutes most (60–80%) of Internet traffic, we study our traffic conditioner with the WWW traffic model in ns-2.37 Details of the model are given in Reference 40. The model uses HTTP 1.0 with TCP Reno. The servers are attached to n6, n8 and n9 in Figure 8(b), and n1, n2 and n5 are used as clients. A client can send a request to any server. Each client generates requests for 5 pages with a variable number of objects (e.g., images) per page. The default ns-2 probability distribution parameters are used to generate inter-session time, inter-page time, objects per page, inter-object time, and object size (in kB). The network set-up is the same as with the Telnet traffic. Table 2 shows the average response


Micro-flows   Standard        Adaptive        Max             Adaptive (% flows
              bandwidth       bandwidth       bandwidth       covered at E4)
10            12.65           12.87           12.87           41.16
50            12.18           13.84           14.20           16.66
100           11.67           13.48           14.89            8.33
200           11.77           13.61           14.91            4.16

Table 1. Bandwidth shown is in Mbps. State table size = 50 micro-flows

Conditioner   Avg response time   Std     Avg response time   Std
              (sec), first pkt    dev     (sec), all pkts     dev
Standard      0.48                0.17    2.23                0.78
Adaptive      0.45                0.14    2.15                0.75

Table 2. Response time for WWW traffic. Number of concurrent sessions = 50


time per WWW request received by the client. Two response times are shown in the table: one to get the first packet, and another to get all the data. The table shows that our adaptive conditioner reduces response time compared to the standard traffic conditioner. The adaptive conditioner does not change the response time significantly if the network is not congested.


7.3. Congestion Control

We conduct experiments to demonstrate the role of the congestion control mechanism in preventing congestion collapse. Figure 8(a) depicts the simple topology used to demonstrate congestion collapse due to unresponsive flows. An aggregate TCP flow with 10 micro-flows from host n1 → n3 and an aggregate UDP flow with 10 micro-flows from host n2 → n4 are created. Both flows have the same target rate (5 Mbps). Figure 10 shows how the TCP and UDP flows behave as the bottleneck bandwidth (btlnkbw) is varied from 1 to 5 Mbps. The x-axis denotes btlnkbw and the y-axis gives the throughput achieved by both flows. Figure 10(a) shows that the TCP flow gets its share of 5 Mbps all the time because it does not go through the congested link. When the bottleneck bandwidth is 1 Mbps, 4 Mbps are wasted by the UDP flows in the absence of flow control. Figure 10(b) shows that, with flow control, the TCP flow gets 8 Mbps when btlnkbw is 1 Mbps. The flow control mechanism prevents congestion collapse due to undelivered packets.

We also experiment with varying the rate ratio $R_r = sendingRate/profile$ for UDP traffic. A value of R_r of 0.5 means that the flow is sending at 50% of its profile, and a value of R_r of 4 means the flow is sending at four times its profile. When the UDP sending rate is zero, TCP can use the entire 10 Mbps, and there is no shaping (the shaping drop is zero) at the edge. When the UDP sending rate causes drops at the bottleneck link (e.g., when btlnkbw = 1 Mbps), congestion collapse occurs in the absence of flow control. With flow control, even when R_r is 4 (the profile is 5 Mbps and UDP is sending at 20 Mbps), there is no congestion collapse.

A more complex topology with multiple domains (Figure 8(b)) and with cross traffic is also used to study the flow control framework. An aggregate of TCP flows, F1, between n1 → n8 is created, in addition to several UDP flows F2, Cr1, Cr2, and Cr3 between n2 → n9, n3 → n4, n5 → n6, and n7 → n10, respectively. The Cr flows are used as cross traffic. The start and finish times of the Cr flows are set differently to change the overall traffic load over the paths of flows F1 and F2. There are 10 micro-flows per aggregate in


Figure 10. (a) Without flow control, the TCP flow gets only 5 Mbps when the bottleneck bandwidth is 1 Mbps in the topology of Figure 8(a). (b) With flow control, the TCP flow gets 8 Mbps. Both flows have the same profile


this set-up. Flows F1 and F2 have the same profile with target rate 5 Mbps, and the cross traffic sending rate is 2 Mbps.

Figure 11 illustrates the bandwidth of these aggregate flows with and without flow control. The cross traffic achieves the same target under both schemes, because those flows do not send more than their profiles and they do not encounter any bottleneck. Without flow control, F1 (TCP) cannot achieve its target of 5 Mbps. With flow control, F1 obtains more than its target. This is because, after the UDP traffic is controlled, TCP uses the remaining bandwidth.

8. Conclusions

We have investigated the design of edge routers that include tomography-based edge-to-edge probing methods to detect service level agreement violations in QoS networks, together with TCP-aware conditioning and flow control for unresponsive flows. In addition to the primary goal of this design, which is to increase the performance of compliant flows, the proposed mechanisms are useful for network re-dimensioning, as well as for detecting and controlling flooding-based attacks. We have designed methods that use edge-to-edge packet stripes to infer loss for different drop precedences in a QoS network, based on observed delays. Aggregate throughputs are then measured to detect distributed denial-of-service attacks or flash crowds.

Marking, shaping, and policing are also adapted to respond to detection results and to flow characteristics. We give priority to critical TCP packets and mark according to flow characteristics. We use an adaptive conditioner that overwrites previous state information based on a least recently used strategy. Marking is based on information in packet headers if state information for a flow is unavailable. The adaptive conditioner is shown to improve FTP throughput, reduce packet delay for Telnet, and reduce response time for WWW traffic. The conditioner also mitigates the TCP RTT bias if it can deduce the flow RTT and RTO. Finally, we have designed a simple method to regulate unresponsive flows to prevent congestion collapse due to undelivered packets.

Most of our mechanisms can be adapted to other architectures that support service differentiation, or to active queue management techniques at network routers. For example, the RED algorithm at network routers can itself protect critical TCP packets, e.g., CWR-marked packets, from drops without requiring any additional state. The adaptive conditioner concept can also be employed to keep some window size information and use it in RED dropping decisions.

Acknowledgements

The authors would like to thank Srinivas R. Avasarala and Venkatesh Prabhakar for many discussions and ideas about the monitoring component. This research is supported in part by the National Science Foundation grants ANI-0238294 (CAREER), ANI-0219110, CCR-001712 and CCR-001788, CERIAS; an IBM SUR grant; the Purdue


Figure 11. Dynamic adjustment of flow F2 works well in the presence of cross traffic: (a) no flow control; (b) with flow control. The TCP flow (F1) gets more bandwidth with the flow control scheme


Research Foundation; and the Schlumberger Foundation technical merit award.

References

1. Blake S, Black D, Carlson M, Davies E, Wang Z, Weiss W. An architecture for differentiated services. RFC 2475, December 1998.

2. Duffield NG, Lo Presti F, Paxson V, Towsley D. Inferring link loss using striped unicast probes. In Proc. IEEE INFOCOM, Alaska, April 2001.

3. Floyd S, Fall K. Promoting the use of end-to-endcongestion control in the Internet. IEEE/ACM Trans-actions on Networking 1999; 7(6):458–472.

4. Braden B, et al. Recommendations on queue man-agement and congestion avoidance in the internet.RFC 2309, April 1998.

5. Ott T, Lakshman T, Wong L. SRED: Stabilized RED.In Proc. IEEE INFOCOM, March 1999.

6. Wu H, Long K, Cheng S, Ma J. A Direct CongestionControl Scheme for Non-responsive Flow Control inDiff-Serv IP Networks. Internet Draft, draft-wuht-diffserv-dccs-00.txt, August 2000.

7. Jacobson V, Nichols K, Poduri K. An Expedited Forwarding PHB. RFC 2598, June 1999.

8. Heinanen J, Baker F, Weiss W, Wroclawski J. AssuredForwarding PHB Group. RFC 2597, June 1999.

9. Clark D, Fang W. Explicit allocation of best effortpacket delivery service. IEEE/ACM Transactions onNetworking 1998; 6(4):362–374.

10. Ibanez J, Nichols K. Preliminary simulation evalua-tion of an assured service. draft-ibanez-diffserv-assured-eval-00.txt, August 1998.

11. Seddigh N, Nandy B, Pieda P. Bandwidth assuranceissues for TCP flows in a differentiated servicesnetwork. In Proc. Globecom 99, 1999.

12. Lin W, Zheng R, Hou J. How to make Assured Ser-vices more Assured. In Proc. ICNP, October 1999.

13. Adams A, et al. The use of end-to-end multicast measurements for characterizing internal networkbehavior. IEEE Communications Magazine 2000;38(5):152–159.

14. Cáceres R, Duffield NG, Horowitz J, Towsley D.Multicast-based inference of network-internal losscharacteristics. IEEE Transactions on InformationTheory, November 1999; 45(7):2462–2480.

15. Dilman M, Raz D. Efficient reactive monitoring. InProc. IEEE INFOCOM, Alaska, April 2001.

16. Breitbart Y, Chan CY, Garofalakis M, Rastogi R. Silberschatz A. Efficiently monitoring bandwidthand latency in IP networks. In Proc. IEEE INFOCOM,Alaska, April 2001.

17. Chan MC, Lin YJ, Wang X. A scalable monitoringapproach for service level agreements validation. In

Proceedings of the International Conference on NetworkProtocols (ICNP), pages 37–48, November 2000.

18. Anderson D, Balakrishnan H, Kaashoek F, Morris R.Resilient overlay networks. In Proc. ACM Symp. onOperating Systems Principles (SOSP), Banff, Canada,October 2001.

19. Fang W, Seddigh N, Nandy B. A time slidingwindow three colour marker. RFC 2859, June 2000.

20. Feng WC, Kandlur D, Saha D, Shin KG. Under-standing and improving TCP performance over net-works with minimum rate guarantees. IEEE/ACMTransactions on Networking 1999; 7(2):173–186.

21. Yeom I, Reddy N. Realizing throughput guaranteesin a differentiated services network. In Proc. IEEEInt. Conf. on Multimedia Comp. and Systems, June1999.

22. Feroz A, Kalyanaraman S, Rao A. A TCP-friendlytraffic marker for IP differentiated services. In Proc.IEEE/IFIP Eighth International Workshop on Quality ofService—IWQoS, 2000.

23. Nandy B, Seddigh N, Pieda P, Ethridge J. Intelligenttraffic conditioners for assured forwarding baseddifferentiated services networks. In Proc. IFIP HighPerformance Networking, Paris, June 2000.

24. Mathis M, Semke J, Mahdavi J, Ott T. The macro-scopic behavior of the TCP congestion aviodancealgorithm. ACM SIGCOMM Computer Communica-tion Review 1997; 27(3):67–82.

25. Seddigh N, Nandy B, Pieda P. Study of TCP andUDP interaction for the AF PHB, 1999.

26. Albuquerque C, Vickers B, Suda T. Network borderpatrol. In Proc. IEEE INFOCOM, 2000.

27. Chow H, Leon-Garcia A. A feedback control extension to differentiated services. Internet Draft,draft-chow-diffserv-fbctrl-00.pdf, March 1999.

28. Mahajan M, Bellovin SM, Floyd S, Ioannidis J,Paxson V, Shenker S. Controlling high bandwidthaggregates in the network. ACM Computer Commu-nication Review 2002; 32(3):62–73.

29. Ferguson P, Senie D. Network ingress filtering:Defeating denial of service attacks which employ IPsource address spoofing agreements performancemonitoring. RFC 2827, May 2000.

30. Habib A, Fahmy S, Avasarala SR, Prabhakar V, Bhargava B. On detecting service violations andbandwidth theft in QoS network domains. ComputerCommunications 2003; 26(8):861–871.

31. Floyd S, Jacobson V. Random early detection gateways for congestion avoidance. IEEE/ACMTransactions on Networking, 1993; 1(4):397–413.ftp://ftp.ee.lbl.gov/papers/early.ps.gz.

32. Jung J, Krishnamurthy B, Rabinovich D. Flashcrowds and denial of service attacks: Characteriza-tion and implications for cdns and web sites. In Proc.

MONITORING AND CONTROLLING QoS NETWORK DOMAINS 27

Copyright © 2004 John Wiley & Sons, Ltd. Int. J. Network Mgmt 2005; 15: 11–29

Page 19: CERIAS Tech Report 2005-112 MONITORING AND CONTROLLING … · Monitoring and controlling QoS network domains By Ahsan Habib€,Sonia Fahmy*,€and Bharat Bhargava Increased performance,

World Wide Web (WWW), Honolulu, Hawaii, May2002.

33. Ramakrishnan K, Floyd S. A proposal to add explicitcongestion notification (ECN) to IP. RFC2481,January 1999.

34. Fahmy S. New TCP Standards and Flavors, in Text:High Performance TCP/IP Networking, chapter 11,pages 264–280. Prentice Hall, October 2003.

35. Padhye J, Firoiu V, Towsley D, Kurose J. ModelingTCP throughput: a simple model and its empiricalvalidation. In Proc. ACM SIGCOMM ’98, 1998.

36. Habib A, Bhargava B, Fahmy S. A round trip timeand timeout aware traffic conditioner for differenti-ated services networks. In Proc. IEEE InternationalConference on Communication (ICC), New York, April2002.

37. McCanne S, Floyd S. Network simulator ns-2.http://www.isi.edu/nsnam/ns/, 1997.

38. Shallwani F, Ethridge J, Pieda P, Baines M. Diff-Serv implementation for ns. http://www7.nortel.com:8080/CTL/#software, 2000.

39. Habib A, Fahmy S, Bhargava B. Design and evaluation of an adaptive traffic conditioner in differentiated services networks. In Proc. IEEE International Conference on Computer Communicationand Networks (IC3N), Arizona, pages 90–95, October2001.

40. Feldmann A, Gilbert AC, Huang P, Willinger W.Dynamics of IP traffic: A study of the role of variability and the impact of control. In Proc. ACMSIGCOMM, pages 301–313, 1999. �


Appendix A: Percentages Used in Multi-Priority Loss Inference

Figure A1 depicts the drop probabilities in REDwith three drop precedences. The red traffic hashigher drop priority than yellow and green traffic.The red traffic is dropped with a probability Pred

when the average queue size lies between twothresholds Rmin and Rmax. All incoming red packetsare dropped when the average queue lengthexceeds Rmax. Pyellow and Pgreen are similar. SupposeaG(n) is the probability that an incoming greenpacket will be accepted by the queue given that n packets are in the queue. aY(n) and aR(n) aredefined similarly for yellow and red traffic respec-tively. The a values for green packets are definesas follows:

(A1)

The equations are similar for yellow and redtraffic. Let P ¢red be the percentage of packet dropsdue to active queue management for red packets,and let P ¢yellow and P ¢green be defined similarly foryellow and green, respectively. These percentagescan be computed as:

(A2)

(A3)

(A4)

where B is the buffer (queue) size at the router. Thepercentage of class k traffic accepted by an activequeue can be expressed as:

(A5)P Pk k= - ¢1

¢ =-

¥PG G

GPmax min

maxgreen green

¢ =-

¥ +-

¥PR Y

YP

G Y

Bmax min

max

max maxyellow yellow 100

¢ =-

¥ +-

¥PR R

RP

G R

Bmax min

max

max maxred red 100

a

a

a

G min

G max

Gmin

max min

n n G

n n G

n Pn G

G G

( ) = <( ) = >

( ) = --

-

1

0

1

if

if

otherwisegreen ,
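To make these per-colour computations concrete, here is our own illustrative Python sketch (the threshold and probability values are invented examples, chosen only to respect the ordering of the thresholds in Figure A1; they are not from the paper). Each colour suffers probabilistic drops over its own threshold range and forced drops once the average queue exceeds its maximum threshold.

```python
# Illustrative RED thresholds (packets) and maximum drop probabilities
# (percent); example values only, chosen to respect
# R_MIN < R_MAX < Y_MIN < Y_MAX < G_MIN < G_MAX <= B.
R_MIN, R_MAX = 10, 20
Y_MIN, Y_MAX = 30, 40
G_MIN, G_MAX = 50, 60
B = 60                                      # buffer (queue) size
P_RED, P_YELLOW, P_GREEN = 20.0, 10.0, 5.0  # max drop probabilities (%)

def alpha_green(n):
    """Acceptance probability for a green packet with n packets queued (A1)."""
    if n < G_MIN:
        return 1.0
    if n > G_MAX:
        return 0.0
    # Linear probabilistic-drop region; P_GREEN is a percentage here.
    return 1.0 - (P_GREEN / 100.0) * (n - G_MIN) / (G_MAX - G_MIN)

def drop_percentages():
    """Percentages of drops due to AQM for each colour, per (A2)-(A4).

    First term: probabilistic drops over the colour's own threshold
    range.  Second term: forced (100%) drops once the average queue
    exceeds the colour's maximum threshold.
    """
    p_red = (R_MAX - R_MIN) / R_MAX * P_RED + (G_MAX - R_MAX) / B * 100.0
    p_yellow = (Y_MAX - Y_MIN) / Y_MAX * P_YELLOW + (G_MAX - Y_MAX) / B * 100.0
    p_green = (G_MAX - G_MIN) / G_MAX * P_GREEN
    return p_red, p_yellow, p_green

# Percentage of class k traffic accepted, per (A5): 100 - P'_k.
accepted = [100.0 - p for p in drop_percentages()]
```

With these example values the drop percentages come out highest for red and lowest for green, matching the intended ordering of the three drop precedences.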


Figure A1. RED parameters for three drop precedences. (The figure plots drop probability $P_{drop}$, from 0 to 1, against average queue length, with thresholds $R_{min}$, $R_{max}$, $Y_{min}$, $Y_{max}$, $G_{min}$, $G_{max}$ and maximum drop probabilities $P_{red}$, $P_{yellow}$, and $P_{green}$.)