Modern Computer Networks: An Open Source Approach Chapter 6
Chapter 6 Internet QoS
Problem Statement
The Internet, an IP-based network, has existed for many years. To transmit
data from one end to the other, TCP, the most popular transport protocol in the
world, was developed to solve the problems of end-to-end congestion control
and data reliability, as discussed in Chapter 4. However, more and more new
services are being offered on the Internet, such as the WWW, e-commerce, and
video conferencing, as discussed in Chapter 5. The quality of service provided
by a reliable end-to-end congestion control protocol over a network without
resource management is not sufficient for these applications. They require a path
with specific qualities, such as adequate bandwidth, low delay, and low jitter.
Supporting quality of service in a network is nothing new; the ATM network
was capable of it 20 years ago. Why, then, do we need to construct a QoS
network on top of the Internet? Low cost, simplicity, and popularity are the major
reasons. Today, almost all popular network services are based on the TCP/IP
architecture. To obtain QoS, it is impractical and ineffective either to replace all
hardware and software at the hosts or to interwork with another network
architecture through special translators.
So, how about an IP-based QoS network? An architecture that is based on
the IP network and able to provide quality of service may be a good solution.
However, there is no QoS in the current Internet; most routers support only
best-effort service for the applications. What is best-effort service? The short
answer is that routers do their best without caring what the applications actually
get. More specifically, on the one hand, users' arriving packets are inserted into
a queue until the queue overflows; on the other hand, the server continuously
sends packets out of the queue at the maximal rate until the queue is empty.
When the server load is light, the performance of best-effort service may be
enough for most users. Unfortunately, the load always turns heavy, owing to the
greediness of users.
Thus, a QoS-aware IP network is necessary to maintain fairness. Following
intuition and the precedent of QoS design in the ATM network, people built a
similar and idealized architecture, the Integrated Services architecture (IntServ).
It is expected to provide services of accurate, guaranteed quality to the end
users that need them. Just as in ATM, users need to set up a path and reserve
the resources before they get quality of service. All routers on the reserved
path not only need to know where packets should be sent, as they do now, but
also which flow each packet belongs to and exactly when it should be sent out.
In practice, IntServ is too difficult and complex to implement in a large
environment.
However, many of the algorithms behind IntServ's key components are good
and mature. What remains is to simplify the IntServ architecture by loosening the
precise service guarantees while still satisfying the requirements of the most
popular applications, such as WWW and FTP. The Differentiated Services
architecture (DiffServ) was proposed with this idea. Its basic objective is to build
a simple, coarse-grained network that provides only differentiated classes of
service to users over a large-scale area. The architecture attempts to supply
end-to-end QoS through different per-hop forwarding behaviors, and signing a
contract is necessary before getting the service. From one point of view, DiffServ
is indeed a simple model. However, the DiffServ standard is described somewhat
roughly, and many points still need to be made concrete. Nevertheless, it still has
a high probability of being deployed on the Internet some day.
Although no large-scale QoS IP network exists yet, some IP traffic control
components are already provided in many operating systems, such as the
traffic control (TC) modules of Linux. Of course, a DiffServ experimental
environment built from these traffic control modules can also be constructed in
such a system. In this chapter, we introduce the two architectures mentioned
above, IntServ and DiffServ. Along with the discussion in each section, the open
source code of TC in Linux is presented to help you understand more clearly
how QoS is supplied in a router on the Internet.
6.1 Issues
To provide QoS in the network, many functions must be added to a
traditional router. In general, they can be categorized into six components, as
shown in Figure 6.1. In this section, we introduce their concepts and capabilities
respectively. From the discussion, you will also see which problems make them
difficult to design. A more detailed description of each component appears in the
following sections of this chapter.
Signaling Protocol
A signaling protocol, a common language spoken among the routers, is the first
requirement in a QoS network, because quality of service is provided by the
cooperation of all routers in the network. Several signaling protocols have been
proposed for various purposes. Among them, the most famous example is the
Resource ReSerVation Protocol (RSVP), which is used by applications to
reserve resources from the network; a further introduction to RSVP is given in
Section 6.2.3. Besides, the Common Open Policy Service (COPS) protocol,
currently being examined by the IETF, is another example. It is a simple
query-response protocol used in the policy control system, which is one of the
QoS management architectures.
QoS Routing
If routing is regarded as placing road signs and guide boards at forks in
the road, QoS routing can be imagined as an advanced highway system that
provides more detailed guidance, such as the expected arrival time or the
congestion condition of each road. In the current IP network, routers provide only basic
Figure 6.1: The six basic components of a QoS-aware network element:
signaling protocol, QoS routing, and admission control in the control plane;
classification, policing, and scheduling in the data plane.
Figure 6.2: A possible QoS routing architecture (a sender reaches a receiver
across QoS-aware network elements, choosing among candidate paths subject
to bandwidth and packet-loss constraints).
information, such as selecting the path with the fewest hop counts to your
destination address. This is insufficient if we want to further integrate interactive
media transmission into the IP network. Besides the number of hops traversed,
guarantees on bandwidth and delay are also required to provide users with a
specific level of QoS. In fact, offering the capability mentioned above in a large
network is difficult and complex, owing to the cooperation required among all
routers, especially when the suitable path for an application must be decided
dynamically. A more detailed introduction and discussion of QoS routing,
including its taxonomy and possible architectures, will be given in Section 6.2.4.
Admission Control
Even after the advanced highway system is built, the roads may still be
congested; the only difference is that you now know the arrival time. Thus, we
further need to control the number and type of cars allowed on the road. The
admission control component is responsible for this job. Depending on the type
of network architecture, its target may be either a single link or an area of the
network. In general, because the information and quality of the roads are
provided by QoS routing, the kinds of resources controlled by admission control
do not go beyond that scope, such as bandwidth and delay. It seems a simple
job, assuming the current condition of the network is known; however, that
impression is wrong. In fact, the admission decision is not applied to a single
packet. Rather, an admission represents an agreement covering all well-behaved
packets belonging to one traffic source, where "well-behaved" may imply certain
statistical characteristics. Figure 6.3 shows a simple example: a request with a
3 MB/s bandwidth constraint arrives at router A, and router A decides whether
to accept it based on the variable bandwidth usage. The difficulty is how to
estimate resource usage correctly, so as to reserve enough resources for
Figure 6.3: A simple operating example of the admission control component (a
request for a path with 3 MB/s arrives at router A, which compares it against the
current bandwidth usage over time and the 10 MB maximum supported
bandwidth).
the successful transmission of each admitted traffic source, while keeping the
total network resources at high utilization.
Packet Classification
After a road has been selected or reserved, which is supported by the three
components above, the cars start driving. The road system needs a component
that identifies the differences among cars in order to bill or manage the traffic.
For example, we may need to find very old cars and forbid them from entering
the highway system, or identify the owner of a car to charge the toll. As you can
imagine, there are many different rules for classifying packets, and classifying
one packet into a particular class may require checking several rules. Although
the job is heavy, speed is the first necessity. Thus, how to classify packets
quickly against many different rules becomes the major issue of this component.
In IntServ, the component identifies which traffic source a packet belongs
to according to the values of five fixed fields in the packet; a further description
is given in Section 6.2.4. In DiffServ, it plays a more important and complex role
and is expected to provide a multi-field range-matching capability. Section 6.3.5
explains its difficulty and possible solutions.
Policing
There are always cars exceeding the speed limit on the road, endangering
other cars. The same happens in the network, so we need a policing component
to monitor the network traffic. If the arrival rate of some traffic source exceeds its
allocated rate, the policing component must either mark, drop, or delay its
packets. However, in most cases the policing threshold is not an exact value,
just like the estimated resource usage in the admission control component; some
level of variation is tolerated. The most popular scheme, the leaky bucket, also
called the token bucket, is an example. It limits the mean rate of the policed
traffic while permitting it to send at a burst rate for a period of a particular length.
Besides policing, the token bucket is also used to describe traffic source models,
since it can equally be regarded as a traffic regulator. Because it is applied so
widely, we introduce it in Section 6.2.2, Traffic Description Model.
Scheduling
Scheduling is the most important and classical component of a QoS
network. Its general goal is to enforce resource sharing between different users
according to some predefined rules or ratios. Various algorithms have been
proposed for specific purposes. Some are very simple and primitive, like FIFO,
while others are complex and ingenious enough to guarantee fair resource
sharing. Regardless of the type of scheduling, as shown in Figure 6.4, a
scheduler basically offers two functions, enqueue and dequeue, which handle a
newly arrived packet and pick the next packet to go out through the output
driver, respectively. Looking inside the black box, we can further divide its job
into buffer management within one queue and resource sharing among multiple
queues. Because all of these processes concern queue management,
scheduling is also called queuing discipline.
As mentioned above, there are many designs inside the black box for
different purposes. In IntServ, the architecture with an elaborate selector and
multiple queues, called fair queuing, is in common use, because it attempts to
give each user good isolation from the others and an exact bandwidth
guarantee; we discuss this class of scheduling in Section 6.2.6. In DiffServ,
several styles of scheduling algorithms are required for different functions. The
single-queue style, introduced in Section 6.3.6 and also called buffer
management, intends to supply some differentiated grades of service with a
simple architecture. Although the quality it provides is limited, it is generally
easier to implement in routers than fair queuing disciplines.
Open Source Implementation 6.1: Traffic Control Elements in Linux
Figure 6.4: The concept and possible architectures of scheduling (arriving
packets enter the scheduling black box through enqueue and depart through
dequeue; inside, the box may be a single FIFO queue, or multiple FIFO queues
drained by a selector).
Linux kernels provide a wide variety of traffic control functions. Users are
expected to use these functions to construct an IntServ-aware router, a
DiffServ-aware router, or any other QoS-aware router. The relationship between
TC and other router functions is given in Figure 6.5. As shown in the figure, TC
attempts to replace the original role of output queuing with a series of control
elements. These elements, as coded in Linux, consist of the following three
major conceptual elements:
filters
queuing disciplines
classes
The filters play the role of classifiers: they classify packets based on
particular rules or fields in the packet header. Their source files are named with
the prefix "cls_" and placed in the directory /usr/src/linux/net/sched. For
example, the element implemented in the file "cls_rsvp.c" provides the flow
identification capability required in an IntServ router.
The queuing disciplines support two basic functions, enqueue and dequeue.
The enqueue function decides whether to drop or queue a packet, while the
dequeue function determines the transmission order of packets or simply delays
some packets before sending them out. A simple queuing discipline may be a
FIFO, which queues arriving packets until the queue is full and continuously
sends out packets from the head of the queue. However, some queuing
disciplines are more complex, such as the CBQ element implemented in
"sch_cbq.c", where packets are further classified into several classes. In that
case, the queuing discipline may include several classes, and each class owns
its own queuing discipline. The source files of queuing disciplines and classes
reside in the same directory as the filters, but their filenames begin with "sch_".
The bottom of Figure 6.5 shows a possible combination of the control
elements mentioned above. As the figure shows, the combination is flexible:
a queuing discipline may consist of multiple classes, and multiple filters may
guide packets into the same class. Users can design the structure according to
their needs from the user plane via configuration scripts. In the following open
source implementation blocks, we will introduce in detail several TC elements
that are related to the text of this chapter. A more detailed description of traffic
control in Linux can be found in [WA99].
6.2 Integrated Services
Before beginning to discuss IntServ, the definition and concept of a flow
must be described first. A flow is the basic manageable unit in IntServ, and flow
isolation is an important capability of IntServ. In other words, each flow in IntServ
owns resources that are allocated based on its own particular requirements,
independent of other flows.
In general, the creation of a flow and its resource reservation are initiated by
the application and negotiated with all network elements on the path. In the
following, we introduce the internal operating processes of the IntServ router in
Section 6.2.1. In Section 6.2.2, we look at the services provided in IntServ. From
Figure 6.5: A simple combination of TC elements in Linux (without QoS, packets
flow from the input device through IP forwarding to output queuing and the
output device, or up to the upper-layer processes; with traffic control, filters,
optional policing, classes, and queuing disciplines replace output queuing in
front of the output device).
Section 6.2.3 onward, we give a detailed introduction to each key component.
The last subsection is a short summary of IntServ.
6.2.1 Basic Concept
This subsection gives an introduction to the general operating processes of
IntServ. We first discuss the reservation request from the viewpoint of an
application, and then describe how a router in IntServ handles the request.
The Trip of a Resource Reservation Request
A QoS request originates from an application. To get resources in the
IntServ domain, the application first decides the service type and the values of
the quality parameters according to its traffic type and requirements. Then the
request is sent to the nearest IntServ router, as described in Figure 6.6. That
router decides whether to accept the request based on its status, then forwards
the request to the next router. Assuming the request is accepted by all routers on
the path, which means the resource reservation process is finished, the
application can begin to send data packets with a guaranteed quality of service.
Generally speaking, the common resource reservation protocol in IntServ is
RSVP. A more detailed description of how resource reservation requests are
handled in IntServ will be given in Section 6.2.3.
The Request Response in IntServ Routers
Upon receiving a resource reservation request, the IntServ router passes it
to the signal processing component, which corresponds to the signaling protocol
component shown in Figure 6.1. According to the result negotiated with the
admission control component, the signal processing component updates the signaling packet
Figure 6.6: The operating processes from the viewpoint of an application (a
reservation request travels from the application server across the QoS-aware
routers of the IntServ domain; each router that accepts the request forwards it,
and otherwise rejects it).
and forwards it to the next router. In other words, the signal processing
component is only a "transcriber"; the decision is controlled by the admission
control component, which attempts to dynamically manage and allocate the
resources of the output link for the requesting application. Looking inside
admission control, we can divide its functions into two parts: one gathers and
maintains the current usage of the output link, and the other decides whether the
remaining resources are enough to satisfy the requirements of the new request.
Besides these two components, there is one more component in the control
plane of the IntServ router not mentioned here: QoS routing. In fact, because
IntServ intends to employ admission control to manage link bandwidth allocation,
the QoS routing component is not emphasized here. However, it can still be used
to create virtual static paths with QoS guarantees, whose existence helps reduce
the complexity and time of resource reservation.
The Request Enforcement in IntServ Routers
Once the path has been created successfully, data packets begin to be
transmitted on it. The routers on the path should guarantee that the treatment
the application receives conforms with its previous request, or at least that its
packets are transmitted as if on a lightly loaded path. Three basic components in
the data plane of the router enforce these promises, as shown in Figure 6.7. The
entrance for data packets is the flow identification component. It identifies
whether a packet belongs to some reserved flow, or to none, according to the
five fixed fields of the packet. Packets belonging to a particular reserved flow are
inserted into the corresponding flow queue; basically, in the IntServ architecture,
each reserved flow has its own individual queue. Packets that do not belong to
any reserved flow are classified as best-effort traffic and, in most cases, inserted
into a single FIFO queue. In a better situation, these packets are placed into
some improved version of a FIFO queue. In any case, it is necessary to reserve
some portion of the resources for best-effort traffic in order to avoid starving it.
After a packet enters its individual flow queue, the next component to
discuss is policing. It monitors the incoming rate of each traffic source to
determine whether it conforms to the behavior declared previously.
Nonconforming packets may be dropped, or delayed until they conform to the
agreed behavior. Sometimes the policing component is omitted; its existence
depends on whether absolute rate control is necessary. Take an example to
explain this demand: for schedulers that innately share the residual bandwidth
among other flows, the policing component is necessary if there is an upper
bound on the bandwidth one flow may obtain.
Next, the scheduler selects the packet to deliver from among the head
packets of all policed flow queues, according to their reserved bandwidth
requirements. The common goal of the scheduler in IntServ is to do its best to
reduce the worst-case latency among all packets and the difference in treatment
among reserved flows. The role of the scheduler in IntServ is important because
it is the key to providing the guaranteed service, with its critical end-to-end delay
bound, and the characteristic of flow isolation; a more detailed explanation of this
capability will be given in Section 6.2.5. The packets selected by the scheduler
are sent to the output device. In most cases, the output device does not queue
the packets any further, because the scheduled rate must be smaller than or
equal to the physical output rate.
6.2.2 Service Types
In the current Internet, most routers provide only the best-effort service,
without any quality control. Obviously, that makes it difficult to satisfy the
requirements of smooth media transmission, so two additional service types are
defined in the IntServ specification: guaranteed service and controlled-load
service. In this section, we introduce these two service types. Before discussing
them, however, let us look at how source traffic is described in IntServ, because
applications have the responsibility of describing their traffic behavior first, to
help routers allocate and guarantee resources.
Figure 6.7: The internal operating procedures of the data plane in an IntServ
router (flow identification based on source IP, destination IP, source port,
destination port, and protocol ID steers packets into per-flow queues Fq1..Fqn
or the best-effort queue; policing and the scheduler then drain the queues).
Traffic Description Model
In IntServ, traffic is usually described by a leaky bucket model. The leaky
bucket traffic model is an important concept applied in many different
mechanisms. Located at the traffic source, it can be a rate regulator that controls
the packet sending rate. At a router, it can act as a policer that monitors the
incoming traffic behavior of flows and shapes it to the description in the TSpec.
In the following, we introduce the leaky bucket model in its role as a source
traffic regulator.
As shown in Figure 6.8, there is a leaky bucket that can accumulate water
up to a volume limit b. Above it, a water stream fills the bucket at a fixed rate r.
You can imagine that each packet is a person and that water is necessary for
people: the basic principle is that enough water must be available for a packet to
pass through, and the amount of water required is equal to the length of the
packet. If the water is insufficient, the packet is blocked in the queue, and newly
arriving packets may even be dropped if their flow queue is full. On the other
hand, if there are no packets in the queue, water accumulates in the bucket, up
to the bucket volume b. As new packets arrive, water is consumed until the
bucket is empty. From the principle described above, the amount of traffic
permitted by the leaky bucket within time t is bounded by a linear
function: A(t) = r * t + b.
Why is traffic in IntServ described with the leaky bucket model? The major
reason is that it is not an exact traffic model and only bounds the traffic rate in a
Figure 6.8: The operational architecture of a leaky (token) bucket: a token
stream with average rate r fills a bucket of depth b; incoming packets waiting in
the flow queue either leave as rate-permitted packets, bounded by the peak rate
p, or are dropped when the queue overflows.
region, which makes it more resilient and suitable for application to many kinds
of traffic sources.
Open Source Implementation 6.2: Token Bucket Filter in Linux
The token bucket algorithm is widely used for policing or shaping traffic. You can
find its traces in the code of police.c and sch_tbf.c. Below we look at the code
in sch_tbf.c to introduce the implementation of the token bucket in Linux. The
basic parameters and variables used by the token bucket are defined as:
struct tbf_sched_data
{
/* Parameters */
u32 limit; /* Maximal length of backlog: bytes */
u32 buffer; /* Token bucket depth/rate: MUST BE >= MTU/B */
u32 mtu;
u32 max_size;
struct qdisc_rate_table *R_tab;
struct qdisc_rate_table *P_tab;
/* Variables */
long tokens; /* Current number of B tokens */
long ptokens; /* Current number of P tokens */
psched_time_t t_c; /* Time check-point */
struct timer_list wd_timer; /* Watchdog timer */
};
Figure 6.9: The flowchart of the function enqueue() in sch_tbf.c: a packet is
taken from the sk_buff queue; if skb->len > q->max_size it is dropped with
kfree_skb(), otherwise it is appended with skb_queue_tail().
Relating this to the introduction in this subsection, the token bucket (r, b)
maps onto the tbf_sched_data fields (R_tab, buffer, tokens). In the structure
tbf_sched_data, the basic unit of the token bucket size is time. In other words,
given the transmission rate R, the buffer parameter represents the maximum
bound on admitted transmission time, and tokens represents the transmission
time currently admitted.
According to the description above, a packet is admitted to be sent out at
the peak rate when the token count in the bucket covers the packet size. In the
original design, the peak rate is the line rate, the maximum sending rate of a
device; however, a user may sometimes set a peak rate of their own. Thus, TC
uses a second token bucket (P_tab, mtu, ptokens) to support this case. The
two-token-bucket architecture ensures that traffic passing through it conforms to
a mean rate R with a maximum burst time of buffer, and to a peak rate P.
The flowchart of the enqueue() function of TBF is shown in Figure 6.9. It is
easy to understand: a packet is admitted into the queue if the packet size is not
larger than the maximum size max_size. The dequeue() function is shown in
Figure 6.10.
Figure 6.10: The flowchart of the function dequeue() in sch_tbf.c: take one
packet from the skb queue; calculate the interval between the last check and
now; calculate the admitted transmission time toks at the average rate and, if a
peak rate is set, ptoks at the peak rate; if both toks and ptoks are >= 0, admit
the packet for transmission and update tbf_sched_data, otherwise reinsert the
packet at the head of the skb queue.
Guaranteed Service
Guaranteed service provides applications with delivery over a path with
guaranteed available bandwidth and a worst-case end-to-end delay bound. What
is a worst-case end-to-end delay bound guarantee? It means that for every
packet transmitted on the path, the total transmission time will be smaller than
the required bound, provided the sender sends within the reserved constraints.
This characteristic is very important for interactive, real-time media transmission,
because it is the basis for guaranteeing low delay jitter, which directly affects the
reproduction of the media at the receiver.
Based on the description in its RFC, a guaranteed service is invoked by a
sender specifying the flow's traffic parameters and the receiver subsequently
requesting a desired service level. The former, named the Traffic Specification
(TSpec), includes information about the traffic behavior that will be injected into
the network; the latter, called the Reservation Specification (RSpec), describes
the resource requirements of the receiver. The TSpec is composed of five traffic
description parameters that originate from the leaky bucket traffic model. The
RSpec consists of two parameters, a data rate R and a slack term S. In most
cases, in order to get an error-free service, the requirement described in the
RSpec is larger than that in the TSpec. You can find a more detailed introduction
to guaranteed service in RFC 2212.
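For reference, RFC 2212 derives a closed-form worst-case delay bound from the TSpec parameters (token rate r, bucket depth b, peak rate p, maximum packet size M) and the reserved rate R, plus two accumulated error terms C_tot and D_tot exported by the routers along the path. In the common case p > R >= r, it takes roughly the following form; consult the RFC for the exact statement and for the simpler case R >= p:

```latex
D_{e2e} \;\le\; \frac{b - M}{R}\cdot\frac{p - R}{p - r}
        \;+\; \frac{M + C_{tot}}{R} \;+\; D_{tot}
```

The first term is the drain time of the worst-case burst at the reserved rate; the remaining terms collect the per-hop deviations of real schedulers from the ideal fluid server, which is why each IntServ router must export its C and D contributions.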
Controlled-Load Service
Compared with the clear definition of the guaranteed service, the
controlled-load service is blurred. The definition of controlled-load service in
IntServ is to provide a path on which the transmission behavior is like passing
through a low-utilization link. What is a low-utilization link? The RFC does not
state it very clearly. The first
Table 6.1: The service types provided in IntServ

  Service type     | RFC      | Provided QoS                              | Parameters
  -----------------|----------|-------------------------------------------|---------------
  Best effort      | None     | None                                      | None
  Controlled-load  | RFC 2211 | Emulates a lightly loaded network         | TSpec
                   |          | for the application                       |
  Guaranteed       | RFC 2212 | Guaranteed bandwidth, end-to-end          | RSpec > TSpec
                   |          | delay bound                               |
direction explained in the RFC is:
1. A very high percentage of transmitted packets will be successfully
delivered by the network to the receivers.
2. The transit queuing delay experienced by a very high percentage of
delivered packets will not greatly exceed the minimum delay.
The other direction explained in the RFC is:
1. Little or no average packet queuing delay over all time scales significantly
larger than the burst time.
2. Little or no congestion losses over all time scales significantly larger than
the burst time.
You could say that the controlled-load service is merely a better-than-best-effort
service. Although it does not seem as good as the guaranteed service, it is
suitable for some non-critical services because it is cheap. In contrast to the low
utilization of the guaranteed service, caused by offering an absolutely satisfying
service, the controlled-load service takes a resource-sharing approach so as to
increase resource utilization and reduce the cost for each user.
Since it merely emulates a lightly loaded link, no specific guarantees on
bandwidth or delay bound are provided. For an application that wants to invoke
the service, a TSpec describing the traffic behavior is the only required
specification; an RSpec is no longer needed. According to the TSpec, the router
decides whether the resources it owns are enough to let the packets of the flow
feel that they are passing through a low-loss, low-congestion link. More details
can be found in RFC 2211.
6.2.3 Resource Reservation Protocol (RSVP)
As discussed in Section 6.2.1, an application in IntServ needs to invoke a
reservation process to build a path before it begins to send data packets. Thus,
a common resource reservation protocol is necessary between the application and
the routers. RSVP was developed for this purpose by the IETF. However, the
design of RSVP is general and not limited to a specific QoS architecture. In
other words, RSVP does not directly define the internal parameters related to
traffic control. The detailed formats of parameters such as the resource
reservation information are carried in objects that are opaque to RSVP. RSVP
simply plays the role of a signaling protocol and is responsible for delivering
these messages.
Following this general-purpose principle, RSVP also avoids the difficult
routing problem. It only queries the routing table and gets the next hop to forward
the control message. Since the path-building messages are routed by the routing
table, it is not the business of RSVP to decide whether the resources along the
path are enough to satisfy the application's requirement. In fact, that is handled
by the admission control or QoS routing component.
According to the description in the RFC xx, the resource reservation request for
creating a path, called "RESV", is receiver-oriented. In other words, the
receiver gathers the TSpec from the sender's "PATH" message, sets up its own
RSpec, and then sends it in the RESV message to reserve the resources. Because
a reservation is simplex, interactive applications like video conferencing or
VoIP invoke two path reservations, one from each end. After receiving the RESV
message, the sender can begin to transmit packets along the reserved path. A
simple message flow model is given in Figure 6.11.
If you are familiar with ATM networks, you may find that the style of resource
reservation in RSVP is very similar to SVC setup. Indeed it is, but in order
to adapt to the Internet's dynamic network topologies, RSVP takes a soft-state
approach to maintain the reservation status in the routers along the path. The
reservation in each router has a timeout and is automatically deleted once the
timer expires. The advantage is that once the path is changed by the lower-level
routing component, the old path is cancelled naturally. However, a periodic
refresh message is necessary, which brings some additional burden to the
network.
6.2.4 QoS Routing
For each resource reservation request, the IntServ network needs to decide
Figure 6.11: Traffic flow of the RSVP messages (PATH messages travel from the
sender to the receiver, and RESV messages travel back along the same path
through the RSVP-aware routers of the IntServ domain; control-plane packets
are distinguished from data-plane packets).
whether a path exists that satisfies the application's requirement. The current IP
routing protocols use a simple metric such as hop count or delay to calculate the
shortest path for a connection. That information is not enough to handle
complex QoS requirements. For example, the path with the fewest hops is not
necessarily the path with enough bandwidth. Thus, IntServ needs a routing
protocol with QoS considerations to gather information about network resource
usage and supply a suitable path for the application's request.
We first look at the components of a QoS routing problem. A basic routing
problem consists of a request target and a request behavior. The targets mostly
focus on delay, cost, and bandwidth. A bandwidth requirement must be
guaranteed by each link in the path, so it belongs to the link routing problems.
Delay or cost accumulates over all links in the path, so it belongs to the path
routing problems. On the other hand, for the same target, the request behaviors
may differ. Some requirements are greedy and some are critical: the former
belongs to the optimal routing problems and the latter to the constrained routing
problems. Table 6.2 gives a good collection of the basic routing problems. All
the basic routing problems are
Table 6.2: The basic routing problems

Basic Routing Problems    Apply to      Example                     Complexity
Link Constrained (LC)     Bandwidth     BW-constrained routing      Polynomial
Link Optimization (LO)    Bandwidth     BW-optimization routing     Polynomial
Path Constrained (PC)     Delay, Cost   Delay-constrained routing   Polynomial
Path Optimization (PO)    Delay, Cost   Least-cost routing          Polynomial
of only polynomial complexity, but a QoS routing problem may combine multiple
requirements, e.g., a path with a 1 Mb/s bandwidth constraint and a delay
smaller than 20 ms. The complexity of such composite routing problems is
higher; some are even NP-complete, as listed in Table 6.3.
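To make the link-constrained (LC) case concrete, the usual trick is to prune every link whose bandwidth is below the request and then run an ordinary shortest-path search over what remains. The following sketch is ours, not from the book; the toy topology, node count, and function names are invented for illustration:

```c
#include <string.h>

#define N   5            /* nodes in the toy topology   */
#define INF 1000000      /* "no feasible path" marker   */

/* bw[u][v]: available bandwidth on link u->v; 0 means no link */
static int bw[N][N];

static void build_example(void)
{
    memset(bw, 0, sizeof(bw));
    bw[0][1] = bw[1][0] = 10;   /* a fat two-hop path 0-1-2 */
    bw[1][2] = bw[2][1] = 10;
    bw[0][2] = bw[2][0] = 2;    /* a thin direct link 0-2   */
}

/* BW-constrained routing: prune links below need_bw, then BFS
 * for the minimum hop count among the surviving links.        */
int bw_constrained_hops(int src, int dst, int need_bw)
{
    int dist[N], queue[N], head = 0, tail = 0;
    for (int i = 0; i < N; i++)
        dist[i] = INF;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < N; v++)
            if (bw[u][v] >= need_bw && dist[v] == INF) {
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
    }
    return dist[dst];    /* INF means no feasible path */
}
```

Pruning keeps the single-constraint problem polynomial; it is combining two additive constraints, such as delay and delay jitter, that pushes the composite problems toward NP-completeness.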
In order to find a path satisfying such complex requirements, many routing
architectures have been designed. However, there is no standard specifying how
to do QoS routing. According to the scope of the state information, we can
classify the architectures into three classes: local, global, and hierarchical,
as shown in Table 6.5. In the local architecture, each router maintains its
up-to-date local state and routing is done on a hop-by-hop basis. This
architecture is the extreme example of distributed decision computing, and its
major issue is how to decide the path quickly. The second is the centralized
example. In this case, every router is able to maintain global state by
Class          Maintenance of State Information               Routing Strategy
Local          Local state: each node maintains its own       Distributed routing: routing is done
               up-to-date local state.                        on a hop-by-hop basis.
Global         Global state: every node maintains global      Source routing: a feasible path is
               state by exchanging the local state of         locally computed at the source node.
               every node.
Hierarchical   Aggregated global state: detailed state        Hierarchical routing: scalable, but
               information about the nodes in the same        with a significant negative impact
               group and aggregated state information         on QoS routing.
               about the other groups.

Table 6.5: The major issues of QoS routing
Table 6.3: The composite routing problems

Composite Routing   Complexity   Example
LC + PC             P            BW-delay-constrained routing
PC + LO             P            Delay-constrained BW-optimization routing
PC + PC             NP           Delay-delayjitter-constrained routing
PC + PO             NP           Delay-constrained least-cost routing
exchanging the local state of every node. A feasible path is locally computed at
the source node, where the path decision is easier than in the first architecture.
However, an additional message exchange protocol is required, and it is difficult
and non-scalable for each node to keep up-to-date state information about all
nodes. Thus, the third, a mixed architecture, was proposed. Each node keeps
detailed state information about the nodes in the same group and aggregated
state information about the other groups. It is scalable, but there is a
significant negative impact on QoS routing.
QoS routing is a difficult problem and is not yet implemented in any product.
The large amount of dynamic QoS routing requests coming directly from
applications is not easy to handle, so currently a QoS routing algorithm is used
to allocate virtual links with guaranteed resources between routers, and the
admission control component manages those resources and dynamically decides
whether to allocate them to a requesting application.
6.2.5 Admission Control
After an application sends its QoS request to a router, the admission control
component needs to decide whether it can accept the request. The decision is
based on the current resource usage of the output link and the requirement of
the application. How to obtain information about link resource usage efficiently
is the first major issue. Besides, deciding whether the residual resources are
enough to satisfy a user's request without sacrificing resource utilization is
the other important issue.
Usually, we classify the approaches into two classes: one is statistical
based and the other is measurement based.
Statistical Based
In the early years, quality of service was mostly required for multimedia data
transmission. The behavior of these traffic sources is specific; in other words,
they are easy to describe with a mathematical model. Thus, in the
statistical-based approach, the requirement consists of several parameters, and
the router simply evaluates a specific traffic amount function with the values
from the request to decide whether to accept it. However, it is not always true
that the traffic can be modeled well.
Besides, the real difficulty for this class of algorithms is how to define
the traffic amount function under the tradeoff between utilization and loss
probability. For example, we could describe the traffic by two parameters, peak
rate and average rate. Assume the traffic follows an on-off model that
either transmits at its peak rate or is idle. The traffic amount function could
simply be the sum of the peak rates of all connections: if the result stays
under the maximum constraint after adding the peak rate of the new request, the
request is accepted; otherwise it is rejected. This algorithm supplies a
one-hundred-percent guaranteed allocation for the application, but it causes low
resource utilization. What if we use the sum of the average rates as the traffic
amount function instead? Then a user can expect about a 50% probability of
meeting congestion and suffering a delay or a drop.
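The peak-rate test above fits in a few lines. The sketch below is our illustration (the structure and names are invented); it admits a new flow only if the sum of the declared peak rates, including the newcomer's, stays under the link capacity:

```c
/* Statistical-based admission sketch: sum-of-peak-rates test.
 * Accepting on peak rates gives a hard guarantee but low
 * utilization; substituting flows[i].avg for flows[i].peak
 * raises utilization at the price of congestion risk.        */
typedef struct {
    double peak;   /* declared peak rate, Mb/s    */
    double avg;    /* declared average rate, Mb/s */
} flow_spec_t;

int admit_peak(const flow_spec_t *flows, int n,
               const flow_spec_t *req, double link_capacity)
{
    double sum = req->peak;            /* include the newcomer  */
    for (int i = 0; i < n; i++)
        sum += flows[i].peak;          /* plus all admitted flows */
    return sum <= link_capacity;       /* 1 = accept, 0 = reject  */
}
```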
Measurement Based
Since it is hard to find a suitable traffic amount function, some approaches
measure the current resource usage directly. In order to get a representative
measured value and avoid reacting to a sudden burst, the new usage estimation
can be computed by exponential averaging over consecutive measurements:

Estimation_new = (1 - w) * Estimation_old + w * Measured_new

The weight w lets the admission control component decide how much the past
status matters. A larger w forgets the history more easily, which means the
algorithm is more aggressive and the link runs at higher utilization; however,
users may then often not get the treatment they required. For example, the
admission control may accept a request because the current measurement
estimation happens to dip just below some constraint, yet the estimation may
return to its original high value at the next measurement; the acceptance then
degrades the treatment received by all originally reserved flows.
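In floating point, the exponential averaging step is one line; the kernel version shown in Figure 6.12 does the same thing with integer shifts. This sketch is our illustration:

```c
/* Exponential averaging for measurement-based admission control:
 * Estimation_new = (1 - w) * Estimation_old + w * Measured_new.
 * A larger w weights the newest sample more, forgetting history
 * faster and making admission more aggressive.                  */
double ewma_update(double estimation, double measured, double w)
{
    return (1.0 - w) * estimation + w * measured;
}
```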
The other measurement approach is the time window. The estimation is taken
over several consecutive measurement intervals:

EstimatedRate = max(C1, C2, C3, ..., Cn)

where Ci is the average rate measured in one sampling period. As the value of n
increases, the algorithm becomes more conservative.
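The time-window estimator is equally short; this sketch (ours, for illustration) takes the maximum of the last n per-interval averages:

```c
/* Time-window estimation: the estimated rate is the maximum of
 * the average rates c[0..n-1] measured in the last n sampling
 * periods.  A larger n remembers old peaks longer, so the
 * resulting admission control is more conservative.           */
double timewindow_estimate(const double *c, int n)
{
    double max = c[0];
    for (int i = 1; i < n; i++)
        if (c[i] > max)
            max = c[i];
    return max;
}
```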
Open Source Implementation 6.3: Traffic Estimator
TC provides a simple traffic estimator module for estimating the current sending
rate in bits and packets per second. You can find the module in the file
"estimator.c", which contains three functions. Function
qdisc_new_estimator() handles the creation of a new estimator, and function
qdisc_kill_estimator() deletes an estimator that is no longer in use.
est_timer() is invoked by the system once the configured timer expires, where
the time interval can be (1 << interval) seconds for variable interval > 0. In
function est_timer(), a
sending rate is calculated and the EWMA is computed by the code shown in
Figure 6.12.
6.2.6 Flow Identification
In IntServ, each flow owns its own resource reservation requirement, so flow
identification is necessary to examine each packet and decide which flow, if
any, the packet belongs to. A resource reservation table containing the mapping
between flow numbers and flow identifiers is needed to help identify packets.
The identifier of a flow in IntServ is composed of five fields of the TCP/IP
packet: the source IP address and port, the destination IP address and port,
and the protocol ID. The length of the identifier is 104 bits, much larger than
32 bits, so we need an effective data structure to store the table and perform
the identification. Moreover, because identification is performed on every
packet, identification speed is also an important issue.
Identification is a classical data search problem. There are many data
structures to store the table, but all algorithms trade off between speed and
space. A simple structure to store the information is the binary tree. Its
storage space requirement is small, but multiple memory accesses are necessary
to identify a packet. The other extreme is direct memory mapping, which is fast
but does not fit the space requirement. To balance space and speed, a common
and popular idea is to use a hash structure. However, if we study the hash
structure further, we will find many uncertain factors that affect the
performance of flow identification in the hash table. Thus, in practice, a
better solution is an advanced tree structure like xxxyyy. A more advanced
introduction and example of such structures will be
nbytes = st->bytes;
npackets = st->packets;
/* bytes sent since the last tick, scaled by the averaging interval */
rate = (nbytes - e->last_bytes)<<(7 - idx);
e->last_bytes = nbytes;
/* EWMA update: avbps += (rate - avbps) >> ewma_log */
e->avbps += ((long)rate - (long)e->avbps) >> e->ewma_log;
st->bps = (e->avbps+0xF)>>5;           /* round and rescale to bits/sec */
/* the same procedure for the packet rate */
rate = (npackets - e->last_packets)<<(12 - idx);
e->last_packets = npackets;
e->avpps += ((long)rate - (long)e->avpps) >> e->ewma_log;
e->stats->pps = (e->avpps+0x1FF)>>10;
Figure 6.12: A portion of the code in function est_timer() of estimator.c
described in Subsection 6.3.x, because the flow identification in IntServ is a
small subset of the packet classification in DiffServ.
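To make the hash option concrete, the sketch below folds the 104-bit 5-tuple into a small table index. It is our illustration; the bucket count and the mixing steps are arbitrary choices of ours, not TC's:

```c
#include <stdint.h>

/* The IntServ flow identifier: the 104-bit 5-tuple.  A hash folds
 * it into one of 256 buckets, the usual middle ground between a
 * binary tree (compact but many memory accesses per lookup) and
 * direct memory mapping (one access but an impossible table).   */
typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
} flow_key_t;

#define HASH_BITS 8                 /* 2^8 = 256 buckets */

unsigned flow_hash(const flow_key_t *k)
{
    uint32_t h = k->src_ip ^ k->dst_ip;
    h ^= ((uint32_t)k->src_port << 16) | k->dst_port;
    h ^= k->proto;
    h ^= h >> 16;                   /* mix the high bits down */
    return h & ((1u << HASH_BITS) - 1);
}
```

Collisions still require comparing the full 5-tuple within a bucket, and skewed traffic can load some buckets heavily; these are exactly the uncertain factors mentioned above.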
Open Source Implementation 6.2: Flow Identification
According to the definition of IntServ, a flow is identified by five fields. In TC of
Linux, flow identification is implemented by the two-level hash structure shown
in Figure 6.13. The first-level hash is keyed on the destination address,
protocol ID, and tunnel ID, and its result addresses the RSVP session that a
packet belongs to; in RSVP, a session is identified by the destination address,
port, and protocol ID. Then, using the second-level hash owned by the RSVP
session and keyed on the source address and port, the flow that the packet
belongs to is identified. The major function supporting flow identification is
rsvp_classify(), whose flowchart is shown in the left part of Figure 6.14. The
flowchart of the function rsvp_change(), which allows the user to add a new flow
identification filter or modify an existing one, is shown in the right part of
Figure 6.14.
Figure 6.13: The double-level hash structure in the source code CLS_RSVP
(the first-level hash rsvp_head, keyed by hash_dst(), holds 256
(dst, protocol ID, tunnel ID) lists of rsvp_session entries; each
rsvp_session owns a second-level hash, keyed by hash_src(), with 16
(src, src port) lists plus one wildcard-source list of rsvp_filter entries).
6.2.7 Packet Scheduling
Many scheduling algorithms have been proposed in different domains. In the
IntServ architecture, every reserved flow has its own queue and all packets
belonging to the flow are inserted into that queue, so a scheduler should at
least give all flows their expected treatment. Besides, a worst-case delay
bound is important for some critical traffic. Thus, the schedulers discussed in
this subsection are constrained to the fair queuing style. An additional
feature is sharing the residual bandwidth among the flows in proportion to
their allocated bandwidth ratios. According to their design concepts, we can
classify the schedulers into two classes: one is round-robin based and the
other is sorted based.
Round Robin Based
The algorithms in this class are heuristic. Below we take the Weighted
Round Robin (WRR) scheduler, the most popular one, to introduce the class. In
WRR, each active flow can send out a particular number of packets in one round,
and that number is proportional to the flow's weight. The method is simple, but
it only performs well in environments where all packets have a fixed length. An
improved version, Deficit Round Robin (DRR), was proposed to solve this
problem. Compared with WRR, it is more adaptable to current network
environments such as the Internet.
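A minimal sketch of the DRR idea (ours, not taken from TC): each flow accumulates a quantum of byte credit per round, and a packet is sent only if the flow's deficit covers its length, so variable-length packets still share bandwidth in proportion to the quanta:

```c
#define MAXQ 16   /* queued packets per flow in this toy model */

typedef struct {
    int pkt_len[MAXQ];   /* FIFO of queued packet lengths, bytes */
    int head, count;
    int quantum;         /* byte credit granted per round        */
    int deficit;         /* unspent credit carried over          */
} drr_flow_t;

/* Serve one flow for one round of Deficit Round Robin;
 * returns the number of bytes sent.                    */
int drr_serve(drr_flow_t *f)
{
    int sent = 0;
    f->deficit += f->quantum;
    while (f->count > 0 && f->pkt_len[f->head] <= f->deficit) {
        f->deficit -= f->pkt_len[f->head];
        sent += f->pkt_len[f->head];
        f->head = (f->head + 1) % MAXQ;
        f->count--;
    }
    if (f->count == 0)
        f->deficit = 0;   /* an idle flow must not hoard credit */
    return sent;
}
```

A 700-byte packet at the head of a flow with a 500-byte quantum waits one round and goes out in the next, which is how DRR repairs WRR's bias toward flows with large packets.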
Figure 6.14: The flowcharts of two functions in the source code CLS_RSVP
(left, rsvp_classify(): apply hash_dst(), search the rsvp_session list
sequentially, then apply hash_src() and search the rsvp_filter list,
reporting match or no match; right, rsvp_change(): if the rsvp_session does
not exist, insert one; if an rsvp_filter is already assigned, adjust the
classid on modify, otherwise insert a new rsvp_filter).
The implementation of round-robin based schedulers is simple, but this class of
scheduler can hardly support a fine-grained bandwidth guarantee. When the
number of flows is large, a flow may wait a long time for its turn to send
packets. If the source traffic of the flow arrives at a constant bit rate, the
long wait may bring the flow a large delay jitter, which means some packets are
sent out quickly while others are not. In other words, the traditional
round-robin based algorithms can only support fairness over the time interval
of one round.
Sorted Based
The design concept of sorted-based schedulers is very different from that of
round-robin based ones. Before describing it, we first introduce a conceptual
scheduler that applies only to the fluid-model network architecture. Assume
three flows fairly share a 3 Mb/s link. In the fluid-model architecture, the
scheduler is expected to divide the link completely into three virtual links.
Each flow can send packets at 1 Mb/s continuously on its own virtual link
without any delay caused by other flows. Besides, when one flow has no packets
to send, the residual bandwidth is shared fairly among the other flows; in
other words, the other two flows then get 1.5 Mb/s each. This ideal scheduler
is called generalized processor sharing (GPS). But it is impossible to
implement in the current network architecture, because the current architecture
transmits one packet at a time, which is called the packetized model.
Figure 6.15 describes the difference between the fluid model and the packetized
model.
The optimal scheduler does not exist, but the order in which packets finish
transmission in the fluid model can be obtained by computation. Thus, it is
commonly accepted that a packet-model scheduler is good if it selects packets
to send out such that their finishing order is similar to that produced by the
fluid-model scheduler. The idea is good, but for such a
Figure 6.15: The difference in packet transmission order between the fluid
model (flows A and B transmit in parallel) and the packetized model (packets
A1 B1 A2 B2 A3 are interleaved one at a time).
scheduler, the nightmare is how to compute the transmission order simply and
quickly. Many variant algorithms have been proposed to solve the problem;
however, there is a tradeoff between exact bandwidth sharing and implementation
complexity. Below we use one version of the sorted-based scheduler, packetized
GPS (PGPS), to describe the detailed operation of such schedulers.
Packetized GPS
PGPS is also called weighted fair queuing (WFQ). The basic operation is that
each packet gets a virtual finish timestamp (VFT) as it arrives at the flow
queue, and the scheduler selects the packet with the smallest VFT to send out.
The computation of the VFT depends on the virtual system time (VST) at arrival,
the size of the packet, and the reserved bandwidth of the flow the packet
belongs to. The VFTs of the packets determine their transmission order, so a
good VFT computation is the key to emulating the fluid-model scheduler well.
According to the algorithm, if the flow is active, which means there are
packets in its flow queue, the VFT of the next arriving packet equals

F_i^k = F_i^(k-1) + L_i^k / φ_i

where F_i^k is the VFT of the k-th packet of flow i, L_i^k is the length of the
k-th packet of flow i, and φ_i is the flow's allocated ratio of bandwidth.
Theoretically, if the first packets of all flows arrived at the same time and
all flows stayed backlogged forever, the above equation alone would give the
finishing order of packets in the fluid-model scheduler. Unfortunately, that is
impossible. For a non-active flow, the VFT of its first arriving packet is
calculated by

F_i^k = V(t) + L_i^k / φ_i

where V(t) is the virtual system time, a linear function of real time t within
each time interval. In fact, the maintenance of the VST is the really difficult
part of such schedulers. A bad V(t) will cause a newly active flow to get more
or less bandwidth than the flows active from the beginning, which further
affects the scheduler's worst-case delay guarantee.
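The two VFT formulas can be folded into a single update, because for an active flow the previous VFT is ahead of V(t), while for a newly active flow it already lies in the past. The sketch below is our illustration, not the kernel's csz code:

```c
/* Virtual finish timestamp update for WFQ/PGPS.
 * f_prev: VFT of the previous packet of this flow (0 if none),
 * vtime:  virtual system time V(t) at the packet's arrival,
 * len:    packet length, phi: allocated bandwidth ratio.
 * F = max(F_prev, V(t)) + L / phi covers both the active and
 * the newly active case described in the text.                */
double wfq_vft(double f_prev, double vtime, double len, double phi)
{
    double start = f_prev > vtime ? f_prev : vtime;
    return start + len / phi;
}
```

The scheduler then transmits the queued packet with the smallest VFT, approximating the fluid-model finishing order.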
Open Source Implementation 6.3: Packet Scheduling
For each flow, the csz_qdisc_ops module allocates a structure csz_flow to
keep its information. Two variables, start and finish, keep the minimal and
maximal finish timestamps of the packets in the flow queue. In principle, the
head packet of the flow queue has the smallest finish timestamp and the tail
packet has the largest one. Besides the structure csz_flow, the
csz_qdisc_ops module maintains two lists, s and f, to conveniently
implement the PGPS scheduler. The items in the lists are pointers to csz_flow
structures. The list s is ordered by the variable start in csz_flow and
contains only the active flows; it allows the function csz_dequeue() to
quickly pick the next packet to transmit from the proper flow queue. The list f
is ordered by the variable finish and supports the calculation of the virtual
system time of PGPS in the function csz_update(). Below we introduce the three
major functions in the csz_qdisc_ops module and show their flowcharts
respectively.
The function csz_enqueue() is the entry of the module, and its flowchart is
shown in Figure 6.16. For an arriving packet, csz_enqueue() first calculates
its virtual finish timestamp (VFT). To calculate the VFT of a packet belonging
to a non-active flow, the current system virtual time is necessary, so the
function csz_update() is invoked before the calculation. For a non-active flow,
csz_enqueue() additionally needs to wake the flow up by inserting its pointer
into the list s, which gives the flow a chance to send out packets again.
The function csz_dequeue() continuously sends out the head packet of the
flow queue pointed to by the first item in the list s. Every time a packet of
one flow is sent out, csz_dequeue() calls csz_insert_start() to re-insert the
pointer of the flow into the list s, keeping its chance to transmit in the next
round if the flow queue is non-empty. A flow whose queue is empty disappears
from the list s to avoid wasting system resources.
Figure 6.16: The flowchart of the function csz_enqueue() (csz_classify() gets
the flow id; csz_update() updates the VST; the flow queue length is checked and
the packet is dropped if the queue is full; if the flow is active the new VFT
is calculated from the last VFT, otherwise from the VST, in which case
csz_insert_start() and csz_insert_finish() wake the flow up; finally
skb_queue_tail() enqueues the packet).
The third function, csz_update(), is the magic and the key of the
csz_qdisc_ops module: the calculation of the system virtual time. According to
the description of PGPS, the system virtual time should be recalculated every
time a packet arrives or departs. However, with the maintenance of the list f,
the csz_qdisc_ops module
Figure 6.18: The flowchart of the function csz_update() (get the time
interval, delay, between now and the last packet arrival; if any flow is
active, get the minimum VFT, F, from the tail packet of the head item in the
list f; assume all flows are active and calculate the current VST from delay
and the last VST; if F > VST, all remaining flows are still active and the VST
just calculated is right; otherwise the flow pointed to by the head item in the
list f is no longer active, so compute the VST at the time that flow sent its
last packet, adjust delay to the interval between now and that time, and
repeat).
Figure 6.17: The flowchart of the function csz_dequeue() (get the csz_flow
whose head packet has the smallest VFT; skb_dequeue() gets the head packet of
the flow; recalculate the minimum VFT in the flow; if the flow is non-empty,
csz_insert_start() re-inserts it; return the packet for sending out).
just recalculates the SVT when a packet arrives, and this is done by the
function csz_update(). First, csz_update() gets the time interval, delay,
between now and its last invocation. Secondly, it assumes that all flows have
stayed active since the last invocation and calculates the current SVT. Then,
the SVT is compared with the variable finish of the head item in the list f. If
the SVT is larger than that variable, the flow must have become non-active:
csz_update() removes it from the list f, calculates the SVT at the time the
flow became non-active, and corrects delay to the interval from that time to
now. Then csz_update() executes the second step again until the correct SVT is
obtained and all non-active flows have been removed from the list f.
6.2.8 Summary
We have introduced the IntServ architecture and its related key components in
this section. The architecture attempts to provide end-to-end QoS resource
guarantees over an IP network. Two QoS service levels, guaranteed service and
controlled-load service, are specified in the RFCs. RSVP is used as a common
language to negotiate the resource reservation with the admission control
component of each router on the path between the sender and the receiver.
Each IntServ router, in order to enforce the negotiated result, first needs
to identify the flow that every packet belongs to and then transmit the packets
at the correct time. To control the transmission times of all flows efficiently
at the same time, a packet scheduling component is deployed before the output
device.
In summary, IntServ is indeed an explicit architecture, though it may be
too complex for ISPs to deploy commercially. The traffic control components
developed for the architecture, such as packet scheduling, are the foundation
of later architectures like Differentiated Services. If we look at QoS IP
networks from a historical viewpoint, IntServ can be regarded as the
development period of QoS tools.
6.3 Differentiated Service
In this section we will introduce another IP-based QoS architecture, which
should be a more feasible model for deployment in the Internet. We describe the
architecture and concepts of DiffServ in Subsection 6.3.1, where a comparison
with IntServ can also be found. In Subsection 6.3.2, the field in the IP header
used by DiffServ is introduced in detail. The remaining subsections describe
the key elements of DiffServ respectively.
6.3.1 Concept
Although IntServ supplies an accurate quality of service, the IntServ
architecture is too complex for ISPs. Especially for core routers, which must
classify huge volumes of traffic arriving from different applications, the
highly complex design makes it hard to provide good performance. Besides, too
many applications are already in daily use, and it is impossible to change the
code of all of them to adapt to the IntServ network in a short term. Thus, a
simpler, more scalable, and more manageable solution was needed. Differentiated
Services (DiffServ) was designed for this goal.
General Model
A DiffServ network is composed of one or more DiffServ domains, and one
DiffServ domain is composed of several routers. According to their
capabilities, the routers in DiffServ are divided into two types, as shown in
Figure 6.19. A router at the boundary of the domain is called an edge router,
while one in the interior of the domain is called a core router. When packets
enter a DiffServ domain, they must pass through an edge router first. For each
packet, there are two stages at the edge router. The first stage identifies and
marks packets based on some predefined policies; the mark on a packet affects
the forwarding treatment the packet receives in the domain. The second stage
polices and shapes packets based on the traffic profile described and
negotiated by the customer and the service provider. This stage ensures that
the traffic injected into the domain is within the service capacity of the
domain. Within the interior of the domain, no further classification or
profiling is performed. The
Figure 6.19: The basic architecture of the DiffServ network (edge routers,
acting as ingress and egress, police, mark, shape, and drop packets at the
boundary of the DiffServ domain, while core routers simply forward packets).
remaining stage, executed by the core routers, is to forward packets with a
particular behavior according to their marks.
Comparison with IntServ
Compared with IntServ, the DiffServ architecture is simpler but coarser. Table
xx shows the major differences between the two architectures. First, DiffServ
does not support resource reservation for a single flow; a large number of
flows seriously reduces the performance of several key components in IntServ
such as the scheduler and the classifier. In DiffServ, arriving traffic is only
divided into several groups called forwarding classes, where each forwarding
class represents a predefined forwarding treatment. We introduce the different
forwarding treatments of DiffServ in detail in Subsection 6.3.3.
Second, packet classification is handled only at the boundary of the
DiffServ domain. That is, only the edge routers need to classify and mark
each packet entering the domain according to some predefined policies. The core
routers forward packets with different behaviors based solely on the marks in the
packet headers. This design avoids the complex and difficult problem of
classifying and scheduling a huge number of packets in high-speed core routers,
which is one of the major reasons for the failure of IntServ.
Third, the IntServ specification clearly defines the services that IntServ
provides, but the DiffServ specification only specifies the forwarding behaviors
Table 6.x: The differences between DiffServ and IntServ

Compared Items            DiffServ              IntServ
Work region               Domain                End-to-end
Guarantee required        Provisioning          Reservation
Defined in the standard   Forwarding behavior   Service type
Router capability         Edge and core         All in one
Manageable unit           Class                 Flow
[Figure: bits 1-8 of the field. The IP TOS byte holds a 3-bit Precedence, the D, T, and R bits, and 2 unused bits. The DS byte holds a 6-bit DSCP (2^6 = 64 behaviors: 8 Class Selector PHBs, 12 AF PHBs, 1 EF PHB, 1 Best Effort PHB) and 2 unused bits.]
Figure 6.20: The DS field redefined from the TOS field of the IPv4 header.
in the core router. A forwarding behavior describes how a packet is
forwarded in one hop, and it affects the service the packet receives. The
services provided in DiffServ are decided and designed by the service provider.
The fourth difference is that in DiffServ the resources are estimated and
provisioned before customers use them, while in IntServ resources are allocated
and reserved when a customer issues a request. Besides, the DiffServ architecture
is composed of multiple DiffServ domains, which helps network management.
IntServ, in contrast, emphasizes end-to-end service, which is hard for service
providers to implement over a large-scale area.
6.3.2 DS Field
A packet entering a DiffServ domain is marked by the edge router. This mark
tells the core routers how to treat the packet. Since DiffServ runs directly
on the IP network without adding another layer, the mark must use
a field in the existing IP header. Thus, DiffServ reclaims the 8-bit Type of Service
(TOS) field in the IPv4 header to indicate forwarding behaviors. The
replacement field is called the DS field, and only 6 of its bits are used as a DS
CodePoint (DSCP) to encode the PHB, as shown in Figure 6.20.
The DSCP field can represent 64 distinct values, and many kinds of codepoint
space allocations have been proposed. The standard divides the space into
three pools, as shown in Table 6.7. The codepoints defined in pool 1
correspond to the major standard PHBs, which are described in detail in
Subsection 6.3.3. The codepoints in the other two pools are reserved for
experimental and local use.
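As a concrete sketch of the bit layout, the helpers below (the function names are illustrative, not from any standard API) extract and build the 6-bit DSCP within the DS byte, and decide which allocation pool of Table 6.7 a codepoint falls in, following the bit patterns of RFC 2474:

```python
def dscp_from_ds_byte(ds_byte):
    """The DSCP occupies the upper 6 bits of the DS (former TOS) byte;
    the lowest 2 bits are unused by DiffServ."""
    return (ds_byte >> 2) & 0x3F

def ds_byte_from_dscp(dscp):
    """Encode a 6-bit DSCP into a DS byte, leaving the low 2 bits zero."""
    return (dscp & 0x3F) << 2

def dscp_pool(dscp):
    """Allocation pool of a codepoint: pool 1 (xxxxx0) is assigned by
    standards action; pools 2 (xxxx11) and 3 (xxxx01) are reserved for
    experimental and local use."""
    if dscp & 0b1 == 0:
        return 1
    return 2 if dscp & 0b11 == 0b11 else 3
```

For example, the EF codepoint 101110 (decimal 46) yields the DS byte 0xB8 and, ending in 0, sits in pool 1.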
6.3.3 Per-Hop Forward Behavior
In this subsection, we introduce the four major forwarding behavior types and
their corresponding recommended codepoints defined in the standard. The first
Pool   Codepoint space   Assignment policy
1      xxxxx0            Standards action
2      xxxx11            Experimental and local use
3      xxxx01            Similar to the above, but may be subject to standards action
Table 6.7: The allocated space of the codepoints
two types provide some limited backward compatibility, since the
DS field is redefined from the original IP TOS field. The other two are new PHB
groups standardized by the IETF, the Assured Forwarding (AF) PHBs and the
Expedited Forwarding (EF) PHB, which provide services with some degree of
quality.
Default PHB
For most packets in the original IP network, the TOS field is unused and its
value is set to zero. To let such DiffServ-unaware packets
pass through a DiffServ network painlessly, DiffServ defines
the default DSCP value as 000000, which equals the value of the TOS
field in most DiffServ-unaware packets. DiffServ inserts these packets
into a non-policed queue and reserves a minimal bandwidth for them.
Class Selector PHB Group
Though in most cases the TOS field is not used, some vendors
do use the first 3 bits of the field. To allow older implementations of
IP to coexist with DiffServ implementations, a DSCP of the form
xxx000 is recommended to map to a group of PHBs that corresponds to a
set of relative priorities for the traffic. A packet with a higher DSCP value is
expected to have a higher relative priority than one with a lower value.
AF PHB Group
The PHBs in this group are expected to forward all packets successfully as long as
the traffic source conforms to its traffic profile, the Traffic Conditioning Agreement
(TCA). Traffic exceeding its TCA is forwarded on a best-effort
basis: if the traffic load is light, these packets are still likely to be
forwarded successfully, but under heavy load they are discarded with a
higher probability. It is worth mentioning that all packets that are forwarded
successfully must leave the hop in their original transmission order.
There are 4 forwarding classes in the AF PHB group, and each class is
allocated a certain bandwidth and buffer space. Within each class, traffic is divided
into 3 drop precedence levels; in other words, there are 12 individual PHBs
in the group. When the buffer of a class is nearly full, which implies that the
arriving traffic exceeds the bandwidth allocated to the class, packets in
the class with a high drop precedence level are discarded with a higher
probability than packets with a low drop precedence level.
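The 12 AF codepoints follow a regular bit layout: 3 bits for the class, 2 bits for the drop precedence, and a trailing zero. A minimal sketch of the recommended encoding (function and variable names are ours, not from any API):

```python
def af_dscp(af_class, drop_prec):
    """Recommended DSCP for the AF PHB 'AF<class><drop>': the codepoint
    is laid out as 3 class bits, 2 drop-precedence bits, and a final 0."""
    assert 1 <= af_class <= 4 and 1 <= drop_prec <= 3
    return (af_class << 3) | (drop_prec << 1)

# Enumerate all 12 AF PHBs in the group.
af_table = {(c, d): af_dscp(c, d) for c in range(1, 5) for d in range(1, 4)}
```

So AF11 is 001010 (decimal 10) and AF43 is 100110 (decimal 38); within a class, a numerically larger codepoint means a higher drop precedence.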
To avoid congestion within a class, the amount of traffic
arriving into the class needs to be controlled. Moreover, because the class of a
packet does not change within a DiffServ domain, the edge router needs to
admit, shape, and even drop packets to keep the DiffServ domain from being
overloaded. As discussed in Subsection 6.3.1, quality of service in DiffServ
is based on provisioning and monitoring at the edge routers.
In fact, deciding whether congestion has occurred is an interesting
research issue. Many buffer management algorithms have been proposed
to detect congestion and reduce its effects in advance, such as
random early detection (RED). We will look at these buffer management algorithms in
Subsection 6.3.6. For a more detailed and formal description of the AF PHBs,
see RFC 2597.
EF PHB
The EF PHB attempts to forward packets with low loss, low latency, and low
jitter. It aims to provide performance similar to a traditional point-to-point
leased-line service. To offer these three characteristics in DiffServ, the
core router must at all times offer at least enough bandwidth to
transmit the EF traffic at the rate given in its profile.
Besides, EF traffic is allowed to preempt other traffic types in the core router
so that the three "low" characteristics can be supported more easily. That is, if the core
router uses a priority queue to manage the bandwidth among all types of
forwarding behaviors, EF traffic may own the highest forwarding priority.
However, to avoid starving all other traffic and to ensure that the EF traffic itself
is forwarded smoothly, a strict bandwidth constraint is necessary. A shaper
implemented by a leaky bucket installed at the edge router
is a good tool to reach this goal. Out-of-profile traffic may be forwarded
with the Default PHB or even discarded at the edge router.
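A minimal sketch of such an edge shaper, written here in its token-bucket-equivalent form (all names, units, and parameters are illustrative): bursts above the profile rate are not rejected but delayed, so traffic enters the domain smoothed to the contracted rate.

```python
class LeakyBucketShaper:
    """Shaper sketch: a burst allowance refills at the profile rate;
    a packet that exceeds the current allowance is delayed until
    enough allowance has accumulated."""
    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = float(rate_bytes_per_sec)
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)  # available allowance, in bytes
        self.last = 0.0                   # time of the previous release

    def release_time(self, now, size_bytes):
        """Earliest time this packet may be sent into the domain."""
        # Replenish the allowance for the elapsed time, capped at the burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        # Wait until the allowance covers the packet, then consume it.
        wait = max(0.0, (size_bytes - self.tokens) / self.rate)
        self.tokens += wait * self.rate - size_bytes
        self.last = now + wait
        return now + wait
```

With a rate of 1000 bytes/sec and a 1500-byte burst, a 1500-byte packet at time 0 leaves immediately, but a second 1000-byte packet arriving at the same instant is held until time 1.0.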
Compared with the AF PHBs, the EF PHB offers higher quality of service but
tolerates less traffic burstiness. For traffic with a constant bit rate and a high
transmission quality requirement, the EF PHB is a good choice, while the AF PHBs
are better suited to traffic whose rate is somewhat bursty but which is loss-tolerant.
A comparison of the AF and EF PHBs and their relative features is shown in Table xx.
A more detailed description can be found in RFC 2598.
6.3.4 A Packet Life in a DiffServ Domain
A packet passing through a DiffServ domain goes through three stages:
the ingress, egress, and interior stages. The first two are
handled by edge routers, while the latter is handled by the core
routers. In the following, we introduce each stage by describing the
operations in the routers in detail.
Ingress Stage
As shown in Figure 6.21, in the ingress stage a packet passes through
three blocks: traffic classification, traffic conditioning, and traffic forwarding. In the
first block, the classifier uses some policies to identify the arriving traffic and
tells the subsequent components which traffic profile should govern
the traffic's behavior. The classified packet streams are then passed into the
second block, traffic conditioning.
In the second block, according to the definitions in the profile, the
meter measures the traffic and categorizes each packet as in-profile or out-of-profile.
For in-profile packets, the marker sets a suitable codepoint that, in principle,
lets them pass through the domain successfully. Out-of-profile
packets may be dropped, or marked with a codepoint corresponding to a
forwarding behavior with a high drop probability. Alternatively, they are simply passed
into the shaper like the in-profile packets; however, unlike in-profile packets,
which pass through the shaper almost without delay, they are held back
until they conform to their traffic profile.
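The meter/marker pair can be sketched as a token-bucket conformance test (class and parameter names here are illustrative, not from any standard): tokens accumulate at the profile rate up to a burst size, and a packet that finds enough tokens is in-profile.

```python
class TokenBucketMeter:
    """Meter sketch for the traffic-conditioning block: classifies each
    arriving packet as in-profile or out-of-profile against a profile
    given by a rate and a burst size."""
    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = float(rate_bytes_per_sec)
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def mark(self, now, size_bytes):
        # Accumulate tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return "in-profile"        # marker sets the normal codepoint
        return "out-of-profile"        # drop, remark, or hold in the shaper
```

The verdict then drives the marker, dropper, or shaper described above.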
The marked packets are then inserted into the corresponding class queue in
the traffic forwarding block. The DSCP classifier here is far simpler
than the packet classifier mentioned in the first block: it only looks at the DS
[Figure: within the DS domain, a Packet Classifier feeds a Traffic Conditioning block (Meter, DSCP Marker, Shaper, Dropper), which feeds a Traffic Forwarding block (DSCP Classifier, Class Scheduler).]
Figure 6.21: The ingress stage of a packet in the edge router.
field of the packet, marked in the traffic conditioning block, and dispatches it to
the corresponding class queue. The class scheduler then forwards packets
from each class queue at the particular forwarding rate decided in the
design of the network service.
Interior Stage
In contrast to the multiple processing blocks in the ingress stage, there is only
one block in the interior stage, as shown in Figure 6.22. The simple architecture
reduces the implementation cost of the core router and increases its
forwarding speed. The core router is only responsible for triggering per-hop
behaviors based on the DSCPs of the packets, which is similar to the processing in the
third block of the ingress stage.
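A core router's forwarding decision thus reduces to a table lookup keyed by the DSCP. A minimal sketch, using the recommended codepoints from this section (the queue names are illustrative):

```python
# DSCP -> class queue. Unrecognized codepoints fall back to best effort.
PHB_TABLE = {0b000000: "default", 0b101110: "EF"}
PHB_TABLE.update({p << 3: "CS%d" % p for p in range(1, 8)})           # class selectors
PHB_TABLE.update({(c << 3) | (d << 1): "AF%d%d" % (c, d)
                  for c in range(1, 5) for d in range(1, 4)})         # the 12 AF PHBs

def select_queue(dscp):
    """Dispatch a packet to a class queue based solely on its DSCP."""
    return PHB_TABLE.get(dscp, "default")
```

No per-flow state is consulted, which is exactly what keeps the core router simple and fast.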
Egress Stage
6.3.5 Packet Classification
The quality of service provided in a DiffServ domain is based on provisioning,
so it is important for DiffServ to verify the arriving traffic against its TCA and to
control the amount of traffic injected into the domain. To know which
traffic a profile applies to, a packet classifier is necessary. Classification
is not only applied in DiffServ but also used in many other domains, such as
firewalls.
[Figure: in the core router, the control plane holds the routing database; the data plane contains the Traffic Forwarding block (DSCP Classifier, Class Scheduler) within the DS domain.]
Figure 6.22: The interior stage of a packet in the core router.
Basic Requirements
The traditional role of the classifier in a router is to help find the
forwarding target of a packet, which is a one-dimensional
longest-prefix-matching classification. There is only one matching field, the
destination IP address, but the matching value may fall in a range. In IntServ,
the classifier lets the router identify which flow a packet belongs to. A flow
in IntServ is defined by five fixed fields in the IP and TCP packet
headers. Compared with traditional routing, the number of
matching fields is 5, with a total length of 104 bits, but there is only one
specific pattern that identifies a flow. Compared with these two examples, the
classifier in DiffServ is far more complex. DiffServ attempts to provide a very flexible
way to describe which kinds of packets belong to a class, so the classifier in
DiffServ is a multi-dimensional range classifier. The classification conditions may include
the values of several IP, TCP, and UDP header fields. For example, we can put
all packets with a source IP address between 140.113.88.1 and
140.113.88.254 and a port number equal to 100 into the same class, or we can
put all UDP packets whose port number is between 5000 and 6000, which may
belong to audio traffic, into one class. Obviously, the packet classifier in
DiffServ involves two difficult problems: multi-dimensional matching and range matching.
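The two example rules above can be expressed directly as a brute-force, first-match classifier; the rule tuples and class names below are illustrative. Each rule carries a (low, high) range per field:

```python
def ip_to_int(addr):
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    a, b, c, d = (int(x) for x in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

# (src_lo, src_hi, port_lo, port_hi, class) -- first matching rule wins.
RULES = [
    (ip_to_int("140.113.88.1"), ip_to_int("140.113.88.254"), 100, 100, "class-A"),
    (0, 2**32 - 1, 5000, 6000, "audio"),
]

def classify(src_addr, port):
    for src_lo, src_hi, p_lo, p_hi, cls in RULES:
        if src_lo <= ip_to_int(src_addr) <= src_hi and p_lo <= port <= p_hi:
            return cls
    return "best-effort"
```

A linear scan like this is fine for a handful of rules but scales poorly, which is exactly why the algorithms discussed next exist.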
This style of classifier is widely applied in many types of equipment,
such as firewalls and bandwidth controllers, which may have to handle
a large amount of traffic. Moreover, because the classifier sits at the entry of the router
in most cases, even a little delay is hard to tolerate. Thus, scalability and speed
are still the important issues in the design of a classification algorithm. Besides, all
the traditional issues, such as low storage requirements and fast updates, also
remain; they simply become more complex and difficult here.
Classification Algorithms
Below we look at two basic approaches to multi-dimensional range
matching: the trie approach and the geometric approach.
The trie approach has been widely used for longest prefix matching in IP address
lookup. That traditional application is a one-dimensional range matching
problem, which is a special case of multi-dimensional range matching.
The other approach turns the classification problem into a geometric problem.
The values of one field are projected onto a number line, so k classification
fields compose a k-dimensional space. A k-dimensional classification rule is
transformed into a k-dimensional area, and a packet can be represented as a point
in the same space. The packet classification problem is then equivalent to deciding
which area a point belongs to in the k-dimensional space. Let us take a 2D example to
explain the transformation. Assume we want to classify packets according to their 3-bit
source address and 3-bit destination address, which compose a 2-dimensional
space as shown in Figure 6.23. Assume we have three rules, and rule A states that
all packets with a source address between (100, 110) and a destination address
between (001, 101) belong to class 1. This means we can plot a rectangle A in
the space, as shown in Figure 6.23, and rules B and C are drawn by the
same principle. Classifying a packet with (source address, destination
address) = (101, 011) is then equivalent to finding which rectangle the point
(101, 011) falls in. A 2D classification algorithm based on this concept is
presented in [LAKSH98]; the reader is encouraged to read the paper directly.
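The 2D example can be rendered as code: each rule is a rectangle in (source, destination) space and a packet is a point, so classification is a point-in-rectangle test. Rule A's ranges come from the text; the representation is our own sketch.

```python
# Rule name -> ((src_lo, src_hi), (dst_lo, dst_hi)), ranges inclusive.
RECTANGLES = {
    "A": ((0b100, 0b110), (0b001, 0b101)),
}

def locate(src, dst):
    """Return the name of the rectangle containing point (src, dst),
    or None if the point falls in no rule's area."""
    for name, ((s_lo, s_hi), (d_lo, d_hi)) in RECTANGLES.items():
        if s_lo <= src <= s_hi and d_lo <= dst <= d_hi:
            return name
    return None
```

For the packet (101, 011) from the text, `locate(0b101, 0b011)` indeed lands in rectangle A; efficient algorithms replace this scan with geometric data structures.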
6.3.6 Packet Discard
In DiffServ, besides scheduling among class queues, the management within a
single queue is also important, since a queue is usually shared by
a group of users. Especially for the AF service, a packet discarding policy is
expected to tolerate small variations in user demand and, at the same time,
avoid congestion. Below we introduce two kinds of policy.
Tail Drop
This is the simplest and most basic packet discard policy. The policy, normally
used in conjunction with FIFO queuing, drops newly arriving packets when there is no
[Figure: a 2D plane with the 3-bit source address (srcaddr) on one axis and the 3-bit destination address (destaddr) on the other, both ranging over 000-111, containing rectangles A and B.]
Figure 6.23: The 2D example of the geometric classification.
more space left in the queue. Packets continue to be dropped until queue space
becomes available.
Because tail drop is the default policy of FIFO queuing, some problems often
imputed to FIFO queuing also belong to tail drop. For example, when
a bursty source shares a FIFO queue with well-behaved sources, it may
occupy all available queue space in a short time, forcing newly arriving packets
from the well-behaved sources to be dropped. The problem could be avoided by
dividing the single queue into multiple queues, each traffic source owning its own
length-limited queue. However, this means some packets may be dropped even
when the router still has queue space.
So, the approach implemented in many routers today is longest-queue tail drop.
All service queues share a common memory pool, and when there is no more space
to queue a newly arriving packet, the packet at the tail of the longest service queue
is dropped first. With this refinement, service classes exceeding their allocated
service rate have a high dropping probability, while service classes operating
within their allocation maintain short queues and therefore experience a low
dropping probability.
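Longest-queue tail drop in miniature (a sketch, with sizes counted in packets rather than bytes for simplicity): queues share one memory pool, and when the pool is exhausted, the tail of the longest queue is sacrificed to make room.

```python
from collections import deque

def enqueue(queues, pool_limit, qid, pkt):
    """queues: dict mapping a queue name to a deque (head = left end,
    tail = right end). When the shared pool is full, drop the packet
    at the tail of the longest queue, then enqueue the new packet."""
    if sum(len(q) for q in queues.values()) >= pool_limit:
        longest = max(queues.values(), key=len)
        longest.pop()                  # tail drop on the longest queue
    queues[qid].append(pkt)
```

An over-rate class thus pays for its own backlog, while a short queue is never the victim.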
Early Random Drop
If we do not classify services into different queues, a single queue
with a fair-sharing dropping policy is necessary. A possible way is to drop newly
arriving packets with some probability when the queue is expected to become
full, which is termed early drop. The policy gives the source early warning
that queue space will be insufficient, and avoids dropping consecutive packets
in a short time, as tail drop does, which can seriously harm some current
versions of TCP.
The key of the policy is deciding whether the queue is about to become full. A
threshold on the queue length is the most direct way: once the queue length
exceeds the threshold, a probability function is applied to either queue or discard
each newly arriving packet. However, because the variation of the queue length is
very large, a threshold on the instantaneous queue length may not be very suitable.
An algorithm proposed in [Floyd93] presents a better way to reduce the
variation of the queue length estimate and is expected to predict a full queue
more accurately. The algorithm calculates the average queue size and adjusts
the packet discard probability with that value.
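The core of that algorithm (RED) can be sketched as follows; the threshold and weight constants are illustrative, not the recommended values. An exponentially weighted moving average of the queue length, rather than the instantaneous length, drives a drop probability between two thresholds.

```python
import random

class Red:
    """RED sketch: avg is an EWMA of the queue length; below min_th every
    packet is queued, above max_th every packet is dropped, and in
    between the drop probability grows linearly up to max_p."""
    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
        self.min_th, self.max_th = min_th, max_th
        self.max_p, self.w = max_p, weight
        self.avg = 0.0

    def on_arrival(self, queue_len, rng=random.random):
        # The small weight keeps the average smooth against short bursts.
        self.avg = (1 - self.w) * self.avg + self.w * queue_len
        if self.avg < self.min_th:
            return "enqueue"
        if self.avg >= self.max_th:
            return "drop"
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return "drop" if rng() < p else "enqueue"
```

The `rng` parameter is only there to make the probabilistic branch testable; real implementations also handle idle periods and count packets since the last drop, which this sketch omits.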
6.3.7 Summary
6.4 Pitfalls and Misleading
1. Shaper and Scheduler
2. WFQ and WRR
3. Service and Forwarding Behavior
6.5 Further Reading
6.6 Exercises
Hands-on Exercises
Written Exercises
1. As mentioned in Section 6.1, there are six basic components required by a
QoS-aware router. Give a block diagram describing how to design an
IntServ router from these components and the operating relationships among them.
Adding components as needed in your design is, of course, allowed.
2. Assume a traffic stream is regulated by a token bucket with parameters (r, p, B).
Discuss the effect of the token bucket. For example,
what is its expected output? If we modify any one parameter, how
does the result change?
3. There are two common traffic estimation methods introduced in
measurement-based admission control: one is EWMA and the other is the time
window. Compare the difference in estimation between them.
4. There is a 10^7 bits/sec link, and WRR is used for scheduling. Suppose N flows
attempt to share the link and the size of their packets is 125 bytes. We plan
to fairly allocate 8*10^6 bits/sec of bandwidth to half of the flows and the residual
bandwidth to the other half. If N-1 flows are backlogged, what is the worst
possible delay the inactive flow waits before sending its first packet once that
packet arrives?
5. Generally speaking, WRR is suitable for networks whose packets have a
fixed length, and DRR is an improved version able to handle packets of
variable length. In fact, due to its simple implementation, DRR has become more
and more popular. However, it still has a drawback in providing a small
worst-case delay guarantee. Study their worst-case delay guarantees:
does DRR guarantee a smaller worst-case delay than WRR?
6. A trace of the queue length and a periodic calculation of the average queue
length are required in the original RED algorithm, which is a large
load for an implementation. In TC, a better technique is provided to reduce this load.
Observe the source code in the file sch_red.c, try to draw a
flowchart, and describe how the problem is solved in the implementation.