Internet DRAFT - draft-adams-qos-broadband
INTERNET-DRAFT J L Adams BT
individual submission A J Smith Cranfield University
RAP Working Group
Expires May 31, 2002
A New QoS Mechanism for Mass-Market Broadband
Status of this Memo
This document is an Internet-Draft and is subject to all provisions of Section
10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Force
(IETF), its areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be
updated, replaced, or obsoleted by other documents at any time. It is
inappropriate to use Internet-Drafts as reference material or to cite them other
than as 'work in progress'.
The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-
The list of Internet-Draft Shadow Directories can be accessed at
Abstract
This document describes a proposal that deals with congestion conditions that
may arise when a home or SME customer requests too many simultaneous flows to be
forwarded down a DSL link or other access technology. It provides a means of
guaranteeing certain flows while making others (typically the latest, or another
flow selected for policy reasons) the subject of focused packet discards. It
has a number of significant benefits over other possible solutions, such as
classical RSVP, and these are also listed in this document.
1 Introduction
Broadband services delivered over DSL to residential or SME consumers have been
the focus of much interest recently. The potential opportunities include TV
distribution for selected residential areas, combined with voice and data
services. Consumers may select lower value packages to begin with and move
progressively through a process of 'upsell' towards higher value packages. It is
envisaged that an upsell is automatically configured after the consumer selects
it, e.g. using a browser.
This creates a market opportunity where lower value packages may be the normal
offering in early realisations, and higher value packages are added to the
platform in stages as vendor equipment develops. For example, service packages
may exclude TV in the early offering.
Among the higher value packages that could be added later to a service platform
is one which relies on a QoS function controlling the aggregate mix of services
forwarded to each consumer.
This QoS function would protect certain flows that could be pre-selected by the
consumers. Such flows would not be interrupted or subject to packet discard.
This Internet-Draft proposes a new QoS function at the IP layer that provides
policy-based flow protection for consumers. We believe this new function has
advantages over classical RSVP, although it may be accommodated within a more
lightweight version of that protocol.
2.1 Edge Nodes
Edge nodes exist in a network and channel all content from service providers
towards customers. While this could be achieved using a separate ATM VC for
each service type (TV, voice, and data), this becomes very complex if it is
extended so that, for example, the data VC is no longer a single VC but a
separate VC for each of several types of data. In particular, web streaming
would need a separate VC if its QoS is to be treated differently from other
data types. It is therefore advantageous to aggregate all flows onto a single
ATM VC, because each flow can then be given policy-controlled QoS treatment.
This implies that the IP layer has to handle the separate QoS requirements of
each service type.
Several vendors have developed equipment (Edge Nodes) that channels services
using separate ATM VCs, and several are now considering how they can move to
IP-based multiservice aggregation. The device in this document relates to an
improved Edge Node that operates in conjunction with equipment essentially
similar in function to that currently deployed, except for a modification to
the set-top box, which is made so that it can recognise certain new alarm
signals created by the device described here.
The target requirements for Edge Nodes are that large numbers of customers
should be connected (ultimately this may be 100,000 and upwards per Edge Node).
The customer is connected to the Edge Node via a DSL link. In the network a
number of fibre interfaces may be used, e.g. ATM which is well known as a simple
and effective technology to pick out individual or aggregated content flows for
a specific customer and forward them down the correct DSL links.
ATM currently places some restrictions on the maximum link rate; most vendors
currently stop at 622 Mbit/s for ATM interfaces, because their products do not
include SAR chips that go faster than this rate. However, it is expected that
2.5 Gbit/s ATM links will be commonly available within the next two years;
this would permit rates towards groups of customers to be increased. A higher
rate of a few megabit/s per customer would permit the aggregation of TV and
VoD signals into the mix.
We may anticipate patterns of demand to be such that there will be a mix of both
lower rate and higher rate (multimegabit per second) customers on the same link,
enabling that link to handle many hundreds of customers in total.
2.2 Quality of Service
While much effort has been directed by vendors towards the development of an
Edge Node, there is one aspect where further improvements are needed. An Edge
Node must be able to control QoS when congestion occurs, and this is the subject
of the device described in this document.
As an example scenario, consider a customer connected to a Virtual Private
Network (VPN) which in turn is connected to various content sites. The customer
has subscribed to a basic service package, which provides a main content source,
which can include TV and data. This basic service package can be extended, and
the customer is able to select from extra TV or data content sources.
More generally, a customer can be connected to multiple VPNs and receive
additional content via the internet. All these sources of traffic can combine
to cause congestion. Both the simpler case of a single VPN and the extended
case of multiple VPNs lead to the same QoS issue.
An example of this occurs if a source of real time video is demanded at the same
time as streamed media is being viewed by another person in the same home. These
two flows could both have low loss tolerance. If the combined traffic load
produced by these sources is larger than the capacity of the link then it
results in some information being lost (typically from both flows), and the
perceived QoS becomes unacceptable.
Packet discard is best handled at a single point in the network for all
downstream flows to a specific customer and, logically, this point is the Edge
Node. We propose that a shaping function be located at the Edge Node to
control the envelope of traffic destined for any one customer at, or below,
that customer's link capacity. This removes the need for sophisticated
traffic-handling functions in the DSLAM equipment.
3 The device
The device described in this document would modify and improve the above
proposed shaping function. It could also operate equally well in other network
locations, wherever packets are buffered and can be examined in terms of their
flow identities and class type.
Currently, when flows consist of different-priority information, such as video
and data, shapers would first discard the lower-priority flows (typically the
data flows) and protect the video flows. Our device, however, addresses the
problem of equal-priority flows that cause congestion and are unable to slow
down through control mechanisms such as TCP congestion control.
3.1 Classical RSVP: some disadvantages
Classical RSVP can be used for congestion control of IP-based flows. However,
there are disadvantages with the full, heavyweight version of the protocol.
RSVP messages are separate from the higher-level call request and
acknowledgement messages that lead to, e.g., phone ringing (i.e. the H.225
messages). To introduce RSVP into, e.g., the standard voice signalling message
sequence requires the suspension of this sequence and then its resumption
following the successful completion of the RSVP message sequence. This kind of
suspension-resumption methodology would have to be added to the higher-level
signalling sequence of any kind of content, to prevent such content from
starting to flow before the reservations have been made.
If some flows are variable bit rate, RSVP is faced with difficult choices, which
present disadvantages to this solution:
- To admit the latest reservation request based on some average rate, with the
possibility that the flow will exceed this average rate for significantly long
intervals and cause congestion and packet loss both to itself and to other
reserved flows.
- To admit the latest reservation request based on a peak rate, thereby wasting
some of the available capacity through the condition that flows will only be
admitted while the sum of their peak rates is less than the available capacity.
- To operate a function that tries to estimate the remaining available capacity
on a link by estimating a percentile point of the current offered traffic load,
and use this estimate as the condition for accepting or rejecting the latest
reservation request.
Another disadvantage of RSVP (and indeed of any other call admission procedure)
is the need to keep state information on flow arrivals and cessations so that
guaranteed bandwidth can be returned to a notional common pool of available
bandwidth.
Yet another disadvantage is the need to suspend higher-level session control
protocols until RSVP has completed its reservations. This requires certain
timeouts to be implemented so that suspension does not continue indefinitely,
and various failure modes then need to be catered for, requiring additional
state information to be kept. For example, the 'call state' reached at the
point where suspension is implemented needs to be kept, so that it can be torn
down if a failure occurs.
3.2 Device Advantages
All these disadvantages are overcome with the device described in this document.
With this device it is possible to:
- Admit variable bit-rate flows without being constrained to accept only a set
of flows the sum of whose peak rates is less than the available capacity.
- Admit such flows without knowing the remaining capacity of the link.
- Admit flows without being required to keep active/ceased state information on
each flow.
- Admit flows without requiring a suspension of higher-level session control
protocols.
- Provide guarantees to each of the admitted flows except under certain extreme
traffic conditions, when selected flows will be targeted for packet loss,
enabling other flows to continue without any loss or undesirable packet delays.
3.3 Device Operation Description
What follows is a detailed description of how the device operates to achieve
these advantages.
3.3.1 Start Packet
When a flow towards the customer commences, a new control packet must be sent;
we have called this a 'Start Packet'. There is no requirement for that flow to
wait for any processing or acknowledgement of its Start Packet and it can
immediately start transmitting actual data packets after the Start Packet.
A Start Packet is an IP-layer control packet with an identifying field. This
field may be split into two parts, with one part being included in the standard
IP layer. For example, by setting bit 49 (not yet used for any other purpose),
it would be identified as a control packet. The other part of the field is the
first element of the information field that further identifies it as a Start
packet, or as an Alarm message packet. The exact nature of this field needs
agreement among the standards community. In other respects, a Start Packet
carries the same information (destination address, source address, and
source/destination port numbers) as will be on the IP packet headers of the
stream of data packets which form the flow behind the Start Packet.
Because of its field, the Start Packet would be recognisable to a packet discard
device located, for example, at the edge of the network to the customer. The
basic principle is that the Start Packet contains information (such as the IP
header fields of the subsequent data packets) which is loaded into a register by
the Edge Node. Subsequent data packets are examined, and if their headers
match what is in the register, then such packets may be discarded when the
buffer is filled beyond a certain threshold value.
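The register-and-match principle described above can be illustrated with a
minimal Python sketch. All names here (FlowId, Register, should_discard) are
hypothetical and purely illustrative; the draft does not define any concrete
data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowId:
    """The header fields carried in a Start Packet and matched against data packets."""
    src: str
    dst: str
    src_port: int
    dst_port: int

class Register:
    """Holds the identity of the flow currently subject to focused discard."""
    def __init__(self):
        self.flow = None

    def load(self, flow: FlowId):
        # In the basic operation each Start Packet overwrites the previous
        # entry, so the register always contains the latest flow.
        self.flow = flow

    def matches(self, flow: FlowId) -> bool:
        return self.flow is not None and flow == self.flow

def should_discard(register: Register, flow: FlowId,
                   buffer_fill: int, threshold: int) -> bool:
    # Packets are discarded only if they belong to the registered flow AND
    # the buffer is filled beyond the threshold value.
    return buffer_fill > threshold and register.matches(flow)
```

For example, after two Start Packets the earlier flow has in effect entered the
guaranteed area: only the latest flow matches the register when the buffer
exceeds its threshold.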
Note that although we describe the device as operating on flows towards
customers converging at Edge Nodes, there is no restriction that it operate
only in that direction, or at one particular buffer point.
3.3.2 Device Components
The device has a set of functions which are co-located with a buffer to achieve
the advantages listed above. The buffer is part of the proposed shaping
function specific to a single customer, and the output from this buffer towards
the customer is restricted in maximum rate to be compatible with the capacity
of the corresponding link. The function that controls this rate limitation is
a shaper.
The set of functions includes:
- The customer-specific buffer; the implementation of such a buffer need not be
in the form of physically separate buffers per customer. It would normally be
a single buffer shared by all customers, with flow accounting maintained on a
per-customer basis.
- A packet discard function which maintains a state machine specific to a
customer (although it should be noted that the number of states maintained per
customer is far smaller than the number required by RSVP). It also serves to
detect newly arriving Start Packets that are routed to the customer-specific
buffer. It is an assumption of this description that there is already a
routing process set up that routes packets (including Start Packets) destined
for a specific customer towards a customer-specific buffer, where a shaped
output is enforced. The buffer is needed to absorb some degree of burstiness
in the arriving traffic.
- A main processor that controls which flows specific to a given customer may be
subject to focused discards, as discussed further below. As with the buffer, an
actual implementation would normally run a virtual process per customer in a
single processor capable of handling all customers on a link.
- A register which maintains discard control information specific to a given
customer. Again, an actual implementation would use a single register which is
divided on a per-customer basis into a number of virtual registers.
3.3.3 Basic Operation
In its simplest operation, a succession of Start Packets (each preceding a new
flow) is sent towards a customer; each is loaded into the (virtual) register
such that it overwrites the previous one, so the register always contains the
latest flow.
When a packet identity is removed from the register (usually by being
overwritten) the corresponding flow becomes bandwidth-guaranteed except under
certain extreme traffic conditions, to be discussed below. This means that,
normally, no packets are discarded from such a flow when the buffer
experiences congestion. We speak here of such flows as having entered the
'guaranteed area'.
If, over an interval of time, a sequence of flows starts, each with its
corresponding Start Packet, then the normal behaviour of the system described
here allows the earlier flows to move to the guaranteed area. The register
will always retain at least one flow identity, whose packets will be the
subject of focused discard if the buffer becomes too full.
A focused discard is triggered when the buffer sends a control signal to the
main control logic indicating that a fill threshold level has been exceeded.
The main control logic then instructs the discarding function to commence
packet discarding. Before beginning to discard packets, the discarding
function sends two control packets.
The first is sent forward towards the customer; this control packet is called
the 'Congestion Notification' packet. It advises the application resident in
the customer's equipment that a network congestion condition has occurred. The
application may choose to continue receiving those data packets that are not
deleted by the discarding function, or it may close down and indicate network
busy to the user.
The second is sent backwards towards the source to indicate that this flow is
about to become the subject of focused packet discard. Again, the source may
choose to ignore this control packet, or may terminate the flow.
The discard function then commences to discard all packets whose flow identity
matches the identity in the flow register. Packet discarding will continue
until the buffer fill level is reduced to a lower threshold value.
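The two fill levels form a simple hysteresis: discarding starts when the upper
threshold is exceeded and stops only once the fill has dropped to the lower
threshold. A minimal Python sketch of this control loop (class and threshold
names are illustrative, not taken from the draft):

```python
class DiscardController:
    """Hysteresis control for focused discard: start discarding above the
    upper fill threshold, continue until the fill drops to the lower one."""
    def __init__(self, upper: int, lower: int):
        assert lower < upper
        self.upper = upper
        self.lower = lower
        self.discarding = False

    def update(self, fill: int) -> bool:
        if not self.discarding and fill > self.upper:
            # This is where the two control packets (Congestion Notification
            # forward, source notification backward) would be sent.
            self.discarding = True
        elif self.discarding and fill <= self.lower:
            self.discarding = False
        return self.discarding
```

Between the two thresholds the controller keeps its current state, which
prevents rapid on/off oscillation of the discard function.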
The main control logic may also inform a network billing function that flow
discarding has commenced on a specific flow, if the charging arrangements
require this information. In some preferred arrangements the customer will be
billed on a flat-rate basis and therefore it may be unnecessary to send any
indication to a billing function.
If an application chooses to close down on receipt of the Congestion
Notification signal then it is responsible for sending the appropriate signals
to the source end to shut down the flow. These procedures are outside the scope
of this device and will vary from application to application.
3.3.4 More Refined Operations
In a refinement of the simplest way of operating such control functions for
packet discard, a field is utilised in the Start Packet. This field is known as
the 'Rate Advisory' field. This field conveys the peak bit-rate of the flow. The
register is now loaded so that it always retains a set of flows whose rate
advisories sum to N percent (e.g. 5 percent) of the link bandwidth. The value N
can be varied to suit certain known traffic conditions. It caters for the degree
of uncertainty that exists when accepting variable bit-rate flows. Thus, if the
combined set of flows in the 'guaranteed area' bursts to a load-level which is
significantly higher than the link capacity, then focused discard on the set of
flows in the register can be expected to reduce the load by up to N percent of
the link capacity. It provides sufficient flows to focus on for packet discards
to get the buffer fill level down below the threshold value.
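One way to picture the Rate Advisory window is as a queue of recent flows from
which the oldest are released to the guaranteed area as soon as the newer flows
alone cover the N-percent target. The following Python sketch is illustrative
only; the draft does not specify a data structure or a precise release rule.

```python
from collections import deque

def update_window(window: deque, new_flow, link_rate: float, n_percent: float):
    """window holds (flow_id, rate_advisory) pairs, newest on the right.
    Returns the flows released to the guaranteed area."""
    window.append(new_flow)
    target = link_rate * n_percent / 100.0
    released = []
    # Drop the oldest flow while the remaining, newer flows still have rate
    # advisories summing to at least N percent of the link bandwidth.
    while len(window) > 1 and sum(r for _, r in list(window)[1:]) >= target:
        released.append(window.popleft())
    return released
```

For example, with a 10 Mbit/s link and N = 5, flows are retained until the
newer flows' advisories alone reach 500 kbit/s.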
Another refinement concerns how equal-priority flows are subjected to focused
discards when several such flows are currently in the register. It is possible
to operate the discard function so that the latest flow is the most
vulnerable, and earlier flows (which are still retained because they make up a
combined set of flows whose rate advisories sum to N percent) become less and
less vulnerable, until they eventually leave the window and enter the
guaranteed area.
The discarding function will try to control the forwarding rate towards the
buffer according to a leaky bucket principle, where only a limited burst of
packets above a defined rate is permitted to be forwarded to the buffer. This
defined rate is equal to the rate at which the buffer can transmit packets
towards the customer. The discarding function can start by discarding every
packet of the latest flow and only pick on additional packets from other flows
in its register if it would otherwise exceed its burst size restrictions.
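The forwarding limit can be sketched as a conventional token bucket (an
equivalent formulation of the leaky bucket named above); the rate and burst
figures used below are arbitrary examples, not values from the draft.

```python
class TokenBucket:
    """Forwarding limiter: 'rate' units per second are replenished, with at
    most 'burst' units accumulated. Packets that would exceed the limit are
    candidates for discard."""
    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, now: float, size: float) -> bool:
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False
```

In the scheme described above, the discard function would drop every packet of
the latest registered flow and, in addition, any other registered packet for
which allow() returns False.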
There are other ways of operating the discard function, such as policy-based
controls where, instead of the latest flow being the one chosen for total
discard, another flow is chosen on the basis of policy information stored in
the discard function.
A specific way of obtaining such policy information is to make use of a second
control field of the Start Packet. This field is termed the 'sub-components'
field, and allows policy information to be captured within the register and
read by the discard function. When a flow consists of different media
components, such as video and data, this sub-components field stores
information relating to each component, including its priority in terms of
packet discard.
The packet matching performed on the data packets passing through the discard
function includes not only the destination and source addresses but also other
information that uniquely identifies a sub-component. This may include source or
destination port numbers or other information such as TOS QoS settings.
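A minimal sketch of such sub-component matching, assuming a hypothetical
register layout in which each component records a destination port and a
discard priority (the draft leaves the exact field layout to the standards
community):

```python
def match_subcomponent(pkt: dict, entry: dict):
    """Return the discard priority of the matching sub-component, or None if
    the packet does not belong to the registered flow at all."""
    if (pkt["src"], pkt["dst"]) != (entry["src"], entry["dst"]):
        return None
    for comp in entry["components"]:
        # A sub-component is identified here by destination port alone;
        # source ports or TOS/QoS settings could be matched in the same way.
        if pkt["dst_port"] == comp["dst_port"]:
            return comp["discard_priority"]
    return None
```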
The fraction of the link bandwidth that is used as a control for the set of
flows retained in the discard function effectively defines a 'window' size for
flow retention. Thus a flow starts up, its identity enters the discard function,
and exits when further additional flows have arrived whose combined set of rate
advisories makes it no longer necessary to retain this earlier flow. The term
'within the window' is used here to describe flows that are currently
progressing towards guaranteed status.
3.3.5 Failure Conditions
In this section we describe the action taken by the device under failure
conditions:
- It is possible that a start packet may fail to arrive, having been lost
in the network between the point of generation and the device buffer. The
proposed solution is to make the source generate two start packets which are
sent prior to any data packet; and, regardless of the QoS setting of the
subsequent data packets, to mark all start packets with a very high priority or
class setting, thus making the loss of both packets very improbable.
- It is possible that too many flows are requested by the customer within a
very short interval of time, so that their impact on the buffer cannot be
assessed on a one-flow-at-a-time basis. The solution to this is to have a
guard period that is the minimum time that a flow identity can remain in the
window. For example, a counter is reset for each such flow at the moment when
it enters the window, and the flow identity must remain in the window until
some minimum number of data packets has been sent and detected by the discard
function, regardless of any other criteria governing exit from the window.
However, this does not apply to those flows which, for policy reasons, are not
put in the window.
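The guard period can be sketched as a simple per-flow countdown: the entry may
not exit the window until the required number of data packets has been seen.
Names and the packet count here are illustrative.

```python
class WindowEntry:
    """A flow identity held in the window, with a guard-period countdown."""
    def __init__(self, flow_id: str, guard_packets: int):
        self.flow_id = flow_id
        self.remaining = guard_packets  # data packets still to be detected

    def on_data_packet(self):
        if self.remaining > 0:
            self.remaining -= 1

    def may_exit(self) -> bool:
        # The flow may leave the window only once the guard period is over,
        # whatever the other exit criteria say.
        return self.remaining == 0
```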
- It is possible that so many flows are requested by the customer within some
very short interval of time that the control logic has insufficient space to
handle all of their separate identities and guard periods within the window.
The solution to this is for the main processor to have an alarm function that is
triggered by such a condition. If triggered, the discard function is instructed
to send a special Alarm signal towards the customer, indicating that the service
is being abused outside of its expected parameters, and that all flows are being
discarded. The discard function now deletes all packets. The Alarm message
will advise the customer to contact the network or service administrator because
the administrator will need to reset a discard flag and clear the existing
window data after clearing down all flows.
- It is possible that a flow is maintained by the customer as active even
though it has been silent for some time. It now starts up again and creates
congestion. Normally, this would not cause a problem since the flow is either
still in the window, in which case its packets would start to be discarded, or
it has moved to the guaranteed area, in which case other, newer flows will start
to be discarded. There are, however, exceptional conditions when a high bit-
rate real-time flow behaves in this way. Its rate could now exceed the
protective window capacity (e.g. the N percent figure). This would also happen
if some malicious flow delayed the onset of some very high rate until after it
was likely to be guaranteed and then overloaded the buffer.
The solution to these abnormal conditions is to allow the discard function to
randomly choose an additional flow (by selecting this information from any
passing data packet) and add such a random flow to the register window, and
begin discarding on it. This flow could be distinguished from other flows by
setting an additional parameter called the aux flowid parameter to 'emergency'
(usually it is set to 'normal'). The discard function would also send an Alarm
signal to the customer indicating that the operation is outside of the
expected parameters.
The discard function would be triggered into this mode of selecting one or more
additional flows whenever the buffer fill-level hits a second, higher threshold,
generating an alarm signal. Once in this mode (emergency delete mode), the
discard function can repeat the random selection of further flows any number of
times until the buffer loading starts to reduce.
If the buffer load starts to reduce, a buffer alarm off signal is generated,
causing the discard function to perform a stability check before removing any
flowids from the register which have their aux parameter set to the value
emergency. The stability check is designed to prevent the discard function from
removing emergency flowids from the register and then quickly needing to add a
random new set under the conditions that the buffer is quickly oscillating
between alarm off and alarm on signals. It is preferable to keep the same set
of emergency flowids under these conditions, which helps to limit the number of
different flowids that become randomly selected.
The stability check consists of the discard function inspecting flowids for
emergency settings. If any are found, a timeout period is begun, and the
function monitors buffer alarm on/off transitions during the timeout. If the
buffer
generates an alarm during this period, the emergency flows are not cleared from
the register and remain the target of discard. This situation continues until
an alarm off signal is generated by the buffer which causes a further timeout
period to commence. The discard function will always perform the stability
check before removing aux=emergency flowids from the register. The final error
trap used by the discard function protects against a timeout period beginning if
there is already a timeout in progress.
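The stability check described above amounts to a small state machine: an
alarm-off signal starts a timeout (unless one is already running, which is the
final error trap), an alarm-on during the timeout cancels the pending
clearance, and emergency flowids may be cleared only when a timeout completes
undisturbed. A hedged Python sketch, with all names illustrative:

```python
class StabilityCheck:
    """Emergency flowids are cleared only after a timeout period during which
    the buffer raises no further alarm."""
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.deadline = None  # end of the current timeout period, if any

    def on_alarm_off(self, now: float):
        if self.deadline is None:  # error trap: never nest timeout periods
            self.deadline = now + self.timeout

    def on_alarm_on(self):
        # An alarm during the timeout keeps the emergency flowids targeted.
        self.deadline = None

    def may_clear(self, now: float) -> bool:
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return True
        return False
```

Keeping the same emergency flowids across an alarm oscillation, as this logic
does, limits how many different flowids end up randomly selected.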
- It is possible that a flow sends no Start Packet. This may cause existing
flows in the window to be discarded if the additional flow (which has sent no
Start Packet) causes congestion. In the extreme, it will trigger the same
actions and alarm messages as described in the previous condition.
Under the circumstances of the abnormal conditions described in the last two
cases, it is possible that some guaranteed flows are subject to packet
discard, but this should be an exceptional event that is regarded as an alarm
condition.
4 Conclusions
The device described in this document offers a bridge between two worlds. The
narrower the interval that we have termed the window, the more closely the
device emulates the classic connection-oriented paradigm. The latest
connection is the only one in the window; it is therefore either accepted or
subject at any moment to full packet discard. If accepted, it is placed in the
guaranteed area as soon as a further new flow starts up.
On the other hand, the wider the window, the more the device resembles the
classic connectionless world: most flows are vulnerable to packet discard when
the buffer is too full.
Notice also that this device fits with the connectionless paradigm in that
sources are only required to transmit a start packet and then, without waiting
further, start to transmit their data. There is no negotiation (unlike classical
RSVP) yet there are still guaranteed flows in the case of window sizes that are
some small fraction of the link rate.
So we effectively have a new QoS procedure that is based only on Start
Packets; no subsequent response packets are triggered or used. In place of the
additional control messages that would have been expected in the classic
'circuit world', only warning indications are triggered on flows just prior to
packet discard.
The device may be refined by the addition of policy controls governing how a
flow gets into the window. A family may decide, for example, that viewing the
main film on a Thursday is the most important thing that day; it is why they
subscribed to the service, and they want it guaranteed. So, even though the
film is the latest flow, it moves straight into the guaranteed area because of
a policy database that can be written to by the customer using, for example, a
browser. This database information is readable by the main control logic.
When the main control logic is informed by the discard function that a new
Start Packet has arrived, it checks the policy database and determines whether
the flow is to be added to the register or simply ignored, so that it
effectively passes straight to the guaranteed area. If it is moved straight to
the guaranteed area, the possibility of recovering from buffer overload at the
time when the movie starts is still preserved by focusing on the previous
flows which remain in the window.
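The policy lookup on Start Packet arrival can be sketched as follows; the
policy-database representation and return values are purely illustrative, as
the draft does not define them.

```python
def handle_start_packet(flow_id, policy_db: dict, window: list) -> str:
    """On Start Packet arrival, consult the policy database: protected flows
    bypass the window and go straight to the guaranteed area; all others are
    added to the register window as usual."""
    if policy_db.get(flow_id) == "protected":
        return "guaranteed"
    window.append(flow_id)
    return "window"
```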
John L Adams firstname.lastname@example.org
pp MLB G 7
Orion Building (B62-MH) +44 1473 606321
Ipswich IP5 3RE
draft-adams-QoS-broadband-00.txt expires: May 31, 2002