Network Working Group                                     Stewart Bryant
Internet Draft                                               Bruce Davie
Expiration Date: December 2006                              Luca Martini
                                                           Eric C. Rosen
                                                     Cisco Systems, Inc.


                                                               June 2006


                   PWE3 Congestion Control Framework


                   draft-rosen-pwe3-congestion-03.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.


   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Insofar as pseudo wires may be used to carry non-TCP data flows, it
   is necessary to provide pseudo wire-specific congestion control
   procedures.  These procedures should ensure that pseudo wire traffic
   is "TCP-compatible", as defined in [RFC2914].  This document attempts
   to lay out the issues which must be considered when defining such
   procedures.


Bryant, et al.                                                  [Page 1]

Internet Draft     draft-rosen-pwe3-congestion-03.txt          June 2006


Table of Contents

    1          Introduction
    1.1        Conventions used in this document
    1.2        PWE3 and Congestion in IP Networks
    1.3        Is This a Practical Problem?
    1.4        Why isn't this Easy?
    1.5        The Goal of PW-specific Congestion Control
    1.6        Constant Bit Rate PWs
    2          Detecting Congestion
    2.1        ECN
    3          Feedback from Receiver to Transmitter
    4          Responding to Congestion
    5          Rate Control per Tunnel vs. per PW
    6          Fixed Rate of Transmission Services
    7          Mandatory vs. Optional
    8          Informative References
    9          Author's Addresses
   10          Intellectual Property Statement
   11          Full Copyright Statement


1. Introduction

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].


1.2. PWE3 and Congestion in IP Networks

   Congestion in an IP network occurs when the amount of traffic that
   needs to use a particular network resource exceeds the capacity of
   that resource.  This results first in long queues within the network,
   and then in packet loss.  If the amount of traffic is not then
   reduced, the packet loss rate will climb, potentially until it
   reaches 100%.

   To prevent this sort of "congestive collapse", there must be
   congestion control: a feedback loop by which the presence of
   congestion somewhere in the network forces the transmitters to reduce
   the amount of traffic being sent.  As a connectionless protocol, IP
   has no way to push back directly on the originator of the traffic.
   Procedures for (a) detecting congestion, (b) providing the necessary
   feedback to the transmitters, and (c) adjusting the transmission
   rates, are thus left to higher protocol layers such as TCP.

   The vast majority of traffic in IP networks is TCP traffic.  TCP
   includes an elaborate congestion control mechanism which causes the
   end systems to reduce their transmission rates when congestion
   occurs.

   For those readers not intimately familiar with the details of TCP
   congestion control, we give below a brief summary, greatly simplified
   and not entirely accurate, of TCP's very complicated feedback
   mechanism. The details of TCP congestion control can be found in
   [RFC2581]. [RFC2001] is an earlier but more accessible discussion.
   [RFC2914] articulates a number of general principles governing
   congestion control in the Internet.

   In TCP congestion control, a lost packet is considered to be an
   indication of congestion.  Roughly, TCP considers a given packet to
   be lost if that packet is not acknowledged within a specified time,
   or if three subsequent packets arrive at the receiver before the
   given packet.  The latter condition manifests itself at the
   transmitter as the arrival of three duplicate acks in a row.  The
   algorithm by which TCP detects congestion is thus highly dependent on
   the mechanisms used by TCP to ensure reliable and sequential
   delivery.
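
   The two loss signals just described can be modeled, in a greatly
   simplified way, as below.  This is a Python sketch only; the class
   and parameter names (LossDetector, rto) are hypothetical, and real
   TCP implementations differ in many details.

```python
DUP_ACK_THRESHOLD = 3  # duplicate acks that signal a lost segment


class LossDetector:
    """Greatly simplified model of TCP's two loss signals."""

    def __init__(self, rto: float):
        self.rto = rto            # retransmission timeout, in seconds
        self.last_ack = None      # highest cumulative ack seen so far
        self.dup_acks = 0         # consecutive duplicates of last_ack
        self.last_progress = 0.0  # time at which last_ack last advanced

    def on_ack(self, ack: int, now: float) -> bool:
        """Return True when this ack is the third duplicate in a row."""
        if ack == self.last_ack:
            self.dup_acks += 1
            return self.dup_acks == DUP_ACK_THRESHOLD
        if self.last_ack is None or ack > self.last_ack:
            # New data acknowledged: reset the duplicate count.
            self.last_ack = ack
            self.dup_acks = 0
            self.last_progress = now
        return False

    def timed_out(self, now: float, data_outstanding: bool) -> bool:
        """Return True if outstanding data has gone unacknowledged too long."""
        return data_outstanding and now - self.last_progress > self.rto
```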

   Once a TCP transmitter becomes aware of congestion, it halves its
   transmission rate.  If congestion still occurs at the new rate, the
   rate is halved again.  When a rate is found at which congestion no
   longer occurs, the rate is increased by one MTU ("Maximum
   Transmission Unit") per RTT ("Round Trip Time").  The rate is
   increased each RTT until congestion is encountered again, or until
   something else limits it (e.g., the flow control window is reached,
   the application is transmitting at its maximum desired rate, or the
   transmitter is sending at line rate).

   This sort of mechanism is known as an "Additive Increase,
   Multiplicative Decrease" (AIMD) mechanism.  Congestion causes
   relatively rapid decreases in the transmission rate, while the
   absence of congestion causes relatively slow increases in the allowed
   transmission rate.
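
   The simplified AIMD rule above can be sketched as one update per
   RTT.  This is an illustration under stated assumptions, not TCP's
   actual algorithm: the rate is measured in bytes per RTT, the
   increment is one MTU, and min_rate and max_rate are hypothetical
   stand-ins for the other limits mentioned above.

```python
def aimd_step(rate: float, congested: bool, mtu: float,
              min_rate: float, max_rate: float) -> float:
    """Return the rate (bytes per RTT) to use for the next RTT."""
    if congested:
        new_rate = rate / 2      # multiplicative decrease: halve the rate
    else:
        new_rate = rate + mtu    # additive increase: one MTU per RTT
    # Stay within the floor and whatever else limits the sender.
    return max(min_rate, min(new_rate, max_rate))
```

   Starting from 12000 bytes per RTT with an MTU of 1500, the sequence
   congested, clear, clear, congested yields rates of 6000, 7500, 9000,
   and 4500 bytes per RTT.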

   Currently, traffic in IP networks is predominantly TCP traffic.  Even
   the layer 2 tunneled traffic (e.g., PPP frames tunneled through L2TP)
   is predominantly TCP traffic from the end-users.  If pseudo wires
   (PWs) were to be used only for carrying TCP flows, there would be no
   need for any PW-specific congestion control mechanisms.  The existing
   TCP congestion control mechanisms would be all that is needed, since
   any loss of packets on the PW would be detected as loss of packets on
   a TCP connection, and TCP congestion control would ensure a reduction
   of the transmission rate.

   However, if a PW is carrying non-TCP traffic, then there is no
   feedback mechanism to cause the end-systems to reduce their
   transmission rates in response to congestion.  When congestion
   occurs, any TCP traffic that is sharing the congested resource with
   the non-TCP traffic will be throttled, and the non-TCP traffic may
   "starve" the TCP traffic.  If there is enough non-TCP traffic to
   congest the network all by itself, there is nothing to prevent
   congestive collapse.

   The non-TCP traffic in a PW can belong to any higher layer
   whatsoever, and there is no way to retrofit TCP-like congestion
   control mechanisms to all those layers.  Hence it appears that there
   is a need for an edge-to-edge (i.e., PE-to-PE) feedback mechanism
   which forces a transmitting PE to reduce its transmission rate in the
   face of network congestion.

   As TCP uses window-based flow control, controlling the rate is really
   a matter of limiting the amount of traffic which can be "in flight"
   (i.e., transmitted but not yet acknowledged) at any one time.
   Obviously a different technique needs to be used to control the
   transmission rate of the non-windowed protocol used for transmitting
   data on PWs.
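
   This draft does not prescribe a rate-control technique.  One
   conventional choice for a non-windowed sender, assumed here purely
   for illustration, is a token bucket: byte tokens accrue at the
   currently allowed rate up to a burst limit, and a packet may be sent
   only if enough tokens are available.  Congestion control would then
   act by changing the rate.  All names are hypothetical.

```python
class TokenBucket:
    """Rate limiter: byte tokens accrue at `rate` bytes/s, up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start full
        self.last = now

    def _refill(self, now: float) -> None:
        # Credit tokens for the time elapsed since the last update.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def set_rate(self, rate: float, now: float) -> None:
        """Called by congestion control when the allowed rate changes."""
        self._refill(now)     # account for time spent at the old rate
        self.rate = rate

    def allow(self, size: int, now: float) -> bool:
        """Return True (and consume tokens) if a packet of `size` bytes may go."""
        self._refill(now)
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False
```

   Per PW this needs only a token count and a timestamp, which may suit
   the data-plane constraints discussed in Section 1.4.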


1.3. Is This a Practical Problem?

   One may argue that congestion due to non-TCP PW traffic is only a
   theoretical problem.

     - "99.9% of all the traffic in PWs is really IP traffic"

       If this is the case, then the traffic is either TCP traffic,
       which is already congestion-controlled, or "other" IP traffic.
       While the congestion control issue may exist for the "other" IP
       traffic, it is a general issue which is not specific to PWs.

       Unfortunately, we cannot be sure that this is the case. It may
       well be the case for the PW offerings of certain providers, but
       perhaps not for others.  It does appear that many providers want
       to be able to use PWs for transporting "legacy traffic" of
       various non-IP protocols.

     - "PW traffic usually stays within one SP's network, and an SP
       always engineers its network carefully enough so that congestion
       is an impossibility"

       Perhaps this will be true of "most" PWs, but inter-provider PWs
       are certainly expected to have a significant presence.

       Even within a single provider's network, the provider might
       consider whether he is so confident of his network engineering
       that he does not need a feedback loop reducing the transmission
       rate in response to congestion.

       There is also the issue of keeping the network running (i.e., out
       of congestive collapse) after an unexpected reduction of
       capacity.

     - "If one provider accepts PW traffic from another, policing will
       be done at the entry point to the second provider's network, so
       that the second provider is sure that the first provider is not
       sending too much traffic.  This policing, together with the
       second provider's careful network engineering, makes congestion
       an impossibility"

       This could be the case given carefully controlled bilateral
       peering arrangements.  Note though that if the second provider is
       merely providing transit services for a PW whose endpoints are in
       other providers, it may be difficult for the transit provider to
       tell which traffic is the PW traffic and which is "ordinary" IP
       traffic.

     - "The only time we really need a general congestion control
       mechanism is when traffic goes through the public Internet.
       Obviously this will never be the case for PW traffic."

       It is not at all difficult to imagine someone using an IPsec
       tunnel across the public Internet to transport a PW from one
       private IP network to another.

       Nor is it difficult to imagine some enterprise implementing a PW
       and transporting it across some SP's backbone, e.g., if that SP
       is providing VPN service to that enterprise.

   The arguments that non-TCP traffic in PWs will never make any
   significant contribution to congestion thus do not seem to be totally
   compelling.


1.4. Why isn't this Easy?

   One easy solution would be to run the PWs through a TCP connection.
   This would provide congestion control automatically.  However, the
   overhead is prohibitive for the PW application.  The PWE3 data plane
   may be implemented in a microcoded hardware engine which needs to
   support thousands of PWs, and needs to do as little as possible for
   each data packet; running a TCP state machine, and implementing TCP's
   flow control procedures, would impose too high a cost in this
   environment.  Nor do we want to add the large overhead of TCP to the
   PWs -- the large headers, the plethora of small acks in the reverse
   direction, etc., etc.  In fact, we want to avoid acknowledgments
   altogether.  These same considerations lead us away from using, e.g.,
   DCCP.

   Therefore we will investigate some PW-specific solutions for
   congestion control.

   We also want to minimize the amount of interaction between the data
   processing path (which is likely to be distributed among a set of
   line cards) and the control path; we need to be especially careful of
   interactions which might require atomic read/modify/write operations
   from the control path, or which might require atomic
   read/modify/write operations between different processors in a
   multiprocessing implementation, as such interactions can cause
   scaling problems.


1.5. The Goal of PW-specific Congestion Control

   [RFC2914] defines the notion of a "TCP-compatible flow":

       "A TCP-compatible flow is responsive to congestion notification,
       and in steady-state uses no more bandwidth than a conformant TCP
       running under comparable conditions (drop rate, RTT [round trip
       time],  MTU [maximum transmission unit], etc.)"

   TCP-compatible flows respond to congestion in much the way TCP does,
   so that they do not starve the TCP flows or otherwise obtain an
   unfair advantage.

   [RFC2914] further points out:

       "any form of congestion control that successfully avoids a high
       sending rate in the presence of a high packet drop rate should be
       sufficient to avoid congestion collapse from undelivered
       packets."

       "This does not mean, however, that concerns about congestion
       collapse and fairness with TCP necessitate that all best-effort
       traffic deploy congestion control based on TCP's Additive-
       Increase Multiplicative-Decrease (AIMD) algorithm of reducing the
       sending rate in half in response to each packet drop."

       "However, the list of TCP-compatible congestion control
       procedures is not limited to AIMD with the same increase/
       decrease parameters as TCP.  Other TCP-compatible congestion
       control procedures include rate-based variants of AIMD; AIMD with
       different sets of increase/decrease parameters that give the same
       steady-state behavior; equation-based congestion control where
       the sender adjusts its sending rate in response to information
       about the long-term packet drop rate ... and possibly other forms
       that we have not yet begun to consider."

   The AIMD procedures are not mandated for non-TCP traffic, and might
   not be optimal for non-TCP PW traffic.  Choosing a proper set of
   procedures which are TCP-compatible while being optimized for a
   particular type of traffic is no simple task.  [RFC3448], "TCP
   Friendly Rate Control (TFRC)" provides an alternative:

       "TFRC is designed to be reasonably fair when competing for
       bandwidth with TCP flows, where a flow is "reasonably fair" if
       its sending rate is generally within a factor of two of the
       sending rate of a TCP flow under the same conditions.  However,
       TFRC has a much lower variation of throughput over time compared
       with TCP, which makes it more suitable for applications such as
       telephony or streaming media where a relatively smooth sending
       rate is of importance."

       "For its congestion control mechanism, TFRC directly uses a
       throughput equation for the allowed sending rate as a function of
       the loss event rate and round-trip time.  In order to compete
       fairly with TCP, TFRC uses the TCP throughput equation, which
       roughly describes TCP's sending rate as a function of the loss
       event rate, round-trip time, and packet size."

       "Generally speaking, TFRC's congestion control mechanism works as
       follows:

         o The receiver measures the loss event rate and feeds this
           information back to the sender.

         o The sender also uses these feedback messages to measure the
           round-trip time (RTT).

         o The loss event rate and RTT are then fed into TFRC's
           throughput equation, giving the acceptable transmit rate.

         o The sender then adjusts its transmit rate to match the
           calculated rate."
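
   As we read [RFC3448], the TCP throughput equation it uses is
   X = s / (R*sqrt(2*b*p/3) + t_RTO*(3*sqrt(3*b*p/8))*p*(1+32*p^2)),
   where s is the packet size, R the round-trip time, p the loss event
   rate, b the number of packets acknowledged by one TCP ack, and t_RTO
   the retransmission timeout.  The Python sketch below evaluates it
   directly, defaulting to the simplifications b = 1 and t_RTO = 4*R;
   [RFC3448] itself remains the normative text.

```python
import math
from typing import Optional


def tfrc_rate(s: float, rtt: float, p: float, b: int = 1,
              t_rto: Optional[float] = None) -> float:
    """Allowed sending rate, in bytes per second, from the throughput equation.

    s: packet size in bytes; rtt: round-trip time in seconds;
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ack;
    t_rto: retransmission timeout in seconds (defaults to 4 * rtt).
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```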

   Note that the TFRC procedures require the transmitter to calculate a
   throughput equation.  For these procedures to be feasible as a means
   of PW congestion control, they must be computationally efficient.
   Section 8 of [RFC3448] describes an implementation technique that
   appears to make it efficient to calculate the equation.
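
   One way to make the calculation cheap is sketched below; whether it
   matches the technique in Section 8 of [RFC3448] in every detail
   should be checked against that document.  With b = 1 and
   t_RTO = 4*R the equation reduces to X = s / (R * f(p)), where
   f(p) = sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1 + 32*p^2), so f(p) can be
   tabulated once and each update becomes a table lookup, a multiply,
   and a divide.  The grid size and range are hypothetical choices.

```python
import bisect
import math


def f(p: float) -> float:
    """Equation denominator factor with b = 1 and t_RTO = 4*R: X = s/(R*f(p))."""
    return math.sqrt(2 * p / 3) + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p * p)


# Tabulate f(p) once, on a logarithmic grid of loss event rates 1e-6 .. 1.
GRID_SIZE = 512
P_GRID = [10 ** (-6 + 6 * i / (GRID_SIZE - 1)) for i in range(GRID_SIZE)]
F_GRID = [f(p) for p in P_GRID]


def f_lookup(p: float) -> float:
    """Approximate f(p) by linear interpolation in the precomputed table."""
    if p <= P_GRID[0]:
        return F_GRID[0]   # f is increasing, so this errs toward a lower rate
    if p >= P_GRID[-1]:
        return F_GRID[-1]
    i = bisect.bisect_right(P_GRID, p)   # P_GRID[i-1] <= p < P_GRID[i]
    p0, p1 = P_GRID[i - 1], P_GRID[i]
    f0, f1 = F_GRID[i - 1], F_GRID[i]
    return f0 + (f1 - f0) * (p - p0) / (p1 - p0)


def allowed_rate(s: float, rtt: float, p: float) -> float:
    """Allowed sending rate in bytes per second, using the table."""
    return s / (rtt * f_lookup(p))
```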


1.6. Constant Bit Rate PWs

   Some types of PW (for example, SAToP, CESoPSN, TDMoIP, SONET/SDH, and
   CBR ATM PWs) represent inelastic constant bit-rate (CBR) flows.
   Although they cannot respond to congestion in the TCP-friendly manner
   prescribed by [RFC2914], the percentage of total bandwidth they
   consume remains constant.  AIMD techniques are clearly not applicable
   to such services, which are also much more sensitive to packet loss
   than connectionless packet PWs.  Given that CBR services are not
   greedy, there is a case for allowing them greater latitude during
   congestion peaks.  Depending on their specific level of resilience to
   packet loss, CBR PWs may not be able to endure any packet loss
   without compromising the transported service; therefore such PWs
   MUST be shut down when the level of congestion becomes excessive.  At
   lower levels of congestion they should be allowed to continue to
   offer traffic to the network.

   Some CBR services are carried over connectionless packet PWs.  An
   example of such a case would be an MPEG-2 video stream carried over
   an Ethernet PW.  One could argue that such a service, provided the
   rate was policed at the ingress PE, should be offered the same
   latitude as an a priori CBR PW.  However, there is an issue of trust
   that needs to be resolved (see Section 7).


2. Detecting Congestion

   In TCP, congestion is detected by the transmitter; the receipt of
   three successive duplicate TCP acks is taken to be indicative of
   congestion.  What this actually means is that several packets in a
   row were received at the remote end, such that none of those packets
   had the next expected sequence number.  This is interpreted as
   meaning that the packet with the next expected sequence number was
   lost in the network, and the loss of a single packet in the network
   is taken as a sign of congestion.  (Naturally, the presence of
   congestion is also inferred if TCP has to retransmit a packet.)  Note
   that it is possible for mis-ordered packets to be misinterpreted as
   lost packets, if they do not arrive "soon enough".

   In TCP, a time-out while awaiting an ack is also interpreted as a
   sign of congestion.

   Since there are no acknowledgments on a PW, the PW-specific
   congestion control mechanism obviously cannot be based on either the
   presence of or the absence of acknowledgments.  In fact, existing PW
   mechanisms and procedures provide no way for a transmitter to
   determine (or even to make an educated guess as to) whether any data
   has been lost.

   Thus we need to add a mechanism for determining whether data packets
   on a PW have gotten lost.  There are two evident methods for doing
   this:

        -i. Trying to Detect Congestion Using PW Sequence Numbers

            When the optional sequencing feature is in use on a PW, it
            is necessary for the receiver to maintain a "next expected
            sequence" number for the PW.  If a packet arrives with a
            sequence number that is earlier than the next expected (a
            "mis-ordered packet"), the packet is discarded; if it
            arrives with a sequence number that is greater than or equal
            to the next expected, the packet is delivered, and the next
            expected sequence number becomes the sequence number of the
            current packet plus 1.
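
            The receive rule above, extended with the two counters that
            the approaches below rely on, might look like the following
            Python sketch.  It is illustrative only: the names are
            hypothetical, and sequence numbers are treated as unbounded
            integers, ignoring wrap-around of the real sequence field.

```python
class SequencedReceiver:
    """PW receive-side sequencing, plus counters of gaps and late packets."""

    def __init__(self, first_expected: int = 1) -> None:
        self.next_expected = first_expected
        self.missing = 0      # packets skipped over by gaps in the sequence
        self.misordered = 0   # late packets that arrived and were discarded

    def receive(self, seq: int) -> bool:
        """Return True if the packet is delivered, False if discarded."""
        if seq < self.next_expected:
            self.misordered += 1                   # earlier than expected
            return False
        self.missing += seq - self.next_expected   # size of any gap
        self.next_expected = seq + 1
        return True
```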

            It is easy to tell when there is one or more missing packets
            (i.e., there is a "gap" in the sequence space) -- that is
            the case when a packet arrives whose sequence number is
            greater than the next expected.  What is difficult to tell
            is whether any misordered packets that arrive after the gap
            are indeed the missing packets.  One could imagine that the
            receiver remembers the sequence number of each missing
            packet for a period of time, and then checks off each such
            sequence number if a misordered packet carrying that
            sequence number later arrives.  The difficulty is doing this
            in a manner which is efficient enough to be done by the
            microcoded hardware handling the PW data path.  This
            approach does not really seem feasible.

            One could make certain simplifying assumptions, such as
            assuming that the presence of any gaps at all indicates
            congestion.  While this assumption makes it feasible to use
            the sequence numbers to "detect congestion", it also
            throttles the PW unnecessarily if there is really just
            misordering and no congestion.  Such an approach would be
            considerably more likely to misinterpret misordering as
            congestion than would TCP's approach.

            An intermediate approach would be to keep track of the
            number of missing packets and the number of misordered
            packets for each PW.  One could "detect congestion" if the
            number of missing packets is significantly larger than the
            number of misordered packets over some sampling period.
            However, gaps occurring near the end of a sampling period
            would tend to result in false indications of congestion.
            To avoid this, one might try to smooth the results over
            several sampling periods.  While this would tend to decrease
            the responsiveness, some trade-off between the rapidity of
            responsiveness and the rate of false alarms is inevitable.

            One would not expect the hardware or microcode to keep track
            of the sampling period; presumably software would read the
            necessary counters from hardware at the necessary intervals.
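
            The software side of the intermediate approach could be
            sketched as follows, assuming the data path exports
            cumulative per-PW counts of missing and misordered packets.
            Each sample takes the excess of missing over misordered
            packets for the period and smooths it with an exponentially
            weighted moving average; the smoothing constant alpha and
            the threshold are hypothetical tuning parameters.

```python
class GapMonitor:
    """Software sampler comparing missing vs. misordered packet counts."""

    def __init__(self, alpha: float = 0.5, threshold: float = 10.0) -> None:
        self.alpha = alpha            # weight given to the newest period
        self.threshold = threshold    # smoothed excess that means congestion
        self.prev_missing = 0
        self.prev_misordered = 0
        self.smoothed = 0.0

    def sample(self, missing_total: int, misordered_total: int) -> bool:
        """Feed one reading of the cumulative counters; True if congested."""
        missing = missing_total - self.prev_missing
        misordered = misordered_total - self.prev_misordered
        self.prev_missing, self.prev_misordered = missing_total, misordered_total
        excess = max(0, missing - misordered)
        self.smoothed = self.alpha * excess + (1 - self.alpha) * self.smoothed
        return self.smoothed > self.threshold
```

            With alpha = 0.5 and a threshold of 10, a single period with
            an excess of 15 gives a smoothed value of 7.5 and is not
            reported as congestion, whereas a following period with an
            excess of 40 raises the smoothed value to 23.75 and is.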

            Such a scheme would have the advantage of being based on
            existing PW mechanisms.  However, it has the disadvantage of
            requiring sequencing, and it also introduces a fairly
            complicated interaction between the control processing and
            the data path.

       -ii. Detecting Congestion Using Modified VCCV Packets

            It is reasonable to suppose that the hardware keeps counts
            of the number of packets sent and received on each PW.

            Suppose that the PW uses MPLS, and that the transmitter
            periodically inserts VCCV [VCCV] packets into the PW data
            stream, where each VCCV packet carries:

              - A sequence number, increasing by 1 for each successive
                VCCV packet.

              - The current value of the transmission counter for the
                PW.

            We assume that the size of the counter is such that it
            cannot wrap during the interval between n VCCV packets, for
            some n > 1.

            When the receiver gets one of these VCCV packets on a PW, he
            inserts into it his count of received packets for that PW,
            and delivers the packet to the software.

            The receiving software can now compute, for the inter-VCCV
            intervals, the count of packets transmitted and the count of
            packets received.  The presence of congestion can be
            inferred if the count of packets transmitted is
            significantly greater than the count of packets received
            during the most recent interval.  Even the loss rate could
            be calculated.
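
            The arithmetic the receiving software would perform can be
            sketched in Python as below.  The 32-bit counter width and
            the tuple layout are assumptions.  Because the counters are
            cumulative and assumed not to advance by a full counter
            cycle between samples, modular subtraction gives the correct
            counts across a wrap, and even when an intervening VCCV
            packet was lost (the VCCV sequence numbers then differ by
            more than one).

```python
from typing import Tuple

COUNTER_BITS = 32          # assumed width of the hardware packet counters
MOD = 1 << COUNTER_BITS

# A received VCCV sample: (vccv_seq, tx_count, rx_count).  tx_count is
# written by the transmitting hardware, rx_count by the receiving hardware.
Sample = Tuple[int, int, int]


def interval_stats(prev: Sample, cur: Sample):
    """Packets sent and received between two received VCCV samples.

    Returns (intervals, sent, received, loss_rate).  intervals > 1 means
    intervening VCCV packets were lost; the cumulative counters still give
    the totals, provided they have not advanced by MOD or more in between.
    loss_rate can come out slightly negative if misordered packets sent in
    an earlier interval arrive in this one.
    """
    intervals = cur[0] - prev[0]       # VCCV sequence wrap ignored here
    sent = (cur[1] - prev[1]) % MOD
    received = (cur[2] - prev[2]) % MOD
    loss_rate = (sent - received) / sent if sent else 0.0
    return intervals, sent, received, loss_rate
```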

            VCCVs would not need to be sent on a PW (for the purpose of
            detecting congestion) in the absence of traffic on that PW.

            Of course, misordered packets that are sent during one
            interval but arrive during the next will throw this off;
            that's why the difference between sent traffic and received
            traffic should be "significant" before the presence of
            congestion is inferred.  The value of "significance" can be
            made larger or smaller depending on the probability of
            misordering.

            Note that congestion can cause a VCCV packet to go missing,
            and anything that misorders packets can misorder a VCCV
            packet as well as any other.  One may not want to infer the
            presence of congestion if a single VCCV packet does not
            arrive when expected, as it may just be delayed in the
            network, even if it hasn't been misordered.  However,
            failure to receive a VCCV packet after a certain amount of
            time has elapsed since the last VCCV was received (on a
            particular PW) may be taken as evidence of congestion.
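
            The "significance" margin and the VCCV timeout described
            above might be combined as in the following sketch.  Both
            min_excess and vccv_timeout are hypothetical parameters; the
            former would be chosen according to the expected amount of
            misordering.

```python
from typing import Optional


class VccvCongestionDetector:
    """Receiver-side congestion inference from VCCV-carried counts."""

    def __init__(self, min_excess: int, vccv_timeout: float) -> None:
        self.min_excess = min_excess        # "significant" sent-minus-received
        self.vccv_timeout = vccv_timeout    # seconds of VCCV silence tolerated
        self.last_vccv_time: Optional[float] = None

    def on_interval(self, sent: int, received: int, now: float) -> bool:
        """Called with per-interval counts when a VCCV arrives."""
        self.last_vccv_time = now
        return sent - received > self.min_excess

    def on_timer(self, now: float) -> bool:
        """Called periodically; True if VCCVs have stopped arriving."""
        return (self.last_vccv_time is not None
                and now - self.last_vccv_time > self.vccv_timeout)
```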

            This scheme has the disadvantage of requiring periodic VCCV
            packets, and it requires VCCV packet formats to be modified
            to include the necessary counts.  However, the interaction
            between the control path and the data path is very simple,
            as there is no polling of counters, no need for timers in
            the data path, and no need for the control path to do read-
            modify-write operations on the data path hardware.

            A bigger disadvantage may arise from the possible inability
            to ensure that the transmit counts in the VCCVs are exactly
            correct.  The transmitting hardware may not be able to
            insert a packet count in the VCCV IMMEDIATELY before
            transmission of the VCCV on the wire, and if it cannot, the
            count of transmit packets will only be approximate.

   Neither scheme can provide the same type of continuous feedback that
   TCP gets.  TCP gets a continuous stream of acknowledgments, whereas
   the PW congestion detection mechanism would only be able to say
   whether congestion occurred during a particular interval.  If the
   interval is about 1 RTT, the PW congestion control would be
   approximately as responsive as TCP congestion control, and there does
   not seem to be any advantage to making it smaller.  However, sampling
   at an interval of 1 RTT might generate excessive amounts of overhead.


2.1. ECN

   In networks that support explicit congestion notification (ECN)
   [RFC3168], ECN marking provides congestion information to the PEs
   before the onset of congestion discard.  This is particularly useful
   to PWs that are sensitive to packet loss, since it gives the PE the
   opportunity to intelligently reduce the offered load.  However, ECN
   is not widely deployed, and the PEs must also be capable of operating
   in a network where packet loss is the only indicator of congestion.
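
   Where ECN is available, the egress PE could count marked packets
   alongside lost ones.  The sketch below assumes the PW is carried in
   an IP tunnel, so that the outer IP header carries the two-bit ECN
   field defined in [RFC3168] (00 Not-ECT, 01 ECT(1), 10 ECT(0), 11 CE)
   in the low-order bits of the IPv4 TOS or IPv6 Traffic Class octet.
   How a CE count would feed into PW congestion control is not
   specified here; the names are hypothetical.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11   # RFC 3168 codepoints


def ecn_codepoint(tos_or_tclass: int) -> int:
    """ECN field: the two low-order bits of the TOS / Traffic Class octet."""
    return tos_or_tclass & 0b11


class CeCounter:
    """Per-PW tally of packets arriving with Congestion Experienced set."""

    def __init__(self) -> None:
        self.total = 0
        self.ce = 0

    def on_packet(self, tos_or_tclass: int) -> None:
        self.total += 1
        if ecn_codepoint(tos_or_tclass) == CE:
            self.ce += 1

    def ce_fraction(self) -> float:
        """Fraction of packets marked CE since the counter was created."""
        return self.ce / self.total if self.total else 0.0
```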


3. Feedback from Receiver to Transmitter

   Given that the receiver can tell, for each sampling interval, whether
   or not a PW's traffic has encountered congestion, the receiver must
   provide this information as feedback to the transmitter, so that the
   transmitter can adjust its transmission rate appropriately.

   The feedback could be as simple as a bit stating whether or not there
   was any packet loss during the specified interval.  Alternatively,
   the actual loss rate could be provided in the feedback, if that
   information turns out to be useful to the transmitter.

   There are a number of possible ways in which the feedback can be
   provided:

        -i. Control Plane

            A control message can be sent periodically to indicate the
            presence or absence of congestion.  For example, when LDP is
            the control protocol, the control message would of course be
            delivered reliably by TCP.  (The same considerations apply
            for any protocol which has a reliable control channel.)
            When congestion is detected, a control message can be sent
            indicating that fact.  No further congestion control
            messages would need to be sent until congestion is no longer
            detected.  If the loss rate is being sent, changes in the
            loss rate would need to be sent as well.  When there is no
            longer any congestion, a message indicating the absence of
            congestion would have to be sent.

            Since congestion in the reverse direction can prevent the
            delivery of these control messages, periodic "no congestion
            detected" messages would need to be sent whenever there is
            no congestion.  Failure to receive these in a timely manner
            would lead the control protocol peer to infer that there is
            congestion. (Actually, there might or might not be
            congestion in the transmitting direction, but in the absence
            of any feedback one cannot assume that everything is fine.)
            If control messages really cannot get through at all,
            control protocol keepalives will fail and the control
            connection will go down anyway.

            If the control messages simply say whether or not congestion
            was detected, then given a reliable control channel,
            periodic messages are not needed during periods of
            congestion.  Of course, if the control messages carry more
            data, such as the loss rate, then they need to be sent
            whenever that data changes.
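
            The message-generation rules above can be sketched as two
            small state machines, one at the PE that detects congestion
            and one at the transmitting PE.  This is a Python sketch
            only: message names and the refresh and timeout parameters
            are hypothetical, and it models just the congested/not
            congested case, not the loss-rate variant.

```python
class FeedbackSender:
    """PE that detects congestion: decides when to send a control message."""

    def __init__(self, refresh: float) -> None:
        self.refresh = refresh     # seconds between repeated "clear" messages
        self.reported = None       # last state sent (None: nothing sent yet)
        self.last_sent = 0.0

    def tick(self, congested: bool, now: float):
        """Return "congested", "clear", or None (send nothing) for this instant."""
        changed = congested != self.reported
        # Repeat "clear" periodically so that silence can be read as congestion;
        # over a reliable channel no repeats are needed while congested.
        refresh_due = not congested and now - self.last_sent >= self.refresh
        if changed or refresh_due:
            self.reported = congested
            self.last_sent = now
            return "congested" if congested else "clear"
        return None


class FeedbackReceiver:
    """Transmitting PE: tracks reported state; treats silence as congestion."""

    def __init__(self, timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout     # should exceed the sender's refresh period
        self.reported_congested = False
        self.last_heard = now

    def on_message(self, msg: str, now: float) -> None:
        self.reported_congested = (msg == "congested")
        self.last_heard = now

    def congested(self, now: float) -> bool:
        if self.reported_congested:
            return True            # stays set until a "clear" arrives
        # An overdue "clear" refresh is taken as evidence of congestion.
        return now - self.last_heard > self.timeout
```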

            If it is desired to control congestion on a per-tunnel
            basis, these control messages will simply say that there was
            congestion on some PW (one or more) within the tunnel.  If
            it is desired to control congestion on a per-PW basis, the
            control message can list the PWs which have experienced
            congestion, most likely by listing the corresponding labels.
            If the VCCV method of detecting congestion is used, one
            could even include the sent/received statistics for
            particular VCCV intervals.
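
            Here is a hypothetical sketch of how a receiver could derive
            either kind of feedback.  It assumes the receiver holds
            per-PW sent/received counts for an interval, as the VCCV
            method would provide.  The loss threshold and the use of PW
            labels as keys are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-interval statistics: PW label -> (packets sent, packets received).
IntervalStats = Dict[int, Tuple[int, int]]


def congested_pws(stats: IntervalStats, threshold: float = 0.01) -> List[int]:
    """Per-PW feedback: labels of PWs whose loss fraction exceeds `threshold` (assumed value)."""
    result = []
    for label, (sent, received) in sorted(stats.items()):
        if sent > 0 and (sent - received) / sent > threshold:
            result.append(label)
    return result


def tunnel_feedback(stats: IntervalStats, threshold: float = 0.01) -> bool:
    """Per-tunnel feedback: True if some PW (one or more) in the tunnel saw congestion."""
    return bool(congested_pws(stats, threshold))
```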

            This method is very simple, as one does not have to worry
            about the congestion control messages themselves getting
            lost or out of sequence.  Feedback traffic is minimized, as
            a single control message relays feedback about an entire
            tunnel.

       -ii. Reverse Data Traffic

            If a receiver detects congestion on a particular PW, it can
            set a bit in the data packets that are traveling on that PW
            in the reverse direction; when no congestion is detected,
            the bit would be clear.  The bit would be ignored on any
            packet which is received out of sequence, of course.
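
            A minimal sketch of this technique follows.  It makes two
            hypothetical assumptions: a one-bit flag in the
            encapsulation, and 16-bit sequence numbers compared with
            wraparound.  A packet that is not newer than the last
            accepted one is treated as out of sequence, and its bit is
            ignored.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 1 << 16      # assumption: 16-bit sequence numbers that wrap around
CONGESTION_FLAG = 0x1  # hypothetical bit position in the encapsulation's flags


def set_congestion_flag(flags: int, congested: bool) -> int:
    """Receiver side: set or clear the bit on data packets sent in the reverse direction."""
    return flags | CONGESTION_FLAG if congested else flags & ~CONGESTION_FLAG


def seq_newer(a: int, b: int) -> bool:
    """True if sequence number a follows b, using modular (serial-number) comparison."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


@dataclass
class ReverseBitTracker:
    """Transmitter side: track the most recent in-sequence congestion indication."""
    last_seq: Optional[int] = None
    congested: bool = False

    def on_reverse_packet(self, seq: int, flags: int) -> None:
        # Ignore duplicates and packets that arrive after a later-numbered one.
        if self.last_seq is not None and not seq_newer(seq, self.last_seq):
            return
        self.last_seq = seq
        self.congested = bool(flags & CONGESTION_FLAG)
```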

            There are several disadvantages to this technique:

              - There may be no (or insufficient) data traffic in the
                reverse direction.

              - Sequencing of the data stream is required.

              - The transmission of the congestion indications is not
                reliable.

              - The most one could hope to convey is one bit of
                information per PW (if there is even a bit available in
                the encapsulation).

      -iii. Reverse VCCV Traffic

            Congestion indications for a particular PW could be carried
            in VCCV packets traveling in the reverse direction on that
            PW.  Of course, this would require that the VCCV packets be
            sent periodically in the reverse direction whether or not
            there is any reverse-direction data traffic.  For congestion
            feedback purposes they might need to be sent more frequently
            than would be necessary for OAM purposes.  It would also be
            necessary for the VCCVs to be sequenced (with respect to
            each other, not necessarily with respect to the data
            stream).  Since VCCV transmission is unreliable, one would
            want to send several VCCVs within each period over which a
            response to congestion is expected.  Further, this method
            provides no means of aggregating congestion information into
            information about the tunnel.
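
            Suppose each VCCV is lost independently with probability p.
            Then all k VCCVs sent in one response period are lost with
            probability p^k.  The independence assumption is optimistic,
            since losses during congestion tend to be correlated.  The
            hypothetical helper below picks the smallest such k for a
            target miss probability.

```python
import math


def vccvs_per_period(loss_prob: float, max_miss_prob: float) -> int:
    """Smallest k with loss_prob**k <= max_miss_prob, assuming independent VCCV losses."""
    if not (0.0 <= loss_prob < 1.0 and 0.0 < max_miss_prob < 1.0):
        raise ValueError("need 0 <= loss_prob < 1 and 0 < max_miss_prob < 1")
    if loss_prob == 0.0:
        return 1
    k = max(1, math.ceil(math.log(max_miss_prob) / math.log(loss_prob)))
    # Correct for floating-point rounding near exact boundaries.
    while loss_prob ** k > max_miss_prob:
        k += 1
    while k > 1 and loss_prob ** (k - 1) <= max_miss_prob:
        k -= 1
    return k
```

            With a response period T, the VCCVs would then be sent
            roughly every T/k.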


4. Responding to Congestion

   In TCP, one tends to think of the transmission rate in terms of MTUs
   per RTT, which defines the maximum number of unacknowledged packets
   that TCP is allowed to maintain "in flight".

   Upon detection of a lost packet, this rate is halved ("multiplicative
   decrease").  It will be halved again approximately every RTT until
   the missing data gets through.  Once all missing data has gotten
   through, the transmission rate is increased by about one MTU per RTT
   ("additive increase").  The increase is spread over the RTT: every
   time a new acknowledgment (i.e., not a duplicate acknowledgment) is
   received, the window grows by a fraction of an MTU (SMSS*SMSS/cwnd
   in the notation of [RFC2581]), so that a full window of new
   acknowledgments adds about one MTU.

   Thus TCP can adjust its transmit rate very rapidly, i.e., it responds
   on the order of an RTT.
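
   The congestion-avoidance behavior just described can be sketched as
   follows.  This is a simplified illustration in the style of
   [RFC2581], not a TCP implementation.  Slow start, fast
   retransmit/recovery, and timeouts are omitted.  The text counts in
   MTUs, but the sketch uses the maximum segment size (MSS), as TCP
   does.

```python
from dataclasses import dataclass


@dataclass
class AimdWindow:
    """Congestion-avoidance window in the style of RFC 2581, sizes in bytes."""
    mss: int = 1460
    cwnd: float = 10 * 1460.0

    def on_new_ack(self) -> None:
        # Additive increase: MSS*MSS/cwnd per new ACK, about one MSS per RTT in total.
        self.cwnd += self.mss * self.mss / self.cwnd

    def on_loss(self) -> None:
        # Multiplicative decrease; RFC 2581 bounds ssthresh below by 2*SMSS.
        self.cwnd = max(self.cwnd / 2, 2 * self.mss)
```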

   For simplicity, this discussion only covers the "congestion
   avoidance" phase of TCP congestion control.  An analogue of TCP's
   "slow start" phase would also be needed.

   For PWs, the detection of congestion by the receiver is based on a
   periodic comparison of the number of packets received in an interval
   with the number transmitted.  Unless we are willing to sample about
   every half RTT, PWE3 congestion control will have difficulty being as
   responsive as TCP.  The dynamic effects of sampling at a slower rate
   are difficult to understand.
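
   A minimal sketch of this per-interval comparison follows.  It
   hypothetically assumes that the sender's cumulative transmit count
   reaches the receiver at the end of each interval (for example, in a
   VCCV packet).  The class and method names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntervalLossMeter:
    """Receiver-side loss estimate per sampling interval."""
    prev_sent: int = 0
    prev_received: int = 0
    received: int = 0

    def on_data_packet(self) -> None:
        self.received += 1

    def end_interval(self, sender_total: int) -> Optional[float]:
        # Packets the sender reports for this interval vs. packets actually received.
        sent = sender_total - self.prev_sent
        got = self.received - self.prev_received
        self.prev_sent, self.prev_received = sender_total, self.received
        if sent <= 0:
            return None  # nothing sent, so no loss estimate for this interval
        return max(0.0, (sent - got) / sent)
```

   Packets still in flight at an interval boundary are counted as lost
   in one interval and as extra arrivals in the next.  For this reason
   the result is only approximate.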

   TCP can easily estimate the RTT, since all its transmissions are
   acknowledged.  In PWE3, the best way to estimate the RTT might be via
   the control protocol.  In fact, if the control protocol is TCP-based,
   getting the RTT estimate from TCP might be a good option.
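
   Suppose RTT samples can be obtained from the control protocol, for
   example by timing a request/response exchange (how they are obtained
   is left open here).  They could then be smoothed as TCP smooths its
   own samples for its retransmission timer, with a gain of 1/8 for the
   mean and 1/4 for the deviation.  The class below sketches only that
   smoothing step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RttEstimator:
    """Smooth RTT samples with the gains TCP uses for its retransmission timer."""
    srtt: Optional[float] = None    # smoothed RTT, seconds
    rttvar: Optional[float] = None  # smoothed mean deviation, seconds

    def add_sample(self, r: float) -> float:
        if self.srtt is None or self.rttvar is None:
            # The first measurement initializes both estimates.
            self.srtt, self.rttvar = r, r / 2
        else:
            # Update the deviation using the old SRTT first, then the mean.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - r)
            self.srtt = 0.875 * self.srtt + 0.125 * r
        return self.srtt
```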

   TCP's rate control is window-based, expressed as a number of bytes
   that can be in flight.  PWE3's rate control would need to be
   rate-based, using a policing mechanism such as a token bucket.
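
   The sketch below shows one way to bridge the two models.  A window
   of W bytes per RTT corresponds roughly to a rate of W/RTT, and a
   token bucket then enforces that rate.  This is a minimal
   illustration.  The bucket depth (allowed burst) is an assumed
   parameter that the text does not specify.

```python
from dataclasses import dataclass


def window_to_rate(window_bytes: float, rtt_s: float) -> float:
    """Rate (bytes/s) roughly equivalent to a window: one window per RTT."""
    return window_bytes / rtt_s


@dataclass
class TokenBucket:
    """Limit traffic to `rate` bytes/s, with bursts of up to `depth` bytes."""
    rate: float
    depth: float
    tokens: float = 0.0
    last: float = 0.0  # time of the previous update, seconds

    def __post_init__(self) -> None:
        self.tokens = self.depth  # start with a full bucket

    def allow(self, size: int, now: float) -> bool:
        # Add tokens for the elapsed time, never exceeding the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False
```

   The sketch only answers whether a packet may be sent now.  A real
   transmitter would more likely hold packets until tokens are
   available.  It would also update "rate" whenever congestion feedback
   changes.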

   If the congestion detection mechanism only produces an approximate
   result, the probability of a "false alarm" (thinking that there is
   congestion when there really is not) for some interval becomes
   significant.  It would then be better to use an algorithm that
   smooths the result over several intervals.  The TFRC procedures
   [RFC3448], which tend to produce smoother and less abrupt changes in
   the transmission rate than the AIMD procedures, may also be more
   appropriate in this case.
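
   The sketch below shows how such smoothing might look.  It borrows
   two pieces of [RFC3448]: the loss-interval weights (for n = 8: 1, 1,
   1, 1, 0.8, 0.6, 0.4, 0.2) and the TCP throughput equation (with b = 1
   and t_RTO = 4*RTT).  Applying those weights to per-interval loss
   fractions is an adaptation and an assumption.  TFRC itself averages
   loss intervals measured in packets.

```python
import math
from typing import List, Sequence


def tfrc_weights(n: int = 8) -> List[float]:
    """Loss-interval weights as defined in RFC 3448 (most recent first)."""
    half = n // 2
    return [1.0 if i < half else 1.0 - (i - (half - 1)) / (half + 1) for i in range(n)]


def smoothed_loss(samples: Sequence[float], n: int = 8) -> float:
    """Weighted mean of per-interval loss fractions, most recent first (adaptation)."""
    recent = list(samples)[:n]
    if not recent:
        return 0.0
    w = tfrc_weights(n)[:len(recent)]
    return sum(wi * s for wi, s in zip(w, recent)) / sum(w)


def tcp_friendly_rate(s: float, rtt: float, p: float, b: float = 1.0) -> float:
    """RFC 3448 throughput equation in bytes/s; s = packet size, p = loss event rate."""
    if p <= 0.0:
        return math.inf
    t_rto = 4.0 * rtt
    denom = (rtt * math.sqrt(2.0 * b * p / 3.0)
             + t_rto * (3.0 * math.sqrt(3.0 * b * p / 8.0)) * p * (1.0 + 32.0 * p ** 2))
    return s / denom
```

   A transmitter could use the resulting value as the rate of its
   policing mechanism.  A single noisy interval then moves the rate
   only partially.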


5. Rate Control per Tunnel vs. per PW

   Rate controls can be applied on a per-tunnel basis or on a per-PW
   basis.  Applying them on a per-tunnel basis (and obtaining congestion
   feedback on a per-tunnel basis) would seem to provide the most
   efficient and most scalable system.  Achieving fairness among the PWs
   then becomes a local issue for the transmitter.

   However, if the different PWs follow different paths through the
   network, it is possible that some PWs will encounter congestion while
   some will not.  If rate controls are applied on a per-tunnel basis,
   then if any PW in a tunnel is affected by congestion, all the PWs in
   the tunnel will be throttled.  While this is sub-optimal, it is not
   clear that this would be a significant problem in practice, and it
   may still be the best trade-off.
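
   Achieving fairness among the PWs of a tunnel is a local issue for the
   transmitter.  One possible local policy, shown here only as a
   hypothetical sketch, divides the tunnel's allowed rate among its PWs
   with max-min fairness.  Each PW's offered load ("demands") is an
   assumed input.

```python
from typing import Dict


def max_min_shares(tunnel_rate: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split a tunnel's allowed rate among its PWs using max-min fairness."""
    shares = {pw: 0.0 for pw in demands}
    remaining = dict(demands)
    capacity = tunnel_rate
    while remaining and capacity > 1e-12:
        fair = capacity / len(remaining)
        satisfied = {pw: d for pw, d in remaining.items() if d <= fair}
        if not satisfied:
            # Every remaining PW wants more than an equal share: split evenly and stop.
            for pw in remaining:
                shares[pw] = fair
            break
        # PWs needing less than an equal share get their demand; redistribute the rest.
        for pw, d in satisfied.items():
            shares[pw] = d
            capacity -= d
            del remaining[pw]
    return shares
```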


6. Fixed Rate of Transmission Services

   Some PW services may require a fixed rate of transmission, and it may
   be impossible to provide the service while throttling the
   transmission rate.  To provide such services, the network paths must
   be engineered so that congestion is impossible; providing such
   services over the Internet is thus not very likely.  In fact, as
   congestion control cannot be applied to such services, it may be
   necessary to prohibit these services from being provided in the
   Internet, except in the case where the payload is known to consist of
   TCP connections. It is not known how such a prohibition could be
   enforced.

   One might try to be less draconian by simply turning the service off
   during periods of congestion.  The problem, though, is that there is
   no way to bring it back up to speed gradually when the congestion
   disappears.

   If the fixed rate service is channelized, it may be possible to
   reduce the transmission rate by selectively shutting down channels,
   and to increase the transmission rate by adding back channels one at
   a time.
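
   The sketch below shows one hypothetical policy of this kind.  It
   sheds as many channels as needed to fit an allowed rate when
   congestion is reported, and otherwise adds back one channel per
   interval.  The allowed rate and the per-interval step are
   assumptions, as is the example of a 24-channel service with 64 kb/s
   channels.

```python
def adjust_channels(active: int, total: int, channel_rate: float,
                    allowed_rate: float, congested: bool) -> int:
    """Number of channels to keep active in the next interval (hypothetical policy)."""
    if congested:
        # Shed channels until the aggregate rate fits within allowed_rate.
        fit = int(max(0.0, allowed_rate) // channel_rate)
        return min(active, fit)
    # No congestion: restore a single channel per interval, up to the full service.
    return min(active + 1, total)
```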

   In any event, the likely effect of applying congestion control to
   fixed rate of transmission services is that all or part of the
   service gets shut down, an event which is likely to be explicitly
   visible to the end users.  This puts a premium on the ability to
   avoid "false alarms".


7. Mandatory vs. Optional

   As discussed in section 1, there is a significant set of scenarios in
   which PW-specific congestion control is not necessary.  One might
   therefore argue that it makes little sense to require PW-specific
   congestion control to be used on all PWs at all times.  On the other
   hand, if the option of turning off PW-specific congestion control is
   available, there is nothing to stop a provider from turning it off in
   inappropriate situations.  As this may contribute to congestive
   collapse outside the provider's own network, it may not be advisable
   to allow this.


8. Informative References

   [RFC2001] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast
        Retransmit, and Fast Recovery Algorithms", RFC 2001, January
        1997.

   [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion Control",
        RFC 2581, April 1999.

   [RFC2914] S. Floyd, "Congestion Control Principles", RFC 2914,
        September 2000.

   [RFC3168] K. Ramakrishnan, S. Floyd, D. Black, "The Addition of
        Explicit Congestion Notification (ECN) to IP", RFC 3168,
        September 2001.

   [RFC3448] M. Handley, S. Floyd, J. Padhye, J. Widmer, "TCP Friendly
        Rate Control (TFRC): Protocol Specification", RFC 3448, January
        2003.

   [VCCV] Nadeau and Aggarwal, editors, "Pseudo Wire (PW) Virtual
        Circuit Connection Verification (VCCV)",
        draft-ietf-pwe3-vccv-09.txt, June 2006.


9. Authors' Addresses


      Eric C. Rosen
      Cisco Systems, Inc.
      1414 Massachusetts Avenue
      Boxborough, MA 01719
      Email: erosen@cisco.com


      Luca Martini
      Cisco Systems, Inc.
      9155 East Nichols Avenue, Suite 400
      Englewood, CO, 80112
      Email: lmartini@cisco.com


      Bruce Davie
      Cisco Systems, Inc.
      1414 Massachusetts Avenue
      Boxborough, MA 01719
      Email: bdavie@cisco.com


      Stewart Bryant
      Cisco Systems,
      250, Longwater,
      Green Park,
      Reading, RG2 6GB,
      United Kingdom
      Email: stbryant@cisco.com


10. Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


11. Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

