TE-WG Working Group Heinrich Hummel Internet Draft Siemens AG Expiration Date: October 2000 April 2000 Orchestrally conducted Traffic ( OCT ) draft-hummel-te-oct-02.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This draft proposes to engineer traffic such that we can speak of an Orchestrally Conducted Traffic (OCT): Any traffic stream (given by its source and destination nodes and by a priority class) may use several differently routed LSPs. Each, traffic stream ingress reports, periodically, the currently measured traffic load (in bitrate) to a common TE Conductor (TEC) who evaluates them as to forecast what traffic load change might occur in the immediate time ahead. Accordingly the TEC computes well synchronized likelihood values by which to take which route/LSP. These values will be such that any traffic stream will be served as good as possible, as often as possible, while equally ranked traffic streams are treated fair, higher ranked streams are prioritized and the network thruput is maximized. The TEC sends the likelihood values to the respective ingress node, who will distribute the received packets of the traffic stream to the pertaining LSPs accordingly. Hummel April 2000 [Page 1] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 1. Introduction and summary This draft proposes to engineer traffic such that we can speak of an Orchestrally Conducted Traffic (OCT): Any traffic stream (given by its source and destination nodes and by a priority class) may use several differently routed Label Switched Pathes (LSPs). Each, traffic stream ingress reports, periodically, the currently measured traffic load (=bitrate value) to a common Traffic Engineering Conductor (TEC) who evaluates them as to forecast the immediately upcoming traffic load. Accordingly the TEC computes well synchronized likelihood values by which to take which route/LSP. These values will be such that any traffic stream will be served as good as possible, as often as possible, while equally ranked traffic streams are treated fair, higher ranked streams are prioritized and the network thruput is maximized. The TEC sends the likelihood values to the respective ingress node, who will distribute the received packets of the traffic stream to the pertaining LSPs accordingly. 2. OCT concept in detail 2.1 Goals and Basics The OCT concept is based on a periodic information exchange between the ingress nodes of all traffic streams and a central Traffic Engineering Conductor (TEC). For now, it is assumed that communication channels are established (i.e. special LSPs) to cater this information exchange. It is also assumed that one or several differently routed LSPs per traffic stream are established for carrying the user data. The OCT concept solves the question "by which likelihood should which LSP been taken so that the entire traffic is balanced best onto the given network ?" Best balancing includes many aspects, some of which have already been mentioned: Higher ranked traffic streams shall be prioritized versus lower ranked traffic streams. Equally ranked traffic streams shall be serviced equally fair. The thruput should be maximized. But implicitly there are even more properties of "best balancing": avoiding congestions as best as possible, resolving a congestion without creating a new congestion at just another location. Maximizing the thruput: It means: Among equally ranked traffic streams, give preference nevertheless to that portion of a traffic stream that can be serviced by a shorter LSP (using less hops). Eventually such a preferenced handling is beneficial to all traffic streams (e.g. so that all traffic streams can be completely serviced rather than some are more, some are less deteriorated). Eventually however the traffic streams that use shorter LSPs are indeed better Hummel April 2000 [Page 2] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 serviced than the others. Conclusion: Maximizing the thruput is more important than equal treatment. Let us partition the whole traffic into traffic streams T(i,e,p), each of which is identified by an information-triplet {i-ngress node, e-gress node, p-riority class}. By other words, a traffic stream is the sum of all IP-Micro flows which enter the MPLS network domain at the same node (= ingress node), which will leave the MPLS network domain at the same node (=egress node) and which share the same priority class, based on their DiffServ parameters. For each traffic stream T(i,e,p) there shall be n(i,e,p) pre-established label switched pathes, i.e. LSP-k; k=1,...,n(i,e,p), for transmitting its packets. LSP-1 shall be the most preferrable one, LSP-n(i,e,p) the least preferrable one. An ingress node i knows all traffic streams T(i,e,p) it originates. For each of them it knows all respective n(i,e,p) LSPs, incl. all their link sequences, and measures, continuously, the respective current total traffic load as well as the individual current traffic load share per LSP. Periodically the ingress node i reports the total traffic load (in bitrate) as well as the LSP-specific traffic load shares to the TEC -with respect to all traffic streams it originates. Besides these reports, the TEC shall know the network topology incl. the maximum bandwidth BWj of each network link j. Furthermore it shall know the link sequences of all LSPs for all traffic streams in the network. The sending of the reports to the TEC by all ingress nodes may be slightly jittered, but by neglecting the jitter, the TEC shall receive all reports with respect to the same time of day. It is advisable that any report also contain either the exact time of day it has been sent or that time of day the report is meant for. 2.2 Forecasting traffic load The TEC may apply different mathematical/heuristical methods, algorithms, concepts, theories as to compute / forecast the total traffic load TTL (i,e,p) for each traffic stream T(i,e,p). For example, two methods are shown by the following: Method 1: Make a linear extrapolation based on the youngest traffic load value at time t-1 ("t minus one") in the past and the present traffic load value at time t0 ("t zero"), as to determine the future traffic load at time t1 ("t one"). If total traffic load TTL(i,e,p)(t1) exceeds Hummel April 2000 [Page 3] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 the highest traffic load ever reported for traffic stream T(i,e,p) then replace the extrapolated value by the highest total traffic load ever reported for T(i,e,p). If total traffic load TTL (i,e,p) (t1) turns out negative then replace this value by the value 0 + epsilon. Method 2 (very promising, if upstream from ingress node i* there is a webpage of general interest): Build the average traffic load AVTL (i*,e,p) (t0) out of all currently measured traffic loads TTL (i*,e,p) (t0) and forecast for all traffic streams TTL(i*,e,p) (t1) = AVTL(i*,e,p) (t0). Certainly there will be more methods as to forecast any total traffic load for time t1. Learning from the success: Note that one report-timer-interval later the prognosticated traffic load can be verified, how well it turned out. We can make statistics which method is more often more successful than the other methods. We may apply the more successful method more often ! So we may learn from the success. 2.3 Prosaic description how to compute likelihood values Imagine the TEC as a person sitting in front of a casino table. On the table however, the topology graph of his network is drawn. On each ingress node i the TEC has placed, with respect to traffic stream T(i,e,p), a pile of uniformly sized (traffic load) chips: The height of the pile amounts to the absolut value of the forecasted total traffic load TTL (i,e,p) (t1). Let's call it total traffic load-pile. (Eventually, at the bottom of any such pile, there may be a chip whose size is smaller than the uniformly sized chips.) On any link j of the network there is also a pile of (bandwidth) chips whose height amounts to the maximum bandwidth BWj of that link. Let's call it link-j-bandwidth pile. (Eventually, at the bottom of any link-j-bandwidth pile, there may be a chip whose size is smaller than the uniformly sized chips.) The goal is to distribute the chips of each total traffic load-pile by piling up respective LSP-specific traffic load piles. Eventually, a remainder of the total traffic load-pile cannot be distributed in a way which is shown next. It will reflect that proportion of the Hummel April 2000 [Page 4] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 traffic load which cannot be transmitted. The TEC processes one chip after the other, each time,at least in general, from a different total traffic load-pile: The TEC tries to move the chip to the k-th LSP's pile; (k=1,2,...,n(i,e,p) ). At the beginning k=1. Whenever a traffic load chip is moved to the k-th LSP-specific pile, a (same sized) bandwidth chip is removed from each link-j-bandwidth pile of all those links j which this k-th LSP traverses. If this is not feasable because at least one of these links lacks such a bandwidth chip, the same is tried with the (k+1)st LSP. If it is not feasable with any of the n(i,e,p) LSPs then there will be a remainder total traffic load pile whose (normed) height will indicate the likelihood by which a packet from traffic stream T(i,e,p) shall not be transmitted. The sequence by which the traffic load chips of all total traffic load -piles are handled as described above is very important as to: achieve fairness among equally ranked traffic streams, priortize/discriminate higher/lower ranked traffic streams, and get best thruput: Higher ranked traffic streams will be preferred versus the lower ranked traffic streams by processing ALL their traffic load chips first. Fairness among equally ranked traffic streams will be achieved by processing their traffic load chips alternatingly. However, make also the following exception: Among equally ranked traffic streams process at first those traffic load chips which can be moved to "shorter" LSPs, i.e. to LSPs with fewer hops. When all the traffic load chips have been processed, the heights of the resulting LSP-specific piles and of the remainder pile need to be normed i.e. divided by the sum of all these heights as to get likelihood values. The TEC shall send back to any ingress node i these likelihood values with respect to all traffic streams T(i,e,p) which are originated at that node i. 2.4 Distributing the traffic load onto the LSPs An ingresss node must distribute the received traffic load onto the respective LSPs according to the likelihood values. Ingress node i whichs receives a new IP packet may identify the traffic stream to Hummel April 2000 [Page 5] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 which this packet belongs. I.e. it may identify its egress node based on its IP destination address and may identify its priority class based on its Diffserv parameters. (a): In accordance with the actual likelihood values it will select an appropriate LSP for transmitting this packet, respectively will dishonor its forwarding request: The ingress node may build a hash with respect to the IP source addresses of the IP packets of T(i,e,p). The simplest hash is such that it intersects the entire IP UNICAST address space into n(i,e,p) +1 intervalls. The sizes of these intervalls shall be proportional to the likelihood values. The IP source address of a received IP packet of a particular traffic stream will be examined as to determine from which interval it is part of. If it is from the k-th interval, then it is forwarded along the k-th LSP; k=1,...,n(i,e,p). If it is from the n(i,e,p)+1 st intervall, then it is not forwarded at all. This very simple hash mechanism suits best if the ingress node is fairly in the "center of the internet", i.e. receives IP packets containing any existing IP UNICAST source address. (b): An ingress node i (e.g. of an access network) may know, by means of network management configuration or by learning while operating, the appropriate candidate IP source address range which may be intersected as to form the right hash-intervals. (c): If an ingress node receives IP packets from a fairly limited number of sources, e.g. given by less than 100 different IP source addresses, it may be feasable to queue up information pertaining to all currently active IP micro flows: A queue element may consist of an IP source address and the LSP to be selected. If the first packet of a new IP micro flow is received, i.e. if no queued information can be found that matches the received packet's IP source address then a new queue element must be allocated whose LSP information may be determined by this: Generate a random number which hits/selects any column of a statistical stage function which is built from the likelihood values. The k-th column will select the k-th LSP. It shall be used and also entered in the new queue element. Any succeeding packet of a particular IP micro flow may be sent along the same LSP. Advantage: all packets of the same IP micro flow will arrive in proper sequence at the egress node. A mechanism needs to be added to determine the end of the lifetime of an IP micro flow, i.e. the point in time when a queue element is deleted: Maintain an information in the queue element which is refreshed/incremented by the transmission request of each packet of the IP micro flow. Employ Hummel April 2000 [Page 6] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 a periodic timer. Whenever it expires check all queue elements for all traffic streams originated in this ingress node whether they are aged out. If so, delete them. 3. Further Aspects 3.1 Other traffic load input: The TEC may also receive information about SCHEDULED traffic from other senders than the ingress nodes. Such senders may be a policy agent or a Network Management station/agent. 3.2 Adding/Removing another LSP per traffic stream 3.2.1 Adding an LSP: In the preceding it was assumed that there are some LSPs per traffic stream, without saying when and why an(other) LSP may be added or removed. An obvious reason for establishing another LSP is given, when the TEC determines a considerable likelihood for NOT forwarding some packets of the traffic stream along any of the existing LSPs. Indeed, the TEC may compute the route for another LSP while taking into account the currently available bandwidth in the network due to the present traffic loads and may send to the right ingress node the computed route for the new LSP - as well as the likelihood by which to use it. 3.2.1 Removing a LSP: Over a longer period of time, an ingress node just like the TEC may notice that an LSP is hardly used. It may be aborted. TEC and ingress node must inform each other. 3.3 Delaying instead of dishonoring packet transmission Despite of best traffic balancing and even despite of 3.2 there may be IP packets whose transmission request might be dishonored. Alternatively, their transmission may only be delayed. Further study is required how to do it and what are the consequences with respect to the entire "traffic balancing result". 3.4 On-demand LSPs: Another variation would be to establish LSPs just on demand. The ingress node may have for a traffic stream, instead of a set of existing LSPs, a set of routes to choose from. The described procedure as to determine the right LSP for transmitting an IP packet may be applied analogously however as to determine the explicit route to use when the on-demand LSP is to be established. Hummel April 2000 [Page 7] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 3.5 Long term results Over a longer period of time, while and though the total traffic is orchestrally conducted, the TEC may find out which links are permanently under-utilized resp. over-utilized and may conclude which links should be stocked up by which amount of bandwidth respectively may conclude which links may be removed or used differently by becoming re-configured. Determining to which extent a link is permanently under-utilized: Simplest method: determine the smallest available bandwidth ever of that link. Determining to which extent the bandwidth of a link should be stocked up: After the TEC has computed and sent out the likelihood values for all traffic streams, it may compute likelihood values for a fictitious "should be"-network by the following: In one additional cycle the TEC may process each remainder TTL (i,e,p)-pile like being one big chip and by moving this "big chip" onto its most preferred LSP-1. Hereby substract from each affected link-j-bandwidth pile the size of this "big chip" and do not mind if the height of any link-j-bandwidth pile becomes a negative number. Indeed,after this cycle any resulting negative link-j-bandwidth pile, indicates the absolute amount of bandwidth by which link j should be stocked up. Of course, average values should be built, over a longer period of time, as to determine the permanent over/under-utilization of any link in the network. 3.6 Equally ranked LSPs So far it has been assumed that per traffic stream T(i,e,p) there are n(i,e,p) LSPs and that they are in ranking order: LSP-1 > LSP-2 >....> LSP-n(i,e,p) . LSP-1 ist the most favorable LSP, LSP-n(i,e,p) is the least favorable LSP e.g. in the sense that the more favorable LSP consists of fewer hops than the less favorable LSP. However among the n(i,e,p) LSPs there may be subsets of LSPs which are equally favorable and should be treated as equally ranked. Accordingly, when another traffic load chip shall be distributed to any LSP-specific traffic load pile, we should at first try any LSP from the best ranked subset of LSPs. That LSP should be selected such that an even distribution among the LSPs of this subset is warrented (which is accomplished by proper alternation). Hummel April 2000 [Page 8] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 Proceed analogously with the next ranked subset of LSPs, in case no LSP of the better ranked subset can accommodate this chip. 3.7 Central TEC versus de-centralized solutions This OCT concept operates with a central TEC. There may also be other alternative concepts which operate in a de-centralized fashion. In case the TE-WG is interested in standardizing a protocol according to this OCT concept, I would like to point out one important advantage of the centralized solution, i.e. one associated standardization objective: The ingress nodes have to be provided with some basic initial software - just like the TEC. Later on the TEC may get smarter and smarter by getting newer software versions. At the same time however the ingress nodes can keep their old initial software. 3.8. Standardization - required / not required It is possible to install the OCT concept in a completely proprietary fashon, i.e. without standardizing any new messages , TLVs , codepoints or well-known TCP port number. In a simplest implementation special p2p LSPs are established from each ingress node to the TEC and vice versa for exchanging all relevant information in proprietary syntax. Alternatively the TEC may send out the likelihood values down a spanning tree, which is provided by a tree-structured source routing information called TREE ROUTE TLV, see [3], whereby each node i* will become aware which part of the received info is to forward along which child link and which part is just targetted to itself - which are all likelihood values for all traffic streams T(i*,e,p). It is up to the TE-WG (and maybe others) whether or not to standardize all required protocols and formats with respect to the communication between any ingress node and the TEC, as well as between any Network Management and/or Policy Agent and the TEC. 4. Intellectual Property Considerations Siemens AG may seek patent or other intellectual property protection for some or all of the technologies disclosed in this document. If any standards arising from this document are or become protected by one or more patents assigned to Siemens AG. Siemens AG intend to disclose those patents and license them under openly specified and non-discriminatory terms. 5. References Hummel April 2000 [Page 9] Orchestrally conducted Traffic (OCT) Exp. Oct. 2000 [1] "Traffic Engineering Concept", draft-awduche-mpls-traffic-eng-00.txt, Awduche [2] "Provider Architecture for Differentiated Services and Traffic Engineering (PASTE)", 1/1998, draft-li-paste-00.txt, Li, Rekhter [3] "Explicit Tree Routing",draft-hummel-mpls-tree-route-01.txt, Hummel, Loke 6. Author's Address Heinrich Hummel Siemens AG Hofmannstrasse 51 81379 Munich, Germany Tel: +49 89 722 32057 Email: heinrich.hummel@icn.siemens.de Full Copyright Statement "Copyright (C) The Internet Society (March 2000). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implmentation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. Hummel April 2000 [Page 10]