Internet Draft                                           Mick Seaman
Expires May 1997                                          3Com Corp.
draft-ietf-issll-802-00.txt                             Andrew Smith
                                                    Extreme Networks
                                                        Eric Crawley
                                                        Bay Networks
                                                       November 1996

        Integrated Services over IEEE 802.1D/802.1p Networks

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

Abstract

This document describes the support of IETF Integrated Services over LANs built from IEEE 802 network segments which are interconnected by standard IEEE 802.1D [1] switches. It describes the practical capabilities and limitations of this technology for supporting Controlled Load [8] and Guaranteed Service [9] using the inherent capabilities of the relevant 802 technologies [5],[6] etc. and the proposed 802.1p queuing features in switches. It provides a functional model for the layer 3 to layer 2 and user-to-network dialogue which supports admission control and defines requirements for interoperability between switches. This scheme is consistent with the ISSLL over LANs framework discussed at the October 1996 ISSLL interim meeting and described in [7].

Seaman, Smith, Crawley        Expires May 1997              [Page 1]

INTERNET DRAFT       Intserv over IEEE 802.1D/p        November 1996

1. Introduction

The IEEE 802.1 Interworking Task Group is currently enhancing the basic MAC Service provided in Bridged Local Area Networks (aka "switched LANs").
As a supplement to the IEEE MAC Bridges standard [1], P802.1p [2] proposes differential traffic class queuing ("priorities") and access to media on the basis of a "user_priority" signaled in frames.

In this document we

* review the meaning and use of user_priority in LANs and the frame forwarding capabilities of a standard LAN switch.
* examine alternatives for identifying layer 2 traffic flows for admission control.
* review the options available for policing traffic flows.
* derive requirements for consistent priority handling in a network of switches and use these requirements to discuss priority queue handling alternatives for 802.1p and the way in which these meet administrative and interoperability goals.
* consider the benefits and limitations of this switch-based approach, contrasting it with a full router-based RSVP implementation in terms of complexity, utilisation of transmission resources and administrative controls.

We then describe a model which:

* partitions the admission control process into two separable operations:
  * an interaction between the user of the integrated service and the local network elements ("provision of the service" in the terms of 802.1D) to confirm the availability of transmission resources for traffic to be introduced.
  * selection of an appropriate user_priority for that traffic on the basis of the service and service parameters to be supported.
* distinguishes between the user to network interface above and the mechanisms used by the switches ("support of the service"). These include communication between the switches (network to network signaling).
* describes a simple architecture for the provision and support of these services, broken down into components with functional and interface descriptions:
  * a single "user" component: a layer-3 to layer-2 negotiation and translation component.
  * bridge/switch processes to handle admission control and mapping requests, including proposals for actual traffic mappings to user_priority values.
* proposes a set of protocol exchange primitives based on the functions introduced.

This document contains much background material that is used as justification for the approach taken. It is anticipated that much of this material will not form a part of the final specification.

It will be noted that this document is written from the pragmatic viewpoint that there will be a widely deployed network technology and we are evaluating it for its ability to support some or all of the defined IETF integrated services: this approach is intended to ensure development of a system which can provide useful new capabilities in existing (and soon to be deployed) network infrastructure.

2. Goals and Assumptions

It is assumed that the network is "switch-rich": that is to say all communication between end stations using integrated services support will pass through at least one switch. Perhaps the mechanisms and protocols described will be trivially extensible to communicating systems on the same shared media, but it is important not to allow problem generalisation to complicate the practical application that we target: the access characteristics of Ethernet are forcing a trend to switch-rich topologies together with MAC enhancements to ensure access predictability on half-duplex switch to switch links.

It is assumed that layer-3 entities, including end-stations, are running the RSVP protocol in support of integrated services at that layer. No extra modifications to this protocol are assumed.
There may be a heterogeneous mixture of switches with different capabilities, all compliant with IEEE 802.1p, but implementing queuing and forwarding mechanisms in a range from simple 2-queue per port, strict priority, up to more complex multi-queue (maybe even one per-flow) WFQ or other algorithms.

The problem is broken down into smaller independent pieces: this may lead to sub-optimal usage of the network resources but we contend that the benefits of a more optimal scheme are often equivalent to only very small improvements in network efficiency in a LAN environment. Therefore, it is a goal that the switches in the network operate using a much simpler set of information than the RSVP engine in a router. In particular, it is assumed that such switches do not need to implement per-flow queuing and policing.

One corollary is that no per-flow policing function need take place in the switches: it is a fundamental part of the intserv model that flows are isolated from each other throughout their transit across a network. Intermediate queuing nodes are expected to police the traffic to ensure that it conforms to the pre-agreed traffic flow specification. In the architecture proposed here for mapping to layer-2, that policing function is assumed to be implemented in the transmit schedulers of the layer-3 devices (end stations, routers): it is reasonable to assume that end stations are "trusted" to adhere to their agreed contracts at the inputs to the network and that we can afford to over-allocate resources to compensate for the inevitable extra jitter/bunching introduced by the switched network itself.

3. User Priority and Frame Forwarding

User_priority is a value associated with the transmission and reception of all frames in the IEEE 802 service model: it is supplied by a sender which is using the MAC service. It is provided to a receiver using the MAC service.
It may or may not be actually carried over the network: Token-Ring/802.5 carries this value (encoded in its FC octet), basic Ethernet/802.3 does not. 802.1p defines a way to carry this value over the network in a similar way on Ethernet, Token Ring, FDDI or other MACs using an extended frame format.

The "user_priority" or "traffic class" (the latter term is to be preferred and it is the title of the 802.1p document) field in packets is a simple label in the data stream enabling packets in different classes to be discriminated by downstream nodes. Apart from making the job of desktop or wiring-closet switches easier, it means they do not have to change (hardware or software) as the rules for classifying packets evolve (based on new protocols or new policies). Layer-3 switches do provide added value here by performing the classification more accurately and, hence, utilising network resources more efficiently: this appears to be a good economic choice since there are likely to be very many more desktop/wiring closet switches in a network than switches requiring layer 3 functionality.

The IEEE 802 specifications make no assumptions about how user_priority is to be used by end stations or by the network, although the current 802.1p draft defines static priority queuing as the default mode of operation of all switches (user_priority is defined as a 3-bit quantity with value 7 = high priority, 0 = low priority). The switch algorithm in this case is as follows: packets are placed onto a particular queue based on the received user_priority (taken from the packet if an 802.1p header or 802.5 network was used, invented according to some local policy if not). The selection of queue is based on a mapping from user_priority [0,1,2,3,4,5,6 or 7] onto the number of available queues - switches may implement any number of queues from 1 upwards. On transmit, all frames from a higher priority queue are sent before any from a lower priority queue.
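The default switch algorithm just described can be sketched as follows. This is only an illustration under stated assumptions: the names queue_for and StrictPriorityPort are hypothetical, and the even split of the eight user_priority values across the available queues is one possible mapping, not one mandated by 802.1p (for two queues it happens to give the 0-3 / 4-7 split).

```python
from collections import deque

NUM_PRIORITIES = 8  # user_priority is a 3-bit value, 0 (low) to 7 (high)


def queue_for(user_priority: int, num_queues: int) -> int:
    """Map a user_priority onto a queue index (0 = lowest priority queue).

    Assumption: the eight values are split evenly across the queues.
    """
    if not 0 <= user_priority < NUM_PRIORITIES:
        raise ValueError("user_priority must be in 0..7")
    if num_queues < 1:
        raise ValueError("a port needs at least one queue")
    return user_priority * num_queues // NUM_PRIORITIES


class StrictPriorityPort:
    """One output port with static (strict) priority queuing."""

    def __init__(self, num_queues: int, default_priority: int = 0):
        self.queues = [deque() for _ in range(num_queues)]
        # Local policy value used when a frame carries no user_priority.
        self.default_priority = default_priority

    def enqueue(self, frame, user_priority=None):
        if user_priority is None:
            user_priority = self.default_priority
        index = queue_for(user_priority, len(self.queues))
        self.queues[index].append(frame)

    def dequeue(self):
        """Return the next frame to transmit, or None if all queues are empty."""
        # Always drain the highest priority non-empty queue first.
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None
```

With a two-queue port, a frame enqueued with user_priority 6 is transmitted before an earlier frame enqueued with user_priority 1; frames within one queue keep their arrival order.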
In particular, IEEE makes no recommendations about how a sender should select the value for user_priority: one of the main purposes of this draft is to propose such usage rules. Additionally, there are no IEEE 802-defined rules for switches to agree on how to treat frames with different user_priority values: later on in this draft we make some recommendations as to what information needs to be shared amongst switches.

4. Mapping of integrated services to layer-2 in layer-3 devices

The end-station or router itself is responsible for local admission control and scheduling packets onto its link in accordance with the service agreed. Just as in the intserv model, this involves per-flow schedulers somewhere in every such data source: it is an implementation issue whether there are separate schedulers for layer-3 and layer-2 or whether these are combined.

5. Mapping of integrated services through layer-2 switches

5.1 Queuing

Connectionless packet-based networks in general, and LAN switched networks in particular, work today because of scaling choices in network provisioning. Consciously or (more usually) unconsciously, enough excess bandwidth and buffering is provisioned in the network to absorb the traffic sourced by higher-layer protocols or cause their transmission windows to run out, on a statistical basis, so that the network is only overloaded for a short duration and the average expected loading is less than 60% (usually much less).

With the advent of time-critical traffic such overprovisioning has become far less easy to achieve. Time critical frames may find themselves queued for annoyingly long periods of time behind temporary bursts of file transfer traffic, particularly at network bottleneck points, e.g. at the 100 Mb/s to 10 Mb/s transition that might occur between the riser to the wiring closet and the final link to the user from a desktop switch.
In this case, however, if it is known (guaranteed by application design, merely expected on the basis of statistics, or just that this is all that the network guarantees to support) that the time critical traffic is a small fraction of the total bandwidth, it suffices to give it strict priority over the "normal" traffic. The worst case delay experienced by the time critical traffic is roughly the maximum transmission time of a maximum length non-time-critical frame - about 1.2 milliseconds for a 1518-byte frame on 10 Mb/s Ethernet, and well below an end to end budget based on human perception times.

When more than one "priority" service is to be offered by a network element e.g. it supports Controlled Load as well as Guaranteed Service, the queuing discipline becomes more complex. In order to provide the required isolation between the service classes, it will probably be necessary to queue them separately. There is then an issue of how to service the queues - a combination of admission control and maybe weighted fair queuing may be required in such cases. As with the service specifications themselves, it is not the place for this document to specify queuing algorithms, merely to observe that the external behaviour must meet the services' requirements.

5.2 Multicast Heterogeneity

IEEE 802.1D and 802.1p use a model for multicast whereby a switch performs multicast routing decisions based on the destination address: this would produce a list of output ports to which the packet should be forwarded. In its default mode, such a switch would use any user_priority value in received packets to enqueue the packets at each output port.
At layer-3, the intserv model allows heterogeneous multicast flows where different branches of a tree can have different types of reservations for a given multicast destination, or even supports the notion that some trees will have some branches with reserved flows and some using best effort (default) service.

If a switch is selecting per-port output queues based only on the incoming user_priority, it will have to treat all branches of all multicast sessions within that user_priority class with the same queuing mechanism: no heterogeneity is then possible (if it were to implement a separate mapping at each output port then some limited form of heterogeneity could be supported). It is proposed that per-user_priority queuing support is adequate as minimum standard functionality for systems *in a LAN environment*. Layer-3 switches (a.k.a. routers) can be used if more flexible forms of heterogeneity are considered necessary: their behaviour is well standardised.

6. Selecting User Priority classes

One fundamental question is "who gets to decide what the classes mean and who gets access to them?" One approach would be for the meanings of the classes to be "well-known": we would then need to standardise a set of classes e.g. 1 = best effort, 2 = controlled-load, 3 = guaranteed (loose delay bound, high bandwidth), 4 = guaranteed (slightly tighter delay) etc. Choosing the values to encode in such a table in end stations, in isolation from the network to which they are connected, is problematical: the best we could probably do would be to define one user_priority value per intserv service type and leave it at that (reserving the rest of the combinations for future traffic classes - there are sure to be plenty!).
We propose a more flexible mapping: clients ask "the network" which user_priority traffic class to use for a given traffic flow, as categorised by its flow-spec and layer-2 endpoints. The network provides a value back to the requester which is appropriate to the current network topology, load conditions, other admitted flows etc. The task of configuring switches with this mapping (e.g. through network management or some other switch-switch protocol) is an order of magnitude less complex than performing the same function in end stations. Also, when new services (or other network reconfigurations) are added to such a network, the network elements will typically be the ones to be upgraded with new queuing algorithms etc. and can be provided with new mappings at this time.

Given the need for a new session or "flow" requiring some QoS support, a client then needs answers to the following questions:

1. which traffic class do I add this flow to? The client needs to know how to label the packets of the flow as it places them into the network.
2. who do I ask/tell? The proposed model is that a client asks "the network" which user_priority traffic class to use for a given traffic flow. This has several benefits as compared to a model which allows clients to select a class for themselves.
3. how do I ask/tell them? A request/response protocol is needed between client and network: in fact, the request can be piggy-backed onto an admission control request and the response can be piggy-backed onto an admission control acknowledgment.

The network (i.e. the first network element encountered downstream from the client) must then answer the following questions:

1. which traffic class do I add this flow to?
   This is a packing problem, difficult to solve in general, but many simplifying assumptions can be made: presumably some simple form of allocation can be done without a more complex scheme able to dynamically shift flows around between classes.
2. which traffic class has worst-case parameters which meet the needs of this flow? This might be an ordering/comparison problem: which of two service classes is "better" than another? Again, we can make this tractable by observing that all of the current intserv classes can be ranked (best effort <= Controlled Load <= Guaranteed Service) in a simple manner. If any classes are implemented in the future that cannot be simply ranked then the issue can be finessed by either a priori knowledge about what classes are supported or by configuration.

and return the chosen user_priority value to the client.

Note that the client may be either an end station, router or a first switch which may be acting as a proxy for a client which does not participate in these protocols for whatever reason. Note also that a device e.g. a server or router, may choose to implement both the "client" as well as the "network" portion of this model so that it can select its own user_priority values: such an implementation is, however, discouraged unless the device really does have a close tie-in with the network topology and resource allocation policies.

7. Flow Identification

Several previous proposals for intserv over lower-layers have treated switches very much as a special case of routers: in particular, that switches along the data path will make packet handling decisions based on the RSVP flow and filter specifications and use them to classify the corresponding data packets.
However, filtering to the per-flow level becomes cost-prohibitive with increasing switch speed: devices with such filtering capabilities are unlikely to have a very different implementation cost to IP routers, in which case we must question whether a specification oriented toward switched networks is of any benefit at all.

This document proposes that "flow" identification based on user_priority be the minimum required of switches.

8. Reserving Network Resources - Admission Control

So far we have not discussed admission control. In fact, without admission control it is possible to build from scratch a LAN of some size capable of supporting real-time services, providing that the traffic fits within certain scaling constraints (relative link speeds, numbers of ports etc. - see below). This is not surprising since it is possible to run a fair approximation to real time services on small LANs today with no admission control or help from encoded priority bits.

Imagine a campus network providing dedicated 10 Mbps connections to each user. Each floor of each building supports up to 96 users, organized into groups of 24, with each group being supported by a 100 Mbps downlink to a basement switch which concentrates 5 floors (20 x 100 Mbps) and a data center (4 x 100 Mbps) to a 1 Gbps link to an 8 Gbps central campus switch, which in turn hooks 6 buildings together (with 2 x 1 Gbps full-duplex links to support a corporate server farm). Such a network could support 1.5 Mb/s of voice/video from every user to any other user or (for half the population) the server farm, provided the video ran at high priority: this gives roughly 3000 users, all with desktop video conferencing running along with file transfer/email etc. In such a network RSVP's role would be limited to ensuring resource availability at the communicating end stations and for connection to the wide area.
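The scaling claim for this example campus can be checked with a little arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the link loads are worst cases that assume every user's 1.5 Mb/s stream crosses the link in question in one direction.

```python
USERS_PER_GROUP = 24
GROUPS_PER_FLOOR = 96 // USERS_PER_GROUP   # 4 groups of 24 users per floor
FLOORS_PER_BUILDING = 5
BUILDINGS = 6
REALTIME_MBPS_PER_USER = 1.5

ACCESS_LINK_MBPS = 10
DOWNLINK_MBPS = 100
BUILDING_UPLINK_MBPS = 1000
SERVER_FARM_MBPS = 2 * 1000                # 2 x 1 Gbps full-duplex links

downlinks_per_building = GROUPS_PER_FLOOR * FLOORS_PER_BUILDING
users_per_building = USERS_PER_GROUP * downlinks_per_building
total_users = users_per_building * BUILDINGS

# Fraction of each link consumed by high priority voice/video traffic.
access_load = REALTIME_MBPS_PER_USER / ACCESS_LINK_MBPS
downlink_load = USERS_PER_GROUP * REALTIME_MBPS_PER_USER / DOWNLINK_MBPS
uplink_load = users_per_building * REALTIME_MBPS_PER_USER / BUILDING_UPLINK_MBPS

# Number of users the server farm links can serve at 1.5 Mb/s each.
server_farm_users = int(SERVER_FARM_MBPS / REALTIME_MBPS_PER_USER)

print(f"downlinks per building:   {downlinks_per_building}")
print(f"total users:              {total_users}")
print(f"access link load:         {access_load:.0%}")
print(f"100 Mbps downlink load:   {downlink_load:.0%}")
print(f"1 Gbps building uplink:   {uplink_load:.0%}")
print(f"server farm users served: {server_farm_users}")
```

Under these assumptions the real-time traffic stays below the capacity of every link, and the server farm links can serve a little under half of the user population, consistent with the "half the population" remark.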
In such a network, a discussion as to the best service policy to apply to high and low priority queues may prove academic: while it is true that "normal" traffic may be delayed by bunches of high priority frames, queuing theory tells us that the average queue occupancy in the high priority queue at any switch port will be somewhat less than 1 (with real user behaviour, i.e. not all watching video conferences all the time, it should be far less). A cheaper alternative to buying equipment with a fancy queue service policy may be to buy equipment with more bandwidth to lower the average link utilisation by a few per cent.

In practice a number of objections can be made to such a simple solution. There may be long established expensive equipment in the network which does not provide all the bandwidth required. There will be considerable concern over who is allowed to say what traffic is high priority. There may be a wish to give some form of "prioritised" service to crucial business applications, above that given to experimental video-conferencing. The task that faces us is to provide a degree of control without making that control so elaborate to implement that the control-oriented solution is simply rejected in favor of providing yet more bandwidth at a lower cost.

The proposed admission control mechanism requires a query-response interaction with the network returning a "YES/NO" answer and, if successful, the user_priority value with which to tag the data frames of this flow.

9. Client mapping to layer 2

We assume the same host model as intserv and RSVP: the client is running an RSVP process which presents a session establishment interface to applications, signals RSVP over the network, programs scheduler and classifiers in the driver and interfaces to a policy control module. In particular, RSVP also interfaces to a local admission control module: it is this entity that we focus on here.
The following diagram is taken from the RSVP spec:

 _____________________________
|  _______                    |
| |       |   _______         |
| |Appli- |  |       |        | RSVP
| | cation|  | RSVP <-------------------->
| |       <-->       |        |
| |       |  |process|  _____ |
| |_._____|  |       -->Polcy||
|   |        |__.__._| |Cntrl||
|   |data       |  |   |_____||
|===|===========|==|==========|
|   |   --------|  |    _____ |
|   |  |        |  ---->Admis||
|  _V__V_    ___V____  |Cntrl||
| |      |  |        | |_____||
| |Class-|  | Packet |        |
| | ifier|==>Schedulr|====================>
| |______|  |________|        | data
|                             |
|_____________________________|

Figure 1 - RSVP in Hosts

The local admission control entity (known as "TUTU") within a client is responsible for mapping these layer-3 requests in TO layer TwO language.

The upper-layer entity requests from TUTU:

"May I reserve for traffic with <traffic characteristic> with <performance requirements> from <here> to <there> and how should I label it?"

where

<traffic characteristic> = Flow Spec, Tspec, Rspec (e.g. bandwidth, burstiness, MTU etc.)
<performance requirements> = latency, jitter bounds etc.
<here> = IP address(es)
<there> = IP address(es) - may be multicast

The TUTU entity:

* maps the endpoints of the conversation to layer-2 addresses in the LAN, so it can figure out what traffic is really going where.
* applies local admission control on outgoing link and driver (may have some interaction with classifier and scheduler here e.g. to give classifier information about which user_priority values to expect)
* formats a request to the network with the mapped addresses and flow specs
* receives response from the network and reports the YES/NO admission control answer and, for successful requests, the resulting user_priority back to the upper layer entity.
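The four TUTU steps listed above might be sketched as follows. Everything here is hypothetical and for illustration only: the class and field names (L3Request, L2Request, Answer, Tutu), the static IP-to-MAC table standing in for address resolution, the simple sum-of-rates local admission test, and the ask_network callback standing in for the still undefined signaling protocol are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class L3Request:
    rate_bps: float                  # <traffic characteristic>, reduced to a rate
    max_delay_ms: Optional[float]    # <performance requirements>
    src_ip: str                      # <here>
    dst_ip: str                      # <there>


@dataclass(frozen=True)
class L2Request:
    src_mac: str
    dst_mac: str
    rate_bps: float
    max_delay_ms: Optional[float]


@dataclass(frozen=True)
class Answer:
    admitted: bool                        # the YES/NO answer
    user_priority: Optional[int] = None   # label to use, if admitted


class Tutu:
    def __init__(self, ip_to_mac: Dict[str, str], link_bps: float,
                 ask_network: Callable[[L2Request], Answer]):
        self.ip_to_mac = ip_to_mac      # stand-in for ARP or similar
        self.link_bps = link_bps        # capacity of the outgoing link
        self.ask_network = ask_network  # stand-in for the signaling protocol
        self.reserved_bps = 0.0

    def request(self, req: L3Request) -> Answer:
        # Step 1: map the endpoints to layer-2 addresses.
        src_mac = self.ip_to_mac[req.src_ip]
        dst_mac = self.ip_to_mac[req.dst_ip]
        # Step 2: local admission control on the outgoing link.
        if self.reserved_bps + req.rate_bps > self.link_bps:
            return Answer(admitted=False)
        # Step 3: format a request to the network.
        answer = self.ask_network(
            L2Request(src_mac, dst_mac, req.rate_bps, req.max_delay_ms))
        # Step 4: report the answer (and user_priority) to the upper layer.
        if answer.admitted:
            self.reserved_bps += req.rate_bps
        return answer
```

A caller would construct Tutu with a function that forwards the L2Request to the first switch and returns its Answer; local resources are only committed once the network has said YES.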
Figure 2 - ISSLL in Hosts
(diagram elements: inputs "from IP" and "from RSVP"; ARP protocol; TUTU, exchanging ISSLL signaling with the network; Local Admission Control; Classifier; Packet Scheduler; data path)

10. Switch Functions

10.1 Admission Control

For the sake of this discussion, we define the following entities within a layer-2 switch:

* traffic class mapping authority - this holds the mapping table of intserv classes to user_priority.
* reservation accountants - one of these on each port accounts for the available bandwidth on that link. For half-duplex links, this involves taking account of both transmit and receive flows. For full-duplex the input port accountant's task is trivial.
* reservation propagators - these propagate requests that have passed admission control at the input port's accountant to the relevant output ports' accountants. This will require access to the switch's forwarding table (layer-2 "routing table" - cf. RSVP model) and spanning-tree state.
These are shown by the following diagram:

Figure 3 - ISSLL in Switches
(diagram elements: Spanning Tree Protocol; filter database; traffic class map; input reservation accountant, reservation propagator and output reservation accountant, with ISSLL signaling on the input and output sides; Local Admission Control at each port; Classifier; Packet Scheduler; data path)

On reception of an admission control request, a switch performs the following actions:

* ingress bandwidth accountant observes the current state of allocation of resources on the input port/link and then determines whether the new allocation would be excessive. The request is passed to the reservation propagator if accepted so far.
* reservation propagator relays the request to the bandwidth accountants on each of the switch's outbound links to which this reservation would apply (implied interface to routing/forwarding database).
* egress bandwidth accountant observes the current state of allocation of queueing resources on its outbound port and bandwidth on the link itself and determines whether the new allocation would be excessive. Note that this is only the local decision of this switch hop: each further layer-2 hop through the network gets a chance to veto the request as it passes along.
* the request, if accepted by this switch, is then passed on down the line on each output link selected.
* if this is the first switch in line, the traffic class mapping authority selects a layer-2 traffic class which appears compatible with the request and whose use does not violate any administrative policies in force. In effect, it matches up the requested service with those available in each of the user_priority classes and chooses the "best" one. It ensures that, if this reservation is successful, the selected value is passed back to the client.
* if accepted, the switch must notify the client of the user_priority to use for packets belonging to this flow. Note that this is a "provisional YES" - we assume an optimistic approach here: later switches can still say "NO" later.
* if this switch wishes to reject the request, it can do so by notifying the original client (by means of its layer-2 address).

10.2 Mappings to IEEE 802 user_priority

There are several options available for mapping service models (Best Effort, Controlled Load, and Guaranteed) to IEEE 802.1p user_priority classes. The problem with making choices at this time is that we don't have much experience with any particular mappings to help make a determination as to the "best" mapping. So, the following options are presented to stimulate discussion in this area. Note, this does not dictate what mechanisms/algorithms a network element (e.g. an Ethernet switch) needs to use to implement these mappings: this is an implementation choice and does not matter so long as the requirements for the particular service model are met.

In order to reduce the administrative problems of maintaining such mappings, such a mapping table is held by *switches* only (and routers if desired) and is a read-write table. The values proposed below are defaults and can be overridden by management control so long as all switches agree to some extent (the required level of agreement requires further thought).
Option A: The Simple Method

In this method, all traffic that uses a particular service model is mapped to a single 802.1p user_priority. This is fine as long as all traffic for a given service model does not exceed any capacity in the 802 device and fine control of delay is not needed. Here is an example:

   Priority   Service
      0       "less than" Best Effort
      1       Best Effort
      2       reserved
      3       reserved
      4       Controlled Load
      5       reserved
      6       Guaranteed Service
      7       reserved

The "less than" best effort service is useful for devices that wish to tag packets that are exceeding a committed network capacity and can be optionally discarded by a downstream device. Note, this is not necessarily incorporated in any current IntServ model.

The advantage of this mapping is that it leaves room for future service models. The choices of priority 4 and priority 6 for Controlled Load and Guaranteed Service, respectively, are somewhat arbitrary. Any two priorities greater than Best Effort can be used as long as Guaranteed Service is "greater" than Controlled Load, although those proposed here have the advantage that, for transit through 802.1p switches with only two-level strict priority queuing, they both get "high priority" treatment (the current 802.1p split is 0-3 and 4-7 for 2 queues).

One disadvantage to this mapping is that it ignores the delay characteristics of the guaranteed service and groups all guaranteed traffic, no matter what the delay bound, into the same priority.

Option B: Two Classes of Guaranteed Service

For this method, we expand the number of priorities assigned to the Guaranteed Service:

   Priority   Service
      0       "less than" Best Effort
      1       Best Effort
      2       reserved
      3       reserved
      4       Controlled Load
      5       Guaranteed Service, 100ms bound
      6       Guaranteed Service, 10ms bound
      7       reserved

Again, the choices of the exact priorities are somewhat arbitrary as long as they are increasing.
Similarly, the choice of delay bound is also arbitrary but potentially very significant. One of the key differences is that now there is a bound on delay through the network (and hence through each device) which may be much harder to implement although it can lead to a much more efficient allocation of resources.

The advantage to this approach is that it puts some real delay bounds on the Guaranteed Service without adding any additional complexity to the other services. It still ignores the amount of *bandwidth* available for each class.

Further derivations of this option could be made by dividing the Guaranteed Service classes into more levels with particular delay bounds. Expanding the number of priorities for Controlled Load service is not as appealing since there is no need to map to a particular delay bound. There may be cases where an administrator might map Controlled Load to more priorities for particular bandwidths or policy levels. It may also be necessary to further classify Controlled Load traffic in cases where the Controlled Load traffic is frequently non-conformant for certain applications.

10.3 Policy

A policy agent may also be implemented by a switch. This determines how to interpret received user_priority values from packets, whether to trust them and whether to map them to something else. The policies in force may be configured by network management. Default is to use what is received and pass it on unchanged.

11. Signaling protocol

It is not the intention to precisely define a protocol in this document at this time. For now, we propose only some issues that such a protocol should consider:

* need to tackle problem of reservation request crossing on a shared medium ("collisions"): this needs some form of tie-breaker.
* failed reservation retry policy: may be a bad idea to retry but we have to specify behaviour.
   * one simple approach might be to avoid the election of any
     "master" bandwidth arbiter on a segment: if we were to assume an
     optimistic approach to reservations with later "veto" power by
     subsequent switches or receivers then a large degree of
     complexity might be avoided.

   * the signaling protocol needs to be able to notify failure of
     admission control back to the client or back to the previous
     switch hop.

12. Shared media

The astute reader will have noticed that we have not mentioned the
difficulty of dealing with allocation on a single shared CSMA/CD
segment: there are a number of reasons for this.

Firstly, we do not believe this is a truly solvable problem: it would
seem to require a new MAC protocol. Those who are interested in
solving this problem per se should probably be following the BLAM
developments in 802.3, but we would be suspicious of the
interoperability characteristics of a series of new software MACs
running above the traditional 802.3 MAC.

Secondly, we are not convinced that it is really an interesting
problem. While not everyone in the world is buying desktop switches
today and there will be end stations living on repeated segments for
some time to come, the number of switches is going up and the number
of stations on repeated segments is going down. This trend is
proceeding to the point that we may be happy with a solution which
assumes that any network conversation requiring resource reservations
will take place through at least one switch (be it layer-2 or
layer-3). Put another way, the easiest QoS upgrade to a layer-2
network is to install segment switching: only when this has been done
is it worthwhile to investigate more complex solutions involving
admission control.

Thirdly, in the core of the network (as opposed to at the edges),
there does not seem to be enough economic benefit for repeated
segment solutions as opposed to switched solutions.
While repeated solutions *may* be 50% cheaper, their cost impact on
the entire network is amortised across all of the edge ports. There
may be special circumstances in the future (e.g. Gigabit buffered
repeaters) but these differ in their characteristics from existing
CSMA/CD repeaters anyway.

13. Compatibility and Interoperability with existing equipment

Layer-2-only "standard" 802.1p switches will have to work together
with routers and layer-3 switches. Wide deployment of such 802.1p
switches is envisaged, in a number of roles in the network. "Desktop
switches" will provide dedicated 10/100 Mbps links to end stations at
costs comparable with those of NICs/adapter cards. Very high speed
core switches may act as central campus switching points for layer 3
devices. Real network deployments provide a wide range of examples
today.

The question is "what functionality beyond that of the basic 802.1D
bridge should such 802.1p switches provide?". In the abstract, the
answer is "whatever they can do to broaden the applicability of the
switching solution while still being economically distinct from the
layer 3 switches in their cost of acquisition, speed/bandwidth, cost
of ownership and administration". Broadening the applicability means
both addressing the needs of new traffic types and building larger
switched networks (or making larger portions of existing networks
switched).

Thus one could imagine a network in which every device (along a
network path) was layer-3 capable/intrusive into the full data
stream; or one in which only the edge devices were pure layer-2; or
one in which every alternate device lacked layer-3 functionality; or
one in which most devices lacked it, except for some key control
points such as router firewalls. Whatever the mix, the solution has
to interoperate with these layer-3 QoS-aware devices.
Of course, where intserv flows pass through equipment which is
ignorant of priority queuing and which places all packets through the
same queuing/overload-dropping path, it is obvious that some of the
characteristics of the flow get more difficult to support. Suitable
courses of action in the cases where sufficient bandwidth or
buffering is not available are of the form:

   (a) buy more (and bigger) routers

   (b) buy more capable switches

   (c) rearrange the network topology: 802.1Q VLANs may help here.

   (d) buy more bandwidth: Gigabit Ethernet is nearly here.

It would also be possible to pass more information between switches
about the capabilities of their neighbours and to route around
non-QoS-capable switches: such methods are for further study.

14. Epilogue

An obvious comment is that this is all too complex and is what RSVP
is doing already: why do we think we can do better by reinventing the
solution to this problem at layer-2?

The key is that we do not have to tackle the full problem space of
RSVP. A number of simple scenarios cover a considerable proportion of
the real situations that occur: all we have to do here is cover 99%
of the territory at significantly lower cost and leave the other
applications to full RSVP running in strategically positioned
high-function switches or routers. This will allow a significant
reduction in overall network cost (equipment and ownership).

This approach does mean that we have to discuss real life situations
instead of abstract topologies that "could happen". Sometimes, for
example, simple bandwidth configuration in a few switches, e.g.
to avoid overloading particular trunk links, can be used to overcome
bottlenecks due to the network topology: if there are issues with
overloading end station "last hops", RSVP in the end stations would
exert the correct controls simply by examining local resources
without much tie-in to the layer-2 topology. In this case there has
been no need to resort to any form of complex topology computation
and much complexity has been avoided.

In the more general case, there remains work to be done. This will
need to be done against the background constraint that the changing
of queue service policies and the addition of extra functionality to
support new service disciplines will proceed at the rate of hardware
product development cycles, and advance implementations of new
algorithms may be pursued reluctantly or without the necessary 20-20
foresight.

However, compared to the alternative of no traffic classes at all,
there is substantial benefit in even the simplest of approaches (e.g.
2-4 queues with straight priority), so there is significant reward
for doing something: wide acceptance of that "something" probably
means that even the simplest queue service disciplines will be
provided for.

15. References

[1] ISO/IEC 10038, ANSI/IEEE Std 802.1D-1993, "MAC Bridges"

[2] "MAC Bridges - Traffic Classes and Dynamic Multicast Filtering
    Services in Bridged Local Area Networks", IEEE P802.1p/D4,
    October 1996

[3] "Integrated Services in the Internet Architecture: an Overview",
    RFC1633, June 1994

[4] "Resource Reservation Protocol (RSVP) - Version 1 Functional
    Specification", Internet Draft, November 1996,
    <draft-ietf-rsvp-spec-14.ps>

[5] "Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
    Access Method and Physical Layer Specifications", ANSI/IEEE Std
    802.3-1985
[6] "Token-Ring Media Access Control", IEEE Std 802.5

[7] "A Framework for Providing Integrated Services Over Shared and
    Switched LAN Technologies", Internet Draft, November 1996,
    <draft-ghanwani-framework-is-lan-01.txt>

[8] "Specification of the Controlled-Load Network Element Service",
    Internet Draft, August 1996,
    <draft-ietf-intserv-ctrl-load-svc-03.txt>

[9] "Specification of Guaranteed Quality of Service", Internet Draft,
    August 1996, <draft-ietf-intserv-guaranteed-svc-06.txt>

16. Security Considerations

Security issues are not addressed in this memo.

17. Authors' addresses

   Mick Seaman
   3Com Corp.
   5400 Bayfront Plaza
   Santa Clara CA 95052-8145
   USA
   +1 (408) 764 5000
   mick_seaman@3com.com

   Andrew Smith
   Extreme Networks
   1601 S De Anza Blvd. #220
   Cupertino CA 95014
   USA
   +1 (408) 342 0999
   andrew@extremenetworks.com

   Eric Crawley
   Bay Networks
   3 Federal St.
   Billerica MA 01821
   USA
   +1 (508) 670 8888
   esc@baynetworks.com