INTERNET-DRAFT                                          Michael Smirnov
                                                              GMD FOKUS
Expires September 27, 1997                               March 22, 1997

        EARTH - EAsy IP multicast Routing THrough ATM clouds

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

EARTH could be positioned between MARS [1] and VENUS [2]; however, EARTH deals not only with address resolution but with routing issues as well. This document describes a solution that simplifies distribution of IP multicast flows over ATM clouds with the use of point-to-multipoint connections. The EARTH solution includes:

- the notion of a multicast IP subnet over ATM;
- resolution of IP multicast (Class D) addresses to ATM addresses;
- support for IP multicast shortcuts over multiple LISs and for DVMRP-capable egress routers;
- support for multicast group management and receiver-initiated quality of service (QoS) specification;
- optional replication of EARTH servers, with server synchronisation based on SCSP [3].

Similarly to the "special" use of IP Class D addresses for multicast, EARTH proposes a special use of a Multicast Logical IP Subnet (MLIS). MLIS is a concept providing conformance of EARTH to Classical IP over ATM extended to support IP multicast.
EARTH borrows heavily from MARS (e.g. messages for address resolution) but also differs from it: address resolution is made _only_ for IP Class D addresses, and multiple LISs can be served by a single EARTH server. In contrast to the VENUS proposal, EARTH simplifies bypassing Mrouters and requires no coordination of multiple MARSs. The current version of the EARTH proposal has an extension that addresses the problems of an ATM backbone for the Internet, with DVMRP support over MLIS, while retaining support for ATM subnets (LANs). Another extension addresses redundant EARTH servers, specifying how the SCSP protocol could be re-designed to run over MLIS. This document, proposing a solution not yet completely implemented, is intended to help focus ION efforts.

1. Introduction

Connection-oriented ATM technology is known to differ from connectionless IP in a large number of ways [4], the address resolution problem being one of the most serious. Moreover, address resolution is even more difficult when one IP [multicast] address must be resolved to a set of ATM addresses. This fact is emphasised in the description of MARS: "the service provided by the MARS and MARS Clients are almost orthogonal to the IP unicast service over ATM" [1]. This was the initial motivation to separate IP unicast from IP multicast address resolution to ATM addresses in EARTH. Actually, for IP unicast over ATM there is no need to [re]invent anything new: IP unicast ARP is supported fairly well in those ATM products which conform to Classical IP over ATM [5]. The architecture described in [5] fits the needs of unicast but strongly resists deployment of the IP multicast service model [6] over ATM clouds. This resistance is due to the restriction that LISs interwork only over routers, even if these LISs share the same physical ATM network [5].
For unicast communication, interworking between different LISs via routers poses no difficulties; ATM connection establishment is supported by ATM ARP servers - one server per LIS. The content of a unicast ARP table is quasi-permanent. In contrast, the multicast ARP table's content (e.g. in the MARS proposal) is dynamic and follows receiver-initiated membership registration messages for particular multicast group[s]. Bypassing routers for inter-LIS multicast requires coordination of multiple MARSs along with propagation of join and leave messages to all MARS clients [2], which can congest the network. However, for IP multicast flow distribution over an ATM cloud the "cut through/shortcut" functionality - bypassing routers - is highly desirable. The cut-through helps to achieve better performance in terms of join/leave latency and also provides more flexibility in selecting different QoS levels based on receivers' preferences.

The evolving IP Multicast over ATM solution (the `MARS model' [1]) retains the Classical IP over ATM model and treats each LIS as a `MARS Cluster'. Clusters are interconnected by conventional IP Multicast routers (Mrouters). An attempt to achieve LIS boundary cut-through, keeping at the same time the Classical IP over ATM model for both unicast and multicast communications, has been undertaken in the VENUS model [2]. In the case of MARS, routers between LISs make IP multicast over ATM inefficient. Each Mrouter views its designated LIS as a totally disjoint subnet, i.e. as if this LIS had no other way to internetwork with other LISs. In fact, LISs are fully interconnected (via ATM) when they share the same cloud. In the case of unicast, the overhead of reaching a neighbour (in the ATM sense) host that belongs to another IP subnet via the router path instead of a direct ATM connection could conditionally be considered not so high.
But it is clear that in the case of multicast this overhead is much higher, because the IDMR protocols are involved. The IDMR protocols (such as DVMRP, MOSPF, PIM, etc.) try to establish an efficient multicast distribution tree (MCT). MCT efficiency is the basic feature of multicast technology (i.e. saving network resources by replicating IP datagrams only at splitting points of the MCT). Hence, to be efficient, the MCT has to have a minimal number of splitting points. In the case of MARS over multiple LISs the number of overhead splitting points is at least N-1, where N is the number of LISs which have participants of the multicast group. In the EARTH proposal these N-1 overhead splitting points are eliminated.

[2] ends with the conclusion that the approach is too complex to bring reasonable benefits. Instead, "developing solutions for large clusters using a distributed MARS, and single clusters spanning multiple Logical IP Subnets, is a much better focus for ION's work" [2]. This document describes such a solution, which is dubbed `EARTH - EAsy IP multicast Routing THrough ATM clouds'. EARTH has the following components, described below in more detail:

- Multicast Logical IP Subnet (MLIS) `spanning' the whole physical ATM network;
- EARTH server providing resolution of IP Class D addresses to ATM hardware addresses;
- support for a [limited] number of QoS levels for various receivers of the multicast flow;
- support for IDMR protocols when the IP/ATM boundary is crossed;
- support for efficient server cache synchronisation over MLIS with the use of a modified SCSP.

The scope of this document is an `ATM cloud' - a nickname for a physically connected ATM network with uniform management and signalling throughout the cloud. The hosts are ATM endpoints with IP addresses. ATM interworking with other IP networks is provided by a [set of] egress IP router[s] supporting IP multicast.
Each egress IP router is an ATM endpoint of the ATM cloud and has a management assignment to operate as a designated Mrouter for a [set of] particular LISs of the ATM cloud. The ATM cloud is assumed to support point-to-multipoint connections controlled by the root endpoint via signalling. Classical IP over ATM is retained because EARTH makes no changes to IP unicast over ATM. Therefore an EARTH implementation will not disturb any ongoing operation of Classical LISs over the ATM cloud.

The rest of the document is structured as follows. Chapter 2 provides a general vision for the current version (02) of the draft with the list of differences from version 01. Chapter 3 introduces the MLIS concept; Chapter 4 defines the EARTH database; Chapter 5 addresses QoS support. Chapter 6 is separated into two parts, treating EARTH for ATM LANs (with shared MCT) and WANs respectively. The scalability issue is addressed in Chapter 8. Appendixes and references provide other relevant information.

2. General Vision

EARTH is a [set of] point[s] in an ATM cloud where the information relevant to IP multicast over ATM is concentrated. Therefore, it should be feasible to implement both IP multicast address resolution and a Resource Reservation protocol (like RSVP) over the given ATM cloud co-located with this [set of] point[s]. EARTH could also be used for IDMR support, where IP-to-ATM ARP is required for inter-communication between routers. Finally, to address scalability problems, a set of EARTH servers is considered, with an SCSP-like protocol running between these servers.
In general, the changes relative to version 01 are as follows:

- the scope of EARTH is enlarged to support not only ATM LANs but also distribution of IP multicast flows with IDMR protocols (an example is provided for DVMRP);
- scalability features are addressed, with special emphasis on SCSP as a means to synchronise multiple servers;
- QoS support is now intended to be provided by a separate, say, Resource Reservation (RR) protocol, for which RSVP is a good example (this work for RSVP is considered in [7]), however via strong cooperation with the EARTH service;
- from its very first version this document has recognised the MARS model [RFC 2022] as its basis with regard to address resolution; this version provides a special reference for MARS users willing to build EARTH functionality over MARS via the TLV mechanism defined in MARS;
- the notion of a multicast logical IP subnet is refined in this version and is now treated as an extension of Classical IP over ATM rather than its antagonist.

The EARTH implementation principles described in this memo could perhaps be stated more professionally. However, the design philosophy behind these principles could be of greater importance.
This memo tries to map the IP multicast service model [8] to the ATM service model, which has resulted in two major changes:

- Source-specific multicast groups: sender[s] (source[s] of multicast traffic) should be considered part of multicast [sub]group[s] due to the nature of ATM point-to-multipoint connections explicitly owned by roots;
- Protocol concentration: separation of data and control flows for almost all protocols doing IP multicast over ATM becomes a necessary prerequisite due to the centralised nature of address resolution over ATM; therefore these central points (EARTH servers) become natural points of attraction for the control flows facilitating IP multicast over ATM (ARP, RR, IDMR);
- Efficient replication of service access points: protocol concentration builds conditions for efficient synchronisation of multiple servers due to the already existing integration of various related data items.

3. Multicast Logical IP Subnet

Throughout this document a Logical IP Subnet which conforms to [5] will be called a Classical LIS. A Multicast LIS is a single LIS per physical ATM network which normally carries no traffic except IP control traffic (queries, replies). The MLIS is shared by all Classical LISs for IP multicast flow distribution with the use of point-to-multipoint ATM connections.

Historically, IP multicast was enabled by partitioning the IP address space. Multicast addresses are borrowed from the previously reserved Class D address space [5,8]. During a multicast session each participating receiver has at least two IP addresses: one, permanent, for regular unicast communication, and a second, temporary, for multicast communication. However, in this situation a receiver is not considered to be a multihomed host, because all its multicast communications are controlled/enabled by a multicast router (Mrouter) designated for a particular subnet. (This could be called `indirect multihoming'.)
An Mrouter runs IGMP to make forwarding decisions for a particular multicast flow to the [broadcast] subnet and also runs IDMR protocols with other Mrouters for global distribution of the multicast flow. This architecture fails in the case of NBMA subnets at the `bottom' of the multicast distribution tree. In fact, an IP Class D address is not treated like any other regular IP address: it was not intended to uniquely identify a particular IP host, but rather all hosts willing to announce themselves group members. In a similar manner, this draft treats the MLIS not as 'one more LIS' in the sense of RFC 1577. The treatment is special and has more the flavour of an 'abstract notion'. A Multicast LIS is a concept providing conformance of EARTH to Classical IP over ATM extended with regard to IP multicast; i.e., this proposal retains the notion of LIS for unicast but also recognises that for multicast purposes this notion should be redefined.

The Multicast LIS is important for defining some basic elements of EARTH, such as the All Egress Routers Multicast Address. This address is defined in the MLIS only; that is, only multicast-capable routers will be taken into account. All other routers are "invisible" in the MLIS. Is the ATM cloud equal to the MLIS? Perhaps not, with the following motivation: i/. the concept of a LIS relates to IP over ATM only, i.e. does not include native ATM services; and ii/. for some reasons not all of the ATM cloud may be multicast capable (e.g. some interfaces of some Mrouters could be configured to have administrative boundaries for multicast [10]).

The EARTH scenario implies two options for host extensions. The first host option requires that each ATM endpoint be configured to work with one additional LIS - the MLIS - and supplied with the ATM address of the EARTH server (see section 3).
That is, the EARTH server could be seen as the ATM ARP server for the MLIS (with two notes: the MLIS is shared by all endpoints from all Classical LISs, and ATM ARP here considers only IP Class D addresses). The second host extension option requires that the ATM adapter driver be able to distinguish between unicast and multicast addresses. If it needs to resolve a unicast IP address it should contact the regular ATM ARP server; in the case of multicast it should contact the EARTH server. The second option seems to be more feasible.

The MLIS scenario makes some assumptions about Mrouters directly connected to the ATM cloud. In the case of shared point-to-multipoint connections (PtM) within ATM LANs, each Mrouter is supposed to know the ATM address of the EARTH server. Each Mrouter has to be able to distinguish between unicast LISs, i.e. to know which LIS[s] it is designated for. That is, if the first sender to a multicast group belongs to, say, LIS_1, and Mrouter Rtr_A is the designated Mrouter for LIS_1, then Rtr_A is the only gateway for the multicast flow to go out of the ATM cloud and, therefore, to get in. More details can be found in Section 6.1. Another assumption refers to IGMP support. Both the EARTH client (at host and Mrouter) and the server will use IGMP messages from existing implementations to trigger the creation of EARTH-specific messages. After the EARTH client receives the requested reply, it will reformat it back and present it to the IP application or Mrouter software as IGMP queries/reports.

4. EARTH: issues for algorithms and data

4.1 Algorithmic issues

The EARTH server resolves IP Class D addresses to the ATM addresses of those ATM endpoints which have registered with the EARTH server as receivers for a particular multicast session. The information exchange follows the principles of ARP found in [11], applied to NBMA networks, i.e. those found in RFC 2022.
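Returning briefly to the second host extension option above: the driver-level distinction between unicast and multicast amounts to a simple range check, since an IPv4 Class D address has its top four bits set to 1110, i.e. falls in 224.0.0.0/4. A minimal sketch (the server names and function names are placeholders, not defined by this draft):

```python
import ipaddress

ATM_ARP_SERVER = "atmarp-server"  # regular ATM ARP server (placeholder)
EARTH_SERVER = "earth-server"     # EARTH server (placeholder)

def is_class_d(ip: str) -> bool:
    """True if ip is an IPv4 Class D (multicast) address, i.e. in 224.0.0.0/4."""
    return ipaddress.IPv4Address(ip) in ipaddress.IPv4Network("224.0.0.0/4")

def resolution_target(ip: str) -> str:
    """Choose which server the ATM adapter driver would query to resolve ip."""
    return EARTH_SERVER if is_class_d(ip) else ATM_ARP_SERVER
```

With this check in the driver, no per-LIS configuration beyond the EARTH server's ATM address is needed.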
A potential receiver of an IP multicast flow, whatever unicast LIS it is located in, sends its registration message to the EARTH server. The message contains the receiver's ATM address, the receiver's IP address (i.e. the address from the Classical LIS), the receiver LIS's subnet mask, and the target multicast address. The QoS level mentioned in a previous version should be provided by RR (see [7] for details in the case when RR is represented by RSVP). The EARTH server keeps this membership information in a multicast address resolution table in the form of a list of individual entries: a member list is a collection of member list entry elements, and one member list is kept per active IP multicast address. The activity of an IP multicast address is determined by the EARTH server with appropriate ageing functionality. The ageing decision for a particular member list depends on the communication lifetime between the EARTH server and both senders and receivers. One member list entry is kept for each member of the multicast group; however, this entry can participate virtually in several QoS-specific and/or source-specific member sub-lists.

Following the IP multicast model, a sender to the group need not be a member of the group. However, the EARTH server treats an egress Mrouter, being a current [re]sender, as a potential receiver for the next sender to the same group (Section 5 provides this algorithm). In a connection-oriented ATM cloud, new senders to an active IP address will need to establish their own point-to-multipoint connections to the group members. Note: Phase 1 of the UNI 3.1 specification supports only zero return bandwidth - from leaves to root [12, p. 154]. A sender to the multicast group queries the EARTH server periodically to get the membership information and to use it for its point-to-multipoint connection management. The member list is sent by the EARTH server back to the querying endpoint (sender or Mrouter) as the answer to its EARTH request.
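The registration, query, and ageing behaviour described above can be sketched as follows; the ageing interval and all names are illustrative assumptions, since the draft leaves these to the implementation:

```python
AGEING_INTERVAL = 60.0  # seconds; illustrative - the draft does not fix a value

class EarthMembership:
    """Per-group member lists with simple ageing, as an EARTH server
    might keep them (a sketch, not the draft's message formats)."""

    def __init__(self):
        # group Class D address -> {receiver ATM address: last refresh time}
        self.groups = {}

    def register(self, group, atm_addr, now):
        """A receiver registers (or refreshes) its membership."""
        self.groups.setdefault(group, {})[atm_addr] = now

    def member_list(self, group, now):
        """Answer a sender's periodic query with the non-aged members;
        an empty reply means the group is no longer active."""
        members = self.groups.get(group, {})
        alive = {a: t for a, t in members.items()
                 if now - t < AGEING_INTERVAL}
        if alive:
            self.groups[group] = alive
        else:
            self.groups.pop(group, None)   # membership dismissed by ageing
        return sorted(alive)
```

A sender comparing two consecutive replies obtains exactly the additions and removals it must apply to its point-to-multipoint connection.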
A sending endpoint is supposed to keep a local cache of the member list for comparison and for deriving the needed changes to the connection.

The case of multiple senders to the same group in the EARTH scenario with shared PtM should be protected from chaotic crossings of the ATM cloud's boundary. Suppose each sender to the same group uses the Mrouter designated for the sender's LIS to advertise/forward its traffic to receivers outside the ATM cloud. In the case of multiple egress Mrouters this would confuse the IDMR protocols and unbalance the multicast distribution tree. For the needed protection the EARTH server implements a single gateway principle (see section 6.1). In contrast, when the ATM cloud is considered part of the Internet backbone, EARTH has to provide the IDMR functionality over the MLIS. This draft proposes a straightforward solution: to treat the EARTH server as one of the Mrouters. In fact, this will be the only Mrouter within the MLIS which is not an egress Mrouter. Moreover, only control messages will reach the EARTH-Mrouter; it should be protected from flooding by multicast flows. The Mrouter functionality inside the EARTH server will facilitate the needed neighbour discovery and efficient shortcuts through the ATM [backbone] cloud. Details with regard to DVMRP are discussed in Chapter 6.2.

It should be mentioned that the EARTH server, like an ATM ARP server, behaves entirely passively: it simply keeps the registration information and answers queries. Conceptually, with regard to IGMP, the EARTH server represents a broadcast medium collapsed to the size of a single point. When, and if, the size of the ATM cloud requires replication of this point, then the anycast capability (to contact this distributed EARTH service, i.e. redundant servers) could be employed.
Another solution to improve scalability is to use multiple EARTH servers, partitioning the ATM cloud into service zones (clusters), with a modified SCSP protocol run among these servers to synchronise their caches. Actually, the EARTH server becomes a sort of Multicast Integrated Server (MIS) due to the co-location of several protocols in the same [set of] point[s].

4.2 Data Issues

Preliminary note: EARTH keeps ATM-specific QoS definitions; the mapping from receiver-initiated QoS specs should be done either in the RR daemon or in the ATM controller. That is, the QoS granularity is defined in EARTH in an ATM-specific way. More details will be provided in a separate document; meanwhile refer to [7]. The rest of this chapter is an attempt to define an extended (v.2) EARTH database [if not stated otherwise, IP Address means the unicast one]:

   MARPTable = null | MARPTableEntry [, MARPTableEntry [, ... ]],
   MARPTableEntry = <MultIPAddr, SourceIPAddr, SourceNetMask,
                     HolderIPAddr, MemberList>,
   MemberList = null | MemberListEntry [, MemberListEntry [...]],
   MemberListEntry = <QoSLevel, RecNSAPA, RecIPAdd, RecNetMask>

where

   null - represents an empty (e.g. zeroed) data structure[s];
   a | b | ... - mutually excluding alternatives a, b, ...;
   a [, a [, ...]] - one or any number of instances of an entity of type a;
   <a, b, ...> - a set of elements a, b, ... comprising a data structure;
   MARPTableEntry - a logical aggregation of data specifying a
source-specific multicast subgroup for purposes of EARTH;
   MultIPAddr - Multicast IP Address (i.e. IP Class D Address);
   SourceIPAddr - either the IP Address of a source (sender) of multicast traffic, or zero when the multicast group has only registered receivers but no query has yet been made to EARTH by any sender to the group;
   SourceNetMask - NetMask for the SourceIPAddr;
   HolderIPAddr - IP Address of the EARTH server being a holder of a source-specific multicast subgroup [Note: this field should be defined only for multiple EARTH servers per MLIS; the concept of Holder is defined in chapter 8.1];
   QoSLevel - quality of service specification supported by ATM signalling, TBD;
   RecNSAPA - NSAP Address of a receiver;
   RecIPAdd - receiver's IP address;
   RecNetMask - receiver's network mask.

These data structures imply that membership information is stored in source-specific and QoS-specific ways. Additional data elements which are important for EARTH operation are:

   QoSMap - a table defining how to set up the QoSLevels provided by ATM, based on reservations provided by receivers (with regard to RSVP and EARTH integration this work is addressed in the ongoing development [7]);

   AllEarthList = <AllEarthMultIPAddr, AliveEarthList> - a list specifying all alive EARTH servers within the ATM cloud, where
   AllEarthMultIPAddr - IP Multicast address reserved for IP Multicast communication to all EARTH servers within the ATM cloud;
   AliveEarthList = null | <EarthIPAddr, EarthNSAPA> [, <EarthIPAddr, EarthNSAPA> [, ... ]];

   AllEgressList = <AllEgressMultiIPAddr, AliveEgressList> - a list specifying all egress [multicast capable] routers within the ATM cloud (or, in other words, all routers within the MLIS), where:
   AllEgressMultiIPAddr - IP Multicast address reserved for IP Multicast communication to all routers within the MLIS;
   AliveEgressList = null | <EgressIPAddr, EgressNSAPA> [, <EgressIPAddr, EgressNSAPA> [, ... ]].

In a similar manner an AllRecList could be defined, with AllRecMultiIPAddr and an AliveRecList of alive receivers (here, alive means registered to some MG). Actually, AllRecMultiIPAddr is the well-known AllHosts address defined in RFC 1122.
However, the AllHosts address is reserved for use in IGMP, and another one should be used for the MLIS.

5. Support for various QoS levels

The EARTH server can optionally provide a sender to a multicast group with classified membership information. The member_list could be partitioned into several subgroups reflecting receivers' preferences among the QoS levels. These QoS levels should be negotiated with the ATM network administration and should be known to all potential receivers. The classified member list is a collection of member_lists with equal QoS levels. It is assumed that each QoS level is supported with a separate point-to-multipoint connection (ptm_qs); each ptm_qs starts at the switch under the control provided by the sender via signalling. Upon receiving a member_list_qos, a sender updates/creates each ptm_qs separately, if needed. However, this scheme does not provide seamless 'IP signalling' through the ATM cloud in the case of a backbone. On the contrary, it implies that _no_ RR should be run over the MLIS. As a future research direction, a combined implementation of EARTH and an RSVP "reservations merging" entity could be considered; see [7].

6. Support for IDMR in case of cut-through

The EARTH proposal tries to support IDMR protocols outside the scope of the ATM cloud and to protect them (in the case of shared PtM in an ATM LAN) from being confused by multiple entry/exit points at the boundary of the ATM cloud for the same IP multicast group. This protection is achieved by implementing a single gateway principle. From an Mrouter's viewpoint the single gateway makes the whole ATM cloud appear as a single subnet, whatever number of LISs it has. However, this holds only per IP multicast group, not for a collection of multicast groups - each group can have a different gateway. Note that for unicast communications Classical IP over ATM still applies, with partitioning of endpoints into a number of LISs and with a set of routers designated to these LISs.
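The single gateway principle just stated can be condensed into a small sketch: one gateway per group, fixed by the first sender, with later external senders redirected to it. The names follow the egress_list/gateway_atm vocabulary used later in section 6.1; the code itself is illustrative, not a normative algorithm:

```python
class GatewayRegistry:
    """Single gateway principle: one egress Mrouter per multicast group,
    chosen by the first sender; different groups may get different
    gateways. A sketch after sections 6 and 6.1 of this draft."""

    def __init__(self, egress_list):
        self.egress_list = set(egress_list)  # ATM addrs of egress Mrouters
        self.gateway_atm = {}                # group -> gateway ATM address

    def query(self, group, query_atm, designated_rtr):
        """Handle a sender's query; returns (reply_kind, gateway)."""
        gw = self.gateway_atm.get(group)
        if gw is None:
            # first query: an egress Mrouter becomes the gateway itself,
            # otherwise the router designated for the sender's LIS does
            gw = query_atm if query_atm in self.egress_list else designated_rtr
            self.gateway_atm[group] = gw
            return ("reply", gw)
        if query_atm == gw or query_atm not in self.egress_list:
            # a refresh from the gateway, or a new internal sender
            return ("reply", gw)
        # a new external sender is redirected to the existing gateway
        return ("redirect", gw)
```

Note how two groups whose first senders arrive via different Mrouters end up with different gateways, while each single group keeps exactly one.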
In the case of the ATM cloud being part of the backbone, the MLIS should simply run the needed IDMR protocols.

6.1 Single Gateway (ATM subnet)

The `Single Gateway' principle says that the ATM cloud, whatever LISs and routers it has, should have a single gateway (router) forwarding IP multicast packets to and from the ATM cloud for each multicast session. The suggested implementation is as follows. The single gateway router is determined either by its designation to the LIS where the multicast flow first originated, or by the fact of forwarding the first multicast flow to the ATM cloud. In both cases `first' denotes that the single gateway Mrouter is determined by the first sender to the active IP multicast group. These cases distinguish between the inside-out and outside-in propagation of the multicast traffic with regard to the ATM cloud boundaries. The first case uses the internal partition of the ATM cloud into LISs with designated Mrouters. The second case treats the ATM cloud as a single subnet (without distinctions between LISs).

For example, let us consider an IP multicast session over an ATM cloud with 4 LISs, numbered 1 through 4, and 3 Mrouters - A, B, C - designated, correspondingly, Rtr_A to LIS_1 and LIS_3, Rtr_B to LIS_2, and Rtr_C to LIS_4. If the first sender to a group is external and its flow reaches the ATM cloud via Rtr_B, then Rtr_B will remain the single gateway for the lifetime of the session (depending on the ageing decision of the EARTH server). If the first sender to a group is internal with regard to the ATM cloud and comes from LIS_3, then Rtr_A will remain the gateway for the lifetime of the session, even if other endpoints start sending to the group later from LIS_2 or LIS_4.

The paragraph above has two cases (the 1st sender could be a/. external, in which case the single gateway is defined by the Rtr which was first to forward the flow to the cloud; and b/. internal, covered in the rest of the paragraph).
If the 1st sender was internal, its LIS defines the gateway. If now a new sender to the group arrives from outside the cloud and its traffic tries to enter the cloud via another Mrouter, then this other Mrouter will contact the EARTH server and will see in the EARTH reply that the gateway is already defined. This other Mrouter will have to forward its traffic to the already-defined gateway.

How is this information kept by the EARTH server? According to section 3, the EARTH server is queried by senders periodically to get updates about the group membership. We require that EARTH:

- [quasi] permanently keeps a list of ATM addresses of all egress routers of the ATM cloud and their designation to LISs (egress_list);
- temporarily keeps the ATM address of the current gateway (gateway_atm) for a particular multicast group (for the session's lifetime); gateway_atm is linked to a member_list;
- gets the sender's ATM address (query_atm) from the query.

The session's lifetime is defined by EARTH with the use of ageing functionality. After the group membership is dismissed, the gateway_atm is zeroed. When the first query arrives for a particular group (i.e. gateway_atm equals zero), EARTH compares the query_atm with the content of the egress_list and, if it finds a match (that is, the first sender to the group is one of the egress Mrouters), it stores this egress router's ATM address in the gateway_atm linked to the member_list and replies with the current member_list; else (that is, the first sender is a host) it automatically adds to the member_list the ATM address of the egress router designated for the sender's LIS, stores this router's address in the gateway_atm, and replies with the modified member_list. With the arrival of any subsequent query for this group, the EARTH server compares query_atm with the content of the gateway_atm.
If the result is positive (i.e., the current query is a refresh query from the first sender) then the server replies with the current member_list and makes no changes to it. Otherwise (i.e. it is a query from a new sender) the server compares query_atm with the content of the egress_list and, if it finds a match (i.e. the new sender is also an Mrouter), replies with the gateway_atm (redirecting the external flow to the group's existing gateway); else (the new sender is an internal endpoint) it replies to this new sender with a modified member_list: it adds the gateway_atm to the member_list.

Example.

CASE 1: let us assume that the 1st sender was external:

   member_list = {1,2,3,4};    /* list of receivers */
   egress_list = {10,20,30,40} /* list of Mrouters  */
   gateway_atm = 10            /* sender is Rtr 10  */

   /* now a new query arrives */
   if query_atm = gateway_atm            /* positive result */
   then earth_reply = {1,2,3,4}
   else if {query_atm in egress_list}    /* new sender is external */
        then earth_redirect_reply = {10} /* redirect type of reply  */
        else earth_reply = {1,2,3,4,10}  /* new sender will have the
                                            gateway in its pt-to-mpt */

CASE 2: let us assume that the 1st sender was internal:

   member_list = {1,2,3,4};    /* list of receivers */
   egress_list = {10,20,30,40} /* list of Mrouters  */
   gateway_atm = 10            /* sender is, say, 5, and Rtr 10 is the
                                  designated Mrouter for the sender's LIS */

   /* now a new query arrives */
   if query_atm = gateway_atm            /* cannot be positive here:
                                            the new sender is a host */
   then earth_reply = {1,2,3,4}
   else if {query_atm in egress_list}    /* also false in CASE 2: query_atm
                                            is the ATM address of a host */
        then earth_redirect_reply = {10}
        else earth_reply = {1,2,3,4,10}  /* new sender will have the
                                            gateway in its pt-to-mpt */

When a host - a new sender to the group - receives this list, it opens its own pt-to-mpt connection to the group, i.e.
to hosts 1,2,3,4 and to Rtr 10 (for possible external members). The important point is that Rtr 10 will be used in both cases: when a new sender, say 6, belongs to the same LIS as the old sender (5), and when 6 belongs to another LIS. That is, Rtr 10 is retained as the single gateway. This example is provided to show that in both cases above the algorithm is the same. Comment: the term `sender' elsewhere in the document should be read as `sender or potential sender'; if an Mrouter's query results in a zero member_list, then no forwarding of the flow takes place.

6.2 DVMRP Over MLIS (ATM backbone)

The single gateway principle has been criticised on the ION mailing list as being "not popular" among folks thinking of IP over an ATM backbone. The backbone surely has to provide the best single shortcut for a unicast communication and a [set of] shortcut[s] in the case of a multicast communication. The current version of the EARTH draft proposes an extension of EARTH functionality for the case of an ATM backbone. A typical scenario will be to interconnect a number of IP clouds via the ATM cloud.

The motivation to integrate DVMRP support into the EARTH server is mainly the "classical" nature of mrouted and its availability for experimentation purposes. The DVMRP protocol [10] defines the following:

- there exists a well-known All-DVMRP-Routers IP multicast address;
- the reverse path forwarding check ensures optimal [in the sense of unicast criteria, however] paths for distribution of multicast flows;
- there exists a mechanism (dependency check, Probe messages) for neighbour discovery among multicast capable routers;
- Prune and Graft messages provide all the needed dynamics for the multicast distribution tree among mrouters.

The poor scalability of DVMRP is caused by the need for periodic exchange of [unicast] routing tables between adjacent routers.
However, if in any modification or unusual use of DVMRP the number of routing table reports is not enlarged compared with the original protocol, then the scalability of the extension can be considered to be that of original DVMRP. Such an unusual deployment of DVMRP is proposed below. Due to the need for separation of data and control flows in IP multicast over ATM, along with the strong willingness to retain (for seamless interworking) existing IP protocols supporting multicast, the proposal is made here to co-locate the EARTH server and an instance of a DVMRP router.

DVMRP over MLIS is defined as follows:

- any EARTH server should run an instance of mrouted (e-mrouted);
- any e-mrouted should be considered a neighbour by each egress mrouted belonging to the MLIS;
- mrouted instances at the egress points of the MLIS can keep a regular exchange of reports (or flash updates) to keep their [unicast-based] routing tables consistent over the multicast routers group (MRG), and also run their dependency check without EARTH involvement; however, instances of e-mrouted (in the case of multiple EARTH servers) should not consider each other as neighbours in terms of DVMRP; instead SCSP could be involved;
- each egress mrouted can send a message to the All-DVMRP-Routers address; this address, being an IP Class D address, will be resolved by the EARTH server known to each querier to a set of NSAPAs of egress routers, hence a PtM could be established between all egress routers for a successful multicast shortcut over the MLIS;
- if the MLIS has registered receivers for one of the MGs being shortcut, then EARTH-IGMP support should be involved with the establishment of MLIS specific PtM[s] for these MG[s].

Under this condition, when the shortcut interconnection is established between egress routers, i.e. they know each other, they will - according to the DVMRP protocol specification - start sending Prune and Graft messages to each other and to the EARTH server as well.
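The resolution step in the list above can be sketched as follows. This is a hypothetical sketch: the function name and the NSAPA strings are illustrative, and only the All-DVMRP-Routers address (224.0.0.4) comes from DVMRP itself:

```python
# Hypothetical sketch: an egress mrouter queries the All-DVMRP-Routers
# Class D address, and the EARTH server resolves it to the NSAPAs of all
# other egress routers plus the co-located e-mrouted (which must see all
# control traffic), so the querier can open one pt-to-mpt VC to them.

ALL_DVMRP_ROUTERS = "224.0.0.4"  # well-known All-DVMRP-Routers address

def resolve_all_dvmrp(group, querier_nsapa, egress_nsapas, e_mrouted_nsapa):
    if group != ALL_DVMRP_ROUTERS:
        raise ValueError("not the All-DVMRP-Routers address")
    # Every egress router except the querier itself, plus the fake e-mrouted.
    leaves = [n for n in egress_nsapas if n != querier_nsapa]
    leaves.append(e_mrouted_nsapa)
    return leaves

egress = ["nsapa-10", "nsapa-20", "nsapa-30", "nsapa-40"]
print(resolve_all_dvmrp(ALL_DVMRP_ROUTERS, "nsapa-10", egress, "nsapa-earth"))
# -> ['nsapa-20', 'nsapa-30', 'nsapa-40', 'nsapa-earth']
```

The returned leaf set is exactly what the querier needs to establish the multicast shortcut PtM over the MLIS.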
Note that, being a fake mrouter, e-mrouted should _not_:

- report any members of any multicast groups on its non-existent subnets;
- Prune itself from any multicast tree or Graft itself to any multicast tree;
- use any QoS specification for its non-existent members; instead a best-effort default connection should be used for communication of egress mrouters with the e-mrouted.

All the above should ensure that no multicast data traffic is ever forwarded to e-mrouted; however, being a neighbour to every egress mrouter, the e-mrouted will receive all control messages from all the egress mrouters.

An egress mrouter, having no idea about this fake configuration, will send Probe messages every 10 s on each multicast capable interface, including the ATM interface. From the latter, each mrouter will receive back a list of all egress mrouters of the MLIS; however, the ATM interface of the mrouter having issued the Probe message could be seen as an interface to a single LIS. Nevertheless, the list of Mrouters will have both IP addresses and NSAPAs, and an egress mrouter should have no problem mapping the received addresses to ATM interfaces (if several).

The very first Probe message sent (at the ATM interface[s]) to the reserved IP multicast address of all DVMRP routers will require forwarding to EARTH for address resolution; however, all following messages could use an EARTH cache within the mrouter. The state of this cache could be out of date in comparison with the current situation in a living group of egress routers. To avoid the usage of an outdated cache in egress mrouters (for Probe messages), the ageing interval for this cache should be set to less than 10 sec (the default DVMRP Probe refresh interval). Note that mrouted by default times out its multicast cache every 5 sec, which fits the above requirement.
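The ageing requirement above can be sketched as a small time-to-live cache; this is a minimal illustrative sketch, with the class and method names as assumptions:

```python
# Minimal sketch of the cache-ageing requirement: the EARTH cache used by
# an egress mrouter for Probe resolution must age out faster than the 10 s
# Probe refresh interval; 5 s (mrouted's default timeout) satisfies this.
import time

PROBE_INTERVAL = 10.0   # DVMRP Probe refresh interval, seconds
CACHE_AGEING = 5.0      # must be strictly less than PROBE_INTERVAL

class EarthCache:
    def __init__(self, ageing=CACHE_AGEING):
        self.ageing = ageing
        self.entries = {}   # Class D address -> (nsapa_list, timestamp)

    def put(self, group, nsapas):
        self.entries[group] = (nsapas, time.monotonic())

    def get(self, group):
        """Return cached NSAPAs, or None if absent or stale (forcing a
        fresh EARTH resolution instead of an outdated egress-router set)."""
        hit = self.entries.get(group)
        if hit is None:
            return None
        nsapas, stamp = hit
        if time.monotonic() - stamp > self.ageing:
            del self.entries[group]
            return None
        return nsapas

assert CACHE_AGEING < PROBE_INTERVAL
```

With this arrangement every second Probe at the latest triggers a fresh resolution, bounding the window in which a departed egress router can still appear in the leaf set.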
Finally, the major conclusions on DVMRP over MLIS are:

- e-mrouted should run within the EARTH server;
- e-mrouted should always be a neighbour of any egress mrouter;
- e-mrouted participates in all control message exchanges for DVMRP (Probe, Prune, Graft, Report, dependency check) in the following way:
  - e-mrouted never prunes or grafts itself;
  - e-mrouted sends and receives Probe messages;
  - when receiving a routing table report, e-mrouted drops it silently;
  - when sending a report message, e-mrouted puts infinity (the default value for mrouted is 32) as the distance metric to e-mrouted from any egress mrouter.

The last requirement is intended to prevent forwarding of IP multicast data towards the e-mrouted. Optionally, this could be done in a different manner. For example, an administrative boundary could be placed within all egress mrouters for IP multicast data traffic in the direction of e-mrouted; however, it is unclear how in this case to continue with the control traffic.

Since we are interested in establishing mrouter neighbour discovery and connectivity establishment over ATM, the format of the Probe message for DVMRP should be changed in two ways. First, the neighbour address field should include the neighbour NSAPA. Second, for backward compatibility with the original DVMRP, some classifier should be included within the Probe message format. It is proposed here to use for this purpose the Reserved field of original DVMRP [10] to distinguish between values, say, 0 (zero) for original DVMRP and 1 (one) for DVMRP over MLIS. The proposed Probe packet format is shown in Table 1.
      24         16         8          0
 +---------+----------+----------+---------+
 | Classi- | Capabil- | Minor    | Major   |
 | fier    | ities    | Version  | Version |
 +---------+----------+----------+---------+
 |              Generation ID              |
 +-----------------------------------------+
 |          Neighbor IP Address 1          |
 +-----------------------------------------+
 |                                         |
 |            Neighbor NSAPA 1             |
 +-----------------------------------------+
 |                  ...                    |
 +-----------------------------------------+
 |          Neighbor IP Address N          |
 +-----------------------------------------+
 |                                         |
 |            Neighbor NSAPA N             |
 +-----------------------------------------+

      Table 1 - DVMRP over MLIS Probe Packet Format

7. Scalability

In a large physical ATM network a single EARTH server could become a bottleneck. In the case of an ATM Forum UNI 4.0 implementation this problem could be delegated to ILMI, which is supposed to support group ATM addresses. That is, the EARTH server may resolve an IP multicast address to an ATM group address, while changes to the group within the ATM network will be managed by ILMI. Currently two other solutions could be proposed for future research: server synchronisation and server hierarchy. An attempt to apply SCSP to the case of multiple EARTH servers splitting the ARP load is undertaken below.

7.1 SCSP over MLIS

The Server Cache Synchronisation Protocol was developed with the following service model in mind (derived from [3]):

1. All servers [belonging to a server group] need to align their caches; (that is, changes in one cache should be propagated to all copies);
2. Redundant servers are using copies of the same cache; (the problem known as multiple copies update);
3. A client may be aware of the existence of any number of servers; (from a single server up to all servers in a relevant group);
4. Multiple instances of SCSP may coexist, running in parallel for different Server Groups; (the identification of a Server Group is outside the scope of SCSP);
5. SCSP is a topology independent protocol; (the only requirement is mutual reachability of all relevant servers);
6. SCSP is intended to support other protocols which require synchronisation of caches; (therefore, SCSP is a generic protocol, which should be modified for each consumer protocol).

SCSP is composed of three separate protocols:

Hello - investigates the state of the connection used for synchronisation purposes;
Cache Alignment (CA) - permits a booting Local Server to align its entire cache with that of the Directly Connected Servers;
Client State Update (CSU) - permits propagation of changes in local cache entries to the entire SG.

Hello and CA are more or less supporting protocols, while the core job (however, only when all servers and all synchronisation links are in a steady state) is done by CSU. The SCSP protocol was applied by the ION WG of the IETF to NHRP and MARS as consumer protocols [3, 13]. In the recent ID [14] the notion of a Living Group was introduced, which makes the SG concept dynamic. Modification of SCSP for the EARTH protocol should be made in a separate document. The intention of this chapter is to highlight the benefits of protocol concentration and the MLIS concept with regard to cache synchronisation. Some terminology should be introduced first.

1. Cluster. Let us assume that an ATM cloud is partitioned into a number of Clusters. That is, IP hosts, being cluster members, are served by the same EARTH server. Thanks to G. Armitage [RFC2022], the notion of a cluster is well separated from that of a Logical IP Subnet; therefore, borrowing the term `cluster', this document simply states the following. If a single EARTH server per MLIS is not enough then network management should run several servers, provided a reasonable clusterisation of the cloud. Note that this draft deals with shortcuts through LISs; therefore, the overlap of LISs and Clusters could vary from 0% (one cluster per ATM cloud) up to 100% (each cluster is equal to a LIS).

2. Holder.
While only the owner of a point-to-multipoint connection (PtM root) can update the PtM, the receiver initiated joins, leaves, QoS-resv, etc. should be propagated only to the root. The root, in turn, gets this information from the EARTH server which is serving the root's cluster. This particular server is called the Holder of the Source Specific Multicast [sub]Group (SSG). Note that the root above is not the same as the source in <S,G>: the source could be outside of the MLIS, while an egress Mrouter will become the root in this case.

Motivation for changes in SCSP. As stated above, all joins in the MLIS are actually root-specific. That is, if a host joins an MG with NS senders to it, and with NR PtM connections being established, this will require fulfilling "AddParty" at the roots' sides NR times. If these NR roots belong to NC different clusters, while the superset of leaves of the NR PtMs covers NCR different clusters, then this will require propagation of cache changes from NCR to NC servers with the highest priority. The update of the rest of the servers could be made as a background activity at the "available rate". As long as there are no roots of PtMs for the MG in other clusters, there is no need to propagate knowledge of changes within the SSGs to the EARTH servers within these other clusters. Therefore, the SCSP's CSU protocol (modified for EARTH purposes) should run as usual; however, in order to decrease join/leave latency it is recommended that, in parallel, a `Forwarding to the Holder' (FH) protocol should be implemented.

Finally, the modified SCSP (E-SCSP) service model over MLIS is presented below in the same order as the original one:

1. All servers [belonging to a server group] need to align their caches to be ready to accept root newcomers to the MG within their clusters; in parallel, a simple mechanism for forwarding of joins/leaves/QoS-resv from SSG members to Holders should exist; (that is, changes in one cache should be propagated to all copies in the background, and changes in an SSG should be forwarded to Holders asap);
2. Multiple servers are using copies of the same cache; however, the rate of updates is different for Holders and non-Holders; (the problem could be defined as "prioritised multiple copies update");
3. A client should be aware of the existence of a single server in its cluster; (a backup server could be specified in a manner similar to RFC 2022);
4. A single instance of E-SCSP should run for the whole MLIS; (in the sense of [3], E-SCSP is one of the instances of a generic SCSP);
5. E-SCSP is a topology independent protocol; (with regard to ATM connections that might mean: the FH protocol uses a mesh of PtM VCs separate from that for the rest of E-SCSP);
6. E-SCSP is intended to support other co-located protocols such as EARTH, RR, DVMRP; (therefore, E-SCSP benefits from the fact that in one synchronisation session several types of caches are updated at once).

This chapter is presented to facilitate discussion on the use of SCSP within the MLIS concept.

8. Future Work

A simulation study of EARTH performance gives promising results - about one order of magnitude lower latency than MARS in the case of a single LIS, and less than 30% of the ATM switch's call establishment capacity (100 calls/sec for an ASX 200). However, MARS has running code, which makes it more attractive. For EARTH only an implementation concept is available below.
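The "prioritised multiple copies update" of the E-SCSP service model above (items 1 and 2) can be sketched as two queues drained in priority order; this is a hypothetical sketch, with all class and queue names invented for illustration:

```python
# Hypothetical sketch of prioritised cache updates: changes affecting an
# SSG are forwarded to its Holder(s) asap (the FH protocol), while the
# remaining servers are synchronised in the background (CSU-style).
from collections import deque

class EScspUpdater:
    def __init__(self, servers, holders):
        self.servers = set(servers)     # all EARTH servers in the MLIS
        self.holders = set(holders)     # Holders of the affected SSG(s)
        self.urgent = deque()           # FH: forward-to-Holder queue
        self.background = deque()       # background CSU-style updates

    def propagate(self, change):
        # Holders are queued with high priority, everyone else in background.
        for s in self.servers:
            queue = self.urgent if s in self.holders else self.background
            queue.append((s, change))

    def drain(self):
        """Urgent (Holder) updates are always delivered before background ones."""
        order = []
        while self.urgent:
            order.append(self.urgent.popleft())
        while self.background:
            order.append(self.background.popleft())
        return order

u = EScspUpdater({"es1", "es2", "es3"}, holders={"es2"})
u.propagate("join")
print(u.drain())  # the Holder es2 comes first, the rest at the available rate
```

The point of the sketch is only the ordering guarantee: join/leave latency is bounded by the urgent queue alone, independent of how slowly the background alignment proceeds.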
8.1 MIS = Multicast Integrated Server

A MIS is a combination of an EARTH server, an RR daemon (say, rsvpd), an IDMR daemon (say, e-mrouted) and an ATM controller daemon (atmcd), providing joint access to the shared data field described above. We assume that each endpoint (IP host) of the ATM cloud has an entity of an EARTH client (EC), and that there is a [set of] endpoint[s] - EARTH server[s] (ES). Communication between ES and EC is based on EARTH messages. The semantics of EARTH messages follow the semantics of MARS messages (see [RFC2022] for REQUEST, MULTI, JOIN, NAK, LEAVE, GROUPLIST-REQUEST, GROUPLIST-REPLY). Implementations of the ES and EC are supposed to be made in user space; however, both the server and client sides have to have kernel support for their operation. The EC will need to establish (upon receiving a MULTI message) a PtM connection; therefore it will need to have an ATMARP cache in its kernel. The ES will need to be synchronised with the caches of other ESs.

8.1.1 Architecture

The Multicast Integrated Server (MIS) comprises the following components:

· Protocol Machines (PM),
· Shared Data Field (SDF) and
· SDF Access Controller (SDFAC).

The Protocol Machines under consideration are:

· EARTH server PM (ES);
· Resource Reservation server PM (RRS);
· Inter Domain Multicast Routing over ATM server PM (IRS) - say, e-mrouted;
· Server Cache Synchronisation PM (CS) - e-scspd;
· [IGMP PM];
· SDFAC PM.

The Shared Data Field is comprised of the following:

· the ARP table, with a set of entries explained above;
· the egress_list, with a set of entries explained above;
· the MIS server_list, with a set of entries explained above.

The Shared Data Field Access Controller is intended to provide multiprotocol coordination inside a MIS with the help of the following primitives:

· Set entry (e.g. assign a particular value to an entry in the SDF);
· Get entry (e.g. show the value of a particular entry in the SDF);
· Lock entry (e.g. apply restrictions to write operations on this entry in the SDF);
· Age entry (e.g. [perhaps temporarily] define an entry as being not in use).

8.1.2 Motivation

The integration of the above components is due to the need to map the IP multicast service model to ATM, and it seems reasonable to combine in a single location all components providing a full IP multicast service over a non-broadcast ATM cloud. That is: EARTH for ARP; IGMP for subnet level receiver registration/deregistration with the mrouter forwarding multicast traffic to the subnet; E-DVMRP for forwarding IP multicast traffic over the ATM cloud (for example, there are IP Mrouters which are connected to the rest of the Internet only via this ATM cloud - a sort of ATM backbone - therefore for them to participate in IDMR talk, this has to be provided over/across ATM); E-SCSP for synchronisation of the caches of multiple MISs in the same ATM cloud; Resource Reservation for seamless interworking of RR deployed elsewhere with hosts inside the cloud; etc.

The SDFAC is needed to coordinate multiple protocols over ATM inside a single MIS. The major coordination task in conjunction with SCSP is locking the SDF copy for the time of synchronisation. However, to avoid a single point of failure and also to provide reasonable scalability to the whole service, it seems fairly natural to have several MISs covering the ATM cloud. The placement of MISs and their number are not discussed in the current document.

9. Security Considerations

This document raises no security issues. However, the single gateway principle could be helpful in controlling access to the ATM cloud.

10. Acknowledgements

Acknowledgements to the ION (ipatm) working group of the IETF.
Personal acknowledgements to all who are reading this and to:

Steve Deering (Cisco) for inventing the IP multicast puzzle;
Grenville Armitage (Bellcore) for shaping the road over ATM;
Dorota Witaszek (GMD FOKUS) for implementing what seems reasonable from the above;
Henning Sanneck (GMD FOKUS) for sharp questions and RSVP support;
Luca Salgarelli (Cefriel) for constant help and efforts towards integration;
Ehab Jewabreh, Nurten Sahyazici, Berrin Gueltekin (TU Berlin) for their ability to create 80 pages of SDL specification for EARTH v.01 [16] out of 10 pages of text;
Long Le (TU Berlin) for his effort to use MARS's TLVs for EARTH messages [15].

11. Supporting Information

11.1 EARTH as an extension of MARS

For those who are hard to win over from genuine MARS solutions, this chapter offers two options for how EARTH could be implemented as an extension of MARS. The first solution is to borrow heavily from the specification of MARS, while the second is to build EARTH as extended functionality of MARS with the use of TLVs [RFC2022, Chapter 10].

The first solution implies that the implementor builds their own protocol machines for both server and client, while the messages between them will be more or less like those of MARS. Note that borrowing a message structure does not create any obligation to follow the same processing logic or architecture as in MARS. For example, one can keep a soft state approach (as in EARTH) and use the MARS-REQUEST message format, while in MARS the hard state approach is the one adopted.

The second option is for those who already have MARS code and want, for some reason, to "upgrade" it with EARTH capabilities. This could be done with the TLVs defined in RFC 2022 (Chapter 10). The very first attempt to sketch EARTH messages over a MARS implementation is [15].

11.2 Some Definitions

Multicast session group (MSG) is, generally speaking, a set of multicast groups.
Multicast group (MG) - a set of IP hosts subscribed to the same IP Class D address.

Source specific multicast subgroup (SSMG) - a set of hosts G which have joined a multicast group in a source-specific join mode [IGMP v3]; if S denotes the source then <S,G> denotes the SSMG.

Source and QoS specific multicast subgroup (SQMG) - a set of hosts which have joined a multicast group with the same QoS requirements; if Q is a QoS level to which the reservations requested by a set of receivers belonging to an SSMG were mapped, and Q is supported by the ATM network, then <S,Q,G> denotes the SQMG.

QoS specific multicast subgroup (QMG) - could be defined as a superset of all SQMGs with the same QoS level Q and should be identified as <Q,G>.

Earth Server Living Group (ESLG) - borrowed from [14].

Multicast Routers [Living] Group (MR[L]G) - the same as above with regard to egress routers.

Protocol Machine (PM) - an extension of the Finite State Machine model with the notion of internal variables (Shared Data Field) affecting the state transition.

Shared Data Field (SDF) - a collection of internally defined data structures whose values can be changed by more than one PM; the relations between data structures are defined for each PM separately, provided that data consistency occurs and access rights are controlled for mutual exclusion of critical operations over the data.

Server Protocol Machine (SPM) - a Protocol Machine representing the server operation; operates over a Data Field shared with other SPMs.

12. References

[1] Armitage, G., "Support for Multicast over UNI 3.0/3.1 based ATM Networks", RFC 2022, November 1996

[2] Armitage, G., "VENUS - Very Extensive Non-Unicast Service", Internet-Draft, July 1996

[3] Luciani, J. V., Armitage, G., Halpern, J., "Server Cache Synchronization Protocol (SCSP)", Internet-Draft draft-ietf-ion-scsp-00.txt, exp. June 1997

[4] Cole, R.G., Shur, D.H., Villamizar, C.,
"IP over ATM: A Framework Document", Internet-Draft, October 1995

[5] Laubach, M., "Classical IP and ARP over ATM", RFC 1577, January 1994

[6] Deering, S., "Host Extensions for IP Multicasting", RFC 1112, August 1989

[7] Salgarelli, L., Corghi, A., Sanneck, H., Witaszek, D., "Supporting IP Multicast Integrated Services in ATM Networks", submitted to SPIE'97 ("Broadband Network Technologies"), Dallas, November 1997

[8] Deering, S., "Multicast Routing in a Datagram Internetwork", Ph.D. Thesis, Stanford University, December 1991

[9] Pusateri, T., "Distance Vector Multicast Routing Protocol", Internet-Draft, exp. August 1997

[11] Postel, J., and J. Reynolds, "A Standard for the Transmission of IP Datagrams over IEEE 802 Networks", STD 43, RFC 1042, USC/Information Sciences Institute, February 1988

[12] "ATM User-Network Interface Specification", The ATM Forum, v.3.1, September 1994

[13] Armitage, G., "Redundant MARS architectures and SCSP", Internet-Draft, expires May 26, 1997

[14] Armitage, G., "A Distributed MARS Protocol", Internet-Draft, exp. January 22, 1997

[15] Smirnov, M., Le, L., "EARTH as an extension of MARS", Work in Progress, ftp.fokus.gmd.de/pub/step/earth/tlv/earth-over-mars-tlv.fm.ps.gz

[16] Smirnov, M., Jewabreh, E., Sahyazici, N., Gültekin, B., "Protocol Specification for EARTH v.01, in SDL for SDT v.3.0", under ftp.fokus.gmd.de/pub/step/earth/tlv/earth-sdt.tar.gz

13. Author's Address

Michael Smirnov
GMD FOKUS
Hardenbergplatz 2, Berlin 10623

Phone: +49 30 25499113
Fax:   +49 30 25499202
EMail: smirnow@fokus.gmd.de