Internet-Draft                                      Grenville Armitage
                                                              Bellcore
                                                       July 12th, 1996


                  Issues affecting MARS Cluster Size


Status of this Memo

   This document was submitted to the IETF Internetworking over NBMA
   (ION) WG. Publication of this document does not imply acceptance by
   the ION WG of any ideas expressed within. Comments should be
   submitted to the ion@nexen.com mailing list. Distribution of this
   memo is unlimited.

   This memo is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a
   "working draft" or "work in progress".

   Please check the 1id-abstracts.txt listing contained in the
   internet-drafts shadow directories on ds.internic.net (US East
   Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
   munnari.oz.au (Pacific Rim) to learn the current status of any
   Internet Draft.

Abstract

   IP multicast over ATM currently uses the MARS model [1] to manage
   the use of ATM pt-mpt SVCs for IP multicast packet forwarding. The
   scope of any given MARS service is the MARS Cluster - typically the
   same as an IPv4 Logical IP Subnet (LIS). Current IP/ATM networks
   are usually architected with unicast routing and forwarding issues
   dictating the sizes of individual LISes. However, as IP multicast
   is deployed as a service, the size of a LIS will be limited by how
   large a MARS Cluster can be. This document looks at the issues that
   will constrain MARS Cluster size, and at why large scale IP over
   ATM networks might preferably be built with many small Clusters
   rather than a few large Clusters.

1. Introduction

   A MARS Cluster is the set of IP/ATM interfaces that are willing to
   engage in direct, ATM level pt-mpt SVCs to perform IP multicast
   packet forwarding [1]. Each IP/ATM interface (a MARS Client) must
   keep state information regarding the ATM addresses of each leaf
   node (recipient) of each pt-mpt SVC it has open. In addition, each
   MARS Client receives MARS_JOIN and MARS_LEAVE messages from the
   MARS whenever Clients around the Cluster need to update their
   pt-mpt SVCs for a given IP multicast group.

   The definition of Cluster 'size' can mean two things - the number
   of MARS Clients using a given MARS, and the geographic distribution
   of MARS Clients. The number of MARS Clients in a Cluster impacts on
   the amount of state information any given Client may need to store
   while managing outgoing pt-mpt SVCs. It also impacts on the average
   rate of JOIN/LEAVE traffic that is propagated by the MARS on
   ClusterControlVC, and on the number of pt-mpt VCs that may need
   modification each time a MARS_JOIN or MARS_LEAVE appears on
   ClusterControlVC.

   The geographic distribution of Clients impacts on the latency
   between a Client issuing a MARS_JOIN and that Client finally being
   added onto the pt-mpt VCs of the other MARS Clients transmitting
   to the specified multicast group. (This latency is made up of both
   the time to propagate the MARS_JOIN, and the delay in the
   underlying ATM cloud's reaction to the subsequent ADD_PARTY
   messages.)
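   As a purely illustrative aside (not part of [1]), the following
   Python sketch shows the kind of per-group state a MARS Client
   might hold for an outgoing pt-mpt SVC, and how a MARS_JOIN or
   MARS_LEAVE seen on ClusterControlVC maps onto ADD_PARTY and
   DROP_PARTY activity. All names and the two signaling stubs are
   hypothetical.

      # Hypothetical sketch of per-group MARS Client state. The two
      # stubs stand in for the host's UNI 3.0/3.1 signaling interface.

      def signal_add_party(group, atm_addr):
          pass   # placeholder: issue ADD_PARTY to the local switch

      def signal_drop_party(group, atm_addr):
          pass   # placeholder: issue DROP_PARTY to the local switch

      class OutgoingGroupSVC:
          def __init__(self, group):
              self.group = group    # IP multicast group address
              self.leaves = set()   # ATM address of every current leaf node

          def on_mars_join(self, atm_addr):
              # Every leaf added here consumes state in the Client, the
              # ATM NIC, and the local switch (see section 2).
              if atm_addr not in self.leaves:
                  self.leaves.add(atm_addr)
                  signal_add_party(self.group, atm_addr)

          def on_mars_leave(self, atm_addr):
              if atm_addr in self.leaves:
                  self.leaves.discard(atm_addr)
                  signal_drop_party(self.group, atm_addr)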
2. Limitations on state storage

   A Cluster should not contain more MARS Clients than the maximum
   number of leaf nodes supportable by the most limited member of the
   Cluster. Two items are affected by this limitation:

   -  ClusterControlVC from the MARS. It has a leaf node per Cluster
      member (MARS Client). This limitation applies only to the node
      supporting the MARS itself.

   -  Packet forwarding SVCs out of each MARS Client for each IP
      multicast group being sent to. The number of MARS Clients that
      may choose to be members of a given group may encompass every
      MARS Client in the Cluster.

   Under UNI 3.0/3.1 the most obvious limit on the size of a Cluster
   is the 2^15 leaf nodes that can be added to a pt-mpt SVC. However,
   in practice most ATM NICs (and probably switches) are going to
   impose a limit much lower than this - a function of how much
   per-leaf node state information they need to store (and are
   capable of storing) for pt-mpt SVCs. A MARS Client may impose its
   own state storage limitations, such that the combined memory
   consumption of a MARS Client and the ATM NIC in a given host limits
   the Client to fewer leaf nodes than the ATM NIC alone might have
   been able to support. Limitations of the switch to which a MARS or
   MARS Client is directly attached may also impose a lower limit on
   leaf nodes than that of the MARS, MARS Client, or ATM NIC. Cluster
   size is limited by the most constraining of these limits.

   It may be possible to work around leaf node limits by distributing
   the leaf nodes across multiple pt-mpt SVCs operating in parallel.
   However, such an approach requires further study, and is unlikely
   to be a useful workaround for Client or NIC based limitations.

   A related observation is that the number of MARS Clients in a
   Cluster may also be limited by the memory constraints of the MARS
   itself. The MARS is required to keep state on all the groups that
   every one of its MARS Clients has joined. For a given memory limit,
   the maximum number of MARS Clients must drop if the average number
   of groups joined per Client rises. Depending on the level of group
   memberships, this limitation may be more severe than the pt-mpt
   leaf node limits.

3. Signaling load

   In any given Cluster there will be an 'ambient' level of
   MARS_JOIN/LEAVE activity. What that level actually is depends on
   the types of multicast applications running on the majority of the
   hosts in the Cluster. It is reasonable to assume that as the number
   of MARS Clients in a given Cluster rises, so does the ambient level
   of MARS_JOIN/LEAVE activity that the MARS receives and propagates
   out on ClusterControlVC.

   The existence of MARS_JOIN/LEAVE traffic also has a consequential
   impact on signaling activity at the ATM level (across the UNI and
   {P}NNI boundaries). For groups that are VC Mesh supported, each
   MARS_JOIN or MARS_LEAVE propagated on ClusterControlVC will result
   in an ADD_PARTY or DROP_PARTY message sent across the UNIs of all
   MARS Clients that are transmitting to the given group. As a
   Cluster's membership increases, so does the average number of MARS
   Clients that trigger ATM signaling activity in response to each
   MARS_JOIN or MARS_LEAVE.
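   This scaling can be made concrete with a rough, back-of-the-
   envelope calculation, sketched below. Every figure in it is an
   assumption chosen purely for illustration, not a measured or
   recommended value.

      # Illustrative estimate of ambient signaling load for VC Mesh
      # supported groups. All input figures are assumptions.

      n_clients        = 1000    # MARS Clients in the Cluster
      joins_per_client = 0.01    # MARS_JOIN/LEAVEs per Client per second
      avg_senders      = 50      # Clients transmitting to a typical group

      # Messages the MARS propagates on ClusterControlVC each second.
      mars_msgs_per_sec = n_clients * joins_per_client

      # Each propagated MARS_JOIN/LEAVE triggers one ADD_PARTY or
      # DROP_PARTY at every Client currently transmitting to the group.
      uni_msgs_per_sec = mars_msgs_per_sec * avg_senders

      print(mars_msgs_per_sec, "msgs/sec on ClusterControlVC")
      print(uni_msgs_per_sec, "ADD_PARTY/DROP_PARTYs per sec cluster-wide")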
   The size of a Cluster needs to be chosen to provide some level of
   containment of this ambient level of MARS and UNI/NNI signaling.

   Some refinements to MARS Client behaviour may also be explored to
   smooth out UNI signaling transients. The MARS spec currently
   requires that revalidation of group memberships occurs only when
   the Client starts sending new packets to an invalidated group SVC.
   A Client could apply a similar algorithm to decide when it should
   issue ADD_PARTYs after seeing a MARS_JOIN - wait until it actually
   has a packet to send, send the packet, then initiate the ADD_PARTY.
   As a result, actively transmitting Clients would update their SVCs
   sooner than intermittently transmitting Clients. This requires
   careful implementation of the Client state machine.

4. Group change latencies

   The group change latency can be defined as the time it takes for
   all the senders to a group to have correctly updated their
   forwarding SVCs after a MARS_JOIN or MARS_LEAVE is received from
   the MARS. This is affected by both the number of Cluster members
   and the geographical distribution of Cluster members.

   The number of Cluster members affects the ATM level signaling load
   offered as soon as a MARS_JOIN or MARS_LEAVE is seen. If the load
   is high, the ATM cloud itself may suffer slow processing of the
   various SVC modifications that are being requested.

   Wide geographic distribution of Cluster members delays the
   propagation of MARS_JOIN/LEAVE and ATM UNI/NNI messages. The
   further apart various members are, the longer it takes for them to
   receive MARS_JOIN/LEAVE traffic on ClusterControlVC, and the longer
   it takes for the ATM network to react to ADD_PARTY and DROP_PARTY
   requests. If the long distance paths are populated by many ATM
   switches, delays due to per-switch processing will add
   substantially to delays due to the speed of light.

   Unfortunately, some of the mechanisms for smoothing out the
   transient ATM signaling load described in section 3 have the
   consequence of increasing the group change latency (since the goal
   is for some of the senders to deliberately delay updating their
   forwarding SVCs).

   A related effect will also be felt by the MARS itself. The larger
   the MARS database, the longer it may take to process
   MARS_JOIN/LEAVE messages (which involve locating and updating
   individual group entries). Whilst this issue may not be important
   for conferencing applications (with group membership changes on a
   human time frame), high speed simulation environments may find
   such considerations important.

5. Large IP/ATM networks using Mrouters

   Building a large scale, multicast capable IP over ATM network is a
   tradeoff between Cluster sizes and numbers of Mrouters. For a given
   number of hosts across the entire IP/ATM network, as Cluster sizes
   drop, more Clusters are needed. Clusters must be interconnected by
   Mrouters, so the number of Mrouters rises. (The actual rise in the
   number of Mrouters depends largely on the logical IP topology you
   choose to implement, since a single physical Mrouter may
   interconnect more than two Clusters at once.) It is a local
   deployment question as to what the optimal mix of Clusters and
   Mrouters will be.
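   The tradeoff can be illustrated with some simple arithmetic. The
   figures and the two topology cases below are hypothetical; the
   point is only that the Cluster count, and hence the Mrouter count,
   rises as the chosen Cluster size falls.

      # Hypothetical Cluster size / Mrouter count tradeoff.
      import math

      total_hosts      = 10000   # hosts across the whole IP/ATM network
      max_cluster_size = 2500    # chosen Cluster size (see sections 2-4)

      n_clusters = math.ceil(total_hosts / max_cluster_size)

      # The Mrouter count depends on the logical IP topology chosen: a
      # simple chain of Clusters needs one Mrouter per adjacent pair,
      # while one physical Mrouter attached to every Cluster needs one.
      mrouters_chain = n_clusters - 1
      mrouters_hub   = 1

      print(n_clusters, "Clusters; between", mrouters_hub,
            "and", mrouters_chain, "Mrouters")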
   A constructive way to view conventional Mrouters is as aggregation
   points for signaling and data plane loads. An Mrouter hides group
   membership changes in one Cluster from senders within other
   Clusters, and protects local group members from being swamped by
   SVCs from senders in other Clusters. MARS_JOIN/LEAVE traffic in one
   Cluster is hidden from the members of all other Clusters. (The
   consequential UNI signaling load is localized to the source Cluster
   too.) Group members in a Cluster are fed packets from an SVC
   originating on the MARS Client residing in their local Mrouter,
   rather than terminating multiple SVCs originating on the actual
   senders in remote Clusters.

   As a side effect of the Mrouter's role in aggregating data path
   flows, it reduces the impact of SVC leaf-node limits. A
   hypothetical 10000 node Cluster could be broken into two 5000 node
   Clusters, or four 2500 node Clusters. In each case the individual
   Cluster members need only source pt-mpt SVCs with maximums of 5000
   or 2500 leaf nodes respectively.

6. Large IP/ATM networks using Cell Switch Routers (CSRs)

   A Cell Switch Router (CSR) may act as a conventional Mrouter, and
   provide all the benefits described in the previous section.
   However, one of the useful characteristics of the CSR is the
   ability to internally 'short-cut' cells from an incoming VCC to an
   outgoing VCC. Once the CSR has identified a flow of IP traffic, and
   associated it with an inbound and outbound VCC, it begins to
   function as an ATM cell level device rather than a packet level
   device.

   Even when operating in 'short-cut' mode the CSR is still able to
   protect Clusters from the MARS_JOIN/LEAVE activities of surrounding
   Clusters. From the perspective of the Clusters to which the CSR is
   directly attached, the CSR terminates and originates pt-mpt SVCs.
   It acts as the path out of a source Cluster, and the entry point
   into a target Cluster. It remains unnecessary for senders in one
   Cluster to issue ADD_PARTY or DROP_PARTY messages in response to
   group membership changes in other Clusters - the CSR tracks these
   changes, and updates the pt-mpt trees rooted on its own ATM ports
   as needed.

   However, there is one significant point of difference from a
   conventional Mrouter - a simple CSR cannot aggregate the packet
   flows from multiple senders in one Cluster onto a single SVC into
   an adjacent Cluster. Within a Cluster with multiple sources, the
   CSR is a leaf node on an individual SVC per source (just like a
   conventional Mrouter). But if it chooses to 'short-cut' traffic at
   the cell level to group members in another Cluster, it must
   construct a separate forwarding SVC into the target Cluster to
   match each VCC from each sender in the source Cluster. This
   requirement stems from the need to maintain AAL_SDU boundaries at
   the ultimate recipients - the group members in the target Cluster.
   If the cells from individual senders in the source Cluster were
   FIFO merged onto a single outgoing SVC into the target Cluster,
   recipients in the target Cluster would have a hard time
   reconstructing individual AAL_SDUs from the interleaved cells.
   (This is mostly a consequence of our use of AAL5. AAL3/4 could
   provide a solution using the MID field, although we would be
   limited to 2^10 senders per Cluster and would introduce a MID
   management problem.)

   Interestingly, this problem can magnify the UNI signaling load
   offered within the target Cluster whenever a new group member
   arrives. If there are N senders in the source Cluster, the CSR
   will have built N identical pt-mpt SVCs out to the group members
   within the target Cluster. If a new MARS_JOIN is issued within the
   target Cluster, the CSR must issue N ADD_PARTYs to update its SVCs
   into the target Cluster. (Under similar circumstances a
   conventional Mrouter would have issued only one ADD_PARTY for its
   single SVC into the target Cluster.)
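   The difference in target-Cluster signaling load can be summarized
   with a trivial, purely illustrative comparison; the sender count
   below is an assumption.

      # Hypothetical count of ADD_PARTYs triggered in the target
      # Cluster by one new group member joining there.

      n_senders = 20   # senders in the source Cluster (assumed)

      # A short-cutting CSR keeps one pt-mpt SVC per source-side sender
      # (to preserve AAL_SDU boundaries), so every SVC needs its own
      # ADD_PARTY when the new member joins.
      add_parties_csr = n_senders

      # A conventional Mrouter re-originates traffic on a single SVC,
      # so the same join costs exactly one ADD_PARTY.
      add_parties_mrouter = 1

      print("CSR short-cut:", add_parties_csr,
            "ADD_PARTYs; conventional Mrouter:", add_parties_mrouter)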
   A possible solution is for the CSR's underlying cell switching
   fabric to provide AAL_SDU-aware cell forwarding. If segmented
   AAL_SDUs arriving from the source Cluster could be buffered and
   forwarded in groups of cells representing entire AAL_SDUs, the CSR
   would need only a single SVC into the target Cluster. Its impact on
   the Clusters it was attached to would then be the same as that of a
   conventional Mrouter. (This does not necessarily imply full
   re-assembly followed by segmentation. It would be sufficient for
   the incoming cells to be buffered in sequence, and then fed onto
   the outbound SVC. The CSR's switch fabric would not be performing
   any AAL level checks other than detecting AAL_SDU boundaries.)

7. The impact of Multicast Servers (MCSs)

   The MCS has an intra-Cluster effect somewhat analogous to the
   inter-Cluster effect of the Mrouter. It aggregates AAL_SDU flows
   around the Cluster into a single pt-mpt SVC. This single pt-mpt SVC
   is the only one that needs to be updated when an intra-Cluster
   group membership change occurs. It also reduces the amount of
   MARS_JOIN/LEAVE traffic on ClusterControlVC - such messages for MCS
   supported groups are propagated out on ServerControlVC, thus
   interrupting only the (presumably smaller) set of MCSes attached to
   the MARS.

   One way to look at an MCS is as a stripped-down Mrouter, operating
   intra-Cluster and performing minimal (if any) forwarding decisions
   based on IP level information. Whether the use of MCSs allows
   larger Clusters to be deployed depends on the mix of MCS supported
   groups and VC Mesh supported groups within a given Cluster.

8. Conclusion

   This short document has provided a high level overview of the
   parameters affecting the size of MARS Clusters within multicast
   capable IP/ATM networks. Limitations on the number of leaf nodes a
   pt-mpt SVC may support, the size of the MARS database, the
   propagation delays of MARS and UNI messages, and the frequency of
   MARS and UNI control messages are all identified as issues that
   will constrain Clusters. Mrouters (either conventional or in Cell
   Switch Router form) were identified as useful aggregators of IP
   multicast traffic and signaling information. Large scale IP
   multicasting over ATM requires a combination of Mrouters and
   appropriately sized MARS Clusters.

Security Considerations

   Security considerations are not addressed in this document.

Acknowledgments

Author's Address

   Grenville Armitage
   Bellcore, 445 South Street
   Morristown, NJ, 07960
   USA

   Email: gja@thumper.bellcore.com
   Ph. +1 201 829 2635

References

   [1] G. Armitage, "Support for Multicast over UNI 3.0/3.1 based ATM
       Networks", Bellcore, Internet Draft,
       draft-ietf-ipatm-ipmc-12.txt, February 1996.