Internet Engineering Task Force                        AVT  Working Group
Internet Draft                                 J.Rosenberg, H.Schulzrinne
draft-ietf-avt-aggregation-00.txt                   Bell Labs/Columbia U.
May 6, 1998
Expires: November 6, 1998


              An RTP Payload Format for User Multiplexing

STATUS OF THIS MEMO

   This document is an Internet-Draft. Internet-Drafts are working docu-
   ments of the Internet Engineering Task Force (IETF), its areas, and
   its working groups.  Note that other groups may also distribute work-
   ing documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference mate-
   rial or to cite them other than as ``work in progress''.

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.


                                 ABSTRACT


        This memo describes an RTP payload format for multiplexing
        data from multiple users into a single RTP packet. Such
        multiplexing is especially useful for transporting voice
        data between Internet telephony gateways. It causes signif-
        icant reductions in header overheads and improves scalabil-
        ity.

1 Introduction

   Internet telephone gateways (ITGs) allow a public switched telephone
   network (PSTN) user to contact another PSTN user, with the long
   distance portion of the call routed over the Internet. Such a
   scenario is depicted in Figure 1.


J.Rosenberg, H.Schulzrinne                                    [Page 1]


Internet Draft                  RTP Mux                   April 24, 1998


           ~~~~~~~~    -------     ~~~~~~~~~~     -------    ~~~~~~~~
      A --|        |  |       |   |          |   |       |  |        |-- C
          |  PSTN  |--|  ITG  |---|  IP NET  |---|  ITG  |--|  PSTN  |
      B --|   X    |  |   J   |   |          |   |   K   |  |   Y    |-- D
           ~~~~~~~~    -------     ~~~~~~~~~~     -------    ~~~~~~~~

   Figure 1: Internet telephony gateway architecture

   Subscribers A and B connect to ITG J via their local telephone
   network, X. A wishes to speak with user C, and B wishes to speak
   with user D, both of whom are connected to local phone network Y. To
   complete the calls, ITG J packetizes and transports the voice to and
   from A and B through the IP network, to remote gateway K. There, ITG
   K completes the calls to C and D through PSTN Y. This arrangement,
   in which many calls share a common destination gateway, may be
   particularly common when connecting the PBXs of corporate branch
   offices across the Internet.

   In this scenario, ITGs J and K act as Internet hosts, which are
   effectively proxies for the telephone users connected to them. Unlike
   typical Internet telephony, however, there will often be multiple
   active calls between a pair of gateways, each representing a
   different pair of users. Gateways can signal calls using SIP [6],
   H.323 or proprietary signalling protocols. Media data is transported
   via a separate RTP [2] session for each user.

   We observe that using a separate RTP session for each user connected
   between a pair of gateways is wasteful. The payloads carried in each
   packet are generally small. For example, the ITU G.729 speech coder
   [3] generates a rate of 8 kb/s in frames of 10 ms duration.  If
   packed three frames per packet, the resulting RTP payloads are 30
   bytes long. The IP, UDP and RTP headers add up to 40 bytes, resulting
   in a packet efficiency of only 43%.

   On the other hand, suppose the payloads from both users are
   multiplexed into the same RTP session and packet. A multiplexing
   protocol is now required to delineate the payloads within the
   packet. The protocol defined here typically adds 16 bits of overhead
   per multiplexed user. In the two-subscriber example above, an RTP
   packet can then be constructed with 60 bytes of useful payload and
   44 bytes of header, and the efficiency improves to 58%. Since most
   voice trunks can carry at least 24 calls at a time, we anticipate
   much better efficiencies in practice, making IP-based
   interconnection competitive, from an efficiency standpoint, with
   leased lines.
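
   The efficiency figures can be checked with a short calculation. The
   sketch below is illustrative only: it assumes a 40-byte IP/UDP/RTP
   header (20 + 8 + 12 bytes, no options or CSRC entries), one 16-bit
   user header per multiplexed user with no length field, and padding
   of the multiplexing header to a 32-bit boundary as described in
   Section 2. The function and constant names are not part of this
   specification.

```python
IP_UDP_RTP_HEADER = 40   # bytes: 20 (IPv4) + 8 (UDP) + 12 (RTP), assumed
USER_HEADER = 2          # bytes: one 16-bit user header, no length field


def efficiency(users, payload_per_user=30, multiplexed=True):
    """Fraction of transmitted bytes that is media payload."""
    payload = users * payload_per_user
    if multiplexed:
        mux = users * USER_HEADER
        mux += -mux % 4                       # pad header block to 32 bits
        overhead = IP_UDP_RTP_HEADER + mux    # one packet for all users
    else:
        overhead = users * IP_UDP_RTP_HEADER  # one packet per user
    return payload / (payload + overhead)


if __name__ == "__main__":
    print(f"separate sessions:  {efficiency(1, multiplexed=False):.1%}")
    print(f"2 users multiplexed: {efficiency(2):.1%}")
    print(f"24 users multiplexed: {efficiency(24):.1%}")
```

   Under these assumptions, two multiplexed users carry 60 bytes of
   payload behind 44 bytes of header, and a full 24-call trunk carries
   720 bytes of payload behind 88 bytes of header.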

   A further benefit of multiplexing is a potential reduction in packe-
   tization delays. Most Internet telephony applications use fairly
   large packetization delays, mainly for the purpose of raising the
   size of the payloads to increase efficiency. However, if multiplexing
   is performed, the packet payload increases. This would allow smaller
   packetization delays to be used as the number of multiplexed users
   increases.

   Yet another benefit is the reduction in interrupt processing at the
   receiving ITG. Whenever a packet arrives at the gateway, the operat-
   ing system must perform a context switch into the kernel and process
   the packet. Without aggregation, the frequency of these interrupts
   increases linearly with the number of users. However, with aggrega-
   tion, the packet rate does not increase as more users are added, and
   thus the interrupt rate stays constant. This improves scalability.

   The main drawback to multiplexing is the increase in store-and-
   forward delay. Such delays are most problematic for end systems,
   which are typically connected via dialup modems. In this
   application, ITGs are likely to be connected to the Internet via
   higher rate connections, such as a T1. Assuming 24 users are
   multiplexed into a packet, using the same codec and packetization
   delay as above, the store-and-forward delay is only 3.7 ms, which is
   low compared to typical queueing and propagation delays.
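
   As a rough check of the store-and-forward figure, the sketch below
   computes the time needed to clock one multiplexed packet onto a
   link. It assumes a T1 line rate of 1.544 Mb/s; the 3.7 ms figure
   appears to correspond to counting the 720 bytes of payload only,
   which is an interpretation rather than something stated in the
   text. The names used are illustrative.

```python
T1_BPS = 1_544_000   # assumed T1 line rate in bit/s


def store_and_forward_ms(nbytes, link_bps=T1_BPS):
    """Time to serialize nbytes onto a link, in milliseconds."""
    return nbytes * 8 / link_bps * 1000


# 24 users, 30 bytes of G.729 payload each (three 10-byte frames).
payload_only = store_and_forward_ms(24 * 30)
# Same packet including 40 bytes of IP/UDP/RTP header and 24 user headers.
with_headers = store_and_forward_ms(24 * 30 + 40 + 24 * 2)

if __name__ == "__main__":
    print(f"payload only: {payload_only:.1f} ms")
    print(f"with headers: {with_headers:.1f} ms")
```

   Including the headers raises the result by about half a millisecond,
   which does not change the conclusion.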

   This draft describes an RTP payload format for supporting
   multiplexing of many users into a single RTP session. It is based on
   an earlier expired Internet Draft [4].

2 Payload Format

   The format of RTP packets with multiplexed users is given in Figure
   2.


       0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     |                          RTP Header                           |
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     |                          mux header                           |
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       user 1 media data                       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       user 2 media data                       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       user 3 media data                       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                 ......
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       user N media data                       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Figure 2: Packet format for multiplexing

   All fields of the RTP header except the timestamp, marker bit and
   SSRC maintain their current definition.

   Payload Type: The payload type field designates the RTP packet as a
   multiplexed payload. The payload type value is chosen dynamically
   and the binding to this format is conveyed via non-RTP means, such
   as SDP [1].

   Timestamp: This protocol requires that all multiplexed streams in
   one packet have the same clock rate (i.e., sampling rate for audio)
   and generate media frames at integer multiples of a common frame
   duration. It is possible, for example, that a set of users generates
   a packet every 10 ms, while others generate packets at intervals of
   20 and 30 ms, but all frame generation instants must be multiples of
   this 10 ms interval. (We expect this to be the common case for ITGs.
   Otherwise, each media frame would require a timestamp offset or an
   independent timestamp, significantly increasing overhead.) This
   issue is discussed further in Section 3.

   Marker Bit: This field is not used for multiplexing and always has a
   value of zero. A marker bit is included for each user in the
   multiplexing header (see below).

   SSRC: This field is used to identify groups of users (instead of a
   single user) whose frames are time synchronized. This is described
   further in Section 3.

   The format of the multiplexing header is shown in Fig. 3.


       0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      user 1 header            |          user 2 header        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      user 3 header            |          user 4 header        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                   ......
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      user N-1 header          |  user N header/padding        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Figure 3: Multiplexing header

   Each media data frame that is being multiplexed is associated with a
   header, 16 bits (optionally, 32) in length. These headers indicate,
   among other things, the length of the media data for user i present
   in the packet (call this length Li). To determine the number of users
   present, user headers are read until the sum of the lengths Li
   encoded in them is equal to the remaining RTP packet length.

   The format of each user header is shown in Fig. 4:


       0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |M|      PT     |L|     ID      |              Len              |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Figure 4: Format of user header

   The fields have the following meaning:

   M: The marker bit has the same definition as in [5], but applies
   only to the particular media frame.

   PT: This is the payload type corresponding to the media frame. It is
   anticipated that this field will also generally be sufficient to
   determine the length of the data for the user. For example, a PT
   value of 26 might indicate that the payload is 10 bytes of G.729
   compressed speech. The method by which the payload types are bound
   to particular codec types and length values is outside the scope of
   this document. It is envisioned, however, that signaling protocols
   such as SIP [6] together with a session description protocol such as
   SDP or H.323 could be used for this purpose.

   L: The length flag; if set to one, the 16-bit length field is
   present.

   ID: The ID field is a 7-bit key used to identify the user to the
   remote end system. It associates the media data with the user
   identifiers contained in signaling protocols. For example, one
   gateway may set up a call through another, indicating that user 38
   is calling PSTN number (800) 555-1212. In order to know which user
   data corresponds to user 38, the ID value in the header would be set
   to 38. The value of zero is reserved (see below).

   Length: In cases where the payload type is not sufficient to
   determine the length of the user media data, a 16-bit length field
   can be included. This field contains the length of the user media
   data, in bytes. It is only present if the L bit has a value of one.

   The user header fields appear immediately after the RTP header, inde-
   pendent of their size. If the last user header does not end on a 32-
   bit boundary, an additional user header with a distinguished user ID
   of zero is added. The other fields in that padding header MUST be
   zero.

   The user media frames follow after the multiplexing header, packed
   without any padding or headers between. The first user data field
   contains the data for the user defined by the first user header
   field, and so on.
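
   The layout described above can be sketched in code. The following
   is a minimal, non-normative illustration that assumes the user
   header bit order M (1 bit), PT (7 bits), L (1 bit), ID (7 bits),
   followed by a 16-bit length when L is set, all in network byte
   order. The PT_LENGTH table, the function names and the tuple layout
   are hypothetical; in practice the binding of payload types to
   lengths would come from signaling.

```python
import struct

# Hypothetical payload-type -> frame-length binding (assumption; the
# draft leaves this to signaling). PT 26 = 10 bytes follows the example
# given for the PT field.
PT_LENGTH = {26: 10}


def build_mux_payload(frames):
    """frames: list of (marker, pt, user_id, data); returns header + data."""
    header = b""
    for marker, pt, user_id, data in frames:
        if not 1 <= user_id <= 127:
            raise ValueError("user ID must be 1..127; 0 marks padding")
        explicit = PT_LENGTH.get(pt) != len(data)   # L bit needed?
        word = (marker << 15) | (pt << 8) | (explicit << 7) | user_id
        header += struct.pack("!H", word)
        if explicit:
            header += struct.pack("!H", len(data))
    if len(header) % 4:
        header += b"\x00\x00"       # padding header: ID 0, other fields 0
    return header + b"".join(frame[3] for frame in frames)


def parse_mux_payload(payload):
    """Inverse of build_mux_payload; returns (marker, pt, user_id, data)."""
    headers, offset, total = [], 0, 0
    while True:
        (word,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        user_id = word & 0x7F
        if user_id == 0:            # padding header closes the header block
            if offset % 4 or offset + total != len(payload):
                raise ValueError("malformed multiplexing header")
            break
        if (word >> 7) & 1:         # L bit set: explicit 16-bit length
            (length,) = struct.unpack_from("!H", payload, offset)
            offset += 2
        else:
            length = PT_LENGTH[(word >> 8) & 0x7F]
        headers.append((word >> 15, (word >> 8) & 0x7F, user_id, length))
        total += length
        # Stop once the headers are 32-bit aligned and the lengths they
        # announce account for the rest of the packet.
        if offset % 4 == 0 and offset + total == len(payload):
            break
        if offset + total > len(payload):
            raise ValueError("announced lengths exceed packet size")
    frames = []
    for marker, pt, user_id, length in headers:
        frames.append((marker, pt, user_id, payload[offset:offset + length]))
        offset += length
    return frames
```

   The parser applies the rule that user headers are read until the sum
   of the lengths equals the remaining packet length. Offsets are taken
   relative to the start of the multiplexing header, on the assumption
   that the RTP header itself ends on a 32-bit boundary.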

   Note that not every RTP packet has to contain media data frames from
   all active users. For example, if the packetization interval for a
   particular user is twice that of another user, only every other
   packet will contain media frames from both users.
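
   The effect of differing packetization intervals can be illustrated
   as follows. This sketch assumes a common frame duration of 10 ms
   and three hypothetical users with intervals of 10, 20 and 30 ms; a
   user contributes a frame to a packet only when the packet time is a
   multiple of that user's interval.

```python
def users_in_packet(tick_ms, intervals_ms):
    """Return the IDs of users whose frame is due at time tick_ms.

    intervals_ms maps user ID -> packetization interval in milliseconds.
    """
    return sorted(uid for uid, iv in intervals_ms.items() if tick_ms % iv == 0)


# Hypothetical user IDs and intervals, all multiples of 10 ms.
intervals = {38: 10, 39: 20, 40: 30}
schedule = {t: users_in_packet(t, intervals) for t in range(0, 60, 10)}

if __name__ == "__main__":
    for t, users in schedule.items():
        print(f"t={t:2d} ms: users {users}")
```

   Only the packet at t=0 carries frames from all three users; user 39
   appears in every other packet and user 40 in every third.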

3 Multiple Packets

   In some cases, it may not be possible to multiplex all users together
   into the same packet. This could be because the timestamps are not
   all the same, or because the transmitter wants to restrict the packet
   size.  We define a set of users whose data are placed into the same
   packet as a user group. Each user group is sent in a separate packet.

   To distinguish packets from different user groups, the SSRC field is
   used. The SSRC field is always the same across two packets from the
   same user group, and always different between two packets in differ-
   ent user groups. Packets from different user groups have different
   sequence number spaces as well. Each user group can essentially be
   considered a different virtual user, and it is for this reason that
   we use the SSRC field to identify them.

   Note that the users are only identified by their ID field, not by the
   SSRC value of their user group. This allows a user to migrate from
   user group to user group. Such migration may be necessary if the user
   chooses to change media coder, which would affect its timestamp fre-
   quency. The size of the ID field imposes a limitation of 127 users in
   a multiplexed RTP session. Additional RTP sessions would need to be
   opened if this number is not sufficient. Users may not migrate from
   session to session, since IDs are only guaranteed unique within a
   single RTP session.
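
   A receiver can reflect this separation by keeping sequence number
   state per SSRC (user group) while keying media streams by ID alone.
   The class below is a hypothetical sketch, not a prescribed design:
   it assumes the frames have already been extracted from the packet,
   and its loss estimate ignores reordering.

```python
from collections import defaultdict


class MuxReceiver:
    """Tracks sequence numbers per user group and media frames per user."""

    def __init__(self):
        self.last_seq = {}                 # SSRC -> last sequence number
        self.streams = defaultdict(list)   # user ID -> list of media frames

    def on_packet(self, ssrc, seq, frames):
        """frames: iterable of (user_id, data); returns packets missed."""
        prev = self.last_seq.get(ssrc)
        # Each user group has its own 16-bit sequence number space.
        missed = 0 if prev is None else (seq - prev - 1) & 0xFFFF
        self.last_seq[ssrc] = seq
        for user_id, data in frames:
            # The SSRC plays no part in identifying the user, so a user
            # may move between groups without disturbing its stream.
            self.streams[user_id].append(data)
        return missed
```

   For example, if user 38 appears first in a packet from one user
   group and later in a packet from another, both frames are appended
   to the same stream for ID 38.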

4 Impact on QoS

   In this section, we discuss the impact of the multiplexing protocol
   on end to end QoS.

4.1 Loss

   At first glance, it may seem that multiplexing linearly increases the
   impact of packet loss on per user loss rates (assuming Bernoulli
   packet losses). However, this is not the case. The fact that there
   are more users per packet is exactly offset by the decrease in the
   number of packets transmitted, yielding no difference between
   multiplexing and non-multiplexing. Mathematically, one can consider
   a stream of packets as a Bernoulli process X_i. In the case of
   multiplexing, media frames from a particular user are present in
   each packet X_i, whereas without multiplexing, media frames are
   present only in some subset X_{Nj}, j = 0, 1, 2, .... However, the
   average value of a Bernoulli process and of its subsampled version
   are identical, so the observed loss rate is unchanged.
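
   The subsampling argument can be illustrated numerically. The sketch
   below draws independent losses with an assumed probability of 5%
   and compares the loss rate over every packet with the loss rate
   over every fourth packet. The parameter values and the fixed seed
   are arbitrary choices made for the illustration.

```python
import random


def loss_rates(p=0.05, n_packets=200_000, every=4, seed=1):
    """Return (loss rate over all packets, loss rate over every Nth)."""
    rng = random.Random(seed)
    lost = [rng.random() < p for _ in range(n_packets)]   # Bernoulli X_i
    subsampled = lost[::every]                            # X_{Nj}
    return sum(lost) / len(lost), sum(subsampled) / len(subsampled)


if __name__ == "__main__":
    full, sub = loss_rates()
    print(f"all packets: {full:.4f}, every 4th packet: {sub:.4f}")
```

   Both estimates converge on the same underlying probability, differing
   only by sampling noise.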

   In reality, losses aren't Bernoulli. However, multiplexing is likely
   to reduce losses on the Internet, for several reasons. First, the
   improved efficiencies mean the overall bitrate for the stream is
   less. This has the effect of helping reduce congestion, which causes
   losses in the first place. Secondly, many routers drop packets not
   because of link congestion and buffer overflow, but because of pro-
   cessing overload. A burst of small packets can overwhelm the proces-
   sors on a typical router, causing loss. Thus, a critical characteris-
   tic of a router is the number of packets per second it can process.
   The multiplexing protocol has the advantage of keeping the packet
   rates low, which can help reduce process-based losses in routers.
   Finally, many routers use packet based queues, not byte based. Thus,
   N packets, each of size 1/N, occupy N times as much buffer resource
   as one packet of size 1. Multiplexing therefore helps improve buffer
   usage as well.

   Taken together, we therefore expect multiplexing to improve end to
   end observed loss rates.

4.2 Delay

   One of the costs of multiplexing is delay - not packetization delay
   (which can be effectively reduced by multiplexing) but store and
   forward delay. This delay is proportional to packet size and
   inversely proportional to link bandwidth. For modem speeds, the
   increased delays would be unacceptable.
   However, multiplexed voice is likely to be generated by gateways with
   much higher link speeds (T1, for example). Assuming all links are T1,
   the delay for storing and forwarding 20 multiplexed users over 5 hops
   is 2.2ms (assuming again G.729 with 30 ms packetization delays),
   which is far less than other delays in the system.

   Users of the multiplexing protocol with concerns about delays can
   always opt to use as few users per user group as they feel comfort-
   able with.

4.3 Jitter


   The amount of jitter introduced by the multiplexing protocol depends
   entirely on its usage. A system which uses a common payload type and
   packetization delay among all users in a user group will suffer no
   additional jitter through multiplexing.

   However, schemes which involve users changing user groups and payload
   types, and which involve mixing together different frame sizes per
   packet, may result in additional jitter. Once again, it is up to the
   administrator to make the appropriate tradeoff.

5 Security Considerations

   There are no security considerations beyond those addressed in RTP
   itself. The multiplexing protocol can make use of whatever encryption
   and authentication schemes are present in RTP, SIP, H.323 or other
   relevant protocols.

6 Open Issues

   There are a few open issues:

     1.   The multiplexing gain is based entirely on the assumption of
          synchronized packet generation among some group of users. It
          is possible to achieve gains without this assumption by intro-
          ducing timestamp offsets in the user header. The result is an
          increase in jitter and header overheads, and for this reason
          we have not taken this route. However, how valid is our
          assumption of synchronization for gateways?

     2.   Should the length field be made mandatory?

     3.   What support is needed for this in H.323 and SIP?

     4.   How does this relate to the MPEG4 Flexmux encapsulation?

7 Conclusion

   This document has specified an RTP payload format allowing multiple
   user media frames to reside in an RTP packet. This multiplexing is
   very useful for ITG-to-ITG communications, where it can reduce packet
   header overhead and improve gateway scalability.

8 Full Copyright Statement

   Copyright (C) The Internet Society (1998). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published and
   distributed, in whole or in part, without restriction of any kind,
   provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.

   However, this document itself may not be modified in any way, such as
   by removing the copyright notice or references to the Internet Soci-
   ety or other Internet organizations, except as needed for the purpose
   of developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be fol-
   lowed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

9 Authors' Addresses

   Jonathan Rosenberg
   Rm. 4C-526
   Bell Laboratories, Lucent Technologies
   101 Crawfords Corner Rd.
   Holmdel, NJ 07733
   electronic mail:  jdrosen@bell-labs.com

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue
   New York, NY 10027
   USA
   electronic mail:  schulzrinne@cs.columbia.edu

10 Bibliography

   [1] M. Handley and V. Jacobson, SDP: session description protocol,
   Request for Comments (Proposed Standard) 2327, Internet Engineering
   Task Force, Apr.  1998.

   [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: a
   transport protocol for real-time applications, Request for Comments
   (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996.


   [3] International Telecommunication Union, Coding of speech at 8
   kbit/s using conjugate-structure algebraic-code-excited linear-
   prediction, Recommendation G.729, Telecommunication Standardization
   Sector of ITU, Geneva, Switzerland, Mar. 1996.

   [4] J. Rosenberg and H. Schulzrinne, Issues and options for an aggre-
   gation service within RTP, Internet Draft, Internet Engineering Task
   Force, Nov. 1996.  Work in progress.

   [5] H. Schulzrinne, RTP profile for audio and video conferences with
   minimal control, Request for Comments (Proposed Standard) 1890,
   Internet Engineering Task Force, Jan. 1996.

   [6] M. Handley, H. Schulzrinne, and E. Schooler, SIP: Session initia-
   tion protocol, Internet Draft, Internet Engineering Task Force, Mar.
   1998.  Work in progress.

