Robust Header Compression                               Peter J. McCann 
  INTERNET DRAFT                                               Tom Hiller 
  Document: draft-mccann-rohc-gehcoarch-02.txt        Lucent Technologies 
                                                               June, 2001 
   
   
      Requirements and Architecture for Header Stripping and Generation 
   
   
  Status of this Memo 
   
     This document is an Internet-Draft and is in full conformance with all 
     provisions of Section 10 of RFC2026 [Bradner96].  
      
     Internet-Drafts are working documents of the Internet Engineering Task 
     Force (IETF), its areas, and its working groups. Note that other groups 
     may also distribute working documents as Internet-Drafts. 
      
     Internet-Drafts are draft documents valid for a maximum of six months 
     and may be updated, replaced, or obsoleted by other documents at any 
     time. It is inappropriate to use Internet-Drafts as reference material
     or to cite them other than as "work in progress."  
      
     The list of current Internet-Drafts can be accessed at 
     http://www.ietf.org/ietf/1id-abstracts.txt  
      
     The list of Internet-Draft Shadow Directories can be accessed at 
     http://www.ietf.org/shadow.html. 
      
   
  1. Abstract 
      
     Efficient transmission of voice over wireless links requires 
     significant engineering effort.  Because of the high cost of bandwidth 
     on such links, special techniques for compression of voice data and its 
     transmission over the air have been developed.  The compression 
     techniques and the wireless physical layers have been co-designed for 
     maximum spectral efficiency and human perceptual euphony. 
      
     Voice over IP (VOIP) applications should be able to leverage this 
     engineering effort when used over wireless links.  We advocate a 
     "header stripping and generation" approach to this problem in order to 
     enable the end-to-end service model while achieving maximum spectral 
     efficiency and simplicity of implementation.  This document outlines an 
     architectural framework for a wireless VOIP application, including the 
     wireless link layer and its interface to typical IP stack 
     implementations, and discusses the protocol elements that should be 
     standardized between the various components. 
      
      
  McCann, Hiller             Expires 08/2001                          1 
   
   
                                GEHCOARCH                February, 2001 
   
   
  2. Introduction 
      
     Voice over IP (VOIP) promises to change radically the way that 
     telephony services are built and delivered.  Integration of voice with 
     the Internet will not just be a change in the way traffic is carried; 
     rather, new types of services will be made possible by the integration 
     of voice with existing Internet applications such as the World Wide Web 
     and e-mail.  The key to these new services will be a platform that
     offers open programmability while providing transport for VOIP in an
     integrated, robust, and efficient way.
      
     Wireless links offer great challenges to the transport of voice 
     traffic, and significant engineering effort has gone into making them 
     efficient for circuit voice applications.  New voice compression 
     algorithms ("codecs"), such as EVRC [TIA-IS127], SMV [TIA-SMV], or AMR 
     [ETSI-AMR] have been developed to minimize the amount of data that must 
     be carried, and special over-the-air channels have been implemented to 
     carry these codecs with a minimum of overhead bits and minimal latency. 
      
     VOIP flows will be carried inside the Real-time Transport Protocol
     (RTP) [Shulzrinne96] on wired links.  However, for wireless links, the
     situation is less clear.  The limited bandwidth of wireless links makes 
     it impossible to transmit the entire IP/UDP/RTP header with every 
     packet, as the overhead would be prohibitive.  It is possible to 
     compress these headers by transmitting only updates to the fields that 
     change rather than the entire header [Bormann01], but these compression 
     schemes are complex and can never entirely eliminate the overhead due 
     to RTP.  Even when the header is compressed down to one byte per frame 
     on average, the impact on spectral capacity is significant.  Also, the 
     variable-sized frames produced by these compression protocols are 
     unsuitable for typical wireless links that support only a limited 
     number of frame sizes. 
      
     The fundamental reason why these schemes cannot achieve the same 
     efficiency as circuit data is that they discard information that is 
     available at the physical channel layer, including the real-time nature 
     of the traffic, which can assist in reconstructing the RTP header.  
     This document describes an architectural framework that allows such 
     real-time information to be used while not restricting the choice of 
     call control protocol, placement of call feature servers, or mobile 
     station architecture.   
      
     All work to date on header compression has taken as a basic requirement 
     that it will operate with no knowledge about the applications 
     generating the compressible packets, and therefore when a packet is 
     compressed and then decompressed, the result must be bit-for-bit 
     identical with the original packet.  However, we argue that many 
     applications, especially those that are only concerned with 
     transmission and playback of voice, can tolerate some amount of skew in 
     the reproduced RTP headers.  When a compressor/decompressor pair can 
     make these assumptions, very simple and efficient header compression
     can be performed.  Our architecture allows applications to indicate 
     their ability to tolerate such skew, and we discuss the conditions 
     under which applications may do so.  This allows us to implement a form 
     of header compression that makes use of existing circuit voice 
     implementations with minimal changes; we refer to this approach as 
     "header stripping and generation." 
   
      
  3. Wireless Technology Considerations 
      
     Cellular wireless technologies will support distinct bearer channels 
     for real-time audio flows versus non-real-time data.  Data for TCP, 
     such as web or e-mail traffic, will suffer from the lossy nature of the 
     wireless link unless a link-layer retransmission protocol is used to 
     improve its reliability.  Such a retransmission protocol (called the 
     Radio Link Protocol or RLP in the emerging cellular data networks) does 
     improve reliability but only at the expense of additional buffering and 
     latency [Fairhurst01].  Real-time audio streams cannot tolerate the 
     additional latency, which could be on the order of 1 second under 
     adverse radio conditions.  For this reason, a separate bearer channel 
     will be used for voice that does not perform retransmission.  This 
     bearer will be very similar to the existing circuit voice channels.  
     The architecture outlined below allows the mobile station to make 
     effective use of this channel for VOIP. 
      
     In addition to the two types of channels outlined above, there are 
     likely to be intermediate kinds of channels intended to carry various 
     kinds of IP multimedia data.  This data is somewhat more sensitive to 
     delay and less sensitive to loss than ordinary packet data, but is 
     unable to use the underlying link framing in the same manner as circuit 
     voice.  Such traffic will need to be carried in, e.g., HDLC, and will 
     be transported over an RLP that does few or no retransmissions to avoid 
     introduction of buffering and delay.   
      
     Because of these multiple channel types, each endpoint of the link will 
     need to know when to establish new channels and how to properly 
     allocate packet flows to channels.  Applying heuristics to guess which 
     flows should be allocated to which channels will not be acceptable; in 
     this case, unlike the heuristics used for bit-wise transparent header 
     compression, guessing wrong will do harm to application flows because 
     they will not be able to meet their real-time requirements. 
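
     The allocation rule above can be sketched as a small flow table.
     This is a minimal illustration, not part of any specified protocol:
     the names Channel, FlowTable, request, and lookup are hypothetical,
     and the choice to fall back to the retransmitting data channel for
     unsignaled flows is an assumption made here.

```python
from enum import Enum


class Channel(Enum):
    """Bearer channel types described in this section (illustrative names)."""
    VOICE = "synchronous voice bearer, no retransmission"
    MULTIMEDIA = "HDLC-framed, RLP with few or no retransmissions"
    DATA = "RLP with retransmission"


class FlowTable:
    """Maps flows to channels based only on explicit requests."""

    def __init__(self):
        self._table = {}

    def request(self, flow, channel):
        # Entries come from signaling or a local API, never from
        # heuristics applied to packet contents.
        self._table[flow] = channel

    def lookup(self, flow):
        # Assumed default: an unsignaled flow uses the reliable data
        # channel rather than being guessed to be voice.
        return self._table.get(flow, Channel.DATA)


table = FlowTable()
voice_flow = ("10.0.0.1", 5004, "192.0.2.7", 5004, "udp")
web_flow = ("10.0.0.1", 40000, "192.0.2.9", 80, "tcp")
table.request(voice_flow, Channel.VOICE)
```

     Here only voice_flow was signaled, so it alone is placed on the
     voice bearer; web_flow stays on the data channel.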
      
     The architecture of a mobile station should allow maximum flexibility 
     in its hardware and software choices.  Two basic mobile station models 
     have been identified in the wireless data community.  A "network model" 
     station is one that is completely integrated, such as a phone plus 
     browser or a palmtop with integrated radio hardware.  Such a device 
     usually has a real-time operating system, a DSP chip for processing the 
     audio codec, and an embedded IP stack implementation.  In contrast, a 
     "relay model" station is one that is split in two: it consists of a 
     piece of terminal equipment (such as a laptop computer) connected to a
     piece of radio equipment, usually by a serial connection.  The idea is 
     to make use of the mostly stock operating system on the terminal 
     equipment, while "relaying" the data to and from the wireless network 
     via the radio equipment.  We take the point of view that VOIP 
     applications must be supported for both kinds of mobile stations.  
     While network model phones will offer a tightly integrated set of 
     services, relay model stations are likely to offer a much more open and 
     programmable environment on the terminal equipment.  As these devices 
     evolve we expect the distinction between network and relay models will 
     blur as the wireless device moves closer to the UNIX notion of a 
     "network interface" to a stock operating system, and the operating 
     system evolves to take on more real-time functionality. 
      
     Relay and network model terminals are both endpoints: each hosts
     applications and terminates the complex wireless link described above.
     In other words, the compressor and decompressor always operate over
     the last-hop link.  This makes it simple for applications to express
     their preferences to the compressor for how their packets should be 
     treated; an application can use a local software API for communicating 
     with the local compressor, and link-layer signaling can communicate 
     these preferences to the remote compressor one hop away.  While we 
     expect this will cover the vast majority of cases, there may also be a 
     requirement to support a router with this type of wireless link 
     interface, such as a phone that is acting as an IP-layer gateway for 
     many IP devices carried by a single user.  If voice flows are expected 
     to originate from such devices, then new signaling protocols must be 
     used to indicate to the (now remote) compressor how packets should be 
     treated.  Note that this not only includes the application's preference 
     for transparency, but also which kind of underlying wireless channel 
     should be used to carry the traffic.  Previous header compression 
     schemes have relied on heuristics to recognize which flows are RTP 
     traffic; because they were bit-for-bit transparent, they claimed that 
     choosing incorrectly did no harm to applications.  Because we relax 
     this bit-for-bit transparency requirement, we must be sure that a flow 
     belongs to an application that can tolerate skew.  Otherwise, the skew 
     could do harm to applications.  However, note that sending a packet 
     over an incorrect wireless channel would also do harm, because the 
     real-time performance needs of the application will not be met.  We 
     presume that some form of IP-layer signaling must be used to inform 
     routers how to allocate flows to channels, and we propose that 
     application transparency requirements can be carried at the same time. 
      
   
  4. Requirements 
      
      
     In this section we examine the environment in which zero-byte header 
     compression is expected to operate, including the required efficiency, 
     assumptions about applications, and concerns about simplicity. 
      
      
  4.1 Efficiency 
      
     Approximate voice activity factors (probability distribution of frame 
     sizes) for the Selectable Mode Vocoder (SMV) are given in Figure 1.  
     These reflect one party's activity during a typical two-way interactive 
     voice call. 
      
               Rate           Activity %     Payload (bits) 
                
               Full              20              171 
               Half              20               80 
               Quarter           10               40 
               Eighth            50               16 
      
          Figure 1: Activity of the 3GPP2 Selectable Mode Vocoder 
      
     This vocoder is designed to operate synchronously with the underlying 
     physical channel: it outputs one of the above frame sizes every 20 
     milliseconds.  Which frame size is output depends on the 
     characteristics of the speech being compressed; typically, full-rate 
     (171 bit) frames are used during active talk spurts, interspersed with 
     half- and quarter-rate frames as needed.  Eighth-rate frames are used 
     mainly during silence periods, but they also contain information about 
     the noise components present in the silence, which is referred to as 
     "comfort noise generation".  Also, the physical link typically requires 
     that some frame be transmitted during every 20ms interval so that power 
     control can be maintained, and the eighth-rate frames play this role. 
      
     The cdma2000 air interface has been designed with these frame sizes in 
     mind, to support optimal transport of circuit voice.  It is not 
     possible to perform a marginal adjustment to the frame sizes to 
     accommodate header overhead.  This makes application of the basic ROHC 
     RTP profile problematic at best: if one byte of LSB-encoded sequence 
     number is added to a frame, it must be carried in the next-higher frame 
     format.  For a full-rate frame, there is no next-higher frame format 
     and so those frames could not be transported without breaking the 
     synchronization with the underlying physical link and introducing 
     additional framing, for example with the use of PPP HDLC flags or the 
     ROHC segmentation mechanism.  This would introduce another 1 or 2 bytes 
     of overhead per frame, and would also have a multiplier effect on the 
     frame error rate since most vocoder frames would now span two physical 
     frames.  Finally, this lack of synchronization would introduce an 
     occasional lag between the vocoded frame time and real time that could 
     add to the end-to-end latency and jitter of the RTP flow.   
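
     The frame-format problem can be illustrated with the payload sizes
     of Figure 1.  The sketch below assumes that each format can carry
     exactly the payload bits listed there and nothing more; the function
     name carrier_format is hypothetical.

```python
# Payload sizes in bits for each rate, taken from Figure 1.
FRAME_BITS = {"Eighth": 16, "Quarter": 40, "Half": 80, "Full": 171}


def carrier_format(rate, extra_bits=8):
    """Return the smallest frame format that fits the payload of `rate`
    plus extra_bits of header, or None if no format is large enough."""
    needed = FRAME_BITS[rate] + extra_bits
    for name, size in sorted(FRAME_BITS.items(), key=lambda kv: kv[1]):
        if size >= needed:
            return name
    return None


# Format required for each vocoder rate once one header byte is added.
promotions = {rate: carrier_format(rate) for rate in FRAME_BITS}
```

     Under these assumptions every rate is pushed to the next-higher
     format, and a full-rate frame has no format left to carry it.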
      
     Even a very conservative calculation, assuming these problems can be 
     overcome and ignoring the contribution from eighth-rate 16 bit frames, 
     yields an additional 400 bits per second from the header and 
     segmentation overheads.  Compared to the average 3720 bps circuit voice 
     rate, this overhead (greater than 10%) would significantly diminish the 
     number of calls that can be handled in a given amount of spectrum.  We
     conclude that because the codec and physical link have been co-
     engineered to such tight tolerances, we should endeavor to use the 
     vocoder/physical link largely unchanged from its existing 
     implementation for circuit voice.   
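
     The overhead estimate can be reproduced roughly from Figure 1.  The
     sketch below assumes 16 bits of combined header and framing overhead
     on every frame other than eighth-rate, which is our reading of the
     "conservative calculation" above.  With the approximate activity
     factors as printed, the mean voice rate works out to roughly
     3110 bps, lower than the 3720 bps quoted above, which presumably
     reflects more precise factors; in either case the overhead exceeds
     10% of the voice rate.

```python
FRAMES_PER_SECOND = 50  # one frame every 20 ms

# (activity fraction, payload bits) per rate, from Figure 1.
SMV = {
    "Full": (0.20, 171),
    "Half": (0.20, 80),
    "Quarter": (0.10, 40),
    "Eighth": (0.50, 16),
}

# Mean circuit voice rate implied by the table.
mean_bits_per_frame = sum(p * bits for p, bits in SMV.values())
voice_bps = mean_bits_per_frame * FRAMES_PER_SECOND

# Assumption: one header byte plus one framing byte per frame,
# ignoring eighth-rate frames entirely.
OVERHEAD_BITS = 16
active_fraction = sum(p for name, (p, _) in SMV.items() if name != "Eighth")
overhead_bps = active_fraction * FRAMES_PER_SECOND * OVERHEAD_BITS

overhead_ratio = overhead_bps / voice_bps
```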
      
   
  4.2 Application Assumptions 
      
     For real time to serve as a proxy for the RTP sequence number, the
     sequence number must increment by one for every physical layer
     epoch.  This would be true if the transmitter
     sends a vocoded frame for every epoch, as is done by the existing 
     cdma2000 vocoders even during silence intervals.  Note that in 3G 
     systems the mobile node transmits continuously even during silence so 
     that the network may monitor power.  Note also that these frames are 
     not empty; they do carry information about the background noise 
     components during silence, known as "comfort noise". 
   
     We explicitly relax the assumption that reconstructed headers at the 
     decompressor are bit-for-bit identical to the headers seen by the 
     compressor.  Specifically, we note that for most VOIP applications, the 
     RTP sequence number and timestamp are primarily used to schedule frames 
     for playback over a relatively short interval.  Implementations 
     typically maintain a playback buffer of a few frames, and place 
     incoming voice samples into that buffer based on their timestamp and 
     sequence number.  Based on a running average of the buffer depth, 
     frames are discarded or silence is inserted according to whether the 
     buffer is too full or is running low, respectively.  Such a playback 
     buffer only needs the timestamps and sequence numbers to be relatively 
     accurate; that is, over short timescales, neighboring frames should 
     have neighboring timestamps and sequence numbers.  Any small, fixed 
     skew that is introduced into the packet stream will be quickly 
     corrected by the playback buffer mechanism. 
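
     The tolerance of a playback buffer to fixed skew can be seen in a
     small sketch.  It assumes 160 timestamp units per frame (20 ms at an
     8 kHz RTP clock) and a buffer that places frames relative to the
     first timestamp it sees; playout_slots is a hypothetical helper, not
     a description of any particular implementation.

```python
def playout_slots(timestamps, samples_per_frame=160):
    """Place frames into playout slots relative to the first frame seen.

    Arithmetic is modulo 2**32 because the RTP timestamp is 32 bits wide.
    """
    base = timestamps[0]
    return [((ts - base) % 2**32) // samples_per_frame for ts in timestamps]


# Timestamps as sent; one frame was lost before the last one.
original = [1000, 1160, 1320, 1640]

# The same stream after a fixed skew of 30 frames is introduced.
skewed = [(ts + 30 * 160) % 2**32 for ts in original]
```

     Both streams land in the same slots, including the gap left by the
     lost frame, because only the differences between neighboring
     timestamps matter to the buffer.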
      
     However, not every application will be able to tolerate such skew.  
     Defining "non-transparent compression" as any compression that changes 
     bits end-to-end, we could make the following statements: 
      
       1. The end-system MUST be aware of any non-transparent compression. 
       2. The end-system MUST be able to turn off non-transparent 
          compression if it chooses. 
      
     Depending on the application semantics for each header field, we can 
     classify that field as follows: 
      
     BRITTLE         These fields are those that must be reconstructed by 
                     the decompressor so that they match bit-for-bit those 
                     seen at the compressor. 
      
     PLIANT          These fields are those for which the application can 
                     tolerate some form of skew. 
    
     Note that BRITTLE fields can be either STATIC or CHANGING, where those 
     terms are used as in appendix A of ROHC [Bormann01]. 
      
     For simplicity, we assume that all PLIANT fields are CHANGING.  We note 
     that STATIC fields are easy to communicate precisely at initialization 
     time, so classifying such a field as PLIANT would not ease the 
     compression/decompression task.  However, we note that for the specific 
     application of wireless vocoders, we can often make stronger 
     assumptions about what fields are static.  For example, the Marker Bit 
     may never be used for EVRC in wireless applications [Li01].  This lets 
     us assume this field is STATIC. 
      
     The PLIANT fields can be further classified according to what kinds of 
     skew can be tolerated by applications.  For example, a RARELY-CHANGING 
     (RC) [Bormann01] field is updated infrequently and thereafter keeps its 
     new value.  We note that for most RC fields, it is not mandatory that
     such a change reach the receiver in the exact packet in which the
     sender made it.  For example, a CSRC list can be updated in a somewhat
     asynchronous manner; if the update is applied a few packets earlier or
     later, application semantics will not be affected.  Similarly, a
     SEMISTATIC [Bormann01] field such as one used for congestion 
     notification does not need to be precisely synchronized with the 
     original packet in which it was set; as long as the receiver gets the 
     congestion notice in a reasonable amount of time it can take 
     appropriate action.  We refer to such fields as RC-PLIANT or SS-PLIANT. 
      
     For fields like the RTP Timestamp and Sequence Number, we introduce a
     new term: 
      
     OFFSET-PLIANT     These fields may be changed by some offset in the 
                       compression/decompression process.  These fields have 
                       a STATIC delta and are incremented by that delta with 
                       each packet.  The precise offset of the decompressor 
                       from the compressor is itself a RARELY-CHANGING 
                       value. 
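
     As an illustration of how OFFSET-PLIANT fields might be regenerated,
     the sketch below derives the RTP sequence number and timestamp from
     a physical-layer epoch counter.  The class RtpGenerator and its
     parameters are hypothetical, and the timestamp delta of 160 assumes
     20 ms frames at an 8 kHz RTP clock.

```python
from dataclasses import dataclass


@dataclass
class RtpGenerator:
    """Regenerates OFFSET-PLIANT RTP fields from the epoch index."""
    seq_base: int        # sequence number supplied at context setup
    ts_base: int         # timestamp supplied at context setup
    ts_delta: int = 160  # STATIC delta (assumed: 20 ms at 8 kHz)

    def fields(self, epoch):
        """Return (sequence number, timestamp) for the given epoch."""
        seq = (self.seq_base + epoch) % 2**16            # 16-bit field
        ts = (self.ts_base + epoch * self.ts_delta) % 2**32  # 32-bit field
        return seq, ts


gen = RtpGenerator(seq_base=65534, ts_base=0)
first = gen.fields(0)
third = gen.fields(2)  # the sequence number wraps modulo 2**16
```

     Because the epoch counter advances whether or not a frame arrives, a
     frame lost over the air simply leaves a gap in the generated
     sequence numbers, as it would for a lost RTP packet.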
      
     Note that some applications may impose semantics on fields that make 
     them BRITTLE.  For example, if SRTP is in use [Blom01] the sequence 
     number and/or timestamp must be matched precisely to the encrypted, 
     vocoded frame.  Also, if RTCP is being used to estimate round-trip 
     time, these estimates will be perturbed by the offset amount.  
     Applications may be able to tolerate different amounts of offset and it 
     may be important in the future to characterize the amount of offset 
     introduced by a particular implementation; however, for now we take a 
     purely qualitative approach. 
      
     Table 1 lists the CHANGING fields from the basic ROHC RTP profile 
     [Bormann01].  For each, we give the application assumptions on pliancy 
     that must hold for a header stripping/generation approach to preserve 
     IP and application semantics. 
    
       +------------------------+-------------+--------------------------+ 
       |         Field          | Assumption  |  Note                    | 
       +========================+=============+==========================+ 
       | IPv4 Id:    Sequential |   PLIANT    | Start w/initial context  | 
       +------------------------+-------------+--------------------------+ 
       | IP TOS / Tr. Class     |  RC-PLIANT  | Probably never updated   | 
       +------------------------+-------------+--------------------------+ 
       | IP TTL / Hop Limit     |  RC-PLIANT  | Unimportant for last-hop | 
       +------------------------+-------------+--------------------------+ 
       | UDP Checksum: Disabled |   STATIC    | Checksum always disabled | 
       +------------------------+-------------+--------------------------+ 
       |                 No mix |   STATIC    |                          | 
       | RTP CSRC Count: -------+-------------+--------------------------+ 
       |                 Mixed  |  RC-PLIANT  | Update need not be sync. | 
       +------------------------+-------------+--------------------------+ 
       | RTP Marker             |   STATIC    | Disable for EVRC         | 
       +------------------------+-------------+--------------------------+ 
       | RTP Payload Type       |   STATIC    |                          | 
       +------------------------+-------------+--------------------------+ 
       | RTP Sequence Number    |OFFSET-PLIANT|                          | 
       +------------------------+-------------+--------------------------+ 
       | RTP Timestamp          |OFFSET-PLIANT|                          | 
       +------------------------+-------------+--------------------------+ 
       |                 No mix |      -      |                          | 
       | RTP CSRC List:  -------+-------------+--------------------------+ 
       |                 Mixed  |  RC-PLIANT  | Update need not be sync. | 
       +------------------------+-------------+--------------------------+ 
      
          Table 1: Assumptions on the CHANGING header fields necessary for
                   header stripping and generation.
    
   
     We re-classify the IPv4 Identification field as PLIANT.  We assume that 
     IPv4 Identifiers can be generated at the decompressor by incrementing 
     from an initial value supplied by the compressor.  RTP packets should 
     not be fragmented, and the risk of an IPv4 Identifier collision with 
     another fragmented packet should be negligible.  If not, then we assume 
     at least that Identifiers are taken from a contiguous range and do not 
     need to be encoded with every packet.  Only when a new range of 
     Identifiers is chosen would an update need to be sent.  Note that it is 
     not important for such Identifiers to be identical to the ones visible 
     at the compressor, only that there be no collisions with other, 
     fragmented packets. 
      
     We re-classify the Traffic Class/TOS field as RC-PLIANT.  We note that 
     for a given flow, it will probably never be updated unless Explicit 
     Congestion Notification is in use.  ECN bits could be treated as RC-
     PLIANT or SS-PLIANT, based on future study.  It is not clear what the 
     benefit of ECN will be for low-bitrate flows such as EVRC; such a codec 
     will probably not respond to congestion notification. 
    
     We re-classify the TTL/Hop Limit field as RC-PLIANT.  Note that for 
     last-hop links, this field will be constant in the uplink direction and 
     its value will be unimportant for the downlink direction, because IP 
     forwarding will not be performed.  This field only needs to be updated 
     if the header stripping/generation is operating over a non-last hop 
     link where there is a potential for routing loops.  Even if there is 
     the potential for routing loops, it is not necessary to update the TTL 
     in a precisely synchronized way; a strategy of eager decrease/lazy 
     increase, for example, would have the desired effect of stopping 
     routing loops while not introducing too much update overhead. 
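
     One possible reading of "eager decrease/lazy increase" is sketched
     below: the compressor signals a lower TTL immediately, but signals a
     higher TTL only after it has persisted for some number of packets.
     The class TtlContext and the patience parameter are assumptions
     introduced for illustration.

```python
class TtlContext:
    """Tracks the TTL the decompressor places in generated headers."""

    def __init__(self, ttl, patience=50):
        self.ttl = ttl            # value currently held by the peer
        self.patience = patience  # packets to wait before raising (assumed)
        self._higher_run = 0      # consecutive packets seen above self.ttl

    def observe(self, ttl):
        """Return a TTL to signal to the peer, or None for no update."""
        if ttl < self.ttl:
            # Eager decrease: generated TTLs never exceed the real one
            # for longer than a single packet.
            self.ttl, self._higher_run = ttl, 0
            return ttl
        if ttl > self.ttl:
            # Lazy increase: raise only after a sustained run.
            self._higher_run += 1
            if self._higher_run >= self.patience:
                self.ttl, self._higher_run = ttl, 0
                return ttl
            return None
        self._higher_run = 0
        return None


ctx = TtlContext(ttl=60, patience=3)
updates = [ctx.observe(t) for t in (60, 58, 61, 61, 61, 61)]
```

     In this run only two updates are sent for six packets: the drop to
     58 at once, and the rise to 61 after three consecutive packets.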
      
     We re-classify the RTP Marker bit as STATIC for the applications of
     interest.  The purpose of the Marker bit is to indicate where silence
     may be inserted or removed in case of playback buffer overflow or
     underflow.  We note that wireless codecs will typically have their own
     methods of detecting silence, such as the use of low-rate frames.
      
     We re-classify the CSRC count and values as RC-PLIANT.  We assume that 
     CSRCs are updated rarely, if at all, and so these updates can be 
     carried over the sister reliable data link to the peer without imposing 
     much additional overhead.  There is no need to synchronize them 
     precisely with the packet in which they first appear. 
   
     Under the above assumptions, all BRITTLE fields are STATIC.  This will 
     allow header stripping/generation to work without adversely impacting 
     end-to-end semantics.  
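
     The condition above can be checked mechanically against Table 1.
     The sketch below encodes the table as a dictionary and looks for any
     field that is neither STATIC nor some kind of PLIANT; the label
     "BRITTLE" is used here to mean BRITTLE and CHANGING, and the helper
     names are hypothetical.

```python
# Assumptions from Table 1.  The "No mix" CSRC List row has no entry
# in the table and is omitted.
TABLE_1 = {
    "IPv4 Id (sequential)": "PLIANT",
    "IP TOS / Tr. Class": "RC-PLIANT",
    "IP TTL / Hop Limit": "RC-PLIANT",
    "UDP Checksum (disabled)": "STATIC",
    "RTP CSRC Count (no mix)": "STATIC",
    "RTP CSRC Count (mixed)": "RC-PLIANT",
    "RTP Marker": "STATIC",
    "RTP Payload Type": "STATIC",
    "RTP Sequence Number": "OFFSET-PLIANT",
    "RTP Timestamp": "OFFSET-PLIANT",
    "RTP CSRC List (mixed)": "RC-PLIANT",
}


def needs_exact_bits(assumption):
    """True if the field must be reproduced exactly in every packet,
    i.e. it is neither STATIC nor any kind of PLIANT."""
    return assumption != "STATIC" and not assumption.endswith("PLIANT")


def blockers(table):
    """Fields that prevent header stripping and generation."""
    return sorted(f for f, a in table.items() if needs_exact_bits(a))


# Example: an application using SRTP makes the sequence number BRITTLE.
with_srtp = {**TABLE_1, "RTP Sequence Number": "BRITTLE"}
```

     With the table as given there are no blockers; once an application
     declares the sequence number BRITTLE, that field is reported and the
     flow cannot use header stripping.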
   
   
  4.3 Simplicity 
      
     A major goal of this document is to support transport of voice over
     existing cellular voice channels with few or no changes to the
     supporting radio access equipment.  Allowing a solution to completely
     strip out the header, transmitting only voice data on this channel, 
     will significantly aid that goal.  By not imposing any new format 
     requirements on the vocoded frames, we allow development of future 
     codecs to proceed with maximum flexibility. 
      
     The simplicity of the supporting header compression state machine must 
     also be considered.  Wireless devices are likely to be limited in both 
     power and memory budgets.  Network access servers, while they will be 
     implemented on larger footprint equipment, will need to support large 
     numbers of attached devices and so scalability is a key issue.  By 
     decoupling the header initialization and updates from the synchronous 
     voice traffic channel, it may be possible to achieve significant 
     simplifications in the header compression protocol state machine. 
      
      
  5. Reference Architecture 
      
     Our reference architecture is shown in Figure 2. 
   
      
      Remote VOIP    Other NRT       VOIP       Zero-Byte  
       Application      Apps        Control------Control  
            \              \         /              | 
             \              \       /               | 
              +-------------IP Protocol            / 
                              Stack               / 
                                |                / 
                                |               / 
       Header Comp/  ------Data Link-----------+           Peer 
         Decomp      \        Layer                       System 
                      \___      |                            | 
                          \     |                            | 
        Audio       Codec  \    +-------------->Physical<----+ 
       Hardware<--->Impl <--+------------------>Channel(s)<--+ 
   
      Figure 2.  Reference architecture for a system implementing 
                 zero-byte header compression. 
   
   
     The architecture diagram consists of nine components connected to a 
     peer system by a collection of physical channels.  Note that we expect 
     zero-byte header compression to be somewhat asymmetric in that it will 
     usually be implemented between a mobile station, where the VOIP and 
     other applications reside, and a peer network entity that is just a 
     data link termination point and a first-hop Internet router.  As such, 
     the peer system in the network will likely be missing the audio 
     hardware and codec implementation, and may not participate in the VOIP 
     control.  Also, the mobile station may not need to actually perform 
     header compression and decompression if its codec implementation is 
     connected directly to the physical channel, which may be required to 
     achieve the desired latency guarantees. 
      
     The component named "Zero-Byte Control" would consist of the protocol 
     logic used to set up and maintain the zero-byte header compression 
     context.  
      
     In the following subsections we discuss each of the architectural 
     elements in turn.  The next section will discuss the interfaces between 
     them. 
      
      
  5.1 Non Real-time Components 
      
     It is important to distinguish between the real-time and non real-time 
     components of Figure 2.  This is especially important for a relay model 
     mobile station, as it impacts which elements of stock operating systems
     can be reused and which must be implemented as new real-time 
     extensions.  In this subsection we examine the non real-time 
     components. 
      
      
  5.1.1 VOIP Control 
      
     The VOIP control component is the implementation of the call signaling 
     protocol, such as SIP [Handley00] or H.323 [ITU-H323].  We make no 
     assumptions about which protocol is used, and we do not require the
     network-side peer system to contain this element.  The mobile station 
     will use one of the VOIP signaling protocols to interact with call 
     feature servers that could be anywhere on the Internet. 
      
     We assume that this component will open network-layer connections and 
     will have access to the transport endpoint identifiers for the 
     IP/UDP/RTP flow.  However, we do not require this element to actually 
     process audio data; it will probably be implemented in user-space and 
     could add unpredictable latency to such flows, depending on operating 
     system characteristics. 
      
      
  5.1.2 Remote VOIP Application 
      
     If the RTP generating application is remote from the physical link, 
     i.e., there is at least one IP hop separating it from the compressor, 
     then it will not have direct access to the zero-byte control component.  
     Some network layer protocol must be introduced if it is to take 
     advantage of zero-byte header compression.  
      
      
  5.1.3 IP Protocol Stack Implementation 
      
     We assume that the mobile station implements an IP protocol stack in 
     conformance with RFC 1122 [Braden89].  Note that such an implementation 
     may not be capable of supporting hard real-time tasks. 
      
      
  5.1.4 Data Link Layer 
      
     The data link layer is the interface between the IP protocol stack and 
     the wireless network device.  For cdma2000, this will be PPP [TIA-
     IS835].  For GPRS, this will be LLC [ETSI-LLC], and for UMTS, this will 
     be PDCP [ETSI-PDCP]. 
      
     For cdma2000, we assume a mostly stock PPP implementation for 
     interaction with the physical channels that support data and perform 
     retransmission.  However, because the data link layer may not be a hard 
     real-time component, we would not require it to be on the audio traffic 
     path inside the mobile station. 
      
    
  5.1.5 Zero-Byte Control 
      
     The Zero-Byte Control component is responsible for negotiating the use 
     of header stripping/generation with the peer system and for setting up 
     context information such as the fixed portion of the IP/UDP/RTP header.  
     It will interact with the VOIP control component to acquire these 
     parameters, and will send them across the data link layer to the peer 
     system.  It will also interact with the wireless device (possibly 
     through the data link layer) to establish the physical audio channels 
     and will identify the channel to be used when sending context 
     information to the peer system.   
      
      
  5.1.6 Other Non-Real-Time Applications 
      
     We expect the terminal equipment to be a general-purpose computer that
     will have other applications running.  These applications may
     interact with other components such as the IP protocol stack, but in
     general will not be hard real-time tasks.  These applications must
     coexist with all the other components.
      
      
  5.2 Real-time Components 
      
     Because we make use of the real-time nature of the physical channel, 
     several components must be implemented as real-time tasks.  For a 
     network model phone, this is similar to existing practice: a tightly 
     integrated, real-time operating system on an embedded device schedules 
     the audio sampling and playback to coincide with the physical frame 
     rate of the wireless link.  For a relay model terminal, we wish to make 
     use of the audio hardware on the connected terminal equipment.  This 
     may require that the components be implemented using special real-time 
     extensions to existing stock operating systems. 
      
      
  5.2.1 Audio Hardware 
      
     The audio hardware consists of the analog-to-digital (A/D) and digital-
     to-analog (D/A) converters used for sampling and playing back sound, 
     along with the analog microphones and speakers.  In a network model 
     phone this consists of the integrated equipment that is part of the 
     phone.  In a relay model terminal it would be the "sound card" or other 
     audio peripheral. 
      
   
  5.2.2 Codec Implementation 
      
     The codec implementation converts the sampled audio to and from the 
     special wireless-specific encoding format.  For a network model phone, 
     this encoding is carried out on dedicated Digital Signal Processing
     (DSP) hardware.  In a relay model terminal, we assume this is performed 
     on the general purpose CPU of the terminal equipment. 
      
      
  5.2.3 Physical Channel 
      
     As mentioned before, there will be at least two physical channels 
     supporting the mobile station: one that runs RLP retransmission, 
     supporting the latency tolerant data applications; and another that 
     resembles a voice circuit.  VOIP control signaling will traverse the 
     data-oriented RLP channel, while the voice bearer traffic will traverse 
     the real-time circuit-like channel. 
      
     Both channels must be available to the upper layers regardless of 
     whether a relay model or network model terminal is used.  The voice 
     channel supports real-time traffic and performs no buffering.  It will 
     send a frame at precise, periodic intervals, such as 20 milliseconds 
     for cdma2000.  The codec implementation must be able to supply frames 
     for the physical channel at exactly this rate. 
      
      
  5.2.4 Header Compression/Decompression 
      
     The codec implementation may be directly connected to the physical 
     channel on the mobile terminal side, and so concrete IP/UDP/RTP headers 
     may not necessarily appear inside the mobile terminal.  However, we do 
     not prohibit a mobile terminal from reconstructing such headers if it 
     requires them.  This component is drawn next to the data link layer in 
     the diagram, and may in fact be integrated into the data link layer 
     implementation.  It is responsible for classifying each packet coming 
     down from the IP protocol stack against the fixed IP/UDP/RTP header 
     fields we are attempting to compress.  The value of these fields is 
     established by the Zero-Byte Control component and installed into the 
     header compression component, possibly via the data link layer.  Once 
     the header has been stripped this component must schedule the payload 
     for transmission on the physical layer at the appropriate frame 
     interval, according to the sequence number and timestamp received in 
     the header. 
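
     The classification step described above might look like the
     following minimal sketch.  It assumes IPv4 and ignores RTP header
     extensions and padding.  The FlowContext type, its field names, and
     the classify function are hypothetical names introduced here for
     illustration only; the static values would come from the Zero-Byte
     Control component.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowContext:
    """Static header fields for one RTP flow (hypothetical layout)."""
    src_ip: bytes       # 4-byte IPv4 source address
    dst_ip: bytes       # 4-byte IPv4 destination address
    src_port: int
    dst_port: int
    ssrc: int
    payload_type: int


def classify(packet: bytes, ctx: FlowContext):
    """Return (seq, timestamp, payload) if the packet matches ctx, else None."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None                      # not IPv4
    ihl = (packet[0] & 0x0F) * 4         # IPv4 header length in bytes
    if ihl < 20 or packet[9] != 17 or len(packet) < ihl + 8 + 12:
        return None                      # not UDP, or too short for RTP
    if packet[12:16] != ctx.src_ip or packet[16:20] != ctx.dst_ip:
        return None
    src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    if (src_port, dst_port) != (ctx.src_port, ctx.dst_port):
        return None
    rtp = ihl + 8                        # RTP header follows the UDP header
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, rtp)
    if b0 >> 6 != 2 or (b1 & 0x7F) != ctx.payload_type or ssrc != ctx.ssrc:
        return None
    csrc_count = b0 & 0x0F
    # The stripped payload is what would be scheduled on the voice channel.
    return seq, ts, packet[rtp + 12 + 4 * csrc_count:]
```

     A packet that returns None would simply continue down the ordinary
     data path through the data link layer.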
      
     In the opposite direction, when packets arrive on the network side from 
     the physical channel, this component is responsible for regenerating 
     the proper IP/UDP/RTP header and passing the packet on to the IP 
     protocol stack.  It makes use of the physical arrival time to generate 
     the proper timestamp and sequence number in the RTP header. 
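
     As a rough illustration of regenerating these fields from the
     physical arrival time, consider the sketch below.  It assumes one
     RTP packet per 20 ms physical frame with no gaps, and an 8 kHz RTP
     clock (160 timestamp ticks per frame); both are assumptions made for
     the example, as are the names RegenContext and regenerate.

```python
from dataclasses import dataclass

FRAME_MS = 20             # frame interval cited in the text for cdma2000
SAMPLES_PER_FRAME = 160   # assumption: 8 kHz RTP clock, 20 ms per frame


@dataclass
class RegenContext:
    channel_start_ms: int   # arrival time of frame 0 on the voice channel
    seq_offset: int         # RTP sequence number assigned to frame 0
    ts_offset: int          # RTP timestamp assigned to frame 0


def regenerate(ctx: RegenContext, arrival_ms: int):
    """Derive (sequence number, timestamp) for a frame from its arrival time."""
    frame_index = (arrival_ms - ctx.channel_start_ms) // FRAME_MS
    seq = (ctx.seq_offset + frame_index) % 2**16          # 16-bit wrap
    ts = (ctx.ts_offset + frame_index * SAMPLES_PER_FRAME) % 2**32
    return seq, ts
```

     Both fields wrap modulo their field width, so the offsets may be
     chosen anywhere in the sequence number and timestamp space.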
      
     Because the header compression/decompression component is sending and 
     receiving packets from the IP protocol stack, it may be implemented as 
     a soft real-time component.  However, it must interact with the 
     physical voice channel, which is a hard real-time component, both to 
     properly record the frame arrival time and to schedule outgoing packets 
     for transmission.  If the header compression/decompression is
     implemented in a separate network element from the physical channel, as 
     is likely to be the case in the emerging cellular architectures [TIA-
     IS835], then this interaction could be accomplished with the proper use 
     of sequence numbers on the interfaces between them so that each 
     physical frame carries the information about when it arrived or when it 
     is to be transmitted. 
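
     One way to picture such an interface is a small encapsulation
     header carried in front of each voice frame.  The layout below (a
     16-bit channel identifier followed by a 32-bit frame number) is
     purely hypothetical and is not taken from any existing standard.

```python
import struct

# Hypothetical header: 16-bit channel id, 32-bit frame number, network order.
_HDR = struct.Struct("!HI")


def encapsulate(channel_id: int, frame_number: int, voice_frame: bytes) -> bytes:
    """Prefix a physical voice frame with its channel id and frame number."""
    return _HDR.pack(channel_id, frame_number & 0xFFFFFFFF) + voice_frame


def decapsulate(datagram: bytes):
    """Recover (channel_id, frame_number, voice_frame) from a datagram."""
    channel_id, frame_number = _HDR.unpack_from(datagram)
    return channel_id, frame_number, datagram[_HDR.size:]


def frame_time_ms(frame_number: int, channel_start_ms: int, frame_ms: int = 20) -> int:
    """Map a frame number back to the over-the-air frame time."""
    return channel_start_ms + frame_number * frame_ms
```

     Because the frame number rather than the datagram arrival time
     identifies the frame interval, jitter on the intervening network
     need not disturb the regenerated headers.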
      
   
  6. Interfaces 
      
     In this section we examine the interfaces between the above components.  
     We distinguish between those interfaces that should be implemented as 
     protocols, suitable for standardization in the IETF or elsewhere, and 
     those that should remain Application Programming Interfaces (APIs) that 
     may or may not need to be standardized. 
      
      
  6.1 Protocol Reference Points 
      
     In terms of new protocols, the interfaces that need to be standardized 
     are listed below.  Some of these interfaces are opportunities for IETF 
     protocols, while others should be carried out by other standards-
     setting organizations. 
      
      
  6.1.1 Zero-Byte Control to Data Link Layer 
      
     The Zero-Byte control component needs to negotiate the use of header 
     stripping/generation with its peer and convey the static portion of the 
     IP/UDP/RTP header to the peer.  This should be done in such a way that 
     the network side is not required to participate in the VOIP control 
     protocol.  This means the network side depends on the mobile station to
     inform it which RTP flows the header compression component should
     classify as appropriate for sending over the physical voice channel.
     Rather than create a new network-layer
     protocol, we advocate using new data link messages between the two 
     systems to convey this information.   
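
     To make the idea concrete, the sketch below encodes a hypothetical
     "context setup" data link message carrying the static fields of one
     flow.  The message type code, the field order, and the function
     names are all invented for this example; no such message is defined
     by PPP or any other data link protocol today.

```python
import socket
import struct

CONTEXT_SETUP = 1   # hypothetical message type code

# type, channel id, src addr, dst addr, src port, dst port, SSRC, payload type
_MSG = struct.Struct("!BB4s4sHHIB")


def encode_context_setup(channel_id, src, dst, sport, dport, ssrc, payload_type):
    """Build the message the mobile station would send to its peer."""
    return _MSG.pack(CONTEXT_SETUP, channel_id,
                     socket.inet_aton(src), socket.inet_aton(dst),
                     sport, dport, ssrc, payload_type)


def decode_context_setup(msg: bytes) -> dict:
    """Parse the message on the network side."""
    mtype, channel_id, src, dst, sport, dport, ssrc, pt = _MSG.unpack(msg)
    if mtype != CONTEXT_SETUP:
        raise ValueError("not a context setup message")
    return {
        "channel_id": channel_id,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "sport": sport,
        "dport": dport,
        "ssrc": ssrc,
        "payload_type": pt,
    }
```

     The network side learns everything it needs from this one message
     and never has to parse the VOIP control protocol itself.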
      
   
  6.1.2 Data Link Layer to Physical Channel 
      
     Mobile terminals running PPP will typically generate an octet stream 
     that is appropriate for an underlying physical channel running RLP.  
     However, prior to running PPP the mobile terminal must take steps to 
     establish the channel.  Also, we require that the terminal be able to 
     dynamically establish and release the voice channels used for real-time 
     audio.  For a network model phone this may be supported by APIs within 
     the phone, but for a relay model terminal this signaling needs to be 
     carried out across a serial port.  Such signaling is usually the
     province of a modem control protocol ("AT commands") and
     standardization is probably best carried out in the International
     Telecommunication Union (ITU).  Note that in addition to the usual
     signaling to establish and release channels, we also need to obtain 
     identifying information for each channel.  This information will be 
     used by the Zero-Byte control component to communicate the initial 
     timestamp and sequence number offsets to the peer.  It must be possible 
     to signal this information during a running PPP session.   
      
     Additional real-time information from the physical channel may improve 
     the header compression.  For example, if the precise activation time of 
     the channel is known and can be correlated with the RTP packet flow, 
     the compressor could initialize and communicate the precise RARELY-
     CHANGING offset to the decompressor.  Precise information about other 
     events that affect the offset, such as handoffs, buffer over/underflow, 
     or clock drift between the physical channel and internal RTP timestamp, 
     would also be useful.  If properly engineered this would allow for even 
     OFFSET-PLIANT fields to be accurate most of the time, which could allow 
     applications like SRTP to function adequately. 
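
     A compressor could track the timestamp offset roughly as sketched
     below.  The sketch assumes 160 RTP timestamp ticks per physical
     frame (an 8 kHz clock and 20 ms frames); the names ts_offset and
     OffsetTracker are hypothetical.

```python
SAMPLES_PER_FRAME = 160   # assumption: 8 kHz RTP clock, 20 ms frames


def ts_offset(rtp_timestamp: int, frame_index: int) -> int:
    """Offset between the RTP timestamp and the physical frame count."""
    return (rtp_timestamp - frame_index * SAMPLES_PER_FRAME) % 2**32


class OffsetTracker:
    """Reports the offset only when it changes, i.e. when an update is due."""

    def __init__(self):
        self.current = None

    def observe(self, rtp_timestamp: int, frame_index: int):
        """Return the new offset if it differs from the last one, else None."""
        offset = ts_offset(rtp_timestamp, frame_index)
        if offset == self.current:
            return None
        self.current = offset
        return offset
```

     While the flow stays aligned with the frame clock the tracker stays
     silent; a handoff, buffer event, or clock drift shows up as a
     changed offset that must be communicated to the decompressor.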
      
      
  6.1.3 Physical Channel to Codec or Header Compression/Decompression 
      
     As stated above, the physical channel could interface directly to the 
     codec implementation on the mobile station side and to a header 
     compression/decompression process on the network side.  For a network 
     model phone, the codec interface may be a proprietary API.  However, 
     for a relay model terminal, we must standardize a new way to transport 
     the frames across a serial connection in real-time.  This will require 
     that we multiplex the real-time frames with the non-real-time data for 
     PPP.  This multiplexing could be carried out with the use of escape 
     characters on the serial interface; again, this work is probably best 
     carried out within the ITU.  Any new special characters would need to 
     be properly inserted into the ACCM of the PPP implementation. 
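
     The escape-character multiplexing might be sketched as follows.
     The flag (0x7E), escape (0x7D), and XOR-with-0x20 rule are those of
     PPP in HDLC-like framing; the voice frame marker 0x1E is a
     hypothetical choice made for this example, picked from the control
     character range so that the ACCM can cause PPP to escape it.

```python
ESC = 0x7D          # PPP HDLC-like framing control escape octet
FLAG = 0x7E         # PPP HDLC-like framing flag octet
VOICE_MARK = 0x1E   # hypothetical delimiter for real-time voice frames


def stuff_voice_frame(payload: bytes) -> bytes:
    """Delimit a voice frame with VOICE_MARK and escape reserved octets."""
    out = bytearray([VOICE_MARK])
    for b in payload:
        if b in (ESC, FLAG, VOICE_MARK):
            out += bytes([ESC, b ^ 0x20])
        else:
            out.append(b)
    out.append(VOICE_MARK)
    return bytes(out)


def demultiplex(stream: bytes):
    """Split a serial octet stream into (ppp_octets, list_of_voice_frames)."""
    ppp, frames, current, escaped = bytearray(), [], None, False
    for b in stream:
        if current is None:              # outside a voice frame
            if b == VOICE_MARK:
                current = bytearray()    # a voice frame starts here
            else:
                ppp.append(b)            # ordinary PPP octet, passed through
        elif escaped:
            current.append(b ^ 0x20)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == VOICE_MARK:
            frames.append(bytes(current))
            current = None               # voice frame complete
        else:
            current.append(b)
    return bytes(ppp), frames
```

     This only works if PPP never emits a raw 0x1E octet, which is the
     reason the marker must be covered by the negotiated ACCM.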
      
     On the network side, the physical voice channel may be separated from 
     the header compression/decompression process by an IP network.  If this 
     is the case then each physical frame must carry a sequence number that
     indicates the exact frame time at which it was received or is to be
     transmitted over the air.  Standardization of such interfaces is best
     carried out within the 3rd Generation Partnership Projects (3GPPs). 
      
      
  6.1.4 Remote VOIP Application to Zero-Byte Control 
      
     If the RTP generating application is remote from the wireless link, 
     i.e., there is at least one IP hop separating it from the compressor, 
     then it will not have direct access to the zero-byte control component.  
     Some network layer protocol must be introduced if it is to take 
     advantage of zero-byte header compression.  This protocol could be 
     similar to the hints that have been introduced into RSVP [Davie00], 
     although we note that the flow specifications in RSVP are not likely to
     be flexible enough to specify packet flows that contain layers of 
     encapsulation. 
      
      
  6.2 API Reference Points 
      
     Other interfaces between the components are best done as Application 
     Programming Interfaces (APIs) and may or may not need to be 
     standardized.  In any case we do not advocate the standardization of 
     APIs within the IETF and we discuss these interfaces for illustration 
     purposes only. 
      
      
  6.2.1 VOIP Control to Zero-Byte Control 
   
     The VOIP control component is responsible for end-to-end VOIP signaling 
     such as SIP [Handley00] or H.323 [ITU-H323].  We expect these 
     applications to be implemented by many different people and to use 
     standard operating system interfaces.  Also, these applications should 
     work the same way when used in wireless or wireline settings, except 
     that the codecs should be tailored for the specific link layer 
     currently in use. 
      
     When used over wireless links, applications may want to make use of the 
     optimized real-time path outlined above (audio hardware to codec to 
     physical channel) rather than taking audio data into user space, 
     performing a user space codec transformation, constructing RTP packets, 
     and writing them to a standard UDP socket.  Such user space 
     manipulation of audio traffic could introduce unpredictable latency to 
     the flow, depending on the operating system characteristics. 
      
     To enable the optimized real-time path, the VOIP control protocol 
     should signal to the Zero-Byte control component that it has completed 
     VOIP signaling and is ready to begin audio bearer flow.  This signal 
     might be a system call containing the IP/UDP/RTP parameters that have 
     been negotiated and the codec to be used.  This system call would be a 
     one-line addition to existing VOIP client implementations. 
      
      
  6.2.2 Zero-Byte Control to Real-time Path 
      
     When the Zero-Byte control component receives a signal from the VOIP 
     control component that the VOIP signaling has been completed, it must 
     take the following steps: 
      
       1) Open the new physical voice bearer channel; 
        
       2) Send the peer system information about the flow, including the 
          static header fields and identification of the physical bearer 
          channel; and, finally,
   
       3) Trigger the audio hardware to begin sampling, and the codec 
          implementation to begin encoding/decoding. 
        
     The first step could be accomplished via an interface to the data link 
     layer, or may be accomplished directly.   In the second step, the 
     existing PPP connection is used to inform the peer what header fields 
     should be attached to the synchronous voice frames, beginning with any 
     convenient nearby starting point, such as the first frame received.  
     The third step requires interaction with the real-time components such 
     as the audio hardware and codec implementation, to enable the real-time 
     data to start flowing. 
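
     The three steps above can be sketched as a small controller.  Every
     interface shown (open_voice_channel, send_flow_info, start) is a
     hypothetical placeholder for whatever API the platform provides;
     the sketch only illustrates the required ordering.

```python
class ZeroByteControl:
    """Runs the three start-up steps in order; collaborators are hypothetical."""

    def __init__(self, physical, data_link, realtime):
        self.physical = physical      # opens and releases voice channels
        self.data_link = data_link    # e.g. the existing PPP connection
        self.realtime = realtime      # audio hardware plus codec

    def on_voip_signaling_complete(self, static_fields, codec):
        # Step 1: open the new physical voice bearer channel.
        channel_id = self.physical.open_voice_channel()
        # Step 2: tell the peer the static header fields and the channel.
        self.data_link.send_flow_info(channel_id, static_fields)
        # Step 3: start audio sampling and codec encoding/decoding.
        self.realtime.start(channel_id, codec)
        return channel_id


class LoggingStub:
    """Stand-in for all three collaborators; records the call order."""

    def __init__(self):
        self.calls = []

    def open_voice_channel(self):
        self.calls.append("open_voice_channel")
        return 1

    def send_flow_info(self, channel_id, static_fields):
        self.calls.append("send_flow_info")

    def start(self, channel_id, codec):
        self.calls.append("start")
```

     From the VOIP client's point of view, the single call to
     on_voip_signaling_complete plays the role of the one-line system
     call discussed in Section 6.2.1.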
      
     If additional real-time channel information is available concerning
     establishment time, handoff, clock drift, and buffer over/underflow,
     then further features could be implemented to improve the
     transparency of the scheme.  Whenever an event takes place that 
     requires re-synchronization of the compression state, such as a 
     physical layer reset (hard handoff) or sequence number slippage due to 
     clock drift, the Zero-Byte control component would update its peer with 
     the appropriate state.  This update should include an offset, 
     calculated from the time the channel was established or reset, 
     indicating to which physical layer frame the update applies.  Such 
     offset-indicating updates should also be sent when any of the normally 
     static header fields, such as TTL, TOS, or CSRCs, change.  This will
     enable completely transparent decompression of RTP header fields for 
     most packets. 
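
     On the receiving side, offset-indicating updates could be held in a
     schedule and applied once the indicated physical frame is reached.
     The sketch below is one possible arrangement; UpdateSchedule and
     its methods are hypothetical names, and frame offsets are counted
     from channel establishment or reset as described above.

```python
import bisect


class UpdateSchedule:
    """Holds header field updates keyed by the frame offset where they apply."""

    def __init__(self, initial_fields):
        self._initial = dict(initial_fields)
        # Sorted list of (frame_offset, insertion_order, changes).  The
        # insertion order breaks ties so the dicts are never compared.
        self._updates = []

    def add_update(self, frame_offset, changes):
        """Record an update that takes effect at the given frame offset."""
        entry = (frame_offset, len(self._updates), dict(changes))
        bisect.insort(self._updates, entry)

    def fields_at(self, frame_index):
        """Return the header fields in force for the given physical frame."""
        fields = dict(self._initial)
        for offset, _, changes in self._updates:
            if offset > frame_index:
                break
            fields.update(changes)
        return fields
```

     Because each update names the frame it applies to, it may arrive
     over the data channel slightly before or after that frame without
     causing the wrong header to be regenerated.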
      
      
  6.2.3 Header Compression/Decompression to Data Link Layer 
   
     The header compression component must classify all traffic from the IP 
     protocol stack as to whether it is part of the RTP flow that needs to 
     be sent on the voice physical channel.  Because it must examine each 
     packet, it will probably be fairly tightly integrated with the data 
     link layer. 
      
     The header decompression component produces IP packets from the 
     physical voice frames and sends them up the IP protocol stack.  Getting 
     packets to the IP protocol stack may be implemented by passing the 
     packets through the data link layer. 
      
      
  6.2.4 Other Interfaces 
      
     The mobile terminal potentially will be executing many simultaneous 
     applications and we expect all of the standard interfaces (network 
     sockets, GUI) to be present.  Note that ordinary applications may want 
     to use the audio hardware at the same time as a voice call is in 
     progress.  This could be disallowed, or a special "audio mixer" process 
     could be introduced between the audio hardware and the codec 
     implementation to allow such simultaneous access.  For example, a
     system beep noise might be mixed into the telephone call in such a way
     that only the mobile terminal user would hear it. 
      
     Much ado has been made about the proper reconstruction of the IP 
     Identification field for each RTP packet.  We note that RTP payloads 
     are required to stay within the path MTU [Handley99] and should never 
     experience fragmentation.  However, in order to avoid any possibility 
     of Identification field collision with other packets that may be 
     fragmented, a new interface could be implemented between the Zero-Byte 
     control and the IP protocol stack to "reserve" a range of 
     Identification values for use by the RTP flow.  If the header 
     decompression component always increments the Identification field by 
     one for each reconstructed header, and wraps around to the beginning 
     when the range is about to overflow, then no additional work is 
     necessary to ensure uniqueness of IP Identification fields. 
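
     A reserved Identification range with wrap-around might be handled
     as in the sketch below.  The class name IdRange and the idea of
     passing the range bounds to its constructor are assumptions made
     for the example; the reservation interface itself is not specified
     here.

```python
class IdRange:
    """Hands out IP Identification values from a reserved range [first, last]."""

    def __init__(self, first: int, last: int):
        if not 0 <= first <= last <= 0xFFFF:
            raise ValueError("range must lie within the 16-bit field")
        self.first = first
        self.last = last
        self._next = first

    def next_id(self) -> int:
        """Return the next value, wrapping to the start of the range."""
        value = self._next
        self._next = self.first if value == self.last else value + 1
        return value
```

     The IP protocol stack would then avoid values in this range when
     assigning Identification fields to its own packets.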
      
      
  7. Conclusions 
      
     This draft has presented an architecture for zero-byte header 
     compression and its implications for both a mobile station and the 
     supporting network.  On the network side, with this architecture the 
     peer in the network does not need to be aware of the VOIP control 
     between the mobile and a SIP/H.323 server that could be anywhere in the
     network.  When the header compression/decompression is performed in a
     network element that is physically separated from the physical channel 
     (e.g. a PDSN from 3GPP2 [TIA-IS835]), the hard real-time requirements 
     on this element can be alleviated through the proper use of sequence 
     numbers on its interface to the radio channel elements. 
      
     On the mobile side, this draft provides high level requirements for 
     support of zero-byte header compression in the form of protocol 
     interfaces and APIs.  Both monolithic network style mobiles and
     relay phone mobiles with laptops are discussed.  Proper architecture of
     the mobile station allows the segregation of hard real-time processing 
     from the non-real-time IP stack and applications.  Furthermore, 
     convergence of wireline and wireless applications is a long-standing 
     goal in the wireless industry.  This architecture allows VOIP-based
     applications developed for wireline access to run on mobile end
     systems and operate in the wireless environment (although with
     wireless-specific codecs).  The impact on VOIP applications could be as
     little as one line of code in the VOIP client itself.
      
     Finally, the draft has outlined protocol work items suitable for the 
     IETF as well as external standards bodies, including the ITU and 3rd 
     Generation Partnership Projects.  Any necessary APIs could be 
     standardized by a collaboration between operating system vendors (open 
     source or otherwise) and third party application developers, driven by 
     wireless service providers. 
      
      
  8. References 
      
     [Bormann01]    Bormann, C. (ed.), "RObust Header Compression (ROHC)," 
                    RFC 3095, March 2001. 
      
     [Braden89]     Braden, R. (ed.), "Requirements for Internet Hosts -- 
                    Communication Layers," RFC 1122, October 1989.   
      
     [Bradner96]    Bradner, S., "The Internet Standards Process, Revision 
                    3," RFC 2026, October 1996. 
      
     [Davie00]      Davie, B., Iturralde, C., Oran, D., Casner, S., 
                    Wroclawski, J., "Integrated Services in the Presence of 
                    Compressible Flows," RFC 3006, November 2000. 
      
     [ETSI-AMR]     European Telecommunications Standards Institute, 
                    "Adaptive Multi-Rate (AMR) Speech Transcoding," 3G TS 
                    26.090, February 2000. 
      
     [ETSI-LLC]     European Telecommunications Standards Institute, GSM 
                    04.64. 
      
     [ETSI-PDCP]    European Telecommunications Standards Institute, 3G TS 
                    25.323. 
      
     [Fairhurst01]  Fairhurst, G., and Wood, L., "Link ARQ Issues for IP 
                    Traffic," draft-ietf-pilc-link-arq-issues-01.txt, March 
                    2001.  Work In Progress. 
      
     [Handley99]    Handley, M., and Perkins, C., "Guidelines for Writers of 
                    RTP Payload Format Specifications," RFC 2736, December 
                    1999. 
      
     [Handley00]    Handley, M., Schulzrinne, H., Schooler, E., and Rosenberg,
                    J., "SIP: Session Initiation Protocol,"
                    draft-ietf-sip-rfc2543bis-01.txt, August 2000.  Work In
                    Progress.
      
     [ITU-H323]     International Telecommunication Union, "Packet Based
                    Multimedia Communications Systems," ITU-T Rec. H.323, 
                    September 1999. 
      
     [Li01]         Li, A., (editor), "An RTP Payload Format for EVRC 
                    Speech," draft-ietf-avt-evrc-03.txt, May 2001.  Work In 
                    Progress. 
      
     [Shulzrinne96] Schulzrinne, H., Casner, S., Frederick, R., and 
                    Jacobson, V., "RTP: A Transport Protocol for Real-Time 
                    Applications," RFC 1889, January 1996. 
      
     [TIA-IS127]    Telecommunications Industry Association, "Enhanced 
                    Variable Rate Codec, Speech Service 3 for Wideband
                    Spread Spectrum Digital Systems," TIA/EIA/IS-127, 
                    February 1997. 
      
     [TIA-IS835]    Telecommunications Industry Association, "Wireless IP 
                    Network Standard," TIA/EIA/IS-835, June 2000. 
      
     [TIA-SMV]      Telecommunications Industry Association, "Selectable 
                    Mode Vocoder Service Option for Wideband Spread Spectrum 
                    Communication Systems," TIA PN4575, 3GPP2 C.P9001, 1997. 
      
      
  9. Authors' Addresses 
      
     Peter J. McCann 
     Lucent Technologies 
     Rm 2Z-305 
     263 Shuman Blvd 
     Naperville, IL  60566-7050 
     USA 
      
     Phone: +1 630 713 9359 
     FAX:   +1 630 713 4982 
     EMail: mccap@lucent.com 
      
     Tom Hiller 
     Lucent Technologies 
     Rm 2F-218 
     263 Shuman Blvd 
     Naperville, IL  60566-7050 
     USA 
      
     Phone: +1 630 979 7673 
     FAX:   +1 630 979 7673 
     EMail: tom.hiller@lucent.com 
      
      
  Acknowledgements 
      
     Thanks to Paul Francis for some of the terminology and concepts 
     introduced in Section 4.2. 
   
      
  Intellectual Property Statement 
      
     The IETF takes no position regarding the validity or scope of any 
     intellectual property or other rights that might be claimed to pertain 
     to the implementation or use of the technology described in this 
     document or the extent to which any license under such rights might or 
     might not be available; neither does it represent that it has made any 
     effort to identify any such rights.  Information on the IETF's 
     procedures with respect to rights in standards-track and standards-
     related documentation can be found in BCP-11.  Copies of claims of 
     rights made available for publication and any assurances of licenses to 
     be made available, or the result of an attempt made to obtain a general 
     license or permission for the use of such proprietary rights by 
     implementers or users of this specification can be obtained from the 
     IETF Secretariat. 
      
     The IETF invites any interested party to bring to its attention any 
     copyrights, patents or patent applications, or other proprietary rights 
     that may cover technology that may be required to practice this 
     standard.  Please address the information to the IETF Executive 
     Director. 
      
   
  Full Copyright Statement 
   

     Copyright (C) The Internet Society (2001). All Rights Reserved. This 
     document and translations of it may be copied and furnished to others, 
     and derivative works that comment on or otherwise explain it or assist 
     in its implementation may be prepared, copied, published and 
     distributed, in whole or in part, without restriction of any kind, 
     provided that the above copyright notice and this paragraph are 
     included on all such copies and derivative works. However, this 
     document itself may not be modified in any way, such as by removing the 
     copyright notice or references to the Internet Society or other 
     Internet organizations, except as needed for the purpose of developing 
     Internet standards in which case the procedures for copyrights defined 
     in the Internet Standards process must be followed, or as required to 
     translate it into languages other than English. 
      
     The limited permissions granted above are perpetual and will not be 
     revoked by the Internet Society or its successors or assigns. 
      
     This document and the information contained herein is provided on an 
     "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
     TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT 
     NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL 
     NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR 
     FITNESS FOR A PARTICULAR PURPOSE. 

