Internet Engineering Task Force                      Rolf Blom, Ericsson
AVT Working Group                           Elisabetta Carrara, Ericsson
INTERNET-DRAFT                                    David A. McGrew, Cisco
Expires: December 2001                            Mats Naslund, Ericsson
                                                  Karl Norrman, Ericsson
                                                       David Oran, Cisco

                                                               July 2001


                    The Secure Real Time Transport Protocol
                            <draft-ietf-avt-srtp-01.txt>


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Abstract

   This document describes the Secure Real Time Transport Protocol
   (SRTP), a profile of the Real Time Transport Protocol (RTP) which can
   provide confidentiality, message authentication (in groups, also
   source origin authentication), replay protection, and implicit header
   authentication.

   SRTP can achieve high throughput and low packet expansion by using an
   additive stream cipher for encryption, a universal hashing based
   function for message authentication, and an 'implicit' index for
   sequencing based on the RTP sequence number.


Blom, et al.                                                    [Page 1]


INTERNET-DRAFT                    SRTP                    February, 2001


   Robust and flexible re-keying/access control to media can be achieved
   through an optional security parameter index (SPI).

   In addition, SRTP is suitable for protecting RTP in heterogeneous
   environments, i.e. environments including both wired and wireless
   links.

   TABLE OF CONTENTS

   1. Notational Conventions.........................................3
   2. Goals..........................................................3
   3. SRTP Overview..................................................4
   3.1 SRTP Cryptographic Contexts...................................5
   3.2 Mapping SRTP Packets to Cryptographic Contexts................6
   3.3 SRTP Packet Processing........................................7
   3.4 Cryptographic Algorithms......................................8
   4. Synchronization................................................9
   4.1 Packet Index Determination....................................9
   4.2. IV Formation for Implicit Header Authentication ............10
   5. Replay Protection.............................................11
   6. Encryption....................................................11
   6.1 Defined Ciphers..............................................12
   6.1.1. Counter Mode AES..........................................12
   6.1.2. AES in f8-Mode............................................13
   6.1.3. NULL Cipher...............................................15
   7. Message Authentication........................................15
   7.1. Non-delayed Message Authentication..........................15
   7.1.1 Default MAC: UMAC..........................................16
   7.2 Delayed Message Authentication...............................16
   7.2.1. TESLA.....................................................17
   7.3. Compound Authentication Tag.................................17
   8. SRTP Parameters...............................................17
   9. Secure RTCP...................................................18
   10. Rationale....................................................21
   10.1 Synchronization.............................................21
   10.2 Replay Protection...........................................22
   10.3 Source Origin Authentication considerations.................22
   10.4. Choice of Encryption Transform.............................23
   11. Security Considerations......................................23
   11.1. SSRC collision.............................................24
   11.2. Confidentiality of the RTP Payload.........................25
   11.3. Confidentiality of the RTP Header..........................25
   11.4. Integrity of the RTP header................................25
   12. Multicast and many-to-many...................................26
   13. Key management ..............................................26
   13.1. Security Parameters........................................26
   13.2. SDP attribute support......................................27
   14. Acknowledgements.............................................28
   15. Author's Addresses...........................................28
   16. References...................................................29
   APPENDIX A: Test Vectors.........................................31


1. Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [B97].

   By convention, the leftmost bit (byte) is the most significant one.
   By XOR we mean bitwise addition modulo 2 of binary strings, and ||
   denotes concatenation. E.g. if C = A || B, then the most significant
   bits of C are the same as those of A, and the least significant bits
   of C equal those of B.


2. Goals

   The security goals for SRTP are to ensure:

   * the confidentiality of the RTP payload,

   * the integrity protection of the entire RTP packet, including
   protection against replayed RTP packets, and

   * implicit authentication of the header.

   Each of the security services described above is optional. Any
   combination of options can be provided, except the single option of
   implicit header authentication.

   In group scenarios, source origin authentication does not follow
   automatically from integrity protection, and therefore an interface
   is provided to obtain this from an external mechanism, e.g. [TESLA].

   To this end, we need to use a wide definition of the term
   'authentication', meaning both immediate authentication, and delayed
   authentication. Which authentication scheme (if any) is to be used
   MUST be signaled in the set-up phase along with other security
   parameters.

   Other goals for the protocol are:

   * a low computational cost,

   * a low footprint (i.e., small code size and data memory for key
   schedules and replay lists),

   * limited packet expansion,

   * no error propagation (e.g., changing individual bits in the payload
   of an SRTP packet must change only the corresponding bits in the
   RTP packet),


   * the preservation of RTP header compression efficiency,

   * to allow cryptographic keys to be used by multiple RTP sessions
   simultaneously,

   * independence from the underlying transport used by RTP.

   These properties ensure that SRTP is a suitable protection scheme
   for RTP in both wired and wireless scenarios.


3. SRTP Overview

   RTP is the Real Time Transport Protocol [SCFJ96]. We define SRTP as a
   profile of RTP, in a way analogous to RFC1890 which defines the
   audio/video profile for RTP. Conceptually, we consider a 'bump in the
   stack' implementation which resides between the RTP application and
   the transport layer, which intercepts RTP packets and then forwards
   an equivalent SRTP packet on the sending side, and which intercepts
   SRTP packets and passes an equivalent RTP packet up the stack on the
   receiving side.


         0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                           timestamp                           |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |           synchronization source (SSRC) identifier            |
   |   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |   |            contributing source (CSRC) identifiers             |
   |   |                               ....                            |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                   RTP extension (optional)                    |
   | +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                                                               |
   | | |                           payload                             |
   | | |                             ....                              |
   | +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
   | | |                         SPI (optional)                        |
   +-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                     authentication tag (optional)             |
   | | |                                                               |
   | | |                             ....                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |
   | +- Encrypted Portion
   +---- Authenticated Portion


   Figure 1.  The format of an SRTP packet.


   The format of an SRTP packet is illustrated in Figure 1. The optional
   authentication tag and SPI are the only fields defined by SRTP that
   are not in RTP. The authentication tag provides data origin
   authentication of the RTP header and payload, and of the optional
   SPI, and it indirectly provides replay protection by authenticating
   the sequence number.

   The added fields are:

   Authentication tag: variable length, optional
          The authentication tag shall be used to carry message
          integrity/source authentication data. The Authenticated
          Portion of an SRTP packet consists of the entire equivalent
          RTP packet, and SPI field when present. There is nothing to
          prevent the authentication tag from being composed of several
          sub-tags.

   Security Parameter Index (SPI): 32 bits, optional
          The SPI is used to determine the cryptographic context for the
          current packet.

   Use of authentication and the SPI field is determined during session
   establishment.

   The Encrypted Portion of an SRTP packet consists of the RTP payload
   of the equivalent RTP packet.
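
   As an illustration only, the following Python sketch locates the
   start of the Encrypted Portion by skipping the fixed 12-byte RTP
   header, the CSRC list, and the optional header extension, using the
   RTP header layout of [SCFJ96]. The function name rtp_payload_offset
   is hypothetical and not part of this specification. The sketch does
   not locate the end of the payload, which depends on whether the SPI
   and authentication tag are present.

```python
import struct


def rtp_payload_offset(packet: bytes) -> int:
    """Return the byte offset where the RTP payload (Encrypted Portion) starts.

    Hypothetical helper for illustration; not defined by the draft.
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first = packet[0]
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_extension = bool(first & 0x10)   # X bit
    csrc_count = first & 0x0F            # CC field
    offset = 12 + 4 * csrc_count         # fixed header plus CSRC identifiers
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # Extension header: 16-bit profile-defined field, then a 16-bit
        # length counted in 32-bit words (excluding these four bytes).
        _profile, ext_words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * ext_words
    if offset > len(packet):
        raise ValueError("header longer than packet")
    return offset
```

   Everything from this offset up to the optional SPI field is
   encrypted; the bytes before it are authenticated but sent in the
   clear.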


3.1 SRTP Cryptographic Contexts

   Each SRTP session requires the sender and receiver to maintain
   cryptographic state information. This information is called the
   cryptographic context, and it consists of:

   * an encryption key k_e, and optionally a "salting key" k_s. These
   keys MUST be randomly and independently chosen.

   * a 32-bit rollover counter r (which records how many times the 16-
   bit RTP sequence number has been reset to zero after passing
   through 65,535),

   * for the receiver only, a sequence number s_l, which is the last
   received (possibly authenticated, if authentication is provided)
   sequence number,


   * the mode of operation for the encryption scheme, and

   * the cipher.

   SRTP also uses an 8-bit FLAG carrying additional information. The
   current specification leaves it static or pre-determined throughout
   the session, though future extensions MAY require it to be included
   in the context.

   In addition, when authentication and replay protection are provided,
   the context contains

   * the actual authentication protection algorithm(s) and parameters to
   be used, and

   * a replay list L (maintained by the receiver only),

   and, depending on the scheme in use, one or more of the following:

   * message authentication key(s) {k_a},

   * an already source authenticated (i.e. digitally signed) commitment
   to a chain of keys made by the sender,

   * a buffer of the most recent packets, maintained by the receiver
   and/or by the sender.


3.2 Mapping SRTP Packets to Cryptographic Contexts

   In this section we define the mapping of RTP and SRTP packets to the
   cryptographic contexts used to protect them.

   It is assumed that, when presented with necessary information (see
   below), the key management returns a context with updated
   information.

   An SSRC identifier is unique inside an RTP session, and all packets
   with the same SSRC form part of the same timing and sequence number
   space. Thus, the SSRC field and/or transport address information MAY
   be used by an SRTP receiver (or by a bump in the stack implementation
   on the sender's side) to identify the proper cryptographic context
   within that session. Note though that, for instance in a multicast
   scenario, the RTP anti-collision mechanism for SSRCs may force these
   identifiers to change over time, see discussion in Section 12.

   If information in the context (e.g. keys) is to change dynamically,
   context-signaling MAY be implicit, in which case additional fields
   carried in the RTP packets, e.g. sequence number and timestamp, are
   used to determine the context. This approach may however suffer from
   synchronization problems due to packet-loss or drift of internal
   clocks. Therefore, the optional 32-bit SPI field provides means to
   explicitly signal information on which context to use when processing
   the packet on the receiving end. The SPI-approach is the only robust
   method, and SHOULD be used if frequent re-keying is desired.

   We leave it to the key management to implement such features.

   Recall that an RTP session for each participant is defined [SCFJ96]
   by a pair of destination Transport Addresses (one network address
   plus a port pair for RTP and RTCP), and that a multimedia session is
   defined as a collection of RTP sessions. For example, a particular
   multimedia session could include an audio RTP session, a video RTP
   session, and a text RTP session.

   SRTP MAY allow the different RTP sessions to use identical
   cryptographic keys. This is possible if the design of the
   synchronization mechanism (i.e., the IV in the case of the f8 and
   Segmented Integer Counter Modes) avoids keystream re-use (the
   two-time pad, Section 11), which places uniqueness requirements on
   the SSRC beyond those dictated by the RTP standard; see Section 12.
   However, different multimedia sessions SHOULD use different keys.


3.3 SRTP Packet Processing

   When Generic Forward Error Correction is performed as specified in
   RFC 2733, then the security processing takes place after FEC on the
   sender's side, and before FEC on the receiver's side.

   To construct a proper SRTP packet, given an RTP packet, the sender
   does the following:

   1. Determine which cryptographic context to use as described in
   Section 3.2.

   2. Determine the index of the SRTP packet as described in Section
   4.1, using the rollover counter in the cryptographic context and the
   sequence number in the RTP packet. Form the current initialization
   vector (IV) if Implicit Header Authentication is provided, as
   described in Section 4.2.

   3. Encrypt the Encrypted Portion of the packet, as described in
   Section 6, using the IV determined in Step 2 and the encryption key
   and salting key in the context found in Step 1.

   4. If authentication is provided, compute the authentication tag for
   the Authenticated Portion of the packet, as described in Section 7,
   using the index determined in Step 2 and the authentication key in
   the context found in Step 1. Note that the Encrypted Portion is
   encrypted before the authentication tag is computed.


   On the receiving end, packet processing depends on the presence of
   different types of authentication. The processing below is for so-
   called immediate authentication. A policy for handling what we call
   delayed authentication is left to the application (see also Section
   7). To authenticate and decrypt an SRTP packet, the receiver does the
   following:

   1. Determine which cryptographic context to use as described in
   Section 3.2.

   2. Estimate the index of the SRTP packet from the rollover counter
   in the cryptographic context and the sequence number in the RTP
   packet, as described in Section 4.1. If Implicit Header
   Authentication is provided, form the current IV in the same way as
   in Step 2 of the sender's processing.

   3. Check if the packet has been replayed, by checking the Replay List
   to ensure that no packet with that index has been received and
   authenticated before. If that index is in the list, then the packet
   has been replayed and is invalid. It MUST be discarded, and the event
   SHOULD be logged.

   Next, perform verification of the authentication tag. If the result
   is 'AUTHENTICATION FAILURE', the packet MUST be discarded from
   further processing and the event SHOULD be logged.

   4. Decrypt the Encrypted Portion of the packet, as described in
   Section 6, using the IV determined in Step 2 and the encryption key
   and salting key in the context found in Step 1.

   5. Update the rollover counter and last sequence number in the local
   context to the values used in the packet index estimate in Step 2.

   The processing occurring when replay protection is activated has been
   chosen to maximize resistance to denial of service attacks (i.e., to
   minimize the receiver's effort in processing spurious packets).


3.4 Cryptographic Algorithms

   Default encryption and authentication algorithms are specified in
   Sections 6.1 and 7.1. While there are numerous encryption and message
   authentication algorithms that can be used in SRTP, we define default
   algorithms in order to avoid the complexity of specifying the
   encodings for the signaling of algorithm and parameter identifiers.


4. Synchronization

4.1 Packet Index Determination

   SRTP implementations use an 'implicit' packet index for sequencing.
   Receiver-side implementations use the RTP sequence number to
   reconstruct the correct index (that is, location in the sequence of
   all RTP packets). The index is defined as s + r * 65,536, where the
   sequence number is s and the rollover counter is r.

   A robust approach for the proper use of a rollover counter requires
   its handling and use to be well defined. In particular, out-of-order
   RTP packets with sequence numbers close to 65,536 or zero must be
   properly dealt with.

   A receiver reconstructs the index i of a packet with sequence number
   s using the estimate

      i = 65,536 * t + s,

   where t is chosen from the set { r-1, r, r+1 } such that i is closest
   to the value 65,536 * r + s_l. If the value r+1 is used, then the
   rollover counter r in the cryptographic context is incremented by
   one.

   The pseudocode for the algorithm to process a packet with sequence
   number s follows:

         if (s_l < 32,768)
            if (s - s_l > 32,768)
               set i to s + 65,536 * (r-1)
            else
               set i to s + 65,536 * r
            endif
         else
            if (s_l - 32,768 > s)
               set r to r + 1
            endif
            set i to s + r * 65,536
         endif
         set s_l to s
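
   The estimate defined above (choose t from { r-1, r, r+1 } so that i
   is closest to 65,536 * r + s_l) can also be sketched directly in
   Python. This is a minimal sketch, not a normative algorithm: the
   function name estimate_index is hypothetical, ties are resolved in
   favour of r (which matches the pseudocode), and the exclusion of a
   negative rollover value is an assumption. Unlike the pseudocode, the
   function returns the estimate without modifying r or s_l, so that a
   caller can defer the context update until the packet has been
   authenticated (Step 5 of the receiver processing in Section 3.3).

```python
from typing import Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


def estimate_index(s: int, s_l: int, r: int) -> Tuple[int, int]:
    """Return (i, t): the estimated packet index and the rollover value used.

    s   -- sequence number of the packet being processed
    s_l -- last received sequence number from the context
    r   -- rollover counter from the context
    """
    reference = SEQ_MOD * r + s_l
    # r is listed first so that min() keeps it when two candidates tie.
    # Assumption: a negative rollover value is never a valid candidate.
    candidates = [t for t in (r, r - 1, r + 1) if t >= 0]
    t = min(candidates, key=lambda c: abs(SEQ_MOD * c + s - reference))
    return SEQ_MOD * t + s, t
```

   For example, with r = 0 and s_l = 65,530, a packet with s = 5 is
   taken to follow a wrap-around and is assigned index 65,541, whereas
   with r = 1 and s_l = 5, a late packet with s = 65,530 is assigned
   index 65,530 from before the wrap.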

   The index i is used in replay protection (Section 5) when
   authentication is provided, in encryption (Section 6), and in message
   authentication (Section 7).

   This algorithm should be extended by using the information in the
   authenticated RTCP reports.

   When RTP authentication is not present, robust synchronization is not
   possible. In this case, transmission errors or an active attacker may
   force the receiver to erroneously update its rollover counter and
   thus to lose synchronization completely. It is not possible to
   protect against active attackers in this case, but it is possible to
   have an updating policy for the rollover counter which, except in
   rare cases, is robust with respect to random bit errors. The same
   holds if 'delayed' authentication, e.g. [TESLA], is present. Many
   updating policies could be developed, e.g. by call-back from the
   application layer, but this specification leaves the choice open as
   an implementation issue.

   As the rollover counter is 32 bits long, the maximum number of
   packets in any given SRTP session is 2^48 = 281,474,976,710,656.
   After that number of SRTP packets has been sent, the sender MUST NOT
   send any more packets with that cryptographic context. This
   limitation provides a security benefit by placing an upper bound on
   the amount of traffic that can pass before cryptographic keys are
   changed. Of course, re-keying mechanisms MUST be triggered before
   this maximum key lifetime, and key refresh mechanisms MAY be
   triggered during the key lifetime.

   Other approaches to sequencing were considered and rejected; please
   see Section 10.1 for our rationale.


4.2. IV Formation for Implicit Header Authentication

   The encryption uses a block cipher in feedback or segmented integer
   counter mode, and these both require an initialization vector (IV).
   There may be several alternatives for the IV formation. To guarantee
   synchronization and avoid keystream re-use, we only require the SSRC,
   rollover counter and sequence number, or some function thereof, to be
   part of the IV. Below, we give a concrete proposal which also
   provides 'implicit' header authentication, and works with every
   cipher having at least 128-bit block size. This particular solution
   also gives a high degree of agreement between bit ordering in the RTP
   packet header and the IV, simplifying data copying.

   When implicit header authentication is provided, data from each RTP
   packet to be encrypted and transmitted, must be included in the IV.
   This IV shall be computed and supplied as input to the ciphering
   algorithm. This shall be done by taking information of said RTP
   packet, the FLAG, and the rollover counter value, and computing the
   128-bit IV:

       IV = ROC || FLAG || M || PT || SEQ || TS || SSRC

   where TS (Timestamp, 32 bits), SEQ (Sequence Number, 16 bits), M
   (Marker Bit, 1 bit), PT (Payload Type, 7 bits), and SSRC
   (Synchronization Source, 32 bits) are taken from the current RTP
   header. ROC is the 32-bit rollover counter from the identified
   context. FLAG is an 8-bit value which is used to signal additional
   information. Currently, the only value defined (for RTP) is FLAG =
   00..0. The value 00..01 is reserved for RTCP and MUST NOT be used
   with RTP.

   With this IV formation, the number of SRTP packets encrypted with any
   fixed encryption key MUST be no more than 2^48. Otherwise, the ROC
   and SEQ fields together (48 bits) will not be large enough to avoid
   keystream reuse.
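
   The IV layout above can be illustrated with a short Python sketch
   that packs the seven fields, most significant first, into 16 bytes.
   The function name form_iv and its parameter names are hypothetical;
   only the field order and widths are taken from this section.

```python
def form_iv(roc: int, flag: int, marker: int, pt: int,
            seq: int, ts: int, ssrc: int) -> bytes:
    """Pack ROC || FLAG || M || PT || SEQ || TS || SSRC into a 128-bit IV."""
    fields = [
        (roc, 32),    # rollover counter from the cryptographic context
        (flag, 8),    # FLAG: all zeros for RTP
        (marker, 1),  # M, marker bit
        (pt, 7),      # PT, payload type
        (seq, 16),    # SEQ, RTP sequence number
        (ts, 32),     # TS, RTP timestamp
        (ssrc, 32),   # SSRC identifier
    ]
    iv = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        iv = (iv << width) | value
    # 32 + 8 + 1 + 7 + 16 + 32 + 32 = 128 bits in total.
    return iv.to_bytes(16, "big")
```

   Note that M || PT || SEQ || TS || SSRC appear in the same order as in
   the RTP header, so an implementation could copy those eleven bytes
   directly instead of packing them field by field.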


5. Replay Protection

   Robust replay-protection is possible when direct (non-delayed)
   authentication of RTP packets is present.

   A packet is 'replayed' when it is stored by an adversary, and then
   re-injected into the network. SRTP provides protection against such
   attacks whenever authentication is provided, through the storage of
   the indices of the most recently received and authenticated packets.

   Each SRTP receiver maintains a Replay List, which conceptually
   contains the indices of all of the packets which have been received
   and authenticated. In practice, the list can use a 'sliding window'
   approach, so that a fixed amount of storage suffices for replay
   protection. SRTP packet indices which are less than s_l + r * 65,536
   - SRTP-WINDOW-SIZE MAY be assumed to have been received, where SRTP-
   WINDOW-SIZE is a parameter that MUST be at least 64, and which MAY be
   set to a higher value.

   The Replay List can be efficiently implemented by using a bitmap to
   represent which packets have been received, as described in the
   Security Architecture for IP [KA98a].
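
   A bitmap-based Replay List might look like the following Python
   sketch. It is an illustration under stated assumptions, not the
   method of [KA98a]: the class and method names are hypothetical, and
   the window is anchored at the highest index accepted so far. Indices
   that have fallen out of the window are treated as already received,
   as permitted above.

```python
class ReplayList:
    """Sliding-window record of accepted packet indices (illustrative)."""

    def __init__(self, window_size: int = 64) -> None:
        if window_size < 64:
            raise ValueError("SRTP-WINDOW-SIZE MUST be at least 64")
        self.window_size = window_size
        self.highest = -1  # highest index accepted so far; -1 means none
        self.bitmap = 0    # bit k set <=> index (highest - k) was accepted

    def is_replayed(self, index: int) -> bool:
        """True if index was already accepted or is older than the window."""
        if index > self.highest:
            return False
        offset = self.highest - index
        if offset > self.window_size:
            return True  # too old to track: assumed to have been received
        return bool((self.bitmap >> offset) & 1)

    def mark_received(self, index: int) -> None:
        """Record index; call only after the packet has been authenticated."""
        if index > self.highest:
            shift = index - self.highest
            mask = (1 << (self.window_size + 1)) - 1
            # Slide the window forward and set bit 0 for the new index.
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
        elif self.highest - index <= self.window_size:
            self.bitmap |= 1 << (self.highest - index)
```

   A receiver would call is_replayed() in Step 3 of Section 3.3 and
   mark_received() only once the authentication tag has been verified,
   so that forged packets cannot advance the window.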

   If the authentication is of the delayed-type, so is the replay
   protection (see Section 7).


6. Encryption

   Encryption uses a 'seekable' additive stream cipher, following the
   Stream Cipher ESP [sc-esp]. The stream ciphers that can be used must
   be able to efficiently seek to arbitrary locations in their
   keystream. Ciphers that can do this include SEAL [RC94, RC98],
   LEVIATHAN [MF00b], and any block cipher run in a suitable mode. In
   particular, AES in counter mode provides good security and
   reasonable performance, and conforms to emerging U.S. Federal
   standards. Another mode which fulfils the requirements is f8 mode
   [ES3D], used together with AES.

   SRTP encryption consists of generating a keystream segment
   corresponding to the index of the packet, and then bitwise exclusive-
   oring that keystream segment into the RTP packet, starting at the
   first bit of the RTP payload. Decryption is then done the same way,
   but swapping the roles of the plaintext and ciphertext. The
   definition of how the keystream is generated, given the index,
   depends on the cipher and its mode of operation.

   Such a cipher has features which are desirable in a general
   scenario, e.g. low computational cost and high speed. It also has
   properties which fulfil additional requirements posed by the cellular
   environment [BCNN00], i.e. preservation of RTP header compression
   efficiency, and absence of error propagation and message expansion.

   Hence, we conclude that the proposed profile can be applied to the
   most general heterogeneous environment.


6.1 Defined Ciphers

   The default cipher is the Advanced Encryption Standard (AES), and we
   define two modes of running AES, Segmented Integer Counter Mode AES
   and AES in f8-Mode. Both of these modes provide a simple way of
   obtaining implicit header authentication through the use of the IV
   formation described in Section 4.2. The NULL cipher is also defined,
   to be used when encryption is not required.


6.1.1. Counter Mode AES

   The default cipher SHALL be AES used in the Segmented Integer Counter
   Mode (SICM) [M00], with a 128-bit key size and a 128-bit block size.

   Conceptually, counter mode consists of encrypting successive
   integers. The actual definition is somewhat more complicated, in
   order to avoid 128 bit integer arithmetic and to randomize the
   starting point of the integer sequence. Each packet is encrypted with
   a distinct keystream segment, which is computed as follows.

   The 128-bit block is divided into three parts: a 64-bit segment
   prefix, a 32-bit block index, which is incremented to generate a
   keystream segment, and a 32-bit segment suffix. The segment
   prefix/suffix pair is unique for each keystream segment.

   A keystream segment is the concatenation of the output blocks of the
   cipher in encrypt mode, in which the block indices are in increasing
   order. Symbolically, each keystream segment looks like

      E(A || B || C) || E(A || (B + 1 mod 2^32) || C) ||
      E(A || (B + 2 mod 2^32) || C) || ...

   where A, B, and C are segment prefix, block index, and segment
   suffix, respectively, determined as given below.


   The values A, B, and C are computed from the salting key k_s and the
   IV (from Section 4.2) by exclusive-oring k_s and the IV, and setting
   A to the first 64 bits of the result, B to the following 32 bits,
   and C to the remaining 32 bits. Symbolically,

      A || B || C = IV XOR k_s.

   If k_s is less than 128 bits long, then k_s is concatenated with
   itself as many times as needed in order to form the salt which is
   added to the IV. If no salting key is used, this is interpreted as
   k_s = 0.

   Note that the segment prefix/suffix pair is distinct for each packet
   which is encrypted, thus ensuring that keystream segments are
   distinct and non-overlapping.

   The restriction on the maximum number of RTP packets above ensures
   the security of the encryption method by limiting the effectiveness
   of probabilistic attacks [BR98].

   The AES has a block size of 128 bits, so 2^32 output blocks yield
   2^7 * 2^32 = 2^39 = 549,755,813,888 bits of keystream per segment,
   which is more than enough to encrypt the largest possible RTP packet.
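
   The keystream construction of this section can be sketched in Python
   as follows. To keep the sketch self-contained, the block cipher is
   passed in as a function E(key, block); the toy_block function below
   is a placeholder built from SHA-256 and is NOT AES, so a real
   implementation would substitute AES encryption. All function names
   are hypothetical, and truncating a repeated salt whose length does
   not divide 128 bits is an assumption.

```python
import hashlib
from typing import Callable

BlockCipher = Callable[[bytes, bytes], bytes]  # (key, 16-byte block) -> 16 bytes


def toy_block(key: bytes, block: bytes) -> bytes:
    """Placeholder for AES encryption. NOT a cipher; for illustration only."""
    return hashlib.sha256(key + block).digest()[:16]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def expand_salt(k_s: bytes) -> bytes:
    """Repeat k_s up to 128 bits; an absent salting key means k_s = 0."""
    if not k_s:
        return bytes(16)
    return (k_s * (16 // len(k_s) + 1))[:16]


def sicm_keystream(E: BlockCipher, k_e: bytes, k_s: bytes,
                   iv: bytes, nbytes: int) -> bytes:
    """Generate nbytes of keystream for one packet (one keystream segment)."""
    if len(iv) != 16:
        raise ValueError("IV must be 128 bits")
    salted = xor_bytes(iv, expand_salt(k_s))      # A || B || C = IV XOR k_s
    a = salted[:8]                                # 64-bit segment prefix
    b = int.from_bytes(salted[8:12], "big")       # 32-bit block index
    c = salted[12:]                               # 32-bit segment suffix
    out = bytearray()
    j = 0
    while len(out) < nbytes:
        index = ((b + j) % (1 << 32)).to_bytes(4, "big")
        out += E(k_e, a + index + c)              # E(A || B + j mod 2^32 || C)
        j += 1
    return bytes(out[:nbytes])
```

   Encrypting a payload is then xor_bytes(payload, keystream), and
   decryption applies the same operation to the ciphertext.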


6.1.2. AES in f8-Mode

   To encrypt UMTS (Universal Mobile Telecommunications System, a 3G
   network standard) data, a solution known as the f8-algorithm (see
   [ES3D]) has been developed. On a high level, the proposed scheme is a
   variant of Output Feedback Mode (OFB) [HAC], with a more elaborate
   initialization and feedback function. As in normal OFB, the core
   consists of a block cipher. We define AES as the default block cipher
   to be used in f8-Mode for RTP encryption, with 128-bit key and block
   size.

   Figure 2 shows the structure of an arbitrary b-bit block size cipher,
   E, running in what we shall call "f8-mode of operation".


                    |
                    |
                   \|/
                +------+
                |      |
            --->|  E   |
           |    |      |
           |    +------+
           |        |
     m --> *        |---------------------------  ...     -------|
           |    IV' |           |             |                  |
           |        |   j=1 --> *     j=2 --> *   ...  j=L-1 --> *
           |        |           |             |                  |
           |        |       --> *         --> *   ...        --> *
           |       \|/     |   \|/       |   \|/            |   \|/
           |    +------+   | +------+    | +------+         | +------+
           |    |      |   | |      |    | |      |         | |      |
     k -------->|  E   |   | |  E   |    | |  E   |         | |  E   |
                |      |   | |      |    | |      |         | |      |
                +------+   | +------+    | +------+         | +------+
                    |      |    |        |    |             |    |
                    |------     |--------     |    ...  ----     |
                    |           |             |                  |
                   \|/         \|/           \|/                \|/
                   S(0)        S(1)          S(2)  . . .       S(L-1)


      Figure 2. f8-mode of operation (asterisk, *, denotes bitwise XOR).


   Let E(k,B) be the 128-bit output of E in encrypt mode when applied to
   the 128-bit key k and 128-bit plaintext block B. Let IV, IV', S(j),
   and m denote 128-bit blocks, determined below.

   The S() keystream for an n-bit message is defined by setting IV' =
   E(k XOR m, IV), and S(-1) = 00..0. For j = 0,1,.., L-1 where L =
   n/128 (rounded up to nearest integer) compute

         S(j) = E(k, IV' XOR j XOR S(j-1)).            (Eq. 1)

   Notice that the IV (as defined in Section 4.2) is not used directly.
   Instead it is fed through E under another key to produce an internal,
   "salted" value (denoted IV') to prevent an attacker from gaining
   known input/output pairs. The role of the internal counter j is to
   prevent short keystream cycles. The value of the key mask m is
   defined to be

        m = k_s || 0x555..5,

   i.e. the salting key, padded with the binary pattern 0101.. to fill
   the 128-bit key size. (If no salting key is used, m = 0x55..5.)
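
   As an illustration, the keystream definition above can be sketched
   in Python as follows. This is a minimal sketch under stated
   assumptions, not a reference implementation: the block cipher E is
   passed in as a callable, and toy_E below is a SHA-256-based stand-in
   (it is NOT AES) used only so that the sketch runs without a
   third-party AES library. The encoding of the counter j as a
   big-endian 128-bit block, and all function names, are likewise
   assumptions made for this example.

```python
import hashlib

BLOCK = 16  # block size in bytes (128 bits)


def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_E(key: bytes, block: bytes) -> bytes:
    """Stand-in for E(k, B). NOT AES: a truncated hash, for demo only."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def f8_keystream(E, k: bytes, k_s: bytes, iv: bytes, n_bits: int) -> bytes:
    """Return L = ceil(n_bits / 128) keystream blocks S(0)..S(L-1)."""
    # m = k_s || 0x55..5, padded to the 128-bit key size.
    m = k_s + b"\x55" * (BLOCK - len(k_s))
    # IV' = E(k XOR m, IV)
    iv_prime = E(xor(k, m), iv)
    num_blocks = (n_bits + 127) // 128
    s_prev = bytes(BLOCK)  # S(-1) = 00..0
    out = []
    for j in range(num_blocks):
        # Assumption: j is encoded as a big-endian 128-bit block.
        j_block = j.to_bytes(BLOCK, "big")
        # S(j) = E(k, IV' XOR j XOR S(j-1))   (Eq. 1)
        s_prev = E(k, xor(xor(iv_prime, j_block), s_prev))
        out.append(s_prev)
    return b"".join(out)
```

   With a real AES implementation substituted for toy_E, a call such as
   f8_keystream(E, k, k_s, iv, n) would return the blocks to be XORed
   with the first n bits of the Encrypted Portion.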

   The maximum allowable packet size can be determined as follows. AES
   has a block size of 128 bits. Assuming that AES behaves like a
   random function, it is (heuristically) secure to generate about 2^64
   output blocks, which corresponds to 2^71 bits of keystream. In
   practice, though, it will often be sufficient to implement the
   counter j above as a 16- or 32-bit counter. In fact, for some
   security margin, other methods SHOULD be used if packets of size
   exceeding 2^32 * 128 = 549755813888 bits are to be encrypted.
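
   The figures in the paragraph above can be checked with a few lines
   of Python. The variable names are chosen for this example only.

```python
BLOCK_BITS = 128  # AES block size in bits

# Heuristic bound: about 2^64 output blocks.
heuristic_bits = 2**64 * BLOCK_BITS

# Recommended per-packet limit: 2^32 output blocks.
recommended_max_bits = 2**32 * BLOCK_BITS

# The same limit expressed in GiB (8 bits per byte, 2^30 bytes per GiB).
recommended_max_gib = recommended_max_bits // 8 // 2**30

print(heuristic_bits == 2**71)   # True
print(recommended_max_bits)      # 549755813888
print(recommended_max_gib)       # 64
```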


6.1.3. NULL Cipher

   The NULL cipher is used when no confidentiality is requested. The
   keystream can be thought of as "000..0", i.e. the encryption simply
   copies the plaintext input into the ciphertext output.


7. Message Authentication

   Message integrity and source origin authentication are optional
   functions provided by SRTP.

   We need to distinguish instant (non-delayed) authentication from
   delayed authentication, because some source origin authentication
   schemes for groups do not allow packets to be authenticated until a
   certain later time.

   Note that bit errors during transmission cannot in general be
   distinguished from active attacks, and the processing below makes no
   attempt to do so.

   Authentication SHOULD be provided by SRTP. It is optional because,
   while the function is typically highly desirable, there are certain
   cases (notably in cellular environments) where it has an impact in
   terms of cost, as motivated in [BCNN00]. In those cases, it is up to
   the user security profile to request authentication.


7.1 Non-delayed Message Authentication

   Integrity can be provided by any message authentication code, though
   the default is UMAC [KBHHKR00].

   The authentication tag is computed by applying the UMAC function to
   the Authenticated Portion of the SRTP packet.

   The authentication tag is appended to the RTP packet. This expansion
   of the RTP packet may cause the packet size to exceed the Maximum
   Transmission Unit (MTU) of a network interface on its path,
   especially in circumstances when the application is attempting to
   'optimize' the size of packets. Path MTU discovery SHOULD be used to
   avoid this problem.
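
   The tag handling described above can be sketched as follows. This is
   an illustration only: UMAC is not available in the Python standard
   library, so a truncated HMAC-SHA-256 is swapped in as a stand-in MAC
   (it is NOT UMAC). The function names and the caller-supplied size
   budget are assumptions made for this example.

```python
import hashlib
import hmac

TAG_LEN = 4  # bytes; a four-byte tag, as in the default parameters


def compute_tag(auth_key: bytes, authenticated_portion: bytes) -> bytes:
    """Stand-in MAC: truncated HMAC-SHA-256, NOT UMAC."""
    mac = hmac.new(auth_key, authenticated_portion, hashlib.sha256)
    return mac.digest()[:TAG_LEN]


def append_tag(auth_key: bytes, authenticated_portion: bytes) -> bytes:
    """Sender side: append the tag to the packet."""
    return authenticated_portion + compute_tag(auth_key, authenticated_portion)


def check_tag(auth_key: bytes, packet: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    return hmac.compare_digest(tag, compute_tag(auth_key, body))


def exceeds_budget(packet: bytes, max_packet_len: int) -> bool:
    """True if the expanded packet no longer fits the size budget
    (for instance, a budget derived from the path MTU)."""
    return len(packet) > max_packet_len
```

   A sender that sizes its packets to exactly fill the budget before
   calling append_tag would see exceeds_budget return True afterwards,
   which is the situation the paragraph above warns about.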


7.1.1 Default MAC: UMAC

   The default message authentication code is UMAC [KBHHKR00], which has
   proven security properties and is quite fast. Furthermore, it can be
   used with short (e.g., two or four byte) authentication tags, as well
   as larger tags.

   UMAC is a parameterized algorithm (see Section 2.1 of [KBHHKR00]).
   The default selection of UMAC is UMAC-2/4/128/16/BIG/SIGNED, whose
   parameters are:

         WORD-LEN              2
         UMAC-OUTPUT-LEN       4
         L1-KEY-LEN            128
         UMAC-KEY-LEN          16
         ENDIAN-FAVORITE       BIG
         L1-OPERATIONS-SIGN    SIGNED

   This choice of parameters is intended to work well on low-power
   processors, to minimize packet expansion (as needed, e.g., in
   voice-over-IP applications), and to minimize the size of the
   cryptographic context. The WORD-LEN of two will work well on 16-bit
   and higher processors. The packet expansion is determined by the
   UMAC-OUTPUT-LEN to be only four bytes. The storage requirement, per
   cryptographic context, is 144 bytes. These parameters ensure a
   forgery probability of no greater than 1/2^30 for each individual
   packet. Please see the security considerations section in [KBHHKR00]
   and the references therein for a more detailed discussion.


7.2 Delayed Message Authentication

   Some authentication schemes, in particular ones providing source
   origin authentication in groups, may not allow the authenticity of a
   received packet to be verified until a few moments later; the
   authentication is delayed.

   This leaves open possibilities for handling detected authentication
   failures according to several policies. Clearly, detected failures
   MUST be signaled to the application, but how to handle them is left
   to the application.

   Currently the only authentication of this type specified for use with
   SRTP is TESLA.


7.2.1 TESLA

   TESLA enables receivers in a group scenario to efficiently (with
   symmetric cryptography) verify the identity of a claimed sender,
   even though the entire group shares the key(s). Source Origin
   Authentication (SOA) is provided by an interface towards TESLA. We
   refer to [PCBTS,TESLA] for details.

   The main issues with TESLA are that it requires 'loose' time
   synchronization between sender and receiver, and buffering at both
   the sender and the receiver end. (The sender-buffering version
   actually allows non-delayed authentication at the receiver.)

   The buffering means that it is not possible to have true real-time
   communication. Moreover, the buffer size depends on the network
   round-trip time (RTT). The buffers required will in general need to
   store the packets received during a time interval roughly
   proportional to the RTT.


7.3 Compound Authentication Tag

   There may be cases when several authentication schemes are used
   together, for instance both a non-delayed message authentication code
   and a delayed SOA in a group scenario. If so, the sender simply
   concatenates the individual authentication tags into the (compound)
   authentication tag, according to previous agreement between the
   parties. Similarly, the receiver parses the compound tag, and the
   overall authentication MUST be signaled as 'AUTHENTICATION FAILURE'
   if, and only if, at least one of the individual verifications fails.
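
   A minimal sketch of this concatenate-and-parse rule is given below.
   It assumes that the parties have agreed in advance on the order and
   length of the individual tags, and it models each scheme as a
   (length, verifier) pair; these names and the treatment of a
   wrong-length compound tag as a failure are assumptions made for this
   example. Delayed schemes, which cannot be verified immediately, are
   not modeled.

```python
from typing import Callable, List, Tuple

# (tag length in bytes, verifier taking (message, tag) and returning bool)
Scheme = Tuple[int, Callable[[bytes, bytes], bool]]


def build_compound_tag(tags: List[bytes]) -> bytes:
    """Sender side: concatenate the individual tags in the agreed order."""
    return b"".join(tags)


def verify_compound_tag(message: bytes, compound: bytes,
                        schemes: List[Scheme]) -> bool:
    """Receiver side: False means 'AUTHENTICATION FAILURE'."""
    # Assumption: a compound tag of unexpected length cannot be parsed
    # and is treated as a failure.
    if len(compound) != sum(length for length, _ in schemes):
        return False
    ok = True
    offset = 0
    for length, verifier in schemes:
        tag = compound[offset:offset + length]
        offset += length
        if not verifier(message, tag):
            ok = False  # keep going so that every scheme is evaluated
    return ok
```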


8. SRTP Parameters

   The SRTP-WINDOW-SIZE is defined to be at least 64 (Section 5).

   The currently defined modes are Segmented Integer Counter Mode
   (default), f8 Mode (Section 6), and the NULL Cipher. The default
   cipher is AES (Section 6), used with a block and encryption key size
   of 128 bits.

   The currently defined authentication function is
   UMAC-2/4/128/16/BIG/SIGNED.

   The SPI field is not used by default.


9. Secure RTCP

   Secure RTCP follows the definition of Secure RTP, but defines the
   index and IV differently. In order to differentiate these quantities,
   we refer to them as the SRTCP index and the SRTCP IV.

   SRTCP is defined as a profile of RTCP, and it adds two mandatory new
   fields to the RTCP packet definition, the SRTCP index and the
   authentication tag, and one optional new field, the SPI. Those fields
   are appended to an RTCP packet in order to form an equivalent SRTCP
   packet, so that they follow any other profile specific extensions. An
   SRTCP packet is illustrated in Figure 3.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |V=2|P|    RC   |   PT=SR=200   |             length            |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                         SSRC of sender                        |
   | +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   | | |                              ...                              |
   | | |                          sender info                          |
   | | |                              ...                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                              ...                              |
   | | |                         report block 1                        |
   | | |                              ...                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                              ...                              |
   | | |                         report block 2                        |
   | | |                              ...                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                                                               |
   | | |                              ...                              |
   | | |                                                               |
   | | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   | | |                              ...                              |
   | | |                  profile-specific extensions                  |
   | | |                              ...                              |
   | +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   | | |                           SRTCP index                         |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                          SPI (optional)                       |
   +-|>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                              ...                              |
   | | |                       authentication tag                      |
   | | |                              ...                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |
   | +-- Encrypted Portion
   +---- Authenticated Portion


   Figure 3.  The format of a Secure RTCP packet, after Section 6.3.1 of
   [SCFJ96]. In this case, the underlying RTCP packet is a sender report
   packet; the SRTCP format is identical for other RTCP packet types.

   The added fields are:

   SRTCP index: 32 bits, mandatory
          As we allow both encrypted and non-encrypted packets belonging
          to the same flow (see discussion below), indices with their
          most significant bit set to '1' are reserved for encrypted
          packets, and indices with most significant bit set to '0' are
          used for non-encrypted packets. With this restriction, the
          remaining 31 bits are set to zero before the first SRTCP
          packet is sent, and are incremented by one after each SRTCP
          packet is sent. Except for differences in the most significant
          bit, SRTCP indices form a strictly increasing sequence. The
          index is explicitly included in each packet, in contrast to
          the 'implicit' index approach used for SRTP.

   Security Parameter Index (SPI): 32 bits, optional
          The SPI is used to determine the cryptographic context for the
          current packet. Use of authentication and the SPI field is
          determined during session establishment.

   Authentication Tag: variable length, mandatory
          The authentication tag shall be used to carry message
          integrity/source authentication data. The Authenticated
          Portion of an SRTCP packet consists of the entire equivalent
          RTCP packet, the SRTCP index, and the SPI when present.

   The Encrypted Portion of an SRTCP packet consists of the RTCP payload
   of the equivalent RTCP packet.
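
   The layout of the SRTCP index field described above can be sketched
   as follows. The helper names are hypothetical, and network byte
   order for the 32-bit field is an assumption (it is the usual
   convention for RTP and RTCP header fields).

```python
import struct

E_BIT = 1 << 31            # most significant bit: 1 = encrypted packet
COUNTER_MASK = E_BIT - 1   # the remaining 31 bits hold the packet counter


def make_srtcp_index(counter: int, encrypted: bool) -> int:
    """Combine the 31-bit counter with the encryption indicator bit."""
    if not 0 <= counter <= COUNTER_MASK:
        raise ValueError("counter must fit in 31 bits")
    return (E_BIT if encrypted else 0) | counter


def is_encrypted(index: int) -> bool:
    """True if the most significant bit of the index is set."""
    return bool(index & E_BIT)


def counter_of(index: int) -> int:
    """The index with the most significant bit masked off."""
    return index & COUNTER_MASK


def encode_index(index: int) -> bytes:
    """Serialize the index as 32 bits (assumed network byte order)."""
    return struct.pack("!I", index)
```

   A receiver would call is_encrypted on the received index to decide
   whether the Encrypted Portion is to be decrypted or simply copied.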

   SRTCP packet processing is identical to that of SRTP packet
   processing, with the following changes:

   * SRTCP replay protection is as defined in Section 5, but using the
   SRTCP index as the index i.

   * SRTCP encryption is as defined in Section 6, but using the
   definition of the SRTCP Encrypted Portion as defined in this section,
   using the SRTCP index as the index i, and the IV as defined in this
   section.

   * The SRTCP authentication tag is defined as in Section 7, but with
   the Authenticated Portion of the SRTCP packet defined in this
   section, and using the SRTCP index as the index i. SRTCP
   authentication is mandatory.

   * SRTCP decryption is performed as in Section 6, but only if the
   SRTCP index has its most significant bit equal to 1. If so, the
   encrypted portion is decrypted, using the SRTCP index as the index i,
   and the IV as defined in this section. In case the most significant
   bit of the index is 0, the payload is simply copied.

   The IV for ciphers using 128-bit block size is formed in the
   following way:

      IV = SRTCP index || FLAG || PT || 0..0 || SSRC

   where PT (Payload Type, 8 bits) and SSRC (Synchronization Source, 32
   bits) are taken from the first header in the RTCP compound packet.
   SRTCP index is the 32-bit index added to the packet. A pad of 48
   zero bits is inserted between the PT and the SSRC.

   FLAG is an 8-bit value which is used to signal additional
   information. Currently, the only value defined (for RTCP) is FLAG =
   00..01. The value 0..0 is reserved for RTP and MUST NOT be used for
   RTCP. Since this keeps the IVs unique, the same key can be used for
   related RTP and RTCP flows.

   Then this IV is treated in the same way as defined in Section 6,
   according to the chosen encryption mode.
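
   The IV layout above (32 + 8 + 8 + 48 + 32 = 128 bits) can be
   sketched with the Python struct module. The function name is
   hypothetical, and big-endian (network) byte order for the integer
   fields is an assumption made for this example.

```python
import struct

FLAG_RTCP = 0x01  # the only FLAG value currently defined for RTCP


def srtcp_iv(srtcp_index: int, pt: int, ssrc: int,
             flag: int = FLAG_RTCP) -> bytes:
    """Build IV = SRTCP index || FLAG || PT || 0..0 || SSRC (128 bits)."""
    if flag == 0x00:
        raise ValueError("FLAG value 0 is reserved for RTP")
    # "!"  : network byte order, no alignment padding
    # "I"  : 32-bit SRTCP index
    # "B"  : 8-bit FLAG
    # "B"  : 8-bit PT
    # "6x" : 48 zero bits of padding
    # "I"  : 32-bit SSRC
    return struct.pack("!IBB6xI", srtcp_index, flag, pt, ssrc)
```

   For a sender report (PT = 200), srtcp_iv(index, 200, ssrc) returns a
   16-byte value that is then processed according to the chosen
   encryption mode.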

   The encryption prefix (Section 6.1 of [SCFJ96]), which is a random
   32-bit quantity intended to improve privacy, SHOULD NOT be used. This
   is because SRTP encryption uses an additive stream cipher, and thus
   the prefix offers no benefit.

   The maximum number of SRTCP packets with a fixed key is limited to
   2^31 = 2,147,483,648. If this maximum is reached, SRTCP senders MUST
   send an RTCP BYE in the final packet. Similarly, SRTCP receivers MUST
   act as though the final RTCP packet included a BYE, even if no BYE
   was included in the packet, once the maximum number of SRTCP packets
   is reached for a fixed key.

   Authentication MUST be required for RTCP, since it is the control
   protocol (e.g., it has a BYE packet). Moreover, the cost of RTCP
   authentication is not of the same order as that of RTP
   authentication, since the session bandwidth allocated to RTCP is
   recommended to be 5%. However, when adding authentication to RTCP,
   the overhead in bandwidth SHOULD be considered (it will be more than
   5%).

   A compound RTCP packet may be split into two lower-layer packets,
   one to be encrypted and one to be sent in the clear, as described in
   Section 9.1 of [SCFJ96]. Encryption/non-encryption is signaled by the
   most significant bit of the SRTCP index as described above.


10. Rationale

   SRTP achieves high throughput and low packet expansion by using fast
   stream ciphers for encryption, an implicit index for synchronization,
   and universal hash functions for message authentication. SRTP is a
   suitable choice for the most general scenario, and also fits the
   most demanding one, conversational multimedia over wireless, since
   it has the necessary robustness properties.

   Only a single header extension may be appended to the RTP data
   header, so the use of a header extension for SRTP was avoided. SRTP
   and SRTCP are defined as profiles of RTP and RTCP, respectively.


10.1 Synchronization

   RTP typically runs over unreliable transport. Thus, maintaining
   synchronization of the cryptographic context between the sender and
   the receiver is a conspicuous challenge. Because of the requirement
   to minimize packet expansion, no explicit sequencing information
   should be added. RTP packets contain two fields for synchronization
   purposes, the timestamp and the sequence number. The timestamp field
   could be used for cryptographic synchronization in some
   circumstances. However, this field is not appropriate for such use
   in general. From [SCFJ96]:

      Several consecutive RTP packets may have equal timestamps if they
      are (logically) generated at once, e.g., belong to the same video
      frame. Consecutive RTP packets may contain timestamps that are not
      monotonic if the data is not transmitted in the order it was
      sampled, as in the case of MPEG interpolated video frames.

   The RTP sequence number might be directly used as a unique identifier
   for SRTP packets. However, it has only 16 bits, which would limit the
   duration of an SRTP security association to only 65,536 packets, and
   would therefore require relatively frequent re-keying.

   The 'implicit index' approach works as long as the reordering and
   loss of packets is not too great. In particular, 32,768 packets would
   need to be lost, or a packet would need to be 32,768 packets out of
   sequence in order for synchronization to be lost. Such drastic loss
   or reordering is likely to disrupt the RTP application itself.
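
   The following sketch illustrates why a displacement of about 2^15
   packets is the breaking point. It is not the normative index
   estimation procedure of this document; it assumes that the receiver
   tracks a rollover counter (ROC) and the highest sequence number seen
   so far (s_l), and that it picks the candidate index closest to the
   highest index seen. The function name and this nearest-candidate
   rule are assumptions made for this example.

```python
def estimate_index(seq: int, roc: int, s_l: int) -> int:
    """Guess the full packet index from a 16-bit sequence number.

    seq : 16-bit RTP sequence number of the received packet
    roc : receiver's current rollover counter
    s_l : highest sequence number seen under the current roc
    """
    highest = (roc << 16) | s_l
    # The packet may belong to the previous, current, or next rollover.
    candidates = [(v << 16) | seq
                  for v in (roc - 1, roc, roc + 1) if v >= 0]
    # Pick the candidate index closest to the highest index seen.
    return min(candidates, key=lambda i: abs(i - highest))
```

   For example, with roc = 0 and s_l = 65530, a packet with seq = 3 is
   placed just after the wrap, at index 65539. With roc = 1 and
   s_l = 100, a packet that is really 40,000 packets ahead (true index
   105636, seq = 40100) is instead placed at index 40100, because the
   displacement exceeds 32,768.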

   When a participant joins an SRTP session while that session is in
   progress, the entire cryptographic context except for the replay list
   is sent to that participant. See also Section 12.


10.2 Replay Protection

   Replay protection is undoubtedly important for multimedia data, and
   SHOULD be provided. Otherwise, it would be possible for an adversary
   to subvert security by simple manipulations of the data. For
   example, in a voice application, the phrase "yes" could be
   substituted for "no" if replay protection were not present. However,
   there are certain scenarios, e.g. conversational multimedia, where
   it may be difficult to perform this kind of attack. Moreover, to be
   useful, replay protection needs to be based on an authentication
   mechanism (i.e., authentication of the sequence number of the RTP
   header), and this has a cost when cellular links are involved on the
   path.


10.3 Source Origin Authentication considerations

   Normally, SOA can be done using signatures. However, this has a high
   impact in terms of bandwidth and processing time; therefore we do not
   consider signatures in the discussion.

   The presence of mixers and translators does not allow source
   authentication in case the RTP payload and/or the RTP header are
   manipulated. Note that this type of middle entity also disrupts
   end-to-end confidentiality (since the IV formation depends, e.g., on
   preservation of the RTP header).

   Examples of the mixer and translator scenarios include a translator
   re-encoding data at a lower rate or in a different encoding, and a
   mixer combining the audio streams of multiple speakers in a
   teleconference. In these cases, it is not clear that meaningful
   source origin authentication is possible, as the data that is
   received is not the same as the data that is signed/authenticated. If
   the translator is trusted by the receivers, then it could sign or re-
   sign the data streams, but this scenario may not be prevalent. It may
   be possible to devise a signing scheme that authenticates the source
   but not the content (enabling the receivers to know that "John is one
   of the people talking", but not providing authentication on who said
   what) by signing the concatenation of the Contributing source (CSRC)
   field and some sequencing information (e.g., a timestamp or sequence
   number), but such schemes require synchronization between the
   senders. This synchronization is not required by the RTP protocol
   itself, and may be difficult or impossible to arrange.

   A scheme, namely TESLA [PCBTS], has been recently developed to
   provide a secure sender authentication mechanism for multicast or
   broadcast data streams based mainly on symmetric techniques, see
   Section 7.


10.4 Choice of Encryption Transform

   When adopting a block cipher mode to produce keystreams, the central
   ingredient is the block cipher which is its core. As far as modern
   cryptology knows, the security basically stands (and falls) with the
   security of the block cipher. This means that if a weakness is found,
   replacing the block cipher with a new one will most likely remedy the
   security problems. We define AES (Rijndael) [AES] as the default
   block cipher, as it is widely believed to be secure.


11. Security Considerations

   The security of UMAC is well understood, and is described in
   [KBHHKR00].

   Additive ciphers do not provide any security service other than
   privacy. In particular, they do not provide message authentication
   (see [RK99] or [S96] for a discussion of this security service).
   However, SRTP uses a message authentication code to provide that
   security service.

   By using 'seekable' stream ciphers, SRTP avoids the denial of service
   attacks that are possible on stream ciphers that lack this property
   (these attacks are described in Section 3.4 of [B96]).

   No bit of keystream in an additive stream cipher should ever be used
   to encrypt multiple distinct plaintext bits. Such keystream reuse
   (jokingly called a 'two-time pad' system by cryptographers) can
   seriously compromise security. The NSA's VENONA project [C99]
   provides a historical example of such a compromise. In SRTP, a 'two-
   time pad' is avoided by requiring the key or the IV to be unique.
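
   The effect of keystream reuse can be seen in a few lines of Python.
   The keystream and plaintexts below are arbitrary values chosen for
   this example; the point is only that XORing two ciphertexts produced
   with the same keystream cancels the keystream completely.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


keystream = bytes(range(16))   # arbitrary keystream segment, used twice
p1 = b"attack at dawn!!"       # first plaintext (16 bytes)
p2 = b"retreat at noon!"       # second plaintext (16 bytes)

c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# An eavesdropper who sees c1 and c2 learns p1 XOR p2 without the key.
leak = xor_bytes(c1, c2)

# If one plaintext is known or guessed, the other follows directly.
recovered_p2 = xor_bytes(leak, p1)
```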

   An SSRC and transport addresses are mapped to a unique crypto context
   (with additional information in case re-keying/key refresh are in
   place). Multiple crypto contexts may contain identical keys; in this
   case, each context together with data from the RTP header MUST
   produce a unique IV (which is typically assured by including the
   unique SSRC in the IV).

   If manual keying is used, two different cryptographic contexts might
   accidentally use the same encryption key with non-negligible
   probability, through manual error or procedural inadequacies. Thus,
   manual keying SHOULD NOT be used for SRTP (or SRTCP).

   An additive stream cipher is vulnerable to attacks that use
   statistical knowledge about the plaintext source to enable key
   collision and time-memory tradeoff attacks [MF00,H80,Bi96]. These
   attacks take advantage of commonalities among plaintexts, and provide
   a way for a cryptanalyst to amortize the computational effort of
   decryption over many keys, thus reducing the effective key size of
   the cipher. A detailed analysis of these attacks and their
   applicability to the encryption of Internet traffic is provided in
   [MF00]. In summary, the effective key size of SRTP when used in a
   security system in which m distinct keys are used, is equal to the
   key size of the cipher less the logarithm (base two) of m. Protection
   against such attacks can be provided simply by increasing the size of
   the keys used, which here can be accomplished by the use of the
   "salting key".

   In order to provide an effective key size of n bits in a deployment
   in which 2^m SRTP/SRTCP cryptographic contexts will be created, the
   true key size will need to be n+m bits. The value of m SHOULD be 32
   bits for networks with 50,000 connections (fully meshed networks with
   up to 200 devices), and SHOULD be 64 bits for networks with 49e+12
   connections (fully meshed networks with up to 7,000,000 devices).
   These choices of m ensure that key collision attacks amortized over
   a ten year period offer no advantage over exhaustive search, when new
   SRTP keys are established for every connection every hour (note that
   such an attack requires the storage of all network traffic over the
   ten year period). These choices will suffice for many networks,
   though SRTP deployments with more stringent security requirements
   will need to make a detailed assessment of those requirements with
   respect to the attacks described in [MF00].
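
   The arithmetic behind these figures can be reproduced as follows,
   counting one new cryptographic context per connection per hour over
   ten years and taking the base-two logarithm of the total. The
   function name and the use of 365-day years are assumptions made for
   this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumption: 365-day years


def extra_key_bits(connections: int, years: int = 10,
                   rekeys_per_hour: int = 1) -> float:
    """log2 of the number of contexts created over the whole period."""
    contexts = connections * years * HOURS_PER_YEAR * rekeys_per_hour
    return math.log2(contexts)


small = extra_key_bits(50_000)        # 50,000 connections
large = extra_key_bits(49 * 10**12)   # 49e+12 connections

print(round(small, 1), round(large, 1))
```

   The first value comes out at roughly 32 bits. The second comes out a
   little under 62 bits, so the figure of 64 bits in the text includes
   some margin.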

   Implementations SHOULD use keys that are as large as possible. Please
   note that in many cases increasing the key size of a cipher does not
   affect the throughput of that cipher.

   It is an important point that the m bits of 'extra' key provided to
   thwart these attacks need not be private. In jurisdictions with
   mandated limits on the length of a secret key, the additional key
   bits could be made public. This is because those bits are
   functionally equivalent to the 'salt' that is used to protect
   passwords from dictionary attacks. The fact that the 'extra' key bits
   are distinct for many different keys defeats the key collision and
   time-memory tradeoff attacks by reducing the number of keys over
   which cryptanalytic computation can be amortized.

   Note that other security protocols which use additive ciphers for the
   encryption of Internet traffic (e.g., SSL, TLS, SSH, IPsec) are also
   vulnerable to the attacks described in [MF00]. Those attacks are
   generic to additive encryption of redundant plaintext, and are not
   particular to SRTP.


11.1 SSRC collision

   Assume that two or more communication parties use the same key.
   Though RTP implements an SSRC collision detection mechanism, it is
   impossible to guarantee that two parties do not accidentally choose
   the same SSRC and send a few packets before the collision is
   detected. In a very unfortunate case, the IV formation in Section 4.2
   could in fact make the keystreams collide, resulting in a 'two-time
   pad'. This is probably a bigger problem in the case of group
   communication when a single group key is desired. See also some
   administrative issues with SSRC collisions in Section 12.


11.2. Confidentiality of the RTP Payload

   It is important to be aware that, as with any stream cipher, the
   exact length of the payload is revealed by the encryption. This means
   that it may be possible to deduce certain "formatting bits" of the
   payload, as the length of the codec output might vary due to certain
   parameter settings etc. This, in turn, implies that the corresponding
   bits of the keystream can be deduced. However, if the stream cipher is
   secure, knowledge of a few bits of the keystream will not aid an
   attacker in predicting the following keystream bits. Thus, the
   payload length (and information deducible from this) will leak, but
   nothing else.


11.3. Confidentiality of the RTP Header

   With the described proposal, RTP headers are sent in the clear to
   allow for header compression. This means that data such as payload
   type, synchronization source identifier, and timestamp are available
   to an eavesdropper. Moreover, since RTP allows for future extensions
   of headers, we cannot foresee what kind of possibly sensitive
   information might also be "leaked".

   The described proposal is a low-cost method, which allows header
   compression to reduce bandwidth. It is up to the endpoints' policies
   to decide which security scheme to employ. If header compression is
   omitted, other solutions might be applicable, e.g. [sc-esp]. In
   other words, we provide a solution that works in the most general
   scenario, even in the most demanding one (like conversational
   multimedia over low-bandwidth, unreliable media). Of course the
   solution will then also work in less restricted environments, but we
   suggest that if one really needs to protect headers, and is allowed
   to do so by the surrounding environment, then one should also look
   at alternatives. In addition, we strongly recommend the use of
   profiles to select the right trade-off for the required level of
   security.


11.4 Integrity of the RTP header

   The IV formation in Section 4.2, which depends on the RTP header,
   provides an 'implicit' authentication of that header, which is useful
   when the authentication option is not present. This is because any
   attack which modifies the header of such a packet will cause the SRTP
   receiver to use an incorrect IV in the decryption step, with the
   result that the decrypted RTP payload will be essentially random.


12. Multicast and Many-to-many

   The scheme described here can also be used when a single, unique
   set of keys is shared by all the media sessions belonging to the same
   multimedia session, for low-complexity key management. However, in
   this case there must be a way to ensure that each SSRC is unique
   across all the RTP sessions inside that multimedia session, to avoid
   unlucky IV combinations that would result in a two-time pad. This is
   a light and feasible solution in several scenarios, e.g. one sender
   only, streaming, and unicast.

   Some special considerations arise when the SSRC is part of the
   identifier for the correct cryptographic context. In multicast and in
   many-to-many scenarios, to use the same group key for the multimedia
   session and the IV formation suggested in Section 4.2, there MUST be
   a way to guarantee uniqueness of the SSRC before sending starts.
   Otherwise, the triggering of the anti-collision mechanism will ask
   for a change in the SSRCs of the parties that happened to have the
   same SSRC, which, e.g., makes it difficult to point to the right
   context.

   The problem remains of how to address the context after the
   anti-collision algorithm has changed the SSRCs. Section 3.3 defines
   the use of the SSRC and the transport addresses of a packet as
   selectors into the database. In the case of UDP, unchanged transport
   addresses can be a good indicator that a collision, followed by
   anti-collision triggering, has happened. So, simply try decryption
   until an RTCP message confirms the change of the SSRC on those
   transport addresses, and then update the database selectors.
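
   One possible shape for such a lookup is sketched below. The class
   and method names (ContextDB, lookup, confirm_ssrc_change) are
   hypothetical and not part of this specification, the context is
   represented by an opaque value, and the actual trial decryption and
   RTCP processing are left to the caller.

```python
class ContextDB:
    """Toy cryptographic-context database keyed by (SSRC, transport address)."""

    def __init__(self):
        self._by_selector = {}

    def add(self, ssrc, address, context):
        self._by_selector[(ssrc, address)] = context

    def lookup(self, ssrc, address):
        # Normal case: both selectors match.
        context = self._by_selector.get((ssrc, address))
        if context is not None:
            return context, False
        # Fallback: same transport address but unknown SSRC. This may be a
        # sender that changed its SSRC after a collision, so return its
        # context as a tentative candidate to try decryption with.
        for (_, known_address), candidate in self._by_selector.items():
            if known_address == address:
                return candidate, True
        return None, False

    def confirm_ssrc_change(self, old_ssrc, new_ssrc, address):
        # Called once an RTCP message confirms the new SSRC on this address.
        context = self._by_selector.pop((old_ssrc, address))
        self._by_selector[(new_ssrc, address)] = context


db = ContextDB()
addr = ("192.0.2.1", 5004)
db.add(0x5C621599, addr, "context-A")
print(db.lookup(0x5C621599, addr))  # ('context-A', False)
print(db.lookup(0x11111111, addr))  # ('context-A', True), tentative match
db.confirm_ssrc_change(0x5C621599, 0x11111111, addr)
print(db.lookup(0x11111111, addr))  # ('context-A', False)
```

   The fallback is ambiguous if several senders share one transport
   address; this sketch simply returns the first candidate found.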

   If uniqueness of the SSRC inside the multimedia session cannot be
   guaranteed (e.g., for large groups), then a unique key per sender
   might be used. The requirement then becomes that the SSRC be unique
   per sender, which appears to be feasible enough. However, the same
   consideration regarding triggering of the anti-collision algorithm
   applies.


13. Key Management Considerations

13.1. Security parameters

   SRTP is a security protocol, and it is decoupled from key management.
   Work is under way in the IETF to define key management schemes, e.g.,
   in the IPSEC WG and the MSEC WG, and in TLS.


   The key management scheme has to provide SRTP with the initial
   security parameters for the cryptographic context: correct encryption
   and salting keys, mode of operation and cipher, authentication
   algorithm(s) and key(s), and (source origin) authentication
   algorithms and parameters. It also has to maintain the mapping of
   context identifiers (SSRC, addresses, SPI, etc.) to the actual
   context.

   The initial value for the ROC must also be agreed upon (0 is the
   default). s_l is initially 0, and the replay list is initially empty.
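
   As an illustration only, the parameters and initial state listed
   above could be grouped as follows. The class and field names are
   hypothetical; the default cipher and authentication identifiers are
   the default values given in Section 13.2.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class CryptoContext:
    # Parameters supplied by key management (field names are illustrative).
    encryption_key: bytes
    salting_key: bytes
    cipher: str = "CM_AES"
    auth_algorithm: str = "SRTP_UMAC"
    auth_key: Optional[bytes] = None
    # State with agreed or fixed initial values.
    roc: int = 0  # rollover counter, 0 unless another value is agreed upon
    s_l: int = 0  # initially 0
    replay_list: Set[int] = field(default_factory=set)  # initially empty


ctx = CryptoContext(encryption_key=bytes(16), salting_key=bytes(4))
print(ctx.roc, ctx.s_l, ctx.replay_list)  # 0 0 set()
```

   Using default_factory gives every context its own replay list, which
   matters when a newcomer receives a copy of the context without the
   replay list.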

   When a newcomer joins an already existing group, all of the
   cryptographic context except the replay list MUST be passed to the
   newcomer, unless backward security (preventing disclosure of previous
   communication to the newcomer) is wanted. In that case, and if the
   key management supports backward security, re-keying is triggered in
   a way that ensures it.

   Key refresh SHOULD be supported, using RTP sequence number and ROC as
   a basis for refresh.

   A re-keying mechanism SHOULD be supported, both to allow flexible
   access control to media, and also to enable long sessions that would
   otherwise force the cryptographic core into degeneration by
   'exhausting' the key(s).


13.2. SDP Attribute Support

   SRTP is defined as an RTP profile, and, as such, its use has to be
   signaled inside the Session Description Protocol (SDP) [SDP], when
   SDP is used to carry the description of the media sessions.

   An example of how the profile is announced is the following:

   m=audio 5004 RTP/SAVP 9

   SAVP indicates the use of the SRTP and AVP profiles.

   If the SRTP profile is to be applied (i.e., it is announced in the
   "m=" line), then the necessary security parameters follow in a
   corresponding attribute:

   a=x-kxg-sec:SRTP <encrypt> <auth> <saltingkey>

   where

   <encrypt> =  "null"  | "CM_AES"  |  "f8_AES"  |  ..
   <auth>    =  "null"  | "SRTP_UMAC"  |  "TESLA"  |  ..


   <encrypt> is an identifier used to select an encryption scheme. A set
   of standard encryption schemes must be defined, each assigned a
   number. Defined values are "null", "CM_AES", and "f8_AES". "CM_AES"
   is the default value.

   <auth> is an identifier used to select an authentication scheme.
   Defined values are "null" and "SRTP_UMAC". "SRTP_UMAC" is defined as
   UMAC-2/4/128/16/BIG/SIGNED, see also [SRTP]. The default value is
   "SRTP_UMAC".

   The <saltingkey> is the base64-encoded salting key. This key may be
   sent in clear text. If it needs to be protected, it is recommended
   that the master key be extended so that the salting key can be
   derived from the extra bits.
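
   A minimal sketch of how a receiver might parse this attribute is
   given below. The function name and the returned dictionary are
   hypothetical. The sketch accepts only the values listed as defined
   above, assumes the underscore spelling "SRTP_UMAC" so that the
   fields can be split on whitespace, and uses the 32-bit salting key
   from Appendix A as sample input.

```python
import base64

ENCRYPT_VALUES = {"null", "CM_AES", "f8_AES"}
AUTH_VALUES = {"null", "SRTP_UMAC"}


def parse_srtp_attribute(line):
    prefix = "a=x-kxg-sec:"
    if not line.startswith(prefix):
        raise ValueError("not an x-kxg-sec attribute")
    fields = line[len(prefix):].split()
    if len(fields) != 4 or fields[0] != "SRTP":
        raise ValueError("expected: SRTP <encrypt> <auth> <saltingkey>")
    _, encrypt, auth, salting_b64 = fields
    if encrypt not in ENCRYPT_VALUES:
        raise ValueError("unknown encryption scheme: " + encrypt)
    if auth not in AUTH_VALUES:
        raise ValueError("unknown authentication scheme: " + auth)
    # validate=True rejects characters outside the base64 alphabet.
    salting_key = base64.b64decode(salting_b64, validate=True)
    return {"encrypt": encrypt, "auth": auth, "salting_key": salting_key}


salt = bytes.fromhex("32f2870d")
line = ("a=x-kxg-sec:SRTP f8_AES SRTP_UMAC "
        + base64.b64encode(salt).decode("ascii"))
params = parse_srtp_attribute(line)
print(params["encrypt"], params["auth"], params["salting_key"].hex())
```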

   Moreover, in case of dynamic groups, where members may join/leave, it
   is necessary to pass the rollover counter. The SPI has to be agreed
   on.

   Using the IV formation suggested in Section 4.2, the same encryption
   key is used for securing RTP and related RTCP streams. The same
   authentication key MAY be used for RTP and related RTCP streams.


14. Acknowledgements

   The authors would like to thank Magnus Westerlund, Mark Baugher,
   Brian Weis, and Adrian Perrig for their reviews and comments.


15. Authors' Addresses

   Questions and comments about this memo can be directed to:

         David A. McGrew
         David Oran
         Cisco Systems, Inc.
         San Jose, CA 95134-1706 USA
         mcgrew@cisco.com, oran@cisco.com


         Rolf Blom
         Elisabetta Carrara
         Mats Naslund
         Karl Norrman
         Ericsson Research
         {rolf.blom, elisabetta.carrara, mats.naslund,
         karl.norrman}@era.ericsson.se


16. References

   [AES] NIST, "Advanced Encryption Standard (AES)",
   http://csrc.nist.gov/encryption/aes/

   [B97]  Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", RFC 2119, March 1997.

   [BCNN00]  Blom, R., Carrara, E., Naslund, M., and Norrman, K.,
   "Conversational Multimedia Security in 3G Networks", Internet Draft,
   November 2000, <draft-blom-cmsec-3g-00.txt>.

   [BF00] Boneh, D., and Franklin, M., "Message Authentication in a
   Multicast Environment", the Proceedings of the Seventh Annual
   Workshop on Selected Areas in Cryptography (SAC 2000), Springer-
   Verlag.

   [C99]  Crowell, W. P., "Introduction to the VENONA Project",
   http://www.nsa.gov:8080/docs/venona/index.html.

   [ES3D] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
   Algorithms Group of Experts (SAGE); General Report on the Design,
   Specification and Evaluation of 3GPP Standard Confidentiality and
   Integrity Algorithms", Public report, Draft Version 1.0, Dec 1999.

   [ES3E] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
   Algorithms Group of Experts (SAGE) Report on the Evaluation of 3GPP
   Standard Confidentiality and Integrity Algorithms", Public report,
   Draft Version 1.0, Dec 1999.

   [HAC]  Menezes, A., Van Oorschot, P., and Vanstone, S., "Handbook of
   Applied Cryptography", CRC Press, 1997, ISBN 0-8493-8523-7.

   [H80]  Hellman, M. E., "A cryptanalytic time-memory trade-off", IEEE
   Transactions on Information Theory, July 1980, pp. 401-406.

   [KA98a] Kent, S., and R. Atkinson, "Security Architecture for IP",
   RFC 2401, November 1998.

   [KBHHKR00] Krovetz, T., Black, J., Halevi, S., Hevia, A., Krawczyk,
   H., Rogaway, P., "UMAC: Message Authentication Code using Universal
   Hashing", Internet Draft, October 2000, <draft-krovetz-umac-01.txt>.

   [LRW00] Lipmaa, H., Rogaway, P., and Wagner, D., "Comments to NIST
   Concerning AES Modes of Operation: CTR-Mode Encryption", NIST
   Workshop on AES Modes of Operation,
   http://csrc.nist.gov/encryption/aes/modes/lipmaa-ctr.pdf

   [M00]   McGrew, D., "Segmented Integer Counter Mode: Specification
   and Rationale", NIST Workshop on AES Modes of Operation,
   http://www.mindspring.com/~dmcgrew/sic-mode.pdf


   [MF00]  McGrew, D., and Fluhrer, S., "Attacks on Encryption of
   Redundant Plaintext and Implications on Internet Security", the
   Proceedings of the Seventh Annual Workshop on Selected Areas in
   Cryptography (SAC 2000), Springer-Verlag.

   [MF00b] McGrew, D., and Fluhrer, S., "The Stream Cipher LEVIATHAN:
   Specification and Supporting Documentation", Submission to the New
   European Schemes for Signatures, Integrity, and Encryption (NESSIE)
   Process, October 2000, http://www.cryptonessie.org/.

   [R92]   Rueppel, R., "Stream Ciphers", Chapter 2 of Simmons, G.,
   "Contemporary Cryptology: the Science of Information Integrity,"
   1992, IEEE Press.

   [RC94]  Rogaway, P. and Coppersmith, D., "A Software-Optimized
   Encryption Algorithm", Proceedings of the 1994 Fast Software
   Encryption Workshop, Lecture Notes In Computer Science, Volume 809,
   Springer-Verlag, 1994, pp. 56-63.

   [RC98]  Rogaway, P. and Coppersmith, D., "A Software-Optimized
   Encryption Algorithm", Journal of Cryptology, Volume 11, Number 4,
   Springer-Verlag, 1998, Pages 273-287.  Also available on the Internet
   at http://www.cs.ucdavis.edu/~rogaway/papers/seal-abstract.html.

   [RK99]  Rescorla, E., and Korver, B., "Guidelines for Writing RFC
   Text on Security Considerations," draft-rescorla-sec-cons-00.txt

   [S96]   Schneier, B. "Applied Cryptography: Protocols, Algorithms,
   and Source Code in C", Wiley, 1996.

   [sc-esp] McGrew, D., Fluhrer, S., Peyravian, M.,  "The Stream Cipher
   Encapsulating Security Payload", Internet Draft, July 2000

   [SCFJ96] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V.,
   "RTP: A Transport Protocol for Real-Time Applications", IETF Request
   For Comments RFC 1889.

   [TESLA] Perrig, A., Canetti, R., Briscoe, B., Tygar, D., Song, D.,
   "TESLA: Multicast Source Origin Transform",
   draft-irtf-smug.tesla-00.txt


Appendix A

   Test vectors

   The following are test vectors for f8-AES.


      key:
        234829008467be186c3de14aae72d62c

      salting key || 0x555... :
        32f2870d555555555555555555555555

      AES-internal expanded key:
        23482900 8467be18 6c3de14a ae72d62c
        62be58e4 e6d9e6fc 8ae407b6 2496d19a
        f080e0d2 1659062e 9cbd0198 b82bd002
        05f097be 13a99190 8f149008 373f400a
        78f9f024 6b5061b4 e444f1bc d37bb1b6
        4931be42 2261dff6 c6252e4a 155e9ffc
        31ea0e1b 138bd1ed d5aeffa7 c0f0605b
        fd3a37a1 eeb1e64c 3b1f19eb fbef79b0
        a28cd0ae 4c3d36e2 77222f09 8ccd56b9
        043d86ca 4800b028 3f229f21 b3efc998
        ede0c0a7 a5e0708f 9ac2efae 292d2636

      AES-internal expanded salting key || 555...:
        32f2870d 55555555 55555555 55555555
        cf0e7bf1 9a5b2ea4 cf0e7bf1 9a5b2ea4
        f43f3249 6e641ced a16a671c 3b3149b8
        37045eab 59604246 f80a255a c33b6ce2
        dd54c685 843484c3 7c3ea199 bf05cd7b
        a6e9e78d 22dd634e 5ee3c2d7 e1e60fac
        089f7675 2a42153b 74a1d7ec 9547d840
        e8fe7f5f c2bc6a64 b61dbd88 235a65c8
        d6b39779 140ffd1d a2124095 8148255d
        9f8cdb75 8b832668 299166fd a8d943a0
        9c963bb7 17151ddf 3e847b22 965d3882
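
   The two expanded-key tables can be reproduced with the standard
   AES-128 key schedule. The following is a minimal sketch in Python,
   not an optimized or side-channel-safe implementation; the function
   names are illustrative, and the S-box is generated from its
   definition rather than typed in as a table.

```python
def xtime(a):
    # Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    sbox = []
    for a in range(256):
        # Multiplicative inverse in GF(2^8), with 0 mapped to 0.
        inv = next((b for b in range(1, 256) if gf_mul(a, b) == 1), 0)
        # Affine transformation of the AES S-box.
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_word(word):
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def expand_key_128(key):
    # Returns the 44 32-bit words of the AES-128 key schedule.
    words = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = ((temp << 8) | (temp >> 24)) & 0xFFFFFFFF  # RotWord
            temp = sub_word(temp) ^ (rcon << 24)
            rcon = xtime(rcon)
        words.append(words[i - 4] ^ temp)
    return words


key_words = expand_key_128(bytes.fromhex("234829008467be186c3de14aae72d62c"))
salt_words = expand_key_128(bytes.fromhex("32f2870d555555555555555555555555"))
for row in range(0, 44, 4):
    print(" ".join(format(w, "08x") for w in key_words[row:row + 4]))
```

   The loop prints key_words four words per line, in the same layout as
   the first table; salt_words can be printed the same way for
   comparison with the second.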

      RTP-packet header fields:
        version      = 2
        padding      = 0
        extension    = 0
        CSRC count   = 0
        marker bit   = 0
        payload type = 6e
        sequence no. = 5cba
        timestamp    = 50681de5
        SSRC         = 5c621599


      Data from Cryptographic context:
        FLAG             = 0
        Rollover counter = d462564a

      IV:
        d462564a006e5cba50681de55c621599

      IV':
        4fee844eedb458a3e2b0c7ed43888cc1


      Encryption of bits 0 to 127:

      j: 0
      S(-1)                   : 00000000000000000000000000000000
      S(-1) XOR IV'           : 4fee844eedb458a3e2b0c7ed43888cc1
      S(-1) XOR IV' XOR ct    : 4fee844eedb458a3e2b0c7ed43888cc1
      plain text P[0..127]    : 6e915f07cd6f1c0d44afaab4961c7d31
      final keystream S(0)    : b2d3b3d7e16092de379e33b350582e63
      cipher text C[0..127]   : dc42ecd02c0f8ed373319907c6445352


      Encryption of bits 128 to 255:

      j: 1
      S(0)                    : b2d3b3d7e16092de379e33b350582e63
      S(0) XOR IV'            : fd3d37990cd4ca7dd52ef45e13d0a2a2
      S(0) XOR IV' XOR ct     : fd3d37990cd4ca7dd52ef45e13d0a2a3
      plain text P[128..255]  : 7b9daad84352a6d4bcdf501a560832a0
      final keystream S(1)    : b1ce287dc53c1975de3d7d0500f780ba
      cipher text C[128..255] : ca5382a5866ebfa162e22d1f56ffb21a
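
   The relations between the values listed above can be checked without
   an AES implementation. The sketch below only verifies the
   concatenation and XOR steps; it does not recompute IV' or the
   keystream blocks. The IV layout tested here (rollover counter, a
   zero octet, payload type, sequence number, timestamp, SSRC) is read
   off from the listed values and is not a restatement of Section 4.2.

```python
def h(s):
    return bytes.fromhex(s)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


# Values copied from the test vector listing.
roc, pt, seq, ts, ssrc = "d462564a", "6e", "5cba", "50681de5", "5c621599"
iv = h("d462564a006e5cba50681de55c621599")
iv_prime = h("4fee844eedb458a3e2b0c7ed43888cc1")
s0 = h("b2d3b3d7e16092de379e33b350582e63")
s1 = h("b1ce287dc53c1975de3d7d0500f780ba")
p0 = h("6e915f07cd6f1c0d44afaab4961c7d31")
p1 = h("7b9daad84352a6d4bcdf501a560832a0")
c0 = h("dc42ecd02c0f8ed373319907c6445352")
c1 = h("ca5382a5866ebfa162e22d1f56ffb21a")

checks = {
    # IV = ROC | 0x00 | payload type | sequence no. | timestamp | SSRC.
    "iv_layout": iv == h(roc + "00" + pt + seq + ts + ssrc),
    # Cipher input for block j is S(j-1) XOR IV' XOR j, with the counter
    # j ("ct" in the listing) taken as a 128-bit big-endian integer.
    "input_j0": xor(xor(bytes(16), iv_prime), (0).to_bytes(16, "big"))
                == h("4fee844eedb458a3e2b0c7ed43888cc1"),
    "input_j1": xor(xor(s0, iv_prime), (1).to_bytes(16, "big"))
                == h("fd3d37990cd4ca7dd52ef45e13d0a2a3"),
    # Cipher text is plain text XOR keystream.
    "c0": xor(p0, s0) == c0,
    "c1": xor(p1, s1) == c1,
}
print(checks)
```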


      ------------------------------------------------------------

      This Internet-Draft expires in December 2001.

