Network Working Group                                            R. Blom
Internet-Draft                                                  Y. Cheng
Intended status: Informational                               F. Lindholm
Expires: September 15, 2011                                  J. Mattsson
                                                              M. Naslund
                                                              K. Norrman
                                                                Ericsson
                                                          March 14, 2011

           SRTP Store-and-Forward Use Cases and Requirements
                draft-mattsson-srtp-store-and-forward-04

Abstract

The Secure Real-time Transport Protocol (SRTP) was designed to allow simple and efficient protection of RTP. To provide this, encryption and authentication of media and control signaling are tightly coupled to the RTP session, and the information in the RTP header. Hence, in general, it is not possible to perform store-and-forward of protected media using SRTP. This document gives, based on a use case analysis, requirements that SRTP and new SRTP transforms need to satisfy in order to allow secure store-and-forward operation. A first outline on how to introduce the needed new functionality and transforms in SRTP is also presented.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 15, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

Blom, et al.            Expires September 15, 2011              [Page 1]

Internet-Draft     SRTP SaF Use Cases and Requirements        March 2011

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Selected SRTP Background Facts
   4.  Use Cases
     4.1.  Trust Model and Assumptions
     4.2.  Media Distribution Use Cases
       4.2.1.  Streaming Pre-encrypted Media
       4.2.2.  Video on Demand
       4.2.3.  Caching Protected Media in the Network
       4.2.4.  Recording Encrypted Media at Home
     4.3.  Answering Machine Use Cases
       4.3.1.  Storing/Caching Encrypted Media
       4.3.2.  Transport Protection
       4.3.3.  Playback of Media Stream
       4.3.4.  Multiple Callers
     4.4.  Centralized Conferencing Use Case
   5.  Requirements
   6.  Solution Outline
     6.1.  Overview
     6.2.  SRTP Store-and-Forward Cryptographic Contexts
     6.3.  Store-and-Forward Packet Format
     6.4.  Replay Protection
   7.  Commented Example Usage
   8.  Implications on SRTP
   9.  Security Considerations
     9.1.  Media protection Transform
     9.2.  Replay Protection
   10. Acknowledgements
   11. IANA Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Appendix A.  Key Management
     A.1.  Key Management Example for Media Distribution
     A.2.  Key Management Example for Answering Machine
   Authors' Addresses

1.  Introduction

The Secure Real-time Transport Protocol (SRTP) [RFC3711] is a profile of the Real-time Transport Protocol (RTP) [RFC3550], and it provides confidentiality, message authentication, and replay protection to both RTP and RTCP (Real-time Transport Control Protocol). SRTP was designed to protect real-time point-to-point communications and is, as presently defined, not aimed at communication solutions that include non-trusted store-and-forward middleboxes, i.e. middleboxes that should not have access to cleartext media, but still should have access to other data in order to retransmit media according to RTP standard procedures.

Media in need of end-to-end (e2e) protection could e.g. be real-time voice and video information/media clips for internal use by personnel in enterprises or authorities. There are also multimedia telephony applications utilizing media mailboxes and other store-and-forward functions that need e2e protection. Protection e2e could also be needed to protect subscribed media like commercial-free radio and television that is distributed over the Internet.

A typical use case is store-and-forward media distribution systems. Many of those systems require that media is confidentiality protected e2e between the media source and the media rendering device; this is to prevent illegitimate media intercept or sharing. At the same time, the communication should be hop-by-hop (hbh) protected to prevent malicious users from performing denial of service attacks by sending bogus data to store-and-forward middleboxes.

Methods like the Packet-switched Streaming Service (PSS) [3GPP.26.234] exhibit the properties needed for secure store-and-forward operation, but they are part of larger frameworks tailored for very specific use cases. Thus, it would be desirable to be able to offer use of SRTP as a general lightweight mechanism to achieve this type of protection.

Trying to use SRTP with store-and-forward middleboxes reveals two main problems:

The first problem is due to the fact that the incoming and outgoing RTP streams in general are independent; received RTP packets cannot just be stored and later retransmitted. This in particular implies that SRTP with currently defined transforms cannot be applied. For details, see Section 3.

It should be noted that store-and-forward of media in most cases requires that side information is available when retransmitting received media. Such side information, e.g. RTP timestamp information, may come from the RTP header, RTCP messages, and session definition data.

The second problem is due to the fact that to provide both e2e and hbh protection, two independent security contexts with associated protection mechanisms have to coexist; a feature unavailable in SRTP as currently specified.

To resolve these problems, SRTP needs extensions that in an efficient and coherent way support store-and-forward use cases.

The objective of this document is to explore use cases for an SRTP store-and-forward solution, derive the associated requirements, and present and discuss an approach for a solution.

2.  Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Definitions of terms and notation will, unless otherwise indicated, be as defined in [RFC3711].

   o  The term authentication will be used to denote message authentication and message integrity protection.

   o  By RTP transport protection or simply transport protection, we mean protection (confidentiality, authentication, etc.) of streamed RTP packets. This is provided by SRTP according to [RFC3711].

   o  By media protection, we similarly mean e2e protection of the application payloads carried in RTP. SRTP provides media protection, but only during transport (see above). A (protected) media stream similarly refers to (protected) media payloads streamed using RTP.

   o  A store-and-forward e2e session is defined as the set of store-and-forward e2e protected data produced under a single so called e2e (cryptographic) context. A store-and-forward e2e session may comprise several so called store-and-forward sources, i.e. several distinct logical e2e media streams to be protected by the same e2e context.

   o  A store-and-forward hbh session is defined as the set of store-and-forward hbh protected data produced under a single so called hbh context.

3.  Selected SRTP Background Facts

SRTP as currently specified has the properties described below, which explain why it cannot be directly used in store-and-forward applications. The description also indicates how an SRTP store-and-forward solution could be designed.

   o  All current SRTP transforms use the RTP header as input. AES-CTR uses the SSRC and the packet index to calculate the IV (Initialization Vector), AES-f8 uses even more header parameters, and HMAC-SHA1 authenticates the full RTP header. The SSRC is typically determined by the key management protocol and the packet index includes the RTP sequence number, whose initial value should be randomly chosen according to RTP [RFC3550]. All this means that there are no standards-compliant ways to receive SRTP protected packets in one stream and later just retransmit the packets as they were received.

   o  Even if the SRTP relevant RTP parameters like SSRC and the SRTP index could be determined beforehand for the retransmission stream, it would not allow a client to randomly seek in a stream without renegotiating the session, as it would lead to misalignment between the packet index used for streaming and the packet index used by SRTP at the originator. If the user jumps to a different part of the stream, it is impossible to continue increasing the RTP sequence number stepwise while at the same time keeping it equal to the sequence number needed for decryption. Jumping backward (e.g. media rewind) would cause even more problems, as the retransmitted packets would be discarded by the SRTP replay protection.

   o  The encryption key and the authentication key are both derived from the same master key in SRTP, see Figure 1. This means that a client which is able to derive e.g. the authentication key will also always have access to the encryption key, making it impossible to use, say, the session encr_key for e2e protection and the session auth_key for hbh protection.

                         Packet index -------+
                                             |
                                             v
   +------------+                 +------------+ Session encr_key
   |            | Master key      |            +------------------>
   | External   +---------------->| Key        | Session auth_key
   | Key        |                 | Derivation +------------------>
   | Management +---------------->|            | Session salt_key
   |            | Master salt     |            +------------------>
   +------------+                 +------------+

                     Figure 1: SRTP key derivation

4.  Use Cases

The use cases below were chosen to illustrate media streaming scenarios where the current SRTP specification [RFC3711] does not provide sufficient functionality. These use cases provide context and general rationale for the requirements presented in Section 5.

Note that the necessary key distribution and media session setup is out of scope for this document, and will thus not be discussed in any detail in the use cases below. However, as key management is an integral part of a complete store-and-forward solution, some approaches to the necessary key distribution and media session setup for some of the use cases are discussed in Appendix A.

4.1.  Trust Model and Assumptions

The trust model assumed in this document includes two parties who wish to communicate securely via one or more honest but curious middleboxes. This means that the communicating parties trust the middlebox to deliver the media as expected, but they do not trust it with cleartext data.

In the use cases below, there is no example of multiple (sequential) middleboxes, but it is a natural generalization and it seems warranted to cover this case as well.

4.2.  Media Distribution Use Cases

4.2.1.  Streaming Pre-encrypted Media

A content provider wants to distribute high value media to clients.
The content provider distributes the media via a streaming server that should not have access to cleartext media, typically because the content provider does not trust it. In one scenario, the content provider streams the media to the streaming server where the media is stored in a protected format. In another scenario, the protected media may be delivered to the streaming server via e.g. file transfer. These use cases correspond to the use of pre-encryption in media distribution. In both cases, protected media is available in the streaming server for later transmission to different clients.

Even in cases when the streaming server could be trusted with cleartext data, there are reasons why one would like to avoid performing encryption in the streaming server itself. One reason is to use pre-encryption to relieve the streaming server of the task of encrypting the media. If the media is pre-encrypted, the streaming server only needs to add integrity protection (for hbh protection) to the encrypted media before streaming it to the clients.

Clients are trusted by the content provider and have access to the encryption key. When a client receives a packet, the authenticity is checked using a security context shared with the streaming server and the decryption is performed using a security context shared with the content provider.

4.2.2.  Video on Demand

Some protected media is offered as video on demand where users can watch selected video clips at any time. The media is unicast and the clients are offered random seek functionality, which allows them to quickly jump to any part of the video. Other features offered may be rendering with speed translation as in fast forward and slow motion rendering. These features can be used to skip parts of the video or jump backward to see interesting parts again.
The problem here is jumping back and forth and performing rendering speed translations in an e2e protected media stream with associated implications on synchronization and interactions with replay protection.

4.2.3.  Caching Protected Media in the Network

High value encrypted media (e.g. Internet Protocol Television (IPTV) and radio) is broadcast in a network. Only clients trusted by the content provider have access to the encryption key. A network node enhances distribution by caching the media, but is not trusted by the content provider and therefore has no access to the encryption keys.

A client that missed the beginning of a program might stream the media from the network cache instead of listening to the broadcast.

Due to the trust model where the content provider only trusts the clients, the media needs to be e2e protected. Nevertheless, the media also needs to be hbh integrity protected to protect against denial-of-service (DoS) attacks.

4.2.4.  Recording Encrypted Media at Home

High value encrypted media (e.g. IPTV and radio) is broadcast in a network. Only clients trusted by the content provider have access to the encryption key. A user is recording the media on an HDD (Hard Disk Drive), but does not yet have a license, or has a license that does not allow cleartext copying. The media is therefore stored in protected format on the HDD. There is, however, a strong need for the HDD to be able to check the integrity of the media before it is stored. Otherwise, a DoS attack may fill the HDD with garbage.

4.3.  Answering Machine Use Cases

4.3.1.  Storing/Caching Encrypted Media

Operators commonly provide an answering machine service to their customers. In this case, the communicating parties (the caller and the callee) may not wish to disclose the media to any other party, and hence want to apply encryption between each other.
This requires that they are able to establish a shared key. The answering machine acts as a store-and-forward middlebox, which stores encrypted data and retransmits it to the callee.

The answering machine may act as a streaming server when sending the data to the callee, and will then not use the exact same RTP headers on the outgoing SRTP traffic as were used on the incoming SRTP traffic. SRTP as specified in [RFC3711] will not work in this case, since parts of the RTP header are input to the encryption/authentication transforms.

Alternatively, the recorded media could be forwarded from the answering machine to the callee by file transfer, e.g. by sending the recorded media in the format that was used to store it. Such forwarding would not be according to SRTP, but would still yield end-to-end protection of the media. Note, however, that decryption and rendering would be similar to part of an enhanced SRTP solution.

4.3.2.  Transport Protection

To prevent the answering machine from being filled up with bogus data, it is necessary for the answering machine to authenticate the sender of the traffic, and further, to verify the authenticity of the incoming traffic. This poses a problem for SRTP as of [RFC3711] in that the message authentication requires a session key shared with the answering machine, but the encryption key shall, as discussed above, not be available to it. This implies that there is a need for two independent security contexts, one end-to-end and one hop-by-hop.

When the callee retrieves the media from the answering machine, message authentication is also beneficial. There are two possibilities. Since the answering machine is trusted to maintain and redistribute the media, it may be sufficient to provide message authentication between the answering machine and the callee.
Here too, a separation between the e2e protection and the hbh protection would be necessary. A second option is that authentication is applied from the caller to the callee. However, if the authentication is applied in that way, the answering machine will not be able to verify the integrity of the incoming traffic from the caller.

It is of course also possible that message authentication is desired for any combination of endpoints, i.e. between the caller and the callee, between the caller and the answering machine, and between the answering machine and the callee.

4.3.3.  Playback of Media Stream

When a user listens to the messages stored on the answering machine, it is useful to be able to rewind and/or fast forward in the media stream. For SRTP as of [RFC3711], this is not possible. The reason is that even if the same payloads can be reinserted in the stream by the answering machine, the RTP sequence number is steadily increasing on a per packet basis. Since the synchronization of the encryption transforms is based on the RTP sequence number, the decryption will fail. In addition, message authentication will fail since the authentication according to [RFC3711] shall cover the header of the RTP packet. This implies that the payload and the media have to be protected by a mechanism that is independent of parameters used in the transport protocol.

4.3.4.  Multiple Callers

Several messages may be left on the answering machine, received in different sessions and possibly from different callers. The result of this is that different contexts (keys) were used to encrypt the media. Depending on how the callee retrieves the messages from the answering machine, different options are possible. One option is to retrieve each message as a separate stream, and in this case, a separate session is required per message. Another option is to somehow switch security contexts within an ongoing hbh session.

4.4.  Centralized Conferencing Use Case

Another use case is a conference bridge that either is not to be trusted with the cleartext media or does not have the processing power to decrypt and re-encrypt the media from a large number of participants. In this case, the conference bridge cannot act as a mixer, but in some cases that may be a reasonable assumption. In this setting, the media may be repackaged by the conferencing server into RTP packets with different headers compared to the incoming traffic. As described in Section 3, this causes authentication and decryption to fail in SRTP.

An example is Push-To-Talk solutions, where only one user at a time is allowed to talk. Another example, where this is especially interesting, is video conferencing applications where a conference server does not work as a media mixer, but rather as a hub for the conference participants. In such a setup, the application of group based approaches for security may be desirable for the e2e protection of media.

5.  Requirements

The use cases above show that to enable store-and-forward in an extended SRTP, it has to efficiently support the following requirements:

   o  Transport independent media protection

      It SHALL be possible to have media protection that is independent of RTP parameters. To allow retransmission of received protected media, a transform for protecting the RTP payload that is independent of RTP transport parameters is needed.

      The media protection MUST cover both message authentication and confidentiality protection.

      It SHALL be possible to protect several e2e protected media streams with a single e2e context.

      The requirements imply that the media protection format has to include an SRTP SaF Source (SSS) field for robust operation. The SSS can be thought of as an "e2e SSRC".

   o  Media source authentication

      It SHALL be possible to provide e2e source authentication of the media stream. In a group setting, source authentication is here meant to ensure that the message originated from a member of the group.

      This requirement is fulfilled if media has authentication protection in a transport independent manner.

   o  Support of playback of protected media streams

      A client SHALL be able to perform random seek in a protected media stream.

      Note that as playback functions like retransmission and random seek capability are features in the described use cases, replay protection cannot be required for transport independent media protection. This implies that a Packet Unique Value (PUV) is used on an e2e basis so that the receiver can identify a media payload's position within the overall media stream.

   o  Transport protection

      It SHALL be possible to provide transport protection that is independent of the media protection.

      The transport protection MUST be able to provide confidentiality, authentication, and replay protection for RTP and at least authentication and replay protection for RTCP.

      This requirement maps well against SRTP as of [RFC3711]. Transport protection is also a means to provide replay protection of the media on a hop-by-hop basis.

   o  Separation of security contexts

      It MUST be possible to have independent security contexts for the transport independent media protection and the transport protection. This means in particular that there have to be two distinct master keys, one for e2e media protection and one for hbh transport protection.

   o  Change of transport independent media protection security context

      It MUST be possible to signal to the receiver the current media protection security context to use.

      It MUST be possible to change the e2e security context within an ongoing hbh session. This is needed to allow single stream multiplexing of e.g. protected media "clips" which were generated using different transport independent media protection security contexts.

      The requirements imply that the media protection format has to include a Crypto Context Indicator (CCI) field for robust operation. The CCI can be thought of as a generalized MKI and may be defined to also include all the MKI based functionality defined in [RFC3711].

6.  Solution Outline

In this section, a first outline on how to introduce the needed new functionality and transforms in SRTP is presented. For a complete description, including a packet format specification and a detailed transform description, see [I-D.naslund-srtp-saf].

6.1.  Overview

The stated requirements above seem possible to meet by implementing a few minor additions to SRTP. These additions mainly address new SRTP transforms, introduction of media and transport protection crypto context definitions, together with key handling and key derivation.

A high-level description of the proposed new SRTP functionality is as follows:

The first step is to perform a transport independent media protection operation. The coverage of this transform is the RTP payload only. This operation could either be done with an Authenticated Encryption (AE) transform, or with separate encryption and authentication transforms. The media protection should rely on two explicit values for cryptographic synchronization, the Packet Unique Value (PUV) and the SRTP SaF Source (SSS), which are forwarded in the payload.

After the steps making up the transport independent media protection have been performed, the protection processing proceeds as currently defined by [RFC3711], which results in the addition of the required transport protection.

Keying for transport protection is performed as described in [RFC3711] and uses the SRTP internal key derivation function.
The key derivation function operates on a master key and a master salt, where the master key is denoted hbh key. The keying for the media protection is defined in an equivalent way, producing keying material for the media transform. The e2e keying material is based on another master key, the e2e key, which is independent of the hbh key. Also for the e2e context, a master salt is defined. The key derivations used to derive the e2e keying material could preferably use the key derivation function defined in [RFC3711].

Note that with the approach taken, only the media protection endpoints will have to implement the new SRTP functionality with combined media and transport transform and handling of two security contexts. In the following, we will denote such a combined transform a Compound Transform (CT).

One of the defined transforms of [RFC3711] is used for the transport protection (using the hbh key). A store-and-forward middlebox can therefore rely solely on [RFC3711], reusing an [RFC3711] compliant implementation of SRTP to first receive and then resend the media, given that the transport transform in the compound transform is equivalent to a transform defined for [RFC3711]. However, there are some practical reasons why the middlebox also needs to have some "knowledge" of the e2e part of the protection, see below.

For RTCP, the solution principles described for RTP apply.
However, the main application for RTCP is to control the traffic over one hop, which means that e2e encryption cannot be applied in general. Note, though, that there are RTCP application messages that might benefit from having e2e integrity protection.

6.2.  SRTP Store-and-Forward Cryptographic Contexts

SRTP maintains a cryptographic context, containing master key(s), cryptographic transforms, etc., for the associated SRTP session. Exactly how the parameters in the cryptographic context are agreed upon is a session setup issue and out of scope of SRTP. SRTP assumes that a cryptographic context, or rather the master key therein, is shared only between mutually trusted parties.

             e2e context (media protection)
   <----------------------------------------------->
   +---+                 +---+                 +---+
   | S |                 | M |                 | R |
   +---+                 +---+                 +---+
   <----------------------> <---------------------->
        hbh context 1            hbh context 2
    (transport protection)   (transport protection)

        Figure 2: Context sharing (Sender, Middlebox, Receiver)

The SRTP cryptographic context concept is reusable for the proposed solution. Conceptually, the originator and the intended end-receiver share an e2e media security context, while an hbh transport security context is shared by an endpoint and an intermediary or by two intermediaries, see Figure 2.

To comply with the trust model of the use cases above, the master key(s) in the e2e context MUST be cryptographically independent of, and MUST NOT be deducible from, the master key of any hbh context. The key management protocol(s) used MUST therefore be able to negotiate keys satisfying these requirements.

The identification of the hbh context should be as defined in [RFC3711], while the used e2e context is either implicitly identified in the session setup or its identification relies on the proposed crypto context indicator (CCI).
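
The context sharing of Figure 2 can be illustrated with a small Python sketch. It is not part of the proposed solution: the names CryptoContext, Node, new_context, and hbh_tag, as well as the CCI value 1, are assumptions made purely for illustration. For brevity, the hop-by-hop tag is keyed directly with the master key, whereas [RFC3711] would first derive a session authentication key. The sketch only shows which party holds which context, and that independently generated keys let M verify hop traffic without holding any e2e key.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CryptoContext:
    """Minimal stand-in for a cryptographic context (master key and salt only)."""
    master_key: bytes
    master_salt: bytes


def new_context() -> CryptoContext:
    # Fresh random values, so no context is deducible from any other.
    # 128-bit key and 112-bit salt, matching the default SRTP sizes.
    return CryptoContext(secrets.token_bytes(16), secrets.token_bytes(14))


@dataclass
class Node:
    name: str
    hbh: dict = field(default_factory=dict)  # peer name -> hbh context
    e2e: dict = field(default_factory=dict)  # CCI value -> e2e context


def hbh_tag(ctx: CryptoContext, packet: bytes) -> bytes:
    # Illustrative hop-by-hop tag: HMAC-SHA1 truncated to 80 bits.
    # Simplification: keyed with the master key, not a derived session key.
    return hmac.new(ctx.master_key, packet, hashlib.sha1).digest()[:10]


sender, middlebox, receiver = Node("S"), Node("M"), Node("R")

# One e2e context shared by S and R only, selected by a hypothetical CCI value.
cci = 1
e2e_ctx = new_context()
sender.e2e[cci] = receiver.e2e[cci] = e2e_ctx

# Two independent hbh contexts, one per hop.
hbh1, hbh2 = new_context(), new_context()
sender.hbh["M"] = middlebox.hbh["S"] = hbh1
middlebox.hbh["R"] = receiver.hbh["M"] = hbh2

# M can verify traffic from S using hbh context 1 ...
packet = b"rtp-header" + b"e2e-protected-payload"
tag = hbh_tag(sender.hbh["M"], packet)
middlebox_accepts = hmac.compare_digest(tag, hbh_tag(middlebox.hbh["S"], packet))

# ... but it holds no e2e context, so it has no key for the media itself.
middlebox_has_e2e = bool(middlebox.e2e)
```

In this sketch, the receiver would look up `receiver.e2e[cci]` to select the e2e context for an incoming payload, while the middlebox only ever touches its two hbh contexts.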
A sender will use two cryptographic contexts: an e2e context used for payload protection to the end-receiver, and a hbh context used to secure the SRTP transport to the (first) intermediary. Similarly, the end-receiver will use two contexts. An intermediary node however, will only use one standard SRTP context for each session. In other words, an e2e context is used to achieve transport independent media protection as required in Section 5, and an hbh context is similarly used to achieve transport protection. For both e2e and hbh contexts, it is assumed that cryptographic context parameters, such as master key and salt (if needed) are included. From these, session keys/salts are derived similarly to [RFC3711]. If several senders' payloads are multiplexed within the same stream from a server to a receiver (as discussed in Section 4.3.4) the receiver may need to switch between e2e contexts within an ongoing hbh session. This can be implemented using a mechanism similar to the SRTP MKI field in the e2e context (what is referred to as CCI above). The hbh context would, however, not need any change but could rely on an MKI field according to the current definition in [RFC3711]. 6.3. Store-and-Forward Packet Format The packet format is composed of an "inner" e2e (sender-receiver) part embedded in an "outer" hbh (sender-middlebox or middlebox- receiver) part. With fields and processing as defined above, the SRTP store-and- forward packet format should look approximately like Figure 3 +------------+-------------------+-----+-----+-----+-----+-----+-----+ | hbh + e2e | e2e | e2e | e2e | hbh | hbh | hbh | | RTP Header + Encrypted Payload | PUV | SSS | MAC | CCI | MKI | MAC | +------------+-------------------+-----+-----+-----+-----+-----+-----+ Blom, et al. 
Figure 3: SRTP store-and-forward packet format

The additional fields added by the inner e2e security processing are:

   o SSS: SRTP SaF Source is a value used by the SRTP SaF transform as an identifier for the SaF source within a SaF e2e session. Thus, SSS MUST be unique for all SaF sources within the SaF e2e session.

   o PUV: Packet Unique Value for the e2e transform. The PUV shall be unique for each e2e encrypted payload generated by a SaF source within a SaF e2e session.

   o MAC (e2e): This field is used to carry payload authentication data e2e.

   o CCI: Crypto Context Identifier is used to signal, on an hbh basis, which e2e cryptographic context to use.

The hbh RTP header, hbh MAC, and hbh MKI are in one-to-one correspondence with the respective fields of [RFC3711] and will not be discussed further.

6.4. Replay Protection

When the RTP data is hbh transport protected between server and receiver, replay protection on the transport level is provided, as the hbh protection offers the same security features as [RFC3711]. As mentioned, it is assumed that the server is trusted not to attempt replay of data on the media level unless the user requests it; this is in line with the trust model.

It is possible to implement replay protection on the media level for e2e transforms when the PUV is a counter. This has to be done on the application layer for the applications that require it.

7. Commented Example Usage

In this example use case, it is assumed that a single sender S wants to send a single e2e protected media stream to a receiver R. We make the natural (and necessary) assumption that the sender is made aware (e.g. by session setup signaling) that the media will be delivered/stored in a middlebox M. Similarly, we assume the middlebox is aware that it is acting as a middlebox.

We assume the crypto contexts are defined to provide
   o Integrity and confidentiality e2e (the media part)

   o Integrity hbh (the transport part)

Clearly, other combinations are also possible. Any of the 15 possible (non-trivial) combinations of the security services confidentiality and integrity for the hbh and the e2e part could be specified for use. However, we feel that integrity and confidentiality on an e2e basis combined with hbh integrity will be sufficient in most cases.

How the crypto contexts are set up (which key management protocol to use, etc.) is out of scope. Still, it can be noted that in principle it could be done by having e.g. two MIKEY [RFC3830] exchanges, one between S and M and one between S and R.

1. S defines an e2e crypto context and forwards it to R. The e2e protection is configured to use both integrity and confidentiality protection. Note that for store-and-forward operation, the e2e crypto context has to be decided unilaterally by the sender.

2. S sets up an SRTP session with M, to have data forwarded to R; an hbh crypto context is agreed between them. The hbh context defines transport authentication and NULL transport encryption, which corresponds to transforms defined for [RFC3711].

3. S starts to transmit SRTP towards M, in effect using k_e2e for e2e media protection and k_hbh for hbh transport authentication.

4. Since M is aware of its role as a (receiving) middlebox, M configures itself to verify integrity but not to decrypt the payload. M stores the (protected) payloads together with relevant side information to be used when the media is forwarded. Note that M would perform exactly the same operations when storing unprotected media for later forwarding.

5. Later, R sets up a session with M to render the stored media. As R contacts a middlebox, an hbh crypto context, independent of the previous contexts, is agreed between R and M.
   In the reply, M includes the e2e context that was received from S.

6. Since M is aware of its role as a (sending) middlebox, the middlebox configures itself not to encrypt the payloads but only to add hbh transport authentication. M then transmits the authenticated media stream to R.

7. When receiving the SRTP packets from M, R first verifies the hbh transport authentication and then checks e2e media authentication and decrypts the payloads to retrieve the plaintext media.

8. Implications on SRTP

As the SRTP specification allows new transforms, the new transforms can be added with only minor implications. The handling of dual security contexts (in the endpoints) is, however, a new feature, which will have to be introduced in SRTP.

The Key Derivation Function defined in [RFC3711] can be reused for both the e2e and the hbh security contexts.

9. Security Considerations

9.1. Media Protection Transform

Any fixed keystream output generated from the same inputs (i.e. key and IV) MUST be used to encrypt only once. Reusing such a keystream (commonly called a "two-time pad") would almost certainly compromise security. The new e2e transform accomplishes packet-uniqueness by inclusion of the PUV, and stream-uniqueness by inclusion of the SSS, in the IV formation. Thus, the SSS MUST be unique among all the RTP streams within the same RTP session that share the same e2e master key. Master keys MAY be shared between streams belonging to the same RTP session, but it is RECOMMENDED that each stream have its own master key.

With the above conditions fulfilled, the security level of the media protection transform will equal the level offered by [RFC3711].

9.2. Replay Protection

Replay protection is only provided on an hbh basis.
Note that the requirements on random seek in the media stream rule out any general replay protection mechanism applied on an e2e basis, and that this threat falls outside the assumed trust model. Still, the PUV offers the possibility to implement application-specific replay protection mechanisms.

10. Acknowledgements

The authors would like to thank Daniel Catrein, Steffen Fries, Frank Hartung, and Magnus Westerlund for their support and valuable comments.

11. IANA Considerations

To signal that the new transforms are used, each relevant key management protocol needs to register the new transforms, including numbering scheme and syntax, with IANA.

12. References

12.1. Normative References

[I-D.naslund-srtp-saf]  Blom, R., Cheng, Y., Lindholm, F., Mattsson, J., Naslund, M., and K. Norrman, "The Use of the Secure Real-time Transport Protocol (SRTP) in Store-and-Forward Applications", draft-naslund-srtp-saf-03 (work in progress), October 2009.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

12.2. Informative References

[3GPP.26.234]  3GPP, "Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs", 3GPP TS 26.234 8.3.0, June 2009.

[RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K. Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830, August 2004.

Appendix A.
Key Management

This informative appendix discusses possible ways to establish SRTP cryptographic contexts for store-and-forward scenarios.

As described above, there are two cryptographic contexts, i.e., an e2e context and an hbh context, and they should be independent of each other.

An hbh context is identified by the triplet (SSRC, destination network address, destination transport port number) as defined in [RFC3711]. All currently available key management protocols that support SRTP, e.g. MIKEY, SDES, and DTLS-SRTP, can be used between sender/receiver and middlebox, or between two middleboxes, for negotiating hbh master keys and other security parameters.

The e2e context must also be identified, and the identifier can be any transport-independent value that uniquely determines the cryptographic context between a sender and a receiver. For instance, the sender could assign a unique ID to the content to be transmitted and use such a Content ID (CID) to identify the e2e context. The CID is then sent to the middlebox at session setup time, and the CID and the e2e context are sent to the receiver at any time before the receiver is to render the media.

Note that the CID discussed here is not the same as the proposed CCI. The CCI may be thought of as a short, hop-specific, in-band alias for the CID and is only used on an hbh basis. The mapping between CID and CCI is then sent out-of-band for each hop, e.g. at session setup for the respective hop. The receiver can thus (eventually) map the CCI received in SRTP packets to the correct CID and retrieve the corresponding e2e cryptographic context.

Therefore, for the e2e context, additional information, i.e. the CID and the (CID, CCI) mapping, needs to be transmitted along with the key management protocol messages.

Below we give two examples, addressing the media distribution and answering machine use cases respectively. In the examples we use MIKEY over SIP/RTSP, but other key management protocols that support SRTP can also be used.

A.1.
Key Management Example for Media Distribution

An example of a session setup sequence for a media distribution use case (e.g. Video on Demand) is shown in Figure 4.

An end user (R) sends a SIP INVITE to the media service (S) to request the delivery of certain content. S replies with a 200 OK message, which includes the CID and a MIKEY message containing the e2e master key and other parameters. In case of pre-encrypted content, the e2e context is the same for all users that are authorized to play the content. The pre-encrypted content is stored in the streaming server (M).

When the end user wants to play the content, R sends an RTSP DESCRIBE message to M in order to obtain the session description. M replies with 200 OK, carrying a MIKEY message for setting up the hbh context between M and R.

+---+                             +---+                            +---+
| S |                             | M |                            | R |
+---+                             +---+                            +---+

                                 INVITE
  <-------------------------------------------------------------------
                      200 OK {MIKEY e2e S-R, CID}
  ------------------------------------------------------------------->
                                  ACK
  <-------------------------------------------------------------------

                                                 DESCRIBE
                                    <---------------------------------
                                       200 OK {MIKEY hbh M-R, CID}
                                    --------------------------------->
                                                  SETUP
                                    <---------------------------------
                                                  200 OK
                                    --------------------------------->

Figure 4: Session setup sequence for media distribution

A.2. Key Management Example for Answering Machine

Typically, a caller (S1) tries to reach the intended callee (R) directly. If R is not online, S1 is notified and redirected to an answering machine (M). S1 then knows it should run SRTP SaF. To signal that, S1 sends an INVITE with two MIKEY messages, one for setting up the e2e context between S1 and R, and the other for the hbh context between S1 and M. M cannot process the first MIKEY message but stores it. By processing the second MIKEY message, M agrees on the hbh context with S1.
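The way M handles an e2e MIKEY message it cannot process might be sketched as follows. This is only an illustration under stated assumptions: the class AnsweringMachine and its methods are hypothetical, the e2e message is treated as an opaque byte string, and letting M number the CCIs sequentially for the later hop towards R is an assumption of this sketch, not something the text prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class AnsweringMachine:
    """Hypothetical middlebox M that files e2e key messages it cannot read."""
    stored: dict = field(default_factory=dict)  # CCI -> opaque e2e MIKEY message
    next_cci: int = 1

    def accept_call(self, e2e_mikey: bytes) -> int:
        # M never parses e2e_mikey; it only stores it under a fresh CCI.
        cci = self.next_cci
        self.next_cci += 1
        self.stored[cci] = e2e_mikey
        return cci

    def reply_to_receiver(self) -> list:
        # Return every stored message together with its CCI mapping.
        return sorted(self.stored.items())


m = AnsweringMachine()
cci_s1 = m.accept_call(b"opaque MIKEY e2e S1-R")
```

The hbh MIKEY exchange is deliberately left out, since it follows ordinary SRTP key management between the caller and M.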
Another caller (S2) also wants to talk to R. Similarly, an hbh context is established between S2 and M, and M stores the e2e MIKEY message from S2 that is intended for R.

Later, when R comes online and tries to retrieve stored data from M, R sends an INVITE to M and negotiates the hbh context between them. In the reply, M includes the two MIKEY messages carrying the e2e contexts that were received from S1 and S2 respectively, and adds the mappings between contexts and CCIs.

A session setup sequence is shown in Figure 5.

+---+        +----+       +----+                                   +---+
| R |        | S1 |       | S2 |                                   | M |
+---+        +----+       +----+                                   +---+

                       INVITE {MIKEY hbh S1-M, MIKEY e2e S1-R}
                ----------------------------------------------------->
                                        200 OK
                <-----------------------------------------------------
                                         ACK
                ----------------------------------------------------->

                              INVITE {MIKEY hbh S2-M, MIKEY e2e S2-R}
                             ---------------------------------------->
                                              200 OK
                             <----------------------------------------
                                               ACK
                             ---------------------------------------->

                         INVITE {MIKEY hbh R-M}
  ------------------------------------------------------------------->
        200 OK {(MIKEY e2e S1-R, CCI1), (MIKEY e2e S2-R, CCI2)}
  <-------------------------------------------------------------------
                                  ACK
  ------------------------------------------------------------------->

Figure 5: Session setup sequence for answering machine

Authors' Addresses

Rolf Blom
F. Lindholm
SE-164 80 Stockholm
Sweden
Phone: +46 10 71 31 707
Email: rolf.j.blom@ericsson.com

Yi Cheng
F. Lindholm
SE-164 80 Stockholm
Sweden
Phone: +46 10 71 17 589
Email: yi.cheng@ericsson.com

Fredrik Lindholm
Ericsson AB
SE-164 80 Stockholm
Sweden
Phone: +46 10 71 31 705
Email: fredrik.lindholm@ericsson.com

John Mattsson
Ericsson
SE-164 80 Stockholm
Sweden
Phone: +46 10 71 43 501
Email: john.mattsson@ericsson.com

Mats Naslund
Ericsson
SE-164 80 Stockholm
Sweden
Phone: +46 10 71 33 739
Email: mats.naslund@ericsson.com

Karl Norrman
Ericsson
SE-164 80 Stockholm
Sweden
Phone: +46 10 71 44 502
Email: karl.norrman@ericsson.com