Network Working Group P. Jones (Ed.) Internet Draft N. Ismail Intended status: Informational D. Benham Expires: September 7, 2015 N. Buckles Cisco Systems J. Mattsson Y. Cheng Ericsson R. Barnes Mozilla March 7, 2015 Requirements for Private Media in a Switched Conferencing Environment draft-jones-avtcore-private-media-reqts-01 Abstract This document specifies the requirements for ensuring the privacy and integrity of real-time media flows between two or more endpoints communicating in a switched conferencing environment. This document also provides a high-level overview of switched conferencing in order to establish a common understanding of the goals and objectives of this work. Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on September 7, 2015. Copyright Notice Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of Jones, et al. Expires September 7, 2015 [Page 1] Internet-Draft Private Media Requirements March 2015 publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction...................................................2 2. Requirements Language..........................................3 3. Terminology....................................................3 4. Background.....................................................4 5. Motivation for Private Media in Switched Conferencing..........5 5.1. Switched Conferencing in Cloud Services...................5 5.2. Private Media Security through Switching..................7 6. Private Media Trust Model......................................8 6.1. Trusted Elements..........................................9 6.2. Untrusted Elements.......................................10 7. Goals and Non-Goals...........................................11 7.1. Goals....................................................11 7.1.1. Ensure End-To-End Confidentiality...................11 7.1.2. Ensure End-To-End Source Authentication of Media....11 7.1.3. Provide a More Efficient Service than "Full-Mesh"...11 7.1.4. Support Cloud-Based Conferencing....................12 7.1.5. Limiting a User's Access to Content.................12 7.1.6. Compatibility with the WebRTC Security Architecture.12 7.2. Non-Goals................................................13 7.2.1. Securing the Endpoints..............................13 7.2.2. Concealing that Communication Occurs................13 7.2.3. Individual Media Source Authentication..............13 7.2.4. Support for Multicast in Switched Conferencing......14 8. Requirements..................................................14 9. IANA Considerations...........................................15 10. Security Considerations......................................15 11. References...................................................16 11.1. Normative References....................................16 11.2. Informative References..................................16 12. Acknowledgments..............................................16 13. Contributors.................................................17 Authors' Addresses...............................................18 1. Introduction Users of multimedia communication products and services have privacy expectations that are largely satisfied with the use of SRTP [RFC3711] and related technologies when communicating point-to-point over the Internet. When communicating in a conferencing environment with two or more participants, though, it is necessary for an endpoint to share the SRTP master key and salt with the conference Jones, et al. Expires September 7, 2015 [Page 2] Internet-Draft Private Media Requirements March 2015 server so that it can authenticate and decrypt received RTP and RTCP packets. The conference server also needs the master key and salt in order to transmit media packets it receives to other participants in the conference. The need for conferencing servers to have the master key is a security risk for users. Within a corporate or other isolated environment where conferencing servers are tightly controlled, this security risk can be effectively managed. However, managing this risk is becoming increasing difficult as conferencing resources are being deployed in networks that are less trusted, including virtualized conferencing servers deployed in cloud environments. There are also public voice and video conferencing service providers in which users must place full trust in order to use those services, as it is necessary for an endpoint to share the SRTP master key with those conferencing servers. This exposes corporations, for example, to a higher risk of being subjected to corporate espionage. While it is not the intent of this draft to suggest that any existing service provider would permit or condone any illicit use of its service, the fact is that security threats can come from external sources and remain undiscovered for long periods of time. It is possible to ensure communication privacy within the context of a switched conferencing environment with limited changes in the security mechanisms used today. This document discusses this possibility in more detail and presents a set of requirements for meeting this objective. 2. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] when they appear in ALL CAPS. These words may also appear in this document in lower case as plain English words, absent their normative meanings. 3. Terminology Adversary - An unauthorized entity that may attempt to compromise the performance of a conference server through various means, including, but not limited to, the transmission of bogus media packets or attempt to gain access to the plaintext of the media. Media content - the portion of the RTP (i.e., the encrypted RTP payload) or other packet containing the actual audio, video, or other multimedia information that is considered confidential and is subject to end-to-end encryption. This does not include, for example, RTP headers, RTP header extensions, or RTCP packets. Jones, et al. Expires September 7, 2015 [Page 3] Internet-Draft Private Media Requirements March 2015 Switching conference server - A conference server that does not decrypt RTP media flows or perform processing on the media payload, but instead simply forwards the received media from a sender to the other participants in a multimedia conference. A switching conference server may modify some RTP headers. 4. Background Traditional multimedia conferencing servers would mix, transcode, transrate, and/or recompose media flows from one or more conference participants, sending out a different audio and video flow to each participant. For audio, this might entail mixing some number of input flows that appear to contain audio intended to be heard by the other participants, with each participant receiving a flow that does not contain that participant's own audio. For video, the conference server may elect to send only video showing the current active speaker, a tiled composition of all participants or the most recent active speakers, a video flow with the active speaker presented prominently with other participants presented as thumbnail images, or some other composite arrangement. It is also common for audio or video to be transcoded. A typical traditional conferencing server is depicted in Figure 1. +-------------------+ +---+ --{A}--> | | <--{C}-- +---+ | A | | Media Composition | | C | +---+ <-{BCD}- | | -{ABD}-> +---+ | Transcoders | +---+ --{B}--> | Transraters | <--{D}-- +---+ | B | | | | D | +---+ <-{ACD}- | Decrypt/Encrypt | -{ABC}-> +---+ +-------------------+ Figure 1 - Traditional Conferencing Server Traditional conference servers require a significant amount of processing power, which in turn translates into a high cost for conferencing hardware manufacturers. Significantly, too, it is very difficult to deploy these servers in a cloud environment due to the high processing demands, as the specialized hardware found in the traditional voice and video conferencing server does not exist in a cloud environment. To enable the traditional conferencing server to perform its job, the server establishes an SRTP session with each of the conference participants so that it can get the keys required to decrypt and encrypt media flows from and to each participant. This means that the conference server is necessarily a fully trusted entity in the communication path. Anytime these servers are deployed in a network that is not tightly controlled, it increases the risk that an attacker might gain access to cryptographic key material, thus Jones, et al. Expires September 7, 2015 [Page 4] Internet-Draft Private Media Requirements March 2015 allowing the attacker to be able to see and listen to ongoing conferences. In some instances, depending on how the hardware is designed and how keys and certificates are managed, it might be possible for an attacker to see and listen to previously recorded conferences or future conferences. The Secure Real-time Transport Protocol (SRTP) [RFC3711] is a profile of RTP, which can provide confidentiality, message authentication, and replay protection to the RTP traffic and to the RTP Control Protocol (RTCP). Encryption of header extension in SRTP [RFC6904] provides a mechanism extending the mechanisms of [RFC3711], to selectively encrypt RTP header extensions in SRTP. [RFC3711] and [RFC6904] solves end-to-end use cases between two endpoints, and does not consider use cases where a sender delivers media to a receiver via a cloud-based conferencing service. 5. Motivation for Private Media in Switched Conferencing 5.1. Switched Conferencing in Cloud Services There is a trend in the industry for enterprises to use cloud services to host multi-party conferences and meet-me services, either exclusively or to meet peak loads on-demand. At the same time, there is shift toward using light-weight, cost-effective switching conference servers in cloud services that do not necessarily need to mix audio or composite/transcode video. Also fueling the use of such light-weight conference servers is the desire to fully exploit virtualized computing resources and dynamic scalability potential available in cloud computing environments. The increased use of cloud services has exposed a problem. There are two different trust domains from a media perspective: endpoints and other devices in a trusted domain, and conference servers controlled by the cloud service in an untrusted domain. Other examples of conference devices spread across trusted and untrusted domains are likely, but the cloud service trend is triggering the urgency to address the need to allow for lightweight media conference while enabling media privacy at the same time. With a switching conference server, each participant transmits media to the server as it would with a traditional conferencing server. However, the switching conference server merely forwards media to the other participants in the conference (where the other participant may be associated with a cascaded conference server or an endpoint on the same server), leaving composition to the receiving endpoint. Since some endpoints may have a limited amount of bandwidth, each endpoint might negotiate with the switching conference server to receive only a subset of the available media flows. Each transmitting endpoint might also send multiple media flows of varying frame sizes and/or frame rates (e.g., simulcast or scalability layers), so that the server can select the streams most appropriate for each receiver's Jones, et al. Expires September 7, 2015 [Page 5] Internet-Draft Private Media Requirements March 2015 bandwidth and capabilities. This allows, for example, an endpoint to receive and display higher quality video for the active speaker and thumbnails for other participants. It is also worth noting that, for switched media to work successfully, each endpoint in the conference must support the media formats transmitted by all other entities in the conference. More modern endpoints support multiple codecs and formats, making this commercially practical. Figure 2 depicts an example of a switching conference server wherein each participant is receiving the media flows transmitted by each of the other participants in the conference. +--------------------+ +---+ --{A}--> | | <-{C}--- +---+ | A | <-{B}--- |Switching Conference| --{A}--> | C | | | <-{C}--- | Server | --{B}--> | | +---+ <-{D}--- | | --{D}--> +---+ | Packet | +---+ --{B}--> | Authentication | <-{D}--- +---+ | B | <-{A}--- | | --{A}--> | D | | | <-{C}--- | | --{B}--> | | +---+ <-{D}--- | Media Privacy | --{C}--> +---+ +--------------------+ Figure 2 - Switching Conference Server Note - The use of multiple arrows directed toward each endpoint is not intended to suggest the use of separate RTP sessions. By using methods such as those described in [RFC6464], it is possible for the switching conference server to transmit the appropriate audio and video flows to conference participants without having knowledge of the contents of the encrypted media. The examples that follow help to illustrate this point. In the Figure 3 below, endpoints A, B and D receive the video streams from endpoint C, the currently active speaker, which is receiving video from endpoint A, the previous active speaker. Later when endpoint B becomes the active speaker (Figure 4), endpoints A, C and D will start to receive video from B, while endpoint B continues to receive video from endpoint C. Finally in Figure 5, endpoint A becomes the active speaker. Jones, et al. Expires September 7, 2015 [Page 6] Internet-Draft Private Media Requirements March 2015 +--------------------+ +---+ --{A}--> | | <--{C}-- +---+ | A | |Switching Conference| | C |* +---+ <-{C}--- | Server | ---{A}-> +---+ | | +---+ --{B}--> | | <--{D}-- +---+ | B | | | | D | +---+ <-{C}--- | | ---{C}-> +---+ +--------------------+ Figure 3 - Endpoint "C" is the Active Speaker +--------------------+ +---+ --{A}--> | | <--{C}-- +---+ | A | |Switching Conference| | C | +---+ <-{B}--- | Server | ---{B}-> +---+ | | +---+ --{B}--> | | <--{D}-- +---+ *| B | | | | D | +---+ <-{C}--- | | ---{B}-> +---+ +--------------------+ Figure 4 - Endpoint "B" is the Active Speaker +--------------------+ +---+ --{A}--> | | <--{C}-- +---+ *| A | |Switching Conference| | C | +---+ <-{B}--- | Server | ---{A}-> +---+ | | +---+ --{B}--> | | <--{D}-- +---+ | B | | | | D | +---+ <-{A}--- | | ---{A}-> +---+ +--------------------+ Figure 5 - Endpoint "A" is the Active Speaker Switched conferencing can also enable conferences to scale to include many more simultaneous participants than would be possible with a traditional conferencing server. Like traditional conferencing servers, switching conference servers can also be cascaded or interconnected in a meshed topology to increase the size of the conference without putting undue burden on any particular server. 5.2. Private Media Security through Switching A traditional conferencing server, or MCU, establishes an SRTP session with each participating endpoint separately, and needs to decrypt packets containing media presented to other endpoints. By using a switching conference server, it is possible to keep the media encryption keys private to the endpoints such that the conference server does not have access to the keys used for media encryption. Jones, et al. Expires September 7, 2015 [Page 7] Internet-Draft Private Media Requirements March 2015 The switching conference server just forwards media received to each of the other participants in the conference. This provides for a significantly improved security model, as one can, for example, utilize conferencing resources in the cloud that do not necessarily have to be trusted. That said, there may be situations where the switching conference server needs to modify the RTP packet received from an endpoint, such as by adding or removing an RTP header extension, modifying the payload type value, etc. It would be the responsibility of the switching conference server to ensure that media of the expected type and containing the correct information is received by a recipient. Thus, there is a need to utilize an end-to-end encryption and authentication key (or pair of keys) and a hop-by-hop encryption and authentication key (or pair of keys). The purpose for the hop-by-hop encryption key is to optionally encrypt RTP header extensions. The current SRTP specification and related specifications do not define use of a dual-key approach presently. However, such an approach is possible and would result in ensuring the privacy of media while also enabling the more scalable switched conferencing model. The assumption is that no changes are made to SRTCP, i.e. SRTCP is protected hop-by-hop with a single security context. This dual-key model does necessitate a change in the way that keys are managed. However, the topic of key management is outside the scope of this requirements document. However, high-level assumptions like if the end-to-end contexts use a group key as SRTP master key or if individual SRTP master keys (that may be derived/negotiated from another group key) is likely to influence the solution derived from this document. 6. Private Media Trust Model The architecture suggested in this specification enables switching conference servers to be hosted in domains in which the network elements may have low trust, or where the trustworthiness is uncertain. This does not mean that the service provider is untrusted; it simply means that high trust is not required. This has the benefit of protecting the endpoints in the case of external attacks against the conference server. In this specification, certain elements are considered trusted and others are considered untrusted. Trust in the context of this specification means that the element can be in possession of the media encryption key(s) for a past, current, or potentially future conference (or portion thereof) used to protect media content. There are very few elements that need to be trusted. However, it is also recognized that in certain deployment models, some elements that Jones, et al. Expires September 7, 2015 [Page 8] Internet-Draft Private Media Requirements March 2015 are classified as untrusted might be placed into the trusted domain and considered trusted. This specification is not intended to prevent such deployment models, but it does not rely upon them. Each of the elements discussed below has a direct or indirect relationship with each other. The following diagram depicts the trust relationships described in the following sub-sections and the media or signaling interfaces that exist between them, showing the trusted elements on the left and untrusted elements on the right. Note that this is a logical diagram and functional elements may be co-located or further divided into multiple separate physical entities. Note that it is not necessary that every interface exist between all elements, such as both an interface from the endpoint and call processing function to a key management function, though both are possible options. | | +--------------------------------------------+ v | | +----------+ | +-----------------+ | | Endpoint |--------------> | Call Processing | | +----------+ | +-----------------+ | ^ | ^ ^ | Trusted | | | | +------+ Elements | | | | | | +-----------------------+ | | | | | v v | | | +----------------------+ | | +--------------> | Switching Conference | | | | | | Server | v v v | +----------------------+ +----------------+ | | Key Management | | Untrusted | Function | | Elements +----------------+ | | | Figure 6 - Relationship of Trusted and Untrusted Elements 6.1. Trusted Elements The endpoint is considered a trusted element, as it will be sourcing media flows transmitted to other conference participants and will be receiving media for rendering for the human user. While it is possible for an endpoint to be compromised and perform in unexpected ways, such as transmitting a decrypted copy of media content to an adversary, such security issues and defenses are outside the scope of this document. Jones, et al. Expires September 7, 2015 [Page 9] Internet-Draft Private Media Requirements March 2015 The other trusted element is a key management function (KMF). This function is responsible for providing cryptographic keys to the endpoints for encrypting and authenticating media content. The KMF is also responsible for providing cryptographic keys to the conferencing resources to enable authentication of media packets received by a conference participant. Interaction between the KMF and untrusted call processing functions may be necessary to ensure conference participants are delivered the appropriate keys or are directed to the appropriate conference server. It is expected that the KMF will be tightly controlled and managed to prevent exploitation by an adversary, as any kind of security compromise of the KMF puts the security of all conferences at risk. 6.2. Untrusted Elements The call processing function is responsible for such things as authenticating the user, signing messages, and processing call signaling messages. This element is responsible for ensuring the integrity, and optionally the confidentiality, of call signaling messages between itself, the endpoint, and other network elements. However, it is considered an untrusted element for the purposes of this specification, as it cannot be trusted to have access to or be able to gain access to cryptographic key material that provides privacy and integrity of media packets. There might be several independent call processing functions within an enterprise, service provider network, or the Internet that are classified as untrusted. Any signaling information that passes through these untrusted entities is subject to inspection by that element and might be altered by an adversary. Likewise, there may be certain deployment models where the call processing function is considered trusted. In such cases, trusted call processing functions MUST take responsibility for ensuring the integrity of received messages before delivering those to the endpoint. How signaling message integrity is ensured is outside the scope of this document, but might use such methods as defined in [RFC4474]. The final element is the switching conference server, which is responsible for forwarding encrypted media packets and conference control information to endpoints in the conference. It is also responsible for conveying secured signaling between the endpoints and the key management function, acquiring per-hop authentication keys from the KMF, and performing per-hop authentication operations for media packets. This function might also aggregate conference control information and initiate various conference control requests. Forwarding of media packets requires that the switching conference server have access to RTP headers or header extensions and potentially modify those message elements, but the actual media content MUST not be decipherable by the switching conference server. Jones, et al. Expires September 7, 2015 [Page 10] Internet-Draft Private Media Requirements March 2015 Further, the switching conference server does not have the ability to determine whether an endpoint is authorized to have access to media encryption keys. Merely joining a conference MUST NOT be interpreted as having authority. Media encryption keys are conveyed to the endpoint by the KMF in such a way as to prevent the switching conference server from having access to those keys. It is assumed that an adversary might have access to the switching conference server and have the ability to read any of the contents that pass through. For this reason, it is untrusted to have access to the media encryption keys. As with the call processing functions, it is appreciated that there may be some deployments wherein the switching conference server is trusted. However, for the purposes of this specification, the switching conference server is considered untrusted so that we can ensure to develop a solution that will work even in the more hostile environments. 7. Goals and Non-Goals 7.1. Goals 7.1.1. Ensure End-To-End Confidentiality The content of the communication and all media needs to be confidential within the group of entities explicitly invited into the conference. An external monitoring adversary should not be able to deduce the human-to-human communication that actually occurred from capturing the media packets. At the same time, it is necessary to allow switching media servers to manipulate certain RTP header fields like the payload type value. 7.1.2. Ensure End-To-End Source Authentication of Media In a conference system with multiple participants it is vital that the media content presented to any of the human participants is from the stated participant, and not an adversary that attempts to inject misleading content. Nor should an adversary be able to fool the system into becoming a trusted party in the conference. Only explicitly invited parties shall be able to contribute content. 7.1.3. Provide a More Efficient Service than "Full-Mesh" A multi-party conference that has the goals of confidentiality and source authentication can be established as a "full mesh" (i.e., each participating endpoint directly addresses each of the other participants). However, this has a significant issue with the amount of consumed resources in both the uplink and the downlink from each participant. Jones, et al. Expires September 7, 2015 [Page 11] Internet-Draft Private Media Requirements March 2015 A switched conferencing model would yield the efficiencies desired. 7.1.4. Support Cloud-Based Conferencing To achieve cost-effective and scalable conferencing, it must be possible to run the conference server instances in a cloud-based virtualized environment. From a security standpoint, this is a significant issue since the virtualized server instance and the underlying hardware and software upon which it runs might not be secure from an adversary. 7.1.5. Limiting a User's Access to Content Since an invited user will be provided with the content protection keys, the user can decrypt content from time periods before and after the user joined the conference. However, this is not always desirable. It should be possible to re-key the content protection keys every time a user joins or leaves the conference so each particular set of conference participants uses a unique key. This also changes the trust level required on the conference roster handling at any point and how to keep that accurate and secured. It should be noted that timely completion of the re-keying operations become an obstacle in system design and operation. Thus, it is a goal to allow for this possibility when it is deemed essential, but it should not be a requirement on a system to re-key each time the participant list changes. 7.1.6. Compatibility with the WebRTC Security Architecture It is a goal of this work to ensure compatibility with the WebRTC security architecture as described in [I.D-rtcweb-security-arch]. As an example, local resources that are considered a part of the trusted computing base (TCB), such as keying material derived using DTLS- SRTP, will remain within the TCB and not exposed to untrusted entities. The browser is reliant on an external calling service to convey signaling information that may open the door for a man-in-the-middle attack, such as the conveyance of certificate fingerprints over the interface between the browser and the calling service. However, as described in [I.D-rtcweb-security-arch], the browser may utilize additional services, such as a trusted identify provider, to mitigate such risks. Having said the foregoing, this document does not aim to define requirements for end-to-end security for the WebRTC data channel. Jones, et al. Expires September 7, 2015 [Page 12] Internet-Draft Private Media Requirements March 2015 7.2. Non-Goals 7.2.1. Securing the Endpoints The security of a communication session requires that the endpoints are not compromised and that the users are trustworthy. If not, credentials and decrypted content may be shared with third parties. However, this is hard to prevent through system design. Thus, it should be assumed that the endpoint is secure and the user is trustworthy; how to achieve this is out of scope this document. 7.2.2. Concealing that Communication Occurs A non-goal is to attempt to prevent a pervasive monitoring adversary from knowing that the communication session has occurred. The reason for excluding this as a goal is that it is extremely difficult to achieve, as a pervasive monitoring adversary can be expected to be able to have knowledge of all IP flows that enter or exit local ISPs, across links that straddle nation borders or internet exchange points. To hide the fact communication occurred, the flows required to achieve the communication session need to be highly difficult to correlate between different legs of the communication. At this stage this is deemed too difficult to attempt and will need to be a subject for further study. Existing attempts include The Onion Router (TOR), against which it has been claimed to be possible to monitor, at least partially, by an adversary with sufficient reach. Also of consideration is that trying to conceal the fact that communication occurred actually makes it more difficult for network administrators to effectively manage and troubleshoot issues with conference calls. 7.2.3. Individual Media Source Authentication Although the participants in the conference are authenticated, it is not a goal to provide source authentication of the media at the individual user level, instead being satisfied with being able to authenticate media as coming from an invited conference participant or not. There exist solutions that can provide individual media source authentication (e.g., TESLA). However, they impact the performance or security properties they provide. Thus, further study is required to determine impact and resulting security properties if desired to have individual source authentication. Jones, et al. Expires September 7, 2015 [Page 13] Internet-Draft Private Media Requirements March 2015 7.2.4. Support for Multicast in Switched Conferencing Multicast traffic is, by design, transmitted to every participant in a conference. The focus of this document is only on centralized unicast conferencing that utilizes a switched conferencing architecture. 8. Requirements The following are the security solution requirements for switched conferencing that enable end-to-end media privacy between all conference participants. Note that while some switching media servers might be fully trusted entities, the intent of this solution and purpose for these private media (PM) requirements is to address those servers that are not fully trusted. PM-01: Switching conference server MUST be able to switch the media between participants in a conference without having access to unencrypted media content. PM-02: Solution MUST maintain all current SRTP security goals, namely the ability to provide for end-to-end confidentiality, provide for hop-by-hop replay protection, and ensure hop-by- hop and end-to-end message integrity. {Editor's Note: Question asked, "Does this include third parties?" Jonathan Lennox to suggest ways to make this more concrete.} PM-03: Solution MUST extend replay protection to cover each hop in the media path, both ensuring that any received packet is destined for the recipient and not a duplicate. PM-04: Keys used for end-to-end encryption and authentication of RTP payloads and other information deemed unsuitable for access by the switching conference server MUST NOT be generated by or accessible to any component that is not in the fully trusted domain. PM-05: The switching conference server MUST be capable of making changes to the RTP header and, optionally, the RTP header extensions. PM-06: The SRTP cryptographic context, which is identified in part by an SSRC, contains transform-independent parameters used by the sending endpoint, including the RTP packet sequence number and rollover counter (ROC), required for packet decryption and authentication that, along with the value of the SSRC, MUST be protected end-to-end. Jones, et al. Expires September 7, 2015 [Page 14] Internet-Draft Private Media Requirements March 2015 PM-07: The switching conference server, or any entity that is not fully trusted, MUST NOT be involved in the user or device authentication for the purpose of media key distribution. PM-08: The switching conference server MUST be able to switch an already active SRTP stream to a new receiver, while guaranteeing the timely synchronization between the SRTP context of the transmitter and its current and new receivers. PM-09: It MUST be possible for the switching conference server to determine if a received media packet was transmitted by a conference participant in possession of the end-to-end media encryption keys and hop-by-hop authentication keys. PM-10: It MUST be possible for a conference to be optionally re- keyed as desired, such as each time a participant joins or leaves the conference. {Editor's note: Who is allowed to know who leaves and joins? Do you trust the conference server to tell you reliably?} PM-11: Any solution satisfying this requirements specification MUST provide for a means through which WebRTC-compliant endpoints can participate in a switched conference using private media as outlined herein. PM-12: All RTP senders, including the switching conference server, MUST adhere to all congestion control requirements that are required by the RTP profile and topology in use, including RTP circuit breakers [I.D-ietf-avtcore-rtp-circuit-breakers]. Since the switching conference server is unable to perform transcoding or transrating that requires access to the unencrypted media, its reaction to congestion signals is often limited to dropping packets that would otherwise be forwarded in the absence of congestion, and signaling congestion to the RTP source. This is similar to the congestion control behavior of the Media Switching Mixer and Selective Forwarding Middlebox/Unit in [I.D-ietf-avtcore-rtp- topologies-update]. 9. IANA Considerations There are no IANA considerations for this document. 10. Security Considerations [TBD] Jones, et al. Expires September 7, 2015 [Page 15] Internet-Draft Private Media Requirements March 2015 11. References 11.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004. [RFC6464] Lennox, J., Ivov, E., and E. Marocco, "A Real-time Transport Protocol (RTP) Header Extension for Client-to- Mixer Audio Level Indication", RFC 6464, December 2011. [I.D-rtcweb-security-arch] E. Rescorla, "WebRTC Security Architecture", Work in Progress, July 2014. [RFC6904] J. Lennox, "Encryption of Header Extensions in the Secure Real-time Transport Protocol (SRTP)", RFC 6904, December 2013. [I.D-ietf-avtcore-rtp-topologies-update] Westerlund, M., and S. Wenger, "RTP Topologies", Work in Progress, March 2015. [I.D-ietf-avtcore-rtp-circuit-breakers] Perkins, C. S., and V. Singh, "Multimedia Congestion Control: Circuit Breakers for Unicast RTP Sessions", Work in Progress, March 2015. 11.2. Informative References [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. [RFC4474] Peterson, J. and C. Jennings, "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)", RFC 4474, August 2006. 12. Acknowledgments The authors would like to thank Marcello Caramma, Matthew Miller, Christian Oien, Magnus Westerlund, Cullen Jennings, Christer Holmberg, Bo Burman, Jonathan Lennox, Suhas Nandakumar, Dan Wing, Roni Even, and Mo Zanaty for their invaluable input. Jones, et al. Expires September 7, 2015 [Page 16] Internet-Draft Private Media Requirements March 2015 13. Contributors [TBD] Jones, et al. Expires September 7, 2015 [Page 17] Internet-Draft Private Media Requirements March 2015 Authors' Addresses Paul E. Jones Cisco Systems, Inc. 7025 Kit Creek Rd. Research Triangle Park, NC 27709 USA Phone: +1 919 476 2048 Email: paulej@packetizer.com Nermeen Ismail Cisco Systems, Inc. 170 W Tasman Dr. San Jose USA Email: nermeen@cisco.com David Benham Cisco Systems, Inc. 170 W Tasman Dr. San Jose USA Email: dbenham@cisco.com Nathan Buckles Cisco Systems, Inc. 170 W Tasman Dr. San Jose USA Email: nbuckles@cisco.com John Mattsson Ericsson AB SE-164 80 Stockholm Sweden Phone: +46 10 71 43 501 Email: john.mattsson@ericsson.com Yi Cheng Ericsson SE-164 80 Stockholm Jones, et al. Expires September 7, 2015 [Page 18] Internet-Draft Private Media Requirements March 2015 Sweden Phone: +46 10 71 17 589 Email: yi.cheng@ericsson.com Richard Barnes Mozilla 331 E Evelyn Ave. Mountain View USA Email: rlb@ipv.sx Jones, et al. Expires September 7, 2015 [Page 19]