INTERNET-DRAFT M. Handley/J. Crowcroft/C. Bormann/J. Ott Expires: January 1998 ISI/UCL/Universitaet Bremen/Universitaet Bremen July 1997 The Internet Multimedia Conferencing Architecture draft-ietf-mmusic-confarch-00.txt Status of this memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.'' To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Distribution of this document is unlimited. Abstract This document provides an overview of multimedia conferencing on the Internet. The protocols mentioned are specified elsewhere as RFCs, Internet-Drafts, or ITU recommendations. Each of these specifications gives details of the protocol itself, how it works and what it does. This document attempts to provide the reader with an overview of how the components fit together and of some of the assumptions made, as well as some statement of direction for those components still in a nascent stage. This document is a product of the Multiparty Multimedia Session Control (MMUSIC) working group of the Internet Engineering Task Force. Comments are solicited and should be addressed to the working group's mailing list at confctrl@isi.edu and/or the authors. (To do for final version: fix references.) Handley/Crowcroft/Bormann/Ott [Page 1] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 1. Introduction In conjunction with computers, the term ``conferencing'' is often used in two different ways: firstly, to refer to bulletin boards and mail list style asynchronous exchanges of messages between multiple users; secondly, to refer to synchronous or so-called ``real-time'' conferencing, including audio/video communication and shared tools such as whiteboards and other applications. This document is about the architecture for this latter application, multimedia conferencing in the Internet. There are other infrastructures for teleconferencing in the world: POTS (Plain Old Telephone System) networks often provide voice conferencing and phone-bridges, while with ISDN, H.320 [1] can be used for small, strictly organised video-telephony conferencing. The architecture that has evolved in the Internet is far more general as well as being scalable to very large groups, and permits the open introduction of new media and new applications as they are devised. As the simplest case, it also allows two persons to communicate via audio only, so it encompasses IP telephony. The determining factors of a conferencing architecture are communication in (possibly large) groups of humans and real-time delivery of information. In the Internet, this is supported at a number of levels. The remainder of this section provides an overview of this support, and the rest of the document describes each aspect in more detail. In a conference, information must be distributed to all the conference participants. Early conferencing systems used a fan-out of data streams, e.g., one connection between each pair of participants, which means that the same information must cross some networks more than once. The Internet architecture uses the more efficient approach of multicasting the information to all participants (section 2). Multimedia conferences require real-time delivery of at least the audio and video information streams used in the conference. In an ISDN context, fixed rate circuits are allocated for this purpose -- whether their bandwidth is required at any particular instance or not. On the other hand, the traditional Internet service model (``best effort'') cannot make the necessary quality of service available in congested networks. New service models are being defined in the Internet together with protocols to reserve capacity in a more flexible way than that available with circuit switching (section 3). In a datagram network, multimedia information must be transmitted in packets, some of which may be delayed more than others. In order that audio and video streams be played out at the recipient in the correct timing, information must be transmitted that allows the recipient to reconstitute the timing. A transport protocol with the Handley/Crowcroft/Bormann/Ott [Page 2] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 specific functions needed for this has been defined (section 4). Conference tools such as virtual whiteboards or shared editors are not concerned with real-time delivery of audio or video but maintain and update shared state between the participants. Work on support of such applications in a multicst environment is in progress (section 5). The humans participating in a conference generally need to have a specific idea of the context in which the conference is happening, which can be formalized as a conference policy. Some conferences are essentially crowds gathered around an attraction, while others have very formal guidelines on who may take part (listen in) and who may speak at which point. In any case, initially the participants must find each other, i.e. establish communication relationships (conference setup, section 6). During the conference, some conference control information is exchanged to implement a conference policy or at least to inform the crowd of who is present (section 7). In addition, security measures may be required to actually enforce the conference policy, e.g. to control who is listening and to authenticate contributions as actually originating from a specific person. In the Internet, there is little tendency to rely on the traditional ``security'' of distribution offered e.g. by the phone system. Instead, cryptographic methods are used for encryption and authentication, which need to be supported by additional conference setup and control mechanisms (section 8). Figure 1: Internet Multimedia Conferencing protocol stacks |<--- Conference Management --->|<--- Media Agents --->| | | | | Conference | Conference | Audio/ | Shared | | Setup & Discovery | Course Control | Video | Applications | +-------------------------+------+--------+-+--------+------------+ + | S D P | | Distr. | RTP / | Reliable | | | SAP | SIP | HTTP | SMTP | RSVP | Ctrl(1)| RTCP |Multicast(2)| | +-----+--+--+------+------+ +--+--------+----------+------------+--+ | UDP | T C P | | U D P | +--------+----------------+---+--------------------------------------+ | IP + IP Multicast | +--------------------------------------------------------------------+ | Integrated Services Forwarding | +--------------------------------------------------------------------+ Notes: (1) The work on distributed control for tightly coupled conferences is in progress (see section 6). (2) See section 5. Handley/Crowcroft/Bormann/Ott [Page 3] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 The protocol stacks for Internet Multimedia Conferencing are shown in Figure 1. Most of the protocols are not deeply layered unlike many protocol stacks, but rather are used alongside each other to produce a complete conference. 2. Multicast Traffic Distribution IP multicast enables efficient many-to-many datagram distribution. It is one of the basic building blocks of the Internet Multimedia Conferencing architecture. For most conferencing purposes, unicast is viewed as being a special case of multicast traffic. 2.1. Multicast Service Model The IP multicast service model is as follows: - Senders send datagrams to the address of a multicast group. - Receivers express an interest in (join) certain multicast groups. - Multicast routers conspire to deliver multicast group addressed datagrams from the senders to the receivers. The important factor here is that senders do not have to know who the receivers are in order to be able to send to them. In fact, in most situations, no single point in the network needs to know who all the receivers are, and it is this that makes IP multicast scalable to very large groups. In addition, receivers do not need to know who the senders are in order to be able to receive traffic from them, and this solves many conference setup and resource location problems without needing explicit machinery. There are many multicast routing protocols [2-5] but all of them satisfy the above service model. They differ in their mechanisms and in how they scale with the number of senders and groups. Within a single LAN, group membership is expressed by IGMP [6, 7]. IGMP version 3 allows receivers to express an interest in only receiving some of the senders to a particular multicast group. Earlier versions of IGMP only allow a receiver to request to receive all the sources sending to a multicast group. 2.2. Address Allocation How does an application choose a multicast address to use? In the absence of any other information, we can bootstrap a multicast Handley/Crowcroft/Bormann/Ott [Page 4] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 application by using well-known multicast addresses. Routing (unicast and multicast) and the group membership protocol IGMP can do just that. However, this is not the best way of managing applications of which there is more than one instance at any one time. For these, we need a mechanism for allocating group addresses dynamically, and a directory service which can hold these allocations together with some key (session information for example -- see later), so that users can look up the address associated with the application. The address allocation and directory functions should be distributed to scale well. Address allocation schemes should avoid clashes, hence some kind of hash function suggests itself for forming initial ``random'' values for the address. Furthermore, both the address allocation system and the directory service can take advantage of the baseline multicast mechanism by advertising conferences through multicast messages on a well-known address, and using this to inform other directory servers to remove clashes and inform applications of the allocation; see also section 7. Such advertisements, as well as the multicast traffic itself, can be restricted to a defined region in the network (such as a corporate network) by using multicast addresses out of a range reserved for administrative scoping [***REF***]. In the future, address allocation may further be influenced by the desire to allocate addresses such that the corresponding landmarks used in emerging inter-domain multicast routing protocols are close to a significant subset of the participants [***REF***]. 3. Internet Service Models Traditionally the Internet has provided best-effort delivery of datagram traffic from senders to receivers. No guarantees are made regarding when or if a datagram will be delivered to a receiver, however datagrams are normally only dropped when a router exceeds a queue size limit due to congestion. The best-effort Internet service model does not assume FIFO queuing, although many routers have implemented this. With best-effort service, if a link is not congested, queues will not build at routers, datagrams will not be discarded in routers, and delays will consist of serialisation delays at each hop plus propagation delays. With sufficiently fast link speeds, serialisation delays are insignificant compared to propagation delays[1]. _________________________ [1] For slow links, a set of mechanisms has been defined that helps minimize serialisation and link access delays [8]. Handley/Crowcroft/Bormann/Ott [Page 5] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 If a link is congested, with best-effort service queuing delays will start to influence end-to-end delays, and packets will start to be lost as queue size limits are exceeded. 3.1. Non-best effort service Real-time Internet traffic is defined as datagrams that are delay sensitive. It could be argued that all datagrams are delay sensitive to some extent, but for these purposes we refer only to datagrams where exceeding an end-to-end delay bound of a few hundred milliseconds renders the datagrams useless for the purpose they were intended. For the purposes of this definition, TCP traffic is normally not considered to be real-time traffic, although there may be exceptions to this rule. On congested links, best-effort service queuing delays will adversely affect real-time traffic. This does not mean that best-effort service cannot support real-time traffic -- merely that congested best-effort links seriously degrade the service provided. For such congested links, a better-than-best-effort service is desirable. To achieve this, the service model of the routers can be modified. At a minimum, FIFO queuing can be replaced by packet forwarding strategies that discriminate different ``flows'' of traffic. The idea of a flow is very general. A flow might consist of ``all marketing site web traffic'', or ``all fileserver traffic to and from teller machines'' or ``all traffic from the CEOs laptop wherever it is''. On the other hand, a flow might consist of a particular sequence of packets from an application in a particular machine to a peer application in another particular machine between specific times of a specific day. Flows are typically identifiable in the Internet by the tuple: {source machine, destination machine, source port, destination port, protocol} any of which could be ``ANY'' (wildcarded). In the multicast case, the destination is the group, and can be used to provide efficient aggregation. Flow identification is called classification and a class (which can contain one or more flows) has an associated service model applied. This can default to best effort. Through network management, we can imagine establishing classes of long lived flows -- enterprise networks (``Intranets'') often enforce traffic policies that distinguish priorities which can be used to discriminate in favor of more important traffic in the event of overload (though in an underloaded network, the effect of such policies will be invisible, and may incur no load/work in routers). The router service model to provide such classes with different Handley/Crowcroft/Bormann/Ott [Page 6] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 treatment can be as simple as a priority queuing system, or it can be more elaborate. Although best-effort services can support real-time traffic, classifying real-time traffic separately from non-real-time traffic and giving real-time traffic priority treatment ensures that real- time traffic sees minimum delays. Non-real-time TCP traffic tends to be elastic in its bandwidth requirements, and will then tend to fill any remaining bandwidth. We could imagine a future Internet with sufficient capacity to carry all of the world's telephony traffic. Since this is a relatively modest capacity requirement, it might be simpler to establish ``POTS'' as a static class which is given some fraction of the capacity overall, and then within the backbone of the network no individual call need be given an allocation (i.e. we would no longer need the call setup/tear down that was needed in the legacy POTS which was only present due to under-provisioning of trunks, and to allow the trunk exchanges the option of call blocking). The vision is of a network that is engineered with capacity for all of the average load sources to send all the time. 3.2. Reservations For flows that may take a significant fraction of the network (i.e. are ``special'' and can't just be lumped under a static class), we need a more dynamic way of establishing these classifications. In the short term, this applies to any multimedia calls since the Internet is largely under-provisioned at the time of writing. RSVP is being standardised for just this purpose. It provides flow identification and classification. Hosts and applications are modified to speak RSVP client language, and routers speak RSVP. Since most traffic requiring reservations is delivered to groups (e.g. TV), it is natural for the receiver to make the request for a reservation for a flow. This has the added advantage that different receivers can make heterogeneous requests for capacity from the same source. Thus RSVP can accommodate monochrome, color and HDTV receivers from a single source. Again the routers conspire to deliver the right flows to the right locations. RSVP accommodates the wildcarding noted above. 3.3. Admission Control If a network is provisioned such that it has excess capacity for all Handley/Crowcroft/Bormann/Ott [Page 7] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 the real-time flows using it, a simple priority classification ensures that real-time traffic is minimally delayed. However, if a network is insufficiently provisioned for the traffic in a real-time traffic class, then real-time traffic will be queued, and delays and packet loss will result. Thus in an under-provisioned network, either all real-time flows will suffer, or some of them must be given priority. RSVP provides a mechanism by which an admission control request can be made, and if sufficient capacity remains in the requested traffic class, then a reservation for that capacity can be put in place. If insufficient capacity remains, the admission request will be refused, but the traffic will still be forwarded with the default service for that traffic's traffic class. In many cases even an admission request that failed at one or more routers can still supply acceptable quality as it may have succeeded in installing a reservation in all the routers that were suffering congestion. This is because other reservations may not be fully utilising their reserved capacity in those routers where the reservation failed. 3.4. Billing If a reservation involves setting aside resources for a flow, this will tie up resources so that other reservations may not succeed, and depending on whether the flow fills the reservation, other traffic is prevented from using the network. Clearly some negative feedback is required in order to prevent pointless reservations from denying service to other users. This feedback is typically in the form of billing. For real-time non-best effort multicast traffic that is not reserved, this negative feedback is provided in the form of loss due to congestion of a traffic class, and it is not clear that usage based billing is required. Billing requires that the user making the reservation is properly authenticated so that the correct user can be charged. Billing for reservations introduces a level of complexity to the Internet that has not typically been experienced with non-reserved traffic, and requires network providers to have reciprocal usage-based billing arrangements for traffic carried between them. It also suggests the use of mechanisms whereby some fraction of the bill for a link reservation can be charged to each of the downstream multicast receivers. 4. Audio/Video Transport Protocols So-called real-time delivery of traffic requires little in the way of transport protocol. In particular, real-time traffic that is sent over more than trivial distances is not retransmittable. Handley/Crowcroft/Bormann/Ott [Page 8] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 4.1. Separate Flows for each Media Stream With packet multimedia data there is no need for the different media comprising a conference to be carried in the same packets. In fact it simplifies receivers if different media streams are carried in separate flows (i.e., separate transport ports and/or separate multicast groups). This also allows the different media to be given different quality of service. For example, under congestion, a router might preferentially drop video packets over audio packets. In addition, some sites may not wish to receive all the media flows. For example, a site with a slow access link may be able to participate in a conference using only audio and a whiteboard whereas other sites in the same conference may also send and receive video. 4.2. Receiver Adaptation Best-effort traffic is delayed by queues in routers between the sender and the receivers. Even reserved priority traffic may see small transient queues in routers, and so packets comprising a flow will be delayed for different times. Such delay variance is known as jitter. Real-time applications such as audio and video need to be able to buffer real-time data at the receiver for sufficient time to remove the jitter added by the network and recover the original timing relationships between the media data. In order to know how long to buffer for, each packet must carry a timestamp which gives the time at the sender when the data was captured. Note that for audio and video data timing recovery, it is not necessary to know the absolute time that the data was captured at the sender, only the time relative to the other data packets. 4.3. Synchronisation As audio and video flows will receive differing jitter and possibly differing quality of service, audio and video that were grabbed at the same time at the sender may not arrive at the receiver at the same time. At the receiver, each flow will need a playout buffer to remove network jitter. Inter-flow synchronisation can be performed by adapting these playout buffers so that samples/frames that originated at the same time are played out at the same time. This requires that the time base of different flows from the same sender can be related at the receivers, e.g. by making available the absolute times at which each of them was captured. Handley/Crowcroft/Bormann/Ott [Page 9] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 4.4. RTP The transport protocol for real-time flows is RTP [9]. This provides a standard format packet header which gives media specific timestamp data, as well as payload format information and sequence numbering amongst other things. RTP is normally carried using UDP. It does not provide or require any connection setup, nor does it provide any enhanced reliability over UDP. For RTP to provide a useful media flow, there must be sufficient capacity in the relevant traffic class to accommodate the traffic. How this capacity is ensured is independent of RTP. Every original RTP source is identified by a source identifier, and this source id is carried in every packet. RTP allows flows from several sources to be mixed in gateways to provide a single resulting flow. When this happens, each mixed packet contains the source ids of all the contributing sources. RTP media timestamp units are flow specific -- they are in units that are appropriate to the media flow. For example, 8kHz sampled PCM encoded audio has a timestamp clock rate of 8kHz. This means that inter-flow synchronisation is not possible from the RTP timestamps alone. Each RTP flow is supplemented by Real-Time Control Protocol (RTCP) packets. There are a number of different RTCP packet types. RTCP packets provide the relationship between the real-time clock at a sender and the RTP media timestamps, and provide textual information to identify a sender in a conference from the source id. 4.5. Conference Membership and Reception Feedback IP multicast allows sources to send to a multicast group without being a receiver of that group. However, for many conferencing purposes it is useful to know who is listening to the conference, and whether the media flows are reaching receivers properly. Accurately performing both these tasks restricts the scaling of the conference. IP multicast means that no-one knows the precise membership of a multicast group at a specific time, and this information cannot be discovered, as to try to do so would cause an implosion of messages, many of which would be lost[2]. Instead, RTCP provides approximate membership information through periodic multicast of session messages which, in addition to information about the recipient, also give information about the reception quality at that receiver. RTCP session messages are restricted in rate, so that as a conference _________________________ [2] Note that a conference policy that restricts conference mem- bership can be implemented using encryption and restricted distri- bution of encryption keys, of which more later. Handley/Crowcroft/Bormann/Ott [Page 10] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 grows, the rate of session messages remains constant, and each receiver reports less often. A member of the conference can never know exactly who is present at a particular time from RTCP reports, but does have a good approximation to the conference membership. Reception quality information is primarily intended for debugging purposes, as debugging of IP multicast problems is a difficult task. However, it is possible to use reception quality information for rate adaptive senders, although it is not clear whether this information is sufficiently timely to be able to adapt fast enough to transient congestion. However, it is certainly sufficient for Van Jacobson congestion control [10] style adaption to a ``share'' of the current capacity. 4.6. Control of Stream Playback and Recording A control protocol for initiating and controlling playing and recording audio, video, and other RTP-based information is the Real- Time Stream control Protocol (RTSP) [11]. While primarily intended for web-based media-on-demand services, RTSP may also be used in the context of teleconferences to make recorded audio/video information available to the participants, or to control recording the course of the conference. 5. Protocols for Non-A/V Applications Applications other than audio and video have evolved in Internet conferencing, e.g. Imm, Wb [12], Nt. Such applications can be used to substitute for meeting aids in physical conferences (whiteboards, projectors) or replace visual and auditory cues that are lost in teleconferences (e.g., a speaker list application); they also can enable new styles of joint work. Most non-A/V applications have in common that the application protocol is about establishing and updating a shared state. Loss of information is often not acceptable, so some form of multicast reliability is required. The applications' requirements differ: Some applications make per-participant additions to the shared state that are orthogonal to each other (e.g., whiteboards), some evolve a more closely interrelated common state (e.g., additions to a speaker list must be properly sequenced). Some applications can make use of added bandwidth/react to congestion in an elastic way, others transport data that, although not strictly real-time, is time-critical. In the IRTF research group on Reliable Multicast, work is in progress on common protocol elements that can be used in such applications. At the time of writing, some aspects of reliable multicast are not well-understood, such as the proper way to provide congestion control in a multicast environment. As congestion control is considered an essential element, standards track protocols are not expected before this can be solved. Refer to http://www.irtf.org/rm for further Handley/Crowcroft/Bormann/Ott [Page 11] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 information. 6. Conference Control Conferences come in many shapes and sizes, but there are only really two models for conference control: light-weight sessions and tightly coupled conferencing. For both models, rendezvous mechanisms are needed. Note that the conference control model is orthogonal to issues of quality of service and network resource reservation. Note also that the issue of conference control is orthogonal to the mechanism for discovering the conference. 6.1. Light-weight Sessions Light-weight sessions are multicast based multimedia conferences that lack explicit session membership and explicit conference control mechanisms. Typically a lightweight session consists of a number of many-to-many media streams supported using RTP and RTCP using IP multicast[3]. The only conference control information available during the course of light-weight sessions is that distributed in the RTCP session information, i.e. an approximate membership list with some attributes per member. 6.2. Tightly coupled Conferences Tightly coupled conferences may also be multicast based and use RTP and RTCP, but in addition they have an explicit conference membership mechanism and may have an explicit conference control mechanism that provides facilities such as floor control. At the time of writing, no standard mechanism for performing tightly coupled conference control currently exists in the Internet community. Another standards body, the ITU, has defined two standards that can be used in the Internet: - The T.120 series of recommendations includes a centralized conference control protocol currently used for data application only, T.124 [13]. - Recommendation H.323 for Multi-Media Conferences for Packet- _________________________ [3] There is some confusion on the term session, which is some- times used for a conference and sometimes for a related set of me- dia streams transported by RTP and perceived as a unit, e.g., the audio channel in a conference. In this document, we prefer to use the less ambiguous term conference except where existing protocols use the term session. Handley/Crowcroft/Bormann/Ott [Page 12] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 based Network Environments [14] specifies a point-to-point channel setup protocol [15] that also covers a few multipoint conferencing aspects. As T.124 is not accepted by the industry as a basis for audiovisual conference control on one hand and H.245 does not provide distributed control for tightly coupled conferences on the other hand, there is no obvious choice. The Simple Conference Control Protocol (SCCP) [16] is being developed as a prototype towards providing this kind of control (being a shared state application, SCCP could also benefit from developments in the area of reliable multicast). A future distributed conference control protocol could be used as the distributed control mode envisioned by H.323 (which has not yet been addressed by the ITU). 7. Conference Discovery There are two basic forms of conference discovery mechanism. These are session advertisement and session invitation. Session advertisements are provided using a session directory, and inviting a user to join a session is provided using a session invitation protocol. 7.1. Session Directories The rendezvous mechanism for light-weight sessions is a multicast based session directory. This distributes session descriptions [17] to all the potential session participants. These session descriptions provide an advertisement that the session will exist, and also provide sufficient information including multicast addresses, ports, media formats and session times so that a receiver of the session description can join the session. The protocol SDP (session description protocol) describes contents and format of the session descriptions. As dynamic multicast address allocation can be optimised by knowing which addresses are in use at which times, the session directory is an appropriate agent to perform multicast address allocation. SAP (session announcement protocol) is the protocol used by the session directory agents [18]. This mechanism can also be applied to advertised tightly coupled sessions, and only requires that additional information about the mechanism to use to join the session is given. 7.2. Session Invitation Handley/Crowcroft/Bormann/Ott [Page 13] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 Not all sessions are advertised, and even those that are advertised may require a mechanism to explicitly invite a user to join a session. Such a mechanism is required regardless of whether the session is a lightweight session or a more tightly coupled session, although the invitation system must specify the mechanism to be used to join the session. As users are mobile, it is important that such a mechanism is capable of locating and inviting a user in a location independent manner. This requires an extra level of indirection (addressing). The invitation mechanism should also provide for alternative responses, such as leaving a message or being referred to another user, should the invited user be unavailable. Based on a protocol with many of the properties required [19], a session initiation protocol (SIP) is being developed [20]. 8. Security There is a temptation to believe that multicast is inherently less private than unicast communication since the traffic visits so many more places in the network. In fact, this is not the case except with broadcast and prune type multicast routing protocols. However, IP multicast does make it simple for a host to anonymously join a multicast group and receive traffic destined to that group without the other senders' and receivers' knowledge. If the application requirement (conference policy) is to communicate between some defined set of users, then strict privacy can only be enforced in any case through adequate end-to-end encryption. RTP specifies a standard way to encrypt RTP and RTCP packets using private key encryption schemes such as DES [21]. It also specifies a standard mechanism to manipulate plain text keys using MD5 [22] so that the resulting bit string can be used as a DES key. This allows simple out-of-band mechanisms such as privacy-enhanced mail to be used for encryption key exchange. 8.1. Authentication and Key Distribution Key distribution is closely tied to authentication. Conference or session directory keys can be securely distributed using public-key cryptography on a one-to-one basis (by email, a directory service, or by an explicit conference setup mechanism), but this is only as good as the certification mechanism used to certify that a key given by a user is the correct public key for that user. Such certification mechanisms [23] are not specific to conferencing, and no standard mechanisms are currently in use for conferencing purposes other than PEM [24]. At the time of writing, no standard mechanisms for key distribution Handley/Crowcroft/Bormann/Ott [Page 14] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 are defined for the conference setup and control protocols in use. Even without privacy requirements in the conference policy, strong authentication of a user is required if making a network reservation results in usage based billing. 8.2. Encrypted Session Announcements Session Directories can make encrypted session announcements using private key encryption, and carry the encryption keys to be used for each of the conference media streams in the session. Whilst this does not solve the key distribution problem, it does allow a single conference to be announced more than once to more than one key-group, where each group holds a different session directory key, so that the two groups can be brought together into a single conference without having to know each other's keys. 9. Summary This document is an attempt to gather together in one place the set of assumptions behind the design of the Internet Multimedia Conferencing architecture, and the services that are provided to support it. 10. Acknowledgments and Authors' Addresses Acknowledgments are due to the End-to-End Research Group, the Int- serv, RSVP, MMUSIC and AVT working groups of the IETF, and discussion with colleagues at UCL. The earliest clear exposition of the ideas here can be found at http://www-mice.cs.ucl.ac.uk/mice-old/van/ and was presented at ACM SIGCOMM 1994 in London by Van Jacobson. Mark Handley (fix me) Email: mjh@isi.edu Jon Crowcroft Department of Computer Science University College London Gower Street, London WC1E 6BT UK fax +44 171 387 1397 Email: m.handley@cs.ucl.ac.uk, j.crowcroft@cs.ucl.ac.uk Web: http://www.cs.ucl.ac.uk/index.html Handley/Crowcroft/Bormann/Ott [Page 15] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 Carsten Bormann Universitaet Bremen Postfach 330440 D-28334 Bremen GERMANY fax +49 421 218-7000 Email: cabo@tzi.org Web: http://www-rn.informatik.uni-bremen.de References 1. ITU, Recommendation H.320. 2. S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C.-G. Liu, and L. Wei, An Architecture for Wide Area Multicast Routing, 24, pp. 126-135, ACM SIGCOMM, October 1994. 3. S. Deering, C. Partridge, and D. Waitzman, "Distance Vector Multicast Routing Protocol," RFC 1075, November 1988. 4. A. Ballardie, P. Francis, and J. Crowcroft, An Architecture for Scalable Inter-Domain Multicast Routing, pp. 85-95, ACM SIGCOMM, 1993. 5. J. Moy, "Multicast Extensions to OSPF," RFC 1584, March 1994. 6. S. Deering, Multicast Routing in Internetworks and Extended LANs, pp. 55-64, ACM SIGCOMM, August 1988. 7. S. Deering, "Host Extensions for IP Multicasting," RFC 1112, August 1989. 8. C. Bormann, "Providing integrated services over low-bitrate links," Internet-Draft draft-ietf-issll-isslow-02.txt, Work in Progress, May 1997. 9. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 1889. 10. V. Jacobson, Congestion Avoidance and Control, ACM SIGCOMM, August 1988. 11. H. Schulzrinne, A. Rao, and R. Lanphier, "Real-Time Stream Control Protocol (RTSP)," Internet-Draft draft-ietf-mmusic- rtsp-0x.txt, Work in Progress, (fix me). 12. S. Floyd, V. Jacobson, S. McCanne, C.-G. Liu, and L. Zhang, A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing, pp. 342-356, ACM SIGCOMM, 1995. Handley/Crowcroft/Bormann/Ott [Page 16] INTERNET-DRAThe Internet Multimedia Conferencing Architecture July 1997 13. ITU-T, Recommendation T.124 -- Generic Conference Control. 14. ITU-T, Recommendation H.323 -- Multi-Media Conferences for Packet-based Network Environments. 15. ITU-T, Recommendation H.245. 16. C. Bormann, J. Ott, and C. Reichert, "Simple Conference Control Protocol," Internet-Draft draft-ietf-mmusic-sccp-0x.txt, Work in Progress, (fix me). 17. M. Handley and V. Jacobson, "SDP: Session Description Protocol," Internet-Draft draft-ietf-mmusic-sdp-0x.txt, Work in Progress, (fix me).. 18. M. Handley and V. Jacobson, "SAP: Session Announcement Protocol," Internet-Draft draft-ietf-mmusic-sap-0x.txt, Work in Progress, (fix me).. 19. H. Schulzrinne, "Personal Mobility for Multimedia Services in the Internet," IMDS'96, March 1996.. 20. M. Handley, H. Schulzrinne, and E. Schooler, "SIP: Session Initiation Protocol," Internet-Draft draft-ietf-mmusic- sip-0x.txt, Work in Progress, (fix me).. 21. National Institute of Standards and Technology (NIST), FIPS Publication 46-1: Data Encryption Standard, January 1988. 22. R. Rivest, "The MD5 Message-Digest Algorithm," RFC 1321, April 1992. 23. CCITT, Recommendation X.509: The Directory -- Authentication Framework, 1988.. 24. J. Linn, "Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures," RFC 1421, Feb 1993. Handley/Crowcroft/Bormann/Ott [Page 17]