Internet Engineering Task Force                              MMUSIC WG
Internet Draft                   Philippe Gentric, Philips Electronics
                                                         February 2003
                                                   expires August 2003
draft-gentric-mmusic-stream-switching-00.txt

                        RTSP Stream Switching

STATUS OF THIS MEMO

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

To view the list of Internet-Draft Shadow Directories, see http://www.ietf.org/shadow.html.

Abstract

Stream switching is a technique used to change the data rate of media being streamed, typically in order to adapt to the bandwidth effectively available on the network. A backward compatible and independent RTSP "SWITCH" command is proposed in order to enable RTSP-based stream switching.

Gentric                                                       [page 1]
Internet Draft           RTSP Stream Switching           February 2003

1. Introduction

Stream switching is a technique used to change the data rate of media being streamed, typically in order to adapt to the bandwidth effectively available on the network.

The aim is that a real-time streaming system can switch from stream to stream in order to vary the data rate. This requires that the same content be encoded as multiple streams at various bit rates.

This memo specifies an independent and backward compatible RTSP extension enabling RTSP servers and clients to support stream switching.
Sections 2, 3 and 4 provide a detailed analysis of the problem, section 5 is devoted to the proposed solution, section 6 lists open issues, and section 7 covers security considerations.

1.1 Typical usage context

The typical scenario is video distributed on demand, also known as "Video On Demand" (VOD). The situation is depicted in figure 1. This is the domain of RTSP [RTSP] servers. HTTP is typically used for the service/application, i.e. it provides the entry point, usually an RTSP URL. The media can be pre-recorded in a file or can be a "live" source, in which case the RTSP/RTP server acts as a relay.

             *****************                       *****************
             *               *         HTTP          *               *
             *  HTTP Server  * <------------------>  *  HTTP Client  *
             *               *                       *               *
             *****************                       *****************
             *****************                       *****************
             *               *         RTSP          *               *
             *  RTSP Server  * <------------------>  *  RTSP Client  *
             *               *                       *               *
             *****************                       *****************
             *****************                       *****************
             *               *     RTP on UDP UC     *               *
             *  RTP Sender   * ------------------->  * RTP Receiver  *
             *               *                       *               *
    media    *               *        RTCP SR        *               *
    on -->   *               * ------------------->  *               *
    file     *               *                       *               *
    or       *               *     RTCP feedback     *               *
    live     *               * <-------------------  *               *
             *****************                       *****************

                        Figure 1: video on demand

1.2 Rate control issues

The typical usage of stream switching is adaptation to the effectively available end-to-end bandwidth, i.e. rate (or congestion) control. Specifically, the streaming system (i.e. sender and receiver) changes the end-to-end data rate upon detecting variations in the effective bandwidth.

This document does not address rate control algorithms, i.e. the way to compute a target bit rate based on some measurement of the network. Rate control algorithms are orthogonal to the problem addressed here, which is the signaling required inside a streaming system in order to perform switching.
It is assumed that specifications such as TFRC [TFRC] or related work (see [Widmer], [Vojnovic], [Bansal]) should be used to deal with this issue. These algorithms need to be adapted because stream switching offers only a limited granularity of available rates; we have experimental evidence that these issues are manageable.

1.3 Content negotiation issues

Stream switching requires specific content negotiation that takes into account the possibility of changing the configuration during the session.

Stream switching actually bridges the gap between "traditional" rate control (i.e. as in TCP, quasi-continuous changes in the data rate) and "traditional" content negotiation, where sessions are negotiated for a constant data rate.

1.4 Seamless switching

Seamless stream switching is obtained when the switch is performed in such a fashion that media playback is minimally disturbed.

A counter-example is traditional content negotiation: a given data rate is negotiated and the session is started; when it becomes obvious that the chosen data rate is not acceptable (usually because it is too high), a new session is renegotiated. The switch is not seamless because it takes time to tear down the session and initialize another one.

Seamless stream switching therefore consists in preparing a set of sessions (or a set of configurations within a session) and providing a fast signaling mechanism so that the switch is effectively instantaneous.

A side effect of seamless switching (which is a "user" requirement) is that it minimally disturbs the network: it is a well-known congestion control principle that when congestion occurs, the faster the response, the smaller the perturbation.

1.5 Motivation for standardization

Media delivery technologies are based on the availability of extremely optimized constant bit rate encoders.
On the other hand, IP networks that are being deployed for consumer access do not have stable end-to-end bandwidth.

These two facts cause major problems for operators wishing to deploy "best-effort" streaming services in a robust fashion. The current status quo is that:

  . On one hand it is assumed that once a program is requested by a client, the server will send data toward the client at the "nominal" constant rate, regardless of the state of the network on the path from the server to the client.

  . On the other hand there is a common assumption (see for example [3GPP-BWS]) that stream switching is the way to improve this situation; for that reason it is implemented and deployed in non-inter-operable ways by many vendors.

It is desirable to change this status quo for several reasons.

Firstly, the "constant rate method" causes problems whenever a hop in the path from sender to receiver experiences congestion. These problems are router buffer overflows (and, more specifically for wireless networks, base station buffer overflows) and perceptible artifacts during playback, creating 3 populations of unhappy people: network managers, end users watching the video, and end users of other traffic traveling on the same path!

Secondly, the congestion control behavior of proprietary stream switching algorithms is not publicly specified, and therefore the deployment of streaming services on large scales (i.e. comparable to TV broadcast scales) is not proven to be safe in terms of network pathologies.

Thirdly, the various parties involved in commercial streaming media deployments (i.e. content providers, network operators and various technology providers), being aware of these issues, are stalling. This compromises the immediate future of media distribution and, as a direct consequence, that of various new consumer high data rate network deployments in both the wired and wireless domains.

Note that wireless network technologies are especially sensitive to these issues because of the inherent variability of the radio bandwidth, which has triggered attention and efforts in 3GPP (see for example [3GPP-alt-attr] and [3GPP-BWS]).

In conclusion there is a pressing demand for inter-operable solutions, and also a demand for solutions that, being standard, have a better defined and/or more transparent behavior.

In short the goals of a standard framework for stream switching would be:

  . To enable the emergence of more advanced inter-operable streaming products.

  . To enable the advent of technical and/or commercial specifications for streaming products and services that have a well characterized behavior regarding bandwidth management.

1.6 Requirements

The key requirement is that the user experience should be the best possible, which means that switching must be seamless.

This requirement implies very specific timing constraints on the way the switch is performed. Typically the sender should stop sending one stream and start sending the other stream at exactly the same media time and the same real time. Otherwise the receiver will experience buffer problems, since on the client side the buffers must maintain a stable amount of media time (a media decoder is paced in terms of media time, i.e. 1 second of media is decoded in 1 second of elapsed time). Specifically, the challenge in stream switching is to avoid buffer underflows, where the decoder pauses playback and displays the infamous "re-buffering" message.
The simple consequence is that the streaming data source must be informed as soon as possible that it needs to change its output rate. Otherwise it will keep on sending at the same excessive rate, which will result in:

  . Filling up more buffers in the network devices upstream from the limiting hop, thereby amplifying the congestion.

  . Causing losses, once one or more network devices reach saturation, not only in the media stream in question but also in other traffic (constant data rate UDP traffic is known to be "aggressive" in this context since TCP traffic will automatically fall back).

  . Delaying the instant when the new stream reaches the decoder, thereby increasing the chances that the decoder actually runs its buffer down to underflow.

The ability to increase the rate (when the available bandwidth increases, and with due care for congestion control) is also a requirement; it has much less stringent technical implications. A really seamless switch is then possible in all cases.

1.7 Vocabulary

We define a "program" as a set of "tracks"; for example a movie is composed of an audio track and a video track.

We define a "stream" as an encoded instance of a track. For example the video track of a movie may be encoded at 50 kb/s, 150 kb/s and 400 kb/s using respectively H.263 baseline SQCIF 7.5 fps, MPEG-4 SP@L3 QCIF 15 fps and MPEG-4 ASP@L3 CIF 30 fps; the audio track may be encoded at 5 kb/s, 20 kb/s, 48 kb/s and 80 kb/s using respectively AMR, AMR WB, AAC mono and AAC stereo.

We define one "flavor" of a program as a given set of streams, one per track (usually an audio/video pair for a movie). For example 400 kb/s video with 80 kb/s AAC stereo is the high quality flavor in the example above, which has 3 x 4 = 12 different flavors (although some flavors may not always make sense).

We define a "switch-set" as the set of all the streams for a track or a program.
A switch-set can be organized either by track first or by flavor first. Obviously switch-sets are prepared during the content production or deployment phase.

2. Seamless stream switching technical issues

2.1 Configuration issues

When streams are switched there are 2 fundamental cases: either the streaming configuration changes or it does not.

2.1.1 No configuration change

The case where the streaming configuration does not change is the simplest. This case can also be described as "nothing changes but the bit rate".

Many codecs actually support this "natively"; for example video codecs and recent speech codecs (AMR, EVRC) as well as recent music codecs (AAC in some modes) can decode streams whose bit rate varies instantly.

One important thing to note is that in RTP terms the Payload Type remains the same. Specifically, the RTP session remains the same.

For these reasons this mode is also called "client-transparent", since (in theory) the source can switch without forewarning the client.

Care must be taken, however, since some player implementations may actually be sensitive to sudden bit rate changes, or may prefer to be warned/notified about them.

2.1.2 Configuration change

The second case, where the streaming configuration changes or even the codec itself changes, is more complex because:

  . In the general case one has to assume that the client has to instantiate (or invoke) 2 (or more) completely different (hardware and/or software) codecs, rendering systems and network reception stacks (or at least different payload processors). Obviously this may involve substantial processing and/or buffering resources.
    These are implementation details out of the scope of this memo; however an important rule derives from them: servers MUST NOT switch streams involving a codec configuration change except upon reception of an explicit request from the receiver or with an explicit prior agreement. Also, authentication should be used for these requests (see the security considerations section).

  . In the general case feeding a codec with a stream for a different codec, or for a different configuration, can crash a decoder. Therefore there must be an error-proof way to signal the change at the packet or encoded frame granularity. Fortunately RTP does have such a capability with the Payload Type field.

  . In the general case changing codec (for example from AMR to AAC) also involves changing the RTP payload format. Fortunately this is also covered by the RTP Payload Type field.

The important thing to note is that in SDP and RTP terms the Payload Type has to be different for these types of configurations.

This is called "client non-transparent" stream switching.

2.1.3 Mixed transparency configurations

It is typical that a given switch-set mixes both "client non-transparent" and "client-transparent" modes.

One could wish that the client-transparent mode would be "enough"; however "client-transparent" switches usually do not cover as wide a bandwidth range as "client non-transparent" ones, due to the bit rate range of each specific codec.

For example a service deployed for CD-quality music using stereo AAC cannot go below 32 kb/s in the client-transparent mode because AAC does not go below this bit rate. On the other hand a client non-transparent switch involving a speech codec (say AMR) makes it possible to define "fall back" streams with as little as 4 kb/s.
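The distinction drawn in section 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Stream` record, its field names, the Payload Type numbers 96 and 97 and the `audio3` URL are hypothetical and are not defined by this memo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """One encoded instance of a track (all field names are illustrative)."""
    url: str
    payload_type: int   # RTP Payload Type announced for this stream
    bitrate_kbps: int


def is_client_transparent(current: Stream, target: Stream) -> bool:
    # Section 2.1.1: the Payload Type stays the same, only the bit rate changes.
    return current.payload_type == target.payload_type


def server_may_switch_unprompted(current: Stream, target: Stream,
                                 prior_agreement: bool = False) -> bool:
    # Section 2.1.2: a configuration change requires an explicit client
    # request or an explicit prior agreement.
    return is_client_transparent(current, target) or prior_agreement


# Hypothetical switch-set: two AAC streams sharing one Payload Type and an
# AMR fall-back stream using a different one.
aac_64 = Stream("rtsp://foo/twister/audio1", payload_type=96, bitrate_kbps=64)
aac_32 = Stream("rtsp://foo/twister/audio2", payload_type=96, bitrate_kbps=32)
amr_5 = Stream("rtsp://foo/twister/audio3", payload_type=97, bitrate_kbps=5)

print(is_client_transparent(aac_64, aac_32))        # AAC to AAC
print(server_may_switch_unprompted(aac_64, amr_5))  # AAC to AMR
```

Comparing Payload Types alone is a simplification: it relies on the rule stated above that streams with different configurations carry different Payload Types.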
2.2 Codec access points

For all streams it is possible to "switch out" at any point; however some streams (video is typical) cannot be "switched in" at an arbitrary point. Typically these codecs have several types of frames with respect to random access. Some are full random access points (typically I frames, or S frames for recent codecs such as H.264); others are partial random access points, such as frames mixing I macro-blocks and P macro-blocks.

From the point of view of the decoder these can be seen as implementation details, i.e. if a server switches at a non-random-access point the client should be able to detect it and act according to its capability to handle it. Indeed, from the decoder point of view, a stream switch at a non-random-access point is similar to receiving packets after a loss.

It could be useful, however, for a client to be able to indicate to the server that it prefers a switch at a random access point.

2.3 The control issue

The first key question is whether the decision to switch is taken by the receiver or by the sender.

2.3.1 Server initiated switch

Server initiated switch has 4 major advantages:

  . It can be made to work in a similar fashion for all scenarios.

  . It more closely resembles the TCP situation, improving the chances that some of the considerable know-how acquired with TCP in terms of congestion/rate control can be reused.

  . It allows sources of information about the status of the network other than the receiver(s) to be used. For example routers on the path may be able to issue congestion notifications much earlier than if one must wait for the perturbation to reach the final destination and for feedback signals to travel back (see also [TRIGTRAN]).

  . It allows one very simple "catastrophe prevention" mechanism: since the sender does not need to warn the receiver before switching, the sender can decide to switch down when feedback from the receiver has not been received for a given amount of time (TFRC uses on the order of 4 RTTs).

As discussed above, the server can decide to switch without telling the receiver in only 2 cases:

  . If the decoder configuration does not change. In the context of SDP/RTP this means that the Payload Type must not change. For this type of configuration the existing set of IETF specifications is usable in terms of session description and management; specifically, "normal" RTCP can be used to send feedback, and it can be seen as a server implementation issue that the server decides to switch based on client RTCP feedback. There could be a need to document this, maybe not as a standard specification but surely as "practice" in inter-operability forums.

  . If there was a prior agreement. In the context of SDP/RTP this means that the client has "instantiated" several "stacks" (one for each flavor of each stream) and is ready to receive data on each of these channels (by channel one means that either or both of the destination UDP port and the Payload Type differ). This means that possibly substantial resources must be pre-allocated on the receiver side. This is wasteful if the network behaves so that the session runs entirely with the initial streams or uses only a fraction of these resources. Obviously a little signaling could help here... Note also that although these streams have different Payload Types this signaling may not be early enough...

2.3.2 Player initiated switch

The player can also initiate the switch, and using RTSP is the obvious choice. We will see however that the existing RTSP specification needs to be extended in order to provide seamless stream switching.

3. Description of the switch-set

Clearly there is a need to convey a description of the switch-set to the client. There are several ways to do so, described below.

3.1 SDP Description of the switch-set

One way to describe the alternative flavors for each stream composing a program is to list them using SDP; an example of one such description is given in the Appendix.

For client initiated switches there is a need to convey the bandwidth of a stream, but this is already available in SDP.

Otherwise the exact SDP syntax to use in order to describe that streams are alternatives of a given track (media) is debatable. SDP has several extensions that can be considered [grouping]; new extensions are also a possibility. 3GPP has specified one such SDP syntax for its Release 6 (see [3GPP-alt-attr]).

3.2 SMIL Description of the switch-set

SMIL is a scene description language [SMIL]. In SMIL the "switch" element allows an author to specify a set of alternative elements from which only the first acceptable element is chosen.

The SMIL specification itself mentions the bit rate as one typical property that differs among the streams in a switch element.

In short the SMIL element "switch" provides a standard way to declare to the client all the possible "flavors" of each stream.

However SMIL 2.0 supports only parse-time evaluation, i.e. it basically assumes that the evaluation of which stream to use is done once.

Furthermore, even if dynamic re-evaluation is specified in future versions, SMIL will typically not specify how the switching should be performed.

In conclusion the SMIL switch syntax element is a building block that could very nicely complement an IETF specification of how to perform stream switching at the transport (and transport control) level.

3.3 MPEG-4 system Description of the switch-set

MPEG-4 [MPEG-4] provides a way to describe alternative streams.
However, since this type of manipulation would be performed in the context of a terminal implementing the MPEG-4 system specification, it is a priori out of the scope of this memo.

4. Switching control

4.1 Switching by changing the RTSP session

One way to perform stream switching is to use RTSP TEARDOWN in order to destroy the session and then start another one.

Unfortunately this method involves several round trips, which will typically cause playback to stop; in short it is practically impossible to make it seamless.

For that reason this method, although "it works", will not be discussed further.

4.2 Switching within the same RTSP session

One way to perform switching at the session level is to enable the definition of a "switchable session", i.e. an extended session that is negotiated as containing all alternative streams from the very start.

Using RTSP has the following advantages:

  . The method is completely independent of the codec capabilities.

  . It directly provides both content and capability negotiation as well as control.

  . It inherits all RTSP (and therefore HTTP) security features.

4.3 Switching using RTSP PLAY/PAUSE

The PLAY/PAUSE commands would be used for stream switching as follows: at the time of session negotiation the client and server prepare to stream all the variants in the switch-set but PAUSE all streams except one per media type. Switching is performed by simultaneously issuing a PAUSE command on the stream being switched out and a PLAY command on the stream being switched in.

Unfortunately doing that involves a trick where the client must specify the pause point (see the RTSP PAUSE specification for details [RTSP]). Finding the appropriate time to use as the "pause point" is not a trivial issue at all. For this reason this method cannot be used either.

4.4 Switching using RTSP MUTE/UNMUTE

An extension to RTSP called MUTE/UNMUTE has been proposed [RTSP-MUTE].
It defines MUTE and UNMUTE as 2 additional optional RTSP commands.

MUTE enables a client to request the server to stop sending data for a given stream and in this respect is similar to PAUSE. However UNMUTE requests the server to resume sending data, not at the point in the media where MUTE was issued, but at a point in time synchronous with the media streams that were still being streamed.

These commands would be used for stream switching as follows: at the time of session negotiation the client and server prepare to stream all the variants in the switch-set but MUTE all streams except one per media type. Switching is performed by simultaneously issuing a MUTE command on the stream being switched out and an UNMUTE command on the stream being switched in.

The drawback is that two commands have to be issued for each "atomic" switch.

Also this does not cover the need for additional signaling as detailed above.

4.5 Switching using RTSP SET_PARAMETER

SET_PARAMETER and even OPTIONS have been suggested as candidates for client-initiated stream switching (see [3GPP-BWS]). A possible syntax would be:

  C->S: SET_PARAMETER rtsp://foo/twister/audio1 RTSP/1.0
        CSeq: 421
        Content-length: xx
        Content-type: application/stream-switching
        Replace-with: rtsp://foo/twister/audio2

  S->C: RTSP/1.0 200 OK
        CSeq: 421

The motivation is that SET_PARAMETER has been designed to provide some type of extensibility to RTSP; the drawback however is that it is not an explicit command.

Also this does not cover the need for additional signaling as detailed above.

5. Proposed specification

The proposal is to introduce new RTSP Methods specifically for stream switching.

As indicated in [RTSP section 1.5], the advantage of a new Method over extending an existing Method is that a component that does not know the new Method will reply with "501 Not Implemented", which makes backward compatibility issues easy to solve.
Furthermore there is a need for additional header fields, as described below, that are best introduced with new Methods.

It is also desirable that this specification be as independent as possible of the RTSP specification and of its evolutions (with a required side effect of being backward compatible with [RTSP]).

For that reason this memo defines stream switching primitives that are orthogonal to the rest of RTSP in terms of state machine and signaling. This specification does not modify the syntax or semantics of any RTSP Method or header, and the stream switching state machine is defined as being "inside" each state of the RTSP state machines in both the client and server. For example streams can be switched during a PAUSE as well as during a PLAY, etc. (Note that there is an exception to that principle for SWITCHCLOSE issued on a playing stream, see below.)

All the stream switching Methods are OPTIONAL but it is RECOMMENDED to implement all of them. For example, implementers' attention is drawn to the usefulness of SWITCHCLOSE.

5.1 SWITCHSETUP

5.1.1 SWITCHSETUP rationale

Introducing SWITCHSETUP is better than re-using SETUP in that it is explicitly for stream switching purposes.

It is also highly desirable that a stream-switching enabled player can connect to an "old" RTSP server (one that does not implement stream switching). Therefore it is desirable that the behavior of existing servers be fully defined.

For that reason SWITCHSETUP is useful in that an "old" server will refuse it, clearly indicating to the client that it does not support stream switching. As a result, SDP files describing switch-sets can also be used with "old" servers.

5.1.2 SWITCHSETUP specification

The SWITCHSETUP Method is similar to SETUP except that it explicitly tells the server that the corresponding stream is part of a switch-set.
For maximum backward compatibility a client MUST use SETUP for the primary streams and SWITCHSETUP for the alternative streams. This way a server that does not support stream switching will reply "501" to SWITCHSETUP but will SETUP the primary streams (the alternative, if SETUP were used for all streams, being a server allocating a lot of resources for a function that it cannot perform!).

SWITCHSETUP MAY be issued at any time during an RTSP session.

SWITCHSETUP issued on a playing stream is similar to SETUP.

5.1.3 SWITCHSETUP "Switch-control" header field

The SWITCHSETUP Method has an OPTIONAL header field: "Switch-control".

The Switch-control request-header field can be used to specify to the server how the client supports stream switching control. The values below are mutually exclusive.

"Switch-control=client-initiated-only": Tells the server that it MUST NOT switch on its own but only upon reception of a client-to-server SWITCH command. This is relevant for any type of switch, including client-transparent switches.

"Switch-control=non-transparent-client-initiated-only": Tells the server that, for non-client-transparent switches, it MUST NOT switch on its own but only upon reception of a client-to-server SWITCH command. Specifically the server MAY switch on its own for client-transparent switches. This is the default, i.e. a server MUST assume this value for absent or malformed Switch-control header fields.

"Switch-control=server-initiated-ok": Tells the server that it MAY switch on its own without warning the client first, for all types of switches (i.e. the client has allocated all the necessary resources).

"Switch-control=forewarning: 2000": Tells the server that it MAY switch on its own but that it MUST then warn the client by using a SWITCHSIGNAL (see below) and that this forewarning MUST be sent at least 2000 milliseconds before the server performs the switch.
This is relevant for any type of switch, including client-transparent switches.

"Switch-control=non-transparent-forewarning: 2000": Tells the server that it MAY switch on its own but that for non-transparent switches it MUST warn the client by using a SWITCHSIGNAL (see below) and that this forewarning MUST be sent at least 2000 milliseconds before the server performs the switch.

5.1.4 SWITCHSETUP "RAP" header field

The SWITCHSETUP Method has an OPTIONAL header field: "RAP".

The RAP request-header field can be used to specify to the server how the client supports stream switching with respect to Random Access Points. The values below are mutually exclusive.

"RAP=RAP-only": Tells the server that it MUST switch only on a Random Access Point (in the "new" stream). For SWITCH requests corresponding to a drastic (more than 50%) rate reduction, i.e. when rapid action against congestion is preferable to smoother playback, servers MUST interrupt the on-going stream immediately and restart streaming at the next available RAP in the new stream (which effectively creates a gap in the stream).

"RAP=indifferent": Tells the server that it MAY switch at any point (in the new stream). This is the default, i.e. a server SHOULD assume this value for absent or malformed RAP header fields.

"RAP=if-before:300": Tells the server that it SHOULD wait to switch on a Random Access Point (in the new stream) unless no such point is available within 300 milliseconds of Normal Play Time, in which case the server MAY switch at any point. Servers MUST ignore this recommendation for SWITCH requests corresponding to a drastic (more than 50%) rate reduction, i.e. when rapid action against congestion is preferable to smoother playback.

5.2 SWITCH

The "SWITCH" Method is an OPTIONAL atomic command from the client to the server requesting the server to switch from one stream to another.
The stream to switch off is indicated as a parameter of the Method. The stream to switch on is indicated with the header field "Replace-with" as shown in the example below:

  C->S: SWITCH rtsp://foo/twister/audio1 RTSP/1.0
        CSeq: 421
        Replace-with: rtsp://foo/twister/audio2

  S->C: RTSP/1.0 200 OK
        CSeq: 421
        Range: smpte=0:10:22-;time=19970123T153600Z
        RTP-Info: url=rtsp://foo/twister/audio2;seq=12312232;rtptime=78712811

See the Appendix for a fully detailed example.

The "Replace-with" header field may be absent or empty, signaling that the target stream should be stopped with no replacement; a symmetric SWITCH with an empty target can be used to restore the corresponding track. (This is useful in order to temporarily suppress the video and reach a very low bit rate, for example with a news service on a mobile device; in that case SWITCH is equivalent to the MUTE command of [RTSP-MUTE].)

SWITCH requests MAY be issued at any time during an RTSP session (including before the acknowledgement of a previous request is received).

When receiving several SWITCH requests a server SHOULD ignore/abandon the oldest ones. In all cases a server MUST execute as fast as possible requests producing a smaller data rate (the smallest if several requests are pending).

A server MAY delay or deny the execution of requests corresponding to higher data rates, for example if it has reached its maximum capacity.

A server SHOULD NOT deny SWITCH requests for smaller rates.

The server response to a SWITCH from a player SHOULD contain the same information as the answer to PLAY.
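The server-side rules for concurrent SWITCH requests can be sketched as a small selection function. This is a hedged illustration, not part of the proposal: the `SwitchRequest` record, the `choose_switch` helper, the `at_capacity` flag and the video URLs are hypothetical, and the sketch assumes the server knows the bit rate of each target stream and keeps pending requests in arrival order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SwitchRequest:
    """A pending SWITCH request (illustrative fields only)."""
    cseq: int
    target_url: str
    target_kbps: int


def choose_switch(pending: List[SwitchRequest], current_kbps: int,
                  at_capacity: bool = False) -> Optional[SwitchRequest]:
    """Pick the pending request to execute, or None if nothing is executed.

    `pending` is assumed to be ordered from oldest to newest.
    """
    if not pending:
        return None
    # Requests producing a smaller data rate MUST be executed as fast as
    # possible, the smallest one if several are pending.
    downward = [r for r in pending if r.target_kbps < current_kbps]
    if downward:
        return min(downward, key=lambda r: r.target_kbps)
    # Requests for higher rates MAY be delayed or denied, for example when
    # the server has reached its maximum capacity.
    if at_capacity:
        return None
    # Otherwise older requests are abandoned in favour of the newest one.
    return pending[-1]


pending = [
    SwitchRequest(421, "rtsp://foo/twister/video_150", 150),
    SwitchRequest(422, "rtsp://foo/twister/video_50", 50),
]
chosen = choose_switch(pending, current_kbps=400)
print(chosen.target_url if chosen else "no switch")
```

Giving downward requests absolute priority over arrival order is one reading of "in all cases"; an implementation could also choose to queue denied upward requests rather than drop them.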
Note for example that the use of RTP-Info as in the above example allows instantaneous lip-sync (the alternative being that the player must wait for the RTCP Sender Report) and may also help the receiver to identify the exact packet corresponding to the new stream (especially in client-transparent cases), which in turn is useful for resetting traffic monitoring computations, etc.

5.3 SWITCHSIGNAL

As its name hints, SWITCHSIGNAL is a "signal" rather than a command. SWITCHSIGNAL is an OPTIONAL server signal to the client that a switch will soon be (or is being) performed.

The stream to be switched off is indicated as a parameter of the Method. The stream to be switched on is indicated in RTP-Info as shown in the example below:

   S->C: SWITCHSIGNAL rtsp://foo/twister/audio1 RTSP/1.0
         CSeq: 4213
         Range: smpte=0:10:22-;time=19970123T153600Z
         RTP-Info: url=rtsp://foo/twister/audio2;
                   seq=12312232;rtptime=78712811

   C->S: RTSP/1.0 200 OK
         CSeq: 4213

It is RECOMMENDED that the server issue SWITCHSIGNAL as early as possible before the actual switch and add all possible information to it (Range, RTP-Info, etc.) as in the response to PLAY.

For the client-transparent case SWITCHSIGNAL is normally not necessary for the correct behavior of the streaming system, but a client may register the need to receive such a notification (see SWITCHSETUP above).

For the non-client-transparent case the server MUST respect the instructions provided by the client in the SWITCHSETUP commands about the need to issue SWITCHSIGNAL: unless "Switch-control=server-initiated-ok" was explicitly signaled, a server-initiated switch without forewarning would typically cause the client to produce degraded playback, or could even crash it.

5.4 SWITCHCLOSE

5.4.1 SWITCHCLOSE rationale

It is highly desirable that a stream-switching enabled player can free unused resources in order to allocate other resources.
A typical example is a session nominally at 10 Mb/s for which a large number of alternative streams are available (say 50 different bit rates, all the way from high quality HDTV with 5+1 music down to a stamp-sized video with mono speech "backup" configuration). In such a case a typical usage would be that the client issues SWITCHSETUP for only a few alternatives (say 8 Mb/s, 5 Mb/s, 1 Mb/s), which could involve a substantial amount of memory if these configurations are supported using different codecs, etc. If the network condition degrades catastrophically this player may need to allocate other resources in order to switch to lower bit rates. In this case it would be highly valuable that it can free (some of) the resources corresponding to the highest bit rates.

It is also highly desirable that a server can free resources implicitly allocated after accepting a SWITCHSETUP (including for DoS resistance); but then it is very useful to tell the player that hypothetical corresponding SWITCH requests would be denied.

5.4.2 SWITCHCLOSE specification

SWITCHCLOSE tears down the resources corresponding to a given SWITCHSETUP, identified by the same target URL as used in SWITCHSETUP.

SWITCHCLOSE is OPTIONAL. SWITCHCLOSE can be issued by a server or by a client. SWITCHCLOSE MAY be issued at any time during an RTSP session.

SWITCHCLOSE issued on a playing stream causes the corresponding track to be stopped, i.e. only a PLAY can restore this track and a SWITCHSETUP is required to restore the stream as a possible future alternative.

A player SHOULD NOT issue SWITCHCLOSE on a playing stream; PAUSE or SWITCH SHOULD first be issued for that stream. However SWITCHCLOSE MAY be used by a server on a playing stream in order to signal that this stream is being terminated and will not be resumed unless the client takes explicit action.
Example:

   C->S: SWITCHCLOSE rtsp://foo/twister/audio1 RTSP/1.0
         CSeq: 42134

   S->C: RTSP/1.0 200 OK
         CSeq: 42134

5.5 SDP rules

An SDP describing a switch-set MUST use different (dynamic) Payload Types for streams that are not client-transparent switchable.

An SDP describing a switch-set MAY use identical (dynamic) Payload Types for streams that are client-transparent switchable.

An SDP describing a switch-set MAY use identical port numbers for streams that are client-transparent switchable.

6. Open issues

6.1 SDP issues

Is there a need for additional SDP syntax and/or rules to describe the switch-set? (or is the example in the Appendix OK?)

Should it actually be RECOMMENDED (or even a MUST?) to reuse the same (dynamic) payload type for alternate streams of the "client-transparent" type?

6.2 Other issues

Status codes: additional status codes may be necessary(?). For example when switching has not been performed because a more recent request arrived... or because max capacity is reached?

Should Stream Switching work for RTP interleaved inside RTSP?

Is there an alternative to doing one SETUP per alternate stream? Would it be worth the trouble to define a specific syntax?

In the client-transparent mode, assuming neither the payload type nor the port number changes, it should not be necessary to make one SETUP per stream (right?); shall it be documented/mandated?

Are there specific firewall/proxy considerations?

6.3 UDP transport of switching command

It is a good idea to also provide a UDP based command. The key motivation for doing that is that UDP feedback may be faster and, as mentioned earlier, speed is a key factor for optimal congestion control as well as switch seamlessness.

Should this be done using an RTCP extension? Or use "rtspu"? (but isn't rtspu going to be dropped?)

For security (UDP being easier to spoof than TCP?)
this could be restricted to "down" switches, since for congestion control purposes there is never any hurry to switch up? Or could it be restricted to the client-transparent case?

6.4 Independence with RTSP parallel evolution

There is a possible exception to that for SWITCHCLOSE issued on a playing stream. But it looks like a very logical one?

7. Security considerations

The security issues associated with stream switching are those inherent to the usage of RTP and RTSP plus:

7.1 Induced server misbehavior

The following threats can be identified:

   . Causing the server to allocate a lot of resources (in making ready for supporting switching for a large switch-set). Note however that a server can deny SWITCHSETUP requests using for example "503 Service Unavailable" (temporary) or "416 Requested Range Not Satisfiable" (permanent) and can issue SWITCHCLOSE at any time. Also the server is often the source of the SDP (via DESCRIBE) and therefore has opportunities there to reduce the diversity.

   . Causing the server to switch up toward high bit rate streams can create large amounts of network traffic. Note however that the typical usage of stream switching is anyway to deploy the service with the maximum bit rate as a primary target... With stream switching, streaming servers would actually become bandwidth control tools for operators.

   . Causing the server to switch down toward low bit rates causes a degraded service.

   . Causing the server to switch frequently is a source of degraded service but is also a Denial of Service attack, in the sense that it would typically cause the server to consume substantial resources in switching, thereby reducing the service capacity, for example by reducing the maximum number of concurrent streams that the server can serve or the maximum total throughput of the server, etc. The defense of a server is probably to refuse too frequent switches and especially upward switches...
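The last defense listed above (refusing too frequent switches, and especially upward ones) can be sketched as a per-track sliding-window limiter. This is only an illustrative sketch and not part of the proposed protocol: the class name SwitchLimiter, its parameters and the default limits are assumptions made for the example. Downward switches are always accepted here, in line with the rule of section 5.2 that a server SHOULD NOT deny SWITCH requests for smaller rates, but they still count toward the window.

```python
from collections import defaultdict, deque


class SwitchLimiter:
    """Hypothetical sliding-window limiter for SWITCH requests, per media track."""

    def __init__(self, max_switches=3, window_s=10.0):
        self.max_switches = max_switches  # accepted switches tolerated per window
        self.window_s = window_s          # window length in seconds
        self._history = defaultdict(deque)  # track -> times of accepted switches

    def allow(self, track, old_kbps, new_kbps, now):
        """Return True if the switch may be executed at time `now` (seconds)."""
        history = self._history[track]
        # Forget accepted switches that have fallen out of the window.
        while history and now - history[0] >= self.window_s:
            history.popleft()
        is_upward = new_kbps > old_kbps
        # Only upward switches are refused once the window is full.
        if is_upward and len(history) >= self.max_switches:
            return False
        history.append(now)
        return True
```

A server would call allow() once per received SWITCH and answer with an error status when it returns False; which status code to use is one of the open issues of section 6.2.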
These threats are fended off by applying authentication to the stream switching control messages. RFC2326 section 16 provides guidance on how to perform that with RTSP.

Also server implementations SHOULD include configurable limitations such as a maximum number of switches per amount of time per media track, a maximum number of alternate streams per client, etc.

7.2 Induced client misbehavior

One threat is that a server could cause the receivers to misbehave (or crash), for example if the data sent is encoded with a different decoder configuration than the one the player was initialized with.

For that reason this specification takes special care that server-initiated switches are possible only for agreed upon streams (using SWITCHSETUP) and either for client-transparent switches (and a client can disable these anyway) or in conditions specified by the client with "safe" defaults.

8. Acknowledgements

The author wishes to thank Alain Teil, Kamal Rada, Yves Ramanzin and Nicolas Delahaye for all the fruitful discussions and comments.

9. References

[Widmer] A survey on TCP-Friendly Congestion Control, J. Widmer, R. Denda, M. Mauve, IEEE Network May-June 2001, http://www.informatik.uni-mannheim.de/informatik/pi4/publications/library/Widmer2001a.pdf

[Vojnovic] On the long-run behavior of equation-based rate control, M. Vojnovic, J.Y. Le Boudec, Proceedings of SIGCOMM'02, August 19-23 2002, Pittsburgh, Pennsylvania, USA, http://www.acm.org/sigcomm/sigcomm2002/papers/equation.pdf

[Bansal] Dynamic Behavior of Slowly-Responsive Congestion Control Algorithms, D. Bansal, H. Balakrishnan, S. Floyd, S.
Shenker, Proceedings of SIGCOMM'01, August 27-31 2001, San Diego, California, USA, http://www.acm.org/sigcomm/sigcomm2001/p21-bansal.pdf

[RTP] http://www.ietf.org/rfc/RFC1889.txt

[RTSP] http://www.ietf.org/rfc/RFC2326.txt

[HTTP] http://www.ietf.org/rfc/RFC2616.txt

[grouping] http://www.ietf.org/rfc/RFC3388.txt

[TFRC] http://www.ietf.org/rfc/RFC3448.txt

[SMIL] http://www.w3.org/TR/smil20/cover.html

[MPEG-4] http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm

[3GPP-alt-attr] http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_22/Docs/S4-020407.zip

[3GPP-BWS] http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_25/Docs/S4-030024.zip

[RTSP-MUTE] http://www.ietf.org/internet-drafts/draft-sergent-rtsp-mute-00.txt

[TRIGTRAN] http://www.ietf.org/internet-drafts/draft-dawkins-trigtran-probstmt-00.txt

10. Author's Address

   Philippe Gentric
   Philips MP4Net
   51 rue Carnot
   92156 Suresnes
   France
   e-mail: philippe.gentric@philips.com

Appendix: Detailed example

   C->S: DESCRIBE rtsp://foo/twister RTSP/1.0
         CSeq: 1

Server replies with the full content description:

   there are 3 video streams, at 200 kb/s, 100 kb/s and 50 kb/s
   there are 3 audio streams, at 20 kb/s, 10 kb/s and 5 kb/s

NB: the example is invalid in the respect that normally it would require more detail such as decoder configurations which are omitted for the sake of simplicity ...
   S->C: RTSP/1.0 200 OK
         CSeq: 1
         Content-Type: application/sdp
         Content-Length: xxx

         v=0
         o=- 2890844256 2890842807 IN IP4 172.16.2.93
         s=RTSP Session
         i=An Example of RTSP Session Usage for Stream Switching
         a=control:rtsp://foo/twister
         t=0 0
         m=video 7722 RTP/AVP 96
         a=rtpmap:96 MP4V-ES/1000
         a=control:rtsp://foo/twister/video1
         b=AS:200
         m=audio 7724 RTP/AVP 97
         a=rtpmap:97 mpeg4-generic/44100/2
         a=control:rtsp://foo/twister/audio1
         b=AS:20
         m=video 7726 RTP/AVP 98
         a=rtpmap:98 MP4V-ES/1000
         a=control:rtsp://foo/twister/video2
         b=AS:100
         m=audio 7724 RTP/AVP 99
         a=rtpmap:99 mpeg4-generic/44100/2
         a=control:rtsp://foo/twister/audio2
         b=AS:10
         m=video 7726 RTP/AVP 100
         a=rtpmap:100 MP4V-ES/1000
         a=control:rtsp://foo/twister/video3
         b=AS:50
         m=audio 7724 RTP/AVP 101
         a=rtpmap:101 mpeg4-generic/44100/2
         a=fmtp:101 streamtype=5; profile-level-id=15; mode=AAC-hbr
         a=control:rtsp://foo/twister/audio3
         b=AS:5

The second step is SETUP, where client and server agree on the transport parameters (UDP port numbers etc). Note that the client waits for the reply to the first SETUP in order to have the session number and then sends all the SWITCHSETUPs in rapid succession, so that this operation takes approximately 2 round trips independently of the number of streams.

In this example different UDP ports are used but the same port could also be reused since by rule the switch is either performed on streams that are of the client-transparent type or that have a different payload type.

Note the "Switch-control=client-initiated-only" header field which signals to the server that it MUST NOT switch on its own but only upon reception of a SWITCH command.
   C->S: SETUP rtsp://foo/twister/audio1 RTSP/1.0
         CSeq: 2
         Transport: RTP/AVP;unicast;client_port=8000-8001

   S->C: RTSP/1.0 200 OK
         CSeq: 2
         Transport: RTP/AVP;unicast;client_port=8000-8001;
                    server_port=9000-9001
         Session: 12345678

   C->S: SETUP rtsp://foo/twister/video1 RTSP/1.0
         CSeq: 3
         Transport: RTP/AVP;unicast;client_port=8002-8003
         Session: 12345678

   C->S: SWITCHSETUP rtsp://foo/twister/audio2 RTSP/1.0
         CSeq: 4
         Transport: RTP/AVP;unicast;client_port=8004-8005
         Session: 12345678
         Switch-control=client-initiated-only

   C->S: SWITCHSETUP rtsp://foo/twister/video2 RTSP/1.0
         CSeq: 5
         Transport: RTP/AVP;unicast;client_port=8006-8007
         Session: 12345678
         Switch-control=client-initiated-only

   C->S: SWITCHSETUP rtsp://foo/twister/audio3 RTSP/1.0
         CSeq: 6
         Transport: RTP/AVP;unicast;client_port=8008-8009
         Session: 12345678
         Switch-control=client-initiated-only

   C->S: SWITCHSETUP rtsp://foo/twister/video3 RTSP/1.0
         CSeq: 7
         Transport: RTP/AVP;unicast;client_port=8010-8011
         Session: 12345678
         Switch-control=client-initiated-only

   S->C: RTSP/1.0 200 OK
         CSeq: 3
         Transport: RTP/AVP;unicast;client_port=8002-8003;
                    server_port=9004-9005
         Session: 12345678

   S->C: RTSP/1.0 200 OK
         CSeq: 4
         Transport: RTP/AVP;unicast;client_port=8004-8005;
                    server_port=9006-9007
         Session: 12345678

   S->C: RTSP/1.0 200 OK
         CSeq: 5
         Transport: RTP/AVP;unicast;client_port=8006-8007;
                    server_port=9008-9009
         Session: 12345678

   S->C: RTSP/1.0 200 OK
         CSeq: 6
         Transport: RTP/AVP;unicast;client_port=8008-8009;
                    server_port=9010-9011
         Session: 12345678

   S->C: RTSP/1.0 200 OK
         CSeq: 7
         Transport: RTP/AVP;unicast;client_port=8010-8011;
                    server_port=9012-9013
         Session: 12345678

Then the client decides to start streaming the "default" configuration at 220 kb/s (note that a non-aggregate PLAY would also be a possibility):

   C->S: PLAY rtsp://foo/twister RTSP/1.0
         CSeq: 8
         Range: npt=0-
         Session: 12345678

   S->C: RTSP/1.0 200 OK
         CSeq: 8
         Session: 12345678

Then the client decides to switch streaming from 220
kb/s to 210 kb/s by switching audio streams

   C->S: SWITCH rtsp://foo/twister/audio1 RTSP/1.0
         CSeq: 9
         Session: 12345678
         Replace-with: rtsp://foo/twister/audio2

   S->C: RTSP/1.0 200 OK
         CSeq: 9
         Session: 12345678

Then the client decides to switch streaming from 210 kb/s to 60 kb/s by switching video streams

   C->S: SWITCH rtsp://foo/twister/video1 RTSP/1.0
         CSeq: 10
         Session: 12345678
         Replace-with: rtsp://foo/twister/video3

   S->C: RTSP/1.0 200 OK
         CSeq: 10
         Session: 12345678
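The client side of this exchange can be sketched as follows. This is only an illustration under stated assumptions: the helper names (build_switch, apply_switch, aggregate_kbps) and the RATES_KBPS table are hypothetical, the rates are copied from the b=AS lines of the SDP above, and the request layout follows the SWITCH examples of this memo with the CRLF line endings used by RTSP/1.0.

```python
BASE_URL = "rtsp://foo/twister"

# Bit rates (kb/s) taken from the b=AS lines of the example SDP.
RATES_KBPS = {
    "video1": 200, "video2": 100, "video3": 50,
    "audio1": 20, "audio2": 10, "audio3": 5,
}


def build_switch(cseq, session, old, new):
    """Serialize a SWITCH request replacing stream `old` with stream `new`."""
    lines = [
        f"SWITCH {BASE_URL}/{old} RTSP/1.0",
        f"CSeq: {cseq}",
        f"Session: {session}",
        f"Replace-with: {BASE_URL}/{new}",
    ]
    # RTSP/1.0 terminates each header with CRLF and the message with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


def apply_switch(playing, old, new):
    """Return the set of playing streams after a successful SWITCH."""
    if old not in playing:
        raise ValueError(f"{old} is not currently playing")
    return (playing - {old}) | {new}


def aggregate_kbps(playing):
    """Total nominal rate of the streams currently playing."""
    return sum(RATES_KBPS[name] for name in playing)


playing = frozenset({"audio1", "video1"})        # default configuration
rate_default = aggregate_kbps(playing)
playing = apply_switch(playing, "audio1", "audio2")
rate_after_audio = aggregate_kbps(playing)
playing = apply_switch(playing, "video1", "video3")
rate_after_video = aggregate_kbps(playing)
print(rate_default, rate_after_audio, rate_after_video)  # 220 210 60
```

The three totals match the 220 kb/s, 210 kb/s and 60 kb/s figures quoted in the walk-through; build_switch(9, "12345678", "audio1", "audio2") yields the first SWITCH request shown above.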