Internet Engineering Task Force                            Rajesh Kumar
Internet Draft                                            Cisco Systems
Document:                                                March 26, 2002
Category: Informational
Expires: September 26, 2002

Generic Use and RTP payload for State Signaling Events (SSEs)

Status of this Document

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as work in progress.

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract
1 Introduction
1.1 Terminology
1.2 Motivation for an RTP-based State Signaling Mechanism
1.3 Applicability to Fax Relay
1.4 Scope and Limitation of State Signaling Events (SSEs)
2 Definition of Media States
3 RTP Packet Format for State Signaling Events
3.1 Use of RTP Header Fields
3.2 Payload Format
4 Reliability
4.1 Use of simple packet repetition
4.2 Use of RFC 2198-based redundancy
5 Indication of Receiver Capabilities using SDP
6 State Signaling Event Definitions
6.1 Protocol Extension Mechanism
6.2 Initial List of State Signaling Events
6.3 SSE Protocol Operation
6.3.1 SSE Generation Rules
6.3.2 Media State Transition Rules
6.3.3 Protocol Error Recovery

R.Kumar Informational 1
RTP State Signaling Events February 2002

6.4 SSE Cause Codes
7 Examples of the use of State Signaling Event Messages
8 Justification for State Signaling Events
8.1 A Precedent for In-band State Signaling (AAL2 type 3 packets)
8.2 Comparison of SSEs with other alternatives
9 Proposed MIME Registration
10 Security Considerations
11 IANA Considerations
12 Acknowledgements
13 Author's Address
14 References

Abstract

This document defines a mechanism to signal state changes using RTP packets called State Signaling Events (SSEs). The proposed MIME type is "audio/sse". SSE messages signal a media state which has no specified duration. In general, media state changes result in a two-way synchronization handshake via SSEs.
1 Introduction

1.1 Terminology

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119 [7] and indicate requirement levels for compliant implementations.

1.2 Motivation for an RTP-based State Signaling Mechanism

This signaling mechanism (SSEs) was initially proposed to the TIA [4] and subsequently used by the ITU SG16 as a key element of on-going MoIP standardization work [Q11, Ref. 14].

This mechanism was initially meant to address a need for fast media state synchronization in V.MoIP. Without such fast synchronization, high speed modem operation is not possible. This is also true for high-speed, V.34 faxes.

1.3 Applicability to Fax Relay

We recommend that applications that implement both MoIP and FoIP also support this mechanism for group 3 fax relay in addition to V.34 fax relay and modem relay. When such a mechanism is mandated on the basis of one set of drivers (e.g. MoIP and V.34 fax), it is cost-effective to make it available as a common synchronization mechanism for other contexts such as group 3 fax relay.

For a call that supports a range of capabilities, we deem it unwise to mandate the use of external signaling control (e.g. SIP or call agent) for group 3 fax relay, while mandating fast synchronization for modem relay. This does not diminish the utility of external signaling mechanisms for media changes in call contexts that support a group 3 fax capability, but not modem relay or V.34 fax capabilities.

R.Kumar Informational - Expires August 2002 2
RTP State Signaling Events February 2002

1.4 Scope and Limitation of State Signaling Events (SSEs)

State Signaling Events are RTP-encoded event messages that coordinate switches between different media states as defined in Section 2.

The combination of a port and an RTP payload type defines an SSE stream.
An SSE stream governs:

* a flow consisting of multiple media streams. When such a flow is defined and contains an SSE stream [19], then the SSE stream governs all the remaining media streams in that flow. These streams may span several ports.

* all media streams/ports explicitly associated with the MIME definition at call establishment, if the explicit association is elected.

* the media stream associated with the port to which the SSEs are sent, if the SSEs are not part of a flow, or are explicitly associated with ports via the MIME definition.

To multiplex the RTP-based SSEs with media packets, it is necessary for the media packets to be RTP, or a non-RTP media format that has a payload type discriminator in the same position as RTP. To permit proper identification, the payload type values associated with the non-RTP media packets SHOULD NOT overlap with values assigned to RTP encodings, including SSEs. An example of such a non-RTP encoding is SPRT [14] proposed for modem relay.

The use of the SSE mechanism to switch from RTP media to the use of non-RTP media (such as UDPTL-encapsulated ITU T.38 Group 3 fax relay [10]) on the same or different UDP port is possible. If the SSEs share a port with the T.38 media stream, then they cannot be used to synchronize the switchback, since SSE messages cannot be interleaved with UDPTL packets. This is not a significant drawback, since the end of T.38 fax relay can be detected independently at each end of a fax relay session, and can result in a pre-determined action such as switchback to voice, or session termination for signaled (non-provisioned) sessions.

The SSE Protocol Operation (Section 6.3) is described for bidirectional RTP/RTP-compatible media streams or groups of media streams between two session users. One transmitter and one receiver are associated with each session user. A session user, or endpoint, is located in a "gateway" that converts between an IP media stream and a TDM or analog signal.
In multimedia applications, video media use a different port from audio media. No SSEs have been defined for use with video media, but such extensions are not precluded by this document (Section 6.1).

The application of the SSE protocol to multicast media streams is an object of further study.

2 Definition of Media States

For the purposes of this document, "media state" is defined in terms of the ultimate use of the media. The definition is similar, though not identical, to that of the MIME media type registered by IANA [15] and used as the "media type" parameter in SDP [2].

As a background, some of the media types described in [2] and [15] are:

* Audio. This includes voice and voiceband data content.
* Video.
* Data.
* Application.
* Control.
* Image.
* Multipart.
* Text.
* Message.
* Model.

The current document defines the following values of the media state parameter:

* Audio. This refers to the content intended for the human ear, rather than modulated data. Note the subtle difference from the MIME media type "audio".

* Voiceband data. This refers to data modulated as a voiceband signal. This data could be modem or facsimile data. Using the definition of media type [2 & 15], this is subsumed under "audio".

* Fax Relay. No further restriction is placed: the fax signal could be group 3, V.34 etc., and could be encapsulated in T.38 UDPTL packets, or in RTP packets (currently under study). Using the definition of media type [15], the media state "Fax Relay" is subsumed under the value "image".

* Modem Relay. Although the current MoIP work [14] does not use RTP, the definition of media state "Modem Relay" in this document does not place this restriction. In the current modem relay definition, a baseband (unmodulated) data signal is encapsulated within an SPRT (Simple Packet Relay Transport). Although SPRT is not RTP, it has a payload type field in the same position as RTP.
Further, for media streams that could be switched between RTP and SPRT, SPRT MUST NOT use payload types assigned to RTP encodings. Using the definition of media type [15], the media state "Modem Relay" is subsumed under the value "application".

* Text Relay. This media stream is a simple sequence of text characters [18]. This is primarily used in TDD (Telecommunications Device for the Disabled) applications.

No media state is currently defined for n x 64 clear channel data, commensurate with the user state "circuit" defined for circuit emulation in AAL2 [5]. There are no current RTP encodings or on-going standardization activities in this area. Such encodings would typically be used for unmodulated, baseband data streams such as those produced by ISDN end-subscriber equipment. A basic requirement is bit integrity of the composite n x 64 kbps, precluding the use of robbed-bit CAS signaling, and TDM line codes that violate bit-level transparency to achieve clock density. Other basic requirements are low, controlled latencies and error rates, objectives that are not characteristic of most IP networks of today.

Media states are represented numerically (Section 6.2). The State Signaling Event (SSE) protocol defined in this document is used, by compliant implementations, to synchronize shifts between these media states, and other media states which MAY be defined in the future (Section 6.1).

Apart from the high-level definitions in this section, this document does not intend to detail the range of media properties associated with each media state. These properties, which are negotiated as active or latent capabilities at the time of session establishment [2 & 16], SHOULD be found in the existing and planned ITU, IETF etc. standards specifications which define VoIP [12], Voiceband Data [10, 14, 17], modem relay [14] and fax relay [10] applications.
The following observations can, however, be made with respect to these media states:

* The audio media state includes the entire range of audio codecs that conform to the RTP/AVP profile [12].

* The voiceband data media state is used to transmit modem and fax signals. It is associated with a unique encoding name, "vbd", which is dynamically mapped into one or more RTP payload types [17]. As described in [17], each instance of the "vbd" payload type MAY be further associated with format-specific parameters which define the underlying audio encoding. Since voiceband data has a much lower distortion tolerance, not all audio encodings are suitable for use with voiceband data. Some examples of encodings suitable for voiceband data are PCMU, PCMA, G726-40 and G726-32. Voiceband data requires the absence of any DC removal filters in the encoding algorithm. (PCM and ADPCM do not have any.) Silence suppression and voice packet loss concealment algorithms, which work well with voice, cannot be used with voiceband data, which requires a continuous carrier signal. Unlike voice, voiceband data is found to work best with fixed jitter buffers rather than adaptive jitter buffers.

* The common fax relay format is per the ITU T.38 specification, and it uses non-RTP, UDPTL packets.

* The common modem relay format is the non-RTP, SPRT format currently under study [14].

When an SSE payload type governs more than one port, then the media state is associated with the entire set of ports. In this case, the media properties described on the different ('m=') lines need not have resources allocated to them until activation of the corresponding local media state (Section 6.3).

3 RTP Packet Format for State Signaling Events

The payload format for State Signaling Events described below is suitable for both end-to-end and gateway contexts.
The generation of State Signaling Events by an endpoint has no impact on the simultaneous generation of other RTP payload formats such as audio [12], telephone events and telephone tones [1], or of RTP-compatible formats such as SPRT [14]. If on the same port, it is distinguished from these other formats by its unique, dynamic payload type.

The RTP payload format for State Signaling Events is designated as "sse". The MIME type is designated as "audio/sse", pending registration [Section 9].

In conformance with the Internet Protocol, all fields are carried in network byte order, that is, most significant byte (octet) first. Within a byte, the most significant bit is transmitted first. This byte order is commonly known as big-endian. In this specification, bytes and bits shown on the left are more significant.

3.1 Use of RTP Header Fields

SSRC: State Signaling Events MAY use one of the SSRC values sent to one of the port(s) they govern. Alternately, the use of a unique SSRC dedicated to SSEs is not precluded. In either case, SSEs qualify all media stream(s) sent to the port(s) they qualify.

Timestamp: The RTP timestamp reflects when a local decision is made to issue the SSE message, in response to either an internal event or a received SSE message. It serves to distinguish between consecutive SSE messages sent by an entity.

Marker bit: Since SSEs have no duration, the marker field is a don't care. For SSEs, receivers SHALL ignore the RTP marker bit. Transmitters SHOULD set it to 1. Parenthetically, for audio media, a marker value of 1 can indicate the beginning of a talkspurt. For RFC 2833 telephone events, it can indicate the beginning of a new event rather than the continuation of a previous event.

3.2 Payload Format

The payload format for State Signaling Events without payload extension (extension bit set to 0) is shown in Fig. 1.
The payload format for State Signaling Events with payload extension (extension bit set to 1) is shown in Fig. 2. All fields, except the extension length and extension information fields, are the same as in Fig. 1. In this case, it is possible to attach a variable number of extension bits at the end of the payload.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     event     |E|X|P| cause code|   cause code information    |
   |               | | |P|           |                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1: Payload Format for State Signaling Events without Payload Extension

event: The events are encoded as shown in Section 6.2. The number of bits available for representing state signaling events is 8. Extension of event numbering into the area of reserved bits is possible, but unnecessary at present.

E: "End" bit. Although this field is a don't care for State Signaling Events, it is preferable to always set it to 1 in all State Signaling Event messages, including all redundant copies. Receivers SHOULD tolerate and ignore either value of this bit.

X: "Extension" bit. If set to 0, then the SSE payload does not have a payload extension at the end (Figure 1). Otherwise, there is a payload extension (Figure 2).

PP: "Priority/Precedence" bit. A value of zero indicates normal priority/precedence. A value of one indicates a high priority/precedence.

cause code: This is a six-bit code that indicates the rationale for sending the SSE message. The rationale could be a local event detection, a received SSE message, or a combination of local events and received SSE messages. A value of all zeros is a null cause code, indicating non-communication of the rationale for sending an SSE message. The receiver MAY then use a default value if it needs to. Distinct events MAY share cause codes, or MAY have cause codes that are unique to the event.
However, note that the cause code space for each event is distinct.

cause code information: This 15-bit field is used to provide additional information associated with the cause code. For instance, if the cause code refers to the modem CM (call menu) signal, this field MAY be used to indicate the actual CM bits. If the cause code is null, then this field is always treated as null regardless of its value. Also, a value of all zeros indicates a null cause code information field.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     event     |E|X|P| cause code|   cause code information    |
   |               | | |P|           |                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       extension length        |extension info (0- 65,536 bits)|
   |                               |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2: Payload Format for State Signaling Events with Payload Extension

extension length: This 16-bit field is used to indicate the number of extension bits after it. Obviously, a value of all zeros in the extension length field, although permissible, is not useful. This field exists only if the extension (X) bit is 1.

extension information: This field consists of a variable number of bits, indicated by the extension length field.

The extension length and extension information are meant to allow future extensions of the SSE protocol.

4 Reliability

4.1 Use of simple packet repetition

Redundant transmission of State Signaling Event messages is RECOMMENDED. A receiver acts on the first redundant message it receives. Except for incrementing the sequence number count, it ignores the remaining redundant SSE messages. Although they have different sequence numbers, redundant SSE messages have identical timestamps.
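As an illustration of the payload layout of Section 3.2 (Figures 1 and 2), the following is a minimal Python sketch of packing and parsing the SSE payload. The function names pack_sse and unpack_sse are hypothetical and not part of this document. The sketch also assumes that the extension information is carried in whole octets, padded with zero bits to an octet boundary; this document does not specify the padding.

```python
import struct


def pack_sse(event, pp=0, cause_code=0, cause_info=0, ext_bits=None, ext=b""):
    """Build an SSE payload. ext_bits=None means no payload extension (X=0)."""
    if not (0 <= event <= 0xFF and pp in (0, 1)
            and 0 <= cause_code <= 0x3F and 0 <= cause_info <= 0x7FFF):
        raise ValueError("field out of range")
    x = 0 if ext_bits is None else 1
    e = 1  # the E bit is a don't care; setting it to 1 is preferred
    # Bit 0 is the most significant bit of the 32-bit word (network order).
    word = ((event << 24) | (e << 23) | (x << 22) | (pp << 21)
            | (cause_code << 15) | cause_info)
    payload = struct.pack("!I", word)
    if x:
        # ASSUMPTION: extension bits are padded to a whole number of octets.
        if not 0 <= ext_bits <= 0xFFFF or len(ext) != (ext_bits + 7) // 8:
            raise ValueError("extension length does not match extension data")
        payload += struct.pack("!H", ext_bits) + ext
    return payload


def unpack_sse(payload):
    """Parse an SSE payload into a dict of fields. The E bit is ignored."""
    if len(payload) < 4:
        raise ValueError("payload too short")
    (word,) = struct.unpack("!I", payload[:4])
    fields = {
        "event": word >> 24,
        "x": (word >> 22) & 1,
        "pp": (word >> 21) & 1,
        "cause_code": (word >> 15) & 0x3F,
        "cause_info": word & 0x7FFF,
    }
    if fields["cause_code"] == 0:
        # A null cause code makes the information field null as well.
        fields["cause_info"] = 0
    if fields["x"]:
        if len(payload) < 6:
            raise ValueError("missing extension length")
        (ext_bits,) = struct.unpack("!H", payload[4:6])
        n = (ext_bits + 7) // 8
        if len(payload) < 6 + n:
            raise ValueError("truncated extension")
        fields["ext_bits"] = ext_bits
        fields["ext"] = payload[6:6 + n]
    return fields
```

For instance, under these assumptions an event value of 192 with PP set to 1, a cause code of 3 and cause code information 0x1234 packs to the four octets C0 A1 92 34.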
Along the lines of AAL2 type 3 user state control messages [5], SSE packets SHOULD be repeated three times, at 20 ms intervals. The number of repetitions and the repetition interval can be a provisionable parameter, with 3 times and 20 ms as the default. Identical provisioning of different nodes in a network is desirable, but not necessary.

4.2 Use of RFC 2198-based redundancy

It is possible to combine the SSE payload with other RTP payloads, including itself, within an RFC 2198 payload [9] in a manner similar to that for Named Telephone Events [1]. When this is done, the association between the different constituents of an RFC 2198 payload MUST be defined at session establishment using SDP or another protocol such as H.245.

Simple repetitive redundancy and RFC 2198 redundancy SHOULD NOT be simultaneously used for SSEs.

An implementation need not support RFC 2198 encapsulation of SSEs in order to be compliant with this document. Further, it must be noted that there is little bandwidth incentive to use RFC 2198 encapsulation in conjunction with State Signaling Events, since they are relatively infrequent.

5 Indication of Receiver Capabilities using SDP

5.1 Indication of the Support of Media Properties

Properties associated with a media state MUST be advertised via SDP (or a similar protocol such as H.245) before an SSE is used to indicate it.

It is not the intention of this document to specify how SDP MAY be used to describe the properties of audio, voiceband data, fax relay or modem relay media. The basics of audio data media description are in RFC 2327 [2]. Other documents expand on these requirements, such as the SDP description of one specific redundancy method in RFC 2198 [9]. The SDP description of Fax Relay sessions is found in Annex D of ITU T.38 [10].
Recently, work has been done towards the explicit designation of media encoding formats for Voiceband Data [17] and Modem Relay [4 & 14]. Work has also been done towards the description of media capabilities, some of which MAY be latent, in SDP [16]. Work on equivalent H.245 descriptions of these media capabilities has been done, or is in progress.

5.2 Indication of SSE Support

Indication of SSE support via SDP is similar to the indication of telephone event support via SDP [1]. This document is limited to describing the indication of SSE support in SDP. Work is being done in ITU SG16 to formulate H.245 descriptions of SSE support.

For a given session, the dynamic payload type for State Signaling Events is different from the dynamic payload type for RFC 2833 Network Telephone Events. Since these represent distinct capabilities, assigning different MIME types to telephone events and state signaling events permits separate definition and possible extension of these mechanisms. An RTP endpoint advertising support for telephone-events need not advertise support for state signaling events.

To indicate support for State Signaling Events, the receiver MUST associate a dynamic payload type with the encoding name 'sse' corresponding to the proposed MIME type audio/sse. This is done via the 'rtpmap' attribute:

   a=rtpmap:<payload type> sse/<clock rate>

Although the clock rate is not very useful for SSEs, it must be included for compatibility with RFC 2327 [2]. It does no harm to set it to 8000 in all instances, regardless of the possible timestamp rate(s) of the media stream being qualified by the State Signaling Events.

The following example,

   a=rtpmap:97 sse/8000

indicates that an RTP payload type of 97 is associated with State Signaling Events.
Receivers indicate which State Signaling Events they can receive through an 'fmtp' format line:

   a=fmtp:<format> <list of values>

The <format> parameter is the dynamically assigned RTP payload type in the 'rtpmap' attribute. The list of values consists of comma-separated elements, which can be either a single decimal number or two decimal numbers separated by a hyphen (dash), where the second number is larger than the first. No whitespace is allowed between numbers or hyphens. The list does not have to be sorted.

If a receiver does not enumerate the events it supports, none may be assumed by default. As a result of this omission, the SSE payload type declaration, although syntactically valid, is not very useful.

Receivers MAY indicate whether cause codes (Section 6.4) may be sent for any of the events supported:

   a=fmtp:<format> sseCauseCodeEnable=

If omitted, then a transmitter may not assume their support, unless it is aware of their support by other means (e.g. hardcoding, provisioning etc.).

Receivers MAY indicate the port numbers governed by an SSE stream. Recall that an SSE stream is a combination of port number and SSE payload type. The SSEscope attribute MAY be used to bind this stream to a set of ports, which may or may not include the port on which the SSEs are received. If omitted, then an SSE payload type value governs the port or flow [19] on which it is received. The SSEscope attribute is expressed as:

   a=fmtp:<format> SSEscope= ...

where the port numbers are assigned to media streams within the session.

The following example,

   a=rtpmap:97 sse/8000
   a=fmtp:97 192,194,200,203

indicates a willingness to receive State Signaling Events 192, 194, 200 and 203. No willingness to receive cause codes with any State Signaling Event is indicated; however, this may be inferred by default or by provisioning. The meaning of these events is defined later.
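The event list syntax described above (comma-separated single numbers or hyphenated ranges, no whitespace, unsorted) can be checked mechanically. The following Python sketch is illustrative only; the function name parse_event_list is hypothetical, and the upper bound of 255 is an assumption derived from the 8-bit event field of Section 3.2.

```python
import re

# One list element: a decimal number, optionally followed by "-" and a
# second decimal number. No whitespace is tolerated anywhere.
_ELEMENT = re.compile(r"([0-9]+)(?:-([0-9]+))?")


def parse_event_list(value):
    """Return the set of SSE event numbers named by an fmtp event list."""
    events = set()
    for element in value.split(","):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("malformed list element: %r" % element)
        first = int(match.group(1))
        if match.group(2) is None:
            last = first
        else:
            last = int(match.group(2))
            if last <= first:
                # The second number of a range must be larger than the first.
                raise ValueError("descending or empty range: %r" % element)
        if last > 255:
            # ASSUMPTION: event numbers fit the 8-bit event field.
            raise ValueError("event number out of range: %r" % element)
        events.update(range(first, last + 1))
    return events


# The value part of "a=fmtp:97 192,194,200,203" from the example above.
supported = parse_event_list("192,194,200,203")
```

A transmitter could consult the resulting set before sending an SSE, so that it sends only events the receiver has explicitly listed.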
Contrariwise,

   a=rtpmap:97 sse/8000
   a=fmtp:97 192,194,200,203
   a=fmtp:97 sseCauseCodeEnable=yes

explicitly indicates a willingness to receive State Signaling Events 192, 194, 200 and 203 along with all cause codes defined for these SSEs in standards documents.

The lines

   a=rtpmap:97 sse/8000
   a=fmtp:97 192,194,200,203
   a=fmtp:97 sseCauseCodeEnable=yes
   a=fmtp:97 SSEscope=49230 49238 49375

indicate that SSEs with payload type 97 govern all media streams sent to ports 49230, 49238 and 49375. None of these need be the port to which the SSEs are sent.

The following sample media type definition corresponds to the SDP example above:

   audio/sse;events="192,194,200,203"; sseCauseCodeEnable="yes";SSEscope="49230 49238 49375"

An implementation SHOULD NOT transmit a state signaling event whose support has not been explicitly indicated, via SDP [2] or an equivalent ITU H.245 [8] mechanism, by the receiver. If such a state signaling event is received, it MAY be ignored.

6 State Signaling Event Definitions

6.1 Protocol Extension Mechanism

New State Signaling Events MAY be defined. It is REQUIRED that the assignment of State Signaling Event numbers be controlled by the IANA (Section 11).

6.2 Initial List of State Signaling Events

The number of available encodings for event numbers is 256 (range 0-255).

   Event                                          Encoding (Decimal)
   ---------------------------------------------------------------------
   SSE: VBD   (Media state == voiceband data)     192
   SSE: audio (Media state == audio)              194
   SSE: FR    (Media state == fax relay)          200
   SSE: MR    (Media state == modem relay)        203
   SSE: TR    (Media state == text relay)         210

Table 1: State Signaling Events for Media State Change

The media states in Table 1 are defined in Section 2. Each SSE is an indication of status.
By sending a specific SSE, a gateway or endpoint is indicating its local media state with respect to a specific media stream (port).

6.3 SSE Protocol Operation

At present, the SSE protocol does not apply to multicast media streams (Section 1.4).

For a media stream (port) or set of media streams (ports or 'flow') governed by an SSE payload type value, the "local" media state represents the view of the gateway or endpoint regarding the media state. The "remote" media state represents the corresponding view of the remote endpoint, communicated via SSEs. The "SSE protocol state" for the port or ports is a pair consisting of the local and remote media states.

The local state (represented by S) can take on the following values (based on Section 2):

   a: audio, not including voiceband data
   v: voiceband data
   f: fax relay (FR)
   m: modem relay (MR)
   t: text relay (TR).

In addition to all of these values, the remote state (represented by S') can take on the following value:

   i: indeterminate.

For the port(s) governed by an SSE payload type value, an SSE protocol state, P, that is a composite of the local and remote media states is defined. Expressed as a pair, P = (S, S').

On initialization, the audio media state is the "base" state of any SSE-driven media state machine. Thus, on initialization, S = S' = a and (S, S') = (a, a).

6.3.1 SSE Generation Rules

Consider a change in the SSE protocol state from P1 = (S1, S1') to P2 = (S2, S2'), where one or both of the following propositions is true:

   S1 != S2
   S1' != S2'.

On any change in the SSE protocol state from P1 = (S1, S1') to P2 = (S2, S2'), an SSE indicating media state S2 SHALL be sent to the remote endpoint or gateway, except for the case in which (S1' != S2') && (S1 == S2) && (S2 == S2'). In this last case, an SSE indicating media state S2 SHALL NOT be sent.

In the last case, the received SSE message indicates that the remote end has changed to a media state that is identical to the local media state, which is itself unchanged.
This can only happen in response to a prior SSE sent to the remote end. To send an SSE again with an identical media state is unnecessary and would introduce a three-way handshake into the protocol.

In the context of protocol error recovery, the gateway or endpoint is permitted to re-send the audio SSE on timeout expiration (Section 6.3.3), even though there has been no change in protocol state.

It is not the intention of this document to delineate all the events that can cause the local media state, S1, to change. One such event is the receipt of an SSE (Section 6.3.2). Other, local events are specific to MoIP, FoIP and ToIP applications, and are expected to be listed in standards or interoperability documents related to these applications. Such documents are expected to specify any permissible variability in setting the local media state in response to local triggers, and the means for maintaining viable operation in the face of such variability.

For example, endpoints and gateways might be allowed to respond to a V.21 fax preamble by setting the local media state, S, to v (Voiceband Data) or f (fax relay). The declaration of the necessary media parameters and capabilities (via H.245 or SDP) at connection establishment, and the SSE-based coordination of the media state switch on detection of the V.21 fax preamble, are necessary ingredients of ensuring interoperability between different gateway/endpoint designs, configurations etc.

6.3.2 Media State Transition Rules

By definition, on receipt of a new SSE message, the remote media state, S', is set to the media state (a, v, f, m, t) indicated in the SSE.

The setting of the local media state, S, on receipt of a new SSE message depends on a number of factors such as:

* Permitted media states. See the list below.
* Current resource availability.
* Support of the media state by the design.
* SSE priority/precedence (PP) value.
Per Section 3.2, PP = 0 or 1.

The permitted local media states on receipt of a new SSE are:

   1. If (S' == a), then S = a.
   2. If (S' == v) && (PP==0), then S = a | v.
   3. If (S' == f) && (PP==0), then S = a | v | f.
   4. If (S' == m) && (PP==0), then S = a | v | m.
   5. If (S' == t) && (PP==0), then S = a | v | t.
   6. If (S' == v) && (PP==1), then S = v.
   7. If (S' == f) && (PP==1), then S = f.
   8. If (S' == m) && (PP==1), then S = m.
   9. If (S' == t) && (PP==1), then S = t.

These rules (1-9) limit the freedom of a compliant endpoint or gateway with respect to local media state change on receipt of an SSE message. Any contravention of these rules MUST be handled via the recovery procedures in Section 6.3.3. Particular implementations MAY further limit the range of values for the local media state in response to an SSE.

Per rule 1, when an SSE indicating the audio media state is received, a compliant implementation MUST change the local media state (for the port(s) in question) to audio. There is no choice, since the audio media state is the "base" state of any SSE-driven media state machine.

Rules 2-5 assume a priority/precedence value of 0 (normal). Rules 6-9 assume a priority/precedence value of 1 (high).

Rule 2 allows the possibility of not changing to a VBD media state when the other side indicates that it has changed media state to VBD. Such a choice is application-specific.

Rules 3, 4 and 5 allow the possibility of not setting the local media state to match the remote media state, and of, instead, selecting audio or VBD as the local state. Again, such a choice is application-specific.

When the priority/precedence indication (PP) is set to high, then compliant implementations SHALL set the local media state to match the remote media state. This is indicated in rules 6-9. An inability to comply SHALL result in invocation of the recovery procedures in Section 6.3.3.
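The generation rule of Section 6.3.1 and transition rules 1-9 above can be sketched in a few lines of Python. This is an illustrative reading of the rules rather than a normative state machine; the function names and the use of single-letter strings for media states are assumptions made for the example.

```python
def permitted_local_states(remote, pp):
    """Local media states allowed after an SSE sets S' to `remote`.

    `remote` is one of "a", "v", "f", "m", "t"; `pp` is the
    priority/precedence bit (0 = normal, 1 = high).
    """
    if remote not in ("a", "v", "f", "m", "t") or pp not in (0, 1):
        raise ValueError("unknown media state or PP value")
    if remote == "a":
        return {"a"}              # rule 1: audio is the base state
    if pp == 1:
        return {remote}           # rules 6-9: must match the remote state
    return {"a", "v", remote}     # rules 2-5 (for v this is just {a, v})


def must_send_sse(p1, p2):
    """Decide whether a change from P1 = (S1, S1') to P2 = (S2, S2')
    requires sending an SSE that indicates media state S2."""
    (s1, r1), (s2, r2) = p1, p2
    if p1 == p2:
        return False              # the rule applies only to state changes
    if r1 != r2 and s1 == s2 and s2 == r2:
        # The remote end caught up with an unchanged local state:
        # replying would create a three-way handshake.
        return False
    return True
```

For example, a local switch from (a, a) to (v, a) requires an SSE, while the subsequent change to (v, v), caused by the remote end's answering SSE, does not.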

Note that there is no possibility of setting the local media state to, say, fax relay, when the remote end indicates, say, modem relay. If the gateway or endpoint already has a local state, S1 = (f, m, t), and a received SSE indicates a remote state, S2' = (f, m, t), such that S1 != S2', then an out-of-context SSE has been received. It MUST be handled by error procedures [Section ].

6.3.3 Protocol Error Recovery

This section does not address additional steps that might be taken, such as logging and reporting alarm messages.

The simplest "recovery" mechanism is to terminate the session (clear the call) when a protocol error occurs. Another, more complex recovery mechanism is to set both sides to the audio media state, which is the "base", "ground" or "reset" state for the SSE protocol state machine. The choice can be provisionable. If the two parties in a session are provisioned inconsistently, then session recovery SHALL result in session termination.

In the case where recovery consists of consistently resetting the local media state on both sides to audio, it SHALL consist of the following sequence of actions:

1. Set S = a and S' = i (local media state set to audio, remote media state set to indeterminate).
2. Send an SSE indicating the audio state. This is to be repeated every T1 seconds (defined below) until S'=a. If S'!=a after N tries, then the session SHALL be terminated.

The following is a list of conditions, not necessarily exhaustive, that SHALL trigger this recovery procedure:

1. Inability to comply with the rules of Section . This covers the receipt of out-of-context SSEs, and an inability to make one of the permitted local media state transitions for reasons such as changes in resource status.
2. If S != S' (local and remote media state not the same) for more than T2 seconds (defined below).
3. If the received payload type and/or packet format is inconsistent with the local media state, S1, for more than T2 seconds (defined below). Note that this allows asymmetrical payload types, but not asymmetrical media states. If there is a need for asymmetrical media states, a multimedia session is RECOMMENDED.

The following is a list of timeouts associated with this recovery procedure:

1. T1. Repeat interval for audio SSEs used for resetting the SSE protocol. The suggested default is 1 second; this MAY be provisionable. This is to be distinguished from normal SSE redundancy (Section ). Note that each of the SSE redundancy modes described in Section 4 MAY co-exist with the ones described in this section. The value of N (number of retries) SHALL be defaulted to 5.
2. T2. Transience interval for media states. This is the time interval for which an inconsistency in the local and remote media states is permitted. Theoretically, this can be made equal to the Round Trip Delay for SSEs, plus a margin for processing delays, delay fluctuations etc. In practice, it is not always possible to set this parameter separately for each possible session. A default value that is large enough for all connections MAY be chosen; it is suggested that this be 1 second.

The timeouts, T1 and T2, and the retry count, N, MAY be provisionable.

6.4 SSE Cause Codes

SSE cause codes are to be defined per SSE event. Some SSE events MAY have no cause codes associated with them. The definition of SSE cause codes is not complete, and will be included in a later version of this Internet-Draft. Examples of this ongoing work are the draft definitions of MoIP-specific SSE cause codes currently being reviewed by the TIA TR-30.1 and ITU Q11/16 study groups.

7 Examples of the use of State Signaling Event Messages

In this set of examples, entity 1, 2, 3 etc.
are gateways/endpoints that share a VoIP RTP session and that use SSEs to communicate media state changes. These examples are meant to illustrate the use of State Signaling Event Messages in normal and error scenarios. Any reference to local event triggers is meant for illustrative purposes only, and is not intended to specify a right response to those triggers. This SHOULD be found in application-specific documents such as [14].

   Entity 1                                  Entity 2
      |                                         |
   S=a, S'=a                                 S=a, S'=a
      |                                         |
      |                                 2100 Hz tone detect
      |                                 Resource check pass
      |                                      S=v, S'=a
      |            [1] SSE: VBD                 |
      +<----------------------------------------|
   S=a, S'=v                                    |
   Resource check pass                          |
   S=v, S'=v                                    |
      |            [2] SSE: VBD                 |
      +---------------------------------------->|
      |                                         |
      |                                      S=v, S'=v
      |                                         |
      |                                 V.21 fax preamble detect
      |                                      S=f, S'=v
      |            [3] SSE: FR                  |
      +<----------------------------------------|
      |                                         |
   S=v, S'=f                                    |
      |                                         |
   Resource check fail                          |
   S=a, S'=f                                    |
      |            [4] SSE: audio               |
      +---------------------------------------->|
      |                                      S=a, S'=a
      |                                         |
      |            [5] SSE: audio               |
      +<----------------------------------------|
   S=a, S'=a                                    |
      |                                         |

Figure 3: Examples of successful and failed media state change attempts

In , both ends are in a protocol state (S, S') = (a, a) initially (Section ). Entity 2 detects a 2100 Hz ANS tone and sets its local media state, S = v. Since the local media state has changed to 'v', entity 2 sends SSE: VBD to entity 1. Entity 1 determines that it has the resources to do a media state shift to VBD. It sets the local media state to 'v'. It MUST now send SSE: VBD to entity 2. At this point, by the rules of Section , entity 2 need not send a further SSE: VBD to entity 1.

Subsequently, entity 2 detects a V.21 fax preamble, changes its local media state to 'f' and sends SSE: FR to entity 1. Entity 1 determines that it does not have the resources to do a media shift to fax relay, and sets the local media state to audio. Per the rules of , it could have left it as voiceband data.
However, this is not how this example progresses. Entity 1 sends an SSE: audio to entity 2, which sets its media state to audio and responds with SSE: audio. If entity 1 does not receive SSE: audio within T1 seconds, it will resend SSE: audio to entity 2, for a total of N retries (Section ).

Note that this document does not mandate a switch to a voiceband data media state on detection of a 2100 Hz tone. It merely illustrates the use of SSEs to synchronize this and other media state changes. In addition to the media state change, it is necessary to ensure that the 2100 Hz tone is transmitted end to end. Other means, used in addition to SSEs, MUST be used to accomplish this. These could be one of the following: RFC 2833 telephone events, RFC 2833 telephony tones, or in-band transmission via the voice codec.

8 Justification for State Signaling Events

8.1 A Precedent for In-band State Signaling (AAL2 type 3 packets)

Like RTP sessions, AAL2 channels can be used to carry real-time media. Even though the out-of-band signaling method defined in ITU Q.2630.1 can be used to effect changes in media properties and states [6], it is supplemented with in-band signaling that is integrated into the media stream through use of the same channel identifier (CID). This signaling scheme uses "type 3" packets to signal, reject and accept state changes, and to perform other time-critical messaging such as relaying alarm conditions, controlling loopback, and synchronizing previously signalled changes in media ("Service-Specific Convergence Sublayer") attributes. Because of the dynamic nature of resource availability, unsynchronized, autonomous local switching between alternative media properties that have been agreed to a priori (via out-of-band signaling) is not workable.
The type 3 packet-based, explicit, in-band synchronization method used does not incur the potential delays inherent in out-of-band signaling.

AAL2 state change messages cover the following modes or "user" states: voice, voiceband data, circuit emulation and fax demodulation (fax relay). Reliable transmission of type 3 messages is ensured via a triple redundancy scheme, and CRC-10 protection of individual type 3 packets.

8.2 Comparison of SSEs with other alternatives

In the case of real-time communications over IP, several alternatives to SSEs have been suggested for synchronizing media state changes. These were considered by the ITU SG16 and found inferior to the in-band SSE signaling alternative. These options are:

1. Unsynchronized, on-the-fly media state switches. If the capabilities associated with media states have been advertised at session establishment time, then both ends should be able to switch on the fly between these states. Although this is possible, it requires that both ends tie up the necessary resources to support all advertised capabilities. If they do not do so for the sake of efficiency, then there is a possibility of erroneous operation due to resource starvation. This is especially true in large trunk gateways that handle several thousands of concurrent VoIP sessions, and use dynamic allocation of resources such as bandwidth and DSP power.

2. Use out-of-band signaling (such as a SIP INVITE, [3]) with a fresh session description each time a media property needs to be changed. In certain configurations, this method has the problem of unacceptable delay due to congestion on common signaling (SIP etc.) ports that handle multiple sessions. Note that in-band SSE signaling cannot supplant out-of-band signaling for changing media properties, which is an integral part of signaling protocols such as SIP, H.248 and H.323.
   Further, it must be noted that if media properties need to be changed with greater precision than what is afforded by SSEs, out-of-band signaling, which uses a wealth of SDP or H.245 constructs, might be needed. If SSEs are available, then such changes based on out-of-band signaling can be made rare enough so as not to burden signaling ports, agents etc. Since SSEs are bound to a session, they do not suffer the possible congestion and delays associated with common signaling paths.

3. Extension of the list of RFC 2833 Named Telephone Events (NTEs) to include state signaling messages. Members of the IETF AVT working group have found this unacceptable since these RFC 2833 NTEs indicate the detection of stateless, raw stimuli such as tones and preamble bits, which need to be qualified with duration information. Further, it is not desirable to complicate the RFC 2833 Named Telephone Event definition with a state transition protocol such as Section . Keeping the MIME media types for the two sets of events separate allows equipment to implement RFC 2833 telephone events but not State Signaling Events, or both, and to indicate the applicable protocol support at session establishment time. Also note that SSEs can never replace RFC 2833 telephone events, which perform the critical task of providing a "compression-friendly" means of transporting tones over an IP session.

4. Define a new RTCP message for synchronizing media state changes, and parameterize it with specific media state transitions, precedence values, cause code fields etc. This approach is rejected for the reason that RTCP is, in general, not meant to be an extensible protocol. All RTP-compliant sessions are required to support the full complement of RTCP messages. This is in contrast with the flexible definition of audio MIME types, and the mapping of these MIME types and related parameters into dynamic payload types and SDP attributes.
   It would be inconsistent with the current state of the protocol to detail RTCP capabilities, with a fine granularity, at the time of session establishment. On the other hand, there is no inconsistency in defining a new audio MIME type for State Signaling Event messages, and in allowing receivers to flexibly indicate support for individual messages.

Sessions lacking adequate support for in-band state signaling must, of necessity, fall back to out-of-band signaling for synchronized changes in media state.

9 Proposed MIME Registration

This section provides a MIME registration proposal for State Signaling Events.

MIME media type name: audio

MIME subtype name: sse

Required parameters: events

The "events" parameter lists the State Signaling Events supported by the implementation. Events are listed as one or more comma-separated elements. Each element can either be a single integer or two integers separated by a hyphen. The range of these integers is 0-255. No white space is allowed in the argument. The integers designate the State Signaling Event numbers supported by the implementation. Thus, events="192,194,200-202" indicates that the State Signaling Events supported by the implementation are 192, 194, 200, 201 and 202.

The "events" parameter is represented in the Session Description Protocol (SDP, RFC 2327) by the "fmtp" attribute. When this is done, its value is represented in a format identical to that of the "events" MIME parameter. The encoding name used in conjunction with this MIME type is "sse".

Since the "events" parameter is mandatory with the MIME type audio/sse, there is no default list of events that can be assumed when this MIME type is supported for a session or connection.

Optional parameters: sseCauseCodeEnable, SSEscope

The "sseCauseCodeEnable" parameter can be assigned a value of "yes" or "no".
A "yes" value implies that the events listed in the mandatory "events" parameter may be supplemented with cause code fields. All standard cause codes for the advertised events must be supported. A "no" value implies that this information is absent, which is indicated by setting these SSE fields to a null value. If "sseCauseCodeEnable" is omitted, then these fields are set to null. The sseCauseCodeEnable boolean may be represented within an SDP "fmtp" attribute.

The "SSEscope" parameter is a list of ports. These are the ports governed by the SSE payload type in question. These may or may not include the port on which SSEs are received. If omitted, SSEs MAY only govern the media stream on the port on which they are received.

Encoding considerations: The MIME type audio/sse is only defined for transfer via the Real-time Transport Protocol (RTP) as defined in RFC 1889. As such, it consists of packets of binary data.

Security considerations: See the "Security Considerations" (Section ) section in this document.

Interoperability considerations: none.

Published specification: This document.

Applications which use this media: The MIME type audio/sse supports state coordination between telephone systems over the Internet.

Additional information:

1. Magic number(s): N/A
2. File extension(s): N/A
3. Macintosh file type code: N/A

Author/Change controller:

Rajesh Kumar
rkumar@cisco.com
Cisco Systems
170 West Tasman Drive
San Jose, CA 95134-1706

10 Security Considerations

RTP packets using the payload format used for State Signaling Events defined in this specification are subject to the security considerations discussed in the RTP specification (RFC 1889) and any appropriate RTP profile (for example RFC 1890). This implies that confidentiality of the media streams is achieved by encryption. Since there is no data compression of State Signaling Events, there is no potential conflict between compression and encryption.
This payload type does not exhibit any significant non-uniformity in the receiver-side computational complexity for packet processing that would cause a potential denial-of-service threat. Any flooding of a receiver with State Signaling Events does not have an impact on unrelated sessions that is different from flooding a session with any other payload format (e.g. PCM or telephone event). Regardless of the payload type used for the attack, the remedy is to limit resource allocation to any and all RTP sessions, to authenticate session participants, and to log and penalize disruptive or illegal behavior.

The MIME subtype contains no executable content that affects sessions or session legs other than the ones in which the requester is a participant. For these session legs, changes to media state are well within the normal prerogatives of the requester.

The switchover from one media state to another may cause the use of a different media port. However, this does not pose a security hazard since this switchover, which may be governed by SSEs, is limited to a set of ports assigned, at session establishment, via the media/flow/session description. SSEs cannot be used to specify arbitrary port changes.

11 IANA Considerations

This document defines one new RTP payload format, state signaling event, and the associated Internet media (MIME) type, audio/sse.

The initial assignment of possible values to the parameter "events" will be based on the RFC that will come out of this Internet-Draft. This document will contain a complete description of the non-null cause code field values associated with each event in this initial list of events. It will also describe the cause code information, if any, associated with each cause code.

Within the audio/sse type, additional events MUST be registered with IANA.
Registrations are subject to approval by the current chair of the IETF audio/video transport working group, or by an expert designated by the transport area director if the AVT group has closed.

The meaning of new events MUST be documented either as an RFC or an equivalent standards document produced by another standardization body, such as ITU-T. The description of each new event must be accompanied with a complete description of any non-null cause codes and cause code information values associated with that event. Additions to the list of cause codes or cause code information values will require an amendment to the document defining the event.

12 Acknowledgements

The author acknowledges the contributions of the TIA TR-30.1 and the ITU SG16 groups towards the development of the SSE protocol state machine. Special thanks are due to Hisham Abdelhamid, Flemming Andreasen, Bill Foster, Mehryar Garakani, Alex Urquizo and Herbert Wildfeur of Cisco Systems, and Michael Beadle of MindSpeed Technologies.

The author acknowledges the work of Jim Renkel of 3COM Corporation in defining the SSE priority/precedence indication.

The author gratefully acknowledges the assistance, contributions and constructive criticisms of the following individuals at Cisco towards the development of this signaling paradigm: Vasmi Abidi, Amit Agrawal, Flemming Andreasen, David Auerbach, Robert Biskner, Dan DeLiberato, Bill Foster, Sadanand Hegde, Dave Horwitz, Iftekhar Hussain, Gary Kelly, Michel Khouderchah, Mohamed Mostafa, David Oran, Joseph Swaminathan, Mike Thomas, Bruce Thompson and Sravan Vadlakonda.

13 Author's Address

Rajesh Kumar
Cisco Systems
170 West Tasman Drive
San Jose, CA 95134-1706
Email: rkumar@cisco.com

14 References

[1] Schulzrinne, H. and Petrack, S., "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC 2833, May 2000.

[2] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

[3] Handley, M., Schulzrinne, H., Schooler, E., Rosenberg, J., "SIP: Session Initiation Protocol", RFC 2543.

[4] Telecommunications Industry Association (TIA), Newport Beach, CA, January 9-11, 2002, TR.30.1, 10201005.doc, ftp://ftp.tiaonline.org/TR-30/TR301/Public/TR-30.1/2002-01-NewportBeach/

[5] ITU-T I.366.2, AAL Type 2 Reassembly Service Specific Convergence Sublayer for Trunking, Nov. 2000.

[6] ITU Q.2630.1, AAL2 type 2 signaling protocol, capability set 1, Dec. 1999.

[7] Bradner, S., Keywords for use in RFCs to Indicate Requirement Levels, RFC 2119, March 1997.

[8] ITU H.245, Control Protocol for Multimedia Communication, July 2001.

[9] Perkins, C. et al, RTP Payload for Redundant Audio Data, RFC 2198.

[10] ITU T.38, Procedures for Group 3 facsimile communication over IP networks.

[11] Schulzrinne, H., et al, RTP: A Transport Protocol for Real-Time Applications, RFC 1889.

[12] Schulzrinne, H., RTP Profile for Audio and Video Conferences with Minimal Control, RFC 1890.

[13] http://www.isi.edu/in-notes/iana/assignments/media-types/audio/vnd.cisco.nse

[14] ITU SG16, Q11, Draft Text D-003 for V.MoIP, Temporary Document 33, Geneva, 5-15 February, 2002.

[15] http://www.isi.edu/in-notes/iana/assignments/media-types/

[16] Andreasen, F., SDP Simple Capability Negotiation, draft-andreasen-mmusic-sdp-simcap-04.txt.

[17] Foster, B. et al, Voice-Band Data Media Format, draft-foster-mmusic-vbdformat-01.txt.

[18] ITU V.18, Operational and Interworking Requirements for DCEs operating in the text telephone mode.

[19] Camarillo Gonzalo et al, Grouping of media lines in SDP, draft-ietf-mmusic-fid-06.txt.

Full Copyright Statement

Copyright (C) The Internet Society (March 2, 2000). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.