SPEECHSC	D. Burnett
Internet-Draft	Voxeo
Intended status: Standards Track	S. Shanmugham
Expires: January 10, 2011	Cisco Systems, Inc.
	July 9, 2010

Media Resource Control Protocol Version 2 (MRCPv2)
draft-ietf-speechsc-mrcpv2-21

Abstract

MRCPv2 allows client hosts to control media service resources, such as speech synthesizers, recognizers, verifiers, and identifiers, residing in servers on the network. MRCPv2 is not a "stand-alone" protocol: it relies on other protocols, such as the Session Initiation Protocol (SIP), to rendezvous MRCPv2 clients and servers and to manage sessions between them, and on the Session Description Protocol (SDP) to describe, discover, and exchange capabilities. It also depends on SIP and SDP to establish the media sessions and associated parameters between the media source or sink and the media server. Once this is done, the MRCPv2 exchange operates over the control session established in this way, allowing the client to control the media processing resources on the speech resource server.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on January 10, 2011.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

1. Introduction
2. Document Conventions
    2.1. Definitions
    2.2. State-Machine Diagrams
    2.3. URI Schemes
3. Architecture
    3.1. MRCPv2 Media Resource Types
    3.2. Server and Resource Addressing
4. MRCPv2 Protocol Basics
    4.1. Connecting to the Server
    4.2. Managing Resource Control Channels
    4.3. SIP session example
    4.4. Media Streams and RTP Ports
    4.5. MRCPv2 Message Transport
5. MRCPv2 Specification
    5.1. Common Protocol Elements
    5.2. Request
    5.3. Response
    5.4. Status Codes
    5.5. Events
6. MRCPv2 Generic Methods, Headers, and Result Structure
    6.1. Generic Methods
        6.1.1. SET-PARAMS
        6.1.2. GET-PARAMS
    6.2. Generic Message Headers
        6.2.1. Channel-Identifier
        6.2.2. Accept
        6.2.3. Active-Request-Id-List
        6.2.4. Proxy-Sync-Id
        6.2.5. Accept-Charset
        6.2.6. Content-Type
        6.2.7. Content-ID
        6.2.8. Content-Base
        6.2.9. Content-Encoding
        6.2.10. Content-Location
        6.2.11. Content-Length
        6.2.12. Fetch Timeout
        6.2.13. Cache-Control
        6.2.14. Logging-Tag
        6.2.15. Set-Cookie and Set-Cookie2
        6.2.16. Vendor Specific Parameters
    6.3. Generic Result Structure
        6.3.1. Natural Language Semantics Markup Language
7. Resource Discovery
8. Speech Synthesizer Resource
    8.1. Synthesizer State Machine
    8.2. Synthesizer Methods
    8.3. Synthesizer Events
    8.4. Synthesizer Header Fields
        8.4.1. Jump-Size
        8.4.2. Kill-On-Barge-In
        8.4.3. Speaker Profile
        8.4.4. Completion Cause
        8.4.5. Completion Reason
        8.4.6. Voice-Parameter
        8.4.7. Prosody-Parameters
        8.4.8. Speech Marker
        8.4.9. Speech Language
        8.4.10. Fetch Hint
        8.4.11. Audio Fetch Hint
        8.4.12. Failed URI
        8.4.13. Failed URI Cause
        8.4.14. Speak Restart
        8.4.15. Speak Length
        8.4.16. Load-Lexicon
        8.4.17. Lexicon-Search-Order
    8.5. Synthesizer Message Body
        8.5.1. Synthesizer Speech Data
        8.5.2. Lexicon Data
    8.6. SPEAK Method
    8.7. STOP
    8.8. BARGE-IN-OCCURRED
    8.9. PAUSE
    8.10. RESUME
    8.11. CONTROL
    8.12. SPEAK-COMPLETE
    8.13. SPEECH-MARKER
    8.14. DEFINE-LEXICON
9. Speech Recognizer Resource
    9.1. Recognizer State Machine
    9.2. Recognizer Methods
    9.3. Recognizer Events
    9.4. Recognizer Header Fields
        9.4.1. Confidence Threshold
        9.4.2. Sensitivity Level
        9.4.3. Speed Vs Accuracy
        9.4.4. N Best List Length
        9.4.5. Input Type
        9.4.6. No Input Timeout
        9.4.7. Recognition Timeout
        9.4.8. Waveform URI
        9.4.9. Media Type
        9.4.10. Input-Waveform-URI
        9.4.11. Completion Cause
        9.4.12. Completion Reason
        9.4.13. Recognizer Context Block
        9.4.14. Start Input Timers
        9.4.15. Speech Complete Timeout
        9.4.16. Speech Incomplete Timeout
        9.4.17. DTMF Interdigit Timeout
        9.4.18. DTMF Term Timeout
        9.4.19. DTMF-Term-Char
        9.4.20. Failed URI
        9.4.21. Failed URI Cause
        9.4.22. Save Waveform
        9.4.23. New Audio Channel
        9.4.24. Speech-Language
        9.4.25. Ver-Buffer-Utterance
        9.4.26. Recognition-Mode
        9.4.27. Cancel-If-Queue
        9.4.28. Hotword-Max-Duration
        9.4.29. Hotword-Min-Duration
        9.4.30. Interpret-Text
        9.4.31. DTMF-Buffer-Time
        9.4.32. Clear-DTMF-Buffer
        9.4.33. Early-No-Match
        9.4.34. Num-Min-Consistent-Pronunciations
        9.4.35. Consistency-Threshold
        9.4.36. Clash-Threshold
        9.4.37. Personal-Grammar-URI
        9.4.38. Enroll-Utterance
        9.4.39. Phrase-Id
        9.4.40. Phrase-NL
        9.4.41. Weight
        9.4.42. Save-Best-Waveform
        9.4.43. New-Phrase-Id
        9.4.44. Confusable-Phrases-URI
        9.4.45. Abort-Phrase-Enrollment
    9.5. Recognizer Message Body
        9.5.1. Recognizer Grammar Data
        9.5.2. Recognizer Result Data
        9.5.3. Enrollment Result Data
        9.5.4. Recognizer Context Block
    9.6. Recognizer Results
        9.6.1. Markup Functions
        9.6.2. Overview of Recognizer Result Elements and their Relationships
        9.6.3. Elements and Attributes
    9.7. Enrollment Results
        9.7.1. NUM-CLASHES Element
        9.7.2. NUM-GOOD-REPETITIONS Element
        9.7.3. NUM-REPETITIONS-STILL-NEEDED Element
        9.7.4. CONSISTENCY-STATUS Element
        9.7.5. CLASH-PHRASE-IDS Element
        9.7.6. TRANSCRIPTIONS Element
        9.7.7. CONFUSABLE-PHRASES Element
    9.8. DEFINE-GRAMMAR
    9.9. RECOGNIZE
    9.10. STOP
    9.11. GET-RESULT
    9.12. START-OF-INPUT
    9.13. START-INPUT-TIMERS
    9.14. RECOGNITION-COMPLETE
    9.15. START-PHRASE-ENROLLMENT
    9.16. ENROLLMENT-ROLLBACK
    9.17. END-PHRASE-ENROLLMENT
    9.18. MODIFY-PHRASE
    9.19. DELETE-PHRASE
    9.20. INTERPRET
    9.21. INTERPRETATION-COMPLETE
    9.22. DTMF Detection
10. Recorder Resource
    10.1. Recorder State Machine
    10.2. Recorder Methods
    10.3. Recorder Events
    10.4. Recorder Header Fields
        10.4.1. Sensitivity Level
        10.4.2. No Input Timeout
        10.4.3. Completion Cause
        10.4.4. Completion Reason
        10.4.5. Failed URI
        10.4.6. Failed URI Cause
        10.4.7. Record URI
        10.4.8. Media Type
        10.4.9. Max Time
        10.4.10. Trim-Length
        10.4.11. Final Silence
        10.4.12. Capture On Speech
        10.4.13. Ver-Buffer-Utterance
        10.4.14. Start Input Timers
        10.4.15. New Audio Channel
    10.5. Recorder Message Body
    10.6. RECORD
    10.7. STOP
    10.8. RECORD-COMPLETE
    10.9. START-INPUT-TIMERS
    10.10. START-OF-INPUT
11. Speaker Verification and Identification
    11.1. Speaker Verification State Machine
    11.2. Speaker Verification Methods
    11.3. Verification Events
    11.4. Verification Header Fields
        11.4.1. Repository-URI
        11.4.2. Voiceprint-Identifier
        11.4.3. Verification-Mode
        11.4.4. Adapt-Model
        11.4.5. Abort-Model
        11.4.6. Min-Verification-Score
        11.4.7. Num-Min-Verification-Phrases
        11.4.8. Num-Max-Verification-Phrases
        11.4.9. No-Input-Timeout
        11.4.10. Save-Waveform
        11.4.11. Media Type
        11.4.12. Waveform-URI
        11.4.13. Voiceprint-Exists
        11.4.14. Ver-Buffer-Utterance
        11.4.15. Input-Waveform-Uri
        11.4.16. Completion-Cause
        11.4.17. Completion Reason
        11.4.18. Speech Complete Timeout
        11.4.19. New Audio Channel
        11.4.20. Abort-Verification
        11.4.21. Start Input Timers
    11.5. Verification Message Body
        11.5.1. Verification Result Data
        11.5.2. Verification Result Elements
    11.6. START-SESSION
    11.7. END-SESSION
    11.8. QUERY-VOICEPRINT
    11.9. DELETE-VOICEPRINT
    11.10. VERIFY
    11.11. VERIFY-FROM-BUFFER
    11.12. VERIFY-ROLLBACK
    11.13. STOP
    11.14. START-INPUT-TIMERS
    11.15. VERIFICATION-COMPLETE
    11.16. START-OF-INPUT
    11.17. CLEAR-BUFFER
    11.18. GET-INTERMEDIATE-RESULT
12. Security Considerations
    12.1. Rendezvous and Session Establishment
    12.2. Control channel protection
    12.3. Media session protection
    12.4. Indirect Content Access
    12.5. Protection of stored media
    12.6. DTMF and recognition buffers
13. IANA Considerations
    13.1. New registries
        13.1.1. MRCPv2 resource types
        13.1.2. MRCPv2 methods and events
        13.1.3. MRCPv2 header fields
        13.1.4. MRCPv2 status codes
        13.1.5. Grammar Reference List Parameters
        13.1.6. MRCPv2 vendor-specific parameters
    13.2. NLSML-related registrations
        13.2.1. application/nlsml+xml Media Type registration
    13.3. NLSML XML Schema registration
    13.4. MRCPv2 XML Namespace registration
    13.5. text Media Type Registrations
        13.5.1. text/grammar-ref-list
    13.6. session URL scheme registration
    13.7. SDP parameter registrations
        13.7.1. sub-registry "proto"
        13.7.2. sub-registry "att-field (session-level)"
        13.7.3. sub-registry "att-field (media-level)"
14. Examples
    14.1. Message Flow
    14.2. Recognition Result Examples
        14.2.1. Simple ASR Ambiguity
        14.2.2. Mixed Initiative
        14.2.3. DTMF Input
        14.2.4. Interpreting Meta-Dialog and Meta-Task Utterances
        14.2.5. Anaphora and Deixis
        14.2.6. Distinguishing Individual Items from Sets with One Member
        14.2.7. Extensibility
15. ABNF Normative Definition
16. XML Schemas
    16.1. NLSML Schema Definition
    16.2. Enrollment Results Schema Definition
    16.3. Verification Results Schema Definition
17. References
    17.1. Normative References
    17.2. Informative References
Appendix A. Contributors
Appendix B. Acknowledgements
Authors' Addresses

Resource Type	Resource Description	Described in
speechrecog	Speech Recognizer	Section 9 (Speech Recognizer Resource)
dtmfrecog	DTMF Recognizer	Section 9 (Speech Recognizer Resource)
speechsynth	Speech Synthesizer	Section 8 (Speech Synthesizer Resource)
basicsynth	Basic Synthesizer	Section 8 (Speech Synthesizer Resource)
speakverify	Speaker Verification	Section 11 (Speaker Verification and Identification)
recorder	Speech Recorder	Section 10 (Recorder Resource)
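
The resource type tokens in the table above lend themselves to a simple lookup. The following Python sketch is illustrative only: the names RESOURCE_TYPES and describe_resource are hypothetical and not part of the protocol, and the exact, case-sensitive token match is an assumption of this sketch rather than a rule stated here.

```python
# Resource type tokens, descriptions, and section numbers copied from the
# table above. The dictionary and helper names are hypothetical.
RESOURCE_TYPES = {
    "speechrecog": ("Speech Recognizer", 9),
    "dtmfrecog": ("DTMF Recognizer", 9),
    "speechsynth": ("Speech Synthesizer", 8),
    "basicsynth": ("Basic Synthesizer", 8),
    "speakverify": ("Speaker Verification", 11),
    "recorder": ("Speech Recorder", 10),
}


def describe_resource(token):
    """Return a one-line description for a resource type token.

    Assumption: tokens are matched exactly (case-sensitive).
    """
    try:
        description, section = RESOURCE_TYPES[token]
    except KeyError:
        raise ValueError("unknown MRCPv2 resource type: %r" % token) from None
    return "%s: %s (Section %d)" % (token, description, section)


if __name__ == "__main__":
    for name in RESOURCE_TYPES:
        print(describe_resource(name))
```

Note that two pairs of tokens (speechrecog/dtmfrecog and speechsynth/basicsynth) map to the same section, so a lookup keyed by section number alone would not identify a single resource type.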

Code	Meaning
200	Success
201	Success with some optional header fields ignored

Code	Meaning
401	Method not allowed
402	Method not valid in this state
403	Unsupported header field
404	Illegal value for header field. This is the error for a syntax violation.
405	Resource not allocated for this session or does not exist
406	Mandatory Header Field Missing
407	Method or Operation Failed (e.g., Grammar compilation failed in the recognizer. Detailed cause codes MAY be available through a resource-specific header.)
408	Unrecognized or unsupported message entity
409	Unsupported Header Field Value. This is a value that is syntactically legal but exceeds the implementation's capabilities or expectations.
410	Non-Monotonic or Out of order sequence number in request.
411-420	Reserved for future assignment

Code	Meaning
501	Server Internal Error
502	Protocol Version not supported
503	Proxy Timeout. The MRCP Proxy did not receive a response from the MRCP server.
504	Message too large
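
A client typically branches on the leading digit of a status code before examining the specific value. The following Python sketch shows one way to do that with the three tables above. The function names and the class labels "success", "client failure", and "server failure" are this sketch's own shorthand (an assumption modeled on the 2xx/4xx/5xx grouping of the tables), not text taken from them.

```python
# Status codes and meanings, abbreviated from the three tables above.
STATUS_MEANINGS = {
    200: "Success",
    201: "Success with some optional header fields ignored",
    401: "Method not allowed",
    402: "Method not valid in this state",
    403: "Unsupported header field",
    404: "Illegal value for header field",
    405: "Resource not allocated for this session or does not exist",
    406: "Mandatory Header Field Missing",
    407: "Method or Operation Failed",
    408: "Unrecognized or unsupported message entity",
    409: "Unsupported Header Field Value",
    410: "Non-Monotonic or Out of order sequence number in request",
    501: "Server Internal Error",
    502: "Protocol Version not supported",
    503: "Proxy Timeout",
    504: "Message too large",
}


def status_class(code):
    """Classify a status code by its hundreds digit (labels are assumptions)."""
    if 200 <= code <= 299:
        return "success"
    if 400 <= code <= 499:
        return "client failure"
    if 500 <= code <= 599:
        return "server failure"
    raise ValueError("status code outside the tabulated ranges: %d" % code)


def describe_status(code):
    """Return the tabulated meaning, handling the reserved 411-420 range."""
    if 411 <= code <= 420:
        return "Reserved for future assignment"
    return STATUS_MEANINGS.get(code, "Unknown status code")


if __name__ == "__main__":
    for code in (200, 404, 415, 503):
        print(code, status_class(code), "-", describe_status(code))
```

Classifying by range first lets a client handle a code it does not recognize in a sensible default way, which matters because codes 411-420 are reserved for future assignment.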

Cause-Code	Cause-Name	Description
000	normal	SPEAK completed normally.
001	barge-in	SPEAK request was terminated because of barge-in.
002	parse-failure	SPEAK request terminated because of a failure to parse the speech markup text.
003	uri-failure	SPEAK request terminated because access to one of the URIs failed.
004	error	SPEAK request terminated prematurely due to synthesizer error.
005	language-unsupported	Language not supported.
006	lexicon-load-failure	Lexicon loading failed.
007	cancelled	A prior SPEAK request failed while this one was still in the queue.

Cause-Code	Cause-Name	Description
000	success	RECOGNIZE completed with a match or DEFINE-GRAMMAR succeeded in downloading and compiling the grammar
001	no-match	RECOGNIZE completed, but no match was found
002	no-input-timeout	RECOGNIZE completed without a match due to a no-input-timeout
003	hotword-maxtime	RECOGNIZE in hotword mode completed without a match due to a recognition-timeout
004	grammar-load-failure	RECOGNIZE failed due to a grammar load failure.
005	grammar-compilation-failure	RECOGNIZE failed due to grammar compilation failure.
006	recognizer-error	RECOGNIZE request terminated prematurely due to a recognizer error.
007	speech-too-early	RECOGNIZE request terminated because speech was too early. This happens when the audio stream is already "in-speech" when the RECOGNIZE request is received.
008	success-maxtime	RECOGNIZE request terminated because speech was too long, but whatever was spoken up to that point was a full match.
009	uri-failure	Failure accessing a URI.
010	language-unsupported	Language not supported.
011	cancelled	A new RECOGNIZE cancelled this one, or a prior RECOGNIZE failed while this one was still in the queue.
012	semantics-failure	Recognition succeeded but semantic interpretation of the recognized input failed. The RECOGNITION-COMPLETE event MUST contain the Recognition result with only input text and no interpretation.
013	partial-match	The Speech Incomplete timeout expired before there was a full match, but whatever was spoken up to that point was a partial match to one or more grammars.
014	partial-match-maxtime	The Recognition-Timer expired before a full match was achieved, but whatever was spoken up to that point was a partial match to one or more grammars.
015	no-match-maxtime	The Recognition-Timer expired. Whatever was spoken up to that point did not match any of the grammars. This cause could also be returned if the recognizer does not support detecting partial grammar matches.
016	grammar-definition-failure	Any DEFINE-GRAMMAR error other than grammar-load-failure and grammar-compilation-failure.
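
Because each cause is identified by both a three-digit code and a name, a receiver can check that the two agree. The Python sketch below validates a recognizer completion cause value against the table above. It assumes the value is carried as the three-digit code, a single space, and the cause name (for example "001 no-match"); that wire format, and the names RECOGNIZER_CAUSES and parse_completion_cause, are assumptions of this sketch and are not defined in the table.

```python
import re

# Recognizer cause codes and names, copied from the table above.
RECOGNIZER_CAUSES = {
    0: "success",
    1: "no-match",
    2: "no-input-timeout",
    3: "hotword-maxtime",
    4: "grammar-load-failure",
    5: "grammar-compilation-failure",
    6: "recognizer-error",
    7: "speech-too-early",
    8: "success-maxtime",
    9: "uri-failure",
    10: "language-unsupported",
    11: "cancelled",
    12: "semantics-failure",
    13: "partial-match",
    14: "partial-match-maxtime",
    15: "no-match-maxtime",
    16: "grammar-definition-failure",
}

# Assumed value format: three ASCII digits, one space, then the cause name.
_CAUSE_RE = re.compile(r"([0-9]{3}) (\S+)")


def parse_completion_cause(value):
    """Parse and validate a value such as "001 no-match" (hypothetical helper)."""
    match = _CAUSE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError("malformed completion cause: %r" % value)
    code = int(match.group(1))
    name = match.group(2)
    expected = RECOGNIZER_CAUSES.get(code)
    if expected is None:
        raise ValueError("unknown recognizer cause code: %03d" % code)
    if name != expected:
        raise ValueError("cause name %r does not match code %03d (%s)"
                         % (name, code, expected))
    return code, name


if __name__ == "__main__":
    print(parse_completion_cause("001 no-match"))
```

The same pattern applies to the synthesizer, recorder, and verifier tables by swapping in their code-to-name mappings; note that the same code carries different meanings across resources (for example, 001 is barge-in for the synthesizer but no-match for the recognizer), so the table must be chosen by resource type.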

Cause-Code	Cause-Name	Description
000	success-silence	RECORD completed with a silence at the end
001	success-maxtime	RECORD completed after reaching the maximum recording time specified in the RECORD method.
002	noinput-timeout	RECORD failed due to no input
003	uri-failure	Failure accessing the record URI.
004	error	RECORD request terminated prematurely due to a recorder error.

Cause-Code	Cause-Name	Description
000	success	VERIFY or VERIFY-FROM-BUFFER request completed successfully. The verify decision can be "accepted", "rejected", or "undecided".
001	error	VERIFY or VERIFY-FROM-BUFFER request terminated prematurely due to a verification resource or system error.
002	no-input-timeout	VERIFY request completed with no result due to a no-input-timeout.
003	too-much-speech-timeout	VERIFY request completed with no result due to too much speech.
004	speech-too-early	VERIFY request completed with no result because speech started too soon.
005	buffer-empty	VERIFY-FROM-BUFFER request completed with no result due to empty buffer.
006	out-of-sequence	Verification operation failed due to out-of-sequence method invocations. For example, calling VERIFY before QUERY-VOICEPRINT.
007	repository-uri-failure	Failure accessing Repository URI.
008	repository-uri-missing	Repository-uri is not specified.
009	voiceprint-id-missing	Voiceprint-identification is not specified.
010	voiceprint-id-not-exist	Voiceprint-identification does not exist in the voiceprint repository.
011	speech-not-usable	VERIFY request completed with no result because the speech was not usable (too noisy, too short, etc.)

Authors' Addresses

	Daniel C. Burnett
	Voxeo
	189 South Orange Avenue #2050
	Orlando, FL 32801
	USA
Email:	dburnett@voxeo.com

	Saravanan Shanmugham
	Cisco Systems, Inc.
	170 W. Tasman Dr.
	San Jose, CA 95134
	USA
Email:	sarvi@cisco.com
