Internet Engineering Task Force                     Saravanan Shanmugham
Internet-Draft                                        Cisco Systems Inc.
draft-ietf-speechsc-mrcpv2-04                              July 19, 2004
Expires: January 19, 2005


           Media Resource Control Protocol Version 2 (MRCPv2)

Status of this Memo

By submitting this Internet-Draft, we certify that any applicable patent or other IPR claims of which we are aware have been disclosed, and any of which we become aware will be disclosed, in accordance with RFC 3668.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt .

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html .

This Internet-Draft will expire on January 19, 2005.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

This document describes a proposal for a Media Resource Control Protocol Version 2 (MRCPv2) and aims to meet the requirements specified in the SPEECHSC working group requirements document. It is based on the Media Resource Control Protocol (MRCP), also called

S. Shanmugham, et. al.                                            Page 1

MRCPv2 Protocol                                                July 2004

MRCPv1, developed jointly by Cisco Systems, Inc., Nuance Communications, and Speechworks Inc. The MRCPv2 protocol will control media service resources like speech synthesizers, recognizers, signal generators, signal detectors, fax servers etc. over a network.
This protocol depends on a session management protocol such as the Session Initiation Protocol (SIP) to establish a separate MRCPv2 control session between the client and the server. It also depends on SIP to establish the media pipe and associated parameters between the media source or sink and the media server. Once this is done, the MRCPv2 protocol exchange can happen over the control session established above, allowing the client to command and control the media processing resources that may exist on the media server.

Table of Contents

   Status of this Memo..............................................1
   Copyright Notice.................................................1
   Abstract.........................................................1
   Table of Contents................................................2
   1. Introduction:...............................................4
   2. Notational Convention.......................................5
   3. Architecture:...............................................5
   3.1. MRCPv2 Media Resources:...................................7
   3.2. Server and Resource Addressing............................7
   4. MRCPv2 Protocol Basics......................................7
   4.1. Connecting to the Server..................................8
   4.2. Managing Resource Control Channels........................8
   4.3. Media Streams and RTP Ports..............................13
   4.4. MRCPv2 Message Transport.................................14
   4.5. Resource Types...........................................15
   5. MRCPv2 Specification.......................................16
   5.1. Request..................................................17
   5.2. Response.................................................18
   5.3. Event....................................................20
   6. MRCP Generic Features......................................20
   6.1. Generic Message Headers..................................20
   6.2. SET-PARAMS...............................................28
   6.3. GET-PARAMS...............................................29
   7. Resource Discovery.........................................30
   8. Speech Synthesizer Resource................................31
   8.1. Synthesizer State Machine................................31
   8.2. Synthesizer Methods......................................32
   8.3. Synthesizer Events.......................................32
   8.4. Synthesizer Header Fields................................32
   8.5. Synthesizer Message Body.................................39
   8.6. SPEAK....................................................41
   8.7. STOP.....................................................43
   8.8. BARGE-IN-OCCURRED........................................44
   8.9. PAUSE....................................................45
   8.10. RESUME...................................................46
   8.11. CONTROL..................................................47
   8.12. SPEAK-COMPLETE...........................................49
   8.13. SPEECH-MARKER............................................49
   8.14. DEFINE-LEXICON...........................................51
   9. Speech Recognizer Resource.................................51
   9.1. Recognizer State Machine.................................52
   9.2. Recognizer Methods.......................................53
   9.3. Recognizer Events........................................53
   9.4. Recognizer Header Fields.................................53
   9.5. Recognizer Message Body..................................68
   9.6. DEFINE-GRAMMAR...........................................75
   9.7. RECOGNIZE................................................78
   9.8. STOP.....................................................80
   9.9. GET-RESULT...............................................81
   9.10. START-OF-SPEECH..........................................82
   9.11. START-INPUT-TIMERS.......................................82
   9.12. RECOGNITION-COMPLETE.....................................83
   9.13. START-PHRASE-ENROLLMENT..................................85
   9.14. ENROLLMENT-ROLLBACK......................................86
   9.15. END-PHRASE-ENROLLMENT....................................86
   9.16. MODIFY-PHRASE............................................87
   9.17. DELETE-PHRASE............................................87
   9.18. INTERPRET................................................88
   9.19. INTERPRETATION-COMPLETE..................................89
   9.20. DTMF Detection...........................................90
   10. Recorder Resource..........................................90
   10.1. Recorder State Machine...................................91
   10.2. Recorder Methods.........................................91
   10.3. Recorder Events..........................................91
   10.4. Recorder Header Fields...................................91
   10.5. Recorder Message Body....................................96
   10.6. RECORD...................................................96
   10.7. STOP.....................................................97
   10.8. RECORD-COMPLETE..........................................97
   10.9. START-INPUT-TIMERS.......................................98
   11. Speaker Verification and Identification....................99
   11.1. Speaker Verification State Machine......................100
   11.2. Speaker Verification Methods............................100
   11.3. Verification Events.....................................101
   11.4. Verification Header Fields..............................101
   11.5. Verification Result Elements............................110
   11.6. START-SESSION...........................................114
   11.7. END-SESSION.............................................115
   11.8. QUERY-VOICEPRINT........................................115
   11.9. DELETE-VOICEPRINT.......................................116
   11.10. VERIFY..................................................117
   11.11. VERIFY-FROM-BUFFER......................................117
   11.12. VERIFY-ROLLBACK.........................................120
   11.13. STOP....................................................120
   11.14. START-INPUT-TIMERS......................................121
   11.15. VERIFICATION-COMPLETE...................................122
   11.16. START-OF-SPEECH.........................................122
   11.17. CLEAR-BUFFER............................................123
   11.18. GET-INTERMEDIATE-RESULT.................................123
   12. Security Considerations...................................124
   13. Examples:.................................................124
   14. Reference Documents.......................................131
   15. Appendix..................................................133
   15.1. ABNF Message Definitions................................133
   15.2. XML Schema and DTD......................................147
   Full Copyright Statement.......................................152
   Intellectual Property..........................................152
   Contributors...................................................152
   Acknowledgements...............................................153
   Editors' Addresses.............................................154

1. Introduction:

The MRCPv2 protocol is designed to allow a client device to control media processing resources on the network that process audio/video streams. Examples of such media processing resources include speech recognition engines, speech synthesis engines, and speaker verification or speaker identification engines. This allows a vendor to implement distributed Interactive Voice Response platforms such as VoiceXML [8] browsers. The SPEECHSC requirements call for a protocol that is capable of reaching a media processing server, setting up communication channels to its media resources, and sending and receiving control messages and media streams to and from the server.
The Session Initiation Protocol (SIP) described in [4] meets these requirements and is used to set up and tear down media and control pipes to the server. In addition, a SIP re-INVITE can be used to change the characteristics of these media and control pipes mid-session. MRCPv2 is therefore designed to leverage and build upon session management protocols such as SIP and the Session Description Protocol (SDP). SDP is used to describe the parameters of the media pipe associated with the session. Support for SIP as the session-level protocol is mandatory to ensure interoperability; other session-level protocols can be used by prior agreement.

The MRCPv2 protocol depends on SIP and SDP to create the session and set up the media channels to the server. It also depends on SIP and SDP to establish MRCPv2 control channels between the client and the server for each media processing resource required for that session. The MRCPv2 protocol exchange between the client and the media resource then happens on that control channel. This exchange does not change the state of the SIP session, the media, or other parameters of the session that SIP initiated; it only controls and affects the state of the media processing resource associated with that MRCPv2 channel.

The MRCPv2 protocol defines the messages that control the different media processing resources and the state machines that guide their operation. It also describes how these messages are carried over a transport layer such as TCP, SCTP or TLS.

2. Notational Convention

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [10].
Since many of the definitions and much of the syntax are identical to HTTP/1.1, this specification only points to the sections where they are defined rather than copying them. For brevity, [HX.Y] is to be taken to refer to Section X.Y of the current HTTP/1.1 specification (RFC 2616 [1]).

All the mechanisms specified in this document are described in both prose and an augmented Backus-Naur Form (ABNF), which is described in detail in RFC 2234 [3]. The complete message format in ABNF is provided in Appendix Section 15.1 and is the normative format definition.

Media Resource
   An entity on the MRCP Server that can be controlled through the MRCP protocol.

MRCP Server
   An aggregate of one or more "Media Resource" entities on a server, exposed through the MRCP protocol ("Server" for short).

MRCP Client
   An entity controlling one or more Media Resources through the MRCP protocol ("Client" for short).

3. Architecture:

The system consists of a client that requires the generation or processing of media streams, and a media resource server that has the resources or engines to process or generate these streams. The client establishes a session with the server using SIP and SDP in order to use its media processing resources. A SIP URI refers to the MRCPv2 server.

The session management protocol (SIP) uses SDP with the offer/answer model described in RFC 3264 to describe and set up the MRCPv2 control channels. Separate MRCPv2 control channels are needed to control the different media processing resources associated with that session. Within a SIP session, the individual resource control channels for the different resources are added or removed through the SDP offer/answer model and the SIP re-INVITE dialog. Through the SDP exchange, the server provides the client with a unique channel identifier and a TCP port number. The client MAY then open a new TCP connection with the server using this port number.
Multiple MRCPv2 channels can share a TCP connection between the client and the server. All MRCPv2 messages exchanged between the client and the server also carry the specified channel identifier, which MUST be unique among all MRCPv2 control channels that are active on that server. The client can use this channel to control the media processing resource associated with it.

The session management protocol (SIP) also establishes media pipes between the client (or source/sink of media) and the MRCP server using SDP m-lines. A media pipe may be shared by one or more media processing resources under that SIP session, or each media processing resource may have its own media pipe.

       MRCPv2 client                   MRCPv2 Media Resource Server
   |--------------------|             |-----------------------------|
   ||------------------||             ||---------------------------||
   || Application Layer||             || TTS  | ASR  |  SV  |  SI  ||
   ||------------------||             ||Engine|Engine|Engine|Engine||
   ||Media Resource API||             ||---------------------------||
   ||------------------||             || Media Resource Management ||
   || SIP  |  MRCPv2   ||             ||---------------------------||
   ||Stack |           ||             ||   SIP  |    MRCPv2        ||
   ||      |           ||             ||  Stack |                  ||
   ||------------------||             ||---------------------------||
   || TCP/IP Stack     ||----MRCPv2---||       TCP/IP Stack        ||
   ||                  ||             ||                           ||
   ||------------------||-----SIP-----||---------------------------||
   |--------------------|             |-----------------------------|
             |                            /
            SIP                          /
             |                          /
   |-------------------|     RTP       /
   |                   |              /
   | Media Source/Sink |-------------/
   |                   |
   |-------------------|

                   Fig 1: Architectural Diagram

3.1. MRCPv2 Media Resources:

The MRCP server may offer one or more of the following media processing resources to its clients.

Speech Recognition
   The server may offer speech recognition engines that the client can allocate and control, and use to recognize the spoken input contained in the audio stream.

Speech Synthesis
   The server may offer speech synthesis engines that the client can allocate and control, and use to generate synthesized voice into the audio stream.

Speaker Identification
   The server may offer speaker recognition engines that the client can allocate and control, and use to recognize the speaker from the voice in the audio stream.

Speaker Verification
   The server may offer speaker verification engines that the client can allocate and control, and use to verify and authenticate the speaker based on their voice.

3.2. Server and Resource Addressing

The MRCPv2 server as a whole is a generic SIP server and is addressed by a specific SIP URL registered by the server. Example:

   sip:mrcpv2@mediaserver.com

4. MRCPv2 Protocol Basics

MRCPv2 requires the use of a transport layer protocol such as TCP or SCTP to guarantee reliable sequencing and delivery of MRCPv2 control messages between the client and the server. If security is needed, a TLS connection is used to carry MRCPv2 messages. One or more TCP, SCTP or TLS connections between the client and the server can be shared between different MRCPv2 channels to the server; the individual messages carry the channel identifier to differentiate messages on different channels.

The message format for MRCPv2 is text based, with mechanisms to carry embedded binary data. This allows data such as recognition grammars, recognition results and synthesizer speech markup to be carried in the MRCPv2 message between the client and the server resource. The protocol does not address session and media establishment and management; it relies on SIP and SDP to do this.

4.1. Connecting to the Server

The MRCPv2 protocol depends on a session establishment and management protocol such as SIP in conjunction with SDP. The client finds and reaches an MRCPv2 server across the SIP network using the INVITE and other SIP dialog exchanges.
The SDP offer/answer exchange model over SIP is used to establish resource control channels for each resource. The SDP offer/answer exchange is also used to establish media pipes between the source or sink of audio and the server.

4.2. Managing Resource Control Channels

The client needs a separate MRCPv2 resource control channel to control each media processing resource under the SIP session. A unique channel identifier string identifies each of these resource control channels. The channel identifier string consists of a hexadecimal number specifying the channel ID, followed by an "@", followed by a string token specifying the type of resource (for example, "32AECB234338@speechsynth"). The server generates the hexadecimal channel ID and MUST make sure it does not clash with any other MRCP channel allocated on that server.

MRCPv2 defines the following types of media processing resources:

   Resource Type    Resource Description
   -------------    -----------------------------
   speechrecog      Speech Recognition
   dtmfrecog        DTMF Recognition
   speechsynth      Speech Synthesis
   basicsynth       Poor Man's Speech Synthesizer
   speakverify      Speaker Verification
   recorder         Speech Recording

Additional resource types, their associated methods/events and state machines can be added by future specifications that extend the capabilities of MRCPv2.

The SDP offer/answer exchange carried in the SIP INVITE or re-INVITE dialog contains m-lines describing the resource control channels the client wants to allocate. There MUST be one SDP m-line for each MRCPv2 resource that needs to be controlled. This m-line has a media type field of "control" and a transport type field of "TCP", "SCTP" or "TLS". The port number field of the m-line MUST contain the discard port of the transport protocol (for example, port 9 for TCP) in the SDP offer from the client, and MUST contain the TCP listen port on the server in the SDP answer.
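The channel identifier format and the control m-line rules above can be illustrated with a short Python sketch. It is not part of this specification: the function names (parse_channel_id, control_mline_offer, control_mline_answer) are hypothetical, the resource-type check assumes only the six types in the table above, and the discard port is assumed to be 9 for every transport even though the text only names port 9 for TCP.

```python
import re

# Resource types from the table above; future extensions would add more.
KNOWN_RESOURCE_TYPES = {
    "speechrecog", "dtmfrecog", "speechsynth",
    "basicsynth", "speakverify", "recorder",
}

# Hexadecimal channel ID, then "@", then the resource type token.
CHANNEL_ID_RE = re.compile(r"^([0-9A-Fa-f]+)@([a-z]+)$")

# Assumption: port 9 is used as the discard port for every transport.
DISCARD_PORT = 9


def parse_channel_id(value):
    """Split '32AECB234338@speechsynth' into ('32AECB234338', 'speechsynth')."""
    match = CHANNEL_ID_RE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed channel identifier: {value!r}")
    hex_id, resource = match.groups()
    if resource not in KNOWN_RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource!r}")
    return hex_id, resource


def control_mline_offer(resource, cmid, transport="TCP", remove=False):
    """Client offer: discard port to allocate a channel, port 0 to release it."""
    if resource not in KNOWN_RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource!r}")
    port = 0 if remove else DISCARD_PORT
    return [
        f"m=control {port} {transport} application/mrcpv2",
        f"a=resource:{resource}",
        f"a=cmid:{cmid}",
    ]


def control_mline_answer(channel_id, listen_port, cmid, transport="TCP"):
    """Server answer: listen port (0 when releasing) plus the channel attribute."""
    parse_channel_id(channel_id)  # reject malformed identifiers early
    return [
        f"m=control {listen_port} {transport} application/mrcpv2",
        f"a=channel:{channel_id}",
        f"a=cmid:{cmid}",
    ]


if __name__ == "__main__":
    # Produces the control m-lines used in Example 1 of this section.
    print("\n".join(control_mline_offer("speechsynth", "1")))
    print("\n".join(control_mline_answer("32AECB234338@speechsynth", 32416, "1")))
```

Keeping the offer and answer builders separate mirrors the asymmetry in the text: only the client names a resource type, and only the server assigns the full channel identifier and a real port.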
The client MAY then set up a TCP or TLS connection to the server port given in the SDP answer, or share an already established connection to that port. The format field of the m-line MUST contain "application/mrcpv2". The client must specify the resource type identifier in the "resource" attribute associated with the control m-line of the SDP offer. The server MUST respond with the full Channel-Identifier (which includes the resource type identifier and a unique hexadecimal identifier) in the "channel" attribute associated with the control m-line of the SDP answer. All servers MUST support TCP, SCTP and TLS, and it is up to the client to choose which transport to use for an MRCPv2 session.

When the client wants to add a media processing resource to the session, it MUST initiate a re-INVITE dialog. The SDP offer/answer exchange contained in this SIP dialog contains an additional control m-line for the new resource that needs to be allocated. On seeing the new m-line, the server allocates the resource and responds with a corresponding control m-line in the SDP answer.

When the client wants to de-allocate the resource from this session, it MUST initiate a SIP re-INVITE dialog with the server and MUST offer the control m-line with port 0. The server MUST then answer the control m-line with port 0.

Example 1: This exchange adds a resource control channel for a synthesizer. Since a synthesizer generates an audio stream, this interaction also creates a receive-only audio stream for the server to send audio to.

   C->S:
   INVITE sip:mresources@mediaserver.com SIP/2.0
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   Max-Forwards: 6
   To: MediaServer
   From: sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314161 INVITE
   Contact:
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=sarvi 2890844526 2890842808 IN IP4 126.16.64.4
   s=-
   c=IN IP4 224.2.17.12
   m=control 9 TCP application/mrcpv2
   a=resource:speechsynth
   a=cmid:1
   m=audio 49170 RTP/AVP 0 96
   a=rtpmap:0 pcmu/8000
   a=recvonly
   a=mid:1

   S->C:
   SIP/2.0 200 OK
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   To: MediaServer
   From: sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314161 INVITE
   Contact:
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=sarvi 2890844526 2890842808 IN IP4 126.16.64.4
   s=-
   c=IN IP4 224.2.17.12
   m=control 32416 TCP application/mrcpv2
   a=channel:32AECB234338@speechsynth
   a=cmid:1
   m=audio 48260 RTP/AVP 0 96
   a=rtpmap:0 pcmu/8000
   a=sendonly
   a=mid:1

   C->S:
   ACK sip:mresources@mediaserver.com SIP/2.0
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   Max-Forwards: 6
   To: MediaServer ;tag=a6c85cf
   From: Sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314161 ACK
   Content-Length: 0

Example 2: This exchange continues from Example 1 and allocates an additional resource control channel for a recognizer. Since a recognizer needs to receive an audio stream for recognition, this interaction also updates the audio stream to "sendrecv", making it a two-way audio stream.

   C->S:
   INVITE sip:mresources@mediaserver.com SIP/2.0
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   Max-Forwards: 6
   To: MediaServer
   From: sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314163 INVITE
   Contact:
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
   s=-
   c=IN IP4 224.2.17.12
   m=control 9 TCP application/mrcpv2
   a=resource:speechrecog
   a=cmid:1
   m=control 9 TCP application/mrcpv2
   a=resource:speechsynth
   a=cmid:1
   m=audio 49170 RTP/AVP 0 96
   a=rtpmap:0 pcmu/8000
   a=rtpmap:96 telephone-event/8000
   a=fmtp:96 0-15
   a=sendrecv
   a=mid:1

   S->C:
   SIP/2.0 200 OK
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   To: MediaServer
   From: sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314163 INVITE
   Contact:
   Content-Type: application/sdp
   Content-Length: 131

   v=0
   o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
   s=-
   c=IN IP4 224.2.17.12
   m=control 32416 TCP application/mrcpv2
   a=channel:32AECB234338@speechrecog
   a=cmid:1
   m=control 32416 TCP application/mrcpv2
   a=channel:32AECB234339@speechsynth
   a=cmid:1
   m=audio 48260 RTP/AVP 0 96
   a=rtpmap:0 pcmu/8000
   a=rtpmap:96 telephone-event/8000
   a=fmtp:96 0-15
   a=sendrecv
   a=mid:1

   C->S:
   ACK sip:mresources@mediaserver.com SIP/2.0
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   Max-Forwards: 6
   To: MediaServer ;tag=a6c85cf
   From: Sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314163 ACK
   Content-Length: 0

Example 3: This exchange continues from Example 2 and de-allocates the recognizer channel. Since the recognizer no longer needs to receive an audio stream, this interaction also updates the audio stream to "recvonly".

   C->S:
   INVITE sip:mresources@mediaserver.com SIP/2.0
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   Max-Forwards: 6
   To: MediaServer
   From: sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314165 INVITE
   Contact:
   Content-Type: application/sdp
   Content-Length: ...

   v=0
   o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
   s=-
   c=IN IP4 224.2.17.12
   m=control 0 TCP application/mrcpv2
   a=resource:speechrecog
   a=cmid:1
   m=control 9 TCP application/mrcpv2
   a=resource:speechsynth
   a=cmid:1
   m=audio 49170 RTP/AVP 0 96
   a=rtpmap:0 pcmu/8000
   a=recvonly
   a=mid:1

   S->C:
   SIP/2.0 200 OK
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   To: MediaServer
   From: sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314165 INVITE
   Contact:
   Content-Type: application/sdp
   Content-Length: 131

   v=0
   o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
   s=-
   c=IN IP4 224.2.17.12
   m=control 0 TCP application/mrcpv2
   a=channel:32AECB234338@speechrecog
   a=cmid:1
   m=control 32416 TCP application/mrcpv2
   a=channel:32AECB234339@speechsynth
   a=cmid:1
   m=audio 48260 RTP/AVP 0 96
   a=rtpmap:0 pcmu/8000
   a=sendonly
   a=mid:1

   C->S:
   ACK sip:mresources@mediaserver.com SIP/2.0
   Via: SIP/2.0/TCP client.atlanta.example.com:5060;branch=z9hG4bK74bf9
   Max-Forwards: 6
   To: MediaServer ;tag=a6c85cf
   From: Sarvi ;tag=1928301774
   Call-ID: a84b4c76e66710
   CSeq: 314165 ACK
   Content-Length: 0

4.3. Media Streams and RTP Ports

The client or the server needs to add audio (or other media) pipes between the client and the server and associate them with the resources that process or generate the media. One or more resources can be associated with a single media channel, or each resource can be assigned a separate media channel. For example, a synthesizer and a recognizer can be associated with the same media pipe (m=audio line) if it is opened in "sendrecv" mode. Alternatively, the recognizer can have its own "sendonly" audio pipe and the synthesizer its own "recvonly" audio pipe.

The association between control channels and their corresponding media channels is established through the "mid" attribute defined in RFC 3388 [20].
If there is more than one audio m-line, each audio m-line MUST have a "mid" attribute. Each control m-line MUST have a "cmid" attribute that matches the "mid" attribute of the audio m-line it is associated with.

   cmid-attribute     = "a=cmid:" identification-tag
   identification-tag = token

A single audio m-line can be associated with multiple resources, or each resource can have its own audio m-line. For example, if the client wants to allocate a recognizer and a synthesizer and associate them with a single two-way audio pipe, the SDP offer should contain two control m-lines and a single audio m-line with an attribute of "sendrecv". Each of the control m-lines should have a "cmid" attribute whose value matches the "mid" of the audio m-line. If the client wants to allocate a recognizer and a synthesizer, each with its own separate audio pipe, the SDP offer would carry two control m-lines (one for the recognizer and another for the synthesizer) and two audio m-lines (one with the attribute "sendonly" and another with the attribute "recvonly"). The "cmid" attribute of the recognizer control m-line would match the "mid" value of the "sendonly" audio m-line, and the "cmid" attribute of the synthesizer control m-line would match the "mid" attribute of the "recvonly" m-line.

When a server receives media (say, audio) on a media pipe that is associated with more than one media processing resource, it is the responsibility of the server to receive and fork it to the resources that need it. If multiple resources in a session generate audio (or other media) that needs to be sent on a single associated media pipe, it is the responsibility of the server to mix the streams before sending them on the media pipe. The media stream in either direction may contain more than one Synchronization Source (SSRC) identifier due to multiple sources contributing to the media on the pipe, and the client or server SHOULD be able to deal with it.
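The "cmid"/"mid" association rules and the fork/mix responsibility described in this section can be checked mechanically. The following Python sketch assumes the SDP body is available as a string; the function names are hypothetical, it is not a complete SDP parser (it ignores session-level lines and does not check media direction), and it skips control m-lines with port 0 on the assumption that those channels are being released.

```python
from collections import defaultdict


def media_sections(sdp):
    """Group attribute lines under the "m=" line that precedes them."""
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            sections.append({"mline": line, "attrs": []})
        elif line.startswith("a=") and sections:
            sections[-1]["attrs"].append(line[2:])
        # v=, o=, s=, c= and session-level a= lines are ignored here.
    return sections


def attr(section, name):
    """Value of the first 'name:value' attribute in a section, or None."""
    prefix = name + ":"
    for value in section["attrs"]:
        if value.startswith(prefix):
            return value[len(prefix):]
    return None


def associate_control_channels(sdp):
    """Map each audio "mid" to the resources whose control "cmid" matches it."""
    sections = media_sections(sdp)
    audio = [s for s in sections if s["mline"].startswith("m=audio ")]
    mids = [attr(s, "mid") for s in audio]
    if len(audio) > 1 and None in mids:
        raise ValueError("every audio m-line needs a mid when there are several")
    groups = defaultdict(list)
    for ctl in sections:
        fields = ctl["mline"].split()
        if fields[0] != "m=control" or fields[1] == "0":
            continue  # not a control m-line, or a channel being released
        cmid = attr(ctl, "cmid")
        if cmid not in mids:
            raise ValueError(f"cmid {cmid!r} does not match any audio mid")
        # Offers name the resource type; answers carry the channel identifier.
        groups[cmid].append(attr(ctl, "resource") or attr(ctl, "channel"))
    return dict(groups)


def pipes_needing_fork_or_mix(groups):
    """Audio mids shared by more than one resource."""
    return sorted(mid for mid, resources in groups.items() if len(resources) > 1)


if __name__ == "__main__":
    offer = "\n".join([
        "m=control 9 TCP application/mrcpv2", "a=resource:speechrecog", "a=cmid:1",
        "m=control 9 TCP application/mrcpv2", "a=resource:speechsynth", "a=cmid:1",
        "m=audio 49170 RTP/AVP 0 96", "a=sendrecv", "a=mid:1",
    ])
    groups = associate_control_channels(offer)
    print(groups)                             # {'1': ['speechrecog', 'speechsynth']}
    print(pipes_needing_fork_or_mix(groups))  # ['1']
```

A server that cannot fork or mix media could, for instance, reject an offer for which pipes_needing_fork_or_mix returns a non-empty list.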
If a server is not capable of mixing or forking media in the cases described in this section, it SHOULD prevent the client from associating multiple such resources with a single audio pipe by rejecting the SDP offer.

4.4. MRCPv2 Message Transport

The MRCPv2 resource messages defined in this document are transported over a TCP, SCTP or TLS pipe between the client and the server. The setting up of this pipe and of the resource control channel is discussed in Section 4.2. Multiple resource control channels between a client and a server that belong to different SIP sessions can share one or more TCP or SCTP pipes between them. The individual MRCPv2 messages carry the MRCPv2 channel identifier in their Channel-Identifier header field, which MUST be used to differentiate MRCPv2 messages from different resource channels. All MRCPv2 servers MUST support TCP, SCTP and TLS, and it is up to the client to choose which transport to use for an MRCPv2 session.

Example 1:

   C->S:
   MRCP/2.0 483 SPEAK 543257
   Channel-Identifier: 32AECB23433802@speechsynth
   Voice-gender: neutral
   Voice-category: teenager
   Prosody-volume: medium
   Content-Type: application/synthesis+ssml
   Content-Length: 104

   You have 4 new messages. The first is from Stephanie Williams and arrived at 3:45pm. The subject is ski trip

   S->C:
   MRCP/2.0 81 543257 200 IN-PROGRESS
   Channel-Identifier: 32AECB23433802@speechsynth

   S->C:
   MRCP/2.0 89 SPEAK-COMPLETE 543257 COMPLETE
   Channel-Identifier: 32AECB23433802@speechsynth

Most examples from here on show only the MRCPv2 messages and do not show the SIP messages and headers that may have been used to establish the MRCPv2 control channel.

4.5. Resource Types

Basic Synthesizer
   A speech synthesizer resource with very limited capabilities, which can be implemented by playing out concatenated audio file clips. The speech data is described as SSML data but with limited support for its elements. It MUST support ,