Audio/Video Transport                                          J. Lennox
Internet-Draft                                            H. Schulzrinne
Expires: June 1, 2005                                            J. Nieh
                                                              R. Barrato
                                                             Columbia U.
                                                           December 2004


             Protocols for Application and Desktop Sharing
                    draft-lennox-avt-app-sharing-00

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 1, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2004).

Abstract

   This document defines several protocols to support accessing general
   graphical user interface (GUI) desktops and applications remotely,
   either by a single remote user or embedded into a multiparty
   conference.  The protocols are designed to allow sharing of, and
   access to general windowing system applications that are not
   expressly written to be accessed remotely.

Lennox, et al.            Expires June 1, 2005                  [Page 1]

Internet-Draft      Application and Desktop Sharing        December 2004

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview
     3.1  Architecture
     3.2  Protocol Components
   4.  Common Protocol Elements
   5.  Output Protocols
     5.1  Window Identifiers and Output Meta-Format
     5.2  Window State Protocol
     5.3  Window Pixel Data
     5.4  Pointer Representation
   6.  Input Protocols
     6.1  Keyboard Input
     6.2  Pointer Position
   7.  Implementation Notes
   8.  Open issues
   9.  Security Considerations
   10. IANA Considerations
   11. References
     11.1  Normative References
     11.2  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   While two-party and multi-party conferencing using standards-based
   protocols is now common and well-developed, protocols for sharing
   applications are largely proprietary or based on the aging T.120 [8]
   suite of protocols.  In this document, we define a set of protocols
   for application and desktop sharing.

   We note that there are large similarities between remote access to an
   application ("remote desktop") and the sharing of an application by
   multiple users within a collaboration setting such as a multimedia
   call or multiparty conference.  The protocols defined in this
   document therefore support both.

   Remote access differs from video transmission of the sort for which
   most video encodings have been designed.  In particular, screen
   encoding may need to be lossless and typically operates on artificial
   rather than natural (photographic) video input.  The video input is
   characterized by large areas of the screen that remain unchanged for
   long periods of time, while others change rapidly.  (However,
   rendering the output of a modern computer-generated animation
   application such as a video game blurs the distinction between
   traditional motion video output and screen sharing.)

   Unlike earlier systems, such as T.120, we believe that application
   sharing should be integrated into the existing IETF session model,
   encompassing session descriptions using the Session Description
   Protocol (SDP) [1] or successors and the Session Initiation Protocol
   (SIP) [9].  Application sharing needs many of the same control
   functions as other multimedia sessions, such as address binding and
   session feature and media negotiation.  We believe that use of the
   session model is also beneficial for the remote desktop case, as it
   allows re-use of many of the well-developed session components and
   easily supports hybrid models, such as the delivery of desktop audio
   to the remote user.

   Remote access to graphical applications and desktops, as defined in
   this document, has two important characteristics.  First, the access
   protocol is unaware of any semantic characteristics of the
   applications being shared; it only transmits the visual
   characteristics of the windows.  This is different, therefore, from
   shared-drawing or shared-editing tools that allow distributed
   modification of documents.
   Secondly, the protocol is designed to work with applications which
   were not written to be used remotely, by intercepting or simulating
   their connections to their native window systems.  In this way, it is
   distinguished from systems such as the X Window System [10], which
   allow natively-written applications to be displayed on remote
   viewers.

   We distinguish between local and remote users.  Local users employ
   normal operating system mechanisms to interact with the running
   application.  Remote users interact via the delivery protocols
   described here.

   The application sharing problem can be divided into four components:
   (1) setting up a session to the node running the application, (2)
   transporting user input events from the remote viewers such as
   conference participants to the application, (3) delivering screen
   output from the application to the participants, (4) moderating
   access to shared human interface devices such as pointing devices
   (e.g., mice, joysticks, trackballs) and text input (keyboard).

   We refer to components (2) and (3) as the "remoting protocol".  They
   are the focus of this document, and are described in Section 6 and
   Section 5 respectively.  Session negotiation and description can be
   provided by existing session setup protocols; user input access can
   be moderated by a floor control protocol.  Thus, these two components
   are beyond the scope of this document, although they are important
   for an acceptable overall user experience.

   Applications are more than just windows; they are a stack of related
   windows which serve the same task and are usually associated with the
   same process on the server.  Some applications impose special
   constraints on the user input, e.g., through modal dialogs, which
   temporarily exclusively acquire input focus, and floating
   (always-on-top) windows.

   The protocols described in this document are intended to fulfil the
   requirements described in the Internet-Draft Sharing and Remote
   Access to Applications [11].

   The rest of this document is laid out as follows.  Section 2 defines
   the common terminology for normative references.  Section 3 gives an
   overview of the protocol's architecture and components.  Section 4
   defines common elements of the output and input protocols, which are
   then further described in Section 5 and Section 6 respectively.
   Section 7 gives implementation notes, and Section 8 discusses open
   issues with the design of the protocol.  Finally, Section 9 discusses
   security considerations, and Section 10 gives IANA considerations.

2.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in RFC 2119 [2] and
   indicate requirement levels for compliant implementations.

3.  Overview

3.1  Architecture

   Application and desktop sharing consists of two classes of
   components: "viewers" and "application hosts".  Viewers receive
   remote graphics and provide input.  Application hosts receive input
   from local and remote users, and host and transmit applications and
   graphics.

   The application and desktop sharing models defined in this document
   are integrated into the IETF conferencing model.  In particular, the
   Session Initiation Protocol (SIP) [9] is used to initiate and control
   remote access.  This allows the use of existing SIP mechanisms for
   confidentiality, authentication and authorization, user location,
   conferencing, etc.

   In the IETF conferencing model, media sessions can consist of
   multiple participants; this document's protocols are designed to work
   in this case.
   The various Centralized Conferencing (XCON) [12] control protocols
   can be used for floor control, to determine which member of a
   conference is permitted to send input to an application host at any
   given time.

   Conferencing also gives rise to issues of late-joiners; the protocol
   is designed to make it relatively easy for a protocol relay, which
   receives input from one application host and forwards it to multiple
   viewers, to send all the necessary information about a sharing
   session to new conference arrivals.  Similarly, it is possible to
   record and replay a shared application session without semantic
   awareness of the protocol.

3.2  Protocol Components

   The three core components of desktop sharing are: input protocols
   which represent user input from keyboards or pointing devices such as
   mice, trackballs, or touchscreens; an output protocol which can
   represent screen pixels and related data; and a negotiation mechanism
   which can convey attributes of the session such as the desired size
   of the screen.  In addition, application sharing requires a mechanism
   to represent window state, position and stacking.

   In this document, the negotiation mechanism is defined in Section 4;
   output protocols, including window-state handling, are defined in
   Section 5; and input protocols are defined in Section 6.

   Additional, optional mechanisms can enhance both window and
   application sharing.  Additional input mechanisms such as joysticks
   or other game controllers can be supported; audio streams can be
   associated with a desktop or application; viewer-side scaling and
   porthole requests can be used to optimize transmission of data to
   viewers with a small screen; and it is often useful to allow
   copy-and-paste between applications running on a viewer and those
   running on an application host.  This document does not define any
   such extensions; they may be defined elsewhere.

4.  Common Protocol Elements

   Protocol negotiation is carried out using the Session Description
   Protocol (SDP) [1], while all input and output protocols run over the
   Real-time Transport Protocol (RTP) [3].

   In most use cases for application and desktop sharing, reliability is
   more important than latency, and flow control and dynamic bandwidth
   adjustment are crucial.  As such, viewers and application hosts
   SHOULD use RTP Framing [4] to send the RTP packets over TCP, unless
   there is a strong reason, such as the need to distribute a desktop
   session over multicast, to do otherwise.

5.  Output Protocols

5.1  Window Identifiers and Output Meta-Format

   A shared application consists of a set of overlapping windows,
   usually rectangular.  Each window needs a unique identifier, and most
   output data (other than audio or other non-visual output mechanisms
   not specified here) needs to be associated with a particular window.

   Windows in an application need to be created and destroyed relatively
   frequently.  Re-negotiating SDP descriptions whenever a window is
   opened or closed would therefore not be practical, and so data for
   multiple windows needs to be multiplexed into a single RTP stream.
   Since multiple windows are from a single source, multiplexing on the
   RTP SSRC (synchronization source) field would not be appropriate.
   Instead, each window is assigned a unique window identifier.

   Rather than require each payload type to define a field to carry this
   window identifier, we instead define a payload meta-format which
   precedes each payload.  The meta-format carries the relevant window
   identifier and coordinates, and is then followed by the actual data
   for the particular payload being sent.  (This allows existing payload
   type definitions to be re-used in the context of a window.)

   This protocol has three mandatory format-specific parameters, which
   are carried in an SDP "a=fmtp:" parameter.  The parameter "height"
   indicates the desktop height in pixels; "width" indicates the desktop
   width in pixels.
   All images and other coordinates sent for this protocol must lie
   within these boundaries.  The third parameter is "mode", which can
   take the value "desktop", indicating one big drawing pane, or
   "application", indicating that individual windows will be created and
   destroyed as needed, and all drawing will occur in individual
   windows.

   In application mode, SDP "i=" lines for this protocol SHOULD contain
   a human-readable description of the application.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   .                          RTP header                           .
   .                                                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           X Offset                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Y Offset                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Window ID           |       MBZ       |     PT      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   .                                                               .
   .                            Payload                            .
   .                                                               .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 1: RTP output payload meta-protocol

   Figure 1 shows how all output data formats are defined.

   The "X Offset" and "Y Offset" fields indicate the upper-left corner
   of the area the message is describing.  (In desktop mode, these
   coordinates are relative to the desktop; in application mode, these
   coordinates are relative to the enclosing window, except for window
   state protocol messages.)

   The Window ID is a 16-bit unique identifier for each window; these
   are assigned arbitrarily by the application host, and SHOULD be
   recycled on a least-recently-used basis.  If the meta-protocol is
   running in desktop mode, the Window ID MUST always be set to zero by
   the server and SHOULD be ignored by viewers.  In application mode,
   Window IDs are defined by the window state protocol, defined in
   Section 5.2.  Messages other than Window State Protocol messages
   which reference unknown Window IDs SHOULD be ignored.

   The PT indicates the actual payload type of the rest of the data.

   The MBZ bits must be zero.

   Fields of the RTP header other than the PT field are set as
   appropriate for the enclosed payload type; the meta-protocol does not
   define any specific uses for them.

5.2  Window State Protocol

   The window state protocol is an output protocol which handles the
   creation, destruction, resizing, raising and lowering, positioning,
   and characteristics of application windows when the protocol is being
   run in application mode.  It MUST NOT be used in desktop mode.

   The specific details and packet format of this protocol are not yet
   defined.  The rest of this section describes it at a high level.

   Creating, moving, resizing, and raising or lowering a window are
   indicated by the same message.  This message starts with the common
   meta-protocol header, whose coordinates indicate the position of the
   window relative to the desktop.  Following this is a code indicating
   that this is a "window state" message, and the window's X and Y sizes
   as 32-bit integers.  Finally, the message contains a list of all the
   application's windows, in Z order, bottom to top.  (Listing the
   entire Z order in every message helps prevent the Z order list from
   getting out of sync between viewers and the application host.)  A
   flag in the Z order list can indicate that some windows should be
   considered "always on top", and float above all non-"on top" other
   windows on the viewer (shared or not).

   The size and position given for a window includes all the "trim"
   provided by its window manager -- title bars, frames, and the like.
   (Otherwise, the remoting protocol would need to include input and
   output messages to indicate window state changes and manipulation.)
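
   The common meta-protocol header of Section 5.1, which also begins
   every window state message, can be illustrated with a small sketch.
   This is not a normative encoding.  It assumes network byte order,
   unsigned 32-bit offsets, and a split of the last 16 bits into 9 MBZ
   bits followed by a 7-bit PT (matching the width of the RTP payload
   type field); only the 16-bit Window ID width is stated in the text.
   The function names and the "mode" argument are hypothetical.

```python
import struct

# Assumed layout (network byte order): 32-bit X Offset, 32-bit Y Offset,
# 16-bit Window ID, then 16 bits holding 9 MBZ bits and a 7-bit PT.
# The 9/7 split and the unsigned offsets are assumptions of this sketch.
_META_HEADER = struct.Struct("!IIHH")


def pack_meta_header(x_offset, y_offset, window_id, payload_type,
                     mode="application"):
    """Build the 12-byte meta-protocol header that precedes a payload."""
    if mode == "desktop":
        window_id = 0  # In desktop mode the Window ID MUST be zero.
    if not 0 <= window_id <= 0xFFFF:
        raise ValueError("Window ID must fit in 16 bits")
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("payload type must fit in 7 bits (assumed width)")
    # The MBZ bits are left as zero by construction.
    return _META_HEADER.pack(x_offset, y_offset, window_id, payload_type)


def unpack_meta_header(data):
    """Return (x_offset, y_offset, window_id, payload_type)."""
    x_offset, y_offset, window_id, low = _META_HEADER.unpack(data[:12])
    # MBZ bits are masked off; the text does not say how a receiver
    # should react if they are non-zero.
    return x_offset, y_offset, window_id, low & 0x7F
```

   A viewer in desktop mode would simply disregard the Window ID value
   returned by unpack_meta_header(), as the text recommends.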

   Non-rectangular and translucent windows can additionally have an
   alpha channel specified.  This is sent as a black-and-white or
   grayscale PNG [5] image corresponding to the window's transparent or
   translucent pixels.  Window alpha channels can change dynamically.

   Window removal consists of the meta-protocol header followed by a
   "window remove" code.  On receipt of this message, a viewer erases
   the corresponding window and removes it from the Z order list.  The X
   and Y coordinates given in the meta-protocol are ignored and SHOULD
   be zero.

   An additional message indicates the "pointer capture", in which a
   window indicates that it should exclusively receive all pointer
   events until it indicates otherwise.  This is necessary when menus
   are pulled down, for example; a window with a pulled-down menu
   receives a "release menu" mouse click whether or not the cursor is
   still over the original window.  "Stop pointer capture" is the same
   message, with a flag set.

   The window state protocol may need to carry additional information,
   as well; see the open issues list in Section 8.

5.3  Window Pixel Data

   There are three basic window data operations: pixel images, fills,
   and block copies.  Each operation uses the common meta-protocol
   header.  Each operation has its own MIME type, and thus a unique RTP
   payload type in the meta-protocol PT field.

   Pixel images contain arbitrary graphical data to be applied to
   windows.  They are conveyed as PNG [5] images.  The PNG image follows
   the meta-protocol header (which indicates the offset of the image
   within the window or desktop) and consists of an area of the screen
   to be updated.  If the PNG image contains an alpha channel, the image
   is composited with the existing contents of the window or desktop.
   (The PNG images defining the initial graphical contents of a window
   or desktop MUST NOT contain alpha channels.)  In window mode, if a
   window has an alpha channel with completely transparent pixels --
   i.e., if a window is non-rectangular -- the corresponding pixels in
   the PNG image are ignored.

   As an optimization, two additional window data operations are
   defined.

   A fill defines an area of a window to be filled by a single solid
   color.  Following the meta-protocol header, it consists of a height
   and width (specified as 32-bit coordinates), followed by the fill
   color.  Colors are specified as one byte each of Red, Green, and
   Blue, i.e. as PNG color type 2 with 8-bit sample depth.  (Color
   sample depths greater than 8 bits per channel cannot be specified
   with the fill operation, and must use the general PNG pixel image
   form.)

   The block copy operation copies a region of a desktop or window from
   one position to another.  Following the meta-protocol header (which
   indicates the destination position) are the source position and size
   (both as 32-bit coordinates).  The destination region MAY overlap the
   source region.  Both the source and destination regions MUST NOT
   extend beyond the boundaries of the window (in window mode) or
   desktop (in desktop mode).  In window mode, cross-window moves are
   not supported.  Portions of the source region which do not overlap
   with the destination region remain unmodified.

   Additionally, if the viewer and application host negotiate support
   for other video/* MIME types, video streams can be sent following the
   meta-protocol header.  For video this will often be more efficient
   than sending raw screen images.

   In window mode, graphics to be drawn MUST NOT extend beyond the
   boundaries of the window; in either mode, images to be drawn MUST NOT
   extend beyond the defined borders of the desktop.  (Window-related
   images such as drop-down menus or tooltips which can extend beyond
   the boundaries of a window SHOULD be transmitted as separate
   windows.)
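
   The block copy rules above (overlap allowed, no region outside the
   window or desktop, source pixels outside the destination left alone)
   can be illustrated with a sketch of how a viewer might apply the
   operation to a surface held in memory.  The representation of the
   surface as a list of pixel rows and the function name block_copy are
   assumptions made for illustration; the wire format is not modelled
   here.

```python
def block_copy(pixels, src_x, src_y, dst_x, dst_y, width, height):
    """Copy a width x height region within one surface, in place.

    `pixels` is a list of rows, each a list of pixel values; this
    in-memory layout is an assumption of the sketch.
    """
    rows = len(pixels)
    cols = len(pixels[0]) if rows else 0
    # Both regions must lie entirely inside the window or desktop.
    for x, y in ((src_x, src_y), (dst_x, dst_y)):
        if x < 0 or y < 0 or x + width > cols or y + height > rows:
            raise ValueError("region extends beyond the surface boundary")
    # Snapshot the source first so that overlapping regions copy
    # correctly regardless of the direction of the move.
    snapshot = [pixels[src_y + r][src_x:src_x + width]
                for r in range(height)]
    for r in range(height):
        pixels[dst_y + r][dst_x:dst_x + width] = snapshot[r]
```

   Taking a snapshot costs one temporary copy of the region; an
   implementation concerned with memory could instead choose the row and
   column iteration order based on the direction of the move.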

5.4  Pointer Representation

   For efficiency, pointers can be represented separately from other
   window data.  This is accomplished by transmitting, in a special
   protocol, PNG [5] images with alpha channels and hotspots for the
   pointers' images, and then RFC 2862 [6] streams to indicate pointers'
   positions.

   These protocols still need to be defined in detail.

6.  Input Protocols

6.1  Keyboard Input

   The viewer represents keyboard input to the server by sending a list
   of depressed keys, updated whenever this state changes.  (Note that
   this is unlike how keys are represented in most window systems, which
   instead use individual key-down and key-up events.  The latter can be
   derived from the former.)  Key repetition is handled by the
   application host.

   There are two types of keys that can be represented.  "Encoding keys"
   are keys that encode a specific Unicode [7] character, whereas
   "virtual keys" do not.  Encoding keys are indicated by the Unicode
   value of the character they encode.  Virtual keys are indicated by
   codes from the tables of virtual keycodes listed in Figure 2, Figure
   3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, and Figure 9.
   These codes are those of the X Window System [10], taken from the
   header file .  They are all of X's keycodes with 0xFFnn values,
   except for the XK_KP_* keypad keys.

   Unicode control characters -- characters in the range 0x0000 - 0x001f
   or 0x007f - 0x009f -- MUST NOT be sent.  Instead, the corresponding
   virtual key, or the virtual key Control (Left) or Control (Right)
   (0xE3 or 0xE4) plus a Unicode codepoint, should be used.

   Name                  Code   Note
   ----                  ----   ----
   Backspace             0x08   Back space, back char
   Tab                   0x09
   Linefeed              0x0A   Linefeed, LF
   Clear                 0x0B
   Return                0x0D   Return, enter
   Pause                 0x13   Pause, hold
   Scroll Lock           0x14
   Sys Req               0x15
   Escape                0x1B
   Delete                0xFF   Delete, rubout

   These codes have been chosen to map to ASCII, for convenience of
   programming, but could have been arbitrary (at the cost of lookup
   tables in viewer code).

                Figure 2: Virtual keycodes: teletype keys

   Name                  Code   Note
   ----                  ----   ----
   Multi key             0x20   Multi-key character compose
   Code Input            0x37
   Single Candidate      0x3C
   Multiple Candidate    0x3D
   Previous Candidate    0x3E

      Figure 3: Virtual keycodes: international and multi-key character
                                composition

   Name                  Code   Note
   ----                  ----   ----
   Kanji                 0x21   Kanji, Kanji convert
   Muhenkan              0x22   Cancel Conversion
   Henkan Mode           0x23   Start/Stop Conversion
   Romaji                0x24   to Romaji
   Hiragana              0x25   to Hiragana
   Katakana              0x26   to Katakana
   Hiragana/Katakana     0x27   Hiragana/Katakana toggle
   Zenkaku               0x28   to Zenkaku
   Hankaku               0x29   to Hankaku
   Zenkaku/Hankaku       0x2A   Zenkaku/Hankaku toggle
   Touroku               0x2B   Add to Dictionary
   Massyo                0x2C   Delete from Dictionary
   Kana Lock             0x2D   Kana Lock
   Kana Shift            0x2E   Kana Shift
   Eisu Shift            0x2F   Alphanumeric Shift
   Eisu Toggle           0x30   Alphanumeric toggle
   Kanji Bangou          0x37   Codeinput
   Zen Koho              0x3D   Multiple/All Candidate(s)
   Mae Koho              0x3E   Previous Candidate

   Note that some of these codes are also used for equivalent Hangul
   keyboard keys listed in Figure 9.

              Figure 4: Virtual keys: Japanese keyboard support

   Name                  Code   Note
   ----                  ----   ----
   Home                  0x50
   Left                  0x51   Move left, left arrow
   Up                    0x52   Move up, up arrow
   Right                 0x53   Move right, right arrow
   Down                  0x54   Move down, down arrow
   Prior                 0x55   Prior, previous
   Page Up               0x55
   Next                  0x56   Next
   Page Down             0x56
   End                   0x57   EOL
   Begin                 0x58   BOL

           Figure 5: Virtual keycodes: cursor control and motion

   Name                  Code   Note
   ----                  ----   ----
   Select                0x60   Select, mark
   Print                 0x61
   Execute               0x62   Execute, run, do
   Insert                0x63   Insert, insert here
   Undo                  0x65   Undo, oops
   Redo                  0x66   Redo, again
   Menu                  0x67
   Find                  0x68   Find, search
   Cancel                0x69   Cancel, stop, abort, exit
   Help                  0x6A   Help
   Break                 0x6B
   Mode switch           0x7E   Character set switch (*)
   Num Lock              0x7F

   (*) The "Mode switch" key is variously used on Katakana, Arabic,
   Greek, Hebrew, and Hangul keyboards to switch between the Roman and
   native alphabets.

            Figure 6: Virtual keycodes: miscellaneous functions

   Name                  Code
   ----                  ----
   F1                    0xBE
   F2                    0xBF
   F3                    0xC0
   F4                    0xC1
   F5                    0xC2
   F6                    0xC3
   F7                    0xC4
   F8                    0xC5
   F9                    0xC6
   F10                   0xC7
   F11/L1                0xC8
   F12/L2                0xC9
   F13/L3                0xCA
   F14/L4                0xCB
   F15/L5                0xCC
   F16/L6                0xCD
   F17/L7                0xCE
   F18/L8                0xCF
   F19/L9                0xD0
   F20/L10               0xD1
   F21/R1                0xD2
   F22/R2                0xD3
   F23/R3                0xD4
   F24/R4                0xD5
   F25/R5                0xD6
   F26/R6                0xD7
   F27/R7                0xD8
   F28/R8                0xD9
   F29/R9                0xDA
   F30/R10               0xDB
   F31/R11               0xDC
   F32/R12               0xDD
   F33/R13               0xDE
   F34/R14               0xDF
   F35/R15               0xE0

   Sun keyboards and a few other manufacturers have additional Left and
   Right function key groups on the left and/or right sides of the
   keyboard.

              Figure 7: Virtual keycodes: auxiliary functions

   Name                  Code   Note
   ----                  ----   ----
   Shift (Left)          0xE1
   Shift (Right)         0xE2
   Control (Left)        0xE3
   Control (Right)       0xE4
   Caps Lock             0xE5
   Shift Lock            0xE6
   Meta (Left)           0xE7   Windows (Microsoft); Option (Macintosh)
   Meta (Right)          0xE8   Windows (Microsoft); Option (Macintosh)
   Alt (Left)            0xE9   Command (Macintosh)
   Alt (Right)           0xEA   Command (Macintosh)
   Super (Left)          0xEB
   Super (Right)         0xEC
   Hyper (Left)          0xED
   Hyper (Right)         0xEE

   Application hosts which lack right-hand versions of modifiers SHOULD
   treat them as though the left-hand version had been received.

                     Figure 8: Virtual keys: modifiers

   Name                        Code   Note
   ----                        ----   ----
   Hangul                      0x31   Hangul start/stop (toggle)
   Hangul Start                0x32   Hangul start
   Hangul End                  0x33   Hangul end, English start
   Hangul Hanja                0x34   Start Hangul->Hanja Conversion
   Hangul Jamo                 0x35   Hangul Jamo mode
   Hangul Romaja               0x36   Hangul Romaja mode
   Hangul Code Input           0x37   Hangul code input mode
   Hangul Jeonja               0x38   Jeonja mode
   Hangul Banja                0x39   Banja mode
   Hangul Pre-Hanja            0x3A   Pre Hanja conversion
   Hangul Post-Hanja           0x3B   Post Hanja conversion
   Hangul Single Candidate     0x3C   Single candidate
   Hangul Multiple Candidate   0x3D   Multiple candidate
   Hangul Previous Candidate   0x3E   Previous candidate
   Hangul Special              0x3F   Special symbols

                  Figure 9: Virtual keys: Hangul (Korean)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V|P|    MBZ    |     Unicode codepoint or virtual keycode      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 10: One entry in the keyboard state list

   Figure 10 illustrates the format of entries in the keyboard state
   list.  If the "V" bit is set, it indicates that the codepoint number
   indicates a virtual keycode, otherwise it indicates a Unicode
   codepoint.  If the "P" bit is set, it indicates that it is the
   version of a key from a separate keypad.  If the server does not have
   a concept of a separate keypad, or the key indicated does not appear
   on a keypad, the "P" bit MAY be ignored.

   The highest Unicode codepoint is 0x10ffff; thus, all Unicode
   characters fit comfortably in the 24-bit codepoint field of the
   keycode format.  For non-virtual, non-keypad keys, for which the V
   and P bits are both zero, the format is identical to a UTF-32BE
   encoding of the same character.  (The format can, thus, be considered
   alternatively as UTF-32BE bitwise-or'd with 0x80000000 for "V" and
   0x40000000 for "P".)
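
   The entry format of Figure 10 can be illustrated with a short sketch.
   It encodes and decodes one entry of the keyboard state list using the
   bit masks given above; the function names are hypothetical, and how
   entries are framed into a complete list is not modelled because the
   text does not specify it.

```python
import struct

V_BIT = 0x80000000      # entry carries a virtual keycode
P_BIT = 0x40000000      # key comes from a separate keypad
CODE_MASK = 0x00FFFFFF  # 24-bit codepoint or virtual keycode field


def _is_control(codepoint):
    """True for the Unicode control ranges that MUST NOT be sent."""
    return codepoint <= 0x1F or 0x7F <= codepoint <= 0x9F


def encode_key(code, virtual=False, keypad=False):
    """Encode one keyboard state entry as 4 bytes, big-endian."""
    if not 0 <= code <= CODE_MASK:
        raise ValueError("code does not fit in the 24-bit field")
    if not virtual:
        if code > 0x10FFFF:
            raise ValueError("not a Unicode codepoint")
        if _is_control(code):
            raise ValueError("control characters must be sent as virtual keys")
    entry = code | (V_BIT if virtual else 0) | (P_BIT if keypad else 0)
    return struct.pack("!I", entry)


def decode_key(data):
    """Return (code, virtual, keypad) for one 4-byte entry."""
    (entry,) = struct.unpack("!I", data[:4])
    return entry & CODE_MASK, bool(entry & V_BIT), bool(entry & P_BIT)
```

   For example, encode_key(0x08, virtual=True) represents the Backspace
   virtual key of Figure 2, whereas encode_key(0x08) is rejected because
   0x08 as a Unicode codepoint is a control character.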

6.2  Pointer Position

   The viewer indicates pointer position to the application host using
   the media type video/pointer, defined in RFC 2862, the RTP Payload
   Format for Real-Time Pointers [6].

7.  Implementation Notes

   Application hosts shouldn't blindly send every screen update they
   receive down the RTP channel.  Instead, they should monitor the state
   of their TCP transmission buffers (through mechanisms such as the
   select() system call) and only send the most recent screen data when
   there is not a backlog.  This will prevent screen latency for
   rapidly-changing images, when a viewer usually only needs to see the
   final state of the image.

   To conserve bandwidth, application hosts SHOULD use PNG's palette or
   grayscale image format wherever possible, and SHOULD use a minimal
   palette and image bit depth, subject to encoding delay constraints.
   In particular, two-color images, and one-color images drawn over the
   existing image, SHOULD use a one-bit PNG with a two-entry palette, in
   the latter case with a transparency chunk.

   In window mode, application hosts SHOULD be aware of unshared local
   windows on the host.  If an unshared window obscures a shared window,
   the application host SHOULD obscure its contents (through a mechanism
   such as transmitting a neutral color) so that the viewer experience
   reflects as closely as possible the experience on the host.
   Application hosts MAY choose to treat portions of windows obscured by
   other shared windows the same way.

8.  Open issues

   We need to determine what mechanism to recommend to secure the input
   and output streams.  The two logical possibilities would be Secure
   RTP [13] and Transport-Layer Security (TLS) [14].  Either would work;
   neither is currently specified for connection-oriented RTP (neither
   TCP/RTP/SAVP nor TCP/TLS/RTP/AVP is defined).  One or the other ought
   to be recommended to facilitate interoperability.

   It seems likely that "beep" needs to be defined specially, as an
   output type, and needs to be defined separately from other audio
   channels.  Many systems allow beeps to be rendered visually, either
   for accessibility for the deaf or because systems are being used in
   quiet environments.

   We need a name for the meta-protocol of Section 5.1.  It's also
   unclear whether it should have an application/* or video/* MIME type.

   SDP doesn't normally allow you to send traffic with different
   top-level MIME types over the same RTP channel.  Do we need to add an
   extension to work around this?  Some of the RTP payloads described in
   this document should pretty clearly be application/*, some should be
   video/*, some (beeping) audio/*, PNG is already defined as image/png,
   etc.

   All the payload types need MIME-type assignments and RTP payload
   characteristics (sample rate, use of marker bit, etc.) defined.

   Is there anything useful that can be done with the MBZ bits of the
   meta-protocol?

   What other information needs to be carried in the window state
   protocol?  One particular concern is taskbar support.  Window
   information that might be carried to support taskbars includes the
   window title, a list of minimized windows, and whether each window
   should be listed in the taskbar.  Taskbar support also requires an
   input protocol to support taskbar actions (right-clicking on a
   taskbar item): unminimize, maximize, close, etc.

   Another flag that might be carried by the window state protocol is
   whether a window is "taggable", i.e. whether it is a good candidate
   for a viewer-side marker indicating that this is a shared
   application.  Top-level windows would typically get this, while
   subsidiary windows such as dialog boxes would not.  (This is a
   feature of T.128.)

   Do we need to define what an un-drawn-upon window looks like?
T.128 seems to assume that windows are transparent until drawn to, and uses this fact: a server can define a full-screen window on top of all others, and draw directly to it. I don't think this is necessary, but it's an important consideration.

Should pointer images be window-associated? Should there be a pointer image cache? (T.128 has one; VNC doesn't.)

RFC 2862 only supports 12-bit positioning for the mouse pointer. I believe this is already too small; screens wider than 4096 pixels already exist, especially virtual desktops. Additionally, we want to support additional mouse information, most notably mousewheels. A protocol obsoleting RFC 2862 is probably in order.

Do we want to support a mechanism by which viewers can request a full screen refresh, analogous to the Full Intra-frame Request (FIR) RTCP packet of RFC 2032 [15]?

One other optimized pixel transmission operation that could be used is the tiling operation: transmit one image along with the number of times it should be repeated horizontally and vertically. Would this be worth the bandwidth/complexity tradeoff? (Note that it can be emulated without too much overhead by the copy operation.)

Is it necessary for viewers and application hosts to be able to negotiate the maximum supported size and color depth of pointers? This would presumably be a format parameter on the pointer image representation payload type.

Do we want to support multiple payloads in one RTP packet, either reusing or inspired by RFC 2198 [16]?

How should "lock" key state (caps lock, num lock, scroll lock) be represented in the keyboard input protocol? Should there be flags of some sort? Should there be separate virtual keycodes for "lock key depressed" and "lock state enabled"? Should this simply be handled on the viewer side, altering the keycodes that are sent? (This works for caps and num lock, not so well for scroll lock.)

9. Security Considerations

Both input and output data may be highly sensitive. For example, input data may contain user passwords. Thus, encryption of all user input is likely to be required. For some applications, such as sharing slides during a public lecture, confidentiality for user output may not be required. Given the broad set of applications, viewers and application hosts MUST support or be able to leverage end-to-end confidentiality and integrity protection mechanisms.

Application sharing inherently exposes the shared applications to risks from malicious participants. They may, for example, access resources beyond the application itself, e.g., by installing or running scripts. It may be difficult to constrain access to specific user data, e.g., a specific set of slides, unless the user application can be sandboxed or run in some kind of "jail", with the sandbox control outside the view of the remoting protocol.

10. IANA Considerations

TODO: MIME type definitions for everything.

11. References

11.1 Normative References

[1] Handley, M., Jacobson, V. and C. Perkins, "SDP: Session Description Protocol", draft-ietf-mmusic-sdp-new-21 (work in progress), October 2004.

[2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[3] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[4] Lazzaro, J., "Framing RTP and RTCP Packets over Connection-Oriented Transport", draft-ietf-avt-rtp-framing-contrans-03 (work in progress), July 2004.

[5] Duce, D., "Portable Network Graphics (PNG) Specification (Second Edition)", W3C REC REC-PNG-20031110, November 2003.

[6] Civanlar, M. and G. Cash, "RTP Payload Format for Real-Time Pointers", RFC 2862, June 2000.
[7] International Organization for Standardization, "Information Technology - Universal Multiple-octet coded Character Set (UCS)", ISO Standard 10646, December 2003.

11.2 Informative References

[8] International Telecommunication Union, "Data Protocols for Multimedia Conferencing", ITU-T Recommendation T.120, July 1996.

[9] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[10] Scheifler, R., "X Window System Protocol", X Consortium Standard X Version 11, Release 6.7, November 2004.

[11] Schulzrinne, H., "Sharing and Remote Access to Applications", draft-schulzrinne-mmusic-sharing-00 (work in progress), October 2004.

[12] Barnes, M. and C. Boulton, "A Framework for Centralized Conferencing", draft-barnes-xcon-framework-00 (work in progress), October 2004.

[13] Baugher, M., "The Secure Real-time Transport Protocol", draft-ietf-avt-srtp-09 (work in progress), July 2003.

[14] Lennox, J., "Connection-Oriented Media Transport over the Transport Layer Security (TLS) Protocol in the Session Description Protocol (SDP)", draft-ietf-mmusic-comedia-tls-02 (work in progress), October 2004.

[15] Turletti, T., "RTP Payload Format for H.261 Video Streams", RFC 2032, October 1996.

[16] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., Bolot, J., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, September 1997.

[17] International Telecommunication Union, "Multipoint Application Sharing", ITU-T Recommendation T.128, February 1998.

Authors' Addresses

Jonathan Lennox
Columbia University
Department of Computer Science
450 Computer Science
1214 Amsterdam Ave., M.C. 0401
New York, NY 10027
US

Phone: +1 212 939 7018
EMail: lennox@cs.columbia.edu

Henning Schulzrinne
Columbia University
Department of Computer Science
450 Computer Science
1214 Amsterdam Ave., M.C. 0401
New York, NY 10027
US

Phone: +1 212 939 7004
EMail: hgs+mmusic@cs.columbia.edu

Jason Nieh
Columbia University
Department of Computer Science
450 Computer Science
1214 Amsterdam Ave., M.C. 0401
New York, NY 10027
US

Phone: +1 212 939 7000
EMail: nieh@cs.columbia.edu

Ricardo Baratto
Columbia University
Department of Computer Science
450 Computer Science
1214 Amsterdam Ave., M.C. 0401
New York, NY 10027
US

Phone: +1 212 939 7000
EMail: ricardo@cs.columbia.edu

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.