INTERNET-DRAFT                                              Eric Edwards
draft-ietf-avt-rtp-jpeg2000-00.txt                       Satoshi Futemma
                                                        Eisaburo Itakura
                                                       Takahiro Fukuhara
                                                        Sony Corporation
                                                            May 14, 2001
Expires: November 13, 2002


            RTP Payload Format for JPEG 2000 Video Streams

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference materials or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Drafts Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes a payload format for transporting JPEG 2000
   video streams using RTP (Real-time Transport Protocol). JPEG 2000
   video streams are formed as a continuous series of JPEG 2000 still
   images. The JPEG 2000 payload format described in this document has
   three features:

   (1) Improved robustness to packet loss through intelligent
       fragmentation of JPEG 2000 packet units,
   (2) Persistence of the main header to minimize the effect of loss
       and maximize recovery,
   (3) A priority information field for scalable delivery from the
       same code stream.

   These features allow the scalability and robustness of JPEG 2000 to
   be fully exploited in streaming applications.

1. Introduction

   This document specifies payload formats for JPEG 2000 video streams
   over the Real-time Transport Protocol (RTP). JPEG 2000 is an
   ISO/IEC International Standard developed for next-generation still
   image encoding. Its basic encoding technology is described in [1].

Edwards, et al.                                                 [Page 1]

INTERNET-DRAFT    draft-ietf-avt-rtp-jpeg2000-00.txt            May 2002

   Part 3 of the JPEG 2000 standard defines Motion JPEG 2000 [2].
   However, it defines only the file format and not a transmission
   format for streaming on the Internet. For this reason, it is
   necessary to define an RTP format for JPEG 2000 video streams.

   JPEG 2000 supports many features beyond the current JPEG standard
   [3][4][5]:

   o Higher compression efficiency than JPEG with less visual loss,
     especially at extreme compression ratios.

   o A single code stream that offers both lossy and superior lossless
     compression.

   o Transmission over noisy environments. The JPEG 2000 code stream
     can be built with markers to boost error resilience and recovery.
     The code stream is very robust to bit errors because it has been
     designed to avoid catastrophic decoding failure caused by bit
     errors in transmission.

   o Progressive transmission by pixel accuracy and resolution.
     Progressive transmission that allows images to be reconstructed
     with increasing pixel accuracy or spatial resolution is essential
     for many applications. This feature allows the reconstruction of
     images with different resolutions and pixel accuracy, as needed
     or desired, for different target devices. The image architecture
     provides for the efficient delivery of image data in many
     applications such as client/server applications.

   o Random code stream access and processing. Some parts of an image
     may be more important than others. Specific regions of the code
     stream can be defined to be less distorted than other areas.
     Access to any specific area of an image is handled efficiently
     without the need to completely decompress the code stream. Simple
     image transforms (rotation, translation, filtering) can be done
     directly on the compressed code stream.

   The JPEG 2000 algorithm is briefly explained below. Fig. 1 shows a
   block diagram of the JPEG 2000 encoding method.

                                                    +-----+
                                                    | ROI |
                                                    +-----+
                                                       |
                                                       V
                   +----------+   +----------+   +------------+
                   |DC, comp. |   | Wavelet  |   |            |
   raw image  ==>  |transform-|==>|transform-|==>|Quantization|==+
                   | ation    |   | ation    |   |            |  |
                   +----------+   +----------+   +------------+  |
                                                                 |
                +-------------+   +----------+   +------------+  |
                |             |   |          |   |            |  |
   JPEG 2000 <==|Data ordering|<==|Arithmetic|<==|Coefficient |<=+
   code stream  |             |   |  coding  |   |bit modeling|
                +-------------+   +----------+   +------------+

             Fig. 1: Block diagram of the JPEG 2000 encoder

   First, the image goes through component separation: a color image
   is split into components in RGB, YCbCr, or one of various other
   color spaces. The image can also be further divided into tiles for
   processing.

   Each color component or tile is transformed into wavelet
   coefficients. The component or tile is sub-sampled into various
   levels, usually both vertically and horizontally, from the high
   frequencies (which contain the sharp details) to the low
   frequencies (which contain the flat areas). These wavelet
   coefficients are grouped by frequency into subbands. Subband HH
   holds the high-frequency information both horizontally and
   vertically. HL (high frequencies horizontally, low frequencies
   vertically) and LH (low frequencies horizontally, high frequencies
   vertically) hold the middle frequencies. The lowest frequencies and
   most important coefficients are in the LL subband (low frequencies
   horizontally and vertically), as they contain all the broad
   details.

   Quantization is performed on the coefficients within each subband.
   Each wavelet coefficient is divided by the quantization step size
   and the result is truncated. This can be repeated iteratively to
   reach a target bit rate with high accuracy.

   After quantization, code blocks are formed within the precincts
   within the tiles. Precincts are a finer partition than tiles, and
   code blocks are the smallest partition of the image data.
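   The quantization step described above (divide each coefficient by
   the step size, then truncate) can be sketched as follows. This is
   only an illustration of that one sentence: the function name and
   the sample values are hypothetical, and rate control, per-subband
   step sizes, and ROI handling are deliberately left out.

```python
import math


def quantize(coefficient: float, step_size: float) -> int:
    """Divide a wavelet coefficient by the step size and truncate.

    Truncation is toward zero, so small coefficients of either sign
    collapse to index 0.
    """
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    return math.trunc(coefficient / step_size)


# Hypothetical coefficients from one subband, quantized with one step.
indices = [quantize(c, 2.0) for c in (10.7, -10.7, 1.9)]
```

   A larger step size maps more coefficients to the same index, which
   lowers the bit rate at the cost of accuracy.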
   Entropy coding is performed within each code block, which is
   arithmetically encoded bit plane by bit plane. There are three
   passes for each code block: the significance propagation pass, the
   magnitude refinement pass, and the cleanup pass.

   After the coefficients of all code blocks have been coded into a
   short bit stream, a header is added, turning it into a packet. The
   header has all the information needed to decompress the packet into
   code blocks. A group of packets is called a layer.

   For additional features in transmission, the formed packets must be
   re-ordered. The standard has four ways to transmit and decode a
   compressed image: by resolution, quality, position, or component.
   As there are many markers built into the JPEG 2000 code stream, a
   parser can go through the bit stream and determine the proper order
   of packets to transmit and decode.

   This is only meant to serve as an introduction to JPEG 2000 and to
   aid in understanding the rest of this document. Further details of
   the encoder can be found in various texts on JPEG 2000 [1].

   To decompress a JPEG 2000 code stream, one would follow the
   encoding steps in reverse order, minus the quantization step. It is
   outside the scope of this document to describe this procedure in
   detail. Please refer to various JPEG 2000 texts for details [1].

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC2119 [7].

2. JPEG 2000 Video Features

   JPEG 2000 video streams are formed as a continuous series of JPEG
   2000 still images, so the above features of JPEG 2000 can be used
   effectively. A JPEG 2000 video stream has the following merits:

   o SNR is improved at a low bit rate. The format can be used as a
     video stream format at a low bit rate.

   o This is a full intra format: each frame is compressed
     independently, which gives a low encoding and decoding delay.
     This is suitable for interactive video communication. Even if a
     packet loss occurs in any part of a frame, errors are not
     propagated to subsequent frames. Moreover, each frame can be
     handled independently, which facilitates video editing.

   o JPEG 2000 has flexible and accurate rate control. This is
     suitable for traffic control and congestion control in network
     transmission.

   o JPEG 2000 can provide its own code stream error resilience
     markers to aid in code stream recovery. An encoder can insert a
     resynchronization marker at the beginning of a JPEG 2000 packet
     and a segmentation symbol at the end of the bit plane to aid in
     recovery within a frame.

3. Design of RTP payload format for JPEG 2000 video streams

   To provide a payload format that exploits the JPEG 2000 video
   stream features described in the previous section, the following
   must be taken into consideration:

   - Provisions for packet loss

     On the Internet, 5% packet loss is common and this percentage may
     sometimes reach 20% or more. To split JPEG 2000 video streams
     into RTP packets, efficient packetization of the code stream is
     required to minimize the decoding failures caused by missing
     code-blocks in error-prone environments. If the main header is
     lost in transmission, the ability to decode is lost. Accordingly,
     a mechanism that compensates for the loss of the main header as
     much as possible is required.

   - A packetizing scheme that exploits JPEG 2000 functionality

     The packetizing scheme should allow an image to be transmitted
     progressively and reconstructed progressively by the receiver
     using JPEG 2000 functionality. It should maximize performance
     over various network conditions and for receiving platforms with
     various amounts of computing power.

4. Proposal for an RTP payload format for JPEG 2000 video streams

4.1 RTP fixed header usage

   For each RTP packet, the RTP fixed header is followed by the JPEG
   2000 payload header, which is followed by the JPEG 2000 code
   stream. The RTP header fields that have a meaning specific to JPEG
   2000 video are described as follows:

   Payload type (PT): The payload type is dynamically assigned by
      means outside the scope of this document. A payload type in the
      dynamic range SHALL be chosen by means of an out-of-band
      signaling protocol (e.g., RTSP, SIP, etc.).

   Marker bit (M): The marker bit of the RTP fixed header MUST be set
      to 1 on the last RTP packet of a video frame; otherwise, it must
      be 0. When transmission is performed over multiple RTP sessions,
      the bit is set in the last packet of the frame in each session.

   Timestamp: The RTP timestamp is in units of 90 kHz. The same
      timestamp must appear in each fragment of a given frame. The
      initial value of the timestamp is random to make known-plaintext
      attacks on encryption more difficult, even if the source itself
      does not encrypt, as the packets may flow through a translator
      that does.

4.2 RTP Payload header format

   The RTP payload header format for a JPEG 2000 video stream is as
   follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |E|X|M|T|L|mh_id|    priority   |            tile_id            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        fragment offset                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Fig. 2: RTP payload header format for JPEG 2000

   E : 1 bit

      Enable bit flag. If this bit is set to 1, the packet uses the
      "intelligent packetization" described in Section 5.2. If the E
      bit is 0, the packet uses "non-intelligent packetization" and a
      receiver MUST ignore all payload header information other than
      the extension bit flag and the fragment offset.

   X : 1 bit

      Extension bit flag. This bit MUST be set to 1 when a JPEG 2000
      optional payload header follows this JPEG 2000 payload header;
      otherwise it MUST be set to 0. The details of optional payload
      headers are described in Section 8 of this document.

   M : 1 bit

      Main header bit flag. If the JPEG 2000 main header is included
      in the payload, this field MUST be set to 1; otherwise it must
      be 0. This flag is valid only when the E bit is 1. If the E bit
      is 0, then this flag SHOULD be zero.

   T : 1 bit

      Tile header bit flag. If a JPEG 2000 tile header is included in
      the payload, this field MUST be set to 1; otherwise it must be
      0. This flag is valid only when the E bit is 1. If the E bit is
      0, then this flag SHOULD be zero.

   L : 1 bit

      Last fragment flag. If the last part of the main header or tile
      header is included (either whole or fragmented), this field MUST
      be set to 1; otherwise it must be 0. Please see Section 5 of
      this document for more details. This flag is valid only when the
      E bit is 1. If the E bit is 0, then this flag SHOULD be zero.

   mh_id : 3 bits

      Main header identification value. This is used for JPEG 2000
      main header recovery. The same mh_id is used as long as the
      coding parameters described in the main header remain unchanged.
      The mh_id starts at the value 1 when the first main header is
      transmitted. The mh_id value must increase by 1 every time a new
      main header is transmitted. Because the field is 3 bits wide,
      the value after 7 must roll over and start at 1 again. Usage of
      this field is described in Section 7 of this document. This
      field is valid only when the E bit is 1. If the E bit is 0, then
      this field SHOULD be zero.

   priority : 8 bits

      The priority field indicates the importance of the JPEG 2000
      packet included in the payload. Typically, a higher priority is
      given to the RTP packets that contain the JPEG 2000 packets of
      the lower layers and the lower subbands. This field is valid
      when the E bit is 1.
      If the E bit is 0, then this field SHOULD be zero.

   tile_id : 16 bits

      Tile number identification. This field is valid when JPEG 2000
      packet(s) or a tile-part header is included in the payload. JPEG
      2000 packets belonging to different tile numbers MUST NOT be
      packed into the same RTP packet. When the RTP packet contains
      only the JPEG 2000 main header, this field MUST be zero.

   fragment offset : 32 bits

      This value must be set to the byte offset, within the JPEG 2000
      data stream, of this RTP packet's contents. Because JPEG 2000
      frames are typically larger than the underlying network's
      maximum transfer unit (MTU), frames might be fragmented into
      several packets. The fragment offset is the data offset in bytes
      of the current packet from the start of the JPEG 2000 code
      stream. This field helps the receiver to reassemble the JPEG
      2000 code stream.

      To perform scalable video delivery by using multiple RTP
      sessions, the fragment offset is set to the offset from the
      first byte of the same frame. Accordingly, in scalable video
      delivery using multiple RTP sessions, the fragment offset may
      not start at 0 in some RTP sessions even if the packet is the
      first one of the frame.

5. Fragmentation of JPEG 2000 code stream and Type Field

   Fig. 3 shows the construction of the JPEG 2000 code stream. The
   JPEG 2000 code stream consists of a main header beginning with the
   SOC marker, one or more tiles (only one tile if there is no tile
   division), and the EOC marker to indicate the end of the code
   stream. Each tile consists of a tile-part header that starts with
   the SOT marker and ends with the SOD marker, and a bit stream (a
   series of JPEG 2000 packets).

          +-- +------------+
    Main  |   |    SOC     |  Required as the first marker.
    header|   +------------+
          |   |    main    |  Main header marker segments
          +-- +------------+
          |   |    SOT     |  Required at the beginning of each
    Tile- |   +------------+  tile-part header.
    part  |   |   T0,TP0   |  Tile 0, tile-part 0 header marker
    header|   +------------+  segments
          |   |    SOD     |  Required at the end of each tile-part
          +-- +------------+  header
              | bit stream |  Tile-part bit stream.
          +-- +------------+  Might include SOP and EPH
          |   |    SOT     |
    Tile- |   +------------+
    part  |   |   T1,TP0   |
    header|   +------------+
          |   |    SOD     |
          +-- +------------+
              | bit stream |
              +------------+
              |    EOC     |  Required as the last marker in the
              +------------+  code stream

           Fig. 3: Construction of the JPEG 2000 code stream

   The JPEG 2000 code stream consists of a main header, tile-part
   headers, and JPEG 2000 packets. When the JPEG 2000 code stream is
   packetized, these construction units of the code stream must be
   maintained. Each RTP packet will consist of a main header, a
   tile-part header, or JPEG 2000 packet(s).

   If the sender does not understand the JPEG 2000 code stream (i.e.
   the sender is not intelligent), it should pack the JPEG 2000 code
   stream into RTP packets of the largest data size the MTU allows.
   The sender must segment the JPEG 2000 code stream at arbitrary
   lengths into RTP-sized packets for the receiver. In this case, the
   E bit MUST be set to 0.

   If the sender understands JPEG 2000 code streams and can read the
   JPEG 2000 packets from the code stream (i.e. the sender is
   intelligent), JPEG 2000 packets should be packed into RTP payloads
   in the following way:

   1. If the JPEG 2000 packets are smaller than the MTU size, the
      sender should put as many whole JPEG 2000 packets as possible
      into a single RTP packet. That is, the JPEG 2000 payload data
      should begin with one of the SOC marker, the SOT marker, or the
      SOP marker (if it exists in the JPEG 2000 data stream).

   2. If the JPEG 2000 packets are larger than the MTU size, the
      sender should segment the JPEG 2000 packets at the largest
      possible MTU size, but JPEG 2000 packets must not overlap.

   The sender can packetize either intelligently or non-intelligently.
   If the receiver is not intelligent but the sender is, then the
   sender MUST packetize non-intelligently (i.e. with the E bit set to
   0) to compensate for the receiver.

   Regardless of the sender's capabilities, the receiver MUST be able
   to handle RTP packets of any size. If the sender does not fragment,
   any packet larger than the MTU size might be fragmented by the IP
   layer into multiple IP packets smaller than the MTU size. If one
   fragmented IP packet is lost during transmission, it is recognized
   as a loss of the whole RTP packet because the receiving host might
   not be able to reassemble the RTP packet.

   The segments of the JPEG 2000 code stream placed into RTP packets
   must fit within the RTP payload size. For intelligent
   packetization, all packets SHOULD be 32-bit aligned. If padding
   bits are required, then the padding bits MUST come at the end of
   the payload. Any required padding bits MUST NOT appear between the
   header and the payload or at the beginning.

   In the following, all the possible packetization cases are
   described with diagrams. For each case, the value of the M, T, and
   L flags from Fig. 2 is also indicated.

5.1 Separation at arbitrary lengths

   In this case, a JPEG 2000 code stream is split into several
   fragments at arbitrary byte positions (Fig. 4). The MTL flags of
   the RTP packet are set to 0 0 0, respectively. The E bit MUST be
   set to 0 for this packetization type.

   +---+---+---+----------------------+
   |RTP|PL |SOC| jpeg 2000 codestream |   E M T L
   |hdr|hdr|   | fragment (1)         |   0 0 0 0
   +---+---+---+----------------------+

   +---+---+--------------------------+
   |RTP|PL | jpeg 2000 codestream     |   E M T L
   |hdr|hdr| fragment (2)             |   0 0 0 0
   +---+---+--------------------------+
                   ...
   +---+---+----------------------+---+
   |RTP|PL | jpeg 2000 codestream |EOC|   E M T L
   |hdr|hdr| fragment (N)         |   |   0 0 0 0
   +---+---+----------------------+---+
   * PL hdr = payload header

                Fig. 4: Arbitrary length fragmentation

   The E (Enable) bit flag in the payload header MUST be 0 for this
   packetization type.
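
   The arbitrary-length fragmentation of Fig. 4 can be sketched as
   follows. This is a minimal illustration, not a normative
   implementation: the function name and the max_payload parameter are
   hypothetical, the 8-byte payload header layout follows Fig. 2 with
   network byte order assumed, and the RTP fixed header is omitted.

```python
import struct
from typing import Iterator

PAYLOAD_HEADER_LEN = 8  # two 32-bit words, as drawn in Fig. 2


def fragment_codestream(codestream: bytes,
                        max_payload: int) -> Iterator[bytes]:
    """Yield payload header + code stream fragment for each RTP packet.

    max_payload (hypothetical parameter) is the room left in one RTP
    packet after the RTP fixed header.
    """
    chunk = max_payload - PAYLOAD_HEADER_LEN
    if chunk <= 0:
        raise ValueError("max_payload leaves no room for data")
    for offset in range(0, len(codestream), chunk):
        # E = 0 and X = 0; M, T, L, mh_id, priority and tile_id are
        # all zero. Only the fragment offset carries information.
        header = struct.pack("!BBHI", 0, 0, 0, offset)
        yield header + codestream[offset:offset + chunk]


# Ten bytes of stand-in data, four data bytes per packet.
packets = list(fragment_codestream(bytes(range(10)), 12))
```

   The receiver can rebuild the code stream by placing each fragment
   at its fragment offset, without parsing any JPEG 2000 markers.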
   All other fields in the payload header, except the X bit and the
   fragment offset field, must be 0, and the receiver must ignore any
   other values when the enable bit is 0.

   This RTP packetization scheme is not recommended from the
   standpoint of error resilience. It is desirable to use it only in
   the limited environments listed below:

   - The sender finds it difficult to distinguish the main header,
     tile header, and JPEG 2000 packets from one another. There is no
     SOP marker in the JPEG 2000 code stream. The sender is not
     intelligent.

   - The network environment is error free.

   - The JPEG 2000 error resilience markers (TLM, PLM, PLT, PPM, and
     PPT markers) are present in the code stream. Error resilience
     will then be handled outside of RTP. Its description is not
     within the scope of this document. Using these markers may
     improve error resilience and recovery. Producing JPEG 2000 bit
     streams with these markers is highly recommended in all cases.

5.2 General JPEG 2000 RTP packet types

   For all of the following packetization types, the E bit MUST be set
   to 1.

   (1) The JPEG 2000 main header (SOC marker) must come first, just
       after the RTP payload header. The MTL flags of an RTP packet
       that contains the whole main header (not fragmented) must be
       1 0 1, respectively. Such a packet contains the main header and
       also has the end of the header in the same packet.

   (1-a) The RTP packet only contains the complete main header.

   +---+---+------+
   |RTP|PL |Main  |   M T L
   |hdr|hdr|header|   1 0 1
   +---+---+------+
   * PL hdr = payload header

                     Fig. 5: Main header packet

   (1-b) The main header and the first tile-part header are packed
         into one RTP packet.

   +---+---+------+---------+
   |RTP|PL |Main  |Tile-part|   M T L
   |hdr|hdr|header|header   |   1 1 1
   +---+---+------+---------+

             Fig. 6: Main header and tile header packet

   (1-c) The main header, the first tile-part header, and whole JPEG
         2000 packet(s) are packed into one RTP packet.

   +---+---+------+---------+---------+-----+---------+
   |RTP|PL |Main  |Tile-part|jpeg 2000| ... |jpeg 2000|   M T L
   |hdr|hdr|header|header   |packet   |     |packet   |   1 1 1
   +---+---+------+---------+---------+-----+---------+

      Fig. 7: Main header, tile header, JPEG 2000 packets packet

   (1-d) The main header is split into several RTP packets.

         If the main header is larger than one RTP packet, then it MAY
         be split into several RTP packets. In this case, each RTP
         packet must contain only a piece of the main header. The MTL
         value of these RTP packets must be 1 0 0, respectively. For
         the last header fragment, the MTL value must be 1 0 1. The
         first main header fragment MUST have a fragment offset value
         of 0. The subsequent fragments MUST NOT have a fragment
         offset of 0.

   +---+---+--------------+
   |RTP|PL |Main Header(1)|   M T L
   |hdr|hdr|              |   1 0 0
   +---+---+--------------+

   +---+---+--------------+
   |RTP|PL |Main Header(2)|   M T L
   |hdr|hdr|              |   1 0 0
   +---+---+--------------+
             ...
   +---+---+--------------+
   |RTP|PL |Main Header(N)|   M T L
   |hdr|hdr|              |   1 0 1
   +---+---+--------------+

               Fig. 8: Main header fragmented packets

   (Note) When the main header is split into multiple RTP packets, the
          first tile-part header must not be included in the RTP
          packet containing the last fragment, as shown below:

   +---+---+--------------+---------+
   |RTP|PL |Main Header(N)|Tile-part|   This packetization
   |hdr|hdr|              |header   |   MUST NOT occur.
   +---+---+--------------+---------+

   Fig. 9: MUST NOT occur - Last main header fragment and tile part
           header

   (2) Tile-part headers (SOT marker) should come first in the RTP
       payload, except for the first tile-part header just after the
       main header. The first tile-part header may either be packed
       with the main header or be placed in a separate RTP packet. The
       MTL value of the RTP packet must be 0 1 1, respectively.

   (2-a) The RTP packet only contains the complete tile-part header.

   +---+---+----------+
   |RTP|PL |Tile-part |   M T L
   |hdr|hdr|Header    |   0 1 1
   +---+---+----------+

                  Fig. 10: Tile part header packet

   (2-b) The tile-part header and JPEG 2000 packet(s) are packed into
         one RTP packet.

   +---+---+----------+---------+-----+---------+
   |RTP|PL |Tile-part |jpeg 2000| ... |jpeg 2000|   M T L
   |hdr|hdr|Header    |packet   |     |         |   0 1 1
   +---+---+----------+---------+-----+---------+

        Fig. 11: Tile part header and JPEG 2000 packets packet

   (2-c) The tile-part header is split into several RTP packets.

         If the tile-part header is larger than one RTP packet, it may
         be split into several RTP packets. In this case, each RTP
         packet contains only a piece of the tile-part header. The RTP
         packets which contain the first piece of the tile-part header
         must have MTL values set to 0 1 0, respectively, and the
         packet with the last fragment must have MTL values set to
         0 1 1, respectively.

   +---+---+-------------------+
   |RTP|PL |Tile-part header   |   M T L
   |hdr|hdr|fragment(1)        |   0 1 0
   +---+---+-------------------+

   +---+---+-------------------+
   |RTP|PL |Tile-part header   |   M T L
   |hdr|hdr|fragment(2)        |   0 1 0
   +---+---+-------------------+
               ...
   +---+---+-------------------+
   |RTP|PL |Tile-part header   |   M T L
   |hdr|hdr|fragment(N)        |   0 1 1
   +---+---+-------------------+

               Fig. 12: Tile header fragmented packets

   (Note) When the tile-part header is split into multiple RTP
          packets, a JPEG 2000 packet must not be included in the RTP
          packet containing the last fragment.

   +---+---+-------------------+---------+
   |RTP|PL |Tile-part header   |jpeg 2000|   This packetization
   |hdr|hdr|fragment(N)        |packet   |   MUST NOT occur.
   +---+---+-------------------+---------+

   Fig. 13: MUST NOT occur - Last tile header packet with JPEG 2000
            packet

   Note that tile_id MUST change accordingly when a new tile header is
   transferred. Tile_id MUST NOT be used when the payload consists
   only of a main header. In all other cases, tile_id MUST be set to a
   valid value.

   (3) A JPEG 2000 packet may be packed by itself, except for JPEG
       2000 packets just after a complete tile-part header. Several
       JPEG 2000 packets may also be packed into one RTP packet. If
       the SOP (Start of Packet) marker is used for error resilience,
       the SOP marker MUST be placed at the beginning of the RTP
       payload. The MTL value of an RTP packet which contains only
       JPEG 2000 packet(s) must be set to 0 0 0, respectively.

   (3-a) One or more JPEG 2000 packets are packed into one RTP packet.

   +---+---+---------+-----+---------+
   |RTP|PL |jpeg 2000| ... |jpeg 2000|   M T L
   |hdr|hdr|packet   |     |packet   |   0 0 0
   +---+---+---------+-----+---------+

                     Fig. 14: JPEG 2000 packets

   (3-b) The JPEG 2000 packet is split into several RTP packets.

         A JPEG 2000 packet that is too large to pack into one RTP
         packet may be split into two or more RTP packets. In this
         case, each RTP packet contains only a piece of the JPEG 2000
         packet. The MTL value in the payload header MUST be set to
         0 0 0, and the E bit to 1, for all the fragments.

   +---+---+-------------------+
   |RTP|PL |jpeg 2000 packet   |   M T L
   |hdr|hdr|fragment(1)        |   0 0 0
   +---+---+-------------------+

   +---+---+-------------------+
   |RTP|PL |jpeg 2000 packet   |   M T L
   |hdr|hdr|fragment(2)        |   0 0 0
   +---+---+-------------------+
               ...
   +---+---+-------------------+
   |RTP|PL |jpeg 2000 packet   |   M T L
   |hdr|hdr|fragment(N)        |   0 0 0
   +---+---+-------------------+

             Fig. 15: Fragmented JPEG 2000 packets packet

   (Note) When a JPEG 2000 packet is split into multiple RTP packets,
          another JPEG 2000 packet must not be included in the RTP
          packet containing the last fragment.

   +---+---+-------------------+---------+
   |RTP|PL |jpeg 2000 packet   |jpeg 2000|   This packetization
   |hdr|hdr|fragment(N)        |packet   |   MUST NOT occur.
   +---+---+-------------------+---------+

   Fig. 16: MUST NOT occur - JPEG 2000 packet fragments overlapping

6. Scalable Delivery and Priority field

   The JPEG 2000 code stream has rich functionality built into it so
   that decoders can easily handle scalable delivery or progressive
   transmission. Progressive transmission that allows images to be
   reconstructed with increasing pixel accuracy or spatial resolution
   is essential for many applications. This feature allows the
   reconstruction of images with different resolutions and pixel
   accuracy, as needed or desired, for different target devices. The
   largest image source devices can provide a code stream that is
   easily processed for the smallest image display device.

   Each JPEG 2000 packet contains all compressed image data from a
   specific layer, a specific component, a specific resolution level,
   and a specific precinct. The order in which these packets are found
   in the code stream is called the "progression order". The ordering
   of the packets can progress along four axes: layer, component,
   resolution level, and precinct.

   Providing a priority field that shows the importance of the data
   contained in a given RTP packet makes it possible to exploit the
   progressive and scalable functions of JPEG 2000.

   In resolution progression order, the higher decomposition level is
   more important. The priority field of the RTP packet that contains
   the higher decomposition level is set to the higher priority. When
   transmitting in spatial resolution order, the LL0 component data
   should be set to the highest priority.

6.1 Priority mapping table

   For the progression order in use, the priority value to be given to
   each JPEG 2000 packet is defined by the priority-mapping table. The
   higher the importance, the lower the priority value.
   The priority-mapping table may define the priority values for
   spatial resolution, layer, color component, or precinct level.

   This priority table is sent from the sender to a receiver through
   another protocol (RTSP, SIP, etc.) outside of RTP. To change the
   priority-mapping table, a new priority-mapping table must be sent
   from the sender to the receiver as needed. If there is no
   priority-mapping table, the priority value of the RTP packet must
   be set to 0xFF.

   For example, the priority table can be sent from the sender to the
   receiver, but the receiver may decide for itself which priority
   levels of RTP packets to receive, using the priority table as a
   guideline.

   The priority value of 1 has the highest priority in the
   priority-mapping table. As the priority value increases, the
   priority becomes lower. If transmission is performed without
   attaching any priority-mapping table, 0xFF (255) must be set in the
   priority field.

   For RTP packets that consist only of a whole or fragmented main or
   tile header and contain no JPEG 2000 packets, the sender must set
   priority 0 if a priority-mapping table is used. If a
   priority-mapping table is not used, the priority value must be 0xFF
   for the same RTP packets.

   The sender may transmit each priority using separate RTP sessions
   defined by the priority value. For example, different priorities
   may be allocated to different multicast groups. The sender may also
   transmit RTP packets of all priority values using a single RTP
   session.

   When multiple JPEG 2000 packets are included in a single RTP
   packet, the sender must set the priority field to the highest of
   the priorities of all those packets.

   In the following, examples of priority mapping tables are shown.
   The component-based priority should be used when there is a higher
   priority component, like Y in YCbCr components.

   A simple example of the usage of priority is a generalized
   broadcast of JPEG 2000 bitstreams.
   With priority values set according to resolution, many devices can
   receive the broadcast, and each can choose only the priority level
   appropriate for itself and decode the minimum number of packets for
   the maximum resolution on that device. This example of priority is
   simple, but shall be developed further.

6.1.1 Layer based priority

   This is an example of a priority mapping table for the progression
   order in which SNR is improved progressively. The JPEG 2000 packet
   of layer 0 and resolution 0 has the highest priority. The JPEG 2000
   packets with layer 0 and resolution 1 or more are next in priority.
   As the layer number increases, the priority becomes lower. (">N" in
   the table means the progression level is higher than N. In Table 1,
   ">0" means the resolution level is higher than 0.)

         L    R    C    P   |  priority
        --------------------+------------
         0    0    -    -   |     1
         0   >0    -    -   |     2
         1    -    -    -   |     3
               ....         |    ..

             Table 1: Layer based priority table mapping

6.1.2 Resolution level based priority

   This is an example of a priority mapping table for the progression
   order in which the spatial resolution is increased. The JPEG 2000
   packet with layer 0 and resolution 0 has the highest priority, and
   the JPEG 2000 packets with layer 1 or more and resolution 0 are
   next in priority. As the resolution level increases, the priority
   becomes lower.

         L    R    C    P   |  priority
        --------------------+------------
         0    0    -    -   |     1
        >0    0    -    -   |     2
         -    1    -    -   |     3
               ....         |    ..

        Table 2: Resolution level based priority table mapping

6.1.3 Component based priority

   The priority-mapping table for component progression is used only
   when there is a priority order among components. This example is
   for YCbCr components. The JPEG 2000 packet with layer 0, resolution
   0, and component 0 has the highest priority. The JPEG 2000 packets
   with layer 1 or more, resolution 0, and component 0 are next in
   priority. The JPEG 2000 packets with resolution 0 and component 0
   are the third in priority.
   As the resolution increases, the priority becomes lower.

       L   R   C   P | priority
      ---------------+----------
       0   0   0   - |    1
      >0  >0   0   - |    2
       -   -  >0   - |    3
           ....      |    ..

      Table 3: Component based priority table mapping

6.2 Sender's Actions

   The sender assigns priority values in accordance with the
   priority-mapping table. The receiver may use the priority field, but
   it is free to apply any processing method of its own. If the
   priority-mapping table is not used, 255 (0xFF) must be set in the
   priority field.

6.3 Receiver's Action

   Progressive transmission that allows images to be reconstructed with
   increasing pixel accuracy or spatial resolution is essential for many
   applications. This feature allows the reconstruction of images with
   different resolutions and pixel accuracy, as needed or desired, for
   different target devices. The image architecture provides for the
   efficient delivery of image data in many applications such as
   client/server applications.

   The receiver should decode packets above a certain priority to obtain
   maximum performance on its platform. The receiver can determine on
   its own (with or without the mapping table or other variables) which
   priority levels of RTP packets it should decode.

   For example, when a less powerful CPU is used or the terminal has
   only a low-resolution display, decoding only RTP packets whose
   priority value is below a certain threshold gives optimal
   performance.

   If a high-priority RTP packet is lost, the affected frame(s) can be
   skipped, since skipping may cause no significant perceived visual
   loss even where decoding could still be performed successfully.

   When an uninterpretable or unexpected priority value is received, the
   receiver must treat the packet as having no priority (i.e., priority
   = 0xFF).

7. JPEG 2000 main header compensation

   The JPEG 2000 image main header describes the encode parameters, and
   the decoder decodes the code stream by using the parameters described
   in the main header. If the RTP packet that contains the main header
   is lost, the corresponding JPEG 2000 code stream cannot and should
   not be decoded. Even in the extremely rare case in which the main
   header is dropped and all the remaining JPEG 2000 packets are
   received successfully, the receiver cannot decode the frame without
   the main header information.

   Even when the main header is lost, it can be recovered to a certain
   extent with the following simple procedure.

   In JPEG 2000 video, the encode parameters commonly do not vary
   greatly between successive frames. Even if the RTP packet that
   includes the main header of a frame is dropped, decoding may be
   performed by using the main header of the previous frame, provided
   that the previous frame was encoded with the same encode parameters.

   The mh_id field of the payload header is used to recognize whether
   the encoding parameters of the main header are the same as those of
   the previous frame. All RTP packets of the same frame carry the same
   mh_id value. mh_id values and sets of encode parameters are not
   associated one-to-one; mh_id only indicates whether the encode
   parameters are the same as those of the previous frame.

   The mh_id field value SHOULD be saved from previous frames so that it
   can be used to recover the current frame's main header if that header
   is lost. If the mh_id of the current frame has the same value as the
   mh_id of the previous frame, the previous frame's main header SHOULD
   be used to decode the current frame when its header is lost. The
   sender MUST increment mh_id when parameters in the header change and
   send a new main header accordingly.
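
   As a minimal sketch of the compensation described in this section,
   the bookkeeping might look like the Python below. It is illustrative
   only and not part of the payload format: the names MainHeaderCache,
   on_main_header, recover, and next_mh_id are hypothetical, and the
   handling of mh_id = 0 and the wrap from 7 back to 1 follow Sections
   7.1 and 7.2.

```python
from typing import Optional


def next_mh_id(mh_id: int) -> int:
    """Sender side: value to use after the encode parameters change.

    Assumes the 1..7 cycle of Section 7.1 (0 is reserved for
    "do not save the main header").
    """
    return mh_id % 7 + 1


class MainHeaderCache:
    """Hypothetical receiver-side store for the last good main header."""

    def __init__(self) -> None:
        self._mh_id: Optional[int] = None
        self._header: Optional[bytes] = None

    def on_main_header(self, mh_id: int, header: bytes) -> None:
        # Keep only the most recent correctly received main header.
        # A header sent with mh_id == 0 is never saved.
        if mh_id == 0:
            return
        self._mh_id = mh_id
        self._header = header

    def recover(self, current_mh_id: int) -> Optional[bytes]:
        # Called when the main header of the current frame was lost.
        # The saved header is reused only if the mh_id values match.
        if current_mh_id == 0 or current_mh_id != self._mh_id:
            return None
        return self._header
```

   A receiver would call on_main_header() for every complete main
   header and recover() with the mh_id taken from any RTP packet of a
   frame whose main header is missing; a result of None means the frame
   cannot be decoded by this method.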
   The receiver MAY use the mh_id field and MAY retain the header for
   such compensation.

7.1 Sender processing

   The sender must transmit RTP packets with the same mh_id value unless
   the encode parameters differ from those of the previous frame. The
   encode parameters are the fixed information marker segment (SIZ
   marker) and the functional marker segments (COD, COC, RGN, QCD, QCC,
   and POC) specified in JPEG 2000 Part 1 Annex A [1]. If the encode
   parameters have changed, the sender MUST increment the mh_id value by
   one. The initial mh_id value should be 1. When the mh_id value
   exceeds 7, it MUST return to 1.

   If the mh_id field is set to 0, the receiver MUST NOT save the main
   header and MUST NOT compensate for lost headers using the above
   method.

7.2 Receiver processing

   When the receiver has received the main header correctly, the RTP
   sequence number, the mh_id, and the main header should be saved,
   except when the mh_id value is 0. Only the last main header that was
   received correctly SHOULD be saved. That is, if a main header has
   already been saved, it is deleted and the new main header is saved.

   When the main header is not received, the receiver compares the
   current mh_id value (which can be learned by receiving at least one
   RTP packet of the frame) with the saved mh_id value. When the values
   are the same, decoding may be performed by using the saved main
   header.

   Knowing whether the main header has been lost may be difficult,
   especially when the main header is fragmented. In all cases, the
   main header starts with fragment offset = 0. When the main header is
   fragmented, only its first fragment has fragment offset = 0.

8. Optional Payload Header

   When the extension bit of the JPEG 2000 payload header is 1, an
   optional payload header follows the payload header. The JPEG 2000
   video stream payload comes after the optional payload header.
   Fig. 17 shows the general format of the optional payload header.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   optype    |X|            length             |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                  option specific format .....                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 17: JPEG 2000 video stream optional payload header generic
            format

   optype : 7 bits
      Describes the optional payload header type. An optional payload
      header whose optype value is not specified within this document
      MUST be ignored.

   X : 1 bit
      Further extension bit. This must be set to 1 if another optional
      payload header follows this optional payload header; otherwise it
      must be set to 0. When the extension bit of the optional header
      is 1, another optional payload header MUST come immediately after
      this optional payload header.

   length : 16 bits
      This value must be the length of the optional header in bytes.

   The receiver shall process the optional header when the extension bit
   of the JPEG 2000 payload header is 1.

   The optype value shall be set as follows:

      +--------------------+---------------------------------+
      | Optype value range | Defined in                      |
      +--------------------+---------------------------------+
      | 0                  | Not allowed                     |
      | 1 - 63             | In this specification           |
      | 64 - 127           | Free for application definition |
      +--------------------+---------------------------------+

      Table 4: Optype value definition range

8.1 Marker Segment Optional Header

   The marker segment optional header allows changes to almost any
   property carried in the functional markers of the JPEG 2000 main or
   tile header (SIZ, COD, COC, RGN, QCD, QCC, POC, etc.). As an
   optional header, it can be used to duplicate critical data from the
   main or tile header redundantly with each packet.
   At the same time, it makes small changes to a larger header simple.
   The format of this optional header is shown in Fig. 18.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   optype    |X|            length             |F|   JP2code   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      marker segment data                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 18: Marker segment optional header format

   optype : 7 bits
      Option type value. This value MUST be 1 for this optional header.

   X : 1 bit
      Extension bit. This signifies whether another optional header
      follows this one. If there is another, the X bit MUST be set to
      1; otherwise it must be 0. For multiple changes to the header,
      chaining these headers together is recommended.

   length : 16 bits
      Length value. The length of this optional header should be the
      length of the corresponding JPEG 2000 functional marker segment
      minus 1 (i.e., Lxxx - 1). See Section 8.1.1 and Section 8.1.2 for
      specific examples.

   F : 1 bit
      Functional bit. Indicates whether the optional header makes a
      change in the main header or in a tile header: F = 0 for a tile
      header and F = 1 for the main header.

   JP2code : 7 bits
      JPEG 2000 functional code value. This value contains the lower 7
      bits of the original JPEG 2000 functional marker code (e.g., COD
      marker = 0xFF52, lower 7 bits = 0x52 --> 0b1010010).

   marker segment data : variable length
      The data in this area MUST be the same as the corresponding JPEG
      2000 marker segment data specified in Annex A of [1], but not
      including the length field of the marker segment.

   A limitation of this optional header is that the functional markers
   it carries MUST be present in the original main or tile header.
   Markers other than the ones in the main or tile headers MUST NOT be
   present in this header.
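
   The field layout above can be sketched in Python as follows. This is
   a hedged illustration rather than a reference implementation: the
   names MarkerSegmentOption, pack_marker_option, and
   parse_marker_option are hypothetical, and the sketch assumes that
   the length field counts the bytes that follow it (the F/JP2code byte
   plus the marker segment data), which is one reading of "Lxxx - 1"
   given that Lxxx itself includes its own two bytes.

```python
import struct
from typing import NamedTuple

OPTYPE_MARKER_SEGMENT = 1  # optype value for this optional header


class MarkerSegmentOption(NamedTuple):
    more: bool         # X bit: another optional header follows
    main_header: bool  # F bit: True = main header, False = tile header
    jp2code: int       # lower 7 bits of the marker code
    data: bytes        # marker segment data, without the Lxxx field


def pack_marker_option(marker: int, data: bytes,
                       main_header: bool, more: bool = False) -> bytes:
    # First byte: 7-bit optype followed by the X bit.
    first = (OPTYPE_MARKER_SEGMENT << 1) | int(more)
    # Assumption: length = Lxxx - 1, where Lxxx = len(data) + 2.
    length = len(data) + 1
    # Fourth byte: F bit followed by the 7-bit JP2code.
    fcode = (int(main_header) << 7) | (marker & 0x7F)
    return struct.pack("!BHB", first, length, fcode) + data


def parse_marker_option(buf: bytes) -> MarkerSegmentOption:
    first, length, fcode = struct.unpack_from("!BHB", buf, 0)
    if first >> 1 != OPTYPE_MARKER_SEGMENT:
        raise ValueError("not a marker segment optional header")
    if length < 1 or len(buf) < 4 + length - 1:
        raise ValueError("truncated or malformed optional header")
    data = bytes(buf[4:4 + length - 1])
    return MarkerSegmentOption(bool(first & 1), bool(fcode & 0x80),
                               fcode & 0x7F, data)
```

   For a COD marker (0xFF52) in the main header with X = 0, the first
   four bytes produced by pack_marker_option are 0x02, the two-byte
   length, and 0xD2, matching the bit pattern "1" followed by
   "1010010" in the last byte of the first word.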
8.1.1 Specific example of marker segment header: COD

   Here is a specific marker segment header for a COD functional
   segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  optype=1   |0|        length=(Lcod-1)        |1|   1010010   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Scod      |                     SGcod                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     SGcod     |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                      SPcod (Lcod length)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 19: COD marker segment optional header example

   - optype = 1. As specified in this specification.
   - X = 0. For this instance.
   - length = Lcod - 1. The length of the original COD marker segment
     minus 1.
   - F = 1. This change is in the main header, so F = 1.
   - JP2code = 0b1010010 --> 0x52. The COD marker value in JPEG 2000
     is 0xFF52.

8.1.2 Specific example of marker segment header: QCD

   Here is a specific marker segment header for a QCD functional
   segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  optype=1   |0|        length=(Lqcd-1)        |0|   1011100   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Sqcd      |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                      SPqcd (Lqcd length)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fig. 20: QCD marker segment optional header example

   - optype = 1. As specified in this specification.
   - X = 0. For this instance.
   - length = Lqcd - 1. The length of the original QCD marker segment
     minus 1.
   - F = 0. This change is in the tile header, so F = 0.
   - JP2code = 0b1011100 --> 0x5C. The QCD marker value in JPEG 2000
     is 0xFF5C.

8.2 JPIP Optional Header

   Interoperability with different standards is extremely useful. The
   ISO WG1 group has also put forth a transmission protocol standard
   called JPIP.
   This standard is a protocol for viewing JPEG 2000 images
   interactively using RTSP. To embrace this standard, an optional JPIP
   header that handles the RTP data for JPIP compatible clients is
   defined here.

   At the time of this writing, the JPIP work is still in an early stage
   of standardization. For now, an optype value of 2 is reserved for
   JPIP, to be used when that work is complete. The option specific
   information in this optional header shall be the same as the server
   response data packet from a JPIP server, or a description of the
   JPEG 2000 packets in the RTP packet.

   optype : 7 bits
      The optype value for a compatible JPIP optional header must be 2.

   Option specific format : X bits
      This shall be determined at a later date.

9. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [3]. This implies that confidentiality of the media
   streams is achieved by encryption. Because the data compression used
   with this payload format is applied end-to-end, encryption may be
   performed on the compressed data, so there is no conflict between the
   two operations.

10. Recommended Practices

   As the JPEG 2000 coding standard is highly flexible, many different
   but compliant data streams can be produced and still be labeled as a
   JPEG 2000 data stream. The following is a set of recommendations
   drawn from our experience in developing JPEG 2000 and this payload
   specification.

   Implementations of this standard must handle all possibilities
   mentioned in this specification. The following is a list of items
   that an implementation could optimize.

   Error Resilience Markers
      The use of error resilience markers in the JPEG 2000 data stream
      is highly recommended in all situations. Error recovery with
      these markers is helpful to the decoder and saves external
      resources.
      Examples of such markers are RESET, RESTART, and ERTERM.

   Packetization Ordering
      Packetization ordering is completely dependent on the client's
      capabilities. Some orderings allow for less distortion in the
      event of loss, at the expense of memory storage and packet
      reordering.

   YCbCr Color space
      The YCbCr color space provides the greatest amount of compression
      in color with respect to the human visual system. When used with
      JPEG 2000, this color space can provide excellent visual results
      at extreme bit rates.

   Progression Ordering
      JPEG 2000 offers many different ways to order the final code
      stream to match the transfer to the presentation. The most
      useful orderings in our use cases have been layer progression and
      resolution progression.

   Tiling and Packets
      JPEG 2000 packets are formed regardless of the encoding method.
      The encoder has little control over the size of these JPEG 2000
      packets, which may be large or small. Tiling splits the image
      into smaller areas, each of which is encoded separately. With
      tiles, the JPEG 2000 packet sizes are also reduced. When tiling
      is used, almost all JPEG 2000 packets are of an acceptable size
      (i.e., smaller than the MTU size of most networks). It is highly
      recommended that tiling be used so that packetization of JPEG
      2000 packets for transport is simpler.

11. Authors' Addresses

   Eric Edwards
   Sony Corporation
   Media Processing Division
   Network & Software Technology Center of America
   3300 Zanker Road, MD: SJ2C4
   San Jose, CA 95134
   Phone: +1 408 955 6462
   Fax:   +1 408 955 5724
   Email: Eric.Edwards@am.sony.com

   Satoshi Futemma
   Sony Corporation
   6-7-35 Kitashinagawa Shinagawa-ku
   Tokyo 141-0001 JAPAN
   Phone: +81 3 5448 4373
   Fax:   +81 3 5448 4622
   Email: satosi-f@sm.sony.co.jp

   Eisaburo Itakura
   Sony Corporation
   6-7-35 Kitashinagawa Shinagawa-ku
   Tokyo 141-0001 JAPAN
   Phone: +81 3 5448 3096
   Fax:   +81 3 5448 4622
   Email: itakura@sm.sony.co.jp

   Takahiro Fukuhara
   Sony Corporation
   1-11-1 Osaki Shinagawa-ku
   Tokyo 141-0032 JAPAN
   Phone: +81 3 5435 3665
   Fax:   +81 3 5435 3891
   Email: fukuhara@av.crl.sony.co.jp

12. References

   [1] ISO/IEC JTC1/SC29: ISO/IEC 15444-1 "Information technology -
       JPEG 2000 image coding system - Part 1: Core coding system",
       December 2000.

   [2] ISO/IEC JTC1/SC29/WG1: "Motion JPEG 2000 Committee Draft 1.0",
       http://www.jpeg.org/public/cd15444-3.pdf, December 2000.

   [3] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson: "RTP:
       A Transport Protocol for Real Time Applications", RFC 1889,
       January 1996.

   [4] ISO/IEC JTC1/SC29/WG1: "JPEG2000 requirements and profiles
       version 6.3", draft in progress,
       http://www.jpeg.org/public/wg1n1803.pdf

   [5] Diego Santa-Cruz, Touradj Ebrahimi, Joel Askelof, Mathias
       Larsson and Charilaos Christopoulos: "JPEG 2000 still image
       coding versus other standards", In Proc. of SPIE's 45th annual
       meeting, Application of Digital Image Processing XXIII,
       vol.4115, pp.446-454, July 2000.