
INTERNATIONAL ORGANISATION FOR STANDARDISATION

INTERNATIONAL ELECTROTECHNICAL COMMISSION

Information technology –

Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images

Encoding of Alpha Channels

Draft ITU-T Recommendation | International Standard

CONTENTS

Foreword
Introduction
1 Scope
2 Normative references
    2.1 Identical Recommendations | International Standards
    2.2 Paired Recommendations | International Standards equivalent in technical content
    2.3 Additional references
3 Definitions, Abbreviations and Symbols
    3.1 Definitions
    3.2 Symbols
    3.3 Abbreviations
4 Conventions
    4.1 Conformance language
    4.2 Operators
        4.2.1 Arithmetic operators
        4.2.2 Logical operators
        4.2.3 Relational operators
        4.2.4 Precedence order of operators
        4.2.5 Mathematical functions
5 General
    5.1 High Level Overview on ISO/IEC 18477-9 (Informative)
    5.2 Encoder requirements
    5.3 Decoder requirements
Annex A
    A.1 Decoding Process (Normative)
    A.2 Composition of Foreground and Background Image (Informative)
    A.3 Reconstruction of the Alpha Channel (Normative)
Annex B
    B.1 Introduction
    B.2 Output Conversion Box
    B.3 Base Non-linear Point Transformation Specification Box
    B.4 Residual Non-linear Point Transformation Specification Box
    B.5 Base DCT Specification Box
    B.9 Residual DCT Specification Box
    B.XX Alpha Merging Specification Box
    B.XX Alpha Codestream Box
    B.XX Alpha Refinement Data Box
    B.XX Residual Alpha Data Box
    B.XX Residual Alpha Refinement Box
    B.XX Alpha Channel Composition Box
Annex C
    C.1 Introduction
    C.2 Base Profile
    C.2 Full Profile
Annex D

ITU-T Rec. T.xxxx (200x E) i

Foreword

This Recommendation | International Standard specifies a codestream format for lossy and lossless storage of transparency information within ISO/IEC 18477-3 (JPEG XT) compliant files. It is an extension that can be combined with other parts of the ISO/IEC 18477 family of standards to include opacity information. Like all other members of this family of standards, this Recommendation | International Standard is completely backwards compatible with ITU-T Rec. T.81 | ISO/IEC 10918-1, commonly known as the JPEG still image format. That is, legacy applications conforming to ITU-T Rec. T.81 | ISO/IEC 10918-1 will be able to reconstruct streams generated by an encoder conforming to this Recommendation | International Standard, though they will be unaware of the opacity information and will display the images as fully opaque.

Use cases for images with opacity information are manifold: Web applications may want to display photographic images with an arbitrarily shaped boundary, leaving the parts of the image outside that boundary transparent. Rendering of translucent objects in computer-generated images also requires opacity information. In the former application, the transparency information consists of a binary mask that encodes whether a pixel is transparent or opaque; in the latter case, opacity is represented by a continuous variable within the interval [0,1]. While standards such as PNG (ISO/IEC 15948) allow the lossless or near-lossless encoding of images with opacity information, they are not very efficient for encoding photo-realistic images in applications where bandwidth is limited and lossy compression of photographic content is preferable.

This Recommendation | International Standard is itself based on ISO/IEC 18477-3, which defines a box-based extension of ISO/IEC 18477-1. It can be freely combined with other members of the ISO/IEC 18477 standard to extend either 8 bits per sample LDR, IDR or HDR images with opacity information.

Introduction

This Recommendation | International Standard specifies an extension for ISO/IEC 18477-3 compliant files that adds capabilities for lossy or lossless storage of continuous or binary opacity information associated with the image; such additional channels are commonly known as alpha channels. These channels are used for compositing the image content with other content on the same physical medium. An alpha value of zero encodes maximal transparency (and no opacity), while the maximal sample value represents maximal opacity (and no transparency). Additionally, the image content itself may be premultiplied with the alpha value, or premultiplied and shaded with a background color M, a process by which the original image A is replaced by the image A’ defined as

A’ = α × A for pre-multiplication,

A’ = α × A + (1 − α) × M for pre-multiplication and shading,

and A’ is encoded instead of A in the JPEG XT codestream. Reconstruction is then performed as follows: If A denotes the sample value of the image contained in the ISO/IEC 18477-3 file at a specific spatial location, B is the sample value of the background on which the image should be rendered, M is the matte color and α is the decoded value of the alpha channel, then the sample value of the image C composed from A and B at the same position is given by:

C = α × A + (1 − α) × B for non-premultiplied content,

C = A + (1 − α) × B for premultiplied content, and

C = A + (1 − α) × (B − M) for premultiplied content with shade removal.

Encoding a premultiplied and shaded version of A’ with color M enables legacy decoders that lack alpha channel support to still decode and display the image with the appearance that it is composited on a background with color M. At the same time, new JPEG XT compliant decoders can composite the image on any background by calculating image C from A, B and M.
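NOTE – The following informative sketch illustrates the pre-multiplication and composition formulae of this Introduction. It is not part of this Recommendation | International Standard. It assumes that all sample values and α are real numbers in the interval [0,1]; the function names are hypothetical and chosen for illustration only.

```python
# Informative sketch of the Introduction's formulae.
# Assumption: samples and alpha are floats in [0, 1]; all names are illustrative.

def premultiply(a, alpha):
    """A' = alpha * A (pre-multiplication)."""
    return alpha * a

def premultiply_and_shade(a, alpha, m):
    """A' = alpha * A + (1 - alpha) * M (pre-multiplication and shading)."""
    return alpha * a + (1.0 - alpha) * m

def compose_plain(a, alpha, b):
    """C = alpha * A + (1 - alpha) * B for non-premultiplied content."""
    return alpha * a + (1.0 - alpha) * b

def compose_premultiplied(a_stored, alpha, b):
    """C = A + (1 - alpha) * B, where A is the stored premultiplied sample."""
    return a_stored + (1.0 - alpha) * b

def compose_shaded(a_stored, alpha, b, m):
    """C = A + (1 - alpha) * (B - M), removing the matte color M."""
    return a_stored + (1.0 - alpha) * (b - m)

# All three storage variants yield the same composed sample C.
a, b, m, alpha = 0.8, 0.2, 0.5, 0.25
c_plain = compose_plain(a, alpha, b)
c_premultiplied = compose_premultiplied(premultiply(a, alpha), alpha, b)
c_shaded = compose_shaded(premultiply_and_shade(a, alpha, m), alpha, b, m)
```

With α = 1 the shaded sample equals A, and with α = 0 it equals M, which is what a legacy decoder without alpha channel support would display.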

This standard provides facilities to encode the value of α for each spatial location, with or without loss, in one of three representations: as a binary decision, i.e. α=0 or α=1, on a continuous scale of integers with a resolution between 8 and 16 bits, or as a floating point number between zero and one with 16 bit precision. It uses coding technology from other parts of the ISO/IEC 18477 family of standards for its encoding, and no new technology besides that already defined in other parts is required for the reconstruction of the opacity information.

This part can be freely combined with other parts of the ISO/IEC 18477 family, i.e. the sample values A in the above formulae might be 8-bit unsigned integers as represented by ISO/IEC 18477-1, integers of up to 16 bits using the encoding of ISO/IEC 18477-6, or floating point values encoded by ISO/IEC 18477-7. The image content A may also be encoded without loss, using ISO/IEC 18477-8, but one should then keep in mind that the compositing step that creates the final output image C from the input images A and B is not fully standardized, i.e. the multiplications required for its implementation may cause additional loss, and conforming implementations may generate rendered images C that deviate slightly from each other.

The syntax of the codestream defined in this Recommendation | International Standard is fully backwards compatible to ITU Recommendation T.81 | ISO/IEC 10918-1 and the ISO/IEC 18477 family of standards. Decoders unaware of the extensions defined here will reconstruct a fully opaque version of the image by discarding the alpha channel content.


INTERNATIONAL STANDARD
ISO/IEC 29199-2 : 200x (E)
ITU-T Rec. T.xxxx (200x E)

ITU-T RECOMMENDATION

INFORMATION TECHNOLOGY – SCALABLE COMPRESSION AND CODING OF CONTINUOUS-TONE STILL IMAGES:

ENCODING OF ALPHA CHANNELS

1 Scope

This Recommendation | International Standard specifies a coding format, referred to as JPEG XT, which is designed primarily for continuous-tone photographic content.

2 Normative references

The following Recommendations and International Standards contain provisions which, through reference in this text, constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent edition of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently valid International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently valid ITU-T Recommendations.

2.1 Identical Recommendations | International Standards

2.2 Paired Recommendations | International Standards equivalent in technical content

ISO/IEC 18477-1: Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images, Core Coding System Specification

ISO/IEC 18477-3: Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images, Box File Format

ISO/IEC 18477-6: Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images, IDR Integer Coding

ISO/IEC 18477-7: Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images, HDR Floating Point Coding

ISO/IEC 18477-8: Information Technology: Scalable Compression and Coding of Continuous-Tone Still Images, Lossless and Near-Lossless Coding

2.3 Additional references

ITU-T Rec. T.81 | ISO/IEC 10918-1: Information technology – Digital compression and coding of continuous-tone still images – Requirements and guidelines

ITU-T Rec. T.871 | ISO/IEC 10918-5: Information technology – Digital compression and coding of continuous-tone still images: JPEG File Interchange Format

ITU-T Rec. T.800 | ISO/IEC 15444-1: Information technology – JPEG 2000 Image Coding System

IEC 60559: Binary floating-point arithmetic for microprocessor systems

IEC 61966-2-1: Colour management – Default RGB colour space – sRGB

ISO/IEC 10646-1, Annex D: Information technology – Universal Multiple-Octet Coded Character Set (UCS) – Part 1: Architecture and Basic Multilingual Plane


3 Definitions, Abbreviations and Symbols

3.1 Definitions

For the purposes of this Recommendation | International Standard, the following definitions apply.

AC coefficient: Any DCT coefficient for which the frequency is not zero in at least one dimension.

alpha channel: An additional scalar image channel that encodes the opacity of each sample in the main image.

alpha component: Synonym for alpha channel.

binary decision: Choice between two alternatives.

bit stream: Partially encoded or decoded sequence of bits comprising an entropy-coded segment.

block: An 8 × 8 array of samples or an 8 × 8 array of DCT coefficient values of one component.

box: A structured collection of data describing the image or the image decoding process, embedded into one or multiple APP11 marker segments. Boxes are defined in the annexes of this Recommendation | International Standard.

byte: A group of 8 bits.

coder: An embodiment of a coding process.

coding: Encoding or decoding.

coding model: A procedure used to convert input data into symbols to be coded.

(coding) process: A general term for referring to an encoding process, a decoding process, or both.

compression: Reduction in the number of bits used to represent source image data.

component: A two-dimensional array of samples having the same designation in the output or display device. An image typically consists of several components, e.g. red, green and blue.

composition: The process of merging the decoded image data with background image data using opacity information, generating a single final output image.

continuous-tone image: An image whose components have more than one bit per sample.

data unit: An 8 × 8 block of samples of one component in DCT-based processes; a sample in lossless processes.

DC coefficient: The DCT coefficient for which the frequency is zero in both dimensions.

(DCT) coefficient: The amplitude of a specific cosine basis function – may refer to an original DCT coefficient, to a quantized DCT coefficient, or to a dequantized DCT coefficient.

decoder: An embodiment of a decoding process.

decoding process: A process which takes as its input compressed image data and outputs a continuous-tone image.

dequantization: The inverse procedure to quantization by which the decoder recovers a representation of the DCT coefficients.

discrete cosine transform; DCT: Either the forward discrete cosine transform or the inverse discrete cosine transform.

downsampling: A procedure by which the spatial resolution of a component is reduced.

encoder: An embodiment of an encoding process.

encoding process: A process which takes as its input a continuous-tone image and outputs compressed image data.

entropy-coded (data) segment: An independently decodable sequence of entropy encoded bytes of compressed image data.

entropy decoder: An embodiment of an entropy decoding procedure.

entropy decoding: A lossless procedure which recovers the sequence of symbols from the sequence of bits produced by the entropy encoder.

entropy encoder: An embodiment of an entropy encoding procedure.

entropy encoding: A lossless procedure which converts a sequence of input symbols into a sequence of bits such that the average number of bits per symbol approaches the entropy of the input symbols.

extended (DCT-based) process: A descriptive term for DCT-based encoding and decoding processes in which additional capabilities are added to the baseline sequential process.

extension image: Synonym for residual image (see there).

forward discrete cosine transform; FDCT: A mathematical transformation using cosine basis functions which converts a block of samples into a corresponding block of original DCT coefficients.


frequency: A two-dimensional index into the two-dimensional array of DCT coefficients.

grayscale image: A continuous-tone image that has only one component.

high dynamic range: An image or image data comprised of more than eight bits per sample.

Huffman decoder: An embodiment of a Huffman decoding procedure.

Huffman decoding: An entropy decoding procedure which recovers the symbol from each variable length code produced by the Huffman encoder.

Huffman encoder: An embodiment of a Huffman encoding procedure.

Huffman encoding: An entropy encoding procedure which assigns a variable length code to each input symbol.

intermediate dynamic range: An image or image data comprised of more than eight bits per sample.

Joint Photographic Experts Group; JPEG: The informal name of the committee which created this Specification. The “joint” comes from the ITU-T and ISO/IEC collaboration.

legacy decoding path: The collection of operations to be performed on the entropy coded data as described by ITU-T Rec. T.81 | ISO/IEC 10918-1 jointly with the Legacy Refinement scans before this data is merged with the residual data to form the final output image.

legacy decoder: An embodiment of a decoding process conforming to ITU-T Rec. T.81 | ISO/IEC 10918-1, confined to the lossy DCT process and the baseline, sequential or progressive modes, decoding at most four components to eight bits per component.

legacy image: The arrangement of sample values obtained by applying the decoding process described by ITU-T Rec. T.81 | ISO/IEC 10918-1 to the entropy coded data as defined by said standard.

lossless: A descriptive term for encoding and decoding processes and procedures in which the output of the decoding procedure(s) is identical to the input to the encoding procedure(s).

lossless coding: The mode of operation which refers to any one of the coding processes defined in this Specification in which all of the procedures are lossless (see Annex H).

lossy: A descriptive term for encoding and decoding processes which are not lossless.

low-dynamic range: An image or image data comprised of data with no more than eight bits per sample.

marker: A two-byte code in which the first byte is hexadecimal FF and the second byte is a value between 1 and hexadecimal FE.

marker segment: A marker together with its associated set of parameters.

minimum coded unit; MCU: The smallest group of data units that is coded.

noise shaping: A signal processing technique that removes quantization noise from the low frequency components and injects it into the high frequency domain where it can be removed by filtering.

pixel: A collection of sample values in the spatial image domain having all the same sample coordinates, e.g. a pixel may consist of three samples describing its red, green and blue value.

point transform: Scaling of a sample or DCT coefficient by a factor.

precision: Number of bits allocated to a particular sample or DCT coefficient.

premultiplied component: An image component that has already been multiplied by the scaled value of the alpha channel on a pixel-by-pixel basis to ease the composition of the image with the background.

procedure: A set of steps which accomplishes one of the tasks which comprise an encoding or decoding process.

quantization value: An integer value used in the quantization procedure.

quantize: The act of performing the quantization procedure for a DCT coefficient.

residual decoding path: The collection of operations applied to the entropy coded data contained in the residual data box and residual refinement scan boxes up to the point where this data is merged with the legacy data to form the final output image.

residual image: The sample values as reconstructed by inverse quantization and inverse DCT transformation applied to the entropy-decoded coefficients described by the residual scan and residual refinement scans.

residual scan: An additional pass over the image data invisible to legacy decoders which provides additive and/or multiplicative correction data of the legacy scans to allow reproduction of high-dynamic range or wide color gamut data.

refinement scan: An additional pass over the image data invisible to legacy decoders which provides additional least significant bits to extend the precision of the DCT transformed coefficients. Refinement scans can be either applied in the legacy or residual decoding path.

sample: One element in the two-dimensional image array which comprises a component.


sample grid: A common coordinate system for all samples of an image. The samples at the top left edge of the image have the coordinates (0,0), the first coordinate increases towards the right, the second towards the bottom.

scan: A single pass through the data for one or more of the components in an image.

scan header: A marker segment that contains a start-of-scan marker and associated scan parameters that are coded at the beginning of a scan.

superbox: A box that carries other boxes as payload data.

table specification data: The coded representation from which the tables used in the encoder and decoder are generated and their destinations specified.

(uniform) quantization: The procedure by which DCT coefficients are linearly scaled in order to achieve compression.

upsampling: A procedure by which the spatial resolution of a component is increased.

vertical sampling factor: The relative number of vertical data units of a particular component with respect to the number of vertical data units in the other components in the frame.

zero byte: The 0x00 byte.

zig-zag sequence: A specific sequential ordering of the DCT coefficients from (approximately) lowest spatial frequency to highest.
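NOTE – The following informative sketch illustrates the definition of a marker given in this clause: a two-byte code whose first byte is hexadecimal FF and whose second byte lies between 1 and hexadecimal FE. The function name is hypothetical and not part of this Recommendation | International Standard.

```python
# Informative sketch of the "marker" definition in 3.1.
# Assumption: both arguments are integers in the range 0..255.

def is_marker(first_byte, second_byte):
    """Return True if the two bytes form a marker as defined in 3.1."""
    return first_byte == 0xFF and 0x01 <= second_byte <= 0xFE

# 0xFFEB is the APP11 marker, whose marker segments carry the boxes.
app11_is_marker = is_marker(0xFF, 0xEB)
# 0xFF followed by a zero byte is not a marker under this definition.
stuffed_is_marker = is_marker(0xFF, 0x00)
```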

3.2 Symbols

X       Width of the sample grid in positions
Y       Height of the sample grid in positions
Nf      Number of components in an image
si,x    Subsampling factor of component i in horizontal direction
si,y    Subsampling factor of component i in vertical direction
Hi      Subsampling indicator of component i in the frame header
Vi      Subsampling indicator of component i in the frame header
vx,y    Sample value at the sample grid position x,y
h       Additional number of DCT coefficient bits represented by refinement scans; 8+h is the number of non-fractional bits (i.e. bits in front of the "binary dot") of the output of the inverse DCT process.
Rb      Additional bits in the HDR image; 8+Rb is the sample precision of the reconstructed HDR image.

3.3 Abbreviations

For the purposes of this Recommendation | International Standard, the following abbreviations apply.

ASCII   American Standard Code for Information Interchange
LSB     Least Significant Bit
MSB     Most Significant Bit
HDR     High Dynamic Range
IDR     Intermediate Dynamic Range
LDR     Low Dynamic Range
TMO     Tone Mapping Operator
DCT     Discrete Cosine Transformation

4 Conventions

4.1 Conformance language

This Recommendation | International Standard consists of normative and informative text.

Normative text is that text which expresses mandatory requirements. The word "shall" is used to express mandatory requirements strictly to be followed in order to conform to this Specification and from which no deviation is permitted. A conforming implementation is one that fulfils all mandatory requirements.

Informative text is text that is potentially helpful to the user, but not indispensable and can be removed, changed or added editorially without affecting interoperability. All text in this Recommendation | International Standard is normative, with the following exceptions: the Introduction, any parts of the text that are explicitly labelled as "informative", and statements appearing with the preamble "NOTE" and behaviour described using the word "should". The word "should" is used to describe behaviour that is encouraged but is not required for conformance to this Specification.

The keywords "may" and "need not" indicate a course of action that is permissible in a conforming implementation.

The keyword "reserved" indicates a provision that is not specified at this time, shall not be used, and may be specified in the future. The keyword "forbidden" indicates "reserved" and in addition indicates that the provision will never be specified in the future.

4.2 Operators

NOTE – Many of the operators used in this Recommendation | International Standard are similar to those used in the C programming language.

4.2.1 Arithmetic operators

+       Addition

−       Subtraction (as a binary operator) or negation (as a unary prefix operator)

*       Multiplication

/       Division without truncation or rounding.

umod    x umod a is the unique value y between 0 and a−1 for which y+Na = x with a suitable integer N.
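NOTE – The following informative sketch illustrates the umod operator for integer operands. It assumes a positive integer modulus a, in which case Python's % operator already returns the unique value between 0 and a−1. The C operator % differs for negative x (for example, −7 % 3 evaluates to −1 in C99), so a direct translation to C would need an additional correction step. The function name is illustrative only.

```python
# Informative sketch of "x umod a" for integers.
# Assumption: a is a positive integer; the function name is illustrative.

def umod(x, a):
    """Return the unique y in 0..a-1 such that y + N*a == x for some integer N."""
    if a <= 0:
        raise ValueError("umod requires a positive modulus")
    return x % a  # Python's % is non-negative for a positive modulus

remainder_positive = umod(7, 3)   # 7 = 1 + 2*3
remainder_negative = umod(-7, 3)  # -7 = 2 + (-3)*3
```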

4.2.2 Logical operators

||      Logical OR

&&      Logical AND

!       Logical NOT

∈       x ∈ {A, B} is defined as (x == A || x == B)

∉       x ∉ {A, B} is defined as (x != A && x != B)

4.2.3 Relational operators

> Greater than

>= Greater than or equal to

< Less than

<= Less than or equal to

== Equal to

!= Not equal to

4.2.4 Precedence order of operators

Operators are listed below in descending order of precedence. If several operators appear in the same line, they have equal precedence. When several operators of equal precedence appear at the same level in an expression, evaluation proceeds according to the associativity of the operator, either from right to left or from left to right.

Operators        Type of operation           Associativity
(), [ ], .       Expression                  Left to Right
−                Unary negation
*, /             Multiplication              Left to Right
umod             Modulo (remainder)          Left to Right
+, −             Addition and Subtraction    Left to Right
<, >, <=, >=     Relational                  Left to Right

4.2.5 Mathematical functions

⌈x⌉ Ceil of x. Returns the smallest integer that is greater than or equal to x.

⌊x⌋ Floor of x. Returns the largest integer that is less than or equal to x.

|x| Absolute value, is –x for x < 0, otherwise x.

sign(x) Sign of x, zero if x is zero, +1 if x is positive, -1 if x is negative.

clamp(x,min,max) Clamps x to the range [min,max]: returns min if x < min, max if x > max or otherwise x.

power(x,a) Raises the value of x to the power of a. x is a non-negative real number, a is a real number. power(x,a) is equal to exp(a×log(x)), where exp is the exponential function and log is the natural logarithm. If x is zero and a is positive, power(x,a) is defined to be zero.
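NOTE – The following informative sketch shows one possible realization of the mathematical functions of this subclause. Ceil and floor map onto math.ceil and math.floor of the Python standard library. This subclause does not define power(x,a) for x equal to zero and non-positive a; raising an error in that case is an assumption of this sketch.

```python
import math

# Informative sketch of the functions in 4.2.5; names follow the text.

def sign(x):
    """Zero if x is zero, +1 if x is positive, -1 if x is negative."""
    return (x > 0) - (x < 0)

def clamp(x, minimum, maximum):
    """Return minimum if x < minimum, maximum if x > maximum, otherwise x."""
    if x < minimum:
        return minimum
    if x > maximum:
        return maximum
    return x

def power(x, a):
    """exp(a * log(x)) for x > 0; zero for x == 0 and a > 0."""
    if x < 0:
        raise ValueError("power requires a non-negative x")
    if x == 0:
        if a > 0:
            return 0.0
        # Assumption of this sketch: the text leaves this case undefined.
        raise ValueError("power(0, a) is only defined for positive a")
    return math.exp(a * math.log(x))

ceil_example = math.ceil(-1.5)    # smallest integer >= -1.5
floor_example = math.floor(-1.5)  # largest integer <= -1.5
```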

5 General

The purpose of this clause is to give an informative overview of the elements specified in this Specification. Another purpose is to introduce many of the terms, which are defined in clause 3. These terms are printed in italics upon first usage in this clause.

There are three elements specified in this Specification:

a) An encoder is an embodiment of an encoding process. An encoder takes as input digital source image data and encoder specifications, and by means of a specified set of procedures generates as output a codestream.

b) A decoder is an embodiment of a decoding process. A decoder takes as input a codestream, and by means of a specified set of procedures generates as output digital reconstructed image data.

c) The codestream is a compressed image data representation that includes all necessary data to allow a (full or approximate) reconstruction of the sample values of a digital image. Additional data might be required that define the interpretation of the sample data, such as color space or the spatial dimensions of the samples.

5.1 High Level Overview on ISO/IEC 18477-9 (Informative)

5.2 Encoder requirements

An encoder is only required to meet the compliance tests and to generate the codestream according to the syntax defined in this Recommendation | International Standard. How the codestream is algorithmically constructed and how the boxes are laid out is implementation specific and not within the scope of this Recommendation | International Standard. Subsequent Recommendations | Standards of the ISO/IEC 18477 family may, however, define additional restrictions and requirements, either within the standard itself, or within profiles that restrict the freedom of the encoder further.

An encoder claiming to be compliant to one of these profiles then shall conform to the syntax constraints defined in the corresponding profile of the corresponding part of ISO/IEC 18477.


5.3 Decoder requirements

A decoding process converts compressed image data to reconstructed image data. It shall follow the decoding operation specified in this Recommendation | International Standard and ISO/IEC 18477-1 to reconstruct a legacy 8 bits/channel standard low dynamic range image. It is not required that a conforming decoder is capable of decoding and interpreting all box types defined in this or other members of the ISO/IEC 18477 family of standards. A decoder implementation is always free to skip over box types it is unable or unwilling to support.

In order to comply with this Specification, a decoder

a) may convert a codestream conforming to this Recommendation | International Standard into a low dynamic range image without considering any boxes.

b) may additionally convert a conforming codestream including the information in some boxes into an image (of higher precision, higher quality or higher bit-depths) and into an alpha-channel.

c) shall implement at least all the functional blocks of the JPEG XT decoding process defined in the profile it claims to be conforming to, where profiles are defined in this and other parts of the ISO/IEC 18477 family of standards. For that, a conforming decoder shall correctly interpret all box types required in the definition of the profile.


Annex A

Definition of the Decoding Process

(This annex forms an integral part of this Recommendation | International Standard)

A.1 Decoding Process (Normative)

This Annex extends the file format specified in ISO/IEC 18477-3 by introducing additional boxes carrying entropy-coded data and metadata required for signaling the alpha channel content of an image. The decoding process is depicted in Figure A-1. The decoder first decodes the foreground image contained in the codestream and boxes of the ISO/IEC 18477-3 conforming file, giving one or three components per sample. The decoding process of the foreground image from the codestream is specified in other parts of the ISO/IEC 18477 family of standards and not repeated here. The decoder then proceeds to decode the codestream in the Alpha-codestream box B1 and the Alpha refinement box B1a if they are present, giving a precursor alpha channel plane denoted by Hi. If the Alpha-codestream box is not present, Hi will be constant zero. Decoding then proceeds to the Residual Alpha box B5 and the Residual Alpha Refinement box B5a. If decoded, they provide an error residual, denoted by Qi, that enables lossless coding of the alpha channel. The base alpha channel Hi is added to the residual alpha channel Qi, forming the intermediate output Fi, which is then converted or scaled to range, giving a number between 0 and 1. This is the output of the decoding process of this Recommendation | International Standard.

[Figure A-1 is a block diagram of which only the labels survive. Base alpha path: T.81 | ISO/IEC 10918-1 entropy decoder (B1) and refinement scan (B1a), inverse quantization (B1b), FDCT or IDCT (B1c) giving Oi, base NLT point transformation (B4) giving Hi. Residual alpha path: residual scan or T.81 | ISO/IEC 10918-1 entropy decoder (B5) and residual refinement scan (B5a), inverse quantization or noise shaping (B5b), IDCT or DCT bypass (B5c) giving Ri, residual NLT point transformation (B7) giving Pi. Merging (B9) gives Fi, and the output conversion (B10) gives the alpha channel. Composition: the foreground image from the ISO/IEC 18477-X decoder (optionally premultiplied) and the background image from another image source, weighted by 1-α and optionally subject to matte removal using the matte color, are added to form the composed image.]

Figure A-1: High-Level overview of the JPEG XT Decoding of Alpha Channels and Image Decomposition


Tim Bruylants, 27/11/14,

What title do we put here?

A.2 Composition of Foreground and Background Image (Informative)

The following processing steps are non-normative and typically carried out outside of the ISO/IEC 18477-9 decoder process: If the foreground image is not premultiplied, then all of the foreground sample values are point-wise multiplied by the respective sample values of the alpha channel.

Generation of the background image is also outside the scope of this Recommendation | International Standard. It is typically given by the application using an ISO/IEC 18477-9 decoder. It is point-wise multiplied with (1-α) where α is the reconstructed sample value of the decoding process described in subclause A.1 and specified in subclause A.3.

The results of the two multiplication operations, on foreground and background images, are then added together forming the final output image.

A.3 Reconstruction of the Alpha Channel (Normative)

The following steps shall be followed to reconstruct the sample values α of the alpha channel:

In step B1, the content of the Alpha-Codestream box is entropy decoded, by means of one of the Huffman coding modes of ISO/IEC 18477, i.e. by the baseline sequential, extended sequential or progressive modes of ITU Recommendation T.81 | ISO/IEC 10918-1. If this codestream is present, the width and the height of the sample grid described by the alpha-codestream shall match those of the foreground image, and the number of components of the alpha-codestream shall be one.

In step B1a, decoding the content of the Alpha-Refinement box, if it is present, refines the precision of the reconstructed alpha-DCT-coefficients. This box shall only exist if the Alpha-Codestream box is present. Entropy decoding of the contents of the Alpha-Refinement box shall follow the specifications of Refinement Coding as defined in Annex D of ISO/IEC 18477-6.

In steps B1b and B1c, the data is inversely quantized and inversely DCT transformed. The Lf flag of the Alpha Output Conversion box and the Alpha Base DCT box define the DCT transformation to select. See Annex B for details. The DCT transformations are either defined by ITU Recommendation T.81 | ISO/IEC 10918-1 or ISO/IEC 18477-8 and are selected according to the above boxes. The output of this process is the reconstructed base alpha sample values Oi.

In step B4, an optional non-linear point transformation is applied to the sample values Oi, generating the precursor alpha channel sample values Hi. This transformation is selected by the Base Transformation box within the Alpha Merging Specification box if it is present, or is a scaling process in its absence. The transformation selected by the Base Transformation box is either an Integer or Floating Point Lookup Table, or a Parametric Curve box describing the transformation. If lossless coding of alpha channels is required, then either the Base Transformation box shall reference an Integer Table Lookup, or no Base Transformation box shall be present.

In step B5, the contents of the Residual Alpha Codestream box is entropy decoded, if it is present. This uses either one of the coding modes of ISO/IEC 18477 (sequential baseline, extended sequential or progressive), or the DCT bypass or large-range DCT coding mode specified in Annex D of ISO/IEC 18477-8. The dimensions of the image described by such an optional box shall be identical to the dimensions of the foreground image, and the Number of Components Nf of this image shall be one.

In step B5a, decoding the contents of the Residual Alpha Refinements box, if it is present, refines the precision of the reconstructed residual alpha-DCT-coefficients. This box shall only exist if the Residual Alpha Codestream box is present. Entropy decoding of the contents of the Residual Alpha-Refinement box shall follow the specifications of Annex D of ISO/IEC 18477-6 if the Residual Alpha Codestream box uses one of the coding modes of ISO/IEC 18477-1, or the Residual Refinement Coding defined in subclause D.3 of ISO/IEC 18477-8 if the Residual Alpha Codestream is encoded in the DCT-bypass or large-range DCT mode.

In steps B5b and B5c, the residual data is inversely quantized and inversely DCT transformed; the DCT transformation is optionally bypassed, in which case inverse quantization may be replaced by inverse noise shaping. The DCT process itself is specified by the Residual DCT box, see Annex B for details. The DCT transformation shall conform to ITU Recommendation T.81 | ISO/IEC 10918-1 if the Lf flag of the Output Conversion box is zero. Otherwise, the Residual DCT box determines the DCT transform as being one of the DCT processes specified in Annex E of ISO/IEC 18477-8. The output of this process is the residual alpha channel sample values Ri.

In step B7, the residual alpha channel data Ri optionally undergoes a non-linear point transformation, selected by the Residual Non-Linear Point Transformation box. This transformation is either an Integer or Floating Point Lookup Table, or a Parametric Curve box describing the nonlinearity. If this box is absent, then the input is scaled to match the output. If lossless encoding of alpha channels is desired, then this box shall only reference an Integer Table Lookup, or shall not be present at all. The output of this step is the final alpha channel residual sample values Pi.

Step B9 merges the final alpha base sample values Hi and the final alpha residual sample values Pi to form the intermediate alpha output values Fi by

$F_i := (H_i + P_i - 2^{R_b+8-1}) \operatorname{umod} 2^{R_b+8}$ if the Lf flag of the Alpha Output Conversion box is set, or

$F_i := H_i + P_i - 2^{R_b+8-1}$ if the Lf flag of the Alpha Output Conversion box is zero.

The residual alpha channel box is optional. If it does not exist, the implied value of Pi shall be $2^{R_b+8-1}$, where the value of Rb is taken from the Output Conversion box within the Alpha Merging Specification box.
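The merging rule of step B9 can be sketched in Python as follows. This is only an illustration under stated assumptions, not a normative implementation: the function names `merge_alpha` and `implied_residual` are hypothetical, and `umod` is read here as the non-negative remainder applied to the whole sum, which is an assumption about how the operator is defined elsewhere in the ISO/IEC 18477 family.

```python
def implied_residual(rb):
    """Value of Pi implied when no residual alpha data exists: 2^(Rb+8-1)."""
    return 1 << (rb + 8 - 1)


def merge_alpha(h, p, rb, lf):
    """Merge a base sample h and a residual sample p into F (step B9).

    rb is the Rb field and lf the Lf flag of the Output Conversion box.
    """
    if not 0 <= rb <= 8:
        raise ValueError("Rb shall be in the range 0..8")
    f = h + p - (1 << (rb + 8 - 1))
    if lf:
        # Assumed meaning of umod: wrap into [0, 2^(Rb+8)).
        # Python's % with a positive modulus never returns a negative value.
        f %= 1 << (rb + 8)
    return f
```

With the implied residual, the offset cancels and the base sample passes through unchanged, for example `merge_alpha(100, implied_residual(0), 0, 0)` gives 100.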

The sample intermediate alpha values Fi are further processed according to an algorithm defined by the Output conversion box. This conversion is either a scale to unit range, a pseudo-logarithmic map from Annex D of ISO/IEC 18477-7 or a generic non-linear mapping defined by a Parametric Curve box. See Annex B for details. The output of this process is the final alpha channel. This output conversion process transforms the sample from its bit precision to a number in the range [0..1].

Further processing of the reconstructed alpha channel is outside the scope of this Recommendation | International Standard. Typically, however, the foreground image is multiplied by the reconstructed alpha value on a pixel-by-pixel basis, unless it is already premultiplied, and is then added to the background image multiplied by (1-α), where α is the same reconstructed alpha sample value. Optionally, the shade color M is removed from the background image B before merging foreground and background. Whether the foreground image is premultiplied, and whether shade removal is desired, is indicated by the Alpha Channel Composition Box within the Alpha Merging Specification box. This box is specified in Annex B. Recommended image composition algorithms are listed in Table A-1, where A denotes the foreground image, B the background image and C the composed image.

| Composition | Applies to |
|---|---|
| C = α×A + (1−α)×B | non-premultiplied content |
| C = A + (1−α)×B | premultiplied content |
| C = A + (1−α)×(B−M) | premultiplied content with shade removal |

Table A-1: Image Composition Algorithms

NOTE – Premultiplication reduces the algorithmic complexity of the reconstruction process by computing the product of the image and the alpha channel already at the encoder side. It replaces transparent image regions with black color. If this is undesirable, the black color can be replaced by any other color M at encoder side that is then removed by the compositing process by subtracting it from the background.
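The three composition rules of Table A-1 can be sketched per sample in Python. This is an informal illustration; the function name `compose` and its keyword arguments are hypothetical, samples are assumed to be real numbers, and rejecting a matte color for non-premultiplied content is a choice made here because Table A-1 lists shade removal only for premultiplied content.

```python
def compose(a, b, alpha, premultiplied=False, matte=None):
    """Composite foreground sample a over background sample b.

    alpha is the reconstructed opacity in [0, 1]; matte is the color M to
    remove from the background, or None when no shade removal is wanted.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha shall lie in the unit range")
    if not premultiplied:
        if matte is not None:
            raise ValueError("Table A-1 lists shade removal only for premultiplied content")
        return alpha * a + (1.0 - alpha) * b      # C = alpha*A + (1-alpha)*B
    if matte is None:
        return a + (1.0 - alpha) * b              # C = A + (1-alpha)*B
    return a + (1.0 - alpha) * (b - matte)        # C = A + (1-alpha)*(B-M)
```

For multi-component images the same function would be applied to each component with the same alpha value.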


Annex B

Boxes


B.1 Introduction

This Annex selects and refines a subset of the boxes defined in ISO/IEC 18477-3 for the purpose of representing alpha channels. It lists those boxes of ISO/IEC 18477-3 that are required for this Recommendation | International Standard. All other boxes are optional and their interpretation is outside the scope of this Recommendation | International Standard. Other parts of ISO/IEC 18477 or other standards may define their meaning, and decoders conforming to this standard may ignore them.

Table B-1 lists the boxes required in this Recommendation | International standard that are paired with ISO/IEC 18477-3. Some of the boxes require additional specifications that are listed in subsequent clauses of this Annex.

| Box Name | Box Type | Box layout and box structure defined in subclause of ISO/IEC 18477-3 | Further definitions in subclause of this Recommendation \| International Standard |
|---|---|---|---|
| File Type Box | 0x66747970 ("ftyp") | B.8 | |
| Legacy Data Checksum Box | 0x4C43484B ("LCHK") | B.7 | |
| Alpha Codestream Box | 0x414C4641 ("ALFA") | B.XX | |
| Alpha Refinement Box | 0x4146494E ("AFIN") | B.XX | |
| Residual Alpha Codestream Box | 0x41524553 ("ARES") | B.XX | |
| Residual Alpha Refinement Box | 0x41525246 ("ARRF") | B.XX | |
| Alpha Merging Specification Box | 0x41535043 ("ASPC") | B.XX | |
| Refinement Specification Box | 0x52535043 ("RSPC") | B.15 | |
| Parametric Curve Box | 0x43555256 ("CURV") | B.4 | |
| Integer Table Lookup Box | 0x544F4E45 ("TONE") | B.2 | |
| Output Conversion Box | 0x4F434F4E ("OCON") | B.14 | B.2 |
| Base Non-linear Point Transformation Specification Box | 0x4C505453 ("LPTS") | B.16 | B.3 |
| Residual Non-linear Point Transformation Specification Box | 0x51505453 ("QPTS") | B.16 | B.4 |
| Base DCT Specification Box | 0x4C444354 ("LDCT") | B.18 | B.5 |
| Residual DCT Specification Box | 0x52444354 ("RDCT") | B.18 | B.6 |
| Alpha Channel Composition Box | 0x414D554C ("AMUL") | B.XX | |

thor, 18/11/14: These boxes need to be defined in 18477-3.
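Each box type above is the 32-bit value obtained by reading its four ASCII characters most significant byte first. A short Python sketch of that mapping follows; the helper names `box_type` and `box_name` are hypothetical and only serve to check hexadecimal values against their four-character codes.

```python
def box_type(fourcc):
    """Return the 32-bit box type for a four-character ASCII code."""
    raw = fourcc.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a box type consists of exactly four ASCII characters")
    return int.from_bytes(raw, "big")


def box_name(value):
    """Return the four-character ASCII code for a 32-bit box type."""
    return value.to_bytes(4, "big").decode("ascii")
```

For example, `hex(box_type("ALFA"))` gives `0x414c4641` and `box_name(0x41535043)` gives `ASPC`. Running every row of a box table through such a check is a cheap way to catch transcription errors in the hexadecimal column.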

B.2 Output Conversion Box

This mandatory box defines the conversion process from the result of the base and residual opacity data merging process to the final opacity samples. It describes the final merging process and thereby step B10 of the algorithm described in subclause A.1. The purpose of this box is to specify the transformation that maps the decoded sample ranges to the unit range [0..1]. Specifically, the box parameters shall be selected such that the reconstructed sample values are always within this interval.

This box is already defined in subclause B.14 of ISO/IEC 18477-3, though its application to this Recommendation | International Standard further constrains the value of its fields.

This box shall never appear top level in the file, but it shall be a subbox of the Alpha Merging Specification Box defined in Annex B of ISO/IEC 18477-3. Exactly one Output Conversion Box shall appear in the Alpha Merging Specification Box.

Table B-2 constrains the parameters of the Output Conversion box as applied in this Recommendation | International Standard.

| Parameter | Constraints within this Recommendation \| International Standard | Meaning |
|---|---|---|
| Rb | 0..8 | Number of additional bits available for opacity data due to the Alpha Residual Codestream. The bit precision of the reconstructed opacity data shall be computed as 8+Rb. The value of this field shall be 8 if Oc is 1. |
| Lf | 0..1 | This field indicates whether lossless/near-lossless or lossy coding of opacity data is intended. If this field is zero, lossy coding is intended and the decoder may pick any DCT implementation as long as it follows the constraints of ITU Recommendation T.83 \| ISO/IEC 10918-2. If this field is one, implementations shall follow the DCT operation as selected by the Base and Residual DCT Specification Box, and as specified in Annex E of ISO/IEC 18477-8. Furthermore, merging requires modulo arithmetic. |
| Oc | 0..1 | Pseudo-logarithmic output enable flag. If this flag is set, the half-logarithmic map defined in Annex D of ISO/IEC 18477-7 shall be enabled. This step is performed after clipping the data to range, if enabled by the Ce flag, but before applying the output transformation, if enabled by the Ol flag. If this flag is reset, the output of the clipping stage is the input to the non-linear transformation directly. If this flag is set, the value of Rb shall be 8 and the Alpha Codestream Box shall be present; otherwise the value of Rb shall be in the range [0,8]. |
| Ce | 0..1 | This field indicates whether the output of the merging process, i.e. the sum of base and residual image, shall be clipped to the range $[0, 2^{R_b}-1]$ before processing the data any further. If the Ce flag and the Oc flag are both enabled, clipping is applied before conversion to floating point. |
| Ol | 0..1 | This field indicates whether an output lookup or point transformation is required. If enabled, the output transformation is specified by the to0 through to3 fields. If disabled, no further transformation is performed and the output of the clipping step and/or half-logarithmic map is already the opacity information. The output transformation selected by the Ol field is the final step of the output formation and is applied after clipping and conversion to floating point. |
| to0 | 0 | If Ol is one, this field defines the output table for the opacity data. |
| to1 | 0 | Unused, shall be zero. |

thor, 18/11/14: This should also go into -3.



Table B-2: Parameter Constraints for the Output Conversion Box.
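The constraints of Table B-2 that tie fields together can be checked mechanically. The following Python sketch is a hedged illustration only; the function name `check_output_conversion`, its argument names and the message texts are hypothetical, and it covers only the range constraints and the Oc-related rules stated in the table.

```python
def check_output_conversion(rb, lf, oc, ce, ol, alpha_codestream_present):
    """Return a list of violated Table B-2 constraints (empty if none)."""
    problems = []
    if not 0 <= rb <= 8:
        problems.append("Rb shall be in the range 0..8")
    for name, flag in (("Lf", lf), ("Oc", oc), ("Ce", ce), ("Ol", ol)):
        if flag not in (0, 1):
            problems.append(name + " shall be 0 or 1")
    if oc == 1:
        # The pseudo-logarithmic output requires the full 16-bit precision
        # and a base alpha codestream.
        if rb != 8:
            problems.append("Rb shall be 8 if Oc is 1")
        if not alpha_codestream_present:
            problems.append("the Alpha Codestream Box shall be present if Oc is 1")
    return problems
```

A decoder might run such a check once after parsing the box and refuse the alpha channel if the list is non-empty; whether to reject or to fall back is an application decision not covered here.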


B.3 Base Non-linear Point Transformation Specification Box

This box defines the non-linear point transformation between the samples Oi as reconstructed from the Alpha Codestream box and the input Hi of the merging process with the residual alpha data. It thus defines step B4 in the decoder description in Annex A. Its box layout and box structure is given by the Non-Linear Point Transformation Specification box, defined in subclause B.18 of ISO/IEC 18477-3. This box shall only be present if the Alpha Codestream box is present, and it shall only exist as a subbox of the Alpha Merging Specification box.

Additional constraints apply, however: If the Lf flag of the Output Conversion box is set, then the tdi values of the box shall only reference Integer Table Lookup boxes. References to Floating Point Table Lookup boxes or Parametric Curve boxes shall not be used in this case. If Lf is zero, no such constraints apply. The non-linear point transformation itself is given by the process specified in Annex C of ISO/IEC 18477-3. It requires four additional parameters, the input range Rw,Re and the output range Rt,Rf. The two value pairs shall be given as follows:

$R_w = 8 + R_h$, $R_e = 0$

$R_t = 8 + R_b$, $R_f = 0$

The value Rh is the number of refinement scans in the legacy decoding path and is found in the Refinement Specification box as subbox of the Alpha Merging Specification box. The Refinement Specification box is defined in subclause B.15 of ISO/IEC 18477-3. If the Refinement Specification box is absent, the inferred value of Rh is 0. The value Rb is found in the Output Conversion box, where Rb+Po defines the total output precision of the image.

If this box is not present but the Alpha Codestream Box is, the sample values v reconstructed from the Alpha Codestream Box shall be scaled to output values w by

$w = v \times 2^{R_t - R_w}$ if $R_w \le R_t$

$w = v / 2^{R_w - R_t}$ if $R_w > R_t$
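The default scaling used in the absence of this box can be sketched as follows. This is a minimal Python illustration with a hypothetical function name `scale_base`; it takes Rw = 8 + Rh and Rt = 8 + Rb as given above, and it uses true division for the down-scaling case because the rounding behaviour is not stated in this subclause.

```python
def scale_base(v, rh, rb):
    """Scale a base alpha sample v from Rw = 8+Rh bits to Rt = 8+Rb bits."""
    rw = 8 + rh   # input range parameter
    rt = 8 + rb   # output range parameter
    if rw <= rt:
        return v * (1 << (rt - rw))
    # Rounding is not specified here; true division keeps the fraction.
    return v / (1 << (rw - rt))
```

For an 8-bit base codestream without refinement scans (Rh = 0) and Rb = 4, the sample 255 maps to 4080.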

The type of this box shall be 0x4C505453, ASCII encoding of "LPTS". The box structure and layout does not deviate from that in subclause B.16 of ISO/IEC 18477-3.

B.4 Residual Non-linear Point Transformation Specification Box

This box defines the non-linear point transformation between the output Ri of the residual DCT process and the input Pi of the merging process with base opacity data. It implements step B7 of Figure A-1 in Annex A. The box structure and layout is already defined in subclause B.17 of ISO/IEC 18477-3, though its purpose is refined here and additional constraints apply. At most one Residual Non-linear Point Transformation Specification box shall exist as a sub-box of the Alpha Merging Specification Box. This box shall only be present if the Residual Alpha Codestream box is present.

If the Lf flag of the Output Conversion box as subbox of the Alpha Merging Specification box is one, the tdi values shall only reference Integer Table Lookup boxes. References to Floating Point Table Lookup boxes or Parametric Curve boxes shall not be used in this case. If Lf is zero, only Parametric Curve boxes shall be referenced by the Residual Non-linear Point Transformation Specification box; Integer or Floating Point Table Lookup boxes shall not be used in this case.

If this box is not present, input values v as reconstructed from the Residual Alpha codestream shall be scaled to output values w by

$w = v \times 2^{R_b + P_o - R_r - P}$ if $R_r + P \le R_b + P_o$

$w = v / 2^{R_r + P - R_b - P_o}$ if $R_r + P > R_b + P_o$

where Rr is the number of refinement scans in the residual decoding path and is found in the Refinement Specification box defined in subclause B.15 of ISO/IEC 18477-3 and Rb is the number of excess integer bits defined by the Output Conversion box specified in subclause B.2. If the Refinement Specification box is absent, the inferred value of Rr is 0. P is the frame precision of the codestream, as recorded in the frame header of the codestream in the Residual Alpha Codestream box.

The non-linear Point Transformation as specified in Annex C of ISO/IEC 18477-3 requires two additional input parameter pairs, namely Rw,Re and Rt,Rf. They shall be computed as follows:

$R_w = P + R_r$, $R_e = 0$

$R_t = 8 + R_b$, $R_f = 0$

Parameters P, Rr, Rb are as above.
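The parameter computation and the default scaling of this subclause can be sketched together in Python. The names `residual_nlt_ranges` and `scale_residual` are hypothetical. Two assumptions are made and should be checked against ISO/IEC 18477-3: the scaling exponent is read as the difference between the output precision Rb+Po and the input precision Rr+P, and Po defaults to 8 because Rt is given as 8+Rb.

```python
def residual_nlt_ranges(p, rr, rb):
    """Return the parameter pairs ((Rw, Re), (Rt, Rf)) for the residual path."""
    return (p + rr, 0), (8 + rb, 0)


def scale_residual(v, p, rr, rb, po=8):
    """Default scaling of a residual sample v when the box is absent.

    po = 8 is an assumption; the source only states that Rb+Po is the
    total output precision.
    """
    in_bits = rr + p
    out_bits = rb + po
    if in_bits <= out_bits:
        return v * (1 << (out_bits - in_bits))
    # Rounding is not specified in this subclause; keep the fraction.
    return v / (1 << (in_bits - out_bits))
```

For a 12-bit residual codestream with two refinement scans and Rb = 4, `residual_nlt_ranges(12, 2, 4)` gives `((14, 0), (12, 0)).`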


NOTE – The constraints and the Rw,Re and Rt,Rf parameters of the Residual Non-linear Point Transformation Specification box in this Recommendation | International Standard differ slightly from the constraints and definitions in ISO/IEC 18477-6 and ISO/IEC 18477-7.

B.5 Base DCT Specification Box

This box defines the DCT operation in the legacy decoding path. It shall be present as a subbox of the Alpha Merging Specification box if and only if the Lf flag of the Output Conversion box in the Alpha Merging Specification box is one and the Alpha Codestream box exists. This box shall never appear at top-level of the file.

Lossless and near-lossless decoding requires a fully specified, bit-precise DCT, two of which are specified in Annex E of ISO/IEC 18477-8. It defines the operation of the B1c box in the functional diagram of Annex A. This box uses the layout and structure of the DCT Specification box defined in subclause B.18 of ISO/IEC 18477-3, but refines its parameters.

The box selects between two possible DCT implementations: The IDCT is a fully invertible integer-to-integer transformation of a relatively high implementation complexity. It allows lossless coding even without a residual codestream. The FDCT is a fixed-point approximation of the DCT that is only invertible up to a small error. However, since the FDCT is fully specified, the error at decoder side can be predicted precisely and can be corrected by an additional residual scan over the data. The implementation complexity of the FDCT is much lower than that of the IDCT.

The type of the Base DCT Specification box shall be 0x4C444354, ASCII encoding of "LDCT".

Table B-3 constrains the parameters and parameter sizes; Table B-4 specifies the encoding of the dct parameter.

| Parameter | Constraints within this Recommendation \| International Standard | Meaning |
|---|---|---|
| dct | 0, 2 | Selects the DCT that shall be used to reconstruct the legacy image. See Table B-4 for the encoding. |
| Ns | 0 | Reserved for ISO/IEC purposes |

Table B-3: Parameter Constraints for the Base DCT Specification Box

| Value | Transformation to be used |
|---|---|
| 0 | The FDCT shall be used |
| 2 | The IDCT shall be used |
| All other values | Reserved for ISO/IEC purposes |

Table B-4: Encoding of the dct Parameter of the Base DCT Specification Box

B.6 Residual DCT Specification Box

This box defines the DCT operation and the noise shaping in the residual decoding path and thus selects the DCT transformation to be used for step B5c of the functional diagram in Annex A. It shall be present as a subbox of the Alpha Merging Specification box if and only if the Lf flag of the Output Conversion box in the Alpha Merging Specification box is one and the Residual Alpha Codestream box exists. This box shall never appear at top-level of the file.

Its structure and layout is defined by the DCT Specification box as specified in subclause B.18 of ISO/IEC 18477-3, though the purpose of the box and its parameters are refined in this subclause.

To enable lossless coding, the residual coding pass corrects for residual errors of the legacy decoding pass and is thus fully invertible. Two choices exist for the DCT: either the invertible IDCT or the DCT bypass process, which replaces the DCT by a simple level shift. The latter uses a slightly modified entropy coding defined in Annex D of ISO/IEC 18477-8. For near-lossless coding, the DCT or DCT-bypassed signal is quantized, causing quantization artifacts if the DCT is bypassed and image residuals are quantized directly in the spatial domain. An optional Noise Shaping process specified in Annex G of ISO/IEC 18477-8 avoids this problem.

The type of the Residual DCT Specification box shall be 0x52444354, ASCII encoding of "RDCT".

Table B-5 refines the definition of the parameters and parameter sizes. Table B-6 describes the encoding of the dct Parameter.

| Parameter | Parameter constraints within this Recommendation \| International Standard | Meaning |
|---|---|---|
| dct | 2, 3 | Selects the DCT that shall be used to reconstruct the legacy image. See Table B-6 for the encoding. |
| RNs | 0, 1 | If zero, Noise Shaping is disabled. If one, Noise Shaping as specified in Annex G of ISO/IEC 18477-8 shall be enabled. The value of this parameter shall be zero if dct is 2. All other values are reserved for ISO/IEC purposes. |

Table B-5: Parameter Constraints for the Residual DCT Specification Box

| Value | Transformation to be used |
|---|---|
| 2 | The IDCT shall be used |
| 3 | The DCT bypass process shall be used. |
| All other values | Reserved for ISO/IEC purposes |

Table B-6: Encoding of the dct Parameter of the Residual DCT Specification Box
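The parameter rules of Tables B-5 and B-6 can be expressed as a small validation routine. The Python sketch below is illustrative only; `residual_dct_mode` and the mode strings are hypothetical names, and treating reserved values as errors is a choice made for this example rather than behaviour required by the text.

```python
RESIDUAL_DCT = {2: "IDCT", 3: "DCT bypass"}


def residual_dct_mode(dct, noise_shaping):
    """Validate the Residual DCT Specification box parameters.

    Returns (transformation name, noise shaping enabled).
    """
    if dct not in RESIDUAL_DCT:
        raise ValueError("dct value is reserved for ISO/IEC purposes")
    if noise_shaping not in (0, 1):
        raise ValueError("noise shaping value is reserved for ISO/IEC purposes")
    if dct == 2 and noise_shaping != 0:
        # Noise shaping only applies when the DCT is bypassed.
        raise ValueError("noise shaping shall be zero if dct is 2")
    return RESIDUAL_DCT[dct], bool(noise_shaping)
```

For example, `residual_dct_mode(3, 1)` selects the DCT bypass with noise shaping enabled, while `residual_dct_mode(2, 1)` is rejected.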

B.XX Alpha Merging Specification Box

This box is a superbox that encapsulates all boxes describing the process of generating the alpha channel from the Alpha Codestream Box, the Alpha Refinement Box, the Residual Alpha box and the Residual Alpha Refinement box. It does not contain the entropy coded data itself. Otherwise, this box is similar in content and purpose to the Merging Specification box.

At most one Alpha Merging Specification box shall be present in the codestream, and this box shall be present if and only if the JPEG XT file includes opacity information. The contents of this superbox are specified in additional parts of the ISO/IEC 18477 family of standards. This box shall be represented by a single JPEG XT marker segment in front of the first SOF marker of the 18477-1 codestream.

The type of this box shall be 0x41535043, ASCII encoding of "ASPC".

Figure B-XX describes the organization of this box. Subboxes of this box are defined in Annex B of this Recommendation | International Standard.


thor, 18/11/14,

Move this into -3.

Boxes (see subclause A.5)

Figure B-XX: Organization of the Alpha Merging Specification Box

B.XX Alpha Codestream Box

This box encapsulates entropy coded segments forming the base layer of the opacity data of an image. The payload of this box shall form a codestream conforming to ITU Recommendation T.81 | ISO/IEC 10918-1 confined to the baseline Huffman, extended Huffman or progressive Huffman coding modes, i.e. coding modes permitted by ISO/IEC 18477-1. The sample values reconstructed from this data then undergo the Alpha merging process as specified in Annex A of ISO/IEC 18477-9, and as described by the boxes within the Alpha Merging Specification Box, see subclause B.XX. This box shall be present if the Output Conversion box in the Alpha Merging Specification box is present.

The type of this box shall be 0x414C4641, ASCII encoding of "ALFA". There shall be at most one stream of opacity data after concatenating all JPEG extension marker payload data belonging to this box type.

NOTE – This box is an optional box. This box may be missing because an image does not include opacity information.

The structure of this box is defined in Figure B-XX, the parameters and sizes in Table B-2.

Entropy Coded Data

Figure B-XX: Organization of the Alpha Codestream Box

| Parameter | Size (bits) | Value | Meaning |
|---|---|---|---|
| Data | Varies | Varies | Entropy coded data segment of variable length. |

Table B-XX: Alpha Codestream Box, Parameters and Sizes

B.XX Alpha Refinement Data Box

This box encapsulates entropy coded data segments created by a single refinement scan over the opacity image plane. If the number of additional alpha refinement bits, the Rh parameter of the Refinement Specification Box (see subclause B.15 of ISO/IEC 18477-3), is non-zero, alpha refinement data encapsulated in Alpha Refinement Data Boxes shall be present. This box shall only exist if the Alpha Codestream box exists.

Data contained in this box extends the precision of the alpha channel coefficients described by the Alpha Codestream box by additional least significant bits. The concatenated payload data of all Alpha Refinement Data Boxes forms the input to the refinement decoding process specified in Annex D of ISO/IEC 18477-6.

The type of this box shall be 0x4146494e, ASCII encoding of "AFIN". The structure of the payload data of this box is defined in Figure B-3, the parameters and sizes in Table B-4.

Entropy Coded Data

Figure B-XX: Organization of the Alpha Refinement Data Box



Table B-XX: Alpha Refinement Data Box, Parameters and Sizes


thor, 18/11/14,

This needs to go to 18477-3.

thor, 18/11/14,

Move this to -3.

B.XX Residual Alpha Data Box

This box encapsulates entropy coded segments extending the precision of opacity data in the spatial domain. It is comparable to the Residual Codestream box, which includes entropy coded data to extend the precision of image data rather than opacity data. The process for merging the base layer of the opacity data defined by the Alpha Codestream box and the Alpha Refinement box with the extension data contained in this box and the Residual Alpha Refinement box is defined by the Alpha Merging Specification box and specified in Annex A of ISO/IEC 18477-9.

The Residual Alpha Data box shall only appear at the top-level of the codestream and not as a sub-box of a superbox. Data contained in this box defines residual data that extends the legacy image encoded in the legacy JPEG stream to a HDR image. The sample precision of the samples within the codestream shall be either 8 or 12. While subsampling factors may be different from the base image, the number of components in the residual codestream and the image dimensions shall be identical to those signaled in the legacy codestream.

The type of this box shall be 0x41524553, ASCII encoding of "ARES". There shall be at most one stream of residual data after concatenating all JPEG extension marker payload data belonging to this box type.

NOTE – Unlike refinement coding, residual coding merges all data into one single box. This box may, however, extend over several JPEG XT marker segments.

The structure of this box is defined in Figure B-XX, the parameters and sizes in Table B-XX.

Entropy Coded Data, conforming to ITU Recommendation T.81 | ISO/IEC 10918-1

Figure B-XX: Organization of the Residual Alpha Data Box



Table B-XX: Residual Alpha Data Box, Parameters and Sizes

B.XX Residual Alpha Refinement Box

This box encapsulates entropy coded data segments created by a single refinement scan over residual opacity data. Its purpose is similar to the Residual Refinement Data box, which applies to residual refinement image data. The entropy coded data contained in this box extends the bit-precision of the extension images in the DCT domain, where the refinement coding process itself is specified in Annex D of ISO/IEC 18477-6.

The Residual Alpha Refinement Box shall only appear at the top-level of the codestream and not as a sub-box of a superbox. If the number of additional residual refinement bits, the Rr-Parameter of the Refinement Specification subbox of the Alpha Merging Specification box is non-zero, residual refinement data encapsulated in Residual Alpha Refinement boxes shall be present.

Data contained in this box extends the precision of the residual alpha coefficients by one additional least significant bit. The concatenated payload data of all Residual Alpha Refinement boxes forms the input to the refinement decoding process specified in Annex D of ISO/IEC 18477-6. The data in this box encodes a single scan over the image and starts with the “Tables Misc” section followed by the “Start of Scan” marker, followed by entropy coded data, see Figure B.2 of ITU Recommendation T.81 | ISO/IEC 10918-1 in Annex B. Since a Residual Alpha Refinement box does not include a frame header, the image dimensions, subsampling factors and number of components shall be inferred from the frame header of the residual alpha codestream as contained in the Residual Alpha Data box.

If residual opacity data is refined by more than one bit, multiple Residual Alpha Refinement boxes shall be written, one per scan over the data. The Box Instance Number En, see Figure A-1, then disambiguates between the scans, i.e. the first scan has a value of En=1, the next one En=2 and so on. See Annex A of ISO/IEC 18477-3 for details on the box syntax. The number of Residual Alpha Refinement boxes in an ISO/IEC 18477-3 compliant file shall be given by the Rr parameter of the Refinement Specification box in the Alpha Merging Specification superbox.
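The ordering rule for refinement scans can be illustrated with a minimal Python sketch. The function name and the (En, payload) tuple representation are hypothetical and not part of this Recommendation | International Standard; the sketch further assumes that each scan is carried in exactly one box, so that the instance numbers are exactly 1..Rr.

```python
def order_refinement_scans(boxes, rr):
    """Return the refinement scan payloads ordered by Box Instance Number.

    boxes: list of (en, payload) tuples, one per Residual Alpha Refinement box
           (hypothetical representation, for illustration only).
    rr:    the Rr parameter of the Refinement Specification box.
    """
    instance_numbers = sorted(en for en, _ in boxes)
    # Assumption: one box per scan, numbered En = 1, 2, ..., Rr.
    if instance_numbers != list(range(1, rr + 1)):
        raise ValueError(
            "expected instance numbers 1..%d, found %r" % (rr, instance_numbers)
        )
    # The first scan has En=1, the next one En=2, and so on.
    return [payload for _, payload in sorted(boxes, key=lambda box: box[0])]
```

A decoder following this sketch would feed the returned payloads, in order, to the refinement decoding process.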

The type of this box shall be 0x41525246, ASCII encoding of "ARRF". The structure of the payload data of this box is defined in Figure B-xx, the parameters and sizes in Table B-xx.
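As an informal cross-check of the type code given above, the following Python sketch converts between a four-character ASCII code and its 32-bit value. The helper names are hypothetical, and the most-significant-byte-first ordering is an assumption that matches the hexadecimal value quoted in the text.

```python
def box_type_from_ascii(code):
    """Convert a four-character ASCII code to its 32-bit box type value."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a box type consists of exactly four ASCII characters")
    # Assumption: the first character occupies the most significant byte.
    return int.from_bytes(raw, "big")


def box_type_to_ascii(value):
    """Convert a 32-bit box type value back to its four-character code."""
    return value.to_bytes(4, "big").decode("ascii")
```

For example, box_type_from_ascii("ARRF") evaluates to 0x41525246.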



Entropy Coded Data, single scan over the data, following the syntax of ITU Recommendation T.81 | ISO/IEC 10918-1

Figure B-xx: Organization of the Residual Alpha Refinement Box



Table B-xx: Residual Alpha Refinement Box, Parameters and Sizes

B.XX Alpha Channel Composition Box

This box provides information on how a background image and the decoded image should be composed into a final image by means of an additional opacity channel. Decoding of opacity data is specified in ISO/IEC 18477-9 and more information on coding of opacity data is provided there. This box is a mandatory subbox of the Alpha Merging Specification box; it shall not exist at the top level of the file.

This box selects one out of three possible compositing methods that decoders are suggested to follow if image compositing is intended. The methods are defined in Table A-1 of Annex A. Some of the compositing options require information about a matte background color of the foreground image that is to be removed by the compositing process. This matte color is represented by the M0 through M2 parameters for three-component images, or by M0 alone for grey-scale images. These parameters are encoded as Rb+8 bit integers for IDR images, where Rb is the parameter indicating the image bit-precision as found in the Output Conversion box within the Merging Specification box (not within the Alpha Merging Specification box). If the foreground image is represented by floating point samples, the Mi parameters are encoded as 16-bit integers and have to be converted to floating point by means of the half-logarithmic map specified in Annex D of ISO/IEC 18477-7, or equivalently, are encoded as IEC 60559 (IEEE 754) half-float numbers.
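The two interpretations of a matte parameter can be sketched in Python as follows. This is an informal illustration only: the function name and its arguments are hypothetical, and the floating point branch relies on the equivalence with IEC 60559 half-float numbers stated above rather than on the half-logarithmic map of ISO/IEC 18477-7 itself.

```python
import struct


def decode_matte_component(raw, floating_point, rb=0):
    """Interpret one 16-bit Mi parameter (hypothetical helper).

    raw:            the 16-bit value as read from the box.
    floating_point: True if the foreground image uses floating point samples.
    rb:             the Rb parameter of the Output Conversion box (integer case).
    """
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("Mi parameters are 16 bits wide")
    if floating_point:
        # Reinterpret the 16 bits as an IEC 60559 (IEEE 754) half-float.
        return struct.unpack(">e", raw.to_bytes(2, "big"))[0]
    # Integer case: the value is an (Rb+8)-bit integer.
    if raw >= 1 << (rb + 8):
        raise ValueError("matte value exceeds the %d-bit range" % (rb + 8))
    return raw
```

For instance, the bit pattern 0x3C00 is the half-float representation of 1.0.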

NOTE – The process of compositing a background and foreground image is outside the scope of this Recommendation|International Standard. Nevertheless, if such compositing is intended, the information provided by the box defined in this subclause should be used as an indicator to steer the process.

The type of this box shall be 0x414D554C, ASCII encoding of "AMUL". The structure of the payload data of this box is defined in Figure B-xx, the parameters and sizes in Table B-xx.

Acmp0 Acmp1 Acmp2 Acmp3 M0 M1 M2 M3

Figure B-xx: Layout of the Alpha Channel Composition Box

Parameter Size (in bits) Value Meaning

Acmp0 4 1..3 Alpha Channel Compositing Method. See Table B-xx for its encoding.

Acmp1 4 0 Shall be zero. Reserved for ITU/ISO purposes.



M0 16 0..2^(Rb+8)−1 First component of the Matte Background if required by Acmp0



M1 16 0..2^(Rb+8)−1 Second component of the Matte Background if required by Acmp0

M2 16 0..2^(Rb+8)−1 Third component of the Matte Background if required by Acmp0

M3 16 0 Reserved for ITU/ISO purposes.

Table B-xx: Parameters of the Alpha Channel Composition Box
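The payload layout can be illustrated by a minimal Python parsing sketch. Several points are assumptions rather than statements of this Recommendation | International Standard: the Acmp2 and Acmp3 fields are taken to be 4 bits wide like Acmp0 and Acmp1 (their table rows are not available here), the fields are taken to be packed most significant bits first, and multi-byte values are taken to be big-endian. The class and function names are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class AlphaComposition(NamedTuple):
    acmp: Tuple[int, int, int, int]   # Acmp0..Acmp3
    matte: Tuple[int, int, int, int]  # M0..M3, raw 16-bit values


def parse_alpha_composition_payload(payload):
    """Split the payload of an Alpha Channel Composition box into its fields."""
    # Assumed size: four 4-bit fields plus four 16-bit fields = 10 bytes.
    if len(payload) != 10:
        raise ValueError("expected a 10 byte payload, got %d" % len(payload))
    b0, b1, m0, m1, m2, m3 = struct.unpack(">BBHHHH", payload)
    # Assumed packing: Acmp0 in the high nibble of the first byte.
    acmp = (b0 >> 4, b0 & 0x0F, b1 >> 4, b1 & 0x0F)
    return AlphaComposition(acmp, (m0, m1, m2, m3))
```

The raw M0..M3 values returned here still need to be interpreted as integers or half-float numbers depending on the sample type of the foreground image.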

In the following table, A = foreground image data, B = background image, C = composed output image, M = matte color from the Mi parameters of Table B-xx, and α = the reconstructed opacity channel.

Value    Compositing Method           Notes

0        C = A                        Opaque, no compositing.

1        C = α×A + (1−α)×B            Regular Alpha Compositing.

2        C = A + (1−α)×B              Alpha Compositing with pre-multiplied foreground image.

3        C = A + (1−α)×(B−M)          Alpha Compositing with pre-multiplied foreground image and matte color removal.

All other values invalid              Reserved for ITU/ISO purposes.

Table B-xx: Encoding of the Acmp0 parameter of the Alpha Channel Composition Box
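The compositing formulas of the table above can be written as a short, informal Python sketch operating on a single sample. The function name and argument order are hypothetical; the opacity α is assumed to be already scaled to the range [0..1], and all samples are assumed to be floating point values of the same scale.

```python
def composite(method, a, b, alpha, matte=0.0):
    """Compose one sample according to the Acmp0 value (illustrative only).

    a:     foreground sample (A)
    b:     background sample (B)
    alpha: reconstructed opacity in [0..1]
    matte: matte color component (M), only used by method 3
    """
    if method == 0:
        return a                                 # opaque, no compositing
    if method == 1:
        return alpha * a + (1 - alpha) * b       # regular alpha compositing
    if method == 2:
        return a + (1 - alpha) * b               # pre-multiplied foreground
    if method == 3:
        return a + (1 - alpha) * (b - matte)     # pre-multiplied, matte removed
    raise ValueError("reserved compositing method %r" % (method,))
```

Method 3 reduces to method 2 when the matte color is zero, which is consistent with the formulas in the table.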


Annex C

Profiles


C.1 Introduction

This Annex defines two profiles that limit the choices of coding parameters offered by this Recommendation | International Standard. The base profile only supports lossy encoding of alpha channels with an 8-bit opacity resolution whereas the full profile supports all features of the standard.

C.2 Base Profile

This profile allows lossy encoding of opacity information with a resolution of 8 bits. The profile indicator of this profile, i.e. the entry CLi in the compatibility list of the File Type box, shall be 0x61636270, the ASCII encoding of “acbp”. The constraints for this profile are listed in Table E-1.

Box Name Constraint

Alpha Channel Codestream Box None (ISO/IEC 10918-1, 8 bit resolution, baseline Huffman, extended Huffman, progressive Huffman only)

Alpha Refinement Box Boxes of this type shall not be present.

Residual Alpha Codestream Box Boxes of this type shall not be present.

Residual Alpha Refinement Box Boxes of this type shall not be present.

Refinement Specification Box Boxes of this type shall not be present.

Parametric Curve Box A single parametric curve box shall be present, defining the output transformation. It is referenced by the Output Conversion box.

Integer Table Lookup Box Boxes of this type shall not be present.

Output Conversion Box This box shall be present with the following parameters:

Rb=0, indicating an 8-bit resolution.

Lf=0, indicating lossy coding.

Oc=0, i.e. integer samples are coded.

Ce=1, the output is clipped to range [0..255]

Ol=1, an additional transformation is applied.

to0 shall point to a single parametric curve box of type t=5, e=1 requesting a transformation by a linear ramp with parameters P1=0 and P2=1/255. This scales the opacity information to range [0..1] as required by the compositing algorithm.

Base Non-linear Point Transformation Specification Box This box shall not be present, i.e. sample values Hi that form the input of the merging process are identical to the sample values Oi reconstructed from the contents of the Alpha Codestream box.

Residual Non-linear Point Transformation Specification Box Boxes of this type shall not be present.


Base DCT Specification Box This box shall not be present since Lf=0.

Residual DCT Specification Box This box shall not be present.

Alpha Channel Composition Box This box shall be present as a sub-box of the Alpha Merging Specification box.

Table E-1: Constraints of the Base Profile
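The output conversion required by the base profile can be sketched informally in Python. The function name is hypothetical; the sketch covers only the specific parameter set of Table E-1 (clipping to [0..255] followed by the linear ramp with P1=0 and P2=1/255) and does not attempt to model the general parametric curve of ISO/IEC 18477-3.

```python
def base_profile_opacity(sample):
    """Map a reconstructed 8-bit opacity sample to the range [0..1].

    Illustrative only: applies the clipping requested by Ce=1 and then the
    scaling by 1/255 that results from the linear ramp with P1=0, P2=1/255.
    """
    clipped = min(max(sample, 0), 255)  # Ce=1: clip to [0..255]
    return clipped / 255                # scale to [0..1]
```

A fully opaque sample value of 255 thus maps to an opacity of 1.0, and a fully transparent value of 0 maps to 0.0.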

C.3 Full Profile

This profile does not define any constraints and allows the use of all features defined in this Recommendation | International Standard. The profile indicator of this profile, i.e. the entry CLi in the compatibility list of the File Type box, shall be 0x61636670, the ASCII encoding of “acfp”.

Annex D

References

