INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC 1/SC 29/WG 11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC 1/SC 29/WG 11 N7935
Jan 2006, Bangkok, Thailand
Source: Audio Subgroup
Title: Text of ISO/IEC 14496-3:2005/PDAM 5, BSAC extensions
Status: Approved
ISO/IEC JTC 1/SC 29 N
Date: 2006-01-22
ISO/IEC 14496-3:200X/PDAM 5
ISO/IEC JTC 1/SC 29/WG 11
Secretariat:
Information technology — Coding of audio-visual objects — Part 3: Audio, AMENDMENT 5: BSAC extensions
Warning
This document is not an ISO International Standard. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an International Standard.
Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation.
Copyright notice
This ISO document is a working draft or committee draft and is copyright-protected by ISO. While the reproduction of working drafts or committee drafts in any form for use by participants in the ISO standards development process is permitted without prior permission from ISO, neither this document nor any extract from it may be reproduced, stored or transmitted in any form for any other purpose without prior written permission from ISO.
Requests for permission to reproduce this document for the purpose of selling it should be addressed as shown below or to ISO's member body in the country of the requester:
[Indicate the full address, telephone number, fax number, telex number, and electronic mail address, as appropriate, of the Copyright Manager of the ISO member body responsible for the secretariat of the TC or SC within the framework of which the working document has been prepared.]
Reproduction for sales purposes may be subject to royalty payments or a licensing agreement.
Violators may be prosecuted.
© ISO/IEC 2006 — All rights reserved III
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Amendment 5 to ISO/IEC 14496-3:200X was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of Audio, Picture, Multimedia and Hypermedia Information.
This Amendment specifies the normative syntax of the integration of ER-BSAC and SBR and the decoding process. An informative encoder description is also given.
Information technology — Coding of audio-visual objects — Part 3: Audio, AMENDMENT 5: BSAC extensions
Amendment Subpart 1
1 SBR object type
In Part 3: Audio, Subpart 1, in subclause 1.5.1.2.6 SBR object type, replace Table 1.2A with the table below:
Table 1.2A – Audio object types that can be combined with the SBR Tool
Audio Object Type                     Combination with SBR Tool permitted   Object Type ID
Null                                                                        0
AAC main                              X                                     1
AAC LC                                X                                     2
AAC SSR                               X                                     3
AAC LTP                               X                                     4
SBR                                                                         5
AAC Scalable                          X                                     6
TwinVQ                                                                      7
CELP                                                                        8
HVXC                                                                        9
(Reserved)                                                                  10
(Reserved)                                                                  11
TTSI                                                                        12
Main synthetic                                                              13
Wavetable synthesis                                                         14
General MIDI                                                                15
Algorithmic Synthesis and Audio FX                                          16
ER AAC LC                             X                                     17
(Reserved)                                                                  18
ER AAC LTP                            X                                     19
ER AAC scalable                       X                                     20
ER TwinVQ                                                                   21
ER BSAC                               X                                     22
ER AAC LD                                                                   23
ER CELP                                                                     24
ER HVXC                                                                     25
ER HILN                                                                     26
ER Parametric                                                               27
SSC                                                                         28
PS                                                                          29
(Reserved)                                                                  30
(Reserved)                                                                  31
In Part 3: Audio, Subpart 1, in subclause 1.6.2.1 AudioSpecificConfig, replace table 1.8 with the table below:
“
Table 1.8 – Syntax of AudioSpecificConfig()
Syntax                                                              No. of bits   Mnemonic
AudioSpecificConfig ()
{
    audioObjectType = GetAudioObjectType();
    samplingFrequencyIndex;                                         4             uimsbf
    if ( samplingFrequencyIndex == 0xf )
        samplingFrequency;                                          24            uimsbf
    channelConfiguration;                                           4             uimsbf
    sbrPresentFlag = -1;
    psPresentFlag = -1;
    if ( audioObjectType == 5 ||
         audioObjectType == 29 ) {
        extensionAudioObjectType = audioObjectType;
        sbrPresentFlag = 1;
        if ( audioObjectType == 29 ) {
            psPresentFlag = 1;
        }
        extensionSamplingFrequencyIndex;                            4             uimsbf
        if ( extensionSamplingFrequencyIndex == 0xf )
            extensionSamplingFrequency;                             24            uimsbf
        audioObjectType = GetAudioObjectType();
        if ( audioObjectType == 22 )
            extensionChannelConfiguration;                          4             uimsbf
    }
    else {
        extensionAudioObjectType = 0;
    }
    if ( audioObjectType == 1 || audioObjectType == 2 ||
         audioObjectType == 3 || audioObjectType == 4 ||
         audioObjectType == 6 || audioObjectType == 7 )
        GASpecificConfig();
    if ( audioObjectType == 8 )
        CelpSpecificConfig();
    if ( audioObjectType == 9 )
        HvxcSpecificConfig();
    if ( audioObjectType == 12 )
        TTSSpecificConfig();
    if ( audioObjectType == 13 || audioObjectType == 14 ||
         audioObjectType == 15 || audioObjectType == 16 )
        StructuredAudioSpecificConfig();
    /* the following Objects are Amendment 1 Objects */
    if ( audioObjectType == 17 || audioObjectType == 19 ||
         audioObjectType == 20 || audioObjectType == 21 ||
         audioObjectType == 22 || audioObjectType == 23 )
        GASpecificConfig();
    if ( audioObjectType == 24 )
        ErrorResilientCelpSpecificConfig();
    if ( audioObjectType == 25 )
        ErrorResilientHvxcSpecificConfig();
    if ( audioObjectType == 26 || audioObjectType == 27 )
        ParametricSpecificConfig();
    if ( audioObjectType == 17 || audioObjectType == 19 ||
         audioObjectType == 20 || audioObjectType == 21 ||
         audioObjectType == 22 || audioObjectType == 23 ||
         audioObjectType == 24 || audioObjectType == 25 ||
         audioObjectType == 26 || audioObjectType == 27 ) {
        epConfig;                                                   2             uimsbf
        if ( epConfig == 2 || epConfig == 3 ) {
            ErrorProtectionSpecificConfig();
        }
        if ( epConfig == 3 ) {
            directMapping;                                          1             uimsbf
            if ( ! directMapping ) {
                /* tbd */
            }
        }
    }
    if ( extensionAudioObjectType != 5 && bits_to_decode() >= 16 ) {
        syncExtensionType;                                          11            bslbf
        if ( syncExtensionType == 0x2b7 ) {
            extensionAudioObjectType = GetAudioObjectType();
            if ( extensionAudioObjectType == 5 ) {
                sbrPresentFlag;                                     1             uimsbf
                if ( sbrPresentFlag == 1 ) {
                    extensionSamplingFrequencyIndex;                4             uimsbf
                    if ( extensionSamplingFrequencyIndex == 0xf ) {
                        extensionSamplingFrequency;                 24            uimsbf
                    }
                    if ( bits_to_decode() >= 12 ) {
                        syncExtensionType;
                        if ( syncExtensionType == 0x548 ) {
                            psPresentFlag;
                        }
                    }
                }
            }
            if ( extensionAudioObjectType == 22 ) {
                sbrPresentFlag;                                     1             uimsbf
                if ( sbrPresentFlag == 1 ) {
                    extensionSamplingFrequencyIndex;                4             uimsbf
                    if ( extensionSamplingFrequencyIndex == 0xf )
                        extensionSamplingFrequency;                 24            uimsbf
                }
                extensionChannelConfiguration;                      4             uimsbf
            }
        }
    }
}
”
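The first part of the AudioSpecificConfig() syntax, up to and including the hierarchical signaling branch, can be illustrated with a minimal Python sketch. It is not a conforming parser: the BitReader class and the function names are hypothetical, the layout assumed for GetAudioObjectType() (a 5-bit field with a 6-bit escape when the value is 31) is not defined in this amendment, and the object-type-specific configs and the backward compatible extension at the end are omitted.

```python
class BitReader:
    """Hypothetical helper: reads unsigned integers MSB-first (uimsbf)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def get_audio_object_type(br: BitReader) -> int:
    # Assumption: 5-bit field; the value 31 escapes to 32 + a 6-bit field.
    aot = br.read(5)
    if aot == 31:
        aot = 32 + br.read(6)
    return aot


def parse_asc_prefix(br: BitReader) -> dict:
    """Parse AudioSpecificConfig() up to the end of the hierarchical branch."""
    cfg = {"audioObjectType": get_audio_object_type(br)}
    cfg["samplingFrequencyIndex"] = br.read(4)
    if cfg["samplingFrequencyIndex"] == 0xF:
        cfg["samplingFrequency"] = br.read(24)
    cfg["channelConfiguration"] = br.read(4)
    cfg["sbrPresentFlag"] = -1
    cfg["psPresentFlag"] = -1
    if cfg["audioObjectType"] in (5, 29):
        # Hierarchical signaling: SBR (5) or PS (29) wraps the underlying AOT.
        cfg["extensionAudioObjectType"] = cfg["audioObjectType"]
        cfg["sbrPresentFlag"] = 1
        if cfg["audioObjectType"] == 29:
            cfg["psPresentFlag"] = 1
        cfg["extensionSamplingFrequencyIndex"] = br.read(4)
        if cfg["extensionSamplingFrequencyIndex"] == 0xF:
            cfg["extensionSamplingFrequency"] = br.read(24)
        cfg["audioObjectType"] = get_audio_object_type(br)
        if cfg["audioObjectType"] == 22:
            # ER BSAC: channel configuration of the multichannel extension.
            cfg["extensionChannelConfiguration"] = br.read(4)
    else:
        cfg["extensionAudioObjectType"] = 0
    return cfg


# SBR AOT (5), index 6, 2 channels, extension index 3, ER BSAC (22), ext. config 6
example = parse_asc_prefix(BitReader(bytes([0x2B, 0x11, 0xD9, 0x80])))
```

In this example the config starts with the SBR AOT, so the second audioObjectType read (22, ER BSAC) is the underlying object type and extensionChannelConfiguration is read immediately after it.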
In Part 3: Audio, Subpart 1, after 1.6.3.14 psPresentFlag, add:
“
1.6.3.15 extensionChannelConfiguration
A four-bit field indicating the channel configuration of the BSAC multichannel extension, according to Table 1.11.
”
In Part 3: Audio, Subpart 1, after 1.6.3.14 psPresentFlag, add:
1.1.1 Implicit and Explicit signaling for BSAC extension payloads
The implicit signaling method for BSAC extension payloads is similar to that of the SBR and PS tools. A BSAC decoder that can decode the BSAC extension payloads checks whether the bsac_raw_data_block() contains an extension type related to the SBR tool, such as ‘EXT_BSAC_SBR_DATA’ or ‘EXT_BSAC_SBR_DATA_CRC’. If one is detected and the SBR tool is operated in dual-rate mode, the sampling frequency is updated. In addition, such a decoder checks whether the bsac_raw_data_block() contains an extension type related to the BSAC channel extension, such as ‘EXT_BSAC_CHANNEL’. If one is detected, the number of channels from the AudioSpecificConfig() for the BSAC Audio Object Type is updated according to the ‘channel_configuration_index’ of each extended_bsac_base_element().
When explicit signaling is used, implicit signaling shall not occur. Two different types of explicit signaling are available:
1. Explicit Signaling Method 1: hierarchical signaling
If the first audioObjectType (AOT) signaled is the SBR AOT, a second audio object type is signaled which indicates the underlying audio object type. This signaling method is not backward compatible. If the second audioObjectType is the ER BSAC AOT, the extensionChannelConfiguration indicates the total number of channels in the bsac_raw_data_block().
2. Explicit Signaling Method 2: backward compatible signaling
The extensionAudioObjectType is signaled at the end of the AudioSpecificConfig(). If the extensionAudioObjectType is the ER BSAC AOT, the extensionChannelConfiguration indicates the total number of channels in the bsac_raw_data_block(). This method shall only be used in systems that convey the length of the AudioSpecificConfig(). Hence, it shall not be used for LATM with audioMuxVersion==0.
Table 1.23 describes the decoder behavior with SBR and BSAC channel extension signaling.
Table 1.23 – SBR and BSAC channel extension signaling and Corresponding Decoder Behavior
Bitstream characteristics                                                                                         Decoder behavior
extensionAudioObjectType           sbrPresentFlag   extensionChannelConfiguration       raw_data_block   BSAC decoder   Extended BSAC decoder
!= ER_BSAC (Implicit Signaling)    -1 (Note 1)      Not available                       BSAC             Play BSAC      Play BSAC
!= ER_BSAC (Implicit Signaling)    -1 (Note 1)      Not available                       BSAC+SBR         Play BSAC      Play at least BSAC, should play BSAC+SBR
!= ER_BSAC (Implicit Signaling)    -1 (Note 1)      Not available                       BSAC+MC          Play BSAC      Play at least BSAC, should play BSAC+MC
!= ER_BSAC (Implicit Signaling)    -1 (Note 1)      Not available                       BSAC+SBR+MC      Play BSAC      Play at least BSAC, should play BSAC+SBR+MC
== ER_BSAC (Explicit Signaling)    0 (Note 2)       == channelConfiguration (Note 4)    BSAC             Play BSAC      Play BSAC
== ER_BSAC (Explicit Signaling)    0 (Note 2)       != channelConfiguration             BSAC+MC          Play BSAC      Play BSAC+MC
== ER_BSAC (Explicit Signaling)    1 (Note 3)       == channelConfiguration (Note 4)    BSAC+SBR         Play BSAC      Play BSAC+SBR
== ER_BSAC (Explicit Signaling)    1 (Note 3)       != channelConfiguration             BSAC+SBR+MC      Play BSAC      Play BSAC+SBR+MC
Note 1: Implicit signaling, check payload in order to determine output sampling frequency, or assume the presence of SBR data in the payload, giving an output sampling frequency of twice the sampling frequency indicated by samplingFrequency in the AudioSpecificConfig() (unless the down sampled SBR Tool is operated, or twice the sampling frequency indicated by samplingFrequency exceeds the maximum allowed output sampling frequency of the current level, in which case the output sampling frequency is the same as indicated by samplingFrequency).
Note 2: Explicitly signals that there is no SBR data, hence no implicit signaling is present, and the output sampling frequency is given by samplingFrequency in the AudioSpecificConfig().
Note 3: Output sampling frequency is the extensionSamplingFrequency in AudioSpecificConfig().
Note 4: Explicitly signals that there is no BSAC channel extension data, and the number of output channels is given by channelConfiguration in the AudioSpecificConfig().
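The explicit signaling rows of Table 1.23 and the sampling frequency rules of Notes 1 to 3 can be summarized in a short Python sketch. The function and parameter names are hypothetical, and for the implicit case the sketch takes the "assume the presence of SBR data" option of Note 1 rather than inspecting the payload.

```python
def explicit_signaling_payload(sbr_present_flag: int,
                               extension_channel_configuration: int,
                               channel_configuration: int) -> str:
    """What an extended BSAC decoder plays when extensionAudioObjectType
    is ER BSAC (explicit signaling rows of Table 1.23)."""
    parts = ["BSAC"]
    if sbr_present_flag == 1:
        parts.append("SBR")
    if extension_channel_configuration != channel_configuration:
        parts.append("MC")  # BSAC channel extension data is present
    return "+".join(parts)


def output_sampling_frequency(sbr_present_flag: int,
                              sampling_frequency: int,
                              extension_sampling_frequency=None,
                              max_output_frequency=None,
                              downsampled_sbr: bool = False) -> int:
    """Output sampling frequency following Notes 1 to 3."""
    if sbr_present_flag == 0:
        # Note 2: explicitly no SBR data.
        return sampling_frequency
    if sbr_present_flag == 1:
        # Note 3: explicit SBR signaling.
        return extension_sampling_frequency
    # Note 1 (sbrPresentFlag == -1): assume SBR data is present.
    doubled = 2 * sampling_frequency
    too_high = max_output_frequency is not None and doubled > max_output_frequency
    if downsampled_sbr or too_high:
        return sampling_frequency
    return doubled
```

For example, with implicit signaling and samplingFrequency equal to 24000 Hz, the sketch returns 48000 Hz unless the down sampled SBR tool is in use or the level limits the output to a lower rate.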
Amendment Subpart 4
2 Scope
2.1 Introduction
This International Standard describes the integration of ER-BSAC and SBR, which is capable of fine grain scalable reproduction with bandwidth extension. In the preferred modes of operating the integration of ER-BSAC and SBR, the high frequency sound components can be either an ER-BSAC enhancement layer signal or a synthesized SBR signal.
2.2 Technical overview
The basic structure of the integration of ER-BSAC and SBR is shown in Figure 1.
Figure 1 – Overview of the integration of ER-BSAC and SBR decoder
[Figure 1 block labels: Coded Audio Bitstream; Bitstream Payload Deformatter; BSAC Core Decoder; Bitstream Parser; Huffman Decoding & Dequantization; Envelope Adjuster; HF Generator; HF Overlap; Analysis QMF Bank; Synthesis QMF Bank; Output PCM Samples]
3 Syntax
3.1 Extension Syntax
Replace the definition of bsac_raw_data_block() in ISO/IEC 14496-3:2005 Part 3: Audio, Subpart 4, Subclause 4.4.2.6 Payloads for the audio object type ER BSAC, Table 4.33
Table 4.33 – Syntax of bsac_raw_data_block()
Syntax                                                              No. of bits   Mnemonic
bsac_raw_data_block()
{
    bsac_base_element();
    layer = slayer_size;
    while ( data_available() && layer < (top_layer+slayer_size) ) {
        bsac_layer_element(layer);
        layer++;
    }
    byte_alignment();
    if ( data_available() ) {
        zero_code                                                   32            bslbf
        sync_word                                                   4             bslbf
        while ( data_available() ) {
            extension_type                                          4             bslbf
            switch ( extension_type ) {
            case EXT_BSAC_CHANNEL :
                extended_bsac_raw_data_block();
            case EXT_BSAC_SBR_DATA :
                extended_bsac_sbr_data(nch, 0);
            case EXT_BSAC_SBR_DATA_CRC :
                extended_bsac_sbr_data(nch, 1);
            case EXT_BSAC_CHANNEL_SBR :
                extended_bsac_raw_data_block();
                extended_bsac_sbr_data(nch, 0);
            case EXT_BSAC_CHANNEL_SBR_CRC :
                extended_bsac_raw_data_block();
                extended_bsac_sbr_data(nch, 1);
            }
        }
    }
}
In Part 3: Audio, Subpart 4, under Bitstream elements in subclause 4.5.2.6.2.1 Definitions, after bsac_raw_data_block, add the following:
sync_word a four-bit code that identifies the start of the extended part. The bit string is ‘1111’.
extension_type a four-bit code that identifies the extension type according to Table 1.
Table 1 – BSAC extension_type
Symbol                      Value of extension_type   Purpose
EXT_BSAC_CHANNEL            ‘1111’                    BSAC channel extension
EXT_BSAC_SBR_DATA           ‘0000’                    BSAC SBR enhancement
EXT_BSAC_SBR_DATA_CRC       ‘0001’                    SBR enhancement with CRC
EXT_BSAC_CHANNEL_SBR        ‘1110’                    BSAC channel extension with SBR
EXT_BSAC_CHANNEL_SBR_CRC    ‘1101’                    BSAC channel extension with SBR_CRC
RESERVED                    ‘0010’ ~ ‘1100’           reserved
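As an illustration of how a decoder might act on the four-bit extension_type, the following Python sketch maps each code from the extension_type table to the elements that follow it in bsac_raw_data_block(). The ExtensionInfo type and the classify_extension function are hypothetical names used only for this example.

```python
from typing import NamedTuple


class ExtensionInfo(NamedTuple):
    symbol: str
    has_channel_extension: bool  # extended_bsac_raw_data_block() follows
    has_sbr: bool                # extended_bsac_sbr_data() follows
    has_crc: bool                # SBR data is read with crc_flag = 1


EXTENSION_TYPES = {
    0b1111: ExtensionInfo("EXT_BSAC_CHANNEL", True, False, False),
    0b0000: ExtensionInfo("EXT_BSAC_SBR_DATA", False, True, False),
    0b0001: ExtensionInfo("EXT_BSAC_SBR_DATA_CRC", False, True, True),
    0b1110: ExtensionInfo("EXT_BSAC_CHANNEL_SBR", True, True, False),
    0b1101: ExtensionInfo("EXT_BSAC_CHANNEL_SBR_CRC", True, True, True),
}

RESERVED = ExtensionInfo("RESERVED", False, False, False)


def classify_extension(extension_type: int) -> ExtensionInfo:
    """Look up a 4-bit extension_type; codes 0010 to 1100 are reserved."""
    if not 0 <= extension_type <= 0xF:
        raise ValueError("extension_type is a four-bit code")
    return EXTENSION_TYPES.get(extension_type, RESERVED)
```

A table lookup keeps the reserved range explicit: any four-bit code that is not listed falls through to RESERVED instead of being treated as one of the defined extensions.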
3.2 Scalable SBR Syntax
Add the following subclause after 4.4.2.8 in Part 3: Audio, Subpart 4
3.2.1 SBR payloads for the audio object type ER BSAC
Table 2 – Syntax of extended_bsac_sbr_data ()
Syntax                                                              No. of bits      Mnemonic
extended_bsac_sbr_data(nch, crc_flag)
{
    num_sbr_bits = 0;
    cnt = count;                                                    4                uimsbf
    num_sbr_bits += 4;
    if ( cnt == 15 ) {
        cnt += esc_count - 1;                                       8                uimsbf
        num_sbr_bits += 8;
    }
    if ( crc_flag ) {
        bs_sbr_crc_bits;                                            10               uimsbf
        num_sbr_bits += 10;
    }
    num_sbr_bits += 1;
    if ( bs_header_flag )                                           1                uimsbf
        num_sbr_bits += sbr_header();
    num_sbr_bits += bsac_sbr_data(nch, bs_amp_res);
    num_align_bits = (8*cnt - num_sbr_bits) % 8;
    bs_fill_bits;                                                   num_align_bits   uimsbf
    return ((num_sbr_bits + num_align_bits) / 8)
}
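The length and alignment arithmetic in extended_bsac_sbr_data() can be checked with a small Python sketch. The function names are hypothetical; reading cnt as a length in bytes is an interpretation based on the term 8*cnt in the syntax above.

```python
def sbr_payload_count(count: int, esc_count: int = 0) -> int:
    """Combine the 4-bit count with the 8-bit esc_count escape."""
    cnt = count
    if cnt == 15:
        # esc_count is only transmitted when count has its maximum value.
        cnt += esc_count - 1
    return cnt


def num_align_bits(cnt: int, num_sbr_bits: int) -> int:
    """Number of bs_fill_bits, as given by (8*cnt - num_sbr_bits) % 8."""
    return (8 * cnt - num_sbr_bits) % 8


# count = 15 with esc_count = 10 gives cnt = 24; 19 bits read so far with
# cnt = 3 leaves 5 fill bits, so the element occupies (19 + 5) / 8 = 3 bytes.
```

With count below 15 the length is used as is, so small payloads need only the four-bit field.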
Table 3 – Syntax of bsac_sbr_data()
Syntax                                                              No. of bits   Mnemonic
bsac_sbr_data(nch, bs_amp_res)
{
    switch ( nch ) {
    case 1 :
        sbr_single_channel_element(bs_amp_res)
        break;
    case 2 :
        sbr_channel_pair_element(bs_amp_res)
        break;
    }
}
3.3 Scalable SBR Bitstream Element Definitions
Add the following subclause after 4.5.2.9 in Part 3: Audio, Subpart 4
3.3.1 SBR payloads for the audio object type ER BSAC
extended_bsac_sbr_data () Syntactic element that contains the SBR extension data for ER BSAC
count Initial length of extended bsac sbr data
esc_count Incremental length of extended bsac sbr data
bs_sbr_crc_bits Cyclic redundancy checksum for the SBR extension data. The CRC code is defined by the generator polynomial $G_{10}(x) = x^{10} + x^{9} + x^{5} + x^{4} + x + 1$ and the initial value for the CRC calculation is zero.
bs_header_flag Indicates if an SBR header is present
sbr_header () Syntactic element that contains the SBR header. See Table 4.56-Syntax of sbr_header()
bsac_sbr_data() Syntactic element that contains the SBR data for ER BSAC
bs_fill_bits Byte alignment bits
sbr_single_channel_element() Syntactic element that contains data for an SBR single channel element. See Table 4.58-Syntax of sbr_single_channel_element()
sbr_channel_pair_element() Syntactic element that contains data for an SBR channel pair element. See Table 4.59-Syntax of sbr_channel_pair_element()
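The 10-bit checksum carried in bs_sbr_crc_bits can be illustrated with a bitwise Python sketch of a CRC using the generator polynomial x^10 + x^9 + x^5 + x^4 + x + 1 and a zero initial value. The function names are hypothetical, and the sketch makes no assumption about which bits of the SBR extension data the checksum covers, since that is not stated here; the caller supplies the bit sequence.

```python
# Generator polynomial without the x^10 term:
# x^9 + x^5 + x^4 + x + 1 -> 0b1000110011
CRC10_POLY = 0x233


def crc10(bits) -> int:
    """Bitwise CRC over an iterable of 0/1 values, MSB first, initial value 0."""
    reg = 0
    for bit in bits:
        top = (reg >> 9) & 1
        reg = (reg << 1) & 0x3FF
        if top ^ bit:
            reg ^= CRC10_POLY
    return reg


def to_bits(value: int, width: int) -> list:
    """Expand an unsigned integer into a list of bits, MSB first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
checksum = crc10(message)
# Running the CRC over the message followed by its checksum leaves a zero register.
residue = crc10(message + to_bits(checksum, 10))
```

The zero residue is the usual way a receiver verifies such a checksum: it runs the same calculation over the protected bits plus the transmitted CRC and compares the result with zero.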
3.3.2 Decoding process
SBR elements are inserted into the bsac_raw_data_block() after the layer element data.
By combining the BSAC core coder and the SBR extension tool, fine grain scalability is supported with constant bandwidth for all layers. The start frequency of the SBR frequency range is set to the base_band frequency of the BSAC base layer, so SBR covers the high frequency regions beyond the base layer frequency range. If enhancement layers of the BSAC core are transmitted, the overlapped regions of the SBR range are replaced with core data.
4 Scalable SBR Tool
In Part 3: Audio, Subpart 4, subclause 4.6.18 SBR tool, after 4.6.18.8 Low power SBR tool, add the following subclause:
4.1 Scalable SBR tool
4.1.1 Introduction
This subclause describes the decoding process of scalable SBR integrated with ER BSAC.
The scalable SBR tool incorporates an additional module, HF-Overlap, for FGS with bandwidth scalability.
The decoding process of the scalable SBR tool conforms to that of the SBR tool, except for the additional HF-Overlap module.
The scalable SBR decoding block diagram is shown in Figure 2. It shows how the HF-Overlap module is connected into the SBR decoder.
Figure 2 – Overview of the integration of ER-BSAC and SBR decoder
[Figure 2 block labels: Coded Audio Bitstream; Bitstream Payload Deformatter; BSAC Core Decoder; Bitstream Parser; Huffman Decoding & Dequantization; Envelope Adjuster; HF Generator; HF Overlap; Analysis QMF Bank; Synthesis QMF Bank; Output PCM samples]
4.1.2 HF Overlap
In decoding, BSAC scalable bitstream decoding and SBR decoding are performed in parallel.
In the BSAC core decoding process, the layer_max_freq value, which represents the upper frequency of the highest decoded layer, is extracted from layer_end_index[layer]. The layer_max_freq value is passed to the SBR decoding process.
In the SBR decoding process, the subband index corresponding to layer_max_freq is searched for in the QMF domain. The subband index, denoted as core_max_band, is determined by
If core_max_band is greater than the HF patched subband, denoted as HFPatchStartBand, the overlapped region data of the SBR extension part are replaced with core data from the scalable BSAC part. This process is depicted in the flowchart of Figure 3.
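The replacement step of HF-Overlap can be sketched in Python as follows. This is an illustrative sketch only: the function name and the representation of QMF data as one list of samples per subband are assumptions, and core_max_band is taken as an input because its derivation from layer_max_freq is not reproduced here.

```python
def hf_overlap(sbr_bands, core_bands, hf_patch_start_band, core_max_band):
    """Replace SBR-generated subbands that the BSAC core also covers.

    sbr_bands and core_bands hold one list of QMF samples per subband.
    Subbands hf_patch_start_band .. core_max_band - 1 are taken from the
    core signal; all other subbands keep the SBR-generated data.
    """
    out = [list(band) for band in sbr_bands]
    if core_max_band > hf_patch_start_band:
        for b in range(hf_patch_start_band, core_max_band):
            out[b] = list(core_bands[b])
    return out


# Four subbands with two samples each; the core reaches up to subband 3.
sbr = [[0.0, 0.0], [0.0, 0.0], [0.5, 0.5], [0.6, 0.6]]
core = [[1.0, 1.0], [1.1, 1.1], [1.2, 1.2], [0.0, 0.0]]
mixed = hf_overlap(sbr, core, hf_patch_start_band=2, core_max_band=3)
```

In the example only subband 2 lies in the overlapped region, so it is taken from the core while subband 3 keeps the SBR-generated data. When core_max_band does not exceed HFPatchStartBand, for instance when only the base layer is decoded, the SBR data pass through unchanged.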
Figure 3 – Flowchart of HF-Overlap
[Figure 3 flowchart steps: start; test CoreMaxSubband > HFPatchStartBand; for(b=HFPatchStartBand;b<CoreMaxSubband;b++); inner for loop; done]
1.A Annex A (informative) Encoder tools
In Part 3 : Audio, Subpart 4, subclause Annex B (informative) Encoder tools, add the following chapters:
1.A.1 Informative SBR Encoder for ER BSAC Description
The encoding block diagram is described in Figure 4.
Figure 4 – Overview of the integration of ER-BSAC and SBR Encoder
To support FGS using the SBR object, the frequency range of SBR is restricted. The SBR start frequency (represented as bs_start_freq) is aligned with the upper frequency of the BSAC base layer frequency range (represented as base_band). The frequency range of the BSAC core is scalable from the base layer to the top layer.
An example of the frequency coverage of the BSAC base layer and enhancement layers, and its relation to the frequency coverage of the SBR extension, is shown in Figure 5.
Figure 5 – Bandwidth coverage of BSAC core and SBR
[Figure 4 block labels: Input; Down Sampler; SBR Encoder; BSAC Encoder; Bitstream Payload Formatter]
[Figure 5 labels: frequency axis f(kHz); BSAC base layer; BSAC enhance layers; SBR extension]