![Page 1: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/1.jpg)
MPEG-4 Audio Synchronization
Masayuki Nishiguchi, Shusuke Takahashi, Akira Inoue
Oct 22, 2014
Sony Corporation
![Page 2: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/2.jpg)
Agenda
• Use case
• Synchronization Scheme
• Audio Feature Extraction tool (Normative)
• Audio Feature Similarity Calculation Tool (Informative)
• Performance evaluation
• Conclusion
![Page 3: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/3.jpg)
Audio Synchronization Use case of “Second Screen” Application
[Figure: use-case diagram. Labels: Transmitter; Main Media Stream; Main Device (1st screen); Audio Signal of Main Media Stream; Internet (IP); Sub Media Stream; Audio Feature of Main Media Stream; Receiver; Sub Device (2nd screen)]
• foreign language audio tracks
• audio commentary
• closed caption information
• audio/visual contents recorded from various angles
• high quality audio/visual contents
• advertisement
Synchronization using Audio Feature!
![Page 4: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/4.jpg)
Synchronization Scheme
[Figure: synchronization scheme. Labels, grouped by device:
Transmitter: Main Media Stream; Audio Signal; Audio Feature Extraction; Audio Feature of Main Media Stream; Sub Media Stream; MUX; Multiplexed Data Stream (Sub Media Stream, Audio Feature of Main Media Stream).
Main Device (1st screen): DeMUX & Playback; Main Media Presentation; Audio Playback device (such as speaker); Audio Signal of Main Media Stream; noise.
Receiver (Sub Device, 2nd screen): Audio Recording Device (such as microphone); Audio Feature Extraction; Audio Feature of Main Media Stream (Extracted); DeMUX; Audio Feature of Main Media Stream (Transmitted); Sub Media Stream; Audio Feature Similarity Calculation; Synchronization Information; Timing Adjustment & Playback; Sub Media Presentation.
Annotation between Main Media Presentation and Sub Media Presentation: (Synchronous in time)]
![Page 5: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/5.jpg)
Audio Feature Extraction tool (Normative)
![Page 6: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/6.jpg)
Block Diagram of Audio Feature Extraction tool
[Figure: block diagram. Input: Audio Signal (fs = 8 kHz). Blocks: Framing, Pre Emphasis, Filter bank, Auto-correlation, Integration, Peak Detection, Audio Feature Frame Rate Conversion. Output: Audio Feature]
![Page 7: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/7.jpg)
Overall Signal Flow
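[Figure: overall signal flow]
• Input signal (after the pre-emphasis filter $1 - 0.97 \cdot z^{-1}$)
• Band split signal: the audio signal is split into 5 equally spaced frequency bands in the log frequency domain
• Auto-correlation of each band (annotation on one band in the figure: "Confidence Measure for this band is less than threshold")
• Integrated auto-correlation: the band auto-correlations are integrated together into a single auto-correlation
• The integrated auto-correlation is converted into a 128-bit feature vector (prominent peak: 1, otherwise: 0)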
![Page 8: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/8.jpg)
![Page 9: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/9.jpg)
Framing
[Figure: Input frames → Feature Extraction → Audio Feature Frame Rate Conversion → Output frames]
Input frame length: 32 ms (256 samples), Hamming window
Input frame interval: 8 ms (64 samples)
Output frame interval: 8 ms or 32 ms (audio_sync_feature_time_resolution)
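The framing parameters above can be sketched as follows. This is a minimal illustration, not the normative procedure: the symmetric Hamming window formula, the function names `hamming` and `split_into_frames`, and the choice to drop trailing samples that do not fill a frame are all assumptions.

```python
import math

FRAME_LEN = 256  # 32 ms at fs = 8 kHz
FRAME_HOP = 64   # 8 ms at fs = 8 kHz


def hamming(length):
    """Symmetric Hamming window (assumed form: 0.54 - 0.46 * cos)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def split_into_frames(signal, frame_len=FRAME_LEN, hop=FRAME_HOP):
    """Return windowed frames; samples that do not fill a last frame are dropped."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


# One second of audio at 8 kHz yields (8000 - 256) // 64 + 1 frames.
one_second = [1.0] * 8000
frames = split_into_frames(one_second)
print(len(frames), len(frames[0]))
```

With a 64-sample hop and 256-sample frames, consecutive frames overlap by 75 %.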
![Page 10: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/10.jpg)
![Page 11: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/11.jpg)
Filter Bank
For each audio frame, a pre-emphasis filter is applied to emphasize the high frequencies; band-pass filtering then splits the audio signal into 5 frequency bands that are equally spaced in the log-frequency domain.
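A rough sketch of these two steps is given below. The first-order difference form of the pre-emphasis filter with coefficient 0.97 is assumed from the signal-flow slide, and the band limits passed to `log_band_edges` (125 Hz and 4000 Hz) are purely hypothetical, since the slides do not state the band edges or the band-pass filter design.

```python
def pre_emphasis(frame, coeff=0.97):
    """Assumed form y[n] = x[n] - coeff * x[n-1], with x[-1] taken as 0."""
    out = []
    prev = 0.0
    for x in frame:
        out.append(x - coeff * prev)
        prev = x
    return out


def log_band_edges(f_low, f_high, num_bands=5):
    """Edges of num_bands bands that have equal width on a log-frequency axis."""
    ratio = (f_high / f_low) ** (1.0 / num_bands)
    return [f_low * ratio ** i for i in range(num_bands + 1)]


# Hypothetical limits: 125 Hz to 4000 Hz gives five one-octave bands.
edges = log_band_edges(125.0, 4000.0)
print([round(e, 1) for e in edges])
```

Equal spacing in the log domain means each band spans the same frequency ratio rather than the same number of hertz.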
![Page 12: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/12.jpg)
![Page 13: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/13.jpg)
Auto-correlation
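For each band, the auto-correlation is calculated using:
$$R_b(k) = \sum_{n} x_b(n) \cdot x_b(n+k), \quad 0 \le k < K, \quad 0 \le b < B$$
The auto-correlation is normalized using:
$$r_b(k) = \frac{R_b(k)}{R_b(0)}, \quad 0 \le k < K, \quad 0 \le b < B$$
For each frequency band $b$, a confidence measure is calculated based on the auto-correlation value.
[Equation: confidence measure of band $b$, involving a max operation; the expression is not recoverable from this transcript]
$N$: input frame length; $b$: index of frequency band; $k$: index of lag for the auto-correlation; $K$: order of the auto-correlation, set to 128; $n$: index of the input audio signal, running over the frame; $B$: number of frequency bands, set to 5; $x_b(n)$: signal of band $b$.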
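The per-band auto-correlation and its normalization can be sketched as below. The summation limits, the normalization by the zero-lag value, and the all-zero output for a silent band are assumptions; the confidence measure is not sketched because its exact definition is not available here.

```python
def autocorrelation(band_signal, max_lag=128):
    """R(k) = sum over n of x(n) * x(n + k), for lags k = 0 .. max_lag - 1."""
    n_samples = len(band_signal)
    return [sum(band_signal[n] * band_signal[n + k] for n in range(n_samples - k))
            for k in range(max_lag)]


def normalize(acf):
    """Divide by the zero-lag value; a silent band maps to all zeros (assumption)."""
    if acf[0] <= 0.0:
        return [0.0] * len(acf)
    return [value / acf[0] for value in acf]


# A 256-sample alternating signal is strongly periodic with period 2.
frame = [1.0, -1.0] * 128
r = normalize(autocorrelation(frame))
print(r[0], r[1], r[2])
```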
![Page 14: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/14.jpg)
![Page 15: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/15.jpg)
Integration
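The normalized auto-correlation function values derived from each sub-band are summed together into a single integrated auto-correlation function:
$$\bar{r}(k) = \frac{\sum_{b} w_b \cdot r_b(k)}{\sum_{b} w_b}, \quad 0 \le k < K$$
where $r_b(k)$ is the normalized auto-correlation of band $b$ at lag $k$, and the weight $w_b$ is defined from the confidence measure $c_b$ of band $b$ as follows:
$$w_b = \begin{cases} 0, & c_b < 0.3 \\ 1, & c_b \ge 0.3 \end{cases}$$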
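The integration step can be illustrated with a small sketch. It assumes a binary weight per band (0 when the band's confidence measure is below a threshold of 0.3, 1 otherwise) and a weighted average over the bands; the function name `integrate` and the all-zero result when every band is rejected are hypothetical choices.

```python
def integrate(normalized_acfs, confidences, threshold=0.3):
    """Average the band auto-correlations whose confidence reaches the threshold."""
    weights = [1.0 if c >= threshold else 0.0 for c in confidences]
    total = sum(weights)
    num_lags = len(normalized_acfs[0])
    if total == 0.0:
        # No band is trusted: return an all-zero function (assumption).
        return [0.0] * num_lags
    return [sum(w * acf[k] for w, acf in zip(weights, normalized_acfs)) / total
            for k in range(num_lags)]


# Three bands, two lags each; the second band is below the threshold and is skipped.
acfs = [[1.0, 0.5], [1.0, 0.1], [1.0, 0.9]]
print(integrate(acfs, confidences=[0.8, 0.2, 0.4]))
```

Dropping low-confidence bands keeps noise-dominated bands from diluting the periodicity peaks of the reliable ones.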
![Page 16: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/16.jpg)
![Page 17: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/17.jpg)
Peak Detection
The integrated auto-correlation function is converted into a 128-bit feature vector; each bit position corresponds to a lag of the auto-correlation function.
[Figure: integrated auto-correlation plotted against lag, with the resulting bit sequence below]
… 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 …
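A possible reading of this conversion is sketched below. The slides do not define what makes a peak "prominent", so the rule used here (a local maximum that exceeds a hypothetical `min_height`) is only an assumption, as is skipping the first and last lag.

```python
def detect_peaks(integrated_acf, min_height=0.0):
    """Return one bit per lag: 1 at an assumed 'prominent' local maximum, else 0."""
    bits = [0] * len(integrated_acf)
    for k in range(1, len(integrated_acf) - 1):
        value = integrated_acf[k]
        is_local_max = value > integrated_acf[k - 1] and value >= integrated_acf[k + 1]
        if is_local_max and value > min_height:
            bits[k] = 1
    return bits


# Short 8-lag example; the real feature vector has 128 lags.
acf = [1.0, 0.2, 0.6, 0.1, -0.3, -0.1, -0.4, 0.5]
print(detect_peaks(acf))
```

The local maximum at lag 5 is negative, so it is not marked under this rule.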
![Page 18: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/18.jpg)
Audio Feature Similarity Calculation Tool (Informative)
![Page 19: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/19.jpg)
Block Diagram of Audio Feature Similarity Calculation Tool (Informative)
[Figure: block diagram. Inputs: Audio Feature Sequence #1, Audio Feature Sequence #2. Blocks applied to each sequence: Audio Feature Frame Rate Conversion, Audio Feature Block Extraction. Common block: Block Similarity Calculation. Output: Time difference between audio signals]
![Page 20: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/20.jpg)
![Page 21: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/21.jpg)
Block Extraction
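The blocks are generated by concatenating consecutive audio features: each block consists of N consecutive features taken from Audio Feature sequence #1 or from Audio Feature sequence #2.
[Figure: Audio Feature sequence #1 and Audio Feature sequence #2, each divided into blocks of length N]
[Equations: block definitions for sequence #1 and sequence #2; the index ranges are not recoverable from this transcript]
Block Similarity Calculation is performed between two blocks of audio features.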
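Block extraction can be sketched as a sliding window over the feature sequence. The step of one feature between successive blocks and the flattening of each block into a single bit list are assumptions; the slides only state that consecutive audio features are concatenated.

```python
def extract_blocks(features, block_len, step=1):
    """Concatenate block_len consecutive feature vectors into one block per position."""
    blocks = []
    for start in range(0, len(features) - block_len + 1, step):
        block = []
        for feature in features[start:start + block_len]:
            block.extend(feature)
        blocks.append(block)
    return blocks


# Four toy 2-bit features (real features are 128 bits), blocks of N = 2 features.
features = [[0, 1], [1, 1], [0, 0], [1, 0]]
print(extract_blocks(features, block_len=2))
```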
![Page 22: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/22.jpg)
![Page 23: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/23.jpg)
Block Similarity Calculation
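Block Similarity between blocks $A$ and $B$ is calculated as follows:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Example
[Figure: bit patterns of blocks $A$ and $B$ along the time axis at time difference $\tau$, together with $A \cap B$ and $A \cup B$]
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{4}{10} = 0.4$$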
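The intersection-over-union similarity of two bit blocks can be sketched as follows. The bit patterns in the usage lines are made up to give 4 common bits out of 10 set bits, and returning 0.0 when both blocks are all zero is an assumption.

```python
def block_similarity(block_a, block_b):
    """Jaccard similarity: bits set in both blocks over bits set in either block."""
    intersection = sum(1 for a, b in zip(block_a, block_b) if a and b)
    union = sum(1 for a, b in zip(block_a, block_b) if a or b)
    return intersection / union if union else 0.0


# Hypothetical blocks: 4 bits set in both, 10 bits set in at least one.
block_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
block_b = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(block_similarity(block_a, block_b))
```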
![Page 24: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/24.jpg)
Time Difference Estimation
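[Figure: grid over block indices $i$ and $j$. Each box represents a block similarity $J(F_i, G_j)$; the summation for each time difference is performed along the arrows, giving Score($\tau_0$), Score($\tau_1$) and Score($\tau_2$). Axis annotations: $-\tau_0$, $N_f'$, $N_g' - \tau_2$, $N_g'$]
For each time difference $\tau$, a score is calculated by using the block similarity as follows:
$$\mathrm{Score}(\tau) = \frac{1}{\min(N_f', N_g' - \tau) - \max(-\tau, 0)} \sum_{i=\max(-\tau, 0)}^{\min(N_f', N_g' - \tau) - 1} J(F_i, G_{i+\tau})$$
where $F_i$ and $G_j$ denote the blocks of the two audio feature sequences, and $N_f'$ and $N_g'$ are taken to be their numbers of blocks.
The time difference which has the largest score is regarded as the time difference between the two audio feature sequences:
$$\hat{\tau} = \arg\max_{\tau} \mathrm{Score}(\tau)$$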
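The score-and-argmax search can be sketched as below. The sign convention (the time difference is the block index in the second sequence minus the block index in the first), the range of candidate time differences, and the helper names are assumptions made for illustration.

```python
def jaccard(block_a, block_b):
    """Bits set in both blocks over bits set in either block."""
    inter = sum(1 for a, b in zip(block_a, block_b) if a and b)
    union = sum(1 for a, b in zip(block_a, block_b) if a or b)
    return inter / union if union else 0.0


def estimate_time_difference(blocks_f, blocks_g):
    """Average the block similarity along each diagonal and return the best one."""
    n_f, n_g = len(blocks_f), len(blocks_g)
    best_tau, best_score = None, -1.0
    for tau in range(-(n_f - 1), n_g):
        start = max(-tau, 0)          # first block of f that overlaps g
        stop = min(n_f, n_g - tau)    # one past the last overlapping block of f
        total = sum(jaccard(blocks_f[i], blocks_g[i + tau]) for i in range(start, stop))
        score = total / (stop - start)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau, best_score


# Toy data: ten distinct 4-bit blocks; the first sequence is a cut-out starting at block 3.
blocks_g = [[((k + 1) >> bit) & 1 for bit in range(4)] for k in range(10)]
blocks_f = blocks_g[3:8]
print(estimate_time_difference(blocks_f, blocks_g))
```

Dividing by the number of overlapping blocks keeps short overlaps at the edges comparable with long ones; a practical implementation might also require a minimum overlap, which is not shown here.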
![Page 25: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/25.jpg)
Time Difference Estimation
Example
[Figure: example of block similarities over block indices $i$ and $j$; the delay axis is marked from -40 to 80]
![Page 26: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/26.jpg)
Performance evaluation
• The 1st screen content and an additive noise sound are captured at the 2nd screen. → Noise-contaminated 1st screen content files
• The line-outs of the 1st screen and the 2nd screen are captured as a single stereo wave file. → The time difference between the L-ch and the R-ch in the file is measured
![Page 27: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/27.jpg)
1st Screen content and additive noise files

The 1st Screen Content Files

| Filename of 1st screen content files | Description |
|---|---|
| 1st_betty | 5.1 down mix (according to ARIB STD-B32) version of CO_11_Betty3b_output |
| 1st_speech | 2 Speech (German Male, SQAM track 54) |
| 1st_music | 2 Music (Wind ensemble, SQAM track 67) |

Additive Noise Sound Files

| Filename of additive noise sound files | Description |
|---|---|
| File4 noise_pinknoise | |
| File5 noise_speech | Speech (English Female, SQAM track 49) |
| File6 noise_music | Music (Eddie Rabbitt, SQAM track 70) |
![Page 28: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/28.jpg)
Result
Time Difference between 1st Screen and 2nd Screen Line-Out Signals (sec), by signal level of the 1st screen content (dB)

| Filename of 1st screen content files | Filename of additive noise sound files | 0 | -6 | -12 | -18 | -24 | -30 | -36 |
|---|---|---|---|---|---|---|---|---|
| 1st_betty | noise_pinknoise | 0.003 | 0.003 | 0.007 | N/A | N/A | N/A | N/A |
| 1st_betty | noise_speech | 0.019 | -0.009 | -0.014 | -0.014 | N/A | N/A | N/A |
| 1st_betty | noise_music | -0.002 | 0.001 | 0.018 | 0.007 | N/A | N/A | N/A |
| 1st_speech | noise_pinknoise | -0.004 | 0.007 | 0.012 | -0.001 | N/A | N/A | N/A |
| 1st_speech | noise_speech | 0.002 | 0.014 | -0.013 | 0.003 | 0.008 | N/A | N/A |
| 1st_speech | noise_music | -0.005 | 0.018 | 0.018 | 0.000 | 0.014 | -0.011 | -0.009 |
| 1st_music | noise_pinknoise | 0.007 | -0.007 | 0.015 | N/A | N/A | N/A | N/A |
| 1st_music | noise_speech | 0.002 | 0.008 | -0.004 | 0.007 | -0.016 | N/A | N/A |
| 1st_music | noise_music | 0.002 | 0.016 | -0.007 | 0.015 | 0.018 | N/A | N/A |

The figures shown with an orange background in the original slide are approximately within 1 frame length (32 ms). → Synchronization is successful!
![Page 29: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/29.jpg)
Result (cont.): Synchronization robustness against interference noise
[Figure: bar chart. Vertical axis: allowable SNR for synchronization measured at 2nd screen (dB), marked from -24 to 0 in steps of 3. Horizontal axis: additive noise (pinknoise, speech, music) for each 1st screen content (betty, speech, music)]
![Page 30: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/30.jpg)
MPEG-4 Audio Object Type (ISO/IEC 14496-3:2009)

| Object Type ID | Audio Object Type |
|---|---|
| 0 | Null |
| [..] | […] |
| 43 | SAOC |
| 44 | LD MPEG Surround |
| 45 | SAOC-DE |
| 46 | Audio Sync |
| 47-95 | (reserved) |

Further column headings on the slide: gain control, […], Remark
![Page 31: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/31.jpg)
Demonstration
Same song:
• 1st screen (blue Walkman): Instrument only
• 2nd screen (my notebook PC): Vocal only
• Noise (white Walkman): Female speech
![Page 32: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/32.jpg)
Conclusion
• The MPEG-4 Audio Synchronization standard defines:
  - the Audio Feature Extraction tool and the syntax of the feature stream (Normative)
  - the Feature Similarity Calculation Tool (Informative)
  - the Audio Object Type (AOT=46) “Audio Sync”, to allow transmission of the audio feature for synchronization as an elementary stream
• The MPEG-4 synchronization mechanism works in highly noisy environments, and the scheme has been shown to be useful under practical conditions.
![Page 33: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application](https://reader033.vdocuments.us/reader033/viewer/2022042406/5f20597f224dac42710b355d/html5/thumbnails/33.jpg)
End