
The Hong Kong Polytechnic University, Department of Electronic and Information Engineering, Centre for Multimedia Signal Processing

Prof. W.C. Siu, Chair Professor and Centre Director, 8 June 2008

Keynote Speaker: Professor W.C. Siu

Chair Professor and Director, Centre for Signal Processing, Department of Electronic and Information Engineering, Hong Kong Polytechnic University

On Video Transcoding to Super-Resolution Videos

ICNNSP’2008: IEEE International Conference on Neural Networks and Signal Processing, 6-10 June 2008, Zhenjiang, China




Contents:

1. Introduction
   1.1 Hybrid Video Coding
   1.2 Object Oriented Coding
   1.3 Advanced Video Concepts
   1.4 Highlights of Our Other Studies
2. Motion Estimation Algorithms
   2.1 Sample studies: Fast Adaptive Search Algorithm, Fast Pixel Decimation
   2.2 Sample studies: Fast Exhaustive Full Search, Novel Directional Search
3. Video Transcoding
   3.1 Video Transcoding (frame skipping)
   3.2 Video Transcoding (H.264 to H.264 conversion)
4. Extending to Video Enlargement: Super-resolution Videos
   4.1 New Edge-Directed Interpolation
   4.2 Modified Edge-Directed Interpolation
   4.3 SR Video Construction
   4.4 SR Video Re-encoding
6. Summary and Conclusion





1. Introduction

1.1 Hybrid Video Coding

In recent years, there has been remarkable progress in video coding. In this talk we concentrate mainly on predictive hybrid video coding.

Predictive Coding:

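Instead of transmitting a frame (called the current frame) directly, the encoder transmits its motion vectors with reference to a previous frame (called the reference frame), together with the prediction errors.

Applying the motion vectors to the reference frame produces the motion-compensated frame, i.e. the prediction of the current frame; its difference from the current frame consists of the prediction errors.

Most videos nowadays are coded using hybrid video coding, which makes use of predictive coding.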




To recover the current frame despite the motion estimation errors, a motion-compensated residual frame is constructed such that

Residual frame = Current frame - Motion-compensated frame

This residual frame is subsequently coded in a similar manner to an intra frame, i.e. through DCT, quantization and entropy coding.

[Figure: reference frame, current frame, motion-compensated frame and residual frame]
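The residual computation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a codec's implementation: frames are plain lists of rows, `motion_compensate` and `residual_frame` are hypothetical names, motion vectors are whole-pixel `(dy, dx)` offsets per block, and displaced positions are clamped to the frame border (real codecs pad the reference frame instead).

```python
def motion_compensate(ref, mvs, block):
    """Build the motion-compensated (predicted) frame.

    ref   : reference frame as a list of rows of pixel values
    mvs   : dict mapping the top-left corner (row, col) of each block to its
            motion vector (dy, dx)
    block : block size in pixels (assumed to tile the frame exactly)
    """
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for (r0, c0), (dy, dx) in mvs.items():
        for r in range(r0, r0 + block):
            for c in range(c0, c0 + block):
                # Displaced position in the reference frame, clamped to the border
                rr = min(max(r + dy, 0), h - 1)
                cc = min(max(c + dx, 0), w - 1)
                pred[r][c] = ref[rr][cc]
    return pred


def residual_frame(cur, pred):
    """Residual frame = current frame - motion-compensated frame."""
    return [[a - b for a, b in zip(row_c, row_p)] for row_c, row_p in zip(cur, pred)]


# Usage: the current frame is the reference shifted one pixel to the right,
# so the motion vector (0, -1) for every block predicts it perfectly.
ref = [[10 * r + c for c in range(8)] for r in range(4)]
cur = [[ref[r][max(c - 1, 0)] for c in range(8)] for r in range(4)]
pred = motion_compensate(ref, {(0, 0): (0, -1), (0, 4): (0, -1)}, block=4)
print(residual_frame(cur, pred))  # all zeros
```

With zero motion vectors instead, every pixel except those in the first column would leave a residual of -1, which is exactly the energy that motion compensation removes.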





[Figure: Hybrid video coding encoder. Blocks: source video, frame memory, motion estimation, motion compensation (predicted frame), 2D-DCT, quantizer, VLC coder, buffer and regulator producing the compressed bit stream (011010…111), with a dequantizer and inverse 2D-DCT in the reconstruction loop; motion vectors]




[Figure: Hybrid video coding encoder (repeated), highlighting the predictive frame, the error frame and the motion vectors]
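The reconstruction loop in this diagram (dequantizer, inverse 2D-DCT and addition back into frame memory) is what keeps the encoder's reference identical to the decoder's. Below is a minimal pure-Python sketch of one pass for an 8 x 8 block, assuming an orthonormal 2-D DCT and a plain uniform quantizer with step `step`; both are simplifications chosen for illustration, and `encode_block` and its helpers are hypothetical names.

```python
import math

N = 8  # block size used by the transform


def _scale(k):
    # Normalisation factor of the orthonormal DCT-II
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _cos(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block; result[u][v]."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _cos(x, u) * _cos(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coef):
    """Inverse of dct2; result[x][y]."""
    return [[sum(_scale(u) * _scale(v) * coef[u][v] * _cos(x, u) * _cos(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


def encode_block(cur, pred, step):
    """One pass of the hybrid coding loop for an N x N block.

    Returns the quantized levels (passed on to the VLC coder) and the
    reconstructed block (written back to the encoder's frame memory, i.e. the
    same block the decoder will rebuild)."""
    residual = [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]
    levels = [[round(v / step) for v in row] for row in dct2(residual)]   # 2D-DCT + quantizer
    decoded = idct2([[q * step for q in row] for row in levels])          # dequantizer + inverse 2D-DCT
    recon = [[p + d for p, d in zip(rp, rd)] for rp, rd in zip(pred, decoded)]
    return levels, recon


# Usage: a flat residual of 16 produces a single non-zero (DC) level.
pred = [[100 + x + y for y in range(N)] for x in range(N)]
cur = [[p + 16 for p in row] for row in pred]
levels, recon = encode_block(cur, pred, step=16)
print(levels[0][0])  # 8
```

Because the encoder predicts the next frame from `recon` rather than from the original pixels, quantization errors do not accumulate between encoder and decoder.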





1.2 Object Oriented Coding

• Divide the scene into background object(s), coded by hybrid coding or the sprite generation technique,

• and foreground objects: multiple objects, coded by hybrid coding or by object-based coding (complete objects), which allows time, position and motion manipulations, etc.

• Segmentation is still a problem, since the notion of an object can never be made clear to computers.

• Object boundaries always merge with the background, etc.

[Demonstration: Object Extraction]




1.3 Advanced Video Concepts - from HVC to Advanced Video Coding (H.264)

[Figure: the hybrid video coding encoder (source video, frame memory, motion estimation, motion compensation, 2D-DCT, quantizer, VLC coder, buffer, regulator, dequantizer, inverse 2D-DCT) modified step by step into the H.264 encoder: intra-frame prediction is added, the 2D-DCT and quantizer become integer transform, scaling and quantization, and the dequantizer and inverse 2D-DCT become scaling and inverse transform]





[Figure: Advanced Video Coding (H.264) encoder with intra-frame prediction; transform, scaling and quantization; scaling and inverse transform; new features highlighted: digital (integer) transform, multiple reference frames and variable block size]




Video Composition (Object Oriented Processing):

1. Background: hybrid coding, or the sprite generation technique.

2. Foreground: multiple objects, coded by hybrid coding or object-based coding (complete objects), allowing time, position and motion manipulations, etc.

Techniques developed:

1. Motion estimation/sprite generation: our own fast hybrid coding, making use of new concepts of fast motion estimation and sprite generation techniques.





Techniques developed (cont'd):

2. Improved automatic image segmentation: making use of a new marker-extraction technique and color information, simplified area morphology and a modified watershed algorithm.

3. Fast wavelet computation: using a fast lifting algorithm, with the possibility of using overcomplete wavelets.

4. Etc.
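The slides do not say which wavelet filters the fast lifting algorithm uses. As an illustration only, here is one level of the Haar transform written as lifting steps (predict, then update): each step is a simple in-place style update, and the transform is inverted exactly by running the steps backwards. `haar_forward` and `haar_inverse` are hypothetical names.

```python
def haar_forward(x):
    """One level of the Haar wavelet transform computed by lifting.
    x must have even length. Returns (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for o, e in zip(odd, even)]          # predict: odd sample from its even neighbour
    approx = [e + d / 2 for e, d in zip(even, detail)]   # update: approximation = pairwise mean
    return approx, detail


def haar_inverse(approx, detail):
    """Undo the lifting steps in reverse order (perfect reconstruction)."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


signal = [4, 6, 10, 12, 8, 8, 0, 2]
approx, detail = haar_forward(signal)
print(approx, detail)  # [5.0, 11.0, 8.0, 1.0] [2, 2, 0, 2]
```

Longer filters (e.g. biorthogonal wavelets) are factored into more predict/update pairs in the same way.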

[Demonstration: ObjectPlayer]


References:

Ko-Cheung Hui, Wan-Chi Siu and Yui-Lam Chan, “Fast Motion Estimation of Arbitrary Shaped Video Objects in MPEG-4”, pp.33-50, Vol.18, Issue 1, Signal Processing: Image Communication, Elsevier Science, January 2003, The Netherlands.

H. Gao, W.C. Siu and C. Hou, ‘Improved Techniques for Automatic Segmentation’, pp.1273-80, Vol. 11, No.12, December 2001, IEEE Transactions on Circuits and Systems for Video Technology, USA.




2. Studies on Fast Motion Estimation

A few published works:

1. We proposed a few adaptive motion estimation algorithms that use simple statistics to determine the search directions and locations; some of this pioneering work has been well cited.

2. We worked on fast algorithms
(i) with a selected subset of pattern(s), and
(ii) with adaptive pixel decimation.

References:

Yui-Lam Chan and Wan-Chi Siu, ‘An Efficient Search Strategy for Block Motion Estimation using Image Features’, IEEE Transactions on Image Processing, pp.1223-38, Vol.10, No.8, August 2001, USA.

Yui-Lam Chan and Wan-Chi Siu, ‘Reliable Block Motion Estimation through the Confidence Measure of Error Surface’, pp.135-46, Vol.76, Issue 2, Signal Processing, 1999, Switzerland.

Yui-Lam Chan and Wan-Chi Siu, ‘New Adaptive Pixel Decimation for Block Motion Vector Estimation’, IEEE Transactions on Circuits & Systems for Video Technology, pp.113-118, Vol.6, No.1, February 1996, USA. (Listed as one of the Most Cited Papers since 1990 on the CSVT website, http://tcsvt.polito.it/, 2008 IEEE Trans on CSVT.)
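To show where pixel decimation enters block matching, here is a minimal full-search sketch using the sum of absolute differences (SAD). It is not the authors' adaptive pixel decimation, which selects pixels from image features; it uses a fixed regular pattern (every `step`-th pixel in each direction) purely for illustration. `sad` and `full_search` are hypothetical names, and the block size `n` and search range `p` are arbitrary.

```python
def sad(cur, ref, r0, c0, dy, dx, n, step=1):
    """SAD between the n x n block of cur at (r0, c0) and the block of ref
    displaced by (dy, dx). With step > 1, only every step-th pixel in each
    direction is compared (a fixed, regular decimation pattern)."""
    return sum(abs(cur[r][c] - ref[r + dy][c + dx])
               for r in range(r0, r0 + n, step)
               for c in range(c0, c0 + n, step))


def full_search(cur, ref, r0, c0, n, p, step=1):
    """Exhaustive block matching over all displacements in [-p, p] x [-p, p]
    whose candidate block lies inside the reference frame."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= r0 + dy <= h - n and 0 <= c0 + dx <= w - n):
                continue  # candidate block would fall outside the frame
            cost = sad(cur, ref, r0, c0, dy, dx, n, step)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Usage: every reference pixel value is unique, and the current block equals the
# reference displaced by (2, -1), so that vector is the only zero-SAD match.
ref = [[16 * r + c for c in range(16)] for r in range(16)]
cur = [[ref[min(r + 2, 15)][max(c - 1, 0)] for c in range(16)] for r in range(16)]
print(full_search(cur, ref, 4, 4, n=4, p=3))          # ((2, -1), 0)
print(full_search(cur, ref, 4, 4, n=4, p=3, step=2))  # ((2, -1), 0) using 1/4 of the pixels
```

Decimation cuts the cost of each SAD evaluation; choosing the retained pixels adaptively is what keeps the matching accuracy close to the undecimated search.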





3. We recently suggested the concept of error clustering, which gives a completely revised approach to adaptive motion estimation. It can replace the partial distortion search (PDS) and, working with the Successive Elimination Algorithm (SEA) or Multilevel SEA, gives extremely fast full-search motion estimation.

4. Recently we suggested:

(i) using a search window equal to the size of the motion vector(s) in the 1st reference frame for multi-frame motion estimation;

(ii) using partial SAD for variable block size motion estimation;

(iii) using directional search to form a novel scheme, etc.

References:

Ko-Cheung Hui, Wan-Chi Siu and Yui-Lam Chan, ‘New Adaptive Partial Distortion Search using Clustered Pixel Matching Error Characteristic’, pp.597-607, Vol.14, No.5, May 2005, IEEE Transactions on Image Processing.

M.Y. Chiu and W.C. Siu, ‘New Results on Exhaustive Search Algorithm for Motion Estimation using Adaptive Partial Distortion Search and Successive Elimination Algorithm’, Proceedings, pp.3978-81, IEEE International Symposium on Circuits and Systems (ISCAS’2006), May 2006, Island of Kos, Greece.

Liangming Ji and Wan-Chi Siu, ‘Reduced Computation using Adaptive Search Window Size for H.264 Multi-frame Motion Estimation’, Paper 1568982117, pp.1-5, Proceedings, 14th European Signal Processing Conference (EUSIPCO’2006), September 2006, Florence, Italy.

Yan-Ho Kam and Wan-Chi Siu, ‘A Fast Full Search Scheme for Rate-Distortion Optimization of Variable Block Size and Multi-frame Motion Estimation’, paper 3095, pp.1-4, Proceedings, IEEE International Midwest Symposium on Circuits and Systems (MWSCAS’2006), August 2006, San Juan, Puerto Rico, USA.

Ying Zhang, Wan-Chi Siu and Tingzhi Shen, ‘Yet a Faster Motion Estimation Algorithm with Directional Search Strategies’, pp.475-78, Proceedings, 15th International Conference on Digital Signal Processing (DSP’2007), July 2007, Cardiff, UK.

[Figures: Adaptive Search Window; Error Clustering]
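The PDS and SEA mentioned in item 3 are both lossless shortcuts for full search: they return the same best vector as exhaustive matching while skipping work. The sketch below combines the two in their basic textbook form, not the authors' error-clustering variant. `fast_full_search` is a hypothetical name, and block sums are recomputed directly here, whereas a real SEA implementation would obtain them incrementally.

```python
def block_sum(img, r0, c0, n):
    """Sum of the n x n block of img at (r0, c0), computed directly for brevity."""
    return sum(img[r][c] for r in range(r0, r0 + n) for c in range(c0, c0 + n))


def fast_full_search(cur, ref, r0, c0, n, p):
    """Full-search block matching with two lossless shortcuts.

    SEA: |sum(current block) - sum(candidate)| <= SAD, so a candidate whose
         bound already reaches the best SAD so far cannot win and is skipped.
    PDS: the SAD is accumulated row by row and abandoned as soon as the
         partial sum reaches the best SAD so far.
    The result equals that of an exhaustive search scanning candidates in the
    same order and keeping the first minimum."""
    h, w = len(ref), len(ref[0])
    cur_sum = block_sum(cur, r0, c0, n)
    best_mv, best = None, float("inf")
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            r, c = r0 + dy, c0 + dx
            if r < 0 or c < 0 or r + n > h or c + n > w:
                continue
            if abs(cur_sum - block_sum(ref, r, c, n)) >= best:   # SEA test
                continue
            partial = 0
            for i in range(n):                                    # PDS loop
                partial += sum(abs(cur[r0 + i][c0 + j] - ref[r + i][c + j])
                               for j in range(n))
                if partial >= best:
                    break
            else:  # no early exit: this candidate has a strictly smaller SAD
                best_mv, best = (dy, dx), partial
    return best_mv, best


cur = [[(r * 53 + c * 29 + r * r * 7) % 256 for c in range(24)] for r in range(24)]
ref = [[(r * 37 + c * 91 + r * c * 13) % 256 for c in range(24)] for r in range(24)]
print(fast_full_search(cur, ref, 8, 8, n=8, p=4))
```

The savings grow as good matches are found early, which is why scanning order and the tightness of the elimination bound matter in practice.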




3. Video Transcoding

[Figure: Universal access: video servers connected through a wide area network to mobile television, set-top boxes, intelligent homes, PCs and office computers]

Video transcoding: given a variety of client devices, it is difficult for a server to tailor the content for each individual device. A video server may have to provide quality services to heterogeneous clients or over heterogeneous transmission channels.

For this reason, the video server should be able to perform transcoding:

a process of converting a previously compressed video bitstream into a bitstream of a different nature or a lower bit rate.





Homogeneous Transcoding – three types:

1. Frame Skipping

2. Video Downscaling

3. Transcoding with Bit-rate Reduction

Heterogeneous Transcoding:

4. Conversion of videos among standards (or between frame types)




Conventional Transcoder:

VLD: Variable Length Decoding
VLC: Variable Length Coding
Q1^-1: Inverse Quantization (fine quantizer Q1)
Q2: Quantization (coarse quantizer Q2)
MCF: Motion Compensated Frame
DCT: Discrete Cosine Transform
DCT^-1: Inverse Discrete Cosine Transform
EMV: Encoding Motion Vector
MC: Motion Compensation

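[Figure: Conventional (cascaded) transcoder between the front encoder and the end decoder: a decoding part (stream separation, VLD, Q1^-1, DCT^-1, MCF with reference frame 1 and the incoming motion vectors) followed by a coarse re-encoder (ME, DCT, Q2, VLC, EMV, with a Q2^-1 and DCT^-1 reconstruction loop, frame buffer, MC and reference frame 2), producing the compressed bit stream]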





Frame-Skipping Transcoder

When the frame rate changes, the incoming quantized DCT coefficients of the residual signal may no longer be valid, because they refer to a frame that may have been dropped.

First, the transcoder decodes the incoming bitstream into the pixel domain. Second, the decoded video frame is re-encoded at the desired lower frame rate.

[Figure: Frame-skipping transcoder, consisting of a decoder and an encoder, placed between the front-encoder and the end-decoder]




Consider the decoding and re-encoding parts alone.

• First, the incoming bitstream undergoes VLC decoding, inverse quantization and inverse DCT, so frame $R_{t-1}$ can be reconstructed and stored in the frame buffer FB. Note that $R_{t-1}$ is required as the reference frame for the reconstruction of frame $R_t$. Hence we have

$$R_t(i,j) = R_{t-1}(i+u_t,\, j+v_t) + \hat{e}_t(i,j),$$

where $(u_t, v_t)$ is the motion vector and $\hat{e}_t(i,j)$ is the prediction error (residual signal).

[Figure: Frame-skipping transcoder. Decoder: VLC^-1, Q^-1, DCT^-1, MC and FB, taking $\hat{e}_t$ and $(u_t, v_t)$ from the front-encoder; encoder: DCT, Q, VLC, with a Q^-1, DCT^-1, MC and FB loop, outputting $Q[\mathrm{DCT}(e_t^s)]$ and $(u_t^s, v_t^s)$; frame labels $R_{t-2}^s$, $R_{t-1}$, $R_t$, $R_t^s$; switches A and B]




Let us assume that $R_{t-1}$ is dropped.

• Then frame $R_t$ at time $t$ has to be re-encoded with reference to the previous non-skipped frame at time $t-2$, i.e. $R_{t-2}$.

• A new compensation error, $e_t^s(i,j)$, has to be found.





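Direct Addition Approach: let us consider a special case, macroblocks without motion compensation.

Recall that $R_{t-1}$ is dropped; we can use the motion vectors $(u_t, v_t)$ and $(u_{t-1}, v_{t-1})$ to construct the new motion vector $(u_t^s, v_t^s)$:

$$(u_t^s, v_t^s) = (u_t, v_t) + (u_{t-1}, v_{t-1}).$$

For a macroblock without motion compensation, $(u_t, v_t) = (0, 0)$, so $(u_t^s, v_t^s) = (u_{t-1}, v_{t-1})$.

Now we need to find $Q[\mathrm{DCT}(e_t^s)]$.

[Figure: block $MB_t$ in $R_t$, $MB_{t-1}$ in $R_{t-1}$ (dropped) and $MB_{t-2}$ in $R_{t-2}$, with $(u_t, v_t) = (0, 0)$ and motion vector $(u_{t-1}, v_{t-1})$]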




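Macroblocks without motion compensation

Note that $Q[\mathrm{DCT}(\hat{e}_{t-1})]$ and $Q[\mathrm{DCT}(\hat{e}_t)]$ are available from the incoming bitstream. Since

$$MB_t = MB_{t-1} + \hat{e}_t, \qquad MB_{t-1} = MB_{t-2} + \hat{e}_{t-1},$$

the new residual is

$$e_t^s = MB_t - MB_{t-2} = \hat{e}_t + \hat{e}_{t-1}.$$

By the linearity of the DCT,

$$\mathrm{DCT}(e_t^s) = \mathrm{DCT}(\hat{e}_t) + \mathrm{DCT}(\hat{e}_{t-1}),$$

and the quantized coefficients are added directly:

$$Q[\mathrm{DCT}(e_t^s)] = Q[\mathrm{DCT}(\hat{e}_t)] + Q[\mathrm{DCT}(\hat{e}_{t-1})].$$

[Figure: $MB_t$ in $R_t$, $MB_{t-1}$ in $R_{t-1}$ (dropped) and $MB_{t-2}$ in $R_{t-2}$, with the incoming residuals $Q[\mathrm{DCT}(\hat{e}_t)]$ and $Q[\mathrm{DCT}(\hat{e}_{t-1})]$ and the new residual $Q[\mathrm{DCT}(e_t^s)]$]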





Direct addition of the DCT coefficients: the new quantized DCT coefficients can be computed in the DCT domain by directly adding the quantized DCT coefficients held in the DCT-domain buffer to the incoming quantized DCT coefficients, while the updated DCT coefficients are stored back in the DCT-domain buffer.
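A minimal sketch of this direct addition for one macroblock follows. It assumes the quantized levels are stored as small 2-D lists and that both frames use the same uniform quantizer step, which is the condition under which summing levels equals summing dequantized coefficients (so no requantization is needed in this simplified model). `direct_addition` and `dequantize` are hypothetical names, and a tiny 2 x 2 block stands in for the 8 x 8 DCT blocks.

```python
def dequantize(levels, step):
    """Uniform reconstruction level * step (an assumption of this sketch)."""
    return [[q * step for q in row] for row in levels]


def direct_addition(buffered_levels, buffered_mv, incoming_levels):
    """Re-encode a macroblock coded without motion compensation, (u_t, v_t) = (0, 0),
    after its reference frame R_{t-1} has been dropped.

    buffered_levels : Q[DCT(e_{t-1})] held in the DCT-domain buffer
    buffered_mv     : (u_{t-1}, v_{t-1}) of the co-located macroblock
    incoming_levels : Q[DCT(e_t)] parsed from the incoming bitstream
    Returns Q[DCT(e_t^s)] and (u_t^s, v_t^s); the caller writes the new levels
    back to the DCT-domain buffer, as described above."""
    new_levels = [[a + b for a, b in zip(ra, rb)]
                  for ra, rb in zip(buffered_levels, incoming_levels)]
    u_t, v_t = 0, 0  # no motion compensation in frame t
    new_mv = (u_t + buffered_mv[0], v_t + buffered_mv[1])
    return new_levels, new_mv


levels, mv = direct_addition([[3, 0], [0, -1]], (2, -1), [[1, 1], [0, 1]])
print(levels, mv)  # [[4, 1], [0, 0]] (2, -1)
```

No motion compensation, DCT or inverse DCT appears anywhere in the sketch, which is the source of the complexity saving claimed for this approach.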





Direct Addition of the DCT Coefficients

• It is not necessary to perform motion compensation, DCT, quantization, inverse DCT and inverse quantization, so the complexity is greatly reduced.

• Requantization is not necessary for macroblocks coded without motion compensation, so the quality degradation due to re-encoding in the transcoder is avoided.

[Figure: percentage of macroblocks without motion compensation (%) versus frame number (frames 0 to 50)]

By directly adding the DCT coefficients of non-moving macroblocks, the computational complexity of processing these macroblocks can be reduced significantly, and the additional re-encoding error is avoided.

Distribution of the coding modes for the typical “Salesman” sequence

References:

Kai-Tat Fung, Yui-Lam Chan and Wan-Chi Siu, “Low-Complexity and High-Quality Frame-Skipping Transcoder for Continuous Presence Multipoint Video Conferencing”, pp.31-46, Vol.6, No.1, February 2004, IEEE Transactions on Multimedia, USA.

Kai-Tat Fung, Yui-Lam Chan and W.C. Siu, ‘New Architecture for Dynamic Frame-Skipping Transcoder’, pp.860-900, Vol.11, No.8, IEEE Transactions on Image Processing, August 2002, USA.




Video Transcoding: Sample Study 2, Transcoding H.263 to H.264 within the Transform Domain

Why transcode from H.263 to H.264? The complete migration to the new video coding algorithm will take several years, since H.263 and MPEG are widely used in many multimedia applications nowadays.

This creates an important need for transcoding technologies that convert the widely available H.263 compressed videos to the H.264 compressed format and vice versa. However, given the significant differences between the H.263 and H.264 algorithms, this transcoding is much more complex.

Video Coding & Transcoding: Prof. Wan-Chi Siu

References:

Wan-Chi Siu, Yui-Lam Chan and Kai-Tat Fung, “On Transcoding a B-frame to a P-frame in the Compressed Domain”, pp.1093-1102, Vol.9, Issue 6, October 2007, IEEE Transactions on Multimedia, USA.

Kai-Tat Fung and Wan-Chi Siu, ‘DCT-based Video Downscaling Transcoder using Split and Merge Technique’, pp.394-403, Vol.15, No.2, February 2006, IEEE Transactions on Image Processing, USA.

Kai-Tat Fung and Wan-Chi Siu, ‘On Re-composition of Motion Compensated Macroblock for DCT-based Video Transcoding’, pp.44-58, Vol.21, No.1, January 2006, Signal Processing: Image Communication, Elsevier Science, The Netherlands.

Kai-Tat Fung, Yui-Lam Chan and Wan-Chi Siu, “Low-Complexity and High-Quality Frame-Skipping Transcoder for Continuous Presence Multipoint Video Conferencing”, pp.31-46, Vol.6, No.1, February 2004, IEEE Transactions on Multimedia, USA.





Sample Study 3: H.264 to H.264, Architecture of a Down-sizing Transcoder

[Figure: flowchart: read and decode one frame; check the frame type (I-frame or P-frame)]

E.g. for H.264 High profile, from HD to SD: we have to convert a video in HD (1920 x 1080) format to SD (1280 x 720) format. For each macroblock, we have

1. To determine its mode type: intra or inter
2. To determine its prediction mode, for intra mode
3. To determine its mode (variable block size, VBS), for inter mode
4. To check whether skip mode should be used
5. Motion re-estimation.




[Figure: HDTV: HD (1920 x 1080) to SD (1280 x 720), 3→2 transcoding, with output macroblocks labelled A, B, C, D]
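One way to read the 3→2 figure: along each axis, 48 HD pixels (three macroblocks) become 32 SD pixels (two macroblocks), so each SD macroblock inherits data from a 24 x 24 HD region straddling up to four HD macroblocks. The sketch below assumes that geometry in both directions and implements a majority-style mode re-decision, weighting each HD macroblock's mode by its overlap area. `overlaps_3to2` and `majority_mode` are hypothetical names, and the area weighting is an illustrative choice rather than the method reported in this talk.

```python
# For a 3:2 down-sizing ratio, 48 input pixels become 32 output pixels, so each
# 16-pixel output macroblock covers 24 input pixels: 16 from one input
# macroblock and 8 from its neighbour.
_PATTERN = {0: [(0, 16), (1, 8)], 1: [(1, 8), (2, 16)]}


def overlaps_3to2(out_idx):
    """(input MB index, overlap in input pixels) along one axis."""
    base = 3 * (out_idx // 2)
    return [(base + k, length) for k, length in _PATTERN[out_idx % 2]]


def majority_mode(modes, out_row, out_col):
    """Area-weighted majority vote of the input macroblock modes.

    modes: dict mapping (input MB row, input MB col) to a mode label."""
    votes = {}
    for r, len_r in overlaps_3to2(out_row):
        for c, len_c in overlaps_3to2(out_col):
            if (r, c) in modes:
                m = modes[(r, c)]
                votes[m] = votes.get(m, 0) + len_r * len_c
    return max(votes, key=votes.get)


# Output macroblock A = (0, 0) overlaps input MBs (0,0), (0,1), (1,0), (1,1)
# with areas of 256, 128, 128 and 64 input pixels.
modes = {(0, 0): "SKIP", (0, 1): "P16x16", (1, 0): "P8x8", (1, 1): "P16x16"}
print(majority_mode(modes, 0, 0))  # SKIP (256 vs 192 for P16x16)
```

A plain count would pick P16x16 here (two of four blocks), so the weighting rule visibly changes the decision; which rule works better is an empirical question.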




HDTV Video Transcoding: transcoding from HD to SD format

Procedure:
(i) timing analysis,
(ii) data extraction from the codec,
(iii) building an ideal speed-up model for transcoding on the H.264 platform (architecture realization), and
(iv) video transcoder refinement using various technologies.

Technologies (algorithms have to be designed for):
(1) inter/intra re-decision
(2) intra mode re-decision (I16x16 or I4x4)
(3) inter mode re-prediction
(4) motion vector re-estimation





Study of the average encoding speed of the JM12.2 encoder

Sequence name: CrowdRun_720p

No. of frames: 500

Intra-frame period: 0

No. of slices per frame: 1

QP-I, QP-P: 30

QP-B: 32

Inter-block-sizes used: 16*16, 16*8, 8*16, 8*8

Intra-block-sizes used: 16*16, 8*8, 4*4

Max search range: 128

Number of reference frames: 1

Sub-pixel depth: Quarter-pixel

Entropy coding method: CAVLC

Special features applied: Weighted prediction, skip and direct coding modes, 8x8 integer transform, deblocking filter

Encoder compiled by: VC6.0




No. of B-frames = 0

Total time: 1770.327 sec

Reading frames time: 11.927 sec (0.67%)

Padding reference frames time: 0.000 sec (0.00%)

Integer ME time (EPZS): 156.196 sec (8.82%)

Sub-pel ME time (1/2 & 1/4 pel): 277.982 sec (15.70%)

Interpolation time: 88.747 sec (5.01%)

Getting MVs for direct mode time: 0.000 sec (0.00%)

Weighted prediction time: 1.918 sec (0.11%)

Intra prediction time: 657.611 sec (37.15%)

Computing distortion values for modes time: 28.422 sec (1.61%)

Computing rate values for modes time: 185.476 sec (10.48%)

Luma residue coding time: 95.462 sec (5.39%)

Chroma residue coding time: 139.090 sec (7.86%)

Setting parameters time: 7.544 sec (0.43%)

Entropy coding time: 10.229 sec (0.58%)

Deblocking time: 9.499 sec (0.54%)

Other time: 100.224 sec (5.66%)

PSNR: 33.43 dB Bit-rate: 22625.70 kbps@50Hz





No. of B-frames = 2

Total time: 2649.320 sec

Reading frames time: 16.621 sec (0.63%)

Padding reference frames time: 0.000 sec (0.00%)

Integer ME time (EPZS): 532.062 sec (20.08%)

Sub-pel ME time (1/2 & 1/4 pel): 463.725 sec (17.50%)

Interpolation time: 29.039 sec (1.10%)

Getting MVs for direct mode time: 1.854 sec (0.07%)

Weighted prediction time: 3.054 sec (0.12%)

Intra prediction time: 628.926 sec (23.74%)

Computing distortion values for modes time: 43.832 sec (1.65%)

Computing rate values for modes time: 237.783 sec (8.98%)

Luma residue coding time: 190.330 sec (7.18%)

Chroma residue coding time: 221.465 sec (8.36%)

Setting parameters time: 7.975 sec (0.30%)

Entropy coding time: 8.078 sec (0.30%)

Deblocking time: 9.350 sec (0.35%)

Other time: 255.226 sec (9.63%)

PSNR: 32.74 dB Bit-rate: 20901.61 kbps@50Hz




JM 12.2 encoder timing: QP (I, P, B) = 27, 28, 29; 720p; 250 frames; search range = ±32; all modes turned on. Times in seconds, with the share of the total encoding time in brackets.

Inter block sizes 16x16, 16x8, 8x16, 8x8:

EPZS (extended diamond pattern)
Sequence          Interpolation    Sub-pel ME        Integer ME         Total
Ducks take off    44.02 (4.67%)    121.83 (12.93%)   68.72 (7.29%)      942.18
Crowd run         43.94 (4.95%)    114.01 (12.85%)   63.00 (7.10%)      887.47
Average           (4.81%)          (12.89%)          (7.20%)

SAD reuse algorithm
Sequence          Interpolation    Sub-pel ME        Integer ME         Total
Ducks take off    44.13 (1.69%)    148.10 (5.66%)    1719.91 (65.76%)   2615.36
Crowd run         43.55 (1.69%)    141.39 (5.49%)    1719.91 (66.75%)   2573.35
Average           (1.69%)          (5.58%)           (66.26%)

Inter block size 16x16 only:

EPZS (extended diamond pattern)
Sequence          Interpolation    Sub-pel ME        Integer ME         Total
Ducks take off    43.66 (7.41%)    26.94 (4.57%)     8.95 (1.52%)       589.05
Crowd run         43.75 (7.66%)    25.87 (4.53%)     9.16 (1.60%)       571.48
Average           (7.53%)          (4.55%)           (1.56%)

SAD reuse algorithm
Sequence          Interpolation    Sub-pel ME        Integer ME         Total
Ducks take off    43.65 (1.98%)    30.65 (1.39%)     1617.56 (73.41%)   2203.40
Crowd run         43.75 (2.00%)    30.53 (1.40%)     1613.98 (73.87%)   2185.02
Average           (1.99%)          (1.39%)           (73.64%)





Video Transcoding Technologies: Mode Decision

Key technologies:

(1) Inter/intra mode decision (I, P, SKIP, etc., by the block at a fixed location, or by simple majority)
(2) Inter-block mode re-decision (16x16, 16x8, …, 4x4)
    (a) Natural reduction, using majority, etc.
    (b) Further mode selection for better quality, such as refinement
(3) Intra prediction modes (differential code, vertical, horizontal, …, diagonal prediction)
(4) Interpolation for sub-pixel positions: integer decimation
(5) Motion vector re-estimation
    (a) MV reuse using the original MV (as far as possible)
    (b) MV reuse using the mean, the median, alignment to the best, alignment to the worst, or weighting by the residual error signal, …
    (c) MV refinement using temporal and spatial records
    (d) MV reuse and residual error signal reuse
    (e) Sub-pixel motion re-estimation
(6) Video down-sizing and interpolation
    (a) Interpolation for down-sizing (2→1, 3→2, 1→M/N), fixed-ratio interpolation
    (b) then variable-ratio interpolation
    (c) Quality interpolation, for example edge-preserving interpolation
(7) Transform-domain video transcoding
Etc.
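The MV-reuse option listed above (reusing the input MVs through the mean, the median or a weighting) can be sketched as follows: the MVs of the input macroblocks covering an output macroblock are scaled by the 2/3 down-sizing ratio and combined by a component-wise median or a weighted mean. This is a hedged illustration: `reuse_mv` is a hypothetical name, MVs are assumed to be integers in quarter-pel units, and any refinement search around the predicted vector is left out.

```python
from statistics import median


def reuse_mv(mvs, weights=None, num=2, den=3):
    """Predict the MV of an output macroblock from the MVs of the overlapping
    input macroblocks (quarter-pel units), scaled by num/den.

    weights=None -> component-wise median; otherwise a weighted mean
    (for example with overlap areas as weights)."""
    scaled = [(dy * num / den, dx * num / den) for dy, dx in mvs]
    if weights is None:
        y = median(v[0] for v in scaled)
        x = median(v[1] for v in scaled)
    else:
        total = sum(weights)
        y = sum(w * v[0] for w, v in zip(weights, scaled)) / total
        x = sum(w * v[1] for w, v in zip(weights, scaled)) / total
    return round(y), round(x)  # back to the integer quarter-pel grid


mvs = [(12, -6), (12, -3), (9, -6), (300, 90)]     # the last MV is an outlier
print(reuse_mv(mvs))                                # (8, -3)
print(reuse_mv(mvs, weights=[256, 128, 128, 64]))  # (29, 4)
```

The example shows the usual trade-off: the median ignores the outlier, while the weighted mean is pulled towards it even with a small weight, which is one reason a refinement step around the reused vector is normally kept.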




Samples of Experimental Results

Table 2 shows the results of our transcoding realization using the H.264 JM12.2 software and using our fast approaches for converting the Crowd Run sequence of size 1280x720 to 2/3 of this size. There is a substantial reduction in computation time for motion estimation, mode decision, etc., and a speedup of 2.68 times is achieved.

Table 2: Comparison of results using JM12.2 and our fast approach

Stage              JM 12.2             After fast algorithms
Integer-pel ME     153.98s (20.18%)    20.50s (7.21%)
Sub-pel ME         227.23s (29.78%)    30.25s (10.64%)
Other ME time      39.28s (5.15%)      5.23s (1.84%)
Intra prediction   131.78s (17.27%)    17.54s (6.17%)
Others             210.80s (27.63%)    211.01s (74.14%)
Total time         763.07s             284.32s (2.68X)
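The speedup figures in Table 2 follow directly from the stage times; the short script below simply recomputes them from the table. It shows that the accelerated stages each become roughly 7.5 times faster, while "Others" is essentially unchanged and, at 74.14% of the fast transcoder's time, is what caps the overall gain at about 2.68 times.

```python
# Stage times in seconds from Table 2: (JM 12.2, after fast algorithms)
stages = {
    "Integer-pel ME": (153.98, 20.50),
    "Sub-pel ME": (227.23, 30.25),
    "Other ME time": (39.28, 5.23),
    "Intra prediction": (131.78, 17.54),
    "Others": (210.80, 211.01),
}
total_jm, total_fast = 763.07, 284.32

for name, (jm, fast) in stages.items():
    print(f"{name:<17}{jm / fast:6.2f}x")
print(f"{'Total time':<17}{total_jm / total_fast:6.2f}x")
```

Any further overall gain would therefore have to come from the "Others" share rather than from faster motion estimation.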





Transcoder Demonstration

On video coding for HDTV using the H.264 standard: to convert HD (1920 x 1080) format to SD (1280 x 720) format.

(A real-time demonstration has been done, but a reduced version is shown here due to the speed constraints of the laptop computer.)

• Full decode + downsize + full encode

• Transcoding (mode re-decision + MV refinement)

[Demonstration clips: Full 1, Trans 1, Full 2, Trans 2]





Related Publications (technologies used)

Transcoding:

Zhaoguang Liu and W.C. Siu, ‘A Downsizing Video Transcoder Based on H.264’, Progress Report, PhD/Transcoder research report, November 2006, EIE, The Hong Kong Polytechnic University.

K.T. Fung and W.C. Siu, ‘Diversity and Importance Measures for Video Downscaling’, Proceedings, pp.1061-4, Vol.2, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’2005), March 2005, Philadelphia, USA.

Kai-Tat Fung and Wan-Chi Siu, ‘DCT-based Video Downscaling Transcoder using Split and Merge Technique’, pp.394-403, Vol.15, No.2, February 2006, IEEE Transactions on Image Processing, USA.

Motion Estimation:

Ying Zhang, Wan-Chi Siu and Tingzhi Shen, ‘Yet a Faster Motion Estimation Algorithm with Directional Search Strategies’, pp.475-478, Proceedings, 15th International Conference on Digital Signal Processing (DSP’2007), July 2007, Cardiff, UK.

Ko-Cheung Hui, Wan-Chi Siu and Yui-Lam Chan, ‘New Adaptive Partial Distortion Search using Clustered Pixel Matching Error Characteristic’, pp.597-607, Vol.14, No.5, May 2005, IEEE Transactions on Image Processing.

M.Y. Chiu and W.C. Siu, ‘New Results on Exhaustive Search Algorithm for Motion Estimation using Adaptive Partial Distortion Search and Successive Elimination Algorithm’, Proceedings, pp.3978-81, IEEE International Symposium on Circuits and Systems (ISCAS’2006), May 2006, Island of Kos, Greece.

Liangming Ji and Wan-Chi Siu, ‘Reduced Computation using Adaptive Search Window Size for H.264 Multi-frame Motion Estimation’, Paper 1568982117, pp.1-5, Proceedings, 14th European Signal Processing Conference (EUSIPCO’2006), September 2006, Florence, Italy.




4. Extending to Video Amplifications and Super-resolution Videos

With the development of visual communication and image processing, there is a high demand for high-resolution images in applications such as video surveillance, remote sensing, medical imaging, HDTV and other entertainment applications.

However, image resolution depends on the physical characteristics of the imaging devices.

It is sometimes difficult to improve the image resolution by using better sensors, because of the high cost or the physical limits of the hardware.

Super-resolution (SR) image reconstruction is a promising technique to increase the resolution of an image or sequence of images beyond the resolving power of the imaging system.





An SR video may also need to be re-encoded for various reasons, including

(i) to allow standard devices to view the SR videos without additional conversion devices, (ii) to save SR video reconstruction time, since the computing power of the viewing devices may not be sufficient, and (iii) because the SR video package may not be available at the viewing site. Broadcasting companies are also looking for good technologies to convert videos between formats, resolutions, and frame/bit rates. Up-conversion of a compressed video, for example from SDTV to HDTV, is particularly difficult because of the missing data and the blurring of edges caused by simple interpolation. Furthermore, re-encoding of these SR videos is required in many practical situations, since content providers often have to standardize various video clips for uniform storage or transmission.




[Figure 5: Architecture of Transcoding Platform (Video Enlargement). Flowchart: initialization; (1) read and decode one frame; (2) parameter extraction: (i) MVs, modes, etc., (ii) number of zero coefficients, (iii) residual errors; video interpolation (possible output); re-encoding process: check the MB/slice type (mode type decision) for intra and inter MBs; decisions "Full Mode Decision?" (find the best mode, or predict the mode from the original parameters), "ME by Full Search?" (find the best MV, or predict the MV from the original parameters) and "Full Encoding?" (find the best prediction mode); encode MB; at the end of each frame, deblock the picture and write the bitstream; exit at the end of the bitstream]





In previous years, most researchers, including us, concentrated only on downward conversion.

Recently it has become clear to us that there is a great need to develop techniques for upward conversion, including image/video up-sizing, frame interpolation and super-resolution video coding.

This is a challenging and difficult topic.

Some work has been done by a few researchers, but many technologies are still unavailable or immature.

Hence this forms a fruitful direction for further research.




We have built an architecture that allows us to re-encode the SR video for either storage or transmission.

We fully utilize the decoded data, statistics and parameters available from the previously encoded LR video to facilitate the super-resolution conversion.

As shown in Figure 5, a model has to be built for this investigation, with H.264 as our codec kernel.

The model consists of three parts: (a) decoding of the encoded bit-stream, (b) video interpolation and (c) re-encoding.

We opt for the simple framework shown in Figure 5, while many fundamental technologies are still badly needed for its practical realization.





(i) Initially, the interpolation is done within the decoded LR video frame, without considering information from the temporal direction. The re-encoding part simply uses the H.264 encoder, which requires a relatively long encoding time. Figure 6 shows the result of a preliminary test converting the “Rush Hour” sequence from the SD (1280x720) format to the HD (1920x1080) format in the High profile of H.264. The upper curve shows the quality and bit rate obtained with full decoding and re-encoding, using simple linear interpolation for magnification. The lowest curve shows the compressed HR video produced by the simplest and quickest approach: the decoded motion vectors, prediction modes, mode sizes, etc. of the LR frame are reused for re-encoding through some default arrangements, so no motion estimation, mode decision, etc. are required.

This approach is about three or more times faster than full re-encoding, but it suffers from low PSNR and high bit rate.

The middle curve shows a hypothetical case: the best result achievable without full motion estimation, mode decision, etc., when the best parameters (MVs, modes, etc.) are picked from the list of parameters decoded from the LR frame. This forms the target for fast algorithm development.




[Figure 6: Video Enlargement, “Rush Hour” sequence: PSNR (dB) versus bitrate (kbps) for the Full, Speed and Target curves]





(ii) A key part of this work is to design fast and accurate algorithms for obtaining encoding modes, motion vectors, or even transform coefficients without going through the heavy computational processes.

The process is surprisingly close to down-sizing transcoding. We have to do (1) inter/intra mode re-decision, (2) intra mode re-decision (16x16 or 4x4), (3) inter mode re-prediction, (4) motion vector re-estimation, etc.

The data and parameters available in the originally encoded LR video are used to formulate the fast algorithms.




(iii) The following strategies are used.

(a) Higher weights should be given to parameters covering larger areas.

(b) All modes/MVs of the LR blocks whose areas overlap the SR block should be given high priority for checking.

(c) The number of zero coefficients should reflect the motion activity of the block.

(d) Cases with different QPs are treated differently.

(e) Refinements are made according to the models built.
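Strategies (a) and (b) can be sketched for the 1280 x 720 to 1920 x 1080 enlargement (a 3/2 up-scaling): for each SR macroblock, collect the MVs of every overlapping LR macroblock, scale them by 3/2, and order the candidates by the LR area that supports them, so that an encoder checks the best-supported candidates first. `lr_overlaps` and `ranked_candidates` are hypothetical names; MVs are assumed to be integers in quarter-pel units, and strategies (c) to (e) are not modelled.

```python
from fractions import Fraction

MB = 16  # macroblock size in pixels


def lr_overlaps(hr_idx, num=3, den=2):
    """LR macroblock indices overlapped by HR macroblock hr_idx along one axis,
    with overlap lengths in LR pixels, for an up-scaling factor num/den."""
    start = Fraction(hr_idx * MB * den, num)        # HR MB footprint in LR pixels
    end = Fraction((hr_idx + 1) * MB * den, num)
    result = []
    k = start // MB
    while k * MB < end:
        length = min(end, (k + 1) * MB) - max(start, k * MB)
        result.append((k, length))
        k += 1
    return result


def ranked_candidates(lr_mvs, hr_row, hr_col, num=3, den=2):
    """Candidate MVs for an HR macroblock, taken from every overlapping LR
    macroblock (strategy (b)), scaled to the HR grid and ordered by the total
    overlapping LR area that supports them (strategy (a))."""
    support = {}
    for r, len_r in lr_overlaps(hr_row, num, den):
        for c, len_c in lr_overlaps(hr_col, num, den):
            if (r, c) not in lr_mvs:
                continue
            dy, dx = lr_mvs[(r, c)]
            cand = (round(dy * num / den), round(dx * num / den))
            support[cand] = support.get(cand, 0) + len_r * len_c
    return sorted(support, key=support.get, reverse=True)


lr_mvs = {(0, 0): (4, -2), (0, 1): (8, 0), (1, 0): (4, -2), (1, 1): (0, 0)}
print(ranked_candidates(lr_mvs, 1, 1))  # [(6, -3), (12, 0), (0, 0)]
```

Exact fractions are used for the overlap lengths because a 16-pixel HR macroblock maps to 32/3 LR pixels, which floating point would represent only approximately.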





A: Interpolation Techniques: to remove the blurring effect, edge enhancement is one of the best ways to improve the quality of a super-resolution image/video sequence.

We propose an improved edge-directed interpolation method that removes the accumulated interpolation error and reduces the correlation structure mismatch problem. Let us recall the linear prediction performed by a Wiener filter,

$$Y(n) = \sum_{k} \alpha(k)\, x(n-k),$$

where $Y(n)$ is the predicted value, the $\alpha(k)$'s are the linear prediction coefficients and the $x(n-k)$'s are known samples. By minimizing the mean square error, MSE $= E[e^2(n)]$ with $e(n) = x(n) - Y(n)$, we obtain an equation for the coefficients of the Wiener filter for the interpolation,

$$\mathbf{r}_{dx} = \mathbf{R}_{xx}\,\boldsymbol{\alpha} \qquad (1)$$

where $\mathbf{r}_{dx}$ (with elements $E[x(n)x(n-i)]$) is a cross-correlation function and $\mathbf{R}_{xx}$ (with elements $E[x(n-k)x(n-i)]$) is an autocorrelation function.
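Equation (1) can be checked numerically in one dimension, the representation the text itself uses for explanation. The sketch below estimates $\mathbf{r}_{dx}$ and $\mathbf{R}_{xx}$ by sample averages, assuming the known samples are the previous `order` samples $x(n-1), \dots, x(n-\text{order})$, and solves the normal equations with a small Gaussian-elimination routine; `solve` and `wiener_coefficients` are hypothetical names. A sampled sinusoid is used because it satisfies $x(n) = 2\cos(\omega)\,x(n-1) - x(n-2)$ exactly, so the expected coefficients are known in advance.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def wiener_coefficients(x, order):
    """Estimate r_dx(i) = E[x(n) x(n-i)] and R_xx(k, i) = E[x(n-k) x(n-i)]
    for i, k = 1..order by sample averages, then solve r_dx = R_xx alpha."""
    ns = range(order, len(x))
    lags = range(1, order + 1)
    r_dx = [sum(x[n] * x[n - i] for n in ns) / len(ns) for i in lags]
    R_xx = [[sum(x[n - k] * x[n - i] for n in ns) / len(ns) for i in lags] for k in lags]
    return solve(R_xx, r_dx)


signal = [math.sin(0.3 * n) for n in range(200)]
print(wiener_coefficients(signal, 2), [2 * math.cos(0.3), -1.0])
```

For real image data the relation is only approximate, and the quality of the coefficients depends on how well the chosen samples share the statistics of the point being predicted, which is the issue NEDI's training window addresses.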




The New Edge-Directed Interpolation (NEDI) [14] scheme models a natural image as a second-order locally stationary Gaussian process, which allows interpolation by simple linear prediction.

The covariance of the image pixels in a local block (training window) can be used to obtain the prediction coefficients of the estimation problem.

Consider the interpolation of an image X to a high-resolution image Y.

[Figure 7: New Edge-Directed Interpolation (NEDI). Low-resolution pixels numbered 0-5, 8-13, 16-21, 24-29 and 32-37, row by row; panel (a) shows the high-resolution point $y_i$ to be interpolated, panel (b) the training points. Annotation: to minimize the distance between the estimated unknown pixel and its real position]





In Fig. 7, the numbers represent the locations of the original low-resolution pixels.

The solid point, labelled y_i in Fig. 7(a), is a high-resolution point to be interpolated from its four neighboring low-resolution pixels {x18, x19, x26, x27}.

To keep the formulation as simple as possible, a one-dimensional representation is used wherever possible in the explanation.

The predicted pixel becomes

$$y_i = \sum_{x_j \in \text{selected surrounding pixels}} \alpha_j\, x_j \qquad (1')$$

and from eqn. (1) we have

$$\boldsymbol{\alpha} = R_{xx}^{-1}\, r_{dx} \qquad (2)$$




Computing r_dx (the cross-correlation between y_i and its interpolating points) and R_xx (the autocorrelation among the interpolating points) would require the statistics of y_i and its neighbors, which are not available before the interpolation.

This difficulty is overcome by the "geometric duality" property, as illustrated in Fig. 7(b).

The correlations between y_i in the high-resolution domain and its neighboring points 18, 19, 26 and 27 are replaced by the correlations of four sets of sample (training) points, enclosed by dotted lines in Fig. 7(b).





For example, the statistics are available for interpolating point 18 from its neighbors, points 9, 11, 25 and 27, in the low-resolution (LR) domain. Hence we can write

$$\mathbf{y} = \begin{bmatrix} x_{18} \\ x_{19} \\ x_{26} \\ x_{27} \end{bmatrix}$$

and

$$C = \begin{bmatrix} x_{9} & x_{11} & x_{25} & x_{27} \\ x_{10} & x_{12} & x_{26} & x_{28} \\ x_{17} & x_{19} & x_{33} & x_{35} \\ x_{18} & x_{20} & x_{34} & x_{36} \end{bmatrix}$$

where the elements of y are the training points and each row of C holds the points used to interpolate the corresponding element of y. In this case we have

$$r_{dx} = C^{T}\mathbf{y} \quad \text{and} \quad R_{xx} = C^{T}C.$$
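The geometric-duality step can be sketched on a 2-D image: each training pixel is predicted from its four diagonal LR neighbors (the rows of C), α is solved from C^T C α = C^T y as in eqn (2), and the same α is applied to the four LR neighbors of the HR centre point as in eqn (1'). This is a minimal sketch, assuming a list-of-rows image, a caller-supplied training window, and a plain-average fallback when C^T C is singular (the fallback is our assumption); function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    tol = 1e-12 * max((abs(v) for row in A for v in row), default=0.0)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) <= tol:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def diag_neighbors(X: Image, r: int, c: int) -> List[float]:
    """Upper-left, upper-right, lower-left, lower-right LR neighbors of (r, c)."""
    return [X[r - 1][c - 1], X[r - 1][c + 1], X[r + 1][c - 1], X[r + 1][c + 1]]


def nedi_centre(X: Image, i: int, j: int, training: Sequence[Tuple[int, int]]) -> float:
    """Interpolate the HR point at the centre of LR pixels (i,j), (i,j+1), (i+1,j), (i+1,j+1)."""
    y = [X[k][l] for k, l in training]                  # training points
    C = [diag_neighbors(X, k, l) for k, l in training]  # one row of C per training point
    R_xx = [[sum(row[a] * row[b] for row in C) for b in range(4)] for a in range(4)]  # C^T C
    r_dx = [sum(row[a] * yk for row, yk in zip(C, y)) for a in range(4)]             # C^T y
    neighbors = [X[i][j], X[i][j + 1], X[i + 1][j], X[i + 1][j + 1]]
    try:
        alpha = solve(R_xx, r_dx)                       # eqn (2)
    except ValueError:
        return sum(neighbors) / 4.0                     # assumed fallback for flat windows
    return sum(a * v for a, v in zip(alpha, neighbors))  # eqn (1')


# Rows of 8 pixels, so position (2, 2) is point 18 of Fig. 7.
X = [[float(r * c) for c in range(8)] for r in range(8)]
print(nedi_centre(X, 2, 2, [(2, 2), (2, 3), (3, 2), (3, 3)]))  # training points 18, 19, 26, 27
```

On the image x(r, c) = r·c every pixel equals the mean of its diagonal neighbors, so α comes out as 0.25 for each tap and the HR centre value is recovered as 2.5 × 2.5 = 6.25.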




To interpolate a point between two vertically adjacent LR pixels (2nd step), the same procedure is applied with the pattern rotated by an angle of π/4, as shown in Figs. 8(a) and (b).

In Fig. 8, circles represent LR pixels, grey dots represent the points interpolated in the 1st step (Fig. 7), and small black dots represent the HR points to be interpolated.

To save computation, NEDI adopts a hybrid approach: the correlation-based interpolation is applied to edge pixels only, and bilinear interpolation is applied to non-edge pixels (i.e. pixels in smooth regions).
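The hybrid rule can be sketched as a simple dispatcher for the first-step (centre) points. The source does not say how edge pixels are detected; the max-minus-min test on the four LR neighbors and the threshold of 8 are assumptions for illustration, and `edge_interp` stands for any correlation-based interpolator, such as a NEDI routine. Function names are hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]


def is_edge(X: Image, i: int, j: int, threshold: float) -> bool:
    """Assumed edge test: the four LR neighbors differ by more than `threshold`."""
    vals = (X[i][j], X[i][j + 1], X[i + 1][j], X[i + 1][j + 1])
    return max(vals) - min(vals) > threshold


def bilinear_centre(X: Image, i: int, j: int) -> float:
    """Bilinear interpolation at the centre of a 2x2 LR block is the plain average."""
    return (X[i][j] + X[i][j + 1] + X[i + 1][j] + X[i + 1][j + 1]) / 4.0


def interpolate_centres(X: Image, edge_interp: Callable[[Image, int, int], float],
                        threshold: float = 8.0) -> Tuple[List[List[float]], int]:
    """Costly interpolator on edges, bilinear elsewhere.

    Returns the interpolated centre values and how many points used edge_interp.
    """
    out, edge_calls = [], 0
    for i in range(len(X) - 1):
        row = []
        for j in range(len(X[0]) - 1):
            if is_edge(X, i, j, threshold):
                row.append(edge_interp(X, i, j))
                edge_calls += 1
            else:
                row.append(bilinear_centre(X, i, j))
        out.append(row)
    return out, edge_calls


# A vertical step edge; a stub stands in for the edge-directed interpolator.
X = [[0, 0, 100], [0, 0, 100], [0, 0, 100]]
print(interpolate_centres(X, lambda img, i, j: -1.0))
```

Only the two centres straddling the step take the expensive path, which is where the computational saving of the hybrid scheme comes from.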





However, the NEDI suffers from the prediction error propagation problem which limits the performance of the algorithm.

NEDI is a two-step interpolation scheme: the first step uses the original pixels for interpolation, whilst the second step uses the results of the first step, i.e. the grey pixels in Fig. 8, to obtain the interpolated pixel (the small black dot).

The interpolation error of the first step is therefore propagated to the second step; this is the interpolation error propagation problem.

At the same time, NEDI also suffers from a covariance-structure mismatch problem: the span of the training pixels does not give the best coverage in the HR domain.




Figure 8: Modified 2nd step: (a) interpolation problem, (b) original training set, (c) and (d) proposed training sets.





A different set of pixels could therefore give a better interpolation of the edges. We resolve the problem with a modified version of the scheme.

The first step is the same as before.

In the second step, we propose to interpolate the unknown pixels by a sixth-order linear prediction, with a training window as shown in Figs. 8(c) and (d), using points in the original LR domain only.

This completely eliminates the error propagation problem.

To reduce the covariance mismatch problem, we may use multiple low-resolution training window candidates, i.e. choose one of several low-resolution training windows to represent the covariance of the high-resolution block for the linear prediction, as shown in Fig. 8(d).
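The source does not specify which six LR pixels form the predictor or how one training window is chosen from the candidates, so the sketch below is only an illustration under labelled assumptions: (1) a hypothetical six-pixel pattern for an HR point midway between two horizontally adjacent LR pixels, whose geometric dual in the LR grid doubles every offset; (2) choosing the candidate window with the smallest training MSE; and (3) a small ridge term that keeps the normal equations solvable on flat windows. All names are hypothetical. Note that both the training pixels and the target's neighbors come from the LR grid only, so no first-step result is reused.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    tol = 1e-12 * max((abs(v) for row in A for v in row), default=0.0)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) <= tol:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


# Hypothetical pattern: HR target midway between LR pixels (i, j) and (i, j+1).
TARGET_OFFSETS = [(0, 0), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]
# Geometric dual for an LR training pixel (k, l): same geometry at twice the distance.
TRAIN_OFFSETS = [(0, -1), (0, 1), (-2, -1), (-2, 1), (2, -1), (2, 1)]


def fit_window(X: Image, window: Sequence[Tuple[int, int]], ridge: float = 1e-3):
    """Least-squares sixth-order predictor on one LR training window; returns (alpha, mse)."""
    C = [[X[k + dr][l + dc] for dr, dc in TRAIN_OFFSETS] for k, l in window]
    y = [X[k][l] for k, l in window]
    n = len(TRAIN_OFFSETS)
    R_xx = [[sum(row[a] * row[b] for row in C) + (ridge if a == b else 0.0)
             for b in range(n)] for a in range(n)]
    r_dx = [sum(row[a] * yk for row, yk in zip(C, y)) for a in range(n)]
    alpha = solve(R_xx, r_dx)
    mse = sum((sum(a * c for a, c in zip(alpha, row)) - yk) ** 2
              for row, yk in zip(C, y)) / len(y)
    return alpha, mse


def interpolate_midpoint(X: Image, i: int, j: int, windows):
    """Pick the candidate window with the lowest training MSE (assumed criterion) and predict."""
    fits = [fit_window(X, w) for w in windows]
    best = min(range(len(fits)), key=lambda m: fits[m][1])
    alpha = fits[best][0]
    neighbors = [X[i + dr][j + dc] for dr, dc in TARGET_OFFSETS]  # LR pixels only
    return sum(a * v for a, v in zip(alpha, neighbors)), best


# Top half is a smooth horizontal ramp; bottom half is irregular.
X = [[float(c) if r < 6 else float((c * c) % 7) for c in range(12)] for r in range(12)]
win_a = [(k, l) for k in (2, 3) for l in range(1, 9)]
win_b = [(k, l) for k in (8, 9) for l in range(1, 9)]
print(interpolate_midpoint(X, 3, 4, [win_a, win_b]))  # value near 4.5, window index 0
```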




Figure 5: Suggested Enhanced NEDI.

References:

1. X. Li and M. Orchard, "New Edge-Directed Interpolation", IEEE Trans. on Image Processing, vol. 10, no. 10, pp. 1521-1527, October 2001.

2. W.S. Tam, C.W. Kowk and W.C. Siu, "A Modified Edge Directed Interpolation for Images", paper submitted to IEEE Transactions on Image.

3. …

Statistics to be used





Figs. 9 and 10 show the results of our approach for image enlargement by interpolation and for simulated SR video reconstruction. Note the bar and the connecting parts above the wheel in Fig. 10, which look smoother and sharper. The effect is more pronounced at higher magnification.

(a) Using original Step 2 (b) Using new Step 2

Figure 9: Preliminary results of the proposed new approach for edge enhancement




Figure 10: SR video by simulation. Top right: original video frame; bottom left: linear interpolation by the Intel library; bottom right: SR video with accurate MVs.





Experimental Results – Original Images

Jet Plane Bicycle




Experimental Results – test image: Jet Plane

Bilinear (PSNR = 28.98 dB)

NEDI (PSNR = 32.47 dB)

MEDI (PSNR = 32.34 dB)

Final Image

Error Image*

* Intensity is scaled to the range 0 to 255.
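The PSNR values above follow the usual definition for 8-bit images, PSNR = 10 log10(255² / MSE). A minimal sketch of that computation is given below, together with one way of producing a displayable error image. The min-max stretch of the absolute error to 0-255 is our assumption of how the error images were scaled, and the function names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def psnr(original: Image, reconstructed: Image, peak: float = 255.0) -> float:
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    diffs = [(a - b) ** 2 for ra, rb in zip(original, reconstructed) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


def scaled_error_image(original: Image, reconstructed: Image) -> List[List[int]]:
    """Absolute error linearly stretched so that the largest error maps to 255 (assumed scaling)."""
    err = [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(original, reconstructed)]
    lo = min(min(r) for r in err)
    hi = max(max(r) for r in err)
    if hi == lo:
        return [[0 for _ in r] for r in err]
    return [[round(255 * (e - lo) / (hi - lo)) for e in r] for r in err]


a = [[10, 20], [30, 40]]
print(psnr(a, [[11, 21], [31, 41]]))  # MSE = 1, so 20*log10(255), about 48.13 dB
```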





Experimental Results – test image: bicycle

Bilinear (PSNR=18.68dB)

NEDI (PSNR=20.89dB)

MEDI (ours) (PSNR=20.67dB)

Final Image

Error Image*

* Intensity is scaled to the range 0 to 255.




B. Super-Resolution Video: Since interpolating a single frame to form an enlarged frame is restricted by the resolution and information available in the original image, it is natural to use more frames (in both the temporal and spatial domains) to construct the enlarged frame.

An enlarged frame obtained from more than one original frame is defined as a super-resolution frame (video) in this paper.

This can be achieved by both non-iterative and iterative approaches.

Owing to limited space, we do not discuss the details of our approach here, but quote some experimental results, as shown in Fig. 9. Interested readers may refer to the literature for further information.





Super-resolution Images/Videos

Super-resolution (SR) image/video reconstruction is a promising technique for increasing the resolution of an image or a sequence of images (video) beyond the resolving power of the imaging system.

Modified definition of Transcoding: a process of converting a previously compressed video bitstream into a bitstream of a different nature or of a lower/higher bit rate.




Figure 1: Video Interpolation





Reasons for Re-encoding:

An SR video may also need to be re-encoded for various reasons, including

(i) to allow standard devices to view the SR videos without additional conversion devices,

(ii) to save SR video reconstruction time, since the computing power of the viewing devices may not be sufficient, and

(iii) to cope with the SR video package being unavailable at the viewing site.




Reasons for Re-encoding (cont’d): Broadcasting companies are also looking for good technologies to convert videos between formats, resolutions, and frame/bit rates.

It is particularly difficult to up-convert a compressed video, for example from SDTV to HDTV, because of the missing data and the blurring of edges caused by simple interpolation.

Furthermore, re-encoding of these SR videos is required in many practical situations, since content providers often have to standardize various video clips for uniform storage or transmission. The viewer side may not have the conversion module or the computing power for real-time SR reconstruction.





Key technologies in the study of super-resolution (SR) videos:

1. Image interpolation, and hence

2. Video interpolation
- blurring effect
- aliasing effect
- edge enhancement techniques (a new technique is available)
- iterative and non-iterative approaches (the non-iterative approach is simple)
- noise reduction techniques (a new technique is available)

3. Spatial-domain super-resolution videos using multiple images

4. Temporal-domain super-resolution videos using temporal features (using further information from video frames)

5. SR video from encoded video frames

6. SR video re-encoding using lower-resolution compressed videos (a general kernel is suggested here; further technologies are required)




Block diagram for re-encoding SR Videos

First, the transcoder decodes the incoming bitstream into the pixel domain. Second, the decoded video frame is magnified with super-resolution techniques and then re-encoded at the desired frame rate.

[Block diagram: Front-encoder → Transcoder (decoder; magnification & re-encoding) → End-decoder. Annotations: the incoming bitstream is decoded into the pixel domain; the decoded video frame is magnified and subsequently re-encoded at the desired frame rate.]





Framework of the Architecture of Super-Resolution Transcoding

[Flowchart: Initialization → Read and decode one frame → Form SR video (possible output) → Re-encoding? → Check frame type (I-MB / P-MB) → Re-encoding process, split into an intra-MB path (full mode? find the best prediction mode, or predict the mode from the original parameters; encode MB) and an inter-MB path (full mode decision? find the best mode; ME by full search? find the best MV; encode MB) → Frame end? → Deblock picture → Write bitstream → End of bitstream? → Exit. Labels: "Ideal Case" and "Basis".]

E.g. for H.264 High profile: from SD to HD

For a macroblock:

1. Determine its mode type: intra or inter.

2. Determine its prediction mode for intra mode.

3. Determine its mode (VBS) for inter mode.

4. Check whether skip mode is to be used.

5. Motion re-estimation.
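Step 5 (motion re-estimation) and the "ME by full search?" branch can be sketched as follows: the decoded LR motion vector is scaled by the resize factor (1.5 for 1280 x 720 to 1920 x 1080) and then refined by a small SAD search around it, instead of a full search. This is a minimal sketch under stated assumptions: integer-pel accuracy, a plain SAD criterion, and a ±2 pixel refinement range are our choices, and a real H.264 encoder would also use sub-pel positions and rate-constrained costs. Function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def scale_mv(mv: Tuple[int, int], factor: float) -> Tuple[int, int]:
    """Map a decoded LR motion vector to SR pixel units (integer-pel)."""
    return (round(mv[0] * factor), round(mv[1] * factor))


def sad(cur: Frame, ref: Frame, bx: int, by: int, size: int, dx: int, dy: int) -> int:
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def refine_mv(cur: Frame, ref: Frame, bx: int, by: int, size: int,
              mv_pred: Tuple[int, int], search: int = 2) -> Tuple[Optional[Tuple[int, int]], float]:
    """Search +/-search pixels around mv_pred; return (best_mv, best_sad)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(mv_pred[1] - search, mv_pred[1] + search + 1):
        for dx in range(mv_pred[0] - search, mv_pred[0] + search + 1):
            inside = 0 <= by + dy and by + dy + size <= h and 0 <= bx + dx and bx + dx + size <= w
            if not inside:
                continue  # skip candidates that fall outside the reference frame
            cost = sad(cur, ref, bx, by, size, dx, dy)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic frames: cur is ref displaced by (3, -5).
ref = [[(7 * x + 13 * y + 3 * x * y) % 251 for x in range(40)] for y in range(40)]
cur = [[ref[y - 5][x + 3] if 0 <= y - 5 < 40 and 0 <= x + 3 < 40 else 0
        for x in range(40)] for y in range(40)]
pred = scale_mv((2, -4), 1.5)  # (3, -6) predicted from the LR stream
print(refine_mv(cur, ref, 16, 16, 8, pred))
```

Only 25 candidates are checked around the predicted vector, which is the saving over an exhaustive full search; setting `search` large enough to cover the whole frame turns the same function into a full search.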











HDTV: Video Upsizing with SR Techniques, Transcoding from SD to HD Formats

(i) Compressed video decoding: this also decodes the original motion vectors, modes, residual error values and the available statistics.

(ii) Super-resolution video formation:
- mosaicing using multiple frames from the video
- simple linear interpolation (for missing points)
- edge enhancement (edge detection, edge-directed interpolation, etc.)
- noise removal and/or deblocking, etc.

(iii) SR video re-encoding:
(1) inter/intra re-decision
(2) intra mode re-decision (I16x16 or I4x4)
(3) inter mode re-prediction
(4) motion vector re-estimation, etc.





Super-Resolution Video Kernel Demonstration:

• Fully Decode + Upsize + Fully Encode

• Transcoding

– (Fully Decode + Upsize + Fast Encoding using Mode Re-Decision + MV Refinement + etc.)

Demo clips: Full 1, Trans 1, Full 2, Trans 2

Converting HDTV video with the H.264 standard from SD (1280 x 720) format to HD (1920 x 1080) format




Ideal Super-resolution Video:

Original Frame

Four simulated LR images, LR0, LR1, LR2 and LR3, each of size 172*144
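The ideal case with accurate MVs can be sketched in a few lines. If the four LR images are pure 2:1 decimations of the HR frame at the four sampling phases (0, 0), (0, 1), (1, 0) and (1, 1), with no blur or noise, then placing each LR pixel back on the HR grid at its known offset (shift-and-add) recovers the HR frame exactly. The decimation model, the phase assignment of LR0 to LR3, and the function names are our assumptions; real SR reconstruction must also handle inaccurate MVs, blur, noise and missing points.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]


def simulate_lr(hr: Frame, factor: int = 2) -> Dict[Tuple[int, int], Frame]:
    """Split an HR frame into factor*factor LR frames, one per (dy, dx) sampling phase."""
    return {(dy, dx): [row[dx::factor] for row in hr[dy::factor]]
            for dy in range(factor) for dx in range(factor)}


def shift_and_add(lr_frames: Dict[Tuple[int, int], Frame], factor: int = 2) -> Frame:
    """Place each LR frame back on the HR grid at its known (dy, dx) offset."""
    first = next(iter(lr_frames.values()))
    h, w = len(first) * factor, len(first[0]) * factor
    hr = [[0] * w for _ in range(h)]
    for (dy, dx), frame in lr_frames.items():
        for y, row in enumerate(frame):
            for x, v in enumerate(row):
                hr[y * factor + dy][x * factor + dx] = v
    return hr


hr = [[10 * y + x for x in range(6)] for y in range(8)]
lr = simulate_lr(hr)  # four 4 x 3 LR frames standing in for LR0..LR3
print(shift_and_add(lr) == hr)  # True
```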





Original Frame

Left: Linear Interpolation by Intel Library; Right: SR video with accurate MVs

Super Resolution Demo








Conclusion:

1. We started with the Hybrid Video Coding model and gradually moved on to Object-oriented Video Coding.

2. Advanced Video Coding (H.264) introduces almost no new concepts; instead, it fine-tunes existing techniques in a systematic way to optimize coding efficiency. This can reduce the bitrate to half that of the MPEG-2 standard.

3. Can we squeeze the bitrate further, say by halving it once more? Some researchers have gone back to object-oriented coding, whilst others continue with optimization or move on to other sophisticated applications, such as multi-view video coding or advanced scalable coding.

4. Motion estimation is an important topic. We have done much work on it, but did not discuss it much in this presentation. Can we have a fast ME algorithm that gives better quality than the Exhaustive Full Search Algorithm?




6. We then talked about transcoding, which is a process to convert an encoded video from one format to another format. Our work involves both

(i) heterogeneous transcoding, such as from H.263 to H.264 and

(ii) homogeneous transcoding, such as from H.264 to H.264.

H.264 to H.264 transcoding in the High profile:
- from SD to HD (not difficult, but good quality is difficult to achieve)
- from HD to SD (why?) Would pixel interpolation be important?

Mode type re-decision: intra mode or inter mode (including skip mode, etc.)

Intra mode: 4x4 or 16x16? What is the prediction mode?
Inter mode: mode decision (horizontal, vertical, …); motion vector re-prediction

7. We then talked about the significance of super-resolution video and some of its related techniques: simple interpolation, the new edge-directed interpolation and our modified edge-directed interpolation, etc., as well as a complete kernel structure.

8. A brief highlight of our ongoing work has also been given. This includes (i) finding practical ways to form SR videos and (ii) the re-encoding of SR videos.

Centre for Signal Processing





The End: thank you!






Video Interpolation

[Flowchart: Initialization → Re-encoding? (possible output) → Check MB/slice type (mode type decision) → Re-encoding process, split into an intra-MB path (full encoding? find the best prediction mode; encode MB) and an inter-MB path (full mode decision? find the best mode; ME by full search? find the best MV; encode MB) → Frame end? → Deblock picture → Write bitstream → End of bitstream? → Exit.]

Figure 5: Architecture of Transcoding Platform (Video Enlargement)
