Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures

Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures. Presented by: Ahmed Abdel-Hafeez, Ahmed El-Bohy, Ahmed Emam, Ahmed Kandil. Supervised by/Presented to: Prof. Dr. Attalah Hashaad. Published by: Chunjiang J. Duanmu et al., August 2007.

Upload: ahmad-abdelhafeez

Post on 06-May-2015


Page 1: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures

Presented by:
• Ahmed Abdel-Hafeez
• Ahmed El-Bohy
• Ahmed Emam
• Ahmed Kandil

Supervised by/Presented to: Prof. Dr. Attalah Hashaad

Published by: Chunjiang J. Duanmu et al. Published in August 2007.

Page 2: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Outline
• Abstract.
• Introduction.
• 8-bit partial sums.
• Multilevel 8-bit partial sums.
• Computational complexity.
• Simulation Results.
• Conclusion.

ARAB ACADEMY-CAIRO Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures spring 2013 slide

Page 3: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Abstract

• Fast block motion estimation algorithms are needed for real-time implementations of video coding standards due to the high computational complexity of the full-search algorithm for block motion estimation.

• In this paper, an algorithm using 8-bit partial sums of 16 luminance values for fast block motion estimation is proposed. The partial sums are employed to reduce the computational complexity of not only the full-search algorithm but also some of the fast block motion estimation algorithms, while maintaining their accuracy.

• Furthermore, it is shown that the byte-type data-parallelism on an SIMD architecture can be utilized to access and process these partial sums concurrently to accelerate the process of motion estimation.

• Simulation results are presented to demonstrate that the use of the partial sums can significantly accelerate the execution of the full-search algorithm and other search algorithms on an SIMD architecture.

Page 4: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Introduction - Basics - Applications

Page 5: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Chronological Table of Video Coding Standards

The objective of video coding is to compress moving images.

• ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
• ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
• Joint ITU-T and ISO/IEC: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)

Page 6: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Introduction - Basics - Video

[Figure: a video shown as a sequence of frames: Frame 1, Frame 2, Frame 3, Frame 4]

Luminance (Y): Describes the brightness of the pixel.

Chrominance (CbCr): Describes the color of the pixel.

Page 7: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Introduction - Basics - Video Data Drawback

• Uncompressed video data is large.
– Much of that data is redundant. There are two general types of data redundancy in a video:

Spatial redundancy

In a frame, adjacent pixels are usually correlated, e.g. the grass in the background of a frame is green throughout.

Time-based (temporal) redundancy

In a video, adjacent frames are usually correlated, e.g. the green background persists frame after frame.

[Figure: Frame 1, Frame 2, Frame 3, Frame 4]

Page 8: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Introduction - Basics - Video Compression

• Predict the current frame based on previously coded frames
• Types of coded frames:
– I-frame: intra-coded frame, coded independently of all other frames
– P-frame: predictively coded frame, coded based on a previously coded frame
– B-frame: bi-directionally predicted frame, coded based on both previous and future coded frames

Page 9: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Block Matching

Page 10: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Motion Estimation

• What is Motion Estimation?
– Predict the current frame from the previous frame
– Determine the displacement of an object in the video sequence
– The amount of data to be coded can be reduced significantly if the previous frame is subtracted from the current frame.

Page 11: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Motion Estimation Classification

• Block Based Motion Estimation Algorithms
• Mesh Based Motion Estimation Algorithms
• Time-domain Algorithms
– Matching Algorithms: Block-Matching, Feature-matching
– Gradient Based Algorithms: Pel-recursive, Block-recursive
• Frequency-domain Algorithms
– Phase-correlation (DFT)
– Matching in (DCT) domain
– Matching in wavelet domain

Page 12: Fast block motion estimation with 8 bit partial sums using SIMD architecture


Motion Estimation (ctd)

Page 13: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Motion Estimation (ctd)

Page 14: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Motion Estimation (ctd)

[Figure: block matching between the Reference Frame and the Current Frame. Labels: Current 16x16 Block, Search Window, Motion Vector, Sum of Absolute Difference (SAD)]

Page 15: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Distortion Criterion for measuring distance between previous block and search area block

• CCF (Cross-Correlation Function)
• MSE (Mean Square Error Function)
• MAE (Mean Absolute Error)
• SAD (Sum of Absolute Difference)
• PDC (Pixel Difference Classification)
• MAE (or MAD) and SAD are commonly employed due to their simplicity in hardware implementation.

Page 16: Fast block motion estimation with 8 bit partial sums using SIMD architecture

SAD

$$\mathrm{SAD}(dx, dy) = \sum_{m=x}^{x+N-1} \sum_{n=y}^{y+N-1} \left| I_k(m, n) - I_{k-1}(m + dx, n + dy) \right|$$

$$(MV_x, MV_y) = \arg\min_{(dx, dy) \in R^2} \mathrm{SAD}(dx, dy)$$
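The SAD criterion and the exhaustive minimisation over the search region can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sad` and `full_search`, the frame layout (a list of rows, indexed `frame[row][column]`) and the square search range `w` are assumptions, not taken from the slides.

```python
def sad(cur, ref, x, y, dx, dy, n=16):
    """SAD between the n x n block of `cur` whose top-left pixel is at
    column x, row y, and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[y + r][x + c] - ref[y + dy + r][x + dx + c])
               for r in range(n) for c in range(n))


def full_search(cur, ref, x, y, w, n=16):
    """Evaluate every displacement in [-w, w] x [-w, w]; return (mv, sad)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            # Skip candidates that fall partly outside the reference frame.
            if not (0 <= x + dx <= width - n and 0 <= y + dy <= height - n):
                continue
            s = sad(cur, ref, x, y, dx, dy, n)
            if best_sad is None or s < best_sad:
                best_mv, best_sad = (dx, dy), s
    return best_mv, best_sad


# Tiny synthetic check: `cur` is `ref` shifted by 2 columns and 1 row.
ref = [[16 * r + c for c in range(12)] for r in range(12)]
cur = [[ref[min(r + 1, 11)][min(c + 2, 11)] for c in range(12)]
       for r in range(12)]
print(full_search(cur, ref, x=4, y=4, w=2, n=4))  # ((2, 1), 0)
```

Every candidate costs a full n x n SAD here, which is the computational burden the fast algorithms and the partial-sum schemes try to reduce.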

Page 17: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Search Algorithms

• FAST
– MULTISTEP: 3SS 4SS HBS UDS
• EXHAUSTIVE
– SE MSE VF PFGSE
– FULL

Page 18: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Search Algorithms (ctd)

• There is a trade-off between run time and accuracy.

• Full search is the most accurate because it is exhaustive, but it requires more time.

• Fast search is quicker, but accuracy is reduced because only a subset of the candidate positions is examined.

Page 19: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Full-Search

Not suitable for real time.

Page 20: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Exhaustive Search

• Simplest algorithm, but computationally most expensive

Page 21: Fast block motion estimation with 8 bit partial sums using SIMD architecture


Three Step Search (3SSA)

Page 22: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Three Step Search (3SSA) (ctd)

Page 23: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Three Step Search (3SSA) (ctd)

Page 24: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Three Step Search (3SSA) (ctd)

Page 25: Fast block motion estimation with 8 bit partial sums using SIMD architecture

3SSA Block Matching

► Three-Step Search (3SS)
– 9 points: the central point and its 8 surroundings
– Distance: w/2
– Find the best match
– Use the previous best as the center
– Halve the distance, select 8 new points
– Repeat the algorithm 3 times
– Examines 25 points
– Assumes a uniform distribution of MVs

[Figure: search points labelled 1, 2 and 3 according to the step in which they are examined]
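The three-step procedure can be sketched as follows. This is a minimal sketch under stated assumptions: `three_step_search` and its `cost` callback are hypothetical names, the initial step of 4 corresponds to a search range of about ±7 pixels, and the cost function in the example is a toy stand-in for the SAD of a real block.

```python
def three_step_search(cost, step=4):
    """Three-step search around (0, 0).

    `cost(dx, dy)` returns the matching cost (e.g. the SAD) of a candidate
    displacement. Returns (motion_vector, best_cost, points_evaluated).
    """
    center = (0, 0)
    best_cost = cost(0, 0)
    evaluated = 1  # the central point
    while step >= 1:
        best_point = center
        # Examine the 8 neighbours at the current distance.
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue
                cand = (center[0] + dx, center[1] + dy)
                c = cost(*cand)
                evaluated += 1
                if c < best_cost:
                    best_cost, best_point = c, cand
        center = best_point  # use the previous best as the new center
        step //= 2           # halve the distance
    return center, best_cost, evaluated


# Toy cost with a single minimum at (3, -2).
result = three_step_search(lambda dx, dy: abs(dx - 3) + abs(dy + 2))
print(result)  # ((3, -2), 0, 25)
```

The count of 25 evaluated points is 1 central point plus 8 new points in each of the 3 steps.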

Page 26: Fast block motion estimation with 8 bit partial sums using SIMD architecture


4SSA

Page 27: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Unrestricted Center-Biased Diamond Search Algorithm (UDSA)

Page 28: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Hexagon-Based Search Algorithm (HBSA)

Page 29: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Problem Definition

• The high computational requirement of the Full Search (FS) algorithm does not allow it to work in real time applications, despite its high accuracy.

• Fast Block motion estimation algorithms have lower computational complexity, but lower accuracy.

• Fast block motion estimation algorithms are therefore chosen for real-time applications, and in this paper too.

Page 30: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Aim

• To improve the accuracy of some of the fast block motion estimation techniques without increasing the computational complexity.

• To make best use of Single Instruction Multiple Data (SIMD) architecture and to take advantage of byte-type data-parallelism to further accelerate the execution of the algorithms to achieve the main goal.

Page 31: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Limitation

• If the partial sums used by an algorithm are wider than 8 bits, the partial sums of a reference block cannot be stored, accessed, and manipulated in a contiguous memory space, since the partial sums of other reference blocks lie in between. As a result, a large number of CPU cycles are lost in manipulating these data, and such algorithms are not suitable for SIMD implementations.

Page 32: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Procedure

• Devise a scheme that uses only 8-bit partial sums and discards as many SAD computations as possible, without excluding the optimal motion vector.
– The proposed partial sums can be utilized not only in the full-search algorithm but also in some of the fast block motion-estimation algorithms.

• Devise a scheme that generalises the previous scheme to the multi-level case and utilises it optimally.

Page 33: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Partial Sums

  268
+ 483
-----
  600   Add the hundreds (200 + 400)
  140   Add the tens (60 + 80)
+  11   Add the ones (8 + 3)
-----
  751   Add the partial sums (600 + 140 + 11)

Page 34: Fast block motion estimation with 8 bit partial sums using SIMD architecture

8-Bit Partial Sums - Objective

• The objective of this paper is to find new partial sums of only eight bits, so that they can be of the packed byte-type on an SIMD architecture.

• In this way, eight additions or subtractions of the partial sums can be executed in one SIMD instruction.
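The idea of handling eight 8-bit values in one 64-bit operation can be imitated in plain Python with integer masks. The sketch below is not the paper's SIMD code; it only emulates a packed-byte addition with wraparound inside one 64-bit word, and the helper names `pack8`, `unpack8` and `add_bytes` are hypothetical.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F  # low 7 bits of every byte
HIGH = 0x8080808080808080  # top bit of every byte


def pack8(values):
    """Pack eight 8-bit values into one 64-bit word (values[0] lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Split a 64-bit word back into its eight bytes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def add_bytes(a, b):
    """Add eight byte lanes at once, each lane wrapping modulo 256.

    The low 7 bits of each lane are added with ordinary addition (their
    carry stays inside the lane); the top bit is then fixed up with XOR so
    that no carry leaks into the neighbouring lane.
    """
    return ((a & LOW7) + (b & LOW7)) ^ ((a ^ b) & HIGH)


a = pack8([1, 2, 3, 250, 100, 0, 255, 128])
b = pack8([1, 1, 1, 10, 100, 0, 1, 128])
print(unpack8(add_bytes(a, b)))  # [2, 3, 4, 4, 200, 0, 0, 0]
```

A single integer addition updates all eight lanes, which is the same data-parallelism a packed-byte SIMD instruction provides in hardware. Values wider than 8 bits would not fit this layout, which is why the partial sums are kept to one byte each.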

Page 35: Fast block motion estimation with 8 bit partial sums using SIMD architecture

8-bit Partial Sums

[Figure: a 16 X 16 block with positions numbered 0 to 15 and the partial sums ∑(n)]

Page 36: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Lower Bound

[Equation: lower bound on the SAD, obtained using the 8-bit partial sums]
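The slide's own equations did not survive in the transcript, so the derivation below is only a sketch of how such a bound can arise, not the paper's formula. It assumes that each partial sum covers one 16-pixel row of the block, and that the 12-bit row sum $S$ is reduced to 8 bits by dropping its four least significant bits, $P = \lfloor S / 16 \rfloor$. The symbols $c_{ij}$, $r_{ij}$, $S_c$, $S_r$, $P_c$, $P_r$ are introduced here for illustration.

```latex
% Exact row sums give a bound by the triangle inequality:
\mathrm{SAD} = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| c_{ij} - r_{ij} \right|
  \;\ge\; \sum_{i=0}^{15} \left| \sum_{j=0}^{15} (c_{ij} - r_{ij}) \right|
  = \sum_{i=0}^{15} \left| S_c(i) - S_r(i) \right|

% With S = 16 P + e and 0 \le e \le 15 for both blocks:
\left| S_c(i) - S_r(i) \right| \;\ge\; 16 \left| P_c(i) - P_r(i) \right| - 15

% Hence a bound that needs only the 8-bit values:
\mathrm{SAD} \;\ge\; \sum_{i=0}^{15} \max\!\left( 0,\; 16 \left| P_c(i) - P_r(i) \right| - 15 \right)
```

If this bound already reaches the best SAD found so far, the candidate cannot improve on it and its full SAD need not be computed.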

Page 37: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Scheme One - Algorithm

• Step 1) Initialization
a) Compute all of the 8-bit partial sums of sixteen luminance values for the current frame and save them in a contiguous memory space.
b) Retrieve all of the 8-bit partial sums of sixteen luminance values for the reference frame, saved in a contiguous memory space.

Page 38: Fast block motion estimation with 8 bit partial sums using SIMD architecture


Scheme One- Algorithm (ctd)

• Step 2) For every current block, execute the block motion-estimation process.
– Step 2.1) Initialization

Page 39: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Scheme One- Algorithm (ctd)

– Step 2.2) Search
• For (each search location in a motion-estimation algorithm)
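The body of the search loop is not present in the transcript, so the following Python sketch fills it with one plausible pruning test and should be read as an assumption rather than the paper's exact Scheme One. It assumes that each 8-bit partial sum is the sum of 16 horizontally adjacent luminance values with its four least significant bits dropped, and it skips a candidate whenever the bound `max(0, 16 * |difference| - 15)` summed over the 16 rows already reaches the best SAD so far. All function names are hypothetical, and the caller must keep the search window inside the frame.

```python
N = 16  # block size; each partial sum covers 16 luminance values


def partial_sums8(frame):
    """Step 1: 8-bit partial sums for every horizontal run of 16 pixels."""
    width = len(frame[0])
    return [[sum(row[c:c + N]) >> 4 for c in range(width - N + 1)]
            for row in frame]


def lower_bound(cur_ps, ref_ps, x, y, dx, dy):
    """Bound on the SAD that needs only the stored 8-bit partial sums."""
    total = 0
    for r in range(N):
        diff = abs(cur_ps[y + r][x] - ref_ps[y + dy + r][x + dx])
        total += max(0, 16 * diff - 15)  # truncation can hide up to 15
    return total


def sad(cur, ref, x, y, dx, dy):
    return sum(abs(cur[y + r][x + c] - ref[y + dy + r][x + dx + c])
               for r in range(N) for c in range(N))


def pruned_full_search(cur, ref, x, y, w):
    """Step 2: full search that skips candidates ruled out by the bound."""
    cur_ps, ref_ps = partial_sums8(cur), partial_sums8(ref)
    best_mv, best_sad = (0, 0), sad(cur, ref, x, y, 0, 0)  # Step 2.1
    skipped = 0
    for dy in range(-w, w + 1):                            # Step 2.2
        for dx in range(-w, w + 1):
            if lower_bound(cur_ps, ref_ps, x, y, dx, dy) >= best_sad:
                skipped += 1  # cannot beat the current best
                continue
            s = sad(cur, ref, x, y, dx, dy)
            if s < best_sad:
                best_mv, best_sad = (dx, dy), s
    return best_mv, best_sad, skipped
```

Because a candidate is skipped only when its bound shows it cannot beat the current best, the minimum SAD found is the same as that of an unpruned full search; only the number of full SAD computations changes.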

Page 40: Fast block motion estimation with 8 bit partial sums using SIMD architecture


Scheme One- Flow Chart

Page 41: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Multilevel 8-bit Partial Sums

16 X 16


Page 42: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Multi-level Visualisation

Page 43: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Multi-level Visualisation

Page 44: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Multi-level Visualisation (ctd)

Page 45: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Multi-level Visualisation (ctd)

Page 46: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Multi-level Visualisation (ctd)

Page 47: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Multi-level Visualisation (ctd)

Page 48: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Multi-level Visualisation (ctd)

Page 49: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Partial Sum Pyramid

[Figure: partial sum pyramid with blocks of 8 x 16, 4 x 16, 2 x 16 and 1 x 16, labelled Level 1, Level 2, Level 3, Level 4]

Page 50: Fast block motion estimation with 8 bit partial sums using SIMD architecture


Multilevel 8-bit Partial Sums- Upper Bound (UB)

Page 51: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Scheme Two Algorithm

• Step 1) Initialization
a) Compute all of the 8-bit partial sums of levels one and four for the current frame and save them in a contiguous memory space.
b) Retrieve all of the 8-bit partial sums of levels one and four for the reference frame, saved in a contiguous memory space.

Page 52: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Scheme Two Algorithm (ctd)

• Step 2) For every current block, execute the block motion-estimation process.
– Step 2.1) Initialization

Page 53: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Scheme Two Algorithm (ctd)

– Step 2.2) Search
• For (each search location in a motion-estimation algorithm)

Page 54: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Scheme Two- Flow Chart

Page 55: Fast block motion estimation with 8 bit partial sums using SIMD architecture


Possible Conditions


Condition 1:

Condition 2:

Condition 3:

Condition 4:

Page 56: Fast block motion estimation with 8 bit partial sums using SIMD architecture


Possible Combinations

Page 57: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Results

AVERAGE EXECUTION TIME (IN MILLISECONDS) PER FRAME FOR VARIOUS METHODS

Page 58: Fast block motion estimation with 8 bit partial sums using SIMD architecture


Possible Combinations

Page 59: Fast block motion estimation with 8 bit partial sums using SIMD architecture

SIMD

Page 60: Fast block motion estimation with 8 bit partial sums using SIMD architecture

COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING FSA

Page 61: Fast block motion estimation with 8 bit partial sums using SIMD architecture

COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING SEA

Page 62: Fast block motion estimation with 8 bit partial sums using SIMD architecture

COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING 3SSA

Page 63: Fast block motion estimation with 8 bit partial sums using SIMD architecture

COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING 4SSA

Page 64: Fast block motion estimation with 8 bit partial sums using SIMD architecture

COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING UDSA

Page 65: Fast block motion estimation with 8 bit partial sums using SIMD architecture

COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING HBSA

Page 66: Fast block motion estimation with 8 bit partial sums using SIMD architecture

THE PERCENTAGE OF SPEEDUP OFFERED BY SIMD IMPLEMENTATION FOR A MOTION ESTIMATION ALGORITHM WITH SCHEME 2 INCORPORATED

Page 67: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Conclusion

A new technique based on 8-bit partial sums has been introduced.

The partial sums are used to make the best use of the SIMD architecture, thereby improving the speed of motion-estimation algorithms.

Since these partial sums have the characteristic of having only 8 bits, eight of them can be processed concurrently using a single 64-bit SIMD register.

Page 68: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Conclusion

The notion of the 8-bit partial sums has then been extended to the four-level case, and it has been shown that there are 15 possible methods of utilizing these multilevel partial sums to accelerate the block motion-estimation algorithms without any loss of accuracy.

The full-search algorithm has then been used to determine which one of these 15 methods provides the lowest computational complexity, so that it can be chosen to accelerate various motion-estimation algorithms.
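The count of 15 is consistent with choosing any non-empty subset of the four levels, since 2^4 - 1 = 15. The transcript does not spell out how the methods are defined, so the enumeration below is only an illustration under that assumption; the names `levels` and `methods` are hypothetical.

```python
from itertools import combinations

levels = (1, 2, 3, 4)

# Every non-empty subset of the four levels, smallest subsets first.
methods = [subset
           for size in range(1, len(levels) + 1)
           for subset in combinations(levels, size)]

print(len(methods))       # 15
print((1, 4) in methods)  # True
```

Under this reading, a scheme that uses the partial sums of levels one and four corresponds to the subset (1, 4).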

Page 69: Fast block motion estimation with 8 bit partial sums using SIMD architecture

Conclusion

Extensive simulations have been carried out to find the average number of CPU cycles needed per block for various algorithms incorporating the chosen method.

These simulations have shown that the proposed scheme is capable of providing a substantial speed-up for the various existing motion-estimation algorithms through the reduction of their computational complexities.

The simulation results also demonstrate that the implementation on an SIMD architecture can further accelerate the proposed scheme by more than 93%.
