cbs_sips2005



IEEE 2005 Workshop on Signal Processing Systems (SIPS'05), November 2, Athens, Greece

Flexible Hardware Architecture for 2-D Separable Convolution-based Scaling

There is no single scaling technique that suits all kinds of images (photo, CAD, text...) that a user may want to print or display. Formally, any convolution-based scaling operation can be decomposed into three steps: an anti-aliasing filter, image reconstruction by continuous convolution, and re-sampling to the final grid. Based on this, we propose a flexible, hardware-friendly discrete convolution engine operating on a memory that stores a programmable 2-D separable interpolation kernel. We also state a technique for optimizing the memory size given the kernel and the scale factor. Finally, we describe a novel flexible filter that overcomes aliasing artifacts regardless of image frequency content.

Jordi Arnabat and Francisco Cardells

Hewlett-Packard, Large-Format Technology Lab, Barcelona, Spain

{jordi.arnabat, francisco.cardells}@hp.com

Image Scaling

. No single interpolation technique achieves good image quality (IQ) for all types of images: adaptable hardware is the key to survival.

. Formally, scaling can be thought of as continuous reconstruction of the discrete input followed by re-sampling at the output grid.

. We propose a flexible hardware architecture built from a classical convolution-based scaler, where IQ is chosen by means of a programmable kernel.
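The separable view of scaling described above can be illustrated in software. The following is a minimal sketch, not the poster's hardware: it assumes a bilinear kernel, an up-scaling case with no pre-filter, and hypothetical helper names (`resize_row_bilinear`, `resize_separable`). It shows a 2-D resize built from two passes of the same 1-D resampler, first along rows and then along columns.

```python
import math

def resize_row_bilinear(row, n_out):
    """Resample one row to n_out samples with a bilinear (2-tap) kernel."""
    n_in = len(row)
    step = n_in / n_out                      # input pixels per output pixel
    out = []
    for k in range(n_out):
        x = k * step                         # output position on the input grid
        base = min(math.floor(x), n_in - 1)  # left neighbor
        nxt = min(base + 1, n_in - 1)        # right neighbor, clamped at the border
        frac = x - base
        out.append((1.0 - frac) * row[base] + frac * row[nxt])
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def resize_separable(img, out_h, out_w):
    """2-D resize as a horizontal 1-D pass followed by a vertical 1-D pass."""
    rows_done = [resize_row_bilinear(r, out_w) for r in img]
    cols_done = [resize_row_bilinear(c, out_h) for c in transpose(rows_done)]
    return transpose(cols_done)

small = [[0.0, 2.0],
         [4.0, 6.0]]
big = resize_separable(small, 4, 4)
for line in big:
    print(line)
```

Because the kernel is separable, the same 1-D routine serves both directions; swapping in a different kernel only changes the 1-D pass.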

Filtering Stage

[Figure: filtering stage. Input pixels feed low-pass filters whose outputs A and B are the base points of a digital interpolator; two instances are shown, producing outputs Y1 and Y2. Weight labels: w21, w22.]

Scaler Data Flow

. Downscaling requires a pre-filtering step to remove frequencies that are not representable in the output grid (aliasing).

. Pre-filters considered: (a) moving average, (b) frequency-sharpened CIC, (c) multistage CIC.

. The proposed architecture enables pre-filters (a) and (b).

. Wide range of interpolation techniques: NN, bilinear, bicubic, Gaussian, …, yours!

. The complexity and latency of the hardware are determined by the support of the interpolation function.

. Resampling is done by a shift-variant FIR filter whose length equals the kernel support.

. The kernel shape can be programmed in a memory as a LUT, sampled at a given frequency.

. As a design rule, any kernel shape needs twice as many samples per interval as the maximum scale factor.

. For example, a scaler performing up to 32x with a 4-tap support kernel and 8-bit word precision requires a 2 Kb LUT (2 x 32 samples per interval x 4 intervals x 8 bits = 2048 bits). The datapath for this interpolator requires 2.2 kgates.
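The LUT sizing rule can be checked with a few lines of arithmetic. This is a small sketch under the design rule stated above; the function name `kernel_lut_bits` and its parameter names are illustrative and not taken from the poster.

```python
def kernel_lut_bits(max_scale, taps, word_bits):
    """LUT size in bits, assuming 2 * max_scale samples per unit interval."""
    samples_per_interval = 2 * max_scale   # design rule: twice the maximum scale factor
    entries = samples_per_interval * taps  # one unit interval per tap of kernel support
    return entries * word_bits

bits = kernel_lut_bits(max_scale=32, taps=4, word_bits=8)
print(bits, "bits =", bits // 1024, "Kb")
```

For the 32x, 4-tap, 8-bit case this gives 256 entries and 2048 bits, which matches the 2 Kb figure quoted in the example.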

[Figure: 4x up-scaling with bilinear and nearest-neighbor interpolation.]

Conventional scaling uses one hardwired set of rules for upscaling and another for downscaling. Instead, we build any scaling operation as a flexible pre-filtering stage followed by interpolation. This flexibility is required because there is no single best scaling algorithm for all kinds of images.

. Pre-filtering: programmable low-pass FIR filter. The cut-off frequency is given by the downscale factor.

. Interpolation: programmable continuous convolution.

[Figure: up-scaling and down-scaling examples with nearest-neighbor and bicubic interpolation.]

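Interpolation, Kernel Sampling

[Figure: programmable interpolation kernel. The kernel W is sampled into weights W11 ... W14, W21 ... W24, ... indexed by neighbor; the weights w-1, w0, w1, w2 are applied to the neighbors of the output position.]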

[Figure: CIC pre-filter block diagrams (a) and (b), built from integrator stages 1/(1 - z^-1), rate changes by R, and comb stages (1 - z^-1); the constants 2 and 3 also appear in the diagrams.]

[Figure: down-scaling by a factor of 1.5 after (a) moving average and (b) frequency-sharpened CIC filter, shown with the original. Artifacts are circled and images resized to aid direct comparison.]

[Figure: frequency response of three different pre-filtering schemes. Schemes (a) and (b) are combined into one flexible architecture.]
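To give a feel for why a sharpened filter behaves better than a plain moving average, the sketch below compares two amplitude responses. It rests on an assumption: it uses the Kaiser-Hamming sharpening polynomial 3H^2 - 2H^3, a common way to sharpen a CIC response, and the poster does not confirm that its filter uses exactly this polynomial. The length `R` and the function names are illustrative.

```python
import math

def moving_average_response(f, R):
    """Zero-phase amplitude response of a length-R moving average.

    f is the normalized frequency in cycles/sample, 0 <= f <= 0.5.
    """
    denom = R * math.sin(math.pi * f)
    if abs(denom) < 1e-12:                 # f = 0: DC gain is 1
        return 1.0
    return math.sin(math.pi * f * R) / denom

def sharpened_response(f, R):
    """Assumed sharpening: apply the polynomial 3H^2 - 2H^3 to the response H."""
    h = moving_average_response(f, R)
    return 3.0 * h**2 - 2.0 * h**3

R = 4  # hypothetical filter length
for f in (0.0, 0.05, 0.3125):
    print(f, round(moving_average_response(f, R), 4), round(sharpened_response(f, R), 4))
```

Under this assumption, the sharpened response stays closer to 1 in the passband (less droop) and closer to 0 between the stopband nulls, which is the kind of behavior that reduces aliasing artifacts.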

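[Figure: interpolation kernels. (a) Nearest neighbor, (b) bilinear interpolation, (c) B-spline of order 3, (d) Keys' bicubic with a = -1/2.]

[Figure: interpolation by continuous convolution, principles of operation. Labels: input samples 1 ... n, output samples 1 ... k, position k*D, weights w1 and w2, output o[k].]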

The shape of the interpolation kernel is sampled at a given frequency. The samples (weights) are stored as a LUT in a memory.
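A software model of a LUT-driven interpolator can make the idea concrete. The sketch below is an illustration under stated assumptions, not the poster's datapath: it samples Keys' bicubic kernel (a = -1/2) into a table with 64 phases per interval, then resamples a 1-D signal with a 4-tap shift-variant FIR whose weights are read from that table. The table layout, border clamping, phase truncation, and all function names are assumptions.

```python
import math

def keys_bicubic(x, a=-0.5):
    """Keys' cubic convolution kernel with support [-2, 2]."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0

def build_lut(kernel, taps=4, phases=64):
    """lut[phase][tap]: kernel weight of each neighbor for a quantized fractional position."""
    first = -(taps // 2 - 1)               # leftmost neighbor offset (-1 for 4 taps)
    return [[kernel(p / phases - (first + j)) for j in range(taps)]
            for p in range(phases)]

def resample(signal, scale, lut):
    """Shift-variant FIR: the weight set changes with each output sample's phase."""
    phases, taps = len(lut), len(lut[0])
    first = -(taps // 2 - 1)
    last = len(signal) - 1
    out = []
    for k in range(int(len(signal) * scale)):
        x = k / scale                      # output position on the input grid
        base = math.floor(x)
        phase = int((x - base) * phases)   # truncate the fraction to a LUT row
        acc = 0.0
        for j, w in enumerate(lut[phase]):
            idx = min(max(base + first + j, 0), last)  # clamp at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out

lut = build_lut(keys_bicubic)
ramp = [float(i) for i in range(10)]
up = resample(ramp, 4, lut)
print(up[4:12])
```

With 64 phases per interval the table has 256 entries for a 4-tap kernel. Using the model for down-scaling (scale below 1) without a pre-filter would alias, which is why the pre-filtering stage exists.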

. In down-scaling, the low-pass filter does not have to be applied to all the incoming pixels.

. Instead, only the base points for the interpolation are pre-computed to remove the aliasing frequencies.

. The number of equivalent serial low-pass filters must equal the kernel support.
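The point about filtering only the base points can be sketched as follows. This is a simplified model under several assumptions: the interpolation kernel is bilinear (support of 2, hence two low-pass evaluations per output), the pre-filter is a moving average whose width is the downscale factor rounded up, and borders are clamped. The names `box_at` and `downscale_bilinear` are hypothetical.

```python
import math

def box_at(signal, center, width):
    """Moving average of `width` input pixels around index `center` (borders clamped)."""
    half = width // 2
    last = len(signal) - 1
    total = 0.0
    for i in range(center - half, center - half + width):
        total += signal[min(max(i, 0), last)]
    return total / width

def downscale_bilinear(signal, factor):
    """Down-scale by `factor` > 1, low-pass filtering only the two base points per output."""
    width = max(1, math.ceil(factor))      # assumed cut-off choice tied to the factor
    out = []
    for k in range(int(len(signal) / factor)):
        x = k * factor                     # output position on the input grid
        base = math.floor(x)
        frac = x - base
        a = box_at(signal, base, width)      # low-pass filter for base point A
        b = box_at(signal, base + 1, width)  # low-pass filter for base point B
        out.append((1.0 - frac) * a + frac * b)
    return out

nyquist = [float(i % 2) for i in range(12)]  # alternating 0, 1: highest input frequency
print(downscale_bilinear(nyquist, 1.5))
```

In this model the alternating input, which cannot be represented on the coarser output grid, is flattened to its mean value away from the border instead of producing an aliased pattern. A 4-tap kernel would need four such filter evaluations per output sample.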