novel algorithms in the memory management of multi-dimensional signal processing florin balasa...

43
Management Management of Multi-Dimensional Signal of Multi-Dimensional Signal Processing Processing Florin Balasa Florin Balasa University University of Illinois at Chicago of Illinois at Chicago

Upload: emma-davis

Post on 05-Jan-2016

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Novel Algorithms in the Memory Management Novel Algorithms in the Memory Management

of Multi-Dimensional Signal Processingof Multi-Dimensional Signal Processing

Florin BalasaFlorin Balasa

UniversityUniversity of Illinois at Chicagoof Illinois at Chicago

Page 2: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

OutlineOutline

The importance of memory managementThe importance of memory management

in multi-dimensional signal processingin multi-dimensional signal processing A lattice-based frameworkA lattice-based framework The computation of the minimum dataThe computation of the minimum data

memory size memory size Optimization of the dynamic energy consumptionOptimization of the dynamic energy consumption

in a hierarchical memory subsystem in a hierarchical memory subsystem

Mapping multi-dimensional signalsMapping multi-dimensional signals

into hierarchical memory organizationsinto hierarchical memory organizations Future research directionsFuture research directions ConclusionsConclusions

Page 3: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Memory management Memory management for signal processing applicationsfor signal processing applications

Real-time multi-dimensional signal processing systemsReal-time multi-dimensional signal processing systems

(video and image processing, telecommunications,(video and image processing, telecommunications, audio and speech coding, medical imaging, etc.)audio and speech coding, medical imaging, etc.)

data transfer and data storagedata transfer and data storage

system performancesystem performancepower consumptionpower consumption

chip areachip area

The designer must focusThe designer must focus on the exploration of on the exploration of the memory subsystemthe memory subsystem

Page 4: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

In the early years of high-level synthesisIn the early years of high-level synthesis

memory management tasks tackled at memory management tasks tackled at scalar levelscalar level

Algebraic techniques Algebraic techniques (similar to those used in modern compilers) (similar to those used in modern compilers)

Memory management Memory management for signal processing applicationsfor signal processing applications

register-transfer level (RTL) algorithmic specificationsregister-transfer level (RTL) algorithmic specifications

More recentlyMore recently

memory management tasks at memory management tasks at non-scalar levelnon-scalar level

high-level algorithmic specificationshigh-level algorithmic specifications

Page 5: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Affine algorithmicAffine algorithmic

specificationsspecifications

T[0] = 0;T[0] = 0;for ( j=16; j<=512; j++ )  {for ( j=16; j<=512; j++ )  { S[0][j-16][0] = 0;S[0][j-16][0] = 0;   for ( k=0; k<=8; k++ )   for ( k=0; k<=8; k++ ) for (i=j-16; i<=j+16; i++ )for (i=j-16; i<=j+16; i++ )                  S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] – A[k][i];S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] – A[k][i]; T[j-15] = S[0][j-16][297] + T[j-16];T[j-15] = S[0][j-16][297] + T[j-16];}}out = T[497];out = T[497];

Memory management Memory management for signal processing applicationsfor signal processing applications

Loop-organized algorithmic specificationLoop-organized algorithmic specification

Main data structures: multi-dimensional arraysMain data structures: multi-dimensional arrays

Page 6: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

A Lattice-Based FrameworkA Lattice-Based Framework

… … A [2i+3j+1] [5i+j+3] [4i+6j+2]A [2i+3j+1] [5i+j+3] [4i+6j+2] … …

jj

ii

x=2i+3j+1x=2i+3j+1

y=5i+j+3y=5i+j+3

z=4i+6j+2z=4i+6j+2A[x][y][z]A[x][y][z]

Iterator spaceIterator space Index spaceIndex space

for (i=0; i<=4; i++)for (i=0; i<=4; i++)

for (j=0; j <= 2i && j <= -i+6; j++)for (j=0; j <= 2i && j <= -i+6; j++)

Page 7: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

A Lattice-Based FrameworkA Lattice-Based Framework

22 3355 1144 66

11

3322

xx

yyzz

ii

jj++==

Iterator spaceIterator spaceIndex spaceIndex spaceaffineaffine

mappingmapping

0 <= i <= 4 , 0 <= j <=2i, j <= -i+6 0 <= i <= 4 , 0 <= j <=2i, j <= -i+6

for (i=0; i<=4; i++)for (i=0; i<=4; i++)

for (j=0; j <= 2i && j <= -i+6; j++)for (j=0; j <= 2i && j <= -i+6; j++)

… … A [2i+3j+1] [5i+j+3] [4i+6j+2]A [2i+3j+1] [5i+j+3] [4i+6j+2] … …

Page 8: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

A Lattice-Based FrameworkA Lattice-Based Framework

Any array reference can be modeledAny array reference can be modeled as a linearly bounded lattice (LBL)as a linearly bounded lattice (LBL)

LBL = { LBL = { xx = = TT··ii + + uu | | AA··ii >= >= bb } }

Iterator spaceIterator space

- - scope of nested loops, andscope of nested loops, and

- iterator-dependent conditionsiterator-dependent conditions

Affine mappingAffine mapping

PolytopePolytopeLBLLBLaffineaffine

mappingmapping

Page 9: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

A Lattice-Based FrameworkA Lattice-Based Framework

for (i=0; i<=4; i++)for (i=0; i<=4; i++)

for (j=0; j <= 2i && j <= -i+6; j++)for (j=0; j <= 2i && j <= -i+6; j++)

How many memory locations are necessaryHow many memory locations are necessary to store the array reference to store the array reference A [2i+3j+1] [5i+j+3] [4i+6j+2] A [2i+3j+1] [5i+j+3] [4i+6j+2]

… … A [2i+3j+1] [5i+j+3] [4i+6j+2]A [2i+3j+1] [5i+j+3] [4i+6j+2] … …

Page 10: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

A Lattice-Based FrameworkA Lattice-Based Framework

The storage requirement of an array referenceThe storage requirement of an array reference is the size of its index space (i.e., a lattice !!)is the size of its index space (i.e., a lattice !!)

LBL = { LBL = { xx = = TT··ii + + uu | | AA··ii >= >= bb } }

f : f : ZZn n ZZmm f(f(ii) = ) = TT··ii + + uu

Is function f a one-to-one mapping ??Is function f a one-to-one mapping ??

Size(index space) = Size(iterator space)Size(index space) = Size(iterator space)

If YESIf YES

Page 11: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

A Lattice-Based FrameworkA Lattice-Based Framework

Computation of the size of an integer polytopeComputation of the size of an integer polytope

for (i=0; i<=4; i++)for (i=0; i<=4; i++)

for (j=0; j <= 2i && j <= -i+6; j++)for (j=0; j <= 2i && j <= -i+6; j++)

… … A [2i+3j+1] [5i+j+3] [4i+6j+2] A [2i+3j+1] [5i+j+3] [4i+6j+2]

11

22

11

00

Step 1Step 1

Find the vertices of the iterator spaceFind the vertices of the iterator spaceand their supporting polyhedral conesand their supporting polyhedral cones

C(VC(V11) = { r) = { r11 , r , r22 } = } =

Page 12: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

A Lattice-Based FrameworkA Lattice-Based Framework

Computation of the size of an integer polytope (cont’d)Computation of the size of an integer polytope (cont’d)

11

22

00

-1-1

Step 2Step 2

C(VC(V11) = + ) = +

Decompose the supporting cones into unimodular conesDecompose the supporting cones into unimodular cones(Barvinok’s decomposition algorithm) (Barvinok’s decomposition algorithm)

00

11

11

00++

Step 3Step 3 Find the generating function of each supporting coneFind the generating function of each supporting cone

F(VF(V11) = + ) = + (1-xy(1-xy22) (1-y) (1-y-1-1))

11

(1-y) (1-x)(1-y) (1-x)

11++

Step 4Step 4 Find the number of monomials in the generating functionFind the number of monomials in the generating functionof the whole polytope F = F(Vof the whole polytope F = F(V11) + F(V) + F(V22) + …) + …

Page 13: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Affine algorithmicAffine algorithmic

specificationsspecifications

T[0] = 0;T[0] = 0;for ( j=16; j<=512; j++ )  {for ( j=16; j<=512; j++ )  { S[0][j-16][0] = 0;S[0][j-16][0] = 0;   for ( k=0; k<=8; k++ )   for ( k=0; k<=8; k++ ) for (i=j-16; i<=j+16; i++ )for (i=j-16; i<=j+16; i++ )                  S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] – A[k][i];S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] – A[k][i]; T[j-15] = S[0][j-16][297] + T[j-16];T[j-15] = S[0][j-16][297] + T[j-16];}}out = T[497];out = T[497];

The Memory Size Computation ProblemThe Memory Size Computation Problem

What is the minimum data storage necessaryWhat is the minimum data storage necessary to execute an algorithm (affine specification)to execute an algorithm (affine specification)

Any scalar signal must be stored only during its lifetimeAny scalar signal must be stored only during its lifetime

Signals having disjoint lifetimes can share the same locationSignals having disjoint lifetimes can share the same location

Page 14: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Affine algorithmicAffine algorithmic

specificationsspecifications

T[0] = 0;T[0] = 0;for ( j=16; j<=512; j++ )  {for ( j=16; j<=512; j++ )  { S[0][j-16][0] = 0;S[0][j-16][0] = 0;   for ( k=0; k<=8; k++ )   for ( k=0; k<=8; k++ ) for (i=j-16; i<=j+16; i++ )for (i=j-16; i<=j+16; i++ )                  S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] – A[k][i];S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] – A[k][i]; T[j-15] = S[0][j-16][297] + T[j-16];T[j-15] = S[0][j-16][297] + T[j-16];}}out = T[497];out = T[497];

The Memory Size Computation ProblemThe Memory Size Computation Problem

The number of scalars (array elements): 153,366The number of scalars (array elements): 153,366The minimum data storage storage: 4,763The minimum data storage storage: 4,763

All the previous works proposed estimation techniques !All the previous works proposed estimation techniques !

Page 15: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

for ( j=0; j<n ; j++ )for ( j=0; j<n ; j++ ){ A [ j ] [ 0 ] = in0;{ A [ j ] [ 0 ] = in0; for ( i=0; i<n ; i++ )for ( i=0; i<n ; i++ ) A [ j ] [ i+1 ] = A [ j ] [ i ] + 1;A [ j ] [ i+1 ] = A [ j ] [ i ] + 1;}}

for ( i=0; i<n ; i++ )for ( i=0; i<n ; i++ ){ alpha [ i ] = A [ i ] [ n+i ] ;{ alpha [ i ] = A [ i ] [ n+i ] ; for ( j=0; j<n ; j++ )for ( j=0; j<n ; j++ ) A [ j ] [ n+i+1 ] = A [ j ] [ n+i+1 ] = j < i ? A [ j ] [ n+i ] :j < i ? A [ j ] [ n+i ] : alpha [ i ] + A [ j ] [ n+i ] ;alpha [ i ] + A [ j ] [ n+i ] ;}}for ( j=0; j<n ; j++ ) B [ j ] = A [ j ] [ 2*n ];for ( j=0; j<n ; j++ ) B [ j ] = A [ j ] [ 2*n ];

# define n 6# define n 6

The Memory Size Computation ProblemThe Memory Size Computation Problem

Page 16: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Decompose the LBL’s of the array refs.Decompose the LBL’s of the array refs.into disjoint latticesinto disjoint lattices

LBLLBL11 LBLLBL22UU

LBLLBL

LBLLBL11 = { = { xx = = TT11··ii11 + + uu11 | | AA11··ii11 >= >= bb11 } }

LBLLBL22 = { = { xx = = TT22··ii22 + + uu22 | | AA22··ii22 >= >= bb22 } }

TT11··ii11 + + uu11 = = TT22··ii22 + + uu22

Diophantine system of eqs.Diophantine system of eqs.

{ { AA11··ii11 >= >= bb11 , , AA22··ii22 >= >= bb22 } }

New polytopeNew polytope

The Memory Size Computation ProblemThe Memory Size Computation Problem

Page 17: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Decomposition of the array references of signal A Decomposition of the array references of signal A (illustrative example)(illustrative example)

The Memory Size Computation ProblemThe Memory Size Computation Problem

Page 18: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Memory Size Computation AlgorithmMemory Size Computation Algorithm

Step 1Step 1 For every indexed signal in the algorithmic specification,For every indexed signal in the algorithmic specification,decompose the array references in decompose the array references in disjoint latticesdisjoint lattices

Step 2Step 2 Based on the lattice lifetime analysis, find the memoryBased on the lattice lifetime analysis, find the memorysize at the boundaries between the blocks of codesize at the boundaries between the blocks of code

Step 3Step 3 Analyzing the amounts of signals produced and consumedAnalyzing the amounts of signals produced and consumedIn each block, prune the blocks of code where theIn each block, prune the blocks of code where themaximum storage cannot happenmaximum storage cannot happen

Step 4Step 4 For each of the remaining blocks, compute the maximumFor each of the remaining blocks, compute the maximummemory sizememory size

computing the maximum iterator vectors of the scalarscomputing the maximum iterator vectors of the scalars

exploiting the one-to-one mapping property of array referencesexploiting the one-to-one mapping property of array references

Page 19: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Memory trace for an SVD updating algorithmMemory trace for an SVD updating algorithm

Page 20: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Memory trace for a 2-D Gaussian blur filter algorithmMemory trace for a 2-D Gaussian blur filter algorithm

Page 21: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

for ( i = 0; i < 95 ; i++ )for ( i = 0; i < 95 ; i++ ) for ( j = 0; j < 32 ; j++ )for ( j = 0; j < 32 ; j++ ) { { if ( i+j > 30 && i+j < 63 )if ( i+j > 30 && i+j < 63 ) A [ i ] [ j ] = … ;A [ i ] [ j ] = … ; if ( i+j > 62 && i+j < 95 )if ( i+j > 62 && i+j < 95 ) … … = A[ i - 32 ] [ j ] ;= A[ i - 32 ] [ j ] ; }}

for ( j = 0; j < 32 ; j++ )for ( j = 0; j < 32 ; j++ ) for ( i = 0; i < 95 ; i++ )for ( i = 0; i < 95 ; i++ ) { { if ( i+j > 30 && i+j < 63 )if ( i+j > 30 && i+j < 63 ) A [ i ] [ j ] = … ;A [ i ] [ j ] = … ; if ( i+j > 62 && i+j < 95 )if ( i+j > 62 && i+j < 95 ) … … = A[ i - 32 ] [ j ] ;= A[ i - 32 ] [ j ] ; }}

784 locations784 locations

32 locations32 locations

Study the effect of loop transformations on the data memoryStudy the effect of loop transformations on the data memory

Page 22: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

For the first time, the storage requirements of applicationsFor the first time, the storage requirements of applications can be exactly computed using formal techniquescan be exactly computed using formal techniques

All the previous works are estimation techniques;All the previous works are estimation techniques; they are sometimes very inaccuratethey are sometimes very inaccurate

The previous works have constraints on the specificationsThe previous works have constraints on the specifications

This approach works for the entire class of affine specificationsThis approach works for the entire class of affine specifications

The previous works are illustrated with “simple’’ benchmarksThe previous works are illustrated with “simple’’ benchmarks(in terms of array elements, array references, lines of code) (in terms of array elements, array references, lines of code)

This approach was tested on complex benchmarksThis approach was tested on complex benchmarks e.g.: code with 113 loop nests 3-level deep,e.g.: code with 113 loop nests 3-level deep, 906 array references, over 900 lines of code, 906 array references, over 900 lines of code, 4 million scalar signals4 million scalar signals

The Memory Size Computation ProblemThe Memory Size Computation Problem

Page 23: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Multi-dimensional arrays stored off-chipMulti-dimensional arrays stored off-chip

Copies of the frequently accessed array parts Copies of the frequently accessed array parts

should be stored on-chip should be stored on-chip

Optimizing the Dynamic Energy Consumption Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystemin a Hierarchical Memory Subsystem

SPMSPMOff-chipOff-chipmemorymemory

Copy candidateCopy candidate

Two layer modelTwo layer model

On-chip scratch-padOn-chip scratch-padmemorymemory

Page 24: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

The need of an array partitioning based The need of an array partitioning based on the intensity of memory accesses on the intensity of memory accesses

Data reuse model based on latticesData reuse model based on lattices

How to select the copy candidates?

Rows/columns – somewhat better …

Entire arrays – unlikely …

How to find array parts heavily accessed?

Page 25: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

A3A16

A2A17

A1A15

Optimizing the Dynamic Energy Consumption Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystemin a Hierarchical Memory Subsystem

Page 26: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

#accesses A#accesses A11: 13,569: 13,569

#accesses A#accesses A22: 13,569: 13,569

#accesses A#accesses A33: 13,569: 13,569

#accesses A#accesses A1717: 131,625: 131,625

Total #accesses ATotal #accesses A1717: 172,332: 172,332

Decomposition in disjoint latticesDecomposition in disjoint lattices

Computation of exact number Computation of exact number of memory accesses per latticeof memory accesses per lattice

Optimizing the Dynamic Energy Consumption Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystemin a Hierarchical Memory Subsystem

Lattice ALattice A1717

Page 27: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Map of the array space Map of the array space based on based on

average memory accessesaverage memory accesses

Array space of signal AArray space of signal A

Optimizing the Dynamic Energy Consumption Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystemin a Hierarchical Memory Subsystem

Page 28: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Array space of signal AArray space of signal A

3-D map of the array space 3-D map of the array space based on the exact number based on the exact number

of memory accessesof memory accesses

Page 29: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Map of the array space Map of the array space based on based on

average memory accessesaverage memory accesses

Array space of signal AArray space of signal A

Optimizing the Dynamic Energy Consumption Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystemin a Hierarchical Memory Subsystem

Page 30: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Map of the array space Map of the array space based on based on

average memory accessesaverage memory accesses

Array space of signal AArray space of signal A

Optimizing the Dynamic Energy Consumption Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystemin a Hierarchical Memory Subsystem

Page 31: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Dynamic energyDynamic energy – computed based on number – computed based on number

of accesses to each memory layerof accesses to each memory layer

[ Reinman 99 ]

Optimizing the Dynamic Energy Consumption Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystemin a Hierarchical Memory Subsystem

CACTI power modelCACTI power model One or two orders of magnitude between an One or two orders of magnitude between an

SPM access and an off-chip accessSPM access and an off-chip access

Energy per access is SPM size-dependent – Energy per access is SPM size-dependent –

constant for small SPM sizes (< a few Kbytes) constant for small SPM sizes (< a few Kbytes)

Page 32: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Optimizing the Dynamic Energy Consumption Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystemin a Hierarchical Memory Subsystem

Page 33: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Signal-to-Memory MappingSignal-to-Memory Mapping

A [ indexA [ index11 ] [ index ] [ index22 ] ]

Physical MemoryPhysical Memory

Base address Base address of signal Aof signal A

00

11

22

Window size Window size of signal Aof signal A

Page 34: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Signal-to-Memory MappingSignal-to-Memory Mapping

Mapping model Mapping model (can be used in hierarchical memory organizations)(can be used in hierarchical memory organizations)

m-dim. arraym-dim. array

w w ii = = Max { dist. alive elements having same index i } + 1Max { dist. alive elements having same index i } + 1

mapped tomapped to

m-dim. window ( wm-dim. window ( w11 , … , w , … , wmm ) )

A [ indexA [ index11 ] … [ index ] … [ indexmm ] ]

A [ indexA [ index11 modmod w w11] … [ index] … [ indexmm modmod w wmm]]

Page 35: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Bounding window:Bounding window:

(w(w11,w,w22) = (4,6)) = (4,6)

Storage requirements:Storage requirements:

4 x 6 = 244 x 6 = 24

[ ][ ]A i j

[ mod 4][ mod 6]A i j

Iteration (i=7 , j=9)Iteration (i=7 , j=9)

( 2; 16; )

( 2 18; 2 3; ) {

( 3 21 && 3 34) [ ][ ] ...

( 3 42 && 3 55) ... [ 3][ 6]

}

for i i i

for j i j i j

if i j i j A i j

if i j i j A i j

Page 36: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Bounding window:Bounding window:

(w(w11,w,w22) = (4,6)) = (4,6)

Storage requirements:Storage requirements:

4 x 6 = 244 x 6 = 24

[ ][ ]A i j

[ mod 4][ mod 6]A i j

Iteration (i=7 , j=9)Iteration (i=7 , j=9)

( 2; 16; )

( 2 18; 2 3; ) {

( 3 21 && 3 34) [ ][ ] ...

( 3 42 && 3 55) ... [ 3][ 6]

}

for i i i

for j i j i j

if i j i j A i j

if i j i j A i j

Page 37: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Computation of the WindowComputation of the Windowof a Lattice of Live Signalsof a Lattice of Live Signals

ii

jj

Iterator spaceIterator space

00 11 22 33

11

22

Index spaceIndex space

xx = = TT··ii + + uu

indexindex11

indexindex22

A [2i+3j] [5i+j]A [2i+3j] [5i+j]

for ( i=0; i<=3; i++ )for ( i=0; i<=3; i++ )

for ( j=0; j<= 2; j++ )for ( j=0; j<= 2; j++ )

if ( 3i >= 2j ) if ( 3i >= 2j )

… … A [2i+3j] [5i+j] A [2i+3j] [5i+j] … …

Page 38: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Computation of the WindowComputation of the Windowof a Lattice of Live Signalsof a Lattice of Live Signals

indexindex11

indexindex22

A [2i+3j] [5i+j]A [2i+3j] [5i+j]

00 1212

1717

( w( w11 = 13 , w = 13 , w22 = 18 ) = 18 )

ww11 = 13 = 13

ww22

= 1

8 =

18

2-D window2-D window

by integer projectionby integer projectionof the lattice on the axesof the lattice on the axes

Page 39: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Future workFuture work

Computation of storage requirements for high-throughputComputation of storage requirements for high-throughput

applications, where the code contains explicit parallelismapplications, where the code contains explicit parallelism

Improve the algorithm that aims to optimize the dynamicImprove the algorithm that aims to optimize the dynamic

energy consumption, extending it to an arbitrary numberenergy consumption, extending it to an arbitrary number

of memory layersof memory layers

Extend the hierarchical memory allocation model to saveExtend the hierarchical memory allocation model to save

leakage energyleakage energy

Use area models for memories in order to trade-off Use area models for memories in order to trade-off

decrease of energy consumption and the increase of areadecrease of energy consumption and the increase of area

implied by the memory fragmentationimplied by the memory fragmentation

Page 40: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

Future workFuture work

Memory management for configurable architecturesMemory management for configurable architectures

Several FPGA contain distributed RAM modulesSeveral FPGA contain distributed RAM modules

Homogeneous architectures Homogeneous architectures RAMs of same capacityRAMs of same capacityevenly distributedevenly distributed

(Xilinx Virtex II Pro)(Xilinx Virtex II Pro)

Heterogeneous architecturesHeterogeneous architectures A variety of RAMsA variety of RAMs

(Altera Stratix II)(Altera Stratix II)

Memory management for dynamically reconfigurable systemsMemory management for dynamically reconfigurable systems

Page 41: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

ConclusionsConclusions

The The exactexact computation of the data storage requirement computation of the data storage requirement

of an application [ IEEE TVLSI 2007 ]of an application [ IEEE TVLSI 2007 ]

A data reuse formal model based on partitioning A data reuse formal model based on partitioning

the arrays according to the intensity of memory accessesthe arrays according to the intensity of memory accesses

[ ICCAD 2006 ] [ ICCAD 2006 ]

A general framework based on lattices for addressingA general framework based on lattices for addressing

several memory management problemsseveral memory management problems

Unique features of this researchUnique features of this research

A signal-to-memory mapping model which works A signal-to-memory mapping model which works

for hierarchical memory organizations [ DATE 2007 ] for hierarchical memory organizations [ DATE 2007 ]

Page 42: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

ConclusionsConclusions

This topic is considered by the Semiconductor ResearchThis topic is considered by the Semiconductor Research

Corporation (SRC) Corporation (SRC) one of the top synthesis problems one of the top synthesis problems

still unsolvedstill unsolved

This research is interdisciplinary: EE+CS+MathThis research is interdisciplinary: EE+CS+Math

General goalGeneral goal design of a (hierarchical) memory design of a (hierarchical) memory

subsystem optimized for power consumption and chip area,subsystem optimized for power consumption and chip area,

s.t. performance constraints, starting from the specification s.t. performance constraints, starting from the specification

of a (multi-dimensional) signal processing applicationof a (multi-dimensional) signal processing application

There is interest for international co-operationThere is interest for international co-operation (potential funding: the NSF-PIRE program)(potential funding: the NSF-PIRE program)

Page 43: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago

ConclusionsConclusions

Graduate studentsGraduate students

Hongwei Zhu (Ph.D. defense: Spring 2007)Hongwei Zhu (Ph.D. defense: Spring 2007)

Ilie I. Luican (Ph.D. defense: Spring 2009)Ilie I. Luican (Ph.D. defense: Spring 2009)

Karthik Chandramouli (M.S. completed)Karthik Chandramouli (M.S. completed)