A POWER EFFICIENT ARCHITECTURE FOR 2-D DISCRETE WAVELET TRANSFORM
Rahul Jain, CoWare India
Preeti Ranjan Panda, IIT-Delhi
10 August 2006 10th IEEE VLSI Design And Test Symposium, 2006
Agenda
- Memory Power Optimization
- Existing Z-Scan based Schemes
- Low-Power Z-Scan (Proposed Architecture)
- Results
- Conclusion
Memory Power Optimization
- Importance of optimizing memory system energy
  - Many emerging applications, such as JPEG2000, are data intensive
  - The memory system can contribute up to 90% of the energy consumed
- Concurrently optimizing memory architecture and accesses
  - Algorithm level
    - Reduce memory requirement
    - Improve regularity of accesses
  - Build an optimized memory architecture
    - Memory partitioning
    - Custom circuits
Z-Scan based Schemes [Chiu-SIPS’03]
- Suspending a DWT line computation
  - Store 4 intermediate values
- Z-Scan
  - Column processing starts early
  - On-chip buffer required = 4*M, where M = image tile height
- Optimal Z-Scan
  - EBCOT code-block size (CW*CH) considered
  - On-chip buffer required = 4*M + 4*2*CW
  - Usually CW = CH = 64 (values used in the experiments)
[Figure: scan pattern with dimensions labeled 2*CW and 2*CH]
Low-Power Z-Scan (1)
- Generalize the Z-Scan
  - Compute r elements in a row
  - For Z-Scan, r = 2
  - For Optimal Z-Scan, r = 2*CW
  - On-chip buffer required = 4*M + 4*r
[Figure: generalized scan pattern with dimensions labeled r and 2*CH]
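The generalized buffer expression can be checked with a few lines of Python. This is a minimal sketch: the function name and the example tile height M = 512 are chosen here for illustration, and sizes are counted in stored values rather than bytes.

```python
def on_chip_buffer_values(tile_height: int, r: int) -> int:
    """Buffer requirement 4*M + 4*r from the slide, counted in stored values."""
    if tile_height <= 0 or r < 2:
        raise ValueError("expected a positive tile height and r >= 2")
    return 4 * tile_height + 4 * r


CW = 64  # code-block width used in the slides' experiments
M = 512  # example tile height, picked for illustration

z_scan = on_chip_buffer_values(M, r=2)               # plain Z-Scan
optimal_z_scan = on_chip_buffer_values(M, r=2 * CW)  # Optimal Z-Scan
intermediate = on_chip_buffer_values(M, r=64)        # an r between the two extremes
```

The 4*M term is common to all three schemes, so the choice of r only changes the 4*r part of the buffer.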
Low-Power Z-Scan (2)
- r is an integer submultiple of 2*CW (2*CW divided by an integer)
  - This takes the code-block size into account
- 2 separate buffers used
  - Row Buffer (RB) = 4*M
  - Column Buffer (CB) = 4*r
- How to decide the value of r?
  - Size of CB ∝ r
  - RB sleep time ∝ r
[Figure labels: CB with r locations; RB in low-power mode; RB access]
Memory Power Analysis (1)
- Assume that each element is computed in unit time, so energy and power can be used interchangeably
- For a memory of size 2^n, let
  - Pa(2^n): memory access power
  - Ps(2^n): sleep mode / data retention mode power
  - Pw(2^n): wakeup power for each state transition from sleep mode to active mode
- Let Ps(2^n) = s * Pa(2^n) and Pw(2^n) = w * Pa(2^n)
  - s = 0.1, w = 0.33 (assumed for the experiments)
- Buffer accesses
  - Read at resumption
  - Write at suspension
Memory Power Analysis (2)
- Row buffer power
  - 2 accesses per r elements
  - RB is in sleep mode while the other r-2 elements are computed
  - RB is woken up once per row
- Power per computation of r elements:
  Prow_buffer(r, M) = 2*Pa(M) + (r-2)*Ps(M) + Pw(M)
[Figure labels: RB in low-power mode; row computation suspends; row computation resumes; wakeup]
Memory Power Analysis (3)
- Column buffer power
  - 1 access per element
  - Power consumption per element computation:
    Pcol_buffer(r) = Pa(r)
- Power per 2-D DWT element computation:
  Prow_buffer(r, M)/r + Pcol_buffer(r)
[Figure labels: column computation suspends; column computation resumes]
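The row-buffer and column-buffer expressions combine into a small model that can be evaluated for any r. This is a minimal sketch: s and w are the values assumed in the slides, but the access-power function `toy_pa` (square-root growth with memory size) is a hypothetical stand-in, not the empirical model used by the authors, and `best_r` is a helper introduced here for illustration.

```python
import math
from typing import Callable, Sequence

S = 0.1   # Ps = s * Pa, value assumed in the slides
W = 0.33  # Pw = w * Pa, value assumed in the slides

PowerFn = Callable[[float], float]


def row_buffer_power(r: int, M: float, pa: PowerFn) -> float:
    """Prow_buffer(r, M): 2 accesses, r-2 sleeping steps, 1 wakeup."""
    return 2 * pa(M) + (r - 2) * S * pa(M) + W * pa(M)


def col_buffer_power(r: float, pa: PowerFn) -> float:
    """Pcol_buffer(r): one access to the column buffer per element."""
    return pa(r)


def power_per_element(r: int, M: float, pa: PowerFn) -> float:
    """Power per 2-D DWT element: Prow_buffer(r, M)/r + Pcol_buffer(r)."""
    return row_buffer_power(r, M, pa) / r + col_buffer_power(r, pa)


def toy_pa(size: float) -> float:
    """HYPOTHETICAL access-power model, used only to exercise the formulas."""
    return math.sqrt(size)


def best_r(M: float, pa: PowerFn,
           candidates: Sequence[int] = (2, 4, 8, 16, 32, 64, 128)) -> int:
    """Return the candidate r with the lowest modelled power per element."""
    return min(candidates, key=lambda r: power_per_element(r, M, pa))
```

Passing the access-power function as a parameter keeps the slide formulas separate from the memory technology, so `toy_pa` can be swapped for a measured or empirical model.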
Variation of Power with r
[Figure: energy (J), axis from 0 to 6.00E-10, versus the value of r (2, 4, 8, 16, 32, 64, 128), with one curve each for M = 32, 64, 128, 256, 512; the points r=16 and r=32 are annotated]
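The trade-off behind these curves can be made explicit by substituting Ps = s*Pa and Pw = w*Pa into the per-element power from the preceding slides. The following is a worked step using only the slide definitions:

```latex
\begin{align*}
P(r, M) &= \frac{P_{\text{row\_buffer}}(r, M)}{r} + P_{\text{col\_buffer}}(r) \\
        &= \frac{2\,P_a(M) + (r-2)\,s\,P_a(M) + w\,P_a(M)}{r} + P_a(r) \\
        &= \left( s + \frac{2 + w - 2s}{r} \right) P_a(M) + P_a(r)
\end{align*}
```

With s = 0.1 and w = 0.33 the row-buffer term becomes (0.1 + 2.13/r) * Pa(M), which falls as r grows. The column-buffer term Pa(r) rises with r, assuming access power grows with memory size, so the total has a minimum at an intermediate r.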
Power Implications of Banking (1)
- Banked buffer
  - Increases the average idleness of each buffer
  - Lower access power
  - Predictable state changes, no timing overheads
- Let there be b RB banks and c CB banks
- Average RB power per element:
  Prow = [Power of bank in use * M/b + Sleep power * (M - M/b)] / M
       = [{Prow_buffer(r, M/b) / r} * M/b + Ps(M/b) * (M - M/b)] / M
- Each bank is woken up once per M*r elements
  - Additional row buffer wakeups per element = b/(M*r)
Power Implications of Banking (2)
- Average column buffer power per element:
  Pcol = [Pcol_buffer(r/c) * r/c + Ps(r/c) * (r - r/c)] / r
- Number of column buffer wakeups per element = c/r
- Additional wakeup power:
  Pwakeups = [Pw(M/b) * b/(M*r)] + [Pw(r/c) * c/r]
- MUX power considered
- Total power per element:
  Prow + Pcol + Pwakeups + Pmux
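The banked expressions from the two banking slides can be transcribed directly into code. This is a minimal sketch: the access-power function `pa` is supplied by the caller because the slides do not give it in closed form, `p_mux` defaults to zero as a placeholder for the MUX power, and the row-buffer wakeup count is read as b/(M*r), following the statement that each bank is woken once per M*r elements.

```python
from typing import Callable

S = 0.1   # Ps = s * Pa, value assumed in the slides
W = 0.33  # Pw = w * Pa, value assumed in the slides

PowerFn = Callable[[float], float]


def row_buffer_power(r: int, size: float, pa: PowerFn) -> float:
    """Prow_buffer(r, size) = 2*Pa + (r-2)*Ps + Pw for one buffer or bank."""
    return 2 * pa(size) + (r - 2) * S * pa(size) + W * pa(size)


def banked_power_per_element(r: int, M: int, b: int, c: int,
                             pa: PowerFn, p_mux: float = 0.0) -> float:
    """Total power per element with b row-buffer banks and c column-buffer banks."""
    rb_bank = M / b  # size of one row-buffer bank
    cb_bank = r / c  # size of one column-buffer bank

    # Active bank power weighted by its share, plus sleep power of the rest.
    p_row = (row_buffer_power(r, rb_bank, pa) / r * rb_bank
             + S * pa(rb_bank) * (M - rb_bank)) / M
    p_col = (pa(cb_bank) * cb_bank
             + S * pa(cb_bank) * (r - cb_bank)) / r

    # Additional wakeups: b/(M*r) per element for RB, c/r per element for CB.
    p_wakeups = W * pa(rb_bank) * b / (M * r) + W * pa(cb_bank) * c / r

    return p_row + p_col + p_wakeups + p_mux
```

A search over r, b, and c with a technology-specific `pa` would then reproduce the kind of sweep the slides describe; no such search result is claimed here.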
r vs Power (Banked Case, M=512)
[Figure: power versus r for the banked case; minimum power at r=64, c=4, b=8]
Energy Consumption Comparison

| M | Z-scan (10^-11 J) | Optimal Z-scan (10^-11 J) | Low-Power Z-scan (10^-11 J) | r | c | b | % imp |
|---|---|---|---|---|---|---|---|
| 32 | 23.4 | 29.1 | 8.08 | 32 | 4 | 4 | 72.2 |
| 64 | 25.5 | 29.3 | 8.13 | 64 | 4 | 4 | 72.3 |
| 128 | 29.9 | 29.7 | 8.18 | 64 | 4 | 8 | 72.5 |
| 256 | 38.5 | 30.6 | 8.29 | 64 | 4 | 8 | 72.9 |
| 512 | 55.8 | 32.3 | 8.49 | 64 | 4 | 8 | 73.7 |
| 1024 | 90.3 | 35.8 | 8.89 | 64 | 4 | 8 | 75.2 |

Up to 90% and 75% improvement over Z-Scan and Optimal Z-Scan, respectively
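The improvement figures can be recomputed from the tabulated energies. In this short sketch the "% imp" column is taken to be relative to Optimal Z-scan, which is what the numbers suggest; the function names are chosen here for illustration.

```python
# (M, Z-scan, Optimal Z-scan, Low-Power Z-scan), energies in units of 1e-11 J
RESULTS = [
    (32, 23.4, 29.1, 8.08),
    (64, 25.5, 29.3, 8.13),
    (128, 29.9, 29.7, 8.18),
    (256, 38.5, 30.6, 8.29),
    (512, 55.8, 32.3, 8.49),
    (1024, 90.3, 35.8, 8.89),
]


def improvement_percent(baseline: float, proposed: float) -> float:
    """Relative energy saving of `proposed` over `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


def improvements(results):
    """Return (M, % saving vs Z-scan, % saving vs Optimal Z-scan) per row."""
    return [
        (m,
         round(improvement_percent(z, low_power), 1),
         round(improvement_percent(opt, low_power), 1))
        for m, z, opt, low_power in results
    ]
```

For M = 1024 this gives savings of about 90% over Z-scan and about 75% over Optimal Z-scan, matching the summary line.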
Energy Modelling
- Sequential Access Memory [Moon-CICC’02]
  - Configured as a circular buffer
  - Address sequencing logic and decoders replaced with a row sequencer for low power and high speed
  - Banked implementation used for large memories
- Energy modelling [Coumeri-TVLSI’00]
  - Empirical equations for modelling the energy of on-chip SRAM
  - Model parameters are size, bit width, and access mode
  - Individual equations for different memory components
  - To model the SAM, the row decoder, column decoder, and buffers are not considered
Conclusion
- A methodology for arriving at a low-power DWT architecture is proposed
- Memory architecture and access pattern are co-optimized
- Up to 90% energy saving achieved
- The derived architecture depends on the target memory technology
  - This would lead to different architectures for ASIC and FPGA implementations
References:
- [Chiu-SIPS’03]: Mu-Yu Chiu et al. (2003). Optimal data transfer and buffering schemes for JPEG2000 encode. IEEE Workshop on SIPS, Aug. 2003, pp. 177–182.
- [Moon-CICC’02]: Joong-Seok Moon et al. (2002). Low-power sequential access memory design. Custom Integrated Circuits Conference, 2002, pp. 111–114.
- [Coumeri-TVLSI’00]: S. L. Coumeri et al. (2000). Memory modelling for System Synthesis. IEEE Trans. VLSI Systems, June 2000, pp. 327–334.
Thank You
Questions!
Backup Slides
Discrete Wavelet Transform
- 2-D wavelet transform:
  - 1st: 1-D wavelet transform applied to all rows
  - 2nd: 1-D wavelet transform applied to all columns
- Each row/column can be computed independently
- Store 4 values when a line computation is suspended
[Figure: data-flow graph with nodes X(i), Y(2i), Y(2i+1), Z(2i), Z(2i+1), indexed 0 to 8. Colored arrows show multiplication by the constants a, b, c, d defined in the JPEG2000 standard]
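The row-then-column structure can be sketched in software. The example below assumes the four-stage data flow is the lifting form of the JPEG2000 9/7 filter and that a, b, c, d correspond to the standard lifting constants; that mapping is an assumption, since the slide does not name the filter. The final scaling step is omitted, boundaries use symmetric extension, and all function names are illustrative.

```python
# JPEG2000 9/7 lifting constants (assumed to be the slide's a, b, c, d).
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971


def lift_97(x):
    """One level of 1-D 9/7 lifting on an even-length line, without scaling.

    Returns (low, high) coefficient lists.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length line with at least 2 samples")
    s = [float(v) for v in x[0::2]]  # even samples
    d = [float(v) for v in x[1::2]]  # odd samples
    h = n // 2

    def lift_odd(coef):
        # Odd samples take a contribution from both even neighbours.
        for i in range(h):
            right = s[i + 1] if i + 1 < h else s[i]  # symmetric extension
            d[i] += coef * (s[i] + right)

    def lift_even(coef):
        # Even samples take a contribution from both odd neighbours.
        for i in range(h):
            left = d[i - 1] if i > 0 else d[0]  # symmetric extension
            s[i] += coef * (left + d[i])

    lift_odd(ALPHA)   # stage 1
    lift_even(BETA)   # stage 2
    lift_odd(GAMMA)   # stage 3
    lift_even(DELTA)  # stage 4
    return s, d


def transform_lines(lines):
    """Apply the 1-D transform to each line, storing low then high halves."""
    out = []
    for line in lines:
        low, high = lift_97(line)
        out.append(low + high)
    return out


def dwt2d(image):
    """One 2-D level: transform all rows, then all columns."""
    rows_done = transform_lines(image)
    columns = [list(col) for col in zip(*rows_done)]
    cols_done = transform_lines(columns)
    return [list(row) for row in zip(*cols_done)]
```

This version finishes every row before any column starts, so it needs a full intermediate image. The Z-scan schemes avoid that by suspending each line after a few elements and saving its four intermediate values.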
Buffer Structure
- The buffers are always full
- They are accessed like a circular FIFO
- A general memory row decoder is not required
  - Use a counter
  - Use a shift register initially loaded with a single 1
- Every write signal
  - Increments the counter
  - Shifts the register
- Store all 4 intermediate values in one column
  - No need for a column decoder
- This is similar to a Sequential Access Memory (SAM) [Moon-CICC’02]
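The decoder-free access scheme can be modelled behaviourally. This is a minimal software sketch, not a hardware description: the class and method names are invented for illustration, and it keeps both a counter and a one-hot shift register so that the two options from the slide can be compared.

```python
class SequentialAccessBuffer:
    """Circular buffer addressed by a one-hot shift register, no row decoder."""

    def __init__(self, depth: int, values_per_entry: int = 4) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        # The buffer is always full; entries start as zeros.
        self.entries = [[0] * values_per_entry for _ in range(depth)]
        # Shift register loaded with a single 1 selecting the first entry.
        self.select = [1] + [0] * (depth - 1)
        # Equivalent counter-based pointer.
        self.counter = 0

    def _active_entry(self) -> int:
        return self.select.index(1)

    def read(self):
        """Read the 4 intermediate values at resumption (pointer unchanged)."""
        return list(self.entries[self._active_entry()])

    def write(self, values) -> None:
        """Write the values at suspension, then advance the pointer."""
        entry = self._active_entry()
        if len(values) != len(self.entries[entry]):
            raise ValueError("wrong number of values for one entry")
        self.entries[entry] = list(values)
        # Every write shifts the register and increments the counter.
        self.select = [self.select[-1]] + self.select[:-1]
        self.counter = (self.counter + 1) % len(self.entries)
```

After `depth` writes the pointer wraps around, so a read returns the values stored when that line was last suspended.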