Copyright Agrawal & Srivaths, 2007
Low-Power Design and Test, Lecture 6
Memory and Multicore Design
Vishwani D. Agrawal, Auburn University, USA, [email protected]
Srivaths Ravi, Texas Instruments India, [email protected]
Hyderabad, July 30-31, 2007
http://www.eng.auburn.edu/~vagrawal/hyd.html

Memory Architecture
[Figure: an array of N words (Word 0 to Word N-1), each M bits wide, built from storage cells with an M-bit input-output port. The array is shown twice: first with the word-select signals S0 to SN-1 driven directly, then with a decoder that generates them from k address lines A0 to Ak-1.]
k = log2 N
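As a rough illustration of the relation k = log2 N, the Python sketch below computes the number of address lines for an N-word memory and models the decoder as a one-hot selector for S0 to SN-1. The function names are hypothetical, and rounding k up when N is not a power of two is an assumption that the slide does not state.

```python
def address_lines(num_words: int) -> int:
    """Number of address lines k needed to select one of num_words words."""
    if num_words < 1:
        raise ValueError("a memory needs at least one word")
    # (N - 1).bit_length() equals ceil(log2 N) for N >= 1, using integers only.
    return (num_words - 1).bit_length()


def decode(address: int, num_words: int) -> list[int]:
    """One-hot word-select signals S0..S(N-1) for a given address."""
    if not 0 <= address < num_words:
        raise ValueError("address out of range")
    return [1 if word == address else 0 for word in range(num_words)]


if __name__ == "__main__":
    print(address_lines(1024))  # address lines for a 1024-word memory
    print(decode(2, 8))         # select lines for address 2 of 8 words
```

Without the decoder, the memory would need N select pins; with it, only k address pins are required.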

Memory Organization
[Figure: a two-dimensional array of storage cells with word lines and bit lines. Row address bits AK to AL-1 select one of 2^(L-K) word lines. Column address bits A0 to AK-1 drive the column decoder, which works through the sense amplifiers/drivers on the M·2^K bit lines to deliver the M-bit input-output.]
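The address split in the figure can be sketched in Python as follows. This is a minimal illustration, assuming an L-bit address whose low K bits (A0 to AK-1) go to the column decoder and whose remaining bits select the word line; the function and parameter names are hypothetical.

```python
def array_shape(addr_bits: int, col_bits: int, word_bits: int) -> tuple[int, int]:
    """(rows, columns) of the cell array: 2^(L-K) word lines by M*2^K bit lines."""
    if not 0 <= col_bits <= addr_bits:
        raise ValueError("need 0 <= K <= L")
    return 2 ** (addr_bits - col_bits), word_bits * 2 ** col_bits


def split_address(address: int, addr_bits: int, col_bits: int) -> tuple[int, int]:
    """Split an L-bit address into (row, column-group) fields."""
    if not 0 <= address < 2 ** addr_bits:
        raise ValueError("address out of range")
    row = address >> col_bits                 # bits A_K .. A_(L-1)
    column = address & ((1 << col_bits) - 1)  # bits A_0 .. A_(K-1)
    return row, column


if __name__ == "__main__":
    # Example values (assumed): L = 10 address bits, K = 3, M = 8-bit words.
    print(array_shape(10, 3, 8))
    print(split_address(685, 10, 3))
```

Choosing K trades array height against width: the total cell count, M·2^L, is unchanged, but a squarer array keeps both word lines and bit lines short.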

An SRAM Cell
[Figure: SRAM cell powered from VDD, with storage nodes bit and bit-bar, accessed through word line WL onto the complementary bit lines BL and BL-bar.]

Read Operation
[Figure: SRAM cell with VDD, WL, storage nodes bit and bit-bar, and bit lines BL and BL-bar.]
1. Precharge to VDD
2. WL = Logic 1
3. Sense amplifier converts BL swing to logic level

Precharge Circuit
[Figure: SRAM cell on bit lines BL and BL-bar, with a precharge circuit controlled by PC that connects both bit lines to VDD, and a differential sense amplifier across the bit lines.]

Reading 1 from Cell
[Figure: timing waveforms versus time for precharge, WL, BL, BL-bar, and the sense amplifier output. Annotation: pulsed to save bit line charge.]

Write Operation, bit = 1 → 0
[Figure: SRAM cell with VDD, WL, and bit lines BL and BL-bar driven to 0 and 1.]
1. Set BL = 0, BL-bar = 1
2. WL = 1

Cell Array Power Management
Smaller transistors
Low supply voltage
Lower voltage swing (0.1V – 0.3V for SRAM)
    Sense amplifier restores the full voltage swing for outside use.

Sense Amplifier
[Figure: differential sense amplifier between bit and bit-bar, powered from VDD, producing a full-voltage-swing output.]
SE (sense amplifier enable): low when the bit lines are precharged and equalized.

Block-Oriented Architecture
A single cell array may contain 64 Kbits to 256 Kbits.
Larger arrays become slow and consume more power.
Larger memories are block oriented.

Hierarchical Organization
[Figure: P memory blocks (Block 0 to Block P-1) share a global data bus, which connects through a global amplifier/driver to I/O. The block address drives the control circuitry and block selector; the row and column addresses go to the blocks.]

Power Saving
Block-oriented memory
Lengths of local word and bit lines are kept small.
Block address is used to activate the addressed block.
Unaddressed blocks are put in power-saving mode:
    Sense amplifier and row/column decoders are disabled.
    Power is maintained for data retention in cells.
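The block-selection rule above can be modeled with a short Python sketch. It is only a behavioral illustration under assumed names (`BlockState`, `select_block`); it says nothing about how the enable signals are actually generated in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockState:
    sense_amp_enabled: bool   # on only in the addressed block
    decoders_enabled: bool    # row/column decoders, on only in the addressed block
    cells_powered: bool       # retention supply, kept on in every block


def select_block(block_address: int, num_blocks: int) -> list[BlockState]:
    """Return the state of each of the P blocks for one access."""
    if not 0 <= block_address < num_blocks:
        raise ValueError("block address out of range")
    states = []
    for block in range(num_blocks):
        active = block == block_address
        states.append(BlockState(active, active, True))
    return states


if __name__ == "__main__":
    for index, state in enumerate(select_block(1, 4)):
        print(index, state)
```

In this model exactly one block has its peripheral circuits enabled per access, while every block keeps its cells powered so that no data is lost.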

Static Power
[Figure: leakage current (amperes, 100n to 1.3μ) versus supply voltage (0.0 to 1.8 V) for an 8-kbit SRAM in 0.13μ CMOS and 0.18μ CMOS. Annotation: 7x increase.]

Adding Resistance in Leakage Path
[Figure: SRAM cell arrays connected between internal rails VDD.int and VSS.int. Transistors controlled by the sleep signal separate VDD.int from VDD and VSS.int from GND. Annotation: low-threshold transistor.]

Lowering Supply Voltage
[Figure: SRAM cell arrays between GND and a supply rail that the sleep signal switches between VDD and a lower voltage VDDL.]
VDDL = 100mV for 0.13μ CMOS
Sleep = 1, data retention mode

Parallelization of Memories
[Figure: instructions are interleaved across two memories. Mem 1 holds instr. A, C, E, ... and Mem 2 holds instr. B, D, F, ...; each memory is accessed at f/2, and a multiplexer with an f/2 select signal (0/1) merges the two output streams.]
Power = $C' (f/2) V_{DD}^2$
C. Piguet, “Circuit and Logic Level Design,” pp. 124-125 in W. Nebel and J. Mermet (Eds.), Low Power Design in Deep Submicron Electronics, Springer, 1997.

References
K. Itoh, VLSI Memory Chip Design, Springer-Verlag, 2001.
J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, Inc., 2003.

Low-Power Datapath Architecture
Lower supply voltage
    This slows down circuit speed
    Use parallel computing to gain the speed back
Works well when threshold voltage is also lowered.
About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (Now Springer), 1995.

A Reference Datapath
[Figure: input register, combinational logic, and output register, clocked by CK; the switched capacitance is labeled Cref.]
Supply voltage = Vref
Total capacitance switched per cycle = Cref
Clock frequency = f
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

A Parallel Architecture
[Figure: the input, arriving at rate f, feeds N registers, each followed by one copy of the combinational logic (Copy 1 to Copy N) clocked at f/N. An N-to-1 multiplexer and an output register clocked at f produce the output. A multiphase clock generator and mux control are driven by CK.]
Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism

Level Converter: L to H
[Figure: level converter from Vin_L to Vout_H, using supplies VDDL and VDDH.]
Transistors with thicker oxide and longer channels
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Level Converter: H to L
[Figure: level converter from Vin_H to Vout_L, powered from VDDL.]
Transistors with thicker oxide and longer channels
N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Control Signals, N = 4
[Figure: timing diagram of CK and the four clock phases, Phase 1 to Phase 4.]

Power
$P_N = P_{proc} + P_{overhead}$
$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 f/N + C_{outreg} V_N^2 f$
$= (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f$
$= C_{ref} V_N^2 f$
$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$
$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
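The power expressions on this slide can be evaluated with a small Python sketch. The function names are hypothetical, and the overhead factor δ passed in the example is an assumed value, not one given in the lecture.

```python
def parallel_power(n: int, v_n: float, c_ref: float, f: float, delta: float) -> float:
    """P_N = [1 + delta (N - 1)] C_ref V_N^2 f."""
    if n < 1:
        raise ValueError("degree of parallelism must be at least 1")
    return (1 + delta * (n - 1)) * c_ref * v_n ** 2 * f


def power_ratio(n: int, v_n: float, v_ref: float, delta: float) -> float:
    """P_N / P_1 = [1 + delta (N - 1)] (V_N / V_ref)^2; C_ref and f cancel."""
    return parallel_power(n, v_n, 1.0, 1.0, delta) / parallel_power(1, v_ref, 1.0, 1.0, delta)


if __name__ == "__main__":
    # Two copies at 2.9 V against a 5 V reference, with an assumed 15% overhead.
    print(power_ratio(2, 2.9, 5.0, 0.15))
```

Because the ratio depends on the square of V_N but only linearly on N, a modest voltage reduction can outweigh the added overhead capacitance.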

Voltage vs. Speed
Delay of a gate, $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$
where
I is saturation current
k is a technology parameter
W/L is width to length ratio of transistor
Vt is threshold voltage
[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, marking Vt, V3 (N=3), V2 = 2.9V (N=2), and Vref = 5V (N=1).]
Voltage reduction slows down as we get closer to Vt.
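The shape of the delay curve can be reproduced with the sketch below, which evaluates the gate-delay expression relative to its value at Vref so that C_L, k, and W/L cancel. The function name and the default threshold voltage are assumptions chosen for illustration.

```python
def normalized_delay(v: float, v_ref: float = 5.0, v_t: float = 0.8) -> float:
    """Gate delay at supply v divided by the delay at v_ref.

    Uses T proportional to V / (V - Vt)^2, so the constants C_L, k and W/L cancel.
    """
    if v <= v_t or v_ref <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")

    def delay(supply: float) -> float:
        return supply / (supply - v_t) ** 2

    return delay(v) / delay(v_ref)


if __name__ == "__main__":
    for supply in (5.0, 4.0, 3.0, 2.0, 1.5, 1.0):
        print(f"{supply:.1f} V -> {normalized_delay(supply):.2f}x delay")
```

The delay grows slowly at first and then steeply as the supply approaches Vt, which is why each additional degree of parallelism buys a smaller voltage reduction.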

Increasing Multiprocessing
[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, Vref = 5V, with curves for Vt = 0V (extreme case), Vt = 0.4V, and Vt = 0.8V.]

Extreme Cases: $V_t = 0$
Delay, $T \propto 1/V_{ref}$
For N processing elements, delay = NT → $V_N = V_{ref}/N$
$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2}$, which falls as 1/N for large N (approximately $\delta/N$)
For negligible overhead, δ → 0
$\frac{P_N}{P_1} \approx \frac{1}{N^2}$
For Vt > 0, power reduction is less and there will be an optimum value of N.
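To see how an optimum N can arise when Vt > 0, the sketch below solves the delay relation T(V_N) = N·T(Vref) for V_N in closed form and then scans N for the lowest power ratio. The function names, the overhead factor δ = 0.1, the threshold Vt = 0.8 V, and the scan range are all assumptions for illustration.

```python
import math


def supply_for_parallelism(n: int, v_ref: float = 5.0, v_t: float = 0.8) -> float:
    """Supply V_N at which the gate delay is n times the delay at v_ref.

    With T proportional to V / (V - Vt)^2, the condition is a (V - Vt)^2 = V,
    where a = n * v_ref / (v_ref - v_t)^2. The larger root lies above Vt.
    """
    a = n * v_ref / (v_ref - v_t) ** 2
    b = 2 * a * v_t + 1
    discriminant = 4 * a * v_t + 1  # equals b^2 - 4 a^2 Vt^2
    return (b + math.sqrt(discriminant)) / (2 * a)


def power_ratio(n: int, delta: float = 0.1, v_ref: float = 5.0, v_t: float = 0.8) -> float:
    """P_N / P_1 = [1 + delta (N - 1)] (V_N / V_ref)^2."""
    v_n = supply_for_parallelism(n, v_ref, v_t)
    return (1 + delta * (n - 1)) * (v_n / v_ref) ** 2


def best_parallelism(max_n: int = 32, **kwargs: float) -> int:
    """Degree of parallelism in 1..max_n with the lowest power ratio."""
    return min(range(1, max_n + 1), key=lambda n: power_ratio(n, **kwargs))


if __name__ == "__main__":
    n_opt = best_parallelism()
    print(n_opt, power_ratio(n_opt))
```

With Vt = 0 the same solver returns Vref/N, so the ratio reduces to the closed-form expression on this slide.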

Example: Multiplier Core
Specification:
    200MHz Clock
    15W dissipation @ 5V
    Low voltage operation, VDD ≥ 1.5 volts
    Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$
Problem:
    Integrate multiplier core on a SOC
    Power budget for multiplier ~ 5W

A Multicore Design
[Figure: five multiplier cores (Multiplier Core 1 to Multiplier Core 5), each with an input register and clocked at 40MHz, feed a 5-to-1 mux and an output register clocked at 200MHz. A multiphase clock generator and mux control are driven by the 200MHz CK.]
Core clock frequency = 200/N MHz; N should divide 200.

How Many Cores?
For N cores:
    Clock frequency = 200/N MHz
    Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
    Assuming 10% overhead per core,
    Power dissipation = $15 [1 + 0.1 (N - 1)] \left(\frac{V_{DDN}}{5}\right)^2$ watts

Design Tradeoffs
Number of cores, N    Clock (MHz)    Core supply VDDN (Volts)    Total Power (Watts)
1                     200            5.00                        15.0
2                     100            3.68                        8.94
4                     50             2.75                        5.90
5                     40             2.51                        5.29
8                     25             2.10                        4.50
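The table can be regenerated from the multiplier-core formulas with the Python sketch below. Function and constant names are hypothetical; the candidate list applies the two stated constraints (N divides 200, VDD ≥ 1.5 V). The computed values agree with the table to within rounding.

```python
import math

REF_POWER_W = 15.0     # dissipation of one core at 5 V, 200 MHz
REF_VDD = 5.0          # reference supply in volts
REF_CLOCK_MHZ = 200    # required throughput clock
OVERHEAD = 0.1         # 10% overhead per additional core
MIN_VDD = 1.5          # lowest allowed supply in volts


def core_supply(n: int) -> float:
    """Supply at which one core still runs at 200/N MHz."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n: int) -> float:
    """Total dissipation in watts of an N-core design."""
    return REF_POWER_W * (1 + OVERHEAD * (n - 1)) * (core_supply(n) / REF_VDD) ** 2


def candidates() -> list[int]:
    """Core counts that divide 200 and keep the supply at or above MIN_VDD."""
    return [n for n in range(1, REF_CLOCK_MHZ + 1)
            if REF_CLOCK_MHZ % n == 0 and core_supply(n) >= MIN_VDD]


if __name__ == "__main__":
    for n in candidates():
        print(f"N={n:2d}  clock={REF_CLOCK_MHZ // n:3d} MHz  "
              f"VDD={core_supply(n):.2f} V  power={total_power(n):.2f} W")
```

Whether N = 5 (slightly above 5 W) or N = 8 (below 5 W) is the better choice depends on how strictly the approximate 5 W budget is read; the figure in this lecture shows five cores.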

Power Reduction in Processors
Just about everything is used.
Hardware methods:
    Voltage reduction for dynamic power
    Dual-threshold devices for leakage reduction
    Clock gating, frequency reduction
    Sleep mode
Architecture:
    Instruction set
    Hardware organization
Software methods

Parallel Architecture
[Figure: reference design, a single processor clocked at f between input and output.]
Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2f$
[Figure: parallel design, two processors each clocked at f/2, with input and output running at f.]
Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = $0.396CV^2f$

Pipeline Architecture
[Figure: reference design, a processor followed by a register, clocked at f.]
Capacitance = C, Voltage = V, Frequency = f, Power = $CV^2f$
[Figure: pipelined design, two half-processor stages (½ Proc.), each followed by a register, clocked at f.]
Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = $0.432CV^2f$
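The power figures quoted for the parallel and pipelined designs follow from scaling each factor of CV²f. A minimal Python check, with a hypothetical function name:

```python
def relative_power(cap_scale: float, volt_scale: float, freq_scale: float) -> float:
    """Power relative to the reference C V^2 f after scaling each factor."""
    return cap_scale * volt_scale ** 2 * freq_scale


if __name__ == "__main__":
    # Two-processor parallel design: 2.2C, 0.6V, 0.5f
    print(f"parallel: {relative_power(2.2, 0.6, 0.5):.3f} CV^2f")
    # Two-stage pipeline: 1.2C, 0.6V, f
    print(f"pipeline: {relative_power(1.2, 0.6, 1.0):.3f} CV^2f")
```

Both designs rely on the same 0.6V supply; the parallel version pays more in capacitance but halves the clock, while the pipeline keeps the clock and adds only register capacitance.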

Approximate Trend
               n-parallel proc.    n-stage pipeline proc.
Capacitance    nC                  C
Voltage        V/n                 V/n
Frequency      f/n                 f
Power          $CV^2f/n^2$         $CV^2f/n^2$
Chip area      n times             10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.

Multicore Processors
[Figure: performance, based on SPECint2000 and SPECfp2000 benchmarks, versus year (2000, 2004, 2008), with curves for multicore and single core. Source: Computer, May 2005, p. 12.]

Multicore Processors
D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.
A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.
S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

Cell - Cell Broadband Engine Architecture
[Photo, left to right: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]
Nine-processor chip: 192 Gflops

Cell’s Nine-Processor Chip
[Figure: Cell’s nine-processor chip. © IEEE Spectrum, January 2006]
Eight identical processors, f = 5.6GHz (max), 44.8 Gflops