1
Low Power System on Chip Design
2
System Level Power Optimization
• Algorithm selection / algorithm transformation
• Identification of hot spots
• Low Power data encoding
• Quality of Service vs. Power
• Low Power Memory mapping
• Resource Sharing / Allocation
3
Levels for Low Power Design
Level of Abstraction   Expected Saving   Representative Techniques
System                 10 - 100 times    Hardware-software partitioning, Instruction set selection, Power down
Algorithm              10 - 90%          Complexity, Concurrency, Locality, Regularity, Data representation
Architecture           20 - 40%          Parallelism, Pipelining, Signal correlations, Data representation
Circuit/Logic          10 - 30%          Sizing, Logic style, Logic design
Technology (Device)    10 - 30%          Threshold reduction, Scaling, Advanced packaging, SOI
4
Elements Required to Implement a High Performance System
[Diagram: a high performance system rests on three pillars — High Speed, High Density, and Low Power per Gate — supported by reduced-swing logic, low voltage, low VT, advanced technology, deep-submicron technology, channel engineering, and low capacitance.]
5
System Level Power Optimization
• Algorithm selection / algorithm transformation
• Identification of hot spots
• Low Power data encoding
• Quality of Service vs. Power
• Low Power Memory mapping
• Resource Sharing / Allocation
6
Considerations on Power Consumption
• Components of power consumption in a digital circuit:

  P = a · f · C · VDD² + I_leak · VDD + Q_short_circuit · f · VDD

  a : switching activity
  f : frequency
  C : capacitance
  VDD : supply voltage
  I_leak : leakage current
  Q_short_circuit : short-circuit charge
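The decomposition above can be sketched numerically. The operating point below (activity factor, frequency, capacitance, leakage, short-circuit charge) is an illustrative assumption, not a value from the slides.

```python
def total_power(a, f, C, VDD, I_leak, Q_sc):
    """Return (switching, leakage, short-circuit) power in watts,
    following P = a*f*C*VDD^2 + I_leak*VDD + Q_sc*f*VDD."""
    p_switch = a * f * C * VDD ** 2        # dynamic (switching) term
    p_leak = I_leak * VDD                  # static leakage term
    p_sc = Q_sc * f * VDD                  # short-circuit term
    return p_switch, p_leak, p_sc

# Illustrative operating point: 100 MHz, 1 nF switched capacitance, 2.5 V
p_sw, p_lk, p_sc = total_power(a=0.2, f=100e6, C=1e-9, VDD=2.5,
                               I_leak=1e-3, Q_sc=1e-12)
```

Because the switching term scales with VDD², halving the supply voltage cuts it by 4x — which is why so many later slides target the supply voltage first.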
7
Vdd, power, and current trend
[Figure: ITRS projection for 1998-2014 — supply voltage (left axis, 0-2.5 V) falls from about 2.5 V toward 0.5 V, while power per chip (0-200 W) and VDD current (0-500 A) rise over the same years.]
International Technology Roadmap for Semiconductors 1998 update
8
Three Factors Affecting Energy
– Reducing waste by hardware simplification: redundant hardware extraction, locality of reference, demand-driven / data-driven computation, application-specific processing, preservation of data correlations, distributed processing
– All-in-one approach (SoC): I/O pin and buffer reduction
– Voltage-reducible hardware:
  – 2-D pipelining (systolic arrays)
  – SIMD parallel processing: useful for data with parallel structure
  – VLIW: a flexible approach
9
Design Methods to Reduce Power Consumption
• Adjusting the supply voltage
– Use a high voltage only where high speed is required on the chip.
– Put unused blocks into sleep mode to reduce power consumption.
• Lowering the operating frequency
– Use parallel processing to achieve the same throughput at a lower clock frequency; the resulting area increase is unavoidable.
– Avoid large clock buffers.
– Use a phase-locked loop (PLL) to raise the frequency only where it is needed.
10
Design Methods to Reduce Power Consumption
• Reducing parasitic capacitance
– Use short wires on critical nodes.
– Avoid fan-out greater than three.
– Reduce wire width when a low supply voltage is used.
– Use the smallest transistors possible.
• Reducing switching activity
– Reduce the number of bits.
– Prefer static circuits over dynamic circuits.
– Reduce the total number of transistors.
– Make the most active node an internal node.
11
Design Methods to Reduce Power Consumption
• Reducing switching activity
– Design the logic so that the sum over all nodes of frequency times capacitance is minimized, i.e., so that the switching activity is statistically minimal:

  min Σ_{i=1}^{n} f_i · C_i ,   f_i : mean switching frequency of node i,   C_i : capacitance of node i

– When building a logic tree, place inputs with higher activity farther from VDD or ground.
– Implement high-activity cells as dynamic logic and low-activity cells as static logic.
– Turn off the clock of flip-flops whose data do not change.
– Make it possible to disable the clock of cells that are not always in use.
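The minimization objective above can be evaluated directly. The two candidate netlists below are hypothetical; the point is only that the design whose high-frequency node drives the small capacitance wins.

```python
def switched_capacitance(nodes):
    """Sum of f_i * C_i over all nodes; nodes = list of (f_i [Hz], C_i [F])."""
    return sum(f_i * c_i for f_i, c_i in nodes)

# Same two nodes, two wirings: put the hot (50 MHz) node on the small cap...
design_a = [(50e6, 20e-15), (10e6, 100e-15)]
# ...or on the large cap.
design_b = [(50e6, 100e-15), (10e6, 20e-15)]

best = min((design_a, design_b), key=switched_capacitance)
```

design_a switches 2e-6 F/s against 5.2e-6 F/s for design_b, so the statistical-minimization rule picks design_a.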
12
Web browsing is slow with 802.11 PSM
[Cartoon]
Dad: “Son! Haven’t I told you to turn on power-saving mode? Batteries don’t grow on trees, you know!”
Son: “But dad! Performance SUCKS when I turn on power-saving mode!”
Dad: “So what! When I was your age, I walked 2 miles through the snow to fetch my Web pages!”
• Users complain about performance degradation
13
IBM’s PowerPC Low Power Architecture
• Optimum supply voltage through hardware parallelism, pipelining, and parallel instruction execution
– The 603e executes five instructions in parallel (IU, FPU, BPU, LSU, SRU)
– The FPU is pipelined, so a multiply-add instruction can be issued every clock cycle
– Low power 3.3-volt design
• Use a small, simple instruction set with shorter instructions
– IBM’s PowerPC 603e is RISC
• Superscalar: CPI < 1
– The 603e issues as many as three instructions per cycle
• Low power management
– The 603e provides four software-controllable power-saving modes
• Copper process with SOI
• IBM’s Blue Logic ASIC: the new design reduces power by a factor of 10
14
Power-Down Techniques
◆ Lowering the voltage along with the clock actually alters the energy-per-operation of the microprocessor, reducing the energy required to perform a fixed amount of work
15
Voltage vs. Delay
• Use variable voltage scaling or scheduling for real-time processing.
• Use architecture optimization to compensate for the slower operation, e.g., parallel processing and pipelining to increase concurrency and shorten the critical path.
16
Why Copper Processor?
• Motivation: aluminum resists the flow of electricity as wires are made thinner and narrower.
• Performance: 40% speed-up
• Cost: 30% less expensive
• Power: less power drawn from batteries
• Chip size: 60% smaller than an aluminum chip
17
Silicon-on-Insulator
• How does SOI reduce capacitance?
Junction capacitance is eliminated because an insulator (similar to glass) is placed between the impurity regions and the silicon substrate → high performance, low power, low soft-error rate.
18
Clock Network Power Management
• The clock network consumes about 50% of total power.
• FIR (massively pipelined circuit): video processing (edge detection), voice processing (data transmission such as xDSL).
• Telephony: the line is idle about 50% of the time (a 70%/30% split), since both parties do not speak at the same time.
• With every clock cycle, data are loaded into the working register banks even if there are no data changes.
19
Partitioning
• Performance requirements
– Some functions are easier to implement in hardware
– Blocks that are used repeatedly
– Blocks organized in parallel
• Modifiability
– Blocks implemented in software are easy to modify
• Implementation cost
– Blocks implemented in hardware can be shared
• Scheduling
– Schedule the blocks partitioned into HW and SW so that the given constraints are met
– SW operations must be scheduled sequentially
– SW and HW can be scheduled concurrently as long as there are no data or control dependencies
20
Low Power Partitioning Approach
• Different HW resources are invoked according to the instruction executed at a specific point in time.
• During the execution of an add operation, the ALU and registers are used, but the multiplier is idle.
• Non-active resources still consume energy because their circuits continue to switch.
• Calculate the wasted energy.
• Add application-specific cores and run them selectively: whenever one core is running, all the other cores are shut down.
21
Design Flow
[Flow diagram: Application → divide application into clusters → select cluster → list schedule → compute utilization rate (ASIC) / compute utilization rate (µP) → core energy estimation → HW synthesis → evaluate]
– Max 94% energy saving, and in most cases even reduced execution time
– 16k-cell overhead
22
Integrated H/W and S/W Low Power Design Optimization
[Flow diagram: algorithm selection → clustering → cluster selection → cluster scheduling → HW energy-efficiency calculation / SW energy-efficiency calculation → H/W synthesis and energy estimation / S/W core energy estimation → system-level energy estimation → HW/SW integration]
– Max 94% energy saving, and in most cases even reduced execution time
– 16k-cell overhead
23
IS-95 CDMA Searcher: Integrated H/W and S/W Design
[Design-space exploration graph: PN-code generation → synchronous accumulator (SW, or HW variants 1 and 2) → comparator (SW, or HW with precomputation) → asynchronous accumulator (SW or HW) → GOAL. Each candidate implementation is annotated with its cost (speed, area, power) and an SW or HW energy estimate; the selected path reaches GOAL at minimum energy.]
In-Ki Hwang (황인기), Sungkyunkwan University
24
Low Power DSP
• DO-loop dominant:
– VSELP vocoder: 83.4%
– 2-D 8x8 DCT: 98.3%
– LPC computation: 98.0%
• DO-loop power minimization ==> DSP power minimization
VSELP: Vector Sum Excited Linear Prediction
LPC: Linear Prediction Coding
25
Loop Unrolling
• The technique of loop unrolling replicates the body of a loop some number of times (the unrolling factor u) and then iterates by step u instead of step 1. This transformation reduces the loop overhead, increases instruction parallelism, and improves register, data-cache, or TLB locality.
In the example below, loop overhead is cut in half because two iterations are performed per pass. If array elements are assigned to registers, register locality improves because A(i) and A(i+1) are each used twice in the loop body. Instruction parallelism increases because the second assignment can be performed while the results of the first are being stored and the loop variables are being updated.
  for i = 2 to N-1
      A(i) = A(i) + A(i-1) * A(i+1)

  for i = 2 to N-2 step 2
      A(i)   = A(i)   + A(i-1) * A(i+1)
      A(i+1) = A(i+1) + A(i)   * A(i+2)
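The transformation can be checked in executable form. The Python below mirrors the pseudocode for this slide (switched to 0-based indexing) and is only a sketch of the u = 2 case.

```python
def rolled(a):
    """Reference loop: A[i] = A[i] + A[i-1]*A[i+1] over interior elements."""
    a = list(a)
    for i in range(1, len(a) - 1):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a

def unrolled(a):
    """Same computation with unrolling factor u = 2."""
    a = list(a)
    n = len(a)
    i = 1
    while i + 1 < n - 1:                    # two iterations per pass
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
        i += 2
    while i < n - 1:                        # leftover iteration if count is odd
        a[i] = a[i] + a[i - 1] * a[i + 1]
        i += 1
    return a
```

Both versions perform the identical operations in the identical order, so the results match exactly; only the per-iteration branch and index-update overhead differs.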
26
Loop Unrolling (IIR filter example)
Loop unrolling localizes the data, reducing the activity at the inputs of the functional units: two output samples are computed in parallel from two input samples.
Neither the switched capacitance nor the voltage is altered by unrolling itself. However, loop unrolling enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation, the transformation yields a critical path of 3, so the voltage can be dropped.
Original recurrence:

  Y_n = X_n + A · Y_{n-1}

Unrolled once:

  Y_{n-1} = X_{n-1} + A · Y_{n-2}
  Y_n = X_n + A · X_{n-1} + A² · Y_{n-2}
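The unrolled recurrence can be checked against the original filter. This sketch (function names ours) produces two outputs per loop iteration, with A and A² available as precomputed constants after constant propagation.

```python
def iir(x, A, y0=0.0):
    """Reference first-order IIR: Y[n] = X[n] + A*Y[n-1]."""
    y, prev = [], y0
    for xn in x:
        prev = xn + A * prev
        y.append(prev)
    return y

def iir_unrolled(x, A, y0=0.0):
    """Unrolled-by-2 form: two outputs per iteration from Y[n-2]."""
    assert len(x) % 2 == 0, "sketch assumes an even number of samples"
    y, prev = [], y0                        # prev holds Y[n-2]
    A2 = A * A                              # constant, precomputed
    for n in range(1, len(x), 2):
        y_odd = x[n - 1] + A * prev               # Y[n-1]
        y_even = x[n] + A * x[n - 1] + A2 * prev  # Y[n]
        y += [y_odd, y_even]
        prev = y_even
    return y
```

The two loop bodies are independent enough to run concurrently, which is what lets the critical path shrink and the voltage drop.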
30
Designing a Parallel FIR
To obtain a parallel processing structure, the SISO (single-input single-output) system must be converted into a MIMO (multiple-input multiple-output) system:

  y(3k)   = a·x(3k)   + b·x(3k-1) + c·x(3k-2)
  y(3k+1) = a·x(3k+1) + b·x(3k)   + c·x(3k-1)
  y(3k+2) = a·x(3k+2) + b·x(3k+1) + c·x(3k)

Parallel processing systems are also referred to as block processing systems.
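The three MIMO equations above can be sketched as one block-processing step (function name and argument layout are ours): each call consumes three new inputs plus the two previous samples and emits three outputs.

```python
def fir_block3(x_block, x_prev2, coeffs):
    """3-parallel 3-tap FIR block.
    x_block = (x[3k], x[3k+1], x[3k+2]); x_prev2 = (x[3k-2], x[3k-1])."""
    a, b, c = coeffs
    x3k, x3k1, x3k2 = x_block
    xm2, xm1 = x_prev2
    y0 = a * x3k + b * xm1 + c * xm2        # y(3k)
    y1 = a * x3k1 + b * x3k + c * xm1       # y(3k+1)
    y2 = a * x3k2 + b * x3k1 + c * x3k      # y(3k+2)
    return y0, y1, y2
```

All three outputs depend only on block inputs and the two carried samples, so they can be computed concurrently — the block clock runs at one third of the sample rate.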
31
Parallel Processing (2)
Parallel processing architecture for a 3-tap FIR filter (with block size 3)
32
Parallel Processing (3)<Combined fine-grain pipelining and parallel processing for 3-tap FIR filter>
36
Why Hardware for Motion Estimation?
• The most computationally demanding part of video encoding
• Example: CCIR 601 format
– 720 by 576 pixels
– 16 by 16 macroblock (n = 16)
– 32 by 32 search area (p = 8)
– 25 Hz frame rate (f_frame = 25)
• About 9 giga-operations/sec are needed for the full-search block-matching algorithm.
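The 9 GOPS figure can be reproduced with a quick operation count. The assumptions that each candidate pixel costs 3 operations (subtract, absolute value, accumulate) and that there are (2p+1)² candidate motion vectors are ours.

```python
width, height, f_frame = 720, 576, 25      # CCIR 601 at 25 Hz
n, p = 16, 8                               # macroblock size, search range

candidates = (2 * p + 1) ** 2              # candidate motion vectors per block
ops_per_pixel = 3                          # subtract, abs, accumulate (assumed)

# Every pixel of every frame is matched against every candidate vector:
ops_per_sec = width * height * f_frame * candidates * ops_per_pixel
```

This evaluates to about 8.99e9 operations per second, matching the slide's ~9 GOPS.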
37
Why Reconfiguration in Motion Estimation?
• Adjusting the search area at frame rate according to the changing characteristics of the video sequence
• Reducing power consumption by avoiding unnecessary computation
[Figure: motion vector distributions]
38
Architecture for Motion Estimation
From P. Pirsch et al, VLSI Architectures for Video Compression, Proc. Of IEEE, 1995
44
Motion Estimation - Data Reuse
[Figure: processing-element array in which neighboring PEs reuse partial results — add and absolute-difference outputs are shared rather than recomputed.]
Therefore, the power reduction factor is 11%.
45
Vector Quantization
• A lossy compression technique that exploits the correlation between neighboring samples and quantizes samples together.
46
Complexity of VQ Encoding
The distortion metric between an input vector X and a codebook vector C_i is computed as follows:

  D_i = Σ_{j=0}^{15} (X_j - C_{i,j})²

Three VQ encoding algorithms will be evaluated: full search, tree search, and differential-codebook tree search.
47
Full Search
• Brute-force VQ: the distortion between the input vector and every entry in the codebook is computed, and the code index that corresponds to the minimum distortion is determined and sent to the decoder.
• Each distortion computation involves 16 8-bit memory accesses (to fetch the entries of the codeword), 16 subtractions, 16 multiplications, and 15 additions. In addition, the minimum of 256 distortion values must be determined, which involves 255 comparison operations.
48
Tree-structured Vector Quantization
If, for example, at level 1 the input vector is closer to the left entry, then the right portion of the tree below level 2 is never compared and an index bit 0 is transmitted.
Only 2 x log2(256) = 16 distortion calculations and 8 comparisons are required.
49
Algorithmic Optimization
• Minimizing the number of operations
– Example: video data stream using the vector quantization (VQ) algorithm
– Distortion metric:

  D_i = Σ_{j=0}^{15} (X_j - C_{i,j})²

– Full-search VQ
  • exhaustive full search
  • distortion calculations: 256
  • value comparisons: 255
– Tree-structured VQ
  • binary tree search
  • some performance degradation
  • distortion calculations: 16 (2 x log2 256)
  • value comparisons: 8
[Figure: binary search tree over the 256-entry codebook, levels 1 through 8, with 0/1 branch labels at each node.]
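Both search strategies can be sketched end to end. The random codebook and the centroid-built internal nodes below are illustrative (a real TSVQ codebook is trained, and tree search is only near-optimal, hence the slide's "some performance degradation").

```python
import random

def distortion(x, c):
    """D = sum_j (x_j - c_j)^2 over the 16 components."""
    return sum((xj - cj) ** 2 for xj, cj in zip(x, c))

def full_search(x, codebook):
    """256 distortion calculations, 255 comparisons."""
    return min(range(len(codebook)), key=lambda i: distortion(x, codebook[i]))

def tree_search(x, tree):
    """2 distortions per level * 8 levels = 16 calculations, 8 comparisons.
    tree is a complete binary tree stored in an array; positions 255..510
    hold the 256 leaf codewords."""
    node = 0
    for _ in range(8):
        left, right = 2 * node + 1, 2 * node + 2
        node = left if distortion(x, tree[left]) <= distortion(x, tree[right]) else right
    return node - 255                      # leaf position -> codeword index

# Illustrative codebook: random leaves, internal nodes = child centroids.
random.seed(0)
codebook = [[random.random() for _ in range(16)] for _ in range(256)]
tree = [None] * 511
for i, cw in enumerate(codebook):
    tree[255 + i] = cw
for node in range(254, -1, -1):
    tree[node] = [(l + r) / 2 for l, r in zip(tree[2 * node + 1], tree[2 * node + 2])]
```

The tree search touches only 16 codevectors instead of 256, which is where the operation counts in the comparison table come from.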
50
Differential Codebook Tree-structured Vector Quantization
• The distortion difference between the left and right nodes needs to be computed. This equation can be manipulated to reduce the number of operations.
51
Algorithmic Optimization
– Differential-codebook tree-structured VQ
• Modify the equation to optimize the operations:

  D_left - D_right = Σ_{j=0}^{15} (X_j - C_{left,j})² - Σ_{j=0}^{15} (X_j - C_{right,j})²
                   = Σ_{j=0}^{15} (C_{left,j}² - C_{right,j}²) - 2 Σ_{j=0}^{15} X_j (C_{left,j} - C_{right,j})

  algorithm                  # of mem. access   # of mul.   # of add.   # of sub.
  full search                4096               4096        3840        4096
  tree search                256                256         240         264
  differential tree search   136                128         128         0
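The algebraic rewrite above can be verified numerically: expanding both squared sums cancels the X_j² contributions, leaving a codebook-only term (precomputable offline) plus a single inner product with X. The random test vectors are illustrative.

```python
import random

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(16)]
c_left = [random.uniform(-1, 1) for _ in range(16)]
c_right = [random.uniform(-1, 1) for _ in range(16)]

# Direct form: two full distortion computations, then a subtraction.
direct = sum((xj - cl) ** 2 for xj, cl in zip(x, c_left)) \
       - sum((xj - cr) ** 2 for xj, cr in zip(x, c_right))

# Rewritten form: the first sum depends only on the codebook,
# so it can be stored with the tree node at build time.
precomputed = sum(cl * cl - cr * cr for cl, cr in zip(c_left, c_right))
rewritten = precomputed - 2 * sum(xj * (cl - cr)
                                  for xj, cl, cr in zip(x, c_left, c_right))
```

With the codebook term precomputed, the online work per node drops to 16 multiplications and additions with no subtractions, consistent with the differential row of the table.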
52
Multiplication and Accumulation: MAC
• The major operation in DSP
• A multiplication costs more than about 5 ALU operations (MUL > 5 * ALU)
[Figure: MAC datapath — operands X and Y feed the multiplier (CSA + CPA); the result is accumulated (ACC) into the product register (PR), alongside the ALU.]
[Modified Booth encoding] One of 0, X, -X, 2X, -2X is selected based on each 2 bits of Y.
53
Operand Swapping (1/2)
• Weight = how many additions are needed?
• By Booth encoding, Y = 0011110000... recodes to a form with only two nonzero digits (0X000X0...), so weight = 2.

  Operands (A / B)          A*B (mW)   B*A (mW)   Saving
  7FFF AAAA / 0001 AAAA     22.0       10.0       54%
  7FFF 6666 / 0001 AAAA     31.6       10.0       68%
  7FFF AAAA / 0001 0001     28.8       12.2       58%

• Swapping the operands so that the low-weight (even if high-switching) operand is Booth-encoded reduces the measured current.
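Operand weight can be computed with a small radix-4 (modified) Booth recoder. The digit table below is the standard one; the helper names are ours.

```python
# Radix-4 Booth digit for each overlapping bit triple (y[2i+1], y[2i], y[2i-1]).
BOOTH = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
         0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def booth_digits(y, bits=16):
    """Radix-4 Booth digits of a two's-complement value (LSB digit first)."""
    y_ext = (y & ((1 << bits) - 1)) << 1   # append the implicit 0 bit
    return [BOOTH[(y_ext >> (2 * i)) & 0b111] for i in range(bits // 2)]

def booth_weight(y, bits=16):
    """Number of nonzero partial products the multiplier must add."""
    return sum(1 for d in booth_digits(y, bits) if d != 0)
```

booth_weight(0x0001) is 1 while booth_weight(0xAAAA) is 8, so feeding 0x0001 to the Booth-encoded port minimizes the additions — exactly the swap the current table above rewards.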
54
DIGLOG Multiplier
With A = 2^j + A_R and B = 2^k + B_R (j, k = positions of the leading ones):

  A · B = 2^j · B + 2^k · A_R + A_R · B_R

Dropping the A_R · B_R term gives the approximate product; the same step can be applied iteratively to the residual.

                       1st Iter   2nd Iter   3rd Iter
  Worst-case error     -25%       -6%        -1.6%
  Prob. of error <1%   10%        70%        99.8%

With an 8 by 8 multiplier, the exact result can be obtained in at most seven iteration steps (worst case).
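A single DIGLOG iteration is easy to sketch for positive integers (the function name is ours): take the leading-one positions j and k, and drop the A_R·B_R cross term.

```python
def diglog_step(a, b):
    """One DIGLOG iteration for integers a, b >= 1.
    Returns (approximate product, dropped residual a_r * b_r)."""
    j, k = a.bit_length() - 1, b.bit_length() - 1
    a_r, b_r = a - (1 << j), b - (1 << k)   # A = 2^j + A_R, B = 2^k + B_R
    return (b << j) + (a_r << k), a_r * b_r

approx, residual = diglog_step(12, 10)      # 12 = 2^3 + 4, 10 = 2^3 + 2
```

Since approx + residual is exact, iterating the step on the residual converges to the true product; the worst first-iteration case (both residues near their maximum) loses just under 25%, matching the table.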
55
Voltage Scaling
• Merely changing the processor clock frequency is not an effective technique for reducing energy consumption. Reducing the clock frequency reduces the power consumed by a processor, but it does not reduce the energy required to perform a given task.
• Lowering the voltage along with the clock actually alters the energy-per-operation of the microprocessor, reducing the energy required to perform a fixed amount of work.
56
Different Voltage Schedules
(Energy consumption ∝ Vdd²; timing constraint = 25 s)

  (A) 1000 Mcycles at 50 MHz, Vdd = 5.0 V                             → 40 J
  (B) 750 Mcycles at 50 MHz (5.0 V) + 250 Mcycles at 25 MHz (2.5 V)   → 32.5 J
  (C) 1000 Mcycles at 40 MHz, Vdd = 4.0 V                             → 25 J
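The three energies can be reproduced from E = k · cycles · Vdd², calibrating k so that schedule (A) dissipates 40 J. The constant k is our assumption; the slide gives only the resulting energies, and it rounds schedule (C) down to 25 J.

```python
# Calibrate k from schedule (A): 1000 Mcycles at 5.0 V -> 40 J.
K = 40.0 / (1000e6 * 5.0 ** 2)             # J per (cycle * V^2), assumed

def energy(segments):
    """Total energy for a schedule; segments = list of (Mcycles, Vdd)."""
    return sum(K * mc * 1e6 * vdd ** 2 for mc, vdd in segments)

e_a = energy([(1000, 5.0)])                # run fast, then idle
e_b = energy([(750, 5.0), (250, 2.5)])     # drop voltage once ahead of schedule
e_c = energy([(1000, 4.0)])                # run just fast enough for the deadline
```

Running just fast enough at the lowest voltage that still meets the 25 s deadline (schedule C, about 25.6 J here) beats both racing to idle (A) and the two-step schedule (B).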
57
Data-Driven Signal Processing
The basic idea of averaging: two samples are buffered and their workloads are averaged. The averaged workload is then used as the effective workload to drive the power supply.
Using a ping-pong buffering scheme, data samples I_{n+2} and I_{n+3} are buffered while I_n and I_{n+1} are being processed.
59
A Hardware / Software Partitioning Technique with Hierarchical Design Space Exploration
Houria Oudghiri, Bozena Kaminska, and Janusz Rajski, Mentor Graphics Corp.
• A set of DSP examples is considered for co-design in order to accelerate their performance on a target architecture consisting of a standard DSP processor running concurrently with a custom SIMD (Single Instruction Multiple Data) processor.
60
Proposed Methodology
Input: list of blocks and time constraints. Output: two subsets where blocks are assigned.
Step 1: construct the complete weighted dependency graph G
Step 2: assign all blocks to software; compute the complete system execution time
Step 3: while (time constraints not satisfied) do
  Step 3.i: select the node with the maximum execution time (i)
  Step 3.ii: assign i to hardware; update the system execution time
  Step 3.iii: while (time constraints not satisfied) do
    Step 3.iii.1: select the maximum-weighted edge connecting i to the most time-consuming node (j)
    Step 3.iii.2: assign j to hardware; update the dependency graph G and the system execution time
  end
end
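The greedy move-to-hardware loop of Step 3 can be sketched as follows. The data structures and timing numbers are illustrative, and this simplified version uses a crude additive time model and omits the edge-based refinement (Step 3.iii) and the dependency-graph update.

```python
def partition(sw_time, hw_time, constraint):
    """Greedy HW/SW partitioning sketch.
    sw_time / hw_time: dict of node -> execution time in each mapping.
    Returns (set of nodes moved to hardware, resulting system time)."""
    hw = set()

    def total():                            # crude additive system time
        return sum(hw_time[n] if n in hw else sw_time[n] for n in sw_time)

    while total() > constraint:
        # Step 3.i/3.ii: move the most time-consuming software node to HW.
        cand = max((n for n in sw_time if n not in hw), key=sw_time.get)
        hw.add(cand)
    return hw, total()

# Illustrative block timings (ms) in software and hardware:
sw = {"fft": 12.0, "filter": 8.0, "ctrl": 2.0}
hw_t = {"fft": 2.0, "filter": 1.5, "ctrl": 1.8}
hw, t = partition(sw, hw_t, constraint=10.0)
```

Starting from an all-software mapping (22 ms here), the loop moves "fft" and then "filter" to hardware, stopping as soon as the constraint is met, which mirrors Steps 2 and 3 above.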
61
Co-design Target Architecture
The Texas Instruments DSP processor TMS320C40 is used as the master processor, and the custom SIMD processor PULSE (Parallel Ultra Large Scale Engine, 4 processors in parallel) as the slave processor.
62
The Hierarchical Model of the FFT Transform
[Figure: eight-level hierarchical decomposition of the FFT. Level 1: FFT. Level 2: Initialize, Bit Reversal, Danielson control, Output. Lower levels decompose each of these into initialization, loop-body, and increment blocks — e.g., Initialize Variables, Initialize Data; Bit_init, Bit_loop1, Bit_loop2, Bit_incr, Bit_shift, Bit_cond, Bit_acc, Bit_test, Bit_swap1, Bit_swap2; Dan_init, Dan_loop, Dan_loop1, Dan_loop2, Dan_real, Dan_imag; Out_init, Out_write, Out_incr; Index_init, Read_data, Index_incr, Data_test; Loop1_init, Loop1_body, Loop1_incr; Loop2_init, Loop2_body, Loop2_incr; Update Variables.]
63
Block Assignment at Different Hierarchical Levels
(time constraint = 25 ms)

  level   Nb. of blocks   C40   PULSE   PULSE time (ms)   C40 time (ms)   Total (ms)
  1       4               2     2       18.14             4.8             22.94
  2       10              6     4       18.8              2.96            21.76
  3       17              11    6       15.56             9               24.56
  4       22              18    6       14.68             10.24           24.92
  5       24              17    7       14.56             10.4            24.94
  6       24              22    2       6.82              17.72           24.54
  7       25              22    3       7                 17.92           24.92
  8       27              18    9       5.88              18.64           24.52
66
SystemC supports:
– Mentor Graphics - Seamless® C-Bridge™
– Verisity - SpecMan™ Elite
– Forte Design Systems - ESC Library
– Emulation & Verification Engineering - Zebu
– Axys Design - MaxSim™
– CoWare - N2C, updated for SystemC 2.0
– Cadence - SPW 4.8 / SystemC v2.0 IF
– Synopsys - CoCentric System Studio
• Plus the Kluwer book “System Design Using SystemC”, 2002
69
Specification and Modeling
• Executable specification - Verilog, VHDL, C, C++, Java.
• Common models: synchronous dataflow (SDF), sequential programs (Prog.), communicating sequential processes (CSP), object-oriented programming (OOP), FSMs, hierarchical/concurrent FSMs (HCFSM).
• Depending on the application domain and specification semantics, these are based on different models of computation.
70
Hardware Synthesis
• Many RTL, logic-level, and physical-level commercial CAD tools.
• Some emerging high-level synthesis tools: Behavioral Compiler (Synopsys), Monet (Mentor Graphics), and RapidPath (DASYS).
• Many open problems: memory optimization, parallel heterogeneous hardware architectures, programmable hardware synthesis and optimization, communication optimization.
71
Software Synthesis
• The use of real-time operating systems (RTOSs)
• The use of DSPs and micro-controllers - code generation issues
• Special-processor compilation is in many cases still far less efficient than manual code generation!
• Retargeting issues - C code developed for the TI TMS320C6x is not optimized for running on the Philips TriMedia processor.
72
Interface Synthesis
• Interfaces between:
  - hardware-hardware
  - hardware-software
  - software-software
• Timing and protocols
• Recently, the first commercial tools appeared: the CoWare system (hw-sw protocols) and the Synopsys Protocol Compiler (hw interface synthesis tool)
73
Co-design Sites
• Bibliography of Hardware/Software Codesign: http://www-ti.informatik.uni-tuebingen.de/~buchen/
• Ralf Niemann's Codesign Links and Literature: http://ls12-www.informatik.uni-dortmund.de/~niemann/codesign/codesign_links.html
• URLs to Hardware/Software Co-Design Research: http://www.ece.cmu.edu/~thomas/hsURL.html
• RASSP Architecture Guide: http://www.sanders.com/hpc/ArchGuide/TOC.html
• EDA, Electronic Design Automation: http://www.eda.org
• COMET (Case Western Reserve University): http://bear.ces.cwru.edu/research/hard_soft.html
• COSMOS (Tima - Cmp, France): http://tima-cmp.imag.fr/Homepages/cosmos/research.html
• COSYMA (Braunschweig): http://www.ida.ing.tu-bs.de/projects/cosyma/
• Handel-C (Oxford): http://oldwww.comlab.ox.ac.uk/oucl/hwcomp.html
• Lycos (Technical University of Lyngby, Denmark): http://www.it.dtu.dk/~lycos/
• MOVE (Technical University Delft): http://cardit.et.tudelft.nl/MOVE/
• Polis (University of Berkeley): http://www-cad.eecs.berkeley.edu/Respep/Research/hsc/abstract.html
• ProCos (UK Research): http://www.comlab.ox.ac.uk/archive/procos/codesign.html
• Ptolemy (University of Berkeley): http://ptolemy.eecs.berkeley.edu/
• SPAM (Princeton): http://www.ee.princeton.edu/~spam/
• TRADES (University of Twente, INF/CAES): http://wwwspa.cs.utwente.nl/aid/aid.html
• SystemC: http://www.systemc.org
74
SOC CAD Companies
• Cadence: www.cadence.com
• Duet Tech: www.duettech.com
• Escalade: www.escalade.com
• Logic Vision: www.logicvision.com
• Mentor Graphics: www.mentor.com
• Palmchip: www.palmchip.com
• Sonics: www.sonicsinc.com
• Summit Design: www.summit-design.com
• Synopsys: www.synopsys.com
• Topdown Design Solutions: www.topdown.com
• Xynetix Design Systems: www.xynetix.com
• Zuken-Redac: www.redac.co.uk