an 401: low power design techniques · 2020. 7. 21. · altera corporation 1 an-401-1.0 preliminary...

26
Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application note provides low-power logic design techniques for Stratix ® II and Cyclone II devices. These devices are based on a 1.2-V core voltage and 90-nm process technology. They are the first 90-nm FPGAs that utilize a low-k dielectric material that dramatically reduces dynamic power and improves performance. Stratix II devices include new, more efficient, logic structures called adaptive logic modules (ALMs) which obtain maximum performance while minimizing power consumption. Cyclone II FPGAs offer the optimal blend of high performance and low power in a low-cost FPGA. Altera ® provides the Quartus ® II PowerPlay Power Analyzer to aid you during the design process by delivering fast and accurate estimations of power consumption. You can minimize power consumption, while taking advantage of the industry’s leading FPGA performance, by using the tools and techniques described in this application note. f For more information on the PowerPlay Power Analyzer, refer to the PowerPlay Power Analyzer chapter in volume 3 of the Quartus II Handbook. Total FPGA power consumption is comprised of I/O Power, core static power, and core dynamic power. This application note focuses on design techniques that help reduce core dynamic power and I/O power. Power Dissipation This section describes the sources of power dissipation in Stratix II and Cyclone II devices. You can refine techniques that reduce power consumption in your design by understanding the sources of power dissipation. Figure 1 shows the power dissipation of Stratix II and Cyclone II devices in different designs. All designs were analyzed at a fixed clock rate of 200 MHz and exhibited varied logic resource utilization across available resources. August 2005, ver 1.0

Upload: others

Post on 22-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Altera Corporation AN-401-1.0

August 2005, ver 1.0

Low Power DesignTechniques

Application Note 401

Introduction This application note provides low-power logic design techniques for Stratix® II and Cyclone™ II devices. These devices are based on a 1.2-V core voltage and 90-nm process technology. They are the first 90-nm FPGAs that utilize a low-k dielectric material that dramatically reduces dynamic power and improves performance. Stratix II devices include new, more efficient, logic structures called adaptive logic modules (ALMs) which obtain maximum performance while minimizing power consumption. Cyclone II FPGAs offer the optimal blend of high performance and low power in a low-cost FPGA.

Altera® provides the Quartus® II PowerPlay Power Analyzer to aid you during the design process by delivering fast and accurate estimations of power consumption. You can minimize power consumption, while taking advantage of the industry’s leading FPGA performance, by using the tools and techniques described in this application note.

f For more information on the PowerPlay Power Analyzer, refer to the PowerPlay Power Analyzer chapter in volume 3 of the Quartus II Handbook.

Total FPGA power consumption is comprised of I/O Power, core static power, and core dynamic power. This application note focuses on design techniques that help reduce core dynamic power and I/O power.

Power Dissipation

This section describes the sources of power dissipation in Stratix II and Cyclone II devices. You can refine techniques that reduce power consumption in your design by understanding the sources of power dissipation.

Figure 1 shows the power dissipation of Stratix II and Cyclone II devices in different designs. All designs were analyzed at a fixed clock rate of 200 MHz and exhibited varied logic resource utilization across available resources.

1Preliminary

Page 2: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Power Dissipation

Figure 1. Average Core Dynamic Power Dissipation

Notes to Figure 1:(1) 112 different designs were used to benchmark these results.(2) 93 different designs were used to benchmark these results.(3) 5% of dynamic core power dissipation comes from DSP blocks in designs using DSP blocks.

As shown in Figure 1, a significant amount of the total power is dissipated in routing for both Stratix II and Cyclone II devices, with the remaining power dissipated in logic, clock, and RAM blocks.

In Stratix II and Cyclone II devices, a series of column and row interconnect wires of varying lengths provide signal interconnects between logic array blocks (LABs), memory block structures, and digital signal processing (DSP) blocks or multiplier blocks. These interconnects dissipate the largest component of device power.

FPGA combinational logic is another source of power consumption. The basic building block of logic in Stratix II devices is the ALM, and in Cyclone II devices, it is the logic element (LE).

f For more information on ALMs and LEs in Stratix II and Cyclone II devices, refer to the Stratix II Device Handbook and Cyclone II Device Handbook, respectively.

Memory and clock resources are other major consumers of power in FPGAs. Stratix II devices feature the TriMatrix™ memory architecture. TriMatrix memory includes 512-bit M512 blocks, 4-Kbit M4K blocks, and 512-Kbit M-RAM blocks, which are each configurable to support many features. Cyclone II devices have 4-Kbit M4K memory blocks.

Average Core Dynamic Power Dissipation by Block Type in Stratix II Devices at a 12.5% Toggle Rate (1)

Routing35%

Combinational Logic15%

Registered Logic25%

DSP Blocks1%

Average Core Dynamic Power Dissipation by Block Type in Cyclone II Devices at a 12.5% Toggle Rate (2)

Memory9%

Global Clock Routing 15%

Routing39%

Combinational Logic23%

Registered Logic16%

Memory14%

Global Clock Routing7%

DSP Blocks1% (3)

2 Altera CorporationPreliminary

Page 3: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Low Power Design Techniques

There are several low-power design techniques that can reduce power consumption when applied during FPGA design implementation. This section provides detailed design techniques for Stratix II and Cyclone II devices and their effect on overall design power. The results of these techniques may be different from design to design.

Standard Fit-Fitter Effort

The use of Standard Fit for Fitter effort, rather then Fast Fit or Auto Fit, reduces design power consumption because the Standard Fit option fully optimizes your design during placement by minimizing interconnect resources. Less interconnect usage reduces power consumption because fewer interconnect wires are charged and discharged each cycle.

Standard Fit Experiment for Stratix II Devices

In this experiment, four designs are implemented in Stratix II devices using the Auto Fit and Standard Fit, Fitter setting effort.

Table 1 shows resource utilization for the designs used in the experiment.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file. Figure 2 shows that the Standard Fit technique saves as much as 9% of power in Stratix II devices.

Table 1. Designs Used in the Standard Fit Experiment for Stratix II Devices

Design Name ALUTs (Speed Mapping) ALUTs (Area Mapping)

Design 1 Auto and Standard Fit 5,166

Design 2 Auto and Standard Fit 13,741

Design 3 Auto and Standard Fit 2,889

Design 4 Auto and Standard Fit 40,500

Altera Corporation 3Preliminary

Page 4: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 2. Power Savings Using the Standard Fit Setting for Stratix II Devices

Standard Fit Experiment for Cyclone II Devices

In this experiment, four designs were implemented in Cyclone II devices using the Auto Fit and Standard Fit, Fitter setting effort.

Table 2 shows the resource utilization for the designs used in the experiment.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file. Figure 3 shows that the Standard Fit technique saves as much as 16% of power in Cyclone II devices.

Table 2. Designs Used in the Standard Fit Experiment for Cyclone II Devices

Design Name LEs (Speed Mapping) LEs (Area Mapping)

Design 1 Auto and Standard Fit 13,396

Design 2 Auto and Standard Fit 7,343

Design 3 Auto and Standard Fit 14,758

Design 4 Auto and Standard Fit 15,075

0%1%2%3%4%5%6%7%8%

1 2 43

9%10%

Designs

Powe

r Sav

ings

4 Altera CorporationPreliminary

Page 5: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 3. Power Savings Using the Standard Fit Setting for Cyclone II Devices

Area-Driven Synthesis

The use of area optimization rather than timing or delay optimization during synthesis saves power because fewer logic blocks are used. Using less logic usually means less switching activity.

Area-Driven Synthesis Experiment for Stratix II Devices

In this experiment for Stratix II devices, five designs are compiled with the Quartus II software in two ways. First, the designs are compiled optimizing for area. The same designs are then compiled optimizing for speed.

Table 3 shows ALUT usage in the area-driven synthesis experiment.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file combined with vectorless estimation techniques. Figure 4 shows that the area-driven technique reduces power consumption by as much as 31% in Stratix II devices.

Table 3. Designs Used in the Area-Driven Synthesis Experiment for Stratix II Devices

Design Name ALUTs (Area Mapping) ALUTs (Speed Mapping)

Design 1 5,682 8,553

Design 2 16,986 17,783

Design 3 36,554 36,312

Design 4 4,717 5,820

Design 5 15,947 15,978

0%2%4%6%8%

10%12%14%16%

1 2 43

18%

Designs

Pow

er S

avin

gs

Altera Corporation 5Preliminary

Page 6: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 4. Power Savings Using Area-Driven Synthesis for Stratix II Devices

Area-Driven Synthesis Experiment for Cyclone II Devices

In this experiment for Cyclone II devices, five designs are compiled with the Quartus II software in two ways. First, the designs are compiled optimizing for area. The same designs are then compiled optimizing for speed.

Table 4 shows LE usage in the area-driven synthesis experiment.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file combined with vectorless estimation techniques. Figure 5 shows that the area-driven technique reduces power consumption by as much as 15% in Cyclone II devices.

Table 4. Designs Used in the Area-Driven Synthesis Experiment for Cyclone II Device

Design Name LEs (Area Mapping) LEs (Speed Mapping)

Design 1 13,020 16,429

Design 2 13,317 13,636

Design 3 5,384 5,690

Design 4 33,640 40,008

Design 5 21,409 22,988

0%5%

10%15%20%25%30%35%

1 2 43 5

Designs

Pow

er S

avin

gs

6 Altera CorporationPreliminary

Page 7: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 5. Power Savings Using Area-Driven Synthesis for Cyclone II Devices

Clock Power Management

Clocks represent a significant portion of dynamic power consumption due to their high switching activity and long route paths. Figure 1 shows a 7% average contribution to power consumption for global clock routing in Stratix II devices and 15% in Cyclone II devices. Actual clock-related power consumption is higher than this because the power consumed by local clock distribution within logic, memory, and DSP or multiplier blocks is included in the power consumption for the respective blocks.

Clock routing power is automatically optimized by the Quartus II software, which only enables those portions of the clock network that are required to feed downstream registers. Power can be further reduced by gating clocks when they are not needed. It is possible to build clock gating logic, but this approach is not recommended because it is difficult to generate a glitch-free clock in FPGAs using ALMs or LEs.

Stratix II and Cyclone II devices use clock control blocks that include an enable signal. A clock control block is a clock buffer that lets you dynamically enable or disable the clock network and dynamically switch between multiple sources to drive the clock network. You can use the Quartus II MegaWizard® Plug-In Manager to create this clock control block with the altclkctrl megafunction. Stratix II and Cyclone II devices provide clock control blocks for global clock networks. In addition, Stratix II devices have clock control blocks for regional clock networks. The dynamic clock enable or disable feature lets internal logic control the clock network. When a clock network is powered down, all the logic fed by that clock network does not toggle, thereby reducing the overall power consumption of the device. Figure 6 shows a 4-input clock control block diagram.

0%2%4%6%8%

1 2 43 5

Pow

er S

avin

gs

Designs

10%12%14%16%

Altera Corporation 7Preliminary

Page 8: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 6. Clock Control Block Diagram

f For more information about using clock control blocks, refer to the altclkctrl Megafunction User Guide.

Another contributor to clock power consumption is the LAB clock that distributes a clock to the flipflops within a LAB. LAB clock power can be the dominant contributor to overall clock power. For example, in Cyclone II devices each LAB can use two clocks and two clock enable signals as shown in Figure 7. Each LAB’s clock and clock enable signals are linked. For example, an LE in a particular LAB using the labclk1 signal also uses the labclkena1 signal.

Figure 7. LAB-Wide Control Signals

To reduce LAB-wide clock power consumption without disabling the entire clock tree, use the LAB-wide clock enable to safely gate the LAB-wide clock. The Quartus II software automatically promotes register-level clock enable signals to the LAB-level. All registers within an LAB

inclk 3×inclk 2×inclk 1×inclk 0×

clkselect[1..0]

outclk

ena

6

labclk1 labclk2 labclr2syncload

labclkena1 labclkena2 labclr1 synclr

LocalInterconnect

LocalInterconnect

LocalInterconnect

LocalInterconnect

DedicatedLAB RowClocks

8 Altera CorporationPreliminary

Page 9: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

that share a common clock and clock enable are controlled by a shared gated clock. To take advantage of these clock enables, use a clock enable construct in the relevant HDL code for the registered logic.

LAB-Wide Clock Enable Example

This VHDL code makes use of a LAB-wide clock enable. This clock-gating logic is automatically turned into an LAB-level clock enable signal.

IF clk'event AND clock = '1' THEN IF logic_is_enabled = '1' THEN reg <= value; ELSE reg <= reg; END IF;END IF;

f For more information on LAB-wide control signals, refer to the Stratix II Architecture or Cyclone II Architecture chapters in the respective device handbook.

Reducing Memory Power Consumption

The memory blocks in FPGA devices can represent a large fraction of typical core dynamic power. Memory represents 14% of the core dynamic power in a typical Stratix II device design and 9% in a Cyclone II device design. Memory blocks are unlike most other blocks in the device because the vast majority of their power is tied to the clock rate, and is insensitive to the toggle rate on the data and address lines.

When a memory block is clocked, there is a sequence of timed events that occur within the block to execute a read or write. The circuitry controlled by the clock consumes the same amount of power regardless of whether or not the address or data has changed from one cycle to the next. Thus, the toggle rate of input data and the address bus have no impact on memory power consumption.

The key to reducing memory power consumption is to reduce the number of memory clocking events. You can achieve this through clock network-wide gating described in “Clock Power Management” on page 7, or on a per-memory basis through use of the clock enable signals on the memory ports. You can use the Quartus II MegaWizard Plug-In Manager to create these enable signals by selecting the Clock enable signal option for the appropriate port when generating the memory block function (Figure 8).

Altera Corporation 9Preliminary

Page 10: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 8. MegaWizard Plug-In Manager RAM 2-Port Clock Enable Signal Selectable Option

For example, take a design that contains a 32-bit-wide M4K memory block in ROM mode that is running at 200 MHz. Assuming that the output of this block is only needed roughly every four cycles, this memory block will consume 8.45 mW of dynamic power according to the demands of the downstream logic. By adding a small amount of control logic to generate a read clock enable signal for the memory block only on the relevant cycles, the power can be cut 75% to 2.15 mW.

You can also use the MAXIMUM_DEPTH parameter for your memory megafunction to save power in Stratix II and Cyclone II devices; however, this approach may increase the number of LEs required to implement the memory and affect design performance.

The MAXIMUM_DEPTH parameter for memory modules can be set manually in the megafunction instantiation or in the MegaWizard Plug-In Manager (Figure 9).

10 Altera CorporationPreliminary

Page 11: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 9. MegaWizard Plug-In Manager RAM 2-Port Maximum Depth Selectable Option

Figure 10 shows an example of a 4K × 4 (4K deep and 4 bit wide) memory implementation in two different configurations using M4K memory blocks. The minimum logic area implementation uses M4K blocks configured as 4K bits deep × 1 bit wide. This implementation is the default in the Quartus II software, because it has the minimum logic area (0 logic cells) and the highest speed. However, all four M4K blocks are active on each memory access in this implementation, which increases RAM power. The minimum RAM power implementation is created by setting the MAX_DEPTH parameter in the altsyncram megafunction to 1024. This implementation uses four M4K blocks configured as 1K bits deep × 4 bits wide. An address decoder is implemented by the altsyncram megafunction to select which of the four M4K blocks should be activated on a given cycle, based on the state of the top two user address bits. The altsyncram megafunction automatically implements a multiplexer to feed the downstream logic by choosing the appropriate M4K output. This implementation reduces RAM power, because only one M4K block is active on any cycle, but it requires extra logic cells, costing logic area and potentially impacting design performance.

Altera Corporation 11Preliminary

Page 12: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

1 In the Quartus II software, version 5.0, inactive RAM blocks created by megafunctions with the MAX_DEPTH parameter are shut down only when specified by a read enable signal. Consequently, it is important that you specify in the megafunction that a read enable signal is required. This signal can be connected to VCC if the altsyncram is expected to read every cycle. MAX_DEPTH power optimizations only require that the read enable signal exists.

Figure 10. 4K × 4 Memory Implementation Using Multiple M4K Blocks

Reducing Memory Power Consumption Example

Table 5 shows power usage measurements for a 4K × 36 simple dual-port memory implemented using multiple M4K blocks in a Stratix II EP2S15 device. For each implementation, the M4K blocks are configured with a different memory depth.

AddrDecoder

4

1K deep × 4 wideM4K RAM

Addr[0:9]

Addr[10:11]

Data[0:3]

Addr[10:11]

4K words deep and4 bits wide

Addr[0:11]

4K deep × 1 wideM4K RAM

Data[0:3]

Minimum RAM Power(Power Efficient)

Minimum Logic Area(Power Inefficient)

Table 5. 4K × 36 Simple Dual-Port Memory Implemented Using Multiple M4K Blocks (Part 1 of 2)

M4K Configuration Number of M4K Blocks ALUTs

4K × 1 (default setting) 36 0

2K × 2 36 40

1K × 4 36 62

512 × 9 32 143

12 Altera CorporationPreliminary

Page 13: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 11 shows the amount of power saved using the MAXIMUM_DEPTH parameter. For all implementations, a user-provided read enable signal is present to indicate when read data is needed. Using this power saving technique can reduce power consumption by as much as 60%.

Figure 11. Power Savings Using MAXIMUM_DEPTH Parameter

As the memory depth becomes shallower, memory dynamic power decreases because unaddressed M4K blocks can be shut off using a decoded combination of address bits and the read enable signal. For a 128-deep memory block, power used by the extra LEs starts to outweigh the power gain achieved by using a shallower memory block depth. The power consumption of the memory blocks and associated LEs depends on the memory configuration.

Pipelining & Retiming

Designs with many glitches consume more power because of faster switching activity. Glitches cause unnecessary and unpredictable temporary logic switches at the output of combinational logic. A glitch usually occurs when there is a mismatch in input signal timing leading to unequal propagation delay.

For example, consider an input change on a 2-input XOR gate from 1to 0, followed a few moments later by a second input change from 0 to 1. For a brief moment of time both inputs become 1 (high) during the state transition, resulting in 0 (low) at the output of the XOR gate. Subsequently, when the second input transition takes place, the XOR gate

256 × 18 32 302

128 × 36 32 633

Table 5. 4K × 36 Simple Dual-Port Memory Implemented Using Multiple M4K Blocks (Part 2 of 2)

M4K Configuration Number of M4K Blocks ALUTs

0%10%20%30%40%50%60%70%

4K × 1 2K × 2 256 × 18 128 × 361K × 4 512 × 9M4K Configuration

Pow

er S

avin

gs

Altera Corporation 13Preliminary

Page 14: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

output becomes 1 (high). During signal transition, a glitch is produced before the output becomes stable, as shown in Figure 12. This glitch can propagate to subsequent logic and create unnecessary switching activity, increasing power consumption. Circuits with many XOR functions, such as arithmetic circuits or cyclic redundancy check (CRC) circuits, tend to have many glitches if there are several levels of combinational logic between registers.

Figure 12. XOR Gate Showing Glitch at the Output

Pipelining can reduce design glitches by inserting flipflops into long combinational paths because flipflops do not allow glitches to propagate through combinational paths. Therefore, a pipelined circuit tends to have less glitching. Pipelining has the additional benefit of generally allowing higher-clock-speed operations, although it does increase the latency of a circuit (in terms of the number of clock cycles to a first result). Figure 13 shows an example where pipelining is applied to break up a long combinational path.

XOR (exclusive OR) Gate

A

B Q

A

B

Q

Timing diagram for the 2-input XOR Gate

Glitch

t

14 Altera CorporationPreliminary

Page 15: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 13. Pipelining Example

Pipelining is very effective for glitch-prone arithmetic systems because it reduces switching activity, resulting in reduced power dissipation in combinational logic. Additionally, pipelining allows higher-speed operation by reducing logic-level numbers between registers. The disadvantage of this technique is that if there are not many glitches in your design, pipelining may increase power consumption by adding unnecessary registers. Pipelining can also increase resource utilization.

Pipelining Experiment for Stratix II Devices

In this experiment, three designs are implemented in Stratix II devices with and without pipelining. These three designs heavily use arithmetic (based on XOR functions) that may result in significant glitching.

Combinationallogic

Combinationallogic

Combinationallogic

Short logicdepth

Short logicdepth

Long logicdepthD Q D Q

D Q D Q D Q

Non-pipelined

Pipelined

Altera Corporation 15Preliminary

Page 16: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Table 6 shows the resource utilization for the designs used in the experiment.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file. Figure 14 shows that pipelining reduces dynamic power consumption by as much as 31% in Stratix II devices.

Figure 14. Power Savings Using Pipelining for Stratix II Devices

Pipelining Experiment for Cyclone II Devices

In this experiment, four designs are implemented in Cyclone II devices with and without pipelining. These three designs heavily use arithmetic (based on XOR functions) that may result in significant glitching.

Table 6. Resources Used in the Pipelining Experiment for Stratix II Devices

Design Name Pipelined ALUTs Registers

Multiplier (Design 1) No 9,726 448

Yes 9,772 1,109

Accumulator multipliers (Design 2)

No 13,719 1,120

Yes 14,007 2,260

Fir filter (Design 3) Yes (level 1) 1,048 949

Yes (level 2) 932 929

0%5%

10%15%20%25%30%35%

1 2 3Designs

Pow

er S

avin

gs

16 Altera CorporationPreliminary

Page 17: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Table 7 shows resource utilization for the designs used in the experiment.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file. Figure 15 shows that pipelining reduces dynamic power by as much as 31% in Cyclone II devices.

Figure 15. Power Savings Using Pipelining for Cyclone II Devices

Gate-Level Register Retiming

You can also use gate-level register retiming to reduce circuit switching activity. Retiming shuffles registers across combinational blocks without changing design functionality. The Perform gate-level register retiming option in the Quartus II software enables the movement of registers across combinational logic to balance timing, enabling the software to trade off the delay between timing-critical and non-critical paths.

Retiming uses fewer registers than pipelining. Figure 16 shows an example of gate-level register retiming, where the 10-ns critical delay is reduced by moving the register relative to the combinational logic, resulting in the reduction of data depth and switching activity.

Table 7. Resources Used in the Pipelining Experiment for Cyclone II Devices

Design Name Pipelined LEs Registers

Accumulator Multipliers (Design 1)

No 6,870 320

Yes 13,071 3,719

Adder (Design 2) No 7,392 1,076

Yes 7,343 752

Divider (Design 3) No 6,659 320

Yes 6,735 520

0%5%

10%15%20%25%30%35%

1 2 3Designs

Pow

er S

avin

gs

Altera Corporation 17Preliminary

Page 18: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 16. Gate-Level Register Retiming

1 Gate-level register retiming makes changes at the gate level. If you are using an atom netlist from a third-party synthesis tool, you must also select the Perform WYSIWYG primitive resynthesis option to undo the atom primitives to gates map (so that register retiming can be performed) and then to re-map gates to Altera primitives. When using the Quartus II integrated synthesis, retiming occurs during synthesis before the design is mapped to Altera primitives.

f For more information on register retiming, refer to the Netlist Optimizations & Physical Synthesis chapter in volume 2 of the Quartus II Handbook.

Gate-Level Register Retiming Experiment for Stratix II Devices

In this experiment for Stratix II devices, three designs are compiled with the Quartus II software in two ways. First, a netlist from a third-party synthesis tool is compiled. Then, the same netlist from the third-party synthesis tool is compiled by selecting the Perform WYSIWYG primitive resynthesis and Perform gate-level register retiming options.

D Q D Q

D Q D Q

D Q

D Q

10 ns 5 ns

7 ns 8 ns

No retiming

Retiming

18 Altera CorporationPreliminary

Page 19: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Table 8 shows resource usage results.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file. Figure 17 shows that the combination of WYSIWYG remapping and gate-level register retiming reduces power consumption by as much as 6% in Stratix II devices.

Figure 17. Power Savings Using Retiming for Stratix II Devices

Gate-Level Register Retiming Experiment for Cyclone II Devices

In this experiment for Cyclone II devices, three designs are compiled with the Quartus II software in two ways. First, a netlist from a third-party synthesis tool is compiled. Then, the same netlist from the third-party synthesis tool is compiled by selecting the Perform WYSIWYG primitive resynthesis and Perform gate-level register retiming options.

Table 8. Resources Used in the Gate-Level Register Retiming Experiment for Stratix II Devices

Design Name

WYSIWYG & Register Retiming

ALUTs Registers DSP Blocks Memory

Design 1 No 2,051 691 0 16

Yes 1,882 731 0 16

Design 2 No 123,909 40,070 0 0

Yes 95,593 39,816 0 0

Design 3 No 6,354 6,019 64 3,584

Yes 7,496 5,970 64 3,584

6%5%4%3%2%1%0%

1 2 3Designs

Pow

er S

avin

gs

Altera Corporation 19Preliminary

Page 20: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Table 9 shows resource usage results.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file. Figure 18 shows that the combination of WYSIWYG remapping and gate-level register retiming reduces power consumption by as much as 21% in Cyclone II devices.

Figure 18. Power Savings Using Retiming for Cyclone II Devices

Architectural Optimization

You can use design-level architectural optimization by taking advantage of specific device architecture features. These features include dedicated memory and DSP or multiplier blocks available in FPGA devices to perform memory or arithmetic related functions. You can use these blocks in place of LUTs to reduce power consumption. For example, you can build large shift registers from RAM-based first-in first-out (FIFO) buffers instead of building the shift registers from the LE registers.

The Stratix II device family allows you to efficiently target small, medium, and large memories with the TriMatrix memory architecture. Each TriMatrix memory block is optimized for a specific function. The

Table 9. Resources Used in the Gate-Level Register Retiming Experiment for Cyclone II Devices

Design Name

WYSIWYG & Register Retiming

LEs Registers Multiplier Blocks Memory

Design 1 No 385 137 0 0

Yes 278 143 0 0

Design 2 No 14,758 1,683 0 0

Yes 13,079 1,683 0 0

Design 3 No 31,727 29,097 96 3,120

Yes 27,038 24,272 96 3,120

0%5%

10%15%20%25%

1 2 3Designs

Pow

er S

avin

gs

20 Altera CorporationPreliminary

Page 21: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

M512 memory blocks are useful for implementing small FIFO buffers, DSP, and clock domain transfer applications. M512 memory blocks are more power efficient than the distributed memory structures in some competing FPGAs. The M4K memory blocks are used to implement buffers for a wide variety of applications, including processor code storage, large look-up table implementation, and large memory applications. The M-RAM blocks are useful in applications where a large volume of data must be stored on-chip. Effective utilization of these memory blocks can have a significant impact on power reduction in your design.

The Cyclone II device family has configurable M4K memory blocks that provide various memory functions such as RAM, FIFO buffers, and ROM.

Architectural Optimization Experiment for Stratix II Devices

In this experiment, three designs are implemented in Stratix II devices in three ways to illustrate the power-reducing capabilities of dedicated blocks. The first two designs use logic elements and DSP blocks. The third design uses M4K and M-RAM blocks. In the third design, you can see that using MRAM blocks are more power efficient than M4K blocks for large memory applications.

Table 10 shows relative resource usage results.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file. Figure 19 shows that the architectural optimization technique has power savings of as much as 60% in Stratix II devices.

Table 10. Designs Used in the Architectural Optimization Experiment for Stratix II Devices

Design Name Implementation ALUT Register DSP Blocks Memory

Design 1 Regular implementation

9,726 448 0 0

Dedicated resource implementation

1,124 448 121 0

Design 2 Regular implementation

13,719 1,120 0 0

Dedicated resource implementation

2,880 896 212 0

Design 3 M4K 286 228 0 1,835,008 (M4K)

M-RAM 224 224 0 1,835,008 (M-RAM)

Altera Corporation 21Preliminary

Page 22: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 19. Power Savings Using Dedicated Blocks for Stratix II Devices

Architectural Optimization Experiment for Cyclone II

In this experiment, three designs are implemented in Cyclone II devices in three ways to illustrate the power-reducing capabilities of dedicated blocks. The first two designs use LEs and multiplier blocks. The third design uses LEs and M4K blocks.

Table 11 shows relative resource usage results.

The PowerPlay Power Analyzer estimates the power using a gate-level simulation output file. Figure 20 shows that the architectural optimization technique has power savings of as much as 88% in Cyclone II devices.

0%10%20%30%40%50%60%70%

1 2 3

Pow

er S

avin

gs

Designs

Table 11. Designs Used in the Architectural Optimization Experiment for Cyclone II Devices

Design Name Implementation LEs Register Multiplier Blocks Memory

Design 1 Regular implementation

6,870 320 0 0

Dedicated resource implementation

1,130 320 49 0

Design 2 Regular implementation

7,343 752 0 0

Dedicated resource implementation

1,401 608 44 0

Design 3 Regular implementation

1,550 1,265 0 0

M4K 72 72 0 1,152

22 Altera CorporationPreliminary

Page 23: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

Figure 20. Power Savings Using Dedicated Blocks for Cyclone II Devices

I/O Power Guidelines

Non-terminated I/O standards such as LVTTL and LVCMOS have a rail-to-rail output swing. The voltage difference between logic-high and logic-low signals at the output pin is equal to the VCCIO supply voltage. If the capacitive loading at the output pin is known, the dynamic power consumed in the I/O buffer can be calculated as:

P = 0.5 × F × C × V2

In this equation, F is the output transition frequency and C is the total load capacitance being switched. V is equal to VCCIO supply voltage. Because of the quadratic dependence on VCCIO, lower voltage standards consume significantly less dynamic power. In addition, lower pin capacitance is an important factor in considering I/O power consumption. Hardware and simulation data show that Stratix II device I/O pins have half the pin capacitance of the nearest competing FPGA. Cyclone II devices exhibit 20% less I/O power consumption than competitive, low-cost, 90-nm FPGAs.

TTL I/O buffers consume very little static power. As a result, the total power consumed by a LVTTL or LVCMOS output is highly dependent on load and switching frequency.

When using resistively terminated I/O standards like SSTL and HSTL, the output load voltage swings by a small amount around some bias point. The same dynamic power equation is used, where V is the actual load voltage swing. Because this is much smaller than VCCIO, dynamic power is lower than for nonterminated I/O under similar conditions. These resistively terminated I/O standards dissipate significant static (frequency-independent) power, because the I/O buffer is constantly driving current into the resistive termination network. However, the

0%10%20%30%40%50%60%70%

1 2 3

80%90%

100%

Pow

er S

avin

gs

Designs

Altera Corporation 23Preliminary

Page 24: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

lower dynamic power of these I/O standards means they often have lower total power than LVCMOS or LVTTL for high-frequency applications. Use the lowest drive strength I/O setting that meets your speed and waveform requirements to minimize I/O power when using resistively terminated standards.

You can save a small amount of static power by connecting unused I/O banks to the lowest possible VCCIO voltage of 1.2 V.

Table 12 shows the total supply and thermal power consumed by outputs using different I/O standards for Stratix II devices. The numbers are for an I/O pin transmitting random data clocked at 200 MHz with a 10-pF capacitive load.

For this specific configuration, non-terminated standards generally use less power, but this is not always the case. If the frequency or the capacitive load is increased, the power consumed by non-terminated outputs increases faster than the power of terminated outputs.

f For more information on I/O Standards, refer to the Selectable I/O Standards in Stratix II Devices chapter in volume 2 of the Stratix II Device Handbook or the Selectable I/O Standards in Cyclone II Devices chapter in the Cyclone II Device Handbook

Table 12. I/O Power for Different I/O Standards for Stratix II Devices

StandardTotal Supply Current Drawn

from VCCIO Supply (mA)Total On-Chip Thermal

Power Dissipation (mW)

3.3-V LVTTL 2.42 9.87

2.5-V LVCMOS 1.9 6.69

1.8-V LVCMOS 1.34 4.18

1.5-V LVCMOS 1.18 3.58

3.3-V PCI 2.47 10.23

SSTL-2 class I 6.07 4.42

SSTL-2 class II 10.72 5.1

SSTL-18 class I 5.33 3.28

SSTL-18 class II 8.56 4.06

HSTL-15 class I 6.06 3.49

HSTL-15 class II 11.08 4.87

HSTL-18 class I 6.87 4.09

HSTL-18 class II 12.33 5.82

24 Altera CorporationPreliminary

Page 25: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

AN-401-1.0

Conclusion

The combination of a smaller process technology (specifically a 90-nm geometry), the use of low-k dielectric material, and reduced supply voltages significantly reduces dynamic power consumption in Stratix II and Cyclone II FPGAs. However, this reduction in dynamic power may be offset as design complexity and clock frequencies increase in 90-nm technology devices. Use the design recommendations presented in this application note to optimize resources utilization and minimize power consumption.

Copyright © 2005 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, thestylized Altera logo, specific device designations, and all other words and logos that are identified astrademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of AlteraCorporation in the U.S. and other countries. All other product or service names are the property of theirrespective holders. Altera products are protected under numerous U.S. and foreign patents and pendingapplications, mask work rights, and copyrights. Altera warrants performance of itssemiconductor products to current specifications in accordance with Altera’s standardwarranty, but reserves the right to make changes to any products and services at any timewithout notice. Altera assumes no responsibility or liability arising out of the applicationor use of any information, product, or service described herein except as expressly agreedto in writing by Altera Corporation. Altera customers are advised to obtain the latestversion of device specifications before relying on any published information and beforeplacing orders for products or services.

25 Altera Corporation

101 Innovation DriveSan Jose, CA 95134(408) 544-7000http://www.altera.comApplications Hotline:(800) 800-EPLDLiterature Services:[email protected]

Page 26: AN 401: Low Power Design Techniques · 2020. 7. 21. · Altera Corporation 1 AN-401-1.0 Preliminary Application Note 401 Low Power Design Techniques Introduction This application

Low Power Design Techniques Low Power Design Techniques

26 Altera CorporationPreliminary