vlsi pld 2011 [compatibility mode]
Designer’s Choice
• Digital designer has various options
  – SSI (small scale integrated circuits) or MSI (medium scale integrated circuits) components
  – Difficulties arise as design size increases
  – Interconnections grow with complexity, resulting in a prolonged testing phase
• Programmable logic device (PLD)
  – Simple programmable logic devices: PLAs and PALs
    • PLA: a Programmable Logic Array is a relatively small PLD that contains two levels of logic, an AND-plane and an OR-plane, where both levels are programmable
    • PAL: a Programmable Array Logic device is a relatively small PLD that has a programmable AND-plane followed by a fixed OR-plane
  – The simple PLD architecture is not scalable; power consumption and delays limit how far it can be extended to complex designs
  – Implementing larger designs with simple PLDs leads to the same difficulties as with discrete components
  – CPLD: a more Complex PLD that consists of an arrangement of multiple PAL-like blocks on a single chip
  – FPGA: a Field-Programmable Gate Array is a PLD featuring a general structure that allows very high logic capacity
PROGRAMMABLE LOGIC DEVICES
A programmable logic device is a general-purpose chip for implementing logic circuitry.
Fig. PROGRAMMABLE LOGIC DEVICE AS A BLACK BOX
Programmable logic array (PLA):
PLA is a special type of PROM.
In a PROM the addressing is fixed and only the data is programmable. Conversely, in a PLA the addressing itself is programmable, overcoming the rigidity of the PROM.
Several types of PLDs are commercially available. The first developed was the programmable logic array (PLA).
Programmable logic array (PLA): continued..
Commercially available PLAs come in larger sizes than we have shown here.
Typical parameters are 16 inputs, 32 product terms, and eight outputs.
Advantage:
The PLA is efficient in terms of the area needed for its implementation on an integrated circuit chip.
Offers Flexibility
Programmable logic array (PLA): continued..
Historically, PLA manufacturers faced the following difficulties:
• PLAs were hard to fabricate correctly
• The programmable planes reduced the speed performance of circuits implemented in the PLAs
These drawbacks led to the development of a similar device in which the AND plane is programmable, but the OR plane is fixed.
Programmable array logic (PAL) device
Such a chip is known as a programmable array logic (PAL) device.
Advantage: Eliminating the fuses in the OR array, and the associated electronics needed to blow them, saves a large area in a PAL.
f1 = x1 x2 (!x3) + (!x1) x2 x3
f2 = (!x1)(!x2) + x1 x2 x3
Programmable array logic (PAL) device…
An example of a PAL with three inputs, four product terms, and two outputs is given below
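As a rough illustration, the PAL described here can be modelled in Python. This is only a sketch: the product terms are taken from the f1 and f2 expressions on this slide, while the assumption that each fixed OR gate is wired to exactly two product terms, and all names (AND_PLANE, OR_PLANE, pal_eval), are hypothetical.

```python
from itertools import product

# Programmable AND plane: each product term lists (input index, required value).
# Indices 0..2 stand for x1..x3; a missing index means the input is unused.
AND_PLANE = [
    [(0, 1), (1, 1), (2, 0)],  # P1 = x1 x2 !x3
    [(0, 0), (1, 1), (2, 1)],  # P2 = !x1 x2 x3
    [(0, 0), (1, 0)],          # P3 = !x1 !x2
    [(0, 1), (1, 1), (2, 1)],  # P4 = x1 x2 x3
]

# Fixed OR plane (assumed wiring): each output ORs two product terms.
OR_PLANE = {"f1": (0, 1), "f2": (2, 3)}


def pal_eval(x1, x2, x3):
    """Evaluate both PAL outputs for one input combination."""
    inputs = (x1, x2, x3)
    terms = [all(inputs[i] == v for i, v in term) for term in AND_PLANE]
    return {name: int(any(terms[t] for t in wired))
            for name, wired in OR_PLANE.items()}


def truth_table():
    """List (inputs, outputs) for all eight input combinations."""
    return [(bits, pal_eval(*bits)) for bits in product((0, 1), repeat=3)]
```

Reprogramming the device corresponds to editing AND_PLANE only; OR_PLANE stays fixed, which is the flexibility a PAL gives up relative to a PLA.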
Programmable array logic (PAL) device…
In comparison to the PLA, the PAL offers less flexibility.
To compensate for the reduced flexibility, PALs are manufactured in a range of sizes, with various numbers of inputs and outputs, and different numbers of inputs to the OR gates.
[Figure: CoolRunner macrocell, with labels: From FastCONNECT (36), P-Term Allocator, SUM-Term Logic, XOR, Register (D/T, Q, R, S), P-term Clk, P-term R&S, P-term OE, Global Clocks, Global OEs, Global R/S, to/from other macrocells]
CoolRunner (Xilinx) CPLD Features
• Basic Blocks
  – Function Blocks (FB) consisting of one or more levels of macrocells which use a PLA configuration
  – Interconnect using AIM: advanced interconnect matrix
  – Input/Output block
• Architecture aimed at achieving speed and reducing power dissipation
CoolRunner-II Architecture
PLA: (Xilinx) PLA-like structure, AIM: advanced interconnect matrix, MC: macrocell, BSC: boundary scan chain, ISP: in-system programming
AIM
• The Advanced Interconnect Matrix can be thought of as a software-controlled crossbar switch delivering 40 signals into each FB from any of the following:
  – IO blocks
  – FB outputs (feedback term)
  – Special control signals (GSR, global clocks)
• AIM is designed to minimize propagation delays and reduce power consumption.
• Propagation delays are fixed irrespective of the source/destination pair, so that predictive timing delay models can be used in fitter/P&R tools.
CoolRunner-II features
• Available with 32 to 512 macrocells
• 33 to 270 user I/Os
• Fastest device has ~400 MHz system performance, tsu = 1.7 ns, tPD = 3 ns, tco = 2.8 ns
• Power management features: DataGATE function
  – User can selectively block free-running signals toggling at the pins from propagating inside the chip
• Mixed voltage operation
• Multiple packaging options
• 1000 programming cycles, 20-year retention
FPGA
• Field-Programmable Gate Arrays (FPGAs) provide the benefits of custom CMOS VLSI, while avoiding the initial cost, time delay, and inherent risk of a custom/semicustom ASIC.
• An FPGA consists of an array of logic blocks and routing channels.
• The FPGA has three major configurable elements: input/output blocks, configurable logic blocks (CLBs), and interconnects.
FPGA Basics
• The IOBs provide the interface between the package pins and internal signal lines.
• The CLBs provide the functional elements for constructing the user's logic.
• The programmable interconnect resources provide routing paths to connect the inputs and outputs of the CLBs and IOBs onto the appropriate networks.
• In terms of logic granularity, FPGAs offer finer granularity than CPLDs (but much coarser granularity than ASIC libraries).
FPGA Basics
• The figure depicts an FPGA with a two-dimensional array of logic blocks that can be interconnected by interconnect wires.
• All internal connections are composed of metal segments with programmable switching points to implement the desired routing.
• An abundance of different routing resources is provided to achieve efficient automated routing.
Xilinx FPGA Families
• Old families
  – XC3000, XC4000, XC5200
  – Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• High-performance families
  – Virtex (0.22µm)
  – Virtex-E, Virtex-EM (0.18µm)
  – Virtex-II, Virtex-II PRO (0.13µm)
  – Virtex-4 (0.09µm)
• Low Cost Family
  – Spartan/XL (derived from XC4000)
  – Spartan-II (derived from Virtex)
  – Spartan-IIE (derived from Virtex-E)
  – Spartan-3
LUT (Look-Up Table) Functionality
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
[Figure: a 4-input LUT with inputs x1, x2, x3, x4 and output y. Two example configurations are shown as truth tables, with rows ordered x1 x2 x3 x4 = 0000 to 1111:
y = 0100010101001100 (a function of all four inputs)
y = 1111111111110000 (a function of x1 and x2 only)]
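A LUT is simply a 16-entry memory addressed by its inputs. The sketch below models that idea in Python, using the two output columns from the figure as table contents. The function name make_lut and the choice of x1 as the most significant address bit are assumptions for illustration.

```python
def make_lut(table_bits):
    """Return y = f(x1, x2, x3, x4) backed by a 16-entry truth table.

    table_bits is a 16-character string of '0'/'1', listed for
    x1 x2 x3 x4 = 0000 up to 1111 (x1 assumed to be the MSB).
    """
    if len(table_bits) != 16 or set(table_bits) - {"0", "1"}:
        raise ValueError("a 4-input LUT needs exactly 16 bits")

    def lut(x1, x2, x3, x4):
        address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return int(table_bits[address])

    return lut


# The two configurations shown in the figure.
lut_a = make_lut("0100010101001100")
lut_b = make_lut("1111111111110000")  # ignores x3 and x4
```

Changing the 16 stored bits changes the function without changing the hardware; this is why one LUT can implement any function of 4 inputs.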
5-Input Functions implemented using two LUTs
• One CLB Slice can implement any function of 5 inputs
• Logic function is partitioned between two LUTs
• F5 multiplexer selects LUT
[Figure: CLB slice with two LUTs, each configurable as LUT, ROM or RAM (inputs A1 to A4, WS, DI; output D), fed by inputs F1 to F4. The F5 multiplexer, controlled by BX, selects between the two LUT outputs to drive F5. Other labels: F5G, XORG, X.]
5-Input Functions implemented using two LUTs
[Figure: the 5-input function below is split across two LUTs whose outputs are combined into OUT]
X5 X4 X3 X2 X1  Y
0  0  0  0  0   0
0  0  0  0  1   1
0  0  0  1  0   0
0  0  0  1  1   0
0  0  1  0  0   1
0  0  1  0  1   1
0  0  1  1  0   0
0  0  1  1  1   0
0  1  0  0  0   1
0  1  0  0  1   0
0  1  0  1  0   0
0  1  0  1  1   1
0  1  1  0  0   1
0  1  1  0  1   1
0  1  1  1  0   1
0  1  1  1  1   1
1  0  0  0  0   0
1  0  0  0  1   0
1  0  0  1  0   0
1  0  0  1  1   0
1  0  1  0  0   0
1  0  1  0  1   0
1  0  1  1  0   0
1  0  1  1  1   1
1  1  0  0  0   0
1  1  0  0  1   1
1  1  0  1  0   0
1  1  0  1  1   1
1  1  1  0  0   0
1  1  1  0  1   1
1  1  1  1  0   0
1  1  1  1  1   0
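The partitioning can be sketched in Python: the 32-row truth table above is split on X5 into two 16-entry tables, and a 2:1 multiplexer picks one LUT output. Which physical LUT holds which half, and the names LUT_X5_0, LUT_X5_1, lut4 and f5, are assumptions for illustration.

```python
# Y column of the truth table, split on X5.
# Each string lists Y for X4 X3 X2 X1 = 0000 up to 1111.
LUT_X5_0 = "0100110010011111"  # rows with X5 = 0
LUT_X5_1 = "0000000101010100"  # rows with X5 = 1


def lut4(table_bits, x4, x3, x2, x1):
    """Look up one bit in a 16-entry table (X4 is the MSB of the address)."""
    return int(table_bits[(x4 << 3) | (x3 << 2) | (x2 << 1) | x1])


def f5(x5, x4, x3, x2, x1):
    """5-input function built from two 4-input LUTs and a 2:1 multiplexer."""
    low = lut4(LUT_X5_0, x4, x3, x2, x1)
    high = lut4(LUT_X5_1, x4, x3, x2, x1)
    return high if x5 else low  # role of the F5 multiplexer
```

Both LUTs see the same four inputs; the fifth input never enters a LUT and only drives the multiplexer select.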
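[Figure: LUTs used as distributed RAM primitives: RAM16X1S (D, WE, WCLK, A0 to A3; output O), RAM32X1S (D, WE, WCLK, A0 to A4; output O), RAM16X2S (D0, D1, WE, WCLK, A0 to A3; outputs O0, O1), and dual-port RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3; outputs SPO, DPO)]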
Distributed RAM
• CLB LUT configurable as Distributed RAM
  – A LUT equals a 16x1 RAM
  – Implements single and dual ports
  – Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/asynchronous read
  – Accompanying flip-flops used for synchronous read
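The behaviour listed above (synchronous write, asynchronous read, optional second read port) can be sketched as a small Python model. It is a behavioural illustration only; the class and method names are hypothetical, loosely following the RAM16X1D port labels (WE, D, SPO, DPO, DPRA).

```python
class Ram16x1D:
    """Behavioural sketch of a 16x1 dual-port distributed RAM."""

    def __init__(self):
        self.cells = [0] * 16  # the 16 storage bits of the LUT

    def clock(self, we, addr, d):
        """Model one rising WCLK edge: write only when WE is high."""
        if we:
            self.cells[addr & 0xF] = d & 1

    def spo(self, addr):
        """Asynchronous read at the write-port address."""
        return self.cells[addr & 0xF]

    def dpo(self, dpra):
        """Asynchronous read at the independent dual-port read address."""
        return self.cells[dpra & 0xF]
```

A synchronous read would be modelled by latching the spo or dpo value on the clock edge, which is the role of the accompanying flip-flops.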
Interconnect
• Interconnect carries signals from one CLB to another CLB or to the IO.
• Interconnects are distinguished by the relative length of their segments: single-length lines, double-length lines and longlines.
• In addition, global buffers drive fast, low-skew nets most often used for clocks or global control signals.
• Newer devices use innovative “active” interconnects to drastically reduce the propagation delays of nets
  – The individual R-C delay created by each programmable interconnect point (PIP) adds to the total delay of the net
Field Programmability
• Field programmability is achieved through switches (Transistors controlled by memory elements or fuses)
• Switches control the following aspects:
  – Interconnection among wire segments
  – Configuration of logic blocks
• The distributed memory elements that control the switches and the configuration of logic blocks are together called the “Configuration Memory”
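To make the role of configuration memory concrete, here is a minimal Python sketch in which each routing switch is controlled by one configuration bit and a net is routed only if a chain of ON switches links its endpoints. The fabric, the segment names and the function is_routed are all hypothetical and do not describe any real device.

```python
# Hypothetical routing fabric: each switch joins two wire segments and is
# controlled by one bit of configuration memory (1 = ON, 0 = OFF).
SWITCHES = {
    "sw0": ("clb0_out", "h_wire0"),
    "sw1": ("h_wire0", "v_wire0"),
    "sw2": ("v_wire0", "clb1_in"),
    "sw3": ("h_wire0", "clb2_in"),
}


def is_routed(config_memory, src, dst):
    """Return True if ON switches connect segment src to segment dst."""
    adjacency = {}
    for name, (a, b) in SWITCHES.items():
        if config_memory.get(name, 0):  # unprogrammed switches stay OFF
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, stack = {src}, [src]
    while stack:  # depth-first search over connected segments
        node = stack.pop()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

Loading a different set of bits into config_memory reroutes the net without touching the fabric, which is the essence of field programmability.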
Technology of Programmable Elements
• Vary from vendor to vendor. All share a common property: configurable in one of two positions, ‘ON’ or ‘OFF’
• Can be classified into three categories:
  – SRAM based
  – Anti-Fuse based
  – EPROM/EEPROM/Flash based
• Desired properties:
  – Minimum area consumption (for high density)
  – Low on resistance; high off resistance (for high speed)
  – Low parasitic capacitance to the attached wire (for high speed)
  – Reliability in volume production
  – Re-programmability
  – Cost
Static RAM Technology
• In a static RAM FPGA, programmable connections are made using transistors, transmission gates, or multiplexers that are controlled by SRAM cells.
• The main feature of this technology is that it allows fast in-circuit reconfiguration.
• The FPGA can either actively read its configuration data out of an external serial or byte-parallel PROM, or the configuration data can be written into the FPGA.
SRAM Reconfiguration
• Advantages
  – Design updates are easy and can be made to products already in the field.
  – Reconfiguration is fast
  – Selective reconfiguration is possible
  – Simplifies hardware design and debugging. Reduces time to market.
• Disadvantages
  – The programmability causes a reduction in the speed of logic
  – Less efficient utilization of silicon has cost implications
  – IP protection issues (reverse engineering)
Anti-Fuse Technology
• An anti-fuse resides in a high-impedance state and can be programmed into a low-impedance or "fused" state.
• The link is created by melting the thin isolating dielectric between two metal layers.
Anti-Fuse Technology
• Less expensive than SRAM technology; however, the device becomes one-time programmable (OTP)
• Advantages
  – Faster than a programmable switch
  – Secure programming technology prevents reverse engineering and design theft
  – Retains configuration indefinitely
EPROM, EEPROM or Flash Based Programming Technology
• EPROM Programming Technology
  – Two gates: Floating and Select
  – Normal mode:
    • No charge on floating gate
    • Transistor behaves as a normal n-channel transistor
  – Floating gate charged by applying high voltage
    • Threshold of transistor (as seen by gate) increases
    • Transistor turned off permanently
  – Re-programmable by exposing to UV radiation
EPROM Programming Technology
• No external storage mechanism needed
• Re-programmable
• Not in-system re-programmable
• Re-programming is a time-consuming task
A few FPGA vendors
• Xilinx (SRAM)
  – Virtex-II, Virtex-II Pro, Pro-X and Virtex-4 (SoC)
  – Spartan-3 (low cost)
• Altera
  – Stratix-II (SRAM) (SoC)
  – Cyclone-II (low cost)
• Actel
  – ProASIC+ (Flash)
  – Axcelerator (Antifuse)
• Atmel, QuickLogic, DynaChip, ..
CLB Slice Structure
• Each slice contains two sets of the following:
  – Four-input LUT
    • Any 4-input logic function,
    • or 16-bit x 1 sync RAM
    • or 16-bit shift register
  – Carry & Control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  – Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Sync. or async. control
Block RAM
• Most efficient memory implementation
  – Dedicated blocks of memory
• Ideal for most memory requirements
  – 4 to 14 memory blocks
  – 4 kbits per block
  – Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
FPGA Additional Features
• Block RAM
• Special I/Os
• Dedicated multipliers
Delay-Locked Loop (DLL)
The most basic function of the DLL component is to eliminate clock skew.
Digital Clock Manager (DCM)
FPGA devices provide flexible, complete control over clock frequency, phase shift and clock skew through the use of the DCM.
FPGA Features
RISC processor blocks
Example: PowerPC™ (Virtex-II Pro)
• Thirty-two 32-bit General Purpose Registers (GPRs)
• Low power consumption: 0.9 mW/MHz
• IBM CoreConnect bus architecture support
New features
• Dedicated DSP blocks
• Phase-matched clock dividers (PMCD)
• Dynamic reconfiguration port (DRP)
FPGA Configuration
Configuration is the process by which the bitstream of a design, as generated by the Xilinx development software, is loaded into the internal configuration memory of the FPGA.
Configuration Modes
Spartan-II devices support the following four configuration modes: Slave Serial, Master Serial, Slave Parallel, and Boundary-scan. The configuration mode pins (M2, M1, M0) select the mode.
FPGAs Vs. CPLDs
• FPGAs have much smaller basic building block (e.g. 4-input LUT as compared to a 56 p-term PLA)
• FPGAs use more flexible and faster interconnects. Also, the number of interconnects tends to be much larger than in a comparable CPLD
• FPGA structure is inherently scalable to realize very large logic matrices
• FPGAs tend to incorporate several features suitable for designing digital systems (more than just logic e.g. memories, clock buffers, DLLs etc)
• Most FPGAs are SRAM/antifuse based while CPLDs are EEPROM/Flash based
• The fastest FPGAs tend to be slower than the fastest CPLDs for small, fast logic design requirements
• FPGA design cycles tend to be more complex