Designing for 100+ MHz


1999 Designs Demand...

Higher system speed

Higher integration
  - smaller size, less power, better reliability

Lower cost

Shorter development time

Better product differentiation


Traditional Multi-Chip Boards

Discrete design components
  - CPU, memory
  - bus transceivers, PCI controller, FIFOs
  - Ethernet controller, graphics accelerator, MPEG, DSP, etc.
  - programmable logic as glue and custom function

Advantages:
  - well-documented sophisticated functions
  - readily available as IP in silicon


Multi-Chip Board Problems

Physical size

Power consumption and reliability

PC board signal integrity

Limited flexibility
  - prevents design modifications and upgrades
  - prevents product diversification
  - prevents product customization

Poor product differentiation
  - standard parts = standard architecture


FPGA Advantages

Smaller size

Lower power consumption

Better signal integrity
  - fewer PC-board issues

Enhanced flexibility
  - easy modifications, upgrades, etc.

Enhanced product differentiation
  - proprietary architectures


FPGA Users Want...

System clock rate of 100+ MHz

>100,000 gates

Efficient design methodologies

Availability of well-documented Cores

Reasonable cost


The FPGA Solution

4th-generation FPGA: logic + memory + routing

Multi-standard SelectI/O

Temperature sensing

Delay-Locked Loop for fast clock and I/O

3.3 ns synchronous dual-port SRAM

500 Mbps SelectMAP configuration


Now the Challenge...

Design a 100+ MHz system

Together, we can do it...
  - we'll supply the ingredients...
  - you use them intelligently

But don't forget...
  - the clock period is less than 10 ns!


Designing for 100+ MHz

Volts, Amps, and Watts
  - PCB signal distribution
  - chip inputs and outputs
  - power and thermal considerations

Ones and zeros
  - logic emulation

Bits and bytes
  - memory hierarchy


Moore Meets Einstein

[Figure: clock frequency (MHz) and trace length (inches per 1/4 clock period) plotted against year, 1965 to 2010, on a doubling scale from 1 to 2048]

Speed doubles every 5 years...
...but the speed of light never changes


Volts, Amps, and Watts

PCB design issues
  - capacitive loading
  - transmission lines and termination

Chip inputs and outputs
  - clock distribution and DLLs
  - I/O standards

Power and thermal considerations
  - temperature sensing diode
  - power supply decoupling

Configuration
  - new SelectMAP mode


Capacitive Loading

Capacitance slows outputs and increases power
  - output delay increase:
    - ~25 ps per pF of additional loading
  - output power dissipation increase:
    - 11 µW per MHz per pF with 3.3-V swing

Sources of capacitance
  - 10 pF max for each device pin
  - 2 pF per inch for narrow traces (0.8 pF/cm)
  - 130 pF per inch² for copper areas (20 pF/cm²)

IBIS files provide output impedance details
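The rules of thumb on this slide can be turned into a quick estimate. The sketch below is illustrative only: the function names are hypothetical, and the power formula is the standard C·V²·f relation, which reproduces the slide's figure of about 11 µW per MHz per pF at a 3.3-V swing.

```python
def added_output_delay_ps(extra_load_pf, ps_per_pf=25.0):
    """Extra output delay using the slide's ~25 ps per pF rule of thumb."""
    return extra_load_pf * ps_per_pf


def output_power_uw(load_pf, freq_mhz, swing_v=3.3):
    """Dynamic output power C * V^2 * f; pF times MHz yields microwatts."""
    return load_pf * swing_v ** 2 * freq_mhz


def trace_capacitance_pf(length_in, pf_per_inch=2.0):
    """Capacitance of a narrow trace using the slide's 2 pF per inch figure."""
    return length_in * pf_per_inch


if __name__ == "__main__":
    # Hypothetical net: one 10 pF device pin at the end of a 4-inch trace.
    load = 10.0 + trace_capacitance_pf(4.0)
    print(f"load = {load:.1f} pF")
    print(f"added delay = {added_output_delay_ps(load):.0f} ps")
    print(f"power at 100 MHz = {output_power_uw(load, 100.0) / 1000.0:.1f} mW")
```

The 4-inch trace and single load pin are assumed values chosen only to exercise the functions.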


Transmission Lines

Some traces must be treated as transmission lines to minimize ringing
  - transmission line if round trip > transition time
  - lumped capacitance if round trip < transition time

Signal delay on a PCB:
  - 140 to 180 ps per inch (55 to 70 ps/cm)

Lumped-capacitance trace length:
  - 3 inches max for a 1-ns transition time (7.5 cm)
  - 6 inches max for a 2-ns transition time (15 cm)
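The round-trip rule above can be checked numerically. This is a minimal sketch with hypothetical function names; the default of 167 ps per inch is an assumed mid-range value within the slide's 140 to 180 ps range, chosen because it reproduces the 3-inch and 6-inch limits.

```python
def max_lumped_length_in(transition_ns, delay_ps_per_inch=167.0):
    """Longest trace whose round-trip delay still fits in the transition time."""
    return transition_ns * 1000.0 / (2.0 * delay_ps_per_inch)


def is_transmission_line(length_in, transition_ns, delay_ps_per_inch=167.0):
    """True when the round trip exceeds the transition time."""
    round_trip_ps = 2.0 * length_in * delay_ps_per_inch
    return round_trip_ps > transition_ns * 1000.0


if __name__ == "__main__":
    for t_ns in (1.0, 2.0):
        print(f"{t_ns} ns edge: {max_lumped_length_in(t_ns):.1f} inches max")
```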


Terminated Transmission Lines: Reflections and Ringing

[Figure: three termination schemes; labels shown: VCC, 50 Ω, 100 pF, 100 Ω + 100 Ω (50 Ω total), 22 Ω, 27 Ω]

Traditional Thevenin termination at the end

Dynamic termination at the end is better and saves power

Series termination at the source is best
  - single source and destination only!


On-Chip Clock Distribution

Clock distribution introduces delay
  - larger chips suffer more clock delay

[Figure: clock and data paths through the IOBs and CLBs]


Clock Delay Problems

[Figure: IOB flip-flop (D, Q) with data and clock inputs and the clock distribution delay; timing diagram comparing the required data-valid window without and with the delay]

Clock delay increases clock-to-output times

Clock delay leads to unacceptable input hold time
  - set-up time is negative

Additional data delay can eliminate the hold time
  - set-up time becomes positive
  - but tolerance build-up widens the data-valid window


DLLs Maximize I/O Speed

Clock-to-output time plus set-up time determines the I/O speed and data bandwidth
  - min clock period = max clock-to-out + max set-up

Traditional solution:
  - use highly buffered, balanced clock trees
    - needed to reduce internal clock skew
    - cannot totally eliminate the delay

The Virtex solution:
  - use a Delay-Locked Loop (DLL)
    - aligns the internal and external clocks
    - effectively eliminates the clock-distribution delay


Virtex Has 4 Independent DLLs

DLLs adjust clock delay to align internal and external clocks
  - digital closed-loop control
  - 25 to 200 MHz range, 35-picosecond resolution

[Figure: DLL block diagram with a comparator, an error signal, and an adjustable delay in the clock path feeding the IOBs and CLBs]


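Fast Clock-to-Out With DLL

[Figure: two Virtex FPGAs, each with a DLL, sharing one clock; data passes from an IOB register (Q) in one chip to an IOB register (D) in the other; delays shown: 3.8 ns, 1.9 ns, 0.5 ns]

160 MHz inter-chip data rate
  - 16-mA LVTTL
  - IOB register to IOB register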
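The quoted data rate follows from the rule stated earlier in the deck that the minimum clock period is the sum of the delays along the chip-to-chip path. The sketch below assumes that the three delays shown on this slide (3.8 ns, 1.9 ns, 0.5 ns) simply add up to the clock period; what each figure represents is not stated in the extracted text, and the function name is hypothetical.

```python
def max_data_rate_mhz(*delays_ns):
    """Highest clock rate whose period covers the sum of the given delays."""
    return 1000.0 / sum(delays_ns)


if __name__ == "__main__":
    # LVTTL delays from this slide; the SSTL3 delays appear later in the deck.
    print(f"LVTTL: {max_data_rate_mhz(3.8, 1.9, 0.5):.0f} MHz")
    print(f"SSTL3: {max_data_rate_mhz(2.8, 1.9, 0.3):.0f} MHz")
```

Under that assumption the LVTTL path works out to roughly 161 MHz and the SSTL3 path to 200 MHz, in line with the 160 MHz and 200 MHz rates quoted on the slides.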


LVTTL Data Rate with DLL

1.4 ns measured clock-to-output delay

Output standard = LVTTL Fast 16 mA (OBUF_F_16)

Temp = 100 C, Vdd = 2.375 V, Vcco = 3.3 V

Waveforms:
  1: CLKIN
  2: DATA OUT (no DLL)
  3: DATA OUT (DLL deskewed)

Timing:
            r->r      r->f
  w/o DLL   3.9 ns    3.9 ns
  w/ DLL    1.4 ns    1.4 ns


Other DLL Functions

Double the incoming clock frequency
  - fast internal operation, slow external clock

Clock mirroring to the PCB

Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16

Adjust clock duty cycle to 50-50

Create four quadrature clock phases
  - input four sequential bits per clock period


Duty Cycle Correction

~25% duty cycle in, 50% duty cycle out

[Figure: a 25 MHz clock with 25% duty cycle enters the 1X DLL of a Virtex FPGA and leaves as a 25 MHz clock with 50% duty cycle]


Clock Doubling and Mirroring

Clock mirror with less than 100 ps skew
  - simplifies PCB clock distribution

[Figure: actual HDTV customer example. A 37 MHz system clock (1 input load) drives DLL 1 and DLL 2 in a Virtex FPGA through zero-delay internal clock buffers; labels shown: 37 MHz internal, 74 MHz internal, 74 MHz #1 and 74 MHz #2 to the SDRAMs, all marked "exactly aligned"]


Precise Clock Mirroring

2x system clock for board use

[Figure: a 66 MHz clock enters the 2X DLL of a Virtex FPGA and leaves as a 132 MHz clock]


Clock Division

[Figure: waveforms for CLKIN at 200 MHz, CLKOUT at 200 MHz, and CLKDV at 12.5 MHz]

Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
  - maintain synchronous edges
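As a quick check on the divider settings listed on this slide, the sketch below computes the divided clock for each allowed ratio. The function and constant names are hypothetical; only the list of divisors and the 200 MHz to 12.5 MHz example come from the slide.

```python
from fractions import Fraction

# Divider settings listed on the slide.
DIVISORS = [Fraction(3, 2), Fraction(2), Fraction(5, 2), Fraction(3),
            Fraction(4), Fraction(5), Fraction(8), Fraction(16)]


def clkdv_mhz(clkin_mhz, divisor):
    """Divided clock frequency; rejects ratios that the slide does not list."""
    ratio = Fraction(divisor)
    if ratio not in DIVISORS:
        raise ValueError(f"unsupported divisor: {divisor}")
    return float(Fraction(clkin_mhz) / ratio)


if __name__ == "__main__":
    for d in DIVISORS:
        print(f"200 MHz / {float(d):g} = {clkdv_mhz(200, d):g} MHz")
```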


Multi-Standard SelectI/O

[Figure: a Virtex FPGA interfacing to many devices and standards; labels shown: GTL+, 5V tolerant, 2.5V SSTL, 1.8V, 3.3V LVTTL, 5V, microprocessor, SRAM, DSP, mixed signal, busses/backplanes (3/5V PCI, ISA, GTL…), FLASH, SDRAM]


Mix & Match Output Standards

User-supplied voltages determine output swing
  - 3.3 V, 2.5 V, 1.5 V
  - one voltage per bank
  - a bank is half of a chip edge

Output characteristics are programmable on a per-pin basis
  - push-pull or open-drain
  - LVTTL drive strength
    - 2-mA to 24-mA sink and source current
  - LVTTL slew rate


Mix & Match Input Standards

[Figure: a bank of inputs, each comparing against either the internal reference or the bank's VREF]

Internal or user-supplied threshold voltage
  - selectable on a per-pin basis
  - one user-supplied threshold voltage per bank

Programmable over-voltage protection
  - 5-V tolerant or diode clamp to VCCO
  - selectable on a per-pin basis


SSTL Clock-to-Out With DLL

SSTL = Stub Series Transceiver Logic

200 MHz inter-chip data rate
  - SSTL 3, Class II
  - IOB register to IOB register

[Figure: two Virtex FPGAs, each with a DLL, sharing one clock; data passes from an IOB register (Q) in one chip to an IOB register (D) in the other; delays shown: 2.8 ns, 1.9 ns, 0.3 ns]


SSTL Data Rate with DLL

1.3 ns measured clock-to-output delay
  - much lower noise than LVTTL

Output standard = SSTL 3 Class 2 (OBUF_SSTL3_II)

Temp = 100 C, Vdd = 2.375 V, Vcco = 3.3 V, Vtt = 1.5 V

Waveforms:
  1: CLKIN
  2: DATA OUT (no DLL)
  3: DATA OUT (DLL deskewed)

Timing:
            r->r      r->f
  w/o DLL   3.5 ns    3.8 ns
  w/ DLL    1.1 ns    1.3 ns


From FPGA to System Component: 'Redefining the FPGA'

"Virtex moves FPGAs from glue to system component" - Ron Neale, EE

[Figure: a Virtex FPGA at the center of a system; labels shown: GTL+, High Speed System Backplane, Low Voltage CPU, LVTTL, SDRAM (133 MHz), SSTL3, Cache SRAM (Mbytes), LVCMOS, Chip 1, x1 CLK, x2 CLK]


Power and Thermal Issues

Power and heat are serious concerns

All CMOS power consumption is dynamic
  - proportional to VCC²
  - proportional to capacitance
  - proportional to frequency

Virtex conserves power
  - 2.5-V supply voltage
  - small geometries and short interconnects reduce capacitance


Virtex Power Consumption

[Figure: 384 16-bit counters, 2.5 W total; devices shown: XCV300, XCV1000]

Virtex is designed to conserve power
  - 100 MHz 16-bit counters
    - 12.5 MHz average transition rate
    - 6.5 mW per counter including clock distribution
  - 100 MHz 8-bit counters
    - 25 MHz average transition rate
    - 5 mW per counter including clock distribution
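The average transition rates quoted above can be reproduced by noting that bit i of a binary counter changes once every 2^i clocks. The sketch below is an illustration with hypothetical function names; it also checks the 2.5 W total against 384 counters at 6.5 mW each.

```python
def average_transition_rate_mhz(clock_mhz, bits):
    """Mean per-bit transition rate of a binary counter.

    Bit i changes once every 2**i clock cycles, so its rate is clock / 2**i.
    """
    return sum(clock_mhz / 2 ** i for i in range(bits)) / bits


def total_power_w(counters, mw_per_counter):
    """Total power for a number of identical counters."""
    return counters * mw_per_counter / 1000.0


if __name__ == "__main__":
    print(f"16-bit: {average_transition_rate_mhz(100.0, 16):.1f} MHz")
    print(f" 8-bit: {average_transition_rate_mhz(100.0, 8):.1f} MHz")
    print(f"384 counters: {total_power_w(384, 6.5):.2f} W")
```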


Thermal Management

[Figure: the Virtex FPGA's DXP and DXN diode pins connect to a Maxim MAX1617, which provides SMBCLK, SMBDATA, and ALERT]

Temperature-sensing diode
  - matched to the Maxim MAX1617 A/D
  - programmable alarms
  - similar to the Pentium II solution


Power Supply Decoupling

CMOS power-supply current is dynamic
  - current pulse every active clock edge

Peak current can be 5x the average current
  - instantaneous current peaks can only be supplied by decoupling capacitors

Use one 0.1 µF ceramic chip capacitor for each power-supply pin
  - low L and R are more important than high C
  - double up for lower L and R if necessary
  - use direct vias to the supply planes, close to the power-supply pins


Virtex Configuration

[Figure: a Virtex FPGA configured from a configuration EPROM under EPLD control logic; signals shown: Data, Address, WE, CS, Busy]

New byte-wide SelectMAP mode
  - up to 528 Mbps at 66 MHz
    - simple handshake protocol
  - up to 400 Mbps at 50 MHz
    - no handshake required

Configuration bit-stream length
  - 0.5 Mbits to 6.1 Mbits
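The SelectMAP figures above are simply the byte-wide bus width times the clock rate, which also gives a rough lower bound on configuration time. In this sketch the function names are hypothetical, 1 Mbit is assumed to mean 10^6 bits, and handshake stalls and start-up overhead are ignored.

```python
def selectmap_rate_mbps(clock_mhz, bus_bits=8):
    """Peak configuration bandwidth of a byte-wide port."""
    return clock_mhz * bus_bits


def min_config_time_ms(bitstream_mbits, clock_mhz):
    """Idealized load time: bit-stream length divided by peak bandwidth."""
    return bitstream_mbits / selectmap_rate_mbps(clock_mhz) * 1000.0


if __name__ == "__main__":
    for mbits in (0.5, 6.1):
        for mhz in (50, 66):
            t = min_config_time_ms(mbits, mhz)
            print(f"{mbits} Mbits at {mhz} MHz: {t:.2f} ms")
```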


Volts, Amps, and Watts: Recap

PCB design issues
  - minimize capacitance for higher speed
  - terminate transmission lines to reduce ringing

Chip inputs and outputs
  - use DLLs to maximize I/O bandwidth
  - use SelectI/O to interface with different standards

Power and thermal considerations
  - use the sensing diode to manage chip temperature
  - decouple the power supply well

Configuration
  - configure faster with the SelectMAP mode


Designing for 100+ MHz

Volts, Amps, and Watts
  - PCB signal distribution
  - chip inputs and outputs
  - power and thermal considerations

Ones and zeros
  - logic emulation



Spending the 10 ns Budget

Fast logic requires fast function generators
  - signals often pass through several function generators

Routing delays must also be kept short
  - there are routing delays between every function generator

Arithmetic delays are important
  - carry chains often create critical paths


You Don't Have To Be An Expert

You don't have to be an FPGA architecture expert to implement high-performance designs
  - the benefits of a good architecture are automatic
    - all the logic goes faster
    - software provides easy access to the features

You can achieve high performance only with a good FPGA architecture
  - a good FPGA empowers its users

You'll design better if you know the architecture
  - matching your design style to the available features increases performance and/or lowers cost


Virtex CLB

[Figure: a CLB containing four function generators, each paired with carry logic]

Logic and arithmetic delay reduction demands improvements in the CLB

Virtex CLB is divided into two slices, each with:
  - 2 function generators
  - 2 flip-flops
  - 2 bits of carry logic


Fast Function Generators

Each function generator emulates 2 to 3 levels of logic
  - a 10-level logic path typically requires 3 to 5 function generators in series
  - at 100 MHz, they must be less than 2 ns each including the routing

Virtex has 0.6-ns function generators
  - leaves 1.4 ns for each route
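The budget arithmetic on this slide is easy to reproduce. The sketch below uses hypothetical function names and, like the slide, ignores flip-flop clock-to-out and set-up time, so it is a first-order estimate only.

```python
def stage_budget_ns(clock_mhz, stages):
    """Time available per function-generator stage, routing included."""
    return 1000.0 / clock_mhz / stages


def routing_budget_ns(clock_mhz, stages, lut_delay_ns=0.6):
    """Time left for each route once the function-generator delay is spent."""
    return stage_budget_ns(clock_mhz, stages) - lut_delay_ns


if __name__ == "__main__":
    for stages in (3, 4, 5):
        print(f"{stages} stages at 100 MHz: "
              f"{stage_budget_ns(100.0, stages):.2f} ns per stage, "
              f"{routing_budget_ns(100.0, stages):.2f} ns per route")
```

With five stages the per-stage budget is 2 ns and the per-route budget is 1.4 ns, the worst case the slide plans for.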


Connecting Function Generators

[Figure: four function generators combined in pairs by F5 MUXs and all together by an F6 MUX]

Some functions need several function generators
  - F5 MUXs connect pairs of function generators
    - functions with 5 to 9 inputs
  - F6 MUXs connect all 4 function generators
    - functions with 6 to 17 inputs


Fast Local Routing

[Figure: two adjacent CLBs, each with four function generators and carry logic]

Local routing provides fast interconnects
  - in a CLB, function generators connect with minimal routing delays
  - fast paths between adjacent CLBs increase flexibility


Use Pipelining for Speed

Shorter clock periods mean doing less each period
  - create a pipeline structure
  - pipeline stages operate concurrently
  - more functions are done at the same time
  - throughput increases

All function generators have output flip-flops
  - most pipeline support is "free"


16-Bit Pipeline in One LUT

In directly cascaded pipelines the flip-flops are not free

One SRLUT can implement up to 16 bits of delay
  - shift data in and select the appropriate tap

[Figure: a 16-bit shift register with Input, Output, and Delay Select]
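A small behavioral model makes the "shift in, select a tap" idea concrete. This is only a software sketch of a 16-bit shift register with a selectable tap; the class and method names are hypothetical and do not correspond to any vendor primitive or library.

```python
class ShiftRegisterLut:
    """Behavioral model of a 16-bit shift register with a selectable tap."""

    DEPTH = 16

    def __init__(self):
        self.bits = [0] * self.DEPTH

    def clock(self, data_in):
        """Shift one bit in; the oldest bit falls off the end."""
        self.bits = [data_in & 1] + self.bits[:-1]

    def read(self, tap):
        """Tap 0 is the most recently shifted bit, tap 15 the oldest."""
        return self.bits[tap]


def delay_line(samples, delay):
    """Delay a bit stream by 1 to 16 clocks using one shift register."""
    if not 1 <= delay <= ShiftRegisterLut.DEPTH:
        raise ValueError("delay must be between 1 and 16")
    srl = ShiftRegisterLut()
    out = []
    for sample in samples:
        out.append(srl.read(delay - 1))  # bit shifted in `delay` clocks ago
        srl.clock(sample)
    return out


if __name__ == "__main__":
    print(delay_line([1, 0, 1, 1, 0, 0], 3))
```

Changing the `delay` argument corresponds to changing the Delay Select inputs: the stored data is the same, only the tap moves.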


Fast Logic Needs Fast Routing

Our typical design with 3 to 5 CLBs needed an average routing delay of 1.4 ns or less
  - the Virtex routing architecture delivers this performance

Delay is independent of direction
  - dependably short delays

Vector-based interconnect

[Figure: CLB array; the circles show 1.4-ns routing regions]


Go Farther, Faster

Virtex achieves its speed through a hierarchy of highly buffered routing resources
  - wires span 1, 2, or 6 CLBs

The Virtex routing architecture is designed for large arrays
  - today's FPGAs are big… but tomorrow's will be even bigger

Virtex is designed to maintain its performance even in very large arrays


No Routing Congestion

For high-speed applications, routing must be dependably fast
  - not just capable of being fast

In the past, high device utilization has caused routing congestion
  - critical nets might be forced to meander

Virtex minimizes these problems
  - abundant resources prevent congestion

If it needs to be fast, it will be fast – automatically!


Built-in Tri-State Busses

[Figure: a row of five CLBs driving a shared horizontal bus]

Bi-directional busses are supported directly by tri-state buffers built into each CLB
  - two drivers per CLB
  - segmentable every four CLB columns


Arithmetic – A Special Case

Adders, accumulators, counters, and comparators all depend on carry chains

Carry-chain logic is usually much deeper than the rest of the design
  - 32 levels for a 16-bit ripple adder
  - too deep to use function generators at 100 MHz
  - arithmetic delays would limit performance

Dedicated carry logic provides the desired speed
  - 16-bit adders can operate at up to 200 MHz register-to-register


Wide Arithmetic

64-bit adders would require 128 levels of logic
  - expensive complex carry schemes would be needed to preserve performance

Virtex minimizes the carry propagation delay
  - 100 ps per bit pair
  - zero routing delay between CLBs

Minimal performance loss for each extra bit

16-bit adders operate at up to 200 MHz
64-bit adders operate at up to 135 MHz
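The two adder speeds on this slide are consistent with the quoted carry delay. The sketch below assumes that the only difference between adder widths is the extra carry propagation at 100 ps per bit pair, starting from the 16-bit adder's 200 MHz figure; the function names are hypothetical.

```python
def adder_period_ns(bits, base_bits=16, base_mhz=200.0, ps_per_bit_pair=100.0):
    """Estimated register-to-register period for a wider adder."""
    extra_pairs = (bits - base_bits) / 2
    return 1000.0 / base_mhz + extra_pairs * ps_per_bit_pair / 1000.0


def adder_fmax_mhz(bits):
    """Estimated maximum clock rate for an adder of the given width."""
    return 1000.0 / adder_period_ns(bits)


if __name__ == "__main__":
    for width in (16, 32, 64):
        print(f"{width}-bit adder: about {adder_fmax_mhz(width):.0f} MHz")
```

Going from 16 to 64 bits adds 24 bit pairs, or 2.4 ns, which takes the 5 ns period to 7.4 ns, about 135 MHz.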


Efficient Virtex Multipliers

Cascade vs. tree structure
  - cascade is simpler and smaller
  - tree is faster

Virtex gives the best of both worlds
  - as fast as a tree
  - smaller than a cascade

160 MHz clock rate for pipelined 16 x 16 multiplier

[Figure: two bar charts, delay and number of CLBs, for 4 x 4, 8 x 8, and 16 x 16 multipliers; series: Cascade, Tree, Virtex Tree]


Fast Address Decoders

[Figure: a chain of carry MUXs (0/1 inputs) combining function-generator outputs]

Wide address decoders could slow operation
  - wide AND gates with invertible inputs

Virtex carry-chain MUXs can act as AND gates
  - combine function generator ANDs

64-bit decoders operate at up to 155 MHz


Speed Is Never Wasted

You can never have too much performance
  - excess performance can always be traded for size and cost reduction

Replace single-cycle functions with smaller multi-cycle versions
  - a 2-cycle multiplier is half the cost of a single-cycle multiplier

Reduce costs by designing down to the performance you need


Creating a High-Speed Clock

Logic sometimes needs to operate faster than the available clock
  - multiple RAM accesses in a single cycle
  - low-speed PCB clock distribution for power or noise reduction

Virtex DLLs can double and redouble incoming clocks

[Figure: DLL1 (2X) doubles a 45 MHz clock to 90 MHz, and DLL2 (2X) doubles it again to 180 MHz]


Optimized for the Future

Deep sub-micron technology permits larger and larger array sizes
  - poses new circuit-design challenges
  - changes the rules of FPGA architecture

Across-chip routing is the most vulnerable
  - could easily limit design performance

Virtex is designed for long-term growth
  - even long, across-chip routes will remain fast

Virtex is tomorrow’s FPGA… today!


10 ns is Long Enough

Virtex CLBs can implement relatively complex functions in 10 ns
  - 0.6 ns per 4-input function generator

Virtex offers fast interconnections
  - even across-chip when fully utilized
  - fast tri-state buses

Support for very fast arithmetic operations
  - 16-bit adders at 200 MHz


Implement Designs Automatically

You don't have to be an FPGA wizard to use Virtex

Virtex is optimized for automated implementation
  - uniform structure
    - efficient mapping/synthesis
  - ample routing
    - simple placement and no congestion
  - predictable performance
    - effective synthesis

IP cores speed design even more
  - validated functionality with guaranteed performance



Volts, Amps, and Watts
  - PCB signal distribution
  - chip inputs and outputs
  - power and thermal considerations

Ones and zeros
  - logic emulation



100+ MHz Memory

Virtex memory operates up to 200 MHz

High-speed memory has two benefits
  - data storage
    - "work-in-progress"
    - input/output buffers, FIFOs
  - accelerating complex functions
    - store pre-computed values in look-up tables


Data Storage Hierarchy

Virtex supports 3 levels of memory hierarchy

On-chip SelectRAM+
  - small-to-medium memories
  - 0.6-ns read access time

On-chip Block SelectRAM+
  - larger memories
  - true dual-ported operation
  - 3.3-ns read access time

Fast SelectI/O interfaces to external RAM
  - DLL boosts memory bandwidth


SelectRAM+

SelectRAM+ uses CLB LUTs as user memory
  - 16-deep RAMs
  - 32-deep RAMs
  - 16-deep dual-ported RAMs
  - 16-deep shift registers

Cascadable for larger memories
  - 128 or more words deep
  - uses logic resources for expansion


Block SelectRAM+

Up to 32 dual-ported 4096-bit RAM blocks
  - synchronous read and write

True dual-port memory
  - each port has full read and write capability
  - different clocks for each port

Configurable aspect ratio
  - trade width for depth
    - 4096 x 1 bit to 256 x 16 bits
  - separate configurations for each port

Dedicated routing for memory expansion
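The width-for-depth trade can be tabulated directly from the 4096-bit block size. The slide gives only the end points (4096 x 1 and 256 x 16); the power-of-two steps in between are an assumption of this sketch, and the names are hypothetical.

```python
BLOCK_BITS = 4096   # bits per block, from the slide
MAX_BLOCKS = 32     # largest block count, from the slide


def aspect_ratios(min_width=1, max_width=16):
    """(depth, width) pairs for one block, assuming power-of-two widths."""
    ratios = []
    width = min_width
    while width <= max_width:
        ratios.append((BLOCK_BITS // width, width))
        width *= 2
    return ratios


def total_block_ram_kbits(blocks=MAX_BLOCKS):
    """Total block RAM capacity in Kbits (1 Kbit = 1024 bits)."""
    return blocks * BLOCK_BITS // 1024


if __name__ == "__main__":
    for depth, width in aspect_ratios():
        print(f"{depth} x {width}")
    print(f"{total_block_ram_kbits()} Kbits in {MAX_BLOCKS} blocks")
```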


High-Speed Memory Interfaces

SelectI/O and DLLs together provide fast access to many types of external memory

Xilinx currently offers two reference designs
  - fully synthesized
  - automatic placement and routing

SDRAM … up to 125 MHz

ZBTRAM (Zero Bus Turn-around) … up to 143 MHz


Input/Output Data Buffers

High-performance systems need data buffers to decouple internal operation from I/O activity
  - I/O may be sporadic (burst-mode busses)
  - I/O may be faster or slower
  - I/O may be wider or narrower

I/O buffers can take several forms
  - dual-ported RAMs
  - ping-pong buffers
  - FIFOs


Dual-ported I/O Buffers

Block SelectRAM+ is ideal for I/O buffers
  - dual-ported operation
    - independent clocks and controls
    - bridges between clock domains
    - simultaneous read and write
  - port-specific aspect-ratio control
    - built-in rate/width conversions

SelectRAM+ provides similar benefits on a smaller scale


Ping-Pong Buffers

Ping-pong buffers are pairs of blocks that alternate between input and processing

SRLUT for small buffers
  - self-addressing input
  - 0.6-ns read access

Larger buffers can use the dual-ported Block RAM
  - one address bit alternates read/write areas
  - 3.3-ns read access

[Figure: two 16-bit shift registers with a common Input, a Read Address, and a Select that chooses which register drives the Output]


Small FIFOs in SRLUTs

Small FIFOs can be implemented in SRLUTs
  - word count addresses the output data
  - increment and enable SRLUT to push
  - decrement to pop
  - enable only for both

16-byte FIFO in 4 CLBs
  - 16 x 16 in 6 CLBs
  - 200+ MHz

Expandable for deeper FIFOs

[Figure: 16-bit shift registers with Input and Output, addressed by an up/down word counter driven by Push and Pop]
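The push and pop rules listed above can be modeled in a few lines. This is a behavioral sketch only, with hypothetical class and method names; it treats each clock as one call and stores whole words rather than individual bits.

```python
class SrlFifo:
    """Behavioral model of a 16-deep FIFO built on a shift register."""

    DEPTH = 16

    def __init__(self):
        self.regs = [None] * self.DEPTH
        self.count = 0  # word count; count - 1 addresses the oldest word

    def step(self, push=False, pop=False, data=None):
        """Simulate one clock; returns the popped word, or None."""
        out = None
        if pop:
            if self.count == 0:
                raise IndexError("pop from empty FIFO")
            out = self.regs[self.count - 1]
        if push:
            if self.count == self.DEPTH and not pop:
                raise OverflowError("push to full FIFO")
            # Shift enable: the new word enters at position 0.
            self.regs = [data] + self.regs[:-1]
        # Push increments, pop decrements, both together leave it unchanged.
        self.count += int(push) - int(pop)
        return out


if __name__ == "__main__":
    fifo = SrlFifo()
    fifo.step(push=True, data="A")
    fifo.step(push=True, data="B")
    print(fifo.step(pop=True))
    print(fifo.step(push=True, pop=True, data="C"))
    print(fifo.step(pop=True))
```

Because the word count doubles as the read address, no separate read and write pointers are needed, which is why the structure is so compact.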


Large FIFOs in Block RAM

Large FIFOs can use the dual-ported block RAM
  - add read and write address counters

Asynchronous push and pop

Different port sizes give rate-for-width conversion

Block RAM FIFOs can operate at up to 170 MHz including flag logic

[Figure: Block SelectRAM+ with Input and Output data ports; Push and Pop enable the two address counters and WE; control logic generates the Full and Empty flags]


Pre-computing for Speed

Some functions are too complex for 10-ns logic implementation
  - pipelining is not always possible

An alternative is to pre-compute all the possible results and store them in memory
  - select a result according to the inputs

Function time is independent of complexity
  - 0.6 ns SelectRAM+ access time
  - 3.3 ns Block SelectRAM+ access time

The function table can be smaller than the logic


Multiplication By A Constant

Sometimes, data has to be "scaled"
  - multiplied by a constant value

A full multiplier is too expensive
  - it can multiply by a variable
  - unnecessarily general and too complex

Storing all multiples of the constant is a better alternative
  - smaller and much faster

[Figure: a multiplier array taking Input and Constant to produce Scaled Data, compared with a product table taking Input directly to Scaled Data]


16-bit Scaler

A 2^16-word product table is impractical
  - partition the input into nibbles
    - use 16-word LUTs for nibble products
    - combine the partial products in adders

Roughly half the CLBs of a full multiplier
  - for a 16-bit coefficient: 36 CLBs vs. 62 CLBs

Pipeline the adders for extra speed

[Figure: the Input is split into four nibbles, each addressing a LUT; the LUT outputs are weighted by x1, x16, x256, and x4096 and summed to give the Scaled Data]
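The nibble-partitioned scaler can be sketched in software to show why four 16-word tables are enough. The function names are hypothetical, the model handles unsigned values only, and it ignores the pipelining of the adders.

```python
def build_product_table(constant):
    """The 16 multiples of the constant that one nibble can select."""
    return [constant * nibble for nibble in range(16)]


def scale(value, table):
    """Multiply a 16-bit unsigned value by the constant via table lookups."""
    if not 0 <= value < 1 << 16:
        raise ValueError("value must fit in 16 bits")
    total = 0
    for position in range(4):
        nibble = (value >> (4 * position)) & 0xF
        # Partial products are weighted by 1, 16, 256, and 4096.
        total += table[nibble] << (4 * position)
    return total


if __name__ == "__main__":
    table = build_product_table(12345)
    for x in (0, 1, 0x1234, 0xFFFF):
        print(x, scale(x, table), scale(x, table) == 12345 * x)
```

All four lookups use the same 16 multiples; in hardware each nibble would get its own copy of the table so that the lookups happen in parallel.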


Changing the Constant

The SRLUT mode can be used to update the table
  - "push-only" stack
  - last 16 bits loaded define the table

A simple accumulator computes all products of a new constant

[Figure: an accumulator (register with Clear and Load) adds the Constant repeatedly on Change Constant and shifts the results into the 16-bit shift register that maps Input to Output]


Large Function Tables

Larger functions can be implemented in the Block SelectRAM+
  - 12-input functions
  - micro-coded state machines

Data tables can also be implemented
  - sine/cosine tables for DSP, for example
  - dual-ported access gives the sine and cosine simultaneously
  - a simple address offset gives 90º phase shift for accessing sine and cosine from a single table
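The single-table sine/cosine trick relies on cos(x) = sin(x + 90º), so the second port reads the same table a quarter of the way further along. The sketch below assumes a 256-entry table of 16-bit signed values, matching the 256 x 16 block configuration; the scaling and names are illustrative choices, not taken from the slides.

```python
import math

TABLE_SIZE = 256      # one 256 x 16 block (assumed configuration)
AMPLITUDE = 32767     # full-scale 16-bit signed value (assumed scaling)

# One full sine period stored in the table.
SINE_TABLE = [round(AMPLITUDE * math.sin(2 * math.pi * i / TABLE_SIZE))
              for i in range(TABLE_SIZE)]


def sin_cos(phase):
    """Read sine and cosine from one table, as two ports of a RAM would."""
    sine = SINE_TABLE[phase % TABLE_SIZE]
    # A quarter-period address offset gives the 90-degree phase shift.
    cosine = SINE_TABLE[(phase + TABLE_SIZE // 4) % TABLE_SIZE]
    return sine, cosine


if __name__ == "__main__":
    for phase in (0, 32, 64, 128):
        print(phase, sin_cos(phase))
```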


Block RAM/ROM Creation

CORE Generator software creates RAMs and ROMs
  - simple GUI

Initialization file is loaded into RAMs and ROMs at configuration time


Memory Summary

Virtex has two kinds of internal memory
  - distributed SelectRAM+ for small RAMs
  - Block SelectRAM+ for larger RAMs

SelectRAM+
  - 0.6 ns read access time
  - 16- and 32-word RAMs
  - 16-word dual-ported RAMs
  - 16-word shift registers
    - sequential write/random-access read
    - FIFOs, pipelining, LUT functions, etc.


Memory Summary (continued)

Dual-ported 4096-bit Block SelectRAM+
  - 3.3 ns read access time
  - true dual-ported operation
    - both ports are read/write
    - ports can be clocked asynchronously
  - configurable aspect ratio
    - 4096 x 1 bit to 256 x 16 bits
    - configure ports differently for width/rate conversion

High-speed SelectI/O access to external RAM



Volts, Amps, and Watts
  - DLLs and flexible I/O standards
  - fast inter-chip communication
  - simple rules for good signal integrity

Ones and zeros
  - fast logic and fast interconnect
  - dependable high performance

Bits and bytes
  - distributed SelectRAM+
  - dual-ported Block SelectRAM+


The Virtex Family

The complete Virtex Data Sheet is on your AppLinx CD-ROM and at www.xilinx.com/partinfo/virtex.pdf

               XCV50    XCV100   XCV150   XCV200   XCV300   XCV400   XCV600   XCV800   XCV1000
System Gates   57,906   108,904  164,674  236,666  322,970  468,252  661,111  888,439  1,124,022
Logic Cells    1,728    2,700    3,888    5,292    6,912    10,800   15,552   21,168   27,648
Block RAM      32 Kb    40 Kb    48 Kb    56 Kb    64 Kb    80 Kb    96 Kb    112 Kb   128 Kb

User I/O by package (one count per device offered in that package, smallest device first)
  CS144      94, 94
  TQ144      94, 94
  PQ/HQ240   164, 164, 164, 164, 164, 164, 164, 164
  BG256      180, 180
  BG352      260, 260, 260
  BG432      316, 316, 316, 316
  BG560      404, 404, 404, 404
  FG256      176, 176, 176, 176
  FG456      260, 284, 312
  FG600      404, 404, 404
  FG680      500, 514, 514
