inf5430 ppchu chap09 · 1. poor design practice and remedy • synchronous design is the most...
TRANSCRIPT
Outline
1. Poor design practice and remedy2. More counters3. Register as fast temporary storage4. KDA application examples
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 1
4. KDA application examples
1. Poor design practice and remedy
• Synchronous design is the most important methodology
• Poor practice in the past (to save chips)– Misuse of asynchronous reset
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 2
– Misuse of asynchronous reset– Misuse of gated clock – Misuse of derived clock
Misuse of asynchronous reset• Poor design: use reset to clear register in
normal operation. • e.g., a poorly mod-10 counter
– Clear register immediately after the counter reaches 1010
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 3
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 4
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 5
• Problem– Glitches in transition 1001 (9) => 0000 (0)– Glitches in aync_clr can reset the counter– How about timing analysis? (maximal clock
rate)
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 6
rate)
• Asynchronous reset should only be used for power-on initialization
• Remedy: load “0000” synchronously
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 7
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 8
Misuse of gated clock• Poor design: use a and gate to disable the
clock to stop the register to get new value • E.g., a counter with an enable signal
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 9
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 10
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 11
• Problem– Gated clock width can be narrow– Gated clock may pass glitches of en– Difficult to design the clock distribution
network
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 12
network
• Remedy: use a synchronous enable
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 13
Misuse of derived clock• Subsystems may run at different clock rate• Poor design: use a derived slow clock for slow
subsystem
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 14
• Problem– Multiple clock distribution network– How about timing analysis? (maximal clock
rate)
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 15
rate)
• Better use a synchronous one-clock enable pulse
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 16
• E.g., second and minutes counter– Input: 1 MHz clock – Poor design:
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 17
– Better design
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 18
• VHDL code of poor design
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 19
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 20
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 21
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 22
• Remedy: use a synchronous 1-clock pulse
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 23
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 24
A word about power
• Power is a major design criteria now• In CMOS technology
– Dynamic power is proportional to the switching frequency of transistors
– High clock rate implies high switching freq
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 25
– High clock rate implies high switching freq
• Clock manipulation– Can reduce switching frequency– But should not be done at RT level
• Development flow: 1. Design/synthesize/verify a regular
synchronous subsystems 2(a). Derived clock: use special circuit (PLL etc.)
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 26
2(a). Derived clock: use special circuit (PLL etc.) to obtain derived clocks
2(b). Gated clock: use “power optimization” software tool to convert some register into gated clock
2. More counters• Counter circulates a set of specific patterns • Counter:
– Binary– Gray counter– Ring counter– Linear Feedback Shift Register (LFSR)
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 27
– Linear Feedback Shift Register (LFSR)– BCD counter
• Binary counter:– State follows binary counting sequence – Use an incrementor for the next-state logic
d qr_reg r_next
q
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 28
d
clk
q
reset
+1r_reg r_next
reset
clk
q
• Gray counter:– State changes one-
bit at a time – Use a Gray
incrementor
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 29
Gray code counter (section 7.5.1)
RTL Hardware Design by P. Chu
Chapter 7 30
• Direct implementation
RTL Hardware Design by P. Chu
Chapter 7 31
• Observation– Require 2n rows– No simple algorithm for gray code increment– One possible method
• Gray to binary• Increment the binary number
RTL Hardware Design by P. Chu
Chapter 7 32
• Increment the binary number• Binary to gray
• binary to gray
• gray to binary
RTL Hardware Design by P. Chu
Chapter 7 33
• gray to binary
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 34
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 35
Ring counter• Circulate a single 1• E.g., 4-bit ring counter:
1000, 0100, 0010, 0001• n patterns for n-bit register• Output appears as an n-phase signal
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 36
• Output appears as an n-phase signal• Non self-correcting design
– Insert “0001” at initialization and circulate the pattern in normal operation
– Fastest counter
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 37
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 38
• Self-correcting design:shifting in a ‘1’ only when 3 MSBs are 000
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 39
LFSR (Linear Feedback Shift Reg)
• A shifter reg with a special feedback circuit to generate the serial input
• The feedback circuit performs xor operation over specific bits
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 40
operation over specific bits• Can circulate through 2n-1 states for an n-
bit register
• E.g, 4-bit LFSR
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 41
• Property of LFSR– N-bit LFSR can cycle through 2n-1 states– The feedback circuit always exists – The sequence is pseudorandom
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 42
• Application of LFSR– Pseudorandom: used in testing, data
encryption/decryption– A counter with simple next-state logic
e.g., 128-bit LFSR using 3 xor gates to circulate
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 43
e.g., 128-bit LFSR using 3 xor gates to circulate 2128-1 patterns (takes 1012 years for a 100 GHz system)
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 44
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 45
• Read remaining of Section 9.2.3 (design to including 00..00 state)
• Read Section 9.2.4 (BCD counter, design similar to the second/minute counter in Section 9.1.3
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 46
Section 9.1.3
PWM (pulse width modulation)
• Duty cycle: percentage of time that the signal is asserted
• PWM: use a signal, w, to specify the duty cycle
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 47
cycle– Duty cycle is w/16 if w is not “0000”– Duty cycle is 16/16 if w is “0000”
• Implemented by a binary counter with a special output circuit
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 48
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 49
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 50
3. Register as fast temporary storage• RAM
– RAM cell designed at transistor level– Cell use minimal area– Behave like a latch– For mass storage– Need a special interface logic
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 51
– Need a special interface logic
• Register– D FF requires much larger area– Synchronous – For small, fast storage– E.g., register file, fast FIFO, Fast CAM (content
addressable memory)
Register file
• Registers arranged as an 1-d array• Each register is identified with an address• Normally has 1 write port (with write
enable signal)
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 52
enable signal)• Can has multiple read ports
• E.g., 4-word register file w/ 1 write port and two read ports
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 53
• Register array:– 4 registers– Each register has an enable signal
• Write decoding circuit:– 0000 if wr_en is 0 – 1 bit asserted according to w_addr if wr_en is 1
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 54
– 1 bit asserted according to w_addr if wr_en is 1
• Read circuit:– A mux for each read port
• 2-d data type needed
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 55
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 56
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 57
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 58
FIFO Buffer
• “Elastic” storage between two subsystems
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 59
• Circular queue implementation• Use two pointers and a “generic storage”
– Write pointer: point to the empty slot before the head of the queue
– Read pointer: point to the tail of the queue
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 60
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 61
• FIFO controller– Read and write pointers: 2 counters– Status circuit:
• Difficult• Design 1: Augmented binary counter• Design 2: with status FFs
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 62
– LSFR as counter
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 63
• Augmented binary counter: – increase the counter by 1 bits– Use LSB bits (in this case 3 bits) as register address– Use MSB bit to distinguish full or empty
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 64
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 65
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 66
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 67
• 2 extra status FFs– Full_reg/empty_reg memorize the current staus– Initialized as 0 and 1– Modified according to wr and rd signals
(i.e. wr&rd):• 00: no change• 11: advance read pointer/write pointer; full/empty no
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 68
• 11: advance read pointer/write pointer; full/empty no change due to both read and write operations
• 10: advance write pointer; de-assert empty; assert full if needed (when write pointer=read pointer)
• 01: advance read pointer; de-assert full; asserted empty if needed (when write pointer=read pointer)
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 69
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 70
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 71
• Non-binary counter for the pointer– Exact location does not matter as long as the
write pointer and read pointer follow the same pattern
– Other counters can be used for the second scheme
– E.g, use LFSR
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 72
– E.g, use LFSR
KDA application: FPGA modem overview
• Modem datapath functionality (i.e. TXD and RXD) created in System Generator Xilinx tool.
• Interfaces and high level control hand coded in VHDL• Implemented using Xilinx Virtex4 lx60
RW_IFIRQ
73
MCUPIFCRU
MBIF ADIF
TXD
RXD
TX_AGC
I/Q TX_DATA
RX_RSSI
RX_DATA
PA
MCLK
ARST_N
TX_CTRLTX_TDM
TX_EOW
RX_CTRLRX_TDM
RX_EOW RFIF
CLK_7M37
RF
KDA application: Top level TXD and RXD design
20.16 MHz
20.16 MHz
FIFO
FIFO
53.76 MHz
53.76 MHz
DAC
ADRX
TX
74
• Multiple rate and multiple clock domain System Generator design• 4+2 .ngc netlist files from System Generator integrated in top level VHDL
design• Timing constraints
– Multi-cycle timing constraints included in .ngc files (NB! ChipScope Pro)– All clock nets, clock domain crossings and other known paths constrained
leaving the number of unconstrained paths to a minimum.– ISE Timing Analyzer reports all unconstrained paths
KDA application: Packet DMA; the PTX module
• Packed DMA in transmission direction (i.e. from CPU).
/ 75 /INF5430
• Payload data first written to 32/64 kbyte Xilinx Block RAM (BRAM), and then two 16-bit words withpacked data start address in BRAM and number of bytes in packed are written to the cntrl packed FIFO.
• BUFRAM Finite State Machine (FSM) first reads the two control words and then reads payload data from BRAM.
• CTRL packed FIFO must be rather large (i.e. 1 kbyte) to be able to store many small packets.
• PCIe-PIF BRAM write interface and BUFRAM FSM read interface in different clock domains!
• For design verification during implementation data may be routed back (i.e. to the CPU) via the receiving PRX module (see next slide) in data loopback mode.
KDA application: Packet DMA; the PRX module
• Packet DMA in receiver direction (i.e. to the CPU).
/ 76 /INF5430
• Packet DMA in receiver direction (i.e. to the CPU).
• Payload data first stored in 32/64 kbyte BRAM, and then two 16-bit words with packed data startaddress in BRAM and number of bytes in packed are written to the cntrl packed FIFO
• The CPU first reads the two control words from the cntrl packet FIFO and then reads payload data from BRAM.
• CTRL packed FIFO must be rather large (i.e. 1 kbyte) to be able to store many small packets.
• PCIe-PIF BRAM read interface and BUFRAM FSM write interface in different clock domains!
• Payload data may be received from the PTX module in loopback mode.