fpga: applications and examples
Post on 23-Feb-2016
FPGA: Applications and Examples
Wu, Jinyuan, Fermilab, June 2014
June 2014, Wu Jinyuan, Fermilab jywu168@fnal.gov FPGA: Applications and Examples 2
Introduction
A design example of a TDC is presented in detail. The functional blocks used in this TDC are expected to be reusable in other projects.
Design Example: ADC without ADC
Digitization of Analog Waveforms
There are applications that digitize slow waveforms at 20 to 50 MSPS. Fast waveforms can be stored in a DRS4 and digitized at a slower rate. The cost and power consumption of ADC chips are relatively high.
[Figure: Slow digitization (~50 MSPS): AMP & Shaper channels feed ADCs read by the FPGA. Fast digitization (~5 GSPS): waveforms are stored in a DRS4 and slowed down to ~50 MSPS before the ADCs and the FPGA.]
ADC Using FPGA
[Figure: AMP & Shaper channels connected directly to FPGA TDC inputs in place of ADCs. FPGA outputs drive a passive network (R1, R1, R2, C) that generates VREF.]
Analog signals from AMP & Shapers are directly fed to FPGA pins.
FPGA outputs and a passive RC network are used to generate a ramping reference voltage VREF.
The input voltages and VREF are compared using FPGA differential input receivers.
The times of transitions representing input voltage values are digitized by TDC blocks in FPGA.
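The conversion from a measured crossing time back to a voltage can be sketched as follows. This is only an illustration: it assumes VREF follows a single-pole RC charging curve, and the names `vref`, `crossing_time`, `time_to_voltage` and the values of `V_START`, `V_END` and `TAU_NS` are hypothetical, not taken from the slides.

```python
import math

def vref(t_ns, v_start, v_end, tau_ns):
    """Assumed reference ramp: RC step response from v_start toward v_end."""
    return v_end + (v_start - v_end) * math.exp(-t_ns / tau_ns)

def crossing_time(v_in, v_start, v_end, tau_ns):
    """Time at which the ramp crosses v_in (the quantity a TDC would digitize)."""
    return -tau_ns * math.log((v_in - v_end) / (v_start - v_end))

def time_to_voltage(t_ns, v_start, v_end, tau_ns):
    """Convert a measured crossing time back into the input voltage."""
    return vref(t_ns, v_start, v_end, tau_ns)

# Hypothetical ramp parameters, chosen only for this example.
V_START, V_END, TAU_NS = 1.0, 2.5, 100.0

inputs = [1.2, 1.5, 1.8, 2.1]                      # V1..V4
times = [crossing_time(v, V_START, V_END, TAU_NS) for v in inputs]   # T1..T4
recovered = [time_to_voltage(t, V_START, V_END, TAU_NS) for t in times]
```

In a real design the ramp shape would be calibrated rather than assumed, since the RC values and FPGA output levels vary from board to board.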
[Figure: input voltages V1 to V4 and the corresponding transition times T1 to T4.]
Using TDC as ADC: Single Ended Version
[Figure: input waveform with overlapped trigger and reference voltage (V versus t in ns); raw TDC data for the leading and trailing ramps; converted waveform. Circuit labels: two FPGA TDC inputs, 50, 50, 100, 1000 pF, VREF.]
Using TDC as ADC: Differential Version
[Figure: Vc+, Vc-, Vin+/2 and Vin-/2 (V versus t in ns). Circuit labels: FPGA TDC inputs, R, R, C, R1, 4xR2, VREF+, VREF-, VIN1+, VIN1-, VIN2+, VIN2-.]
TDC Core: High Hit Rate High Precision Version
Input transitions are sampled in the delay line & sampling register array. The position of the transition in the array reflects the transition time.
The position is encoded into a fine time. The raw time data are sent through the bin-by-bin calibration block to compensate for the differences in bin widths.
The encoded times of transitions are temporarily stored in the pipeline buffer.
When the device is triggered, data in the pipeline buffers are read out via the Data Load/Transfer Registers.
The coarse times are attached to the fine times and stored in the zero suppression buffer.
Data are sent out of the device using an LVDS serial link.
[Block diagram: inputs IN0 to IN3 -> Delay Line & Sampling Register Array (clocked by CK250) -> Encoder & Calibration (one per channel) -> Pipeline Buffer (one per channel) -> Data Load/Transfer Register -> Event Buffer w/ Zero Suppression -> Serialize Data Output. The Trigger Logic & Timing block receives Trigger and Reset and supplies the channel and coarse time.]
Delay Line & Sampling Register Array
TDC Using FPGA Logic Chain Delay
This scheme uses current FPGA technology.
A low-cost chip family can be used (e.g., EP2C8T144C6, $31.68).
Fine TDC precision can be implemented in slow devices (e.g., 20 ps in a 400 MHz chip).
[Figure: logic chain delay line with inputs IN and CLK.]
Good, However
Auto calibration solved some problems. However, it won’t eliminate the ultra-wide bins.
[Figure: bin width (ps) versus bin number.]
Wave Union Launcher A
[Figure: Wave Union Launcher A with inputs In and CLK; control 1: Unleash, 0: Hold.]
A regular TDC records only one transition.
A wave union TDC records multiple transitions.
Wave Union Launcher A: 2 Measurements/hit
Sub-dividing Ultra-wide Bins
Device: EP2C8T144C6
Plain TDC: max. bin width 160 ps, average bin width 60 ps.
Wave Union TDC A: max. bin width 65 ps, average bin width 30 ps.
[Figure: bin width (ps) versus bin number for the Plain TDC and Wave Union TDC A.]
Encoder & Calibration
Encoder of Thermometer Code
When the sampling register array outputs a thermometer code, a transition is captured by the TDC.
The transition is detected by a set of AND gates.
For an ideal thermometer code, only one output of the AND gates =1.
Several OR gates are used to generate the fine time value (QTF[5..0]) and the fine time valid signal (QTFV).
Note that in a practical design, two stages of pipeline registers are inserted to sustain a high operating frequency.
Every clock cycle, there is a QTF value output. But it is a valid hit only if QTFV = 1.
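A behavioral sketch of this encoder is given below. It is a software model, not the gate-level design: the function name `encode_thermometer` is hypothetical, the bit beyond the end of the array is assumed to be 0, and the pipeline registers are omitted.

```python
def encode_thermometer(d):
    """Encode sampling-register outputs D[0..N-1] (ideally 1...10...0).

    Returns (QTF, QTFV): the fine time value and the fine time valid flag.
    """
    n = len(d)
    # Transition detection: QY[n] = D[n] & !D[n+1].
    # Assumption: the position just past the array is treated as 0.
    qy = [d[i] & (1 - (d[i + 1] if i + 1 < n else 0)) for i in range(n)]

    # QTFV is the OR of all QY bits.
    qtfv = int(any(qy))

    # Each QTF bit is the OR of the QY lines whose index has that bit set;
    # OR-ing the indices of all set QY lines models the same OR tree.
    qtf = 0
    for i, bit in enumerate(qy):
        if bit:
            qtf |= i
    return qtf, qtfv

ideal = [1] * 10 + [0] * 22        # transition after position 9
no_hit = [0] * 32                  # no transition captured
```

With an ideal thermometer code only one QY bit is set, so the OR of indices equals that index; with multiple set bits the OR produces a corrupted code, which is why bubbles need separate treatment.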
[Figure: sampling register outputs D[0] to D[31] feed AND gates implementing QY[n] = D[n] & !D[n+1]; OR gates over the QY lines generate QTFV and QTF[5] to QTF[0].]
Bubbles in Thermometer Code
In a practical implementation, the output of the sampling register array may have a “bubble” in the thermometer code.
Regular transition-detecting logic would set multiple bits.
A modified logic eliminates the bubble and outputs only the first edge.
[Figure: regular logic QY[n] = D[n] & !D[n+1] compared with modified logic QY[n] = D[n] & !D[n+1] & !D[n+2] & …]
Taking Bubbles into Account
The bubbles are correlated to the input time.
The number of set bits is a better code for input time.
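The two ways of handling bubbles can be compared with a small model. This is a sketch under assumptions: the `lookahead` parameter (how many following bits the modified logic checks) is hypothetical because the slide elides the number of terms, and bits past the end of the array are assumed to be 0.

```python
def detect(d, lookahead=1):
    """Positions n where D[n] = 1 and the next `lookahead` bits are all 0.

    lookahead=1 models the regular logic QY[n] = D[n] & !D[n+1];
    larger values model the modified logic with extra !D[n+k] terms.
    """
    n = len(d)

    def bit(i):
        return d[i] if i < n else 0    # assumed 0 past the array end

    return [i for i in range(n)
            if d[i] == 1 and all(bit(i + k) == 0 for k in range(1, lookahead + 1))]

def count_set_bits(d):
    """Alternative fine time code: the number of set bits in the pattern."""
    return sum(d)

bubbly = [1, 1, 1, 0, 1, 0, 0, 0]      # thermometer code with one bubble

regular = detect(bubbly, lookahead=1)   # two positions are flagged
modified = detect(bubbly, lookahead=2)  # only one position survives
ones = count_set_bits(bubbly)
```

The set-bit count needs an adder tree instead of a priority-style detector, which is a different resource trade-off in an FPGA.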
[Figure: input times t0, t0 + dt1, t0 + dt1 + dt2, t0 + dt1 + dt2 + dt3, t0 + dt1 + dt2 + dt3 + dt4 correspond to n, n+1, n+2, n+3, n+4 set bits.]
Encoder of Wave Union TDC
In the wave union TDC, the output from the sampling register array contains multiple transitions.
The location of each transition is encoded, and these location codes are combined into one fine time code.
One combination method we used is to sum the first and the third location codes.
Auto Calibration Using Histogram Method
It provides a bin-by-bin calibration at a certain temperature.
It is a turn-key solution (bin in, ps out).
It is semi-continuous (the LUT is updated automatically every 16K events).
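The histogram method can be sketched as follows, assuming calibration hits are spread uniformly over one clock period so that each bin's count is proportional to its width. The function name `build_lut`, the bin-center convention and the example counts are assumptions made for illustration.

```python
def build_lut(hist, period_ps):
    """Build a bin -> time (ps) lookup table from a histogram of hit counts.

    Assumes hits are uniform in time, so width[i] = period * hist[i] / total.
    Each LUT entry is the time at the center of its bin (an assumed convention).
    """
    total = sum(hist)
    lut = []
    edge = 0.0                               # running sum of bin widths
    for count in hist:
        width = period_ps * count / total
        lut.append(edge + width / 2)
        edge += width
    return lut

# Hypothetical 4-bin histogram over a 4000 ps clock period.
hist = [1000, 3000, 2000, 2000]
lut = build_lut(hist, 4000.0)
```

A wide bin collects more hits and therefore gets a larger time span in the table, which is how the calibration compensates for unequal bin widths.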
[Figure: DNL histogram of bin width (ps) versus bin and the resulting lookup table of time (ps) versus bin. Data flow: In (bin) -> LUT -> Out (ps); the histogram is summed (Σ) into the LUT every 16K events.]
Histogram Booking
[Figure: histogram memory addressed by TF (RA, WA) with a +1 incrementer.]
Block RAM Based Histogram
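[Figure: block RAM (ports D, Q, WA, WE, RA) with D-Q pipeline registers on TF and TFV and a +1 incrementer between the RAM output and input.]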
The pipelined structure allows a higher operating frequency, yielding higher throughput.
Restriction: the same bin must not be hit again within 4 cycles.
[Timing diagram: addresses TF0 to TF4 move through the pipeline stages; counts N0 to N4 are read out and N0+1 to N4+1 are written back.]
Calibration Pulse Generation: Random != Uniform
16384 Events
When the number of events is finite, random hits have large fluctuations. Pulses with timing evenly spread relative to the TDC clock are desirable.
Cascaded PLL Circuits
PLL settings (Cyclone III, operation mode: No Compensation, phase 0.00 deg, duty cycle 50.00% for all c0 outputs):
altpll0 (inst21): inclk0 = CLOCK_50, 50.000 MHz; c0 ratio 64/21; output CK_B
altpll3 (inst25): inclk0 = CK_B, 152.381 MHz; c0 ratio 105/64; output CK250a
altpll4 (inst30): inclk0 = CK_B, 152.381 MHz; c0 ratio 64/39; output CK251c
Two stages of PLL circuits are cascaded together.
f(CK250a) = 250 MHz
f(CK251c) = 250.06 MHz
f(CK251c) = (4096/4095)*f(CK250a)
T(CK250a) - T(CK251c) = 0.98 ps (4000 ps / 4096).
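The frequency ratios above can be checked with exact rational arithmetic. The variable names below are chosen for this example; the ratios 64/21, 105/64 and 64/39 and the 50 MHz input are the values shown on the slide.

```python
from fractions import Fraction

clock_50 = Fraction(50)                      # input clock in MHz

ck_b = clock_50 * Fraction(64, 21)           # first PLL stage
ck250a = ck_b * Fraction(105, 64)            # second stage, trigger clock
ck251c = ck_b * Fraction(64, 39)             # second stage, calibration clock

ratio = ck251c / ck250a                      # frequency ratio of the two clocks

# Periods in ps: 1 / (f in MHz) = 1e6 / f ps.
period_a_ps = Fraction(10**6) / ck250a
period_c_ps = Fraction(10**6) / ck251c
step_ps = period_a_ps - period_c_ps          # time slip per clock cycle

print(float(ck_b), float(ck250a), float(ck251c), float(step_ps))
```

Because the calibration edge slips by one step per cycle, 4096 consecutive edges sweep exactly one 4000 ps period of CK250a.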
Test Result in an Oscilloscope Screen Capture
A total of 16384 calibration edges are collected. The entire 4000 ps range is scanned 4 times (4*4096 = 16384). The histogram (with 50 ps/bin) serves as a demonstration of the calibration lookup table.
[Figure: oscilloscope capture showing trigger edges generated by CK250a, calibration edges generated by CK251c, and the calibration lookup table.]
Pipeline Buffer
Pipeline & Its Implementation
A pipeline stores one data word per clock cycle, regardless of whether the data are valid.
The pipeline looks like register arrays chained together.
But it is more economical to implement it using block RAM inside the FPGA.
[Figure: a chain of D-Q registers compared with a block RAM (ports D, Q, WA, WE, RA) addressed by a counter CNT with RESET and WE=1.]
Pipeline Buffer with Ping-Pong Pages
A single RAM block (e.g., 256 words) is divided into two logical pages of 128 words each.
Selection between the two pages is done by the highest address bit.
TDC data are stored in the writing page. After 128 clock cycles, new data overwrite the old data. The page keeps a history 128 clock cycles long.
Once a trigger arrives, the writing and reading pages swap. The data from before the trigger are read out while new data are stored into the current writing page.
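The ping-pong behavior can be modeled in a few lines. This is a behavioral sketch, not the RAM primitive used in the design; the class name `PingPongPipeline` and its methods are hypothetical.

```python
class PingPongPipeline:
    """256-word RAM split into two 128-word pages by the top address bit."""

    def __init__(self):
        self.ram = [None] * 256
        self.pgw = 0        # writing page select (highest address bit)
        self.tc = 0         # coarse time counter

    def clock(self, word):
        """Write one word every clock cycle (WE = 1), valid or not."""
        wa = (self.pgw << 7) | (self.tc & 0x7F)   # page bit + TC[6..0]
        self.ram[wa] = word
        self.tc += 1

    def trigger(self):
        """Swap pages: the old writing page becomes the reading page."""
        self.pgw ^= 1

    def read(self, ra):
        """Read from the reading page, PGR = !PGW."""
        pgr = self.pgw ^ 1
        return self.ram[(pgr << 7) | (ra & 0x7F)]

buf = PingPongPipeline()
for i in range(300):        # words 0..299; the page keeps only the last 128
    buf.clock(i)
buf.trigger()
newest = buf.read(299)      # address 299 mod 128
oldest = buf.read(300)      # next address wraps to the oldest surviving word
buf.clock(999)              # new data go to the other page
still_there = buf.read(299)
```

Using the top address bit as the page select means the swap costs a single toggling flip-flop rather than any data movement.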
[Figure: RAM with ports D, Q, WA, WE, RA. Writing address WA: WA[6..0] = TC[6..0], with the top bit PGW selecting the writing page. Reading address RA: RA[6..0], with the top bit PGR = !PGW selecting the reading page.]
Data Load and Transfer Register
The Load/Transfer (Shift) Registers
When LD=1, data from D3[] to D0[] are loaded into the register array.
When LD=0, data are shifted out of the Q0[] port, one word per clock cycle.
Data from multiple channels are merged into a single stream.
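A cycle-by-cycle model of the load/transfer registers is sketched below. The function name `load_transfer` is hypothetical, and for simplicity the same four input words are presented on every cycle.

```python
def load_transfer(d_words, ld_sequence):
    """Model four chained registers with a parallel load.

    d_words: words on inputs D0[]..D3[] (held constant in this sketch).
    ld_sequence: value of LD on each clock cycle.
    Returns the word seen on Q0[] after each clock edge.
    """
    regs = [None] * len(d_words)
    q0_stream = []
    for ld in ld_sequence:
        if ld:
            regs = list(d_words)            # LD=1: load all channels in parallel
        else:
            regs = regs[1:] + [None]        # LD=0: shift one position toward Q0
        q0_stream.append(regs[0])
    return q0_stream

stream = load_transfer(["D0", "D1", "D2", "D3"], [1, 0, 0, 0])
```

Repeating the LD pattern 1, 0, 0, 0 every four cycles interleaves the four channels into one continuous stream.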
[Figure: four registers with inputs D3[] to D0[] chained through their Q[] ports. With LD = 1, 0, 0, 0 the output sequence is Q0[] = D0, D1, D2, D3.]
Trigger Logic & Timing
Trigger and Timing
Upon Reset, the coarse time counter counts from 0.
When a trigger arrives, an internal readout sequence is started, which controls the data moving out of the pipeline to the event buffer.
[Figure: Trigger Logic & Timing block. Trigger and Reset pass through optional input signal conditioning to give TRIGP and RSTP. Readout Sequencer signals: TRIGP, MC (RA), LD, ECNT. Coarse Time Counter signals: RSTP, TC (WA).]
Signal Conditioning: De-Glitch
For most signals received from outside the FPGA device, receiving them with a D flip-flop is usually sufficient.
If the external signal travels over a very long cable and the termination is poor, the de-glitch circuit shown in the figure can be used.
More delay stages can be added for even better performance.
[Figure: de-glitch circuit with three D flip-flops (inst58, inst60, inst70) clocked by CK and an OR2 gate; signals INPUT, Q1, Q2, QOUT.]
Signal Conditioning: De-Bounce
If the input is a mechanical switch, de-bouncing is usually necessary. The active input sets the counter so that its MSB (Qcnt[7]) becomes 1. When the input returns to 0, the MSB is still high and the counter increases by 1 every clock cycle. After 128 clock cycles, the MSB becomes 0. This way, the output signal OUTP is a smooth pulse without glitches as long as the off-time interval of the switch is < 128 clock cycles.
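The de-bounce counter can be modeled cycle by cycle as shown below. This is a behavioral sketch with a hypothetical function name `debounce`; the input flip-flop and signal polarity of the actual schematic are not modeled.

```python
def debounce(sw_samples):
    """Model an 8-bit counter that is set to 128 by the active input.

    While the MSB is 1 and the input is inactive, the counter counts up
    and wraps to 0 after 128 cycles. The output is the MSB, Qcnt[7].
    """
    cnt = 0
    outp = []
    for sw in sw_samples:
        if sw:
            cnt = 128                     # synchronous set: MSB becomes 1
        elif cnt & 0x80:
            cnt = (cnt + 1) & 0xFF        # count only while the MSB is high
        outp.append((cnt >> 7) & 1)
    return outp

# A bouncing switch: active, open for 5 cycles, active again, then released.
samples = [1] + [0] * 5 + [1] + [0] * 200
outp = debounce(samples)
```

In this example the output stays high across the 5-cycle bounce and drops 128 cycles after the last active sample.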
[Figure: 8-bit up counter lpm_counter3 (inst1) with synchronous set to 128 (sset), clock CK and count enable cnt_en; a D flip-flop (inst59) on the switch input; signals QBSW, Qcnt[7..0], Qcnt[7], OUTP.]
A Counter with Synchronous Load as a Sequencer
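[Figure: loadable counter built from an adder (A+B, where B is 0 or +1 as selected by CNTEN; 0: disable, +1: increment), a register D[] -> Q[], and a synchronous load input SLOAD.]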
A loadable counter can be used as a simple sequencer.
Q[7] is used as counter enable CNTEN.
Once 128+M is loaded, the counter counts 128-M clock cycles.
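The cycle count can be verified with a short model of an 8-bit loadable counter whose own bit Q[7] serves as CNTEN. The function name `sequencer` is hypothetical, and the load is assumed to happen on the first cycle.

```python
def sequencer(m, cycles):
    """Return the CNTEN (Q[7]) value on each cycle after loading 128 + m."""
    q = 0
    enables = []
    for cycle in range(cycles):
        sload = (cycle == 0)              # assumed: one load pulse at the start
        if sload:
            q = 128 + m                   # Q[7] becomes 1, so counting starts
        elif q & 0x80:
            q = (q + 1) & 0xFF            # stops itself when Q wraps to 0
        enables.append((q >> 7) & 1)
    return enables

active = sum(sequencer(100, 200))         # number of cycles with CNTEN = 1
```

No separate comparator or state machine is needed, since the counter disables itself when the MSB clears.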
Signal Conditioning: Multi- to Single-Cycle Pulse
This circuit samples 0-to-1 transitions.
The input pulse width can be 1 or more clock cycles.
For each input transition, a single-cycle pulse is generated.
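The idea can be sketched with two flip-flops and an AND gate with one inverted input. This is a simplified model with a hypothetical function name; the schematic on the slide uses three flip-flops, so its output latency may differ from this sketch.

```python
def single_cycle_pulses(cin):
    """Generate a one-cycle pulse for each 0-to-1 transition of cin.

    q1 is the sampled input and q2 is q1 delayed by one clock cycle;
    the output is q1 AND (NOT q2).
    """
    q1 = q2 = 0
    pout = []
    for x in cin:
        q2, q1 = q1, x                    # both flip-flops update on the same edge
        pout.append(q1 & (1 - q2))
    return pout

pulses = single_cycle_pulses([0, 1, 1, 1, 0, 0, 1, 0])
```

The three-cycle-wide input and the one-cycle-wide input each produce exactly one output pulse.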
[Figure: three D flip-flops (inst58, inst60, inst70) clocked by CK, an inverter (NOT, inst76) and an AND2 gate (inst69); signals CIN, Q1, Q2, POUT.]
Coarse Time & Pipeline Writing
After reset, the coarse time counter TC counts from 0.
TC can have as many bits as required; e.g., 32 bits are sufficient to represent 4G clock cycles, or about 17 s if the clock frequency is 250 MHz.
The lower 7 bits TC[6..0] are used as WA[6..0].
The writing page is filled repeatedly every 128 cycles.
[Figure: the Coarse Time Counter (reset by RSTP) drives the writing address: WA[6..0] = TC[6..0], with PGW selecting the writing page of the RAM (ports D, Q, WA, WE, RA; WE=1).]
Event Window
When a TRIGP pulse arrives, the writing page becomes the reading page.
A timing window starting L1 cycles before the current TC and L2 cycles wide is to be read out.
The trigger timing block sets RA to m and then increases RA from m to (m+L2) mod 128.
In the trigger logic, two loadable counters are implemented:
ECNT, 10 bits, used to enable both counters to read out L2 memory locations for 4 channels.
MC, 34 bits, used to indicate coarse time and generate RA.
A load pulse is also generated for the Data Load/Transfer Registers.
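The address arithmetic of the event window can be sketched as below, using M = n - L1 and m = M mod 128 from the figure labels. The function name `window_addresses` and the example values are hypothetical.

```python
def window_addresses(n, l1, l2):
    """Return (coarse time, RA) pairs for a window of l2 cycles.

    n:  coarse time TC at the trigger
    l1: how many cycles before the trigger the window starts
    l2: window width in cycles (1..128)
    """
    big_m = n - l1                        # coarse time of the first location
    m = big_m % 128                       # starting read address
    return [(big_m + k, (m + k) % 128) for k in range(l2)]

window = window_addresses(n=300, l1=50, l2=10)
read_addresses = [ra for _, ra in window]
```

The read address wraps around within the 128-word page, while the coarse time attached to each word keeps increasing.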
[Figure: reading page (formerly the writing page) of the RAM, selected by PGR = !PGW. At the trigger, TC = n and WA = (TC mod 128); the window starts at M = n - L1, m = (M mod 128), and spans L2 locations. Readout Sequencer signals: TRIGP, MC (RA), LD, ECNT.]
Readout Timing
After the trigger, PGW and PGR swap. The writing page becomes the reading page. ECNT generates a count enable signal CNTEN a total of 4*L2 cycles long (L2 = [1, 128]). RA starts from location m and increases every 4 clock cycles, allowing 4 channels to output. LD becomes high every 4 clock cycles to load data into the transfer register.
Event Buffer & Zero Suppression
Zero-Suppression Scheme
A TDC data word is written into a FIFO buffer only when the hit is valid, i.e., TFV = 1.
When the data is zero suppressed, coarse time information is not correlated with the memory address. Therefore, the coarse time must be added to the data.
For multi-channel data concentration, channel id must be added as well.
A header may also be written into the FIFO to create a block of data.
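A minimal model of this zero-suppression scheme is sketched below. The function name `zero_suppress`, the tuple layout of the data words and the header format are assumptions made for illustration.

```python
from collections import deque

def zero_suppress(samples, header=None):
    """Push only valid hits into a FIFO, tagging each with time and channel.

    samples: iterable of (tc, ch, tf, tfv) tuples, one per clock cycle/channel.
    header:  optional value written first to mark the start of a data block.
    """
    fifo = deque()
    if header is not None:
        fifo.append(("HEADER", header))
    for tc, ch, tf, tfv in samples:
        if tfv:                           # PUSH is asserted only when TFV = 1
            fifo.append((tc, ch, tf))     # coarse time and channel travel with the hit
    return fifo

samples = [(10, 0, 5, 1), (10, 1, 0, 0), (11, 2, 33, 1)]
fifo = zero_suppress(samples, header=7)
```

Because invalid cycles are dropped, the position of a word in the FIFO no longer encodes its time, which is why the coarse time has to be stored explicitly.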
[Figure: the TDC outputs TF and TFV; TFV drives the FIFO PUSH input; TC and TF form the data word; a Header is written by the Write Header signal; the FIFO output is OUT.]
Data Concentration Path
When a TRIGP pulse arrives, a header is written into the FIFO and the MC signal starts counting.
RA and LD are derived from MC, and the data in the pipeline buffers are concentrated into one stream containing the fine time TF and the data valid signal TFV.
The delayed version of MC represents the channel ID CH and the coarse time TCR of the data in the output stream.
If the hit is valid, a data word including TCR, TF and CH is pushed into the FIFO.
[Figure: four pipeline buffers (RAM ports D, Q, WA, WE, RA) feed the load/transfer registers (D, LD). TRIGP starts the counter CNT to produce MC, from which RA and LD are derived. The stream TFV, TF is joined by TCR, CH from the delayed MC; TFV pushes TCR, TF, CH into the FIFO, and TRIGP writes the header.]
Data Concentration Timing
This diagram explains the details of the data concentration timing. Note that the RA, LD, CH and TCR signals are derived from MC.
FIFO Almost Full Logic
If the trigger system is well designed, the FIFO is unlikely to become full, but guarding logic must be designed to ensure the reliability of the data.
One possibility is to include the Almost Full bit of the FIFO in the data to flag a possible data loss.
[Figure: FIFO with data inputs TCR, TF, CH, PUSH driven by TFV, header written on TRIGP, and the Almost Full flag added to the data; output OUT.]
Serial Data Output
Use a shift register for parallel-to-serial conversion. If the data link is DC coupled, plain serial data should be adequate. If the data link is AC coupled, consider an AC-balanced coding scheme such as 8B/10B.
[Figure: shift register with a Load input.]
Plain Serial Data
[Figure: two plain serial data streams. ENGLISH: “Hello world.” (H e l l o (Space) w o r …). FRENCH: “Bonjour tout le monde.” (B o n j o u r (Space) t …).]
• Two streams of the plain serial data.
• The start is indicated by the first 1->0 transition after a long run of continuous 1’s.
Serial Data with 8B/10B Encoding
• A byte of 8-bit payload is transmitted with 10 bits.
• A special code K28.1 that has a 0111110 or 1000001 pattern is used to indicate the boundary of the data frame. (See example.)
Example data frame, labels in the order shown:
K28.1
K28.1
31 bytes: ID header
32 bytes (8 32-bit words): register data
64 bytes (16 32-bit words): scalar counts
8064 bytes (2016 32-bit words): TDC hit data
Padding 0’s
Outputting Plain Code & 8B/10B Code
In our design, assume the width of the data words in the FIFO is 32 bits. The Pop and the Load signals pulse every 32 clock cycles for outputting plain code. To output 8B/10B code, an 8B/10B encoder is inserted, and the pulse period of the Pop and Load signals is 40 cycles.
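The two pulse periods follow from the number of bits shifted out per FIFO word, as the sketch below shows. The helper names are hypothetical, MSB-first shifting is an assumption, and the 8B/10B encoding itself is not modeled, only its 10-bits-per-byte expansion.

```python
def serialize(words, bits_per_word=32):
    """Shift register model: load one word, then shift it out bit by bit.

    Assumption: bits leave MSB first; the slides do not state the bit order.
    """
    for word in words:                    # each iteration models one Pop/Load pulse
        for i in reversed(range(bits_per_word)):
            yield (word >> i) & 1

def cycles_per_word(word_bits=32, encoded_8b10b=False):
    """Clock cycles between Pop/Load pulses for one FIFO word."""
    if encoded_8b10b:
        return (word_bits // 8) * 10      # every 8-bit byte becomes 10 bits
    return word_bits

bits = list(serialize([0x80000001]))
plain_period = cycles_per_word()
coded_period = cycles_per_word(encoded_8b10b=True)
```

The 25% longer period for the encoded stream is the bandwidth cost of the DC balance and frame markers that 8B/10B provides.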
[Figure: plain code path: FIFO (Pop) -> Shift Register (Load), period 32 clock cycles. 8B/10B path: FIFO (Pop) -> 8B/10B Encoder -> Shift Register (Load), period 40 clock cycles.]
The End of Lecture 1
Thanks
Lecture 2: Improvements & Comments
In this part of the lecture, improvements to the TDC described earlier are discussed.
In the portion titled “Is this a good design?”, several less preferable examples are discussed to shorten the learning curve for young students.
Improvement: Reduce Counter Clock Frequency
Running Counter at Lower Frequency
A counter requires long propagation of signals in a carry chain.
If a counter with a large number of bits runs at a very high frequency, there may not be enough time within a clock period to finish the propagation.
It is possible to run the higher bits of a counter with a lower frequency clock while implementing the lower bits at the higher frequency.
The total power consumption is reduced when a counter is implemented this way.
[Figure: a Coarse Time Counter clocked entirely by CK400 (Q[0] to Q[N-1]) compared with one whose bits Q[1] to Q[N-1] are clocked by CK200 and whose bit Q[0] is clocked by CK400.]
Detail Structure and Timing Diagram
Q[1] and the upper bits are in the 200 MHz clock domain, and only Q[0] is in the 400 MHz clock domain.
It is also possible to run Q[2] and the upper bits in a 100 MHz clock domain.
[Figure: Coarse Time Counter clocked by CK200 (Q[1] to Q[N-1]), D flip-flops clocked by CK400, and the signals Q1Q, Q[0] and Q[1] .XOR. Q1Q, with a timing diagram.]
Improvement: Slowing Down Data Path
Low-Power Design Practice: Clock Speed
The Sampling Register Arrays are clocked at 250 MHz. All other stages are clocked at 62.5 MHz.
When a valid hit is sampled, the Sampling Register Array is disabled so that the registered pattern is stable for 64 ns.
The Data Load/Transfer Registers are enabled to load the input every 64 ns, so that a valid hit is guaranteed to be loaded once and only once.
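[Figure: IN0 -> Delay Line & Sampling Register Array (CK250, 250 MHz domain, with Clock Disable) -> Encoder -> Data Load/Transfer Register (CK62, Load from the Sequencer) -> Buffer w/ Zero Suppression (62.5 MHz domain).]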
Improvement: Single Cable Support Digitizer
One Cable per Digitizer
Today, it is possible to build a digitizer attached to the detector module.
It is preferable and possible to minimize the supporting cables to the digitizer.
Perhaps a CAT-5 cable with 8 conductors is a suitable choice for the supporting cable.
The interconnections between the Read Out Controller and the digitizer need to be carefully planned.
[Figure: one Read Out Controller connected to four Digitizers, each attached to a Detector.]
Interconnections of a Real Application
Only two pairs of fast signals are needed between the TDC and the readout controller. (Extra wires in the cable can be used for JTAG or other FPGA reconfiguration signals.)
The Readout Controller sends 10 MHz clock pulses to drive the TDC.
Register setting and initialization commands are sent by pulse width modulation via the C5 Encoder and C5 Decoder.
Data from the TDC FPGA form an 8B/10B stream, which is decoded in the Readout Controller.
The encoder and decoders are based on an oversampling scheme. No dedicated transceivers are needed.
[Figure: Readout Controller (C5 Encoder (PWM), 10B8B decoder) linked to the TDC FPGA (PLL, C5 Decoder, Timing, Cmd, Reg1a, 8B10B encoder) by a Clock & Command In line and a Data Out line.]
Sending Command In the Clock Line
A wide-narrow sequence is decoded as an initialization command without resetting scalars.
After a known latency, an initialization pulse is generated inside the FPGA that resets the coarse time counter, and a normal operation sequence is started.
The narrow-wide sequence can be reserved as a resetting command that will reset scalars.
[Figure: waveforms for Initialization (without resetting scalars) and Reset (with resetting scalars), with the Command Valid and Init1x signals.]
The Clock-Command Combined Carrier Coding (C5)
A data train contains 5 pulses, and each pulse is transmitted in four unit time intervals, usually four internal clock cycles at frequency f.
Information is carried with wide, normal and narrow pulses, and the first pulse is always wide or narrow.
When not transmitting data, all pulses have normal width.
The data stream is DC balanced over 5 pulses, making it suitable for AC-coupled transmission.
All leading edges are evenly spaced so that the pulse train can be used to directly drive the receiver-side logic or PLL.
Data with Clock or Clock with Data?
The 8B/10B stream: data with clock.
The C5 pulse train: clock with data.
[Figure: a transceiver RX receives the 8B/10B stream and outputs data and a recovered clock.]
Improvement: Cable Delay Monitoring
Cable Delay Variation
The cable length may vary as the temperature changes.
In some applications, it is necessary to monitor the cable delay for high-precision timing measurement.
[Figure: one Read Out Controller connected to four Digitizers, each attached to a Detector.]
Cable Delay Variation Due to Temperature
The cable delay and the timing characteristics of the fan-out channels change with temperature. In a parallel scheme, it is not easy to control these variations.
[Figure: waveforms measured at 25 °C and 50 °C.]
Signal Reflection in an Open Cable
Cables are usually terminated at the end to eliminate signal reflection. An open cable causes a reflected waveform with the same polarity as the transmitted signal.
The CAKE Clocking: (Cable Automatic sKew Elimination)
The clock pulses are driven through a cable to a high-impedance receiver. The pulses are reflected back to the sending side. The transmitted and reflected signals add together to form a cake-shaped waveform.
[Figure: a pulse of width w driven through resistor R into a cable of delay d, with a TDC at the sending side; transmission, reflection, and transmission + reflection waveforms, with a V/4 level marked.]
The CAKE Clocking Waveforms
The width of the cake base is (w+2d) and the cable length can be measured and monitored continuously based on TDC values collected from sending side only.
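The relation between the cake base width and the cable delay can be sketched as follows. The timing model is an idealization with instantaneous edges, and the function names and example numbers are hypothetical.

```python
def cake_base_edges(t0_ns, w_ns, d_ns):
    """Leading and trailing edge times of the cake base at the sending side.

    The transmitted pulse spans t0 .. t0 + w; its reflection returns after a
    round trip of 2d, so the base spans t0 .. t0 + w + 2d.
    """
    return t0_ns, t0_ns + w_ns + 2 * d_ns

def cable_delay(t_lead_ns, t_trail_ns, w_ns):
    """One-way cable delay from the measured base width (w + 2d)."""
    return ((t_trail_ns - t_lead_ns) - w_ns) / 2

def mean_time(t_lead_ns, t_trail_ns):
    """Mean time of the cake base, as seen at the sending side."""
    return (t_lead_ns + t_trail_ns) / 2

W = 20.0                                   # hypothetical pulse width in ns
lead_a, trail_a = cake_base_edges(0.0, W, 5.0)     # cable A, d = 5 ns
lead_b, trail_b = cake_base_edges(0.0, W, 7.5)     # cable B, d = 7.5 ns
d_a = cable_delay(lead_a, trail_a, W)
d_b = cable_delay(lead_b, trail_b, W)
```

In this idealized model the mean time of the base equals t0 + w/2 + d, which is the time at which the center of the pulse reaches the receiving end.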
[Figure: two cables with delays dA and dB, each driven through a resistor R with a pulse of width w; the cake bases are w+2dA and w+2dB wide and are measured by TDCs at the V/4 level.]
Test Results
If the FPGA sends clock pulses at the same time, the clock edges at the receiving ends won’t be aligned due to cable length difference.
If the mean times of the cake-shaped pulses at the sending end are aligned, the clock edges at the receiving ends will be aligned.
[Figure: oscilloscope captures at the sending side and the receiving side.]
Improvement: Common Timing Reference
Sending Timing Reference Across Modules
The common timing reference signals can be sent across modules. Note that the signals are sent both ways alternately to cancel the delays between modules.
[Figure: one Read Out Controller connected to four Digitizers, each attached to a Detector.]
The Mean Times of All Channels
The mean of all leading edge times in a module is calculated. The mean times of all channels represent the same time.
Adding another TDC Channel in Digitizer
An extra TDC channel is added to the digitizer.
The TDC outputs are summed together.
The meantime of the signal edges is used as the common timing reference.
[Figure: two digitizers, each with five TDC channels. Labels: Σ, −, TR, TL, RA, RB, RC, RD.]
Improvement: Sending Clock to Several Modules
Clock Fan-Out: A Lot of Cables
Multiple copies of the clock signal are produced using a fan-out module. Many copies of the clock must be generated, so a dedicated fan-out module is needed. Each module is fed with a clock signal via a cable.
Multi-drop Clocking Scheme Using T Connectors
Cable segments are connected using T connectors to form a multi-drop cable assembly. The clock driver can be absorbed into the same user module, eliminating a dedicated clock fan-out module.
The Trapezoidal Clocking in a Nutshell
Trapezoidal-like pulses are fed into a transmission line and reflected back. The ramps of the two oppositely traveling pulses are summed in the cable. An isochronous common crossing exists at each tap.
[Figure: transmitting, reflecting and summed waveforms at tap position x.]
The Trapezoidal Clocking
The 4 oscilloscope channels are connected with 3 cable segments (4 ns each). When the cable is terminated (top traces), skews are seen. When reflection is allowed, the zero crossings become isochronous, i.e., cable delays do not matter.
With 50 Ω Termination
Without 50 Ω Termination
Is this a good design? Timing Uncertainty Confinement
Feeding the CMP output to the CK port of the register causes unnecessary challenges due to unconfined timing uncertainty:
Must use Gray Code Counter.
Must match propagation delays of all bits.
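Why a register clocked by the comparator needs a Gray code counter can be illustrated with a small model. The sketch assumes that, when the latch happens during a counter transition, each changing bit may independently show its old or its new value; the helper names are hypothetical.

```python
from itertools import product


def to_gray(n):
    """Binary to Gray code."""
    return n ^ (n >> 1)


def from_gray(g):
    """Gray code back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def possible_captures(old_code, new_code, width):
    """All codes a register may latch if it is clocked during old -> new."""
    changing = [b for b in range(width) if ((old_code ^ new_code) >> b) & 1]
    captures = set()
    for choice in product((0, 1), repeat=len(changing)):
        code = old_code
        for bit, flipped in zip(changing, choice):
            if flipped:
                code ^= 1 << bit   # this bit has already switched
        captures.add(code)
    return captures


# Counter stepping from 7 to 8, latched asynchronously
binary_captures = sorted(possible_captures(7, 8, 4))
gray_captures = sorted(from_gray(c)
                       for c in possible_captures(to_gray(7), to_gray(8), 4))
print("binary:", binary_captures)
print("gray:  ", gray_captures)
```

With a binary counter all four bits change at once, so any 4-bit value can be latched. With a Gray code counter only one bit changes, so the result is either 7 or 8.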
A Common Implementation of ADC
[Diagram: Gray Code Counter clocked at f; two comparators (CMP), each with a Register; the Timing Uncertainty region is marked.]
Feeding the CMP output to the D port of a FF reduces complexity: the counter is a regular binary counter, and no propagation delay matching is needed.
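A behavioral sketch of this arrangement follows. It assumes a linear reference ramp, integer millivolt units and a counter that is held on the first clock at which the sampled comparator output is high; the parameter values and the function name `convert` are illustrative.

```python
def convert(v_in_mv, v_start_mv=1000, slope_mv_per_clock=10, n_clocks=256):
    """Return the held counter value, or None if the ramp never reaches v_in."""
    ff_q = 0                                    # flip-flop output
    for counter in range(n_clocks):             # plain binary counter, one step per clock
        v_ref_mv = v_start_mv + slope_mv_per_clock * counter
        cmp_out = 1 if v_ref_mv >= v_in_mv else 0
        ff_d = cmp_out                          # comparator drives D, not CK
        if ff_d == 1 and ff_q == 0:
            return counter                      # hold: the counter is stable at the clock edge
        ff_q = ff_d
    return None


print(convert(1503))   # ramp reaches 1503 mV between counts 50 and 51
print(convert(1500))
```

Because the counter is only ever read on its own clock edge, every held value is a valid count. The comparator timing uncertainty affects only which clock edge sees the transition.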
An Improvement
[Diagram: Binary Counter clocked at f; comparator (CMP); Hold; the Timing Uncertainty region is marked.]
Confining timing uncertainty opens possibilities for further improvements: resolution or sampling rate can be doubled easily. Improvements by a factor of 4, 8, 16, 32, etc. are possible.
Doubling Digitizing Resolution
[Diagram: Binary Counter clocked at f; comparator (CMP); Hold.]
Historical Implementation in ASIC TDC
HIT is used as CK input, which creates unnecessary challenges:
Deadtime is unavoidable.
Coarse time recording needs special care.
Two array + encoder sets are needed for rising and falling edges.
The register array must be reset for the next event.
The encoder must be re-synchronized with the system clock in order to interface with the readout stage.
[Diagram: HIT, DLL Clock Chain (c0, c1, ...), Encoder, Coarse Time Counter, Coarse Time Register, Coarse Time Selection Logic.]
Unnecessary Challenges = Extra Efforts + Reduced Performance
Unnecessary Challenges
Historically, Gray code counters, double counters, and dual registers + MUX have been used in ASIC TDC coarse time counter schemes.
These are unnecessary if the TDC is designed appropriately. In an FPGA, a plain binary counter is sufficient.
[Diagram: three Coarse Time Counter schemes and a Gray Code Counter with sequence 000, 001, 011, 010, 110, 111, 101, 100. Label: Unnecessary for FPGA TDC.]
A Better Implementation
HIT is used as D input.
Deadtimeless operation is possible.
No special care is needed for coarse time.
Both rising and falling edges are digitized with a single array + encoder set.
No resetting is needed for the register array.
The output is synchronized with the system clock and is ready to interface with the readout stage.
[Diagram: HIT, DLL Clock Chain, Multi-Sampling Register Array, OR + Register, Clock Domain Transfer, 16-bit Encoder with Registered Outputs (shown twice; outputs DV, EG, T4..T0), Coarse Time Counter (output TC).]
Coarse Time Counter
The timing uncertainty between HIT and CLK is confined in the sampling register array.
All remaining logic is driven by the CLK signal.
No special care, such as a Gray code counter, is needed for the coarse time counter.
[Diagram: inputs HIT, CLK and ENA; blocks Hit Detect Logic, Coarse Time Counter and Fine Time Encoder; outputs Fine Time, Coarse Time and Data Valid.]
Summary
A TDC is a very useful digitizer, beyond time measurement.
Many of the functional blocks discussed can be reused in other applications.
Use fewer logic elements to reduce system cost.
Run the clock slower in portions where a high clock frequency is unnecessary, to reduce power consumption.
Confine timing uncertainties.
The End
Thanks
Comparison
Historical Scheme: HIT -> CK; (c0..c31) -> D
Deadtime is unavoidable.
Coarse time recording needs special care.
Two array + encoder sets are needed for rising and falling edges.
The register array must be reset for the next event.
The encoder must be re-synchronized with the system clock in order to interface with the readout stage.
Preferable Scheme: HIT -> D; (c0..c31) -> CK
Deadtimeless operation is possible.
No special care is needed for coarse time.
Both rising and falling edges are digitized with a single array + encoder set.
No resetting is needed for the register array.
The output is synchronized with the system clock and is ready to interface with the readout stage.
A Preferable Scheme
Minimum setup time between the multi-sampling register array stage and the clock domain transfer stage: 17 clock taps.
Setup time between the clock domain transfer stage and the encoder register: 32 or 16 clock taps.
All outputs including TC are aligned with c0.
Supports both rising and falling edges.
Encoder outputs: DV, EG, T4..T0
EG: Edge; 1 = rising, 0 = falling.
T4..T0: Time.
DV: Data Valid; 1 = valid edge detected. It is used as the PUSH signal for a FIFO or as Write Enable for other memory buffers.
[Diagram: HIT, DLL Clock Chain (taps c0, c1, c16, c17 labeled), Multi-Sampling Register Array, Clock Domain Transfer, 32-bit Encoder with Registered Outputs, Coarse Time Counter (output TC).]
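The encoder outputs can be illustrated with a short behavioral sketch. It assumes that sample i was taken at tap c_i, that the last sample of the previous clock cycle is available for comparison, and that only the first edge in a clock cycle is reported. None of these details is confirmed by the slides, and `encode_edge` is a hypothetical name.

```python
def encode_edge(samples, previous_sample):
    """Encode a 32-sample pattern into (DV, EG, T).

    DV = 1 if an edge was found, EG = 1 for rising and 0 for falling,
    T = index (0..31) of the first sample after the edge.
    """
    last = previous_sample
    for tap, value in enumerate(samples):
        if value != last:
            eg = 1 if value == 1 else 0
            return 1, eg, tap
        last = value
    return 0, 0, 0      # no edge in this clock cycle


rising = encode_edge([0] * 20 + [1] * 12, previous_sample=0)
falling = encode_edge([1] * 5 + [0] * 27, previous_sample=1)
quiet = encode_edge([0] * 32, previous_sample=0)
print(rising, falling, quiet)
```

In this sketch DV would act as the write strobe: a (EG, T, TC) word is pushed into the FIFO only in clock cycles where DV is 1.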
Wave Union?
Photograph: Qi Ji, 2010
Logic Elements and Operating Modes
Normal Mode: LUT4 (16 RAM cells) + DFF; inputs A, B, C, D.
Arithmetic Mode: 2 x LUT3 (8 cells each) + DFF; inputs A, B and carry-in CI; carry-out CO.
LUT = Look-Up Table
[Diagram: logic element in each mode, with a D flip-flop (D, Q, ENA, CLRN).]
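The two operating modes can be modelled as table look-ups. This is a sketch, assuming that in arithmetic mode one 8-cell LUT produces the sum bit and the other the carry-out, which is the usual full-adder split; the helper names are hypothetical.

```python
def make_lut(func, n_inputs):
    """Fill a LUT: one stored bit per input combination (2**n_inputs cells)."""
    return [func(*[(index >> bit) & 1 for bit in range(n_inputs)]) & 1
            for index in range(1 << n_inputs)]


def lut_read(table, *inputs):
    """Use the input bits as the address of the stored cell."""
    index = sum(bit << position for position, bit in enumerate(inputs))
    return table[index]


# Normal mode: one LUT4 (16 cells) implements any 4-input function, e.g. NAND.
nand4 = make_lut(lambda a, b, c, d: 1 - (a & b & c & d), 4)

# Arithmetic mode: two LUT3 (8 cells each) form one full-adder bit.
sum_lut = make_lut(lambda a, b, ci: a ^ b ^ ci, 3)
carry_lut = make_lut(lambda a, b, ci: (a & b) | (a & ci) | (b & ci), 3)

print(len(nand4), len(sum_lut), len(carry_lut))
print(lut_read(sum_lut, 1, 1, 0), lut_read(carry_lut, 1, 1, 0))
```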
Synchronous Load and Clear (Cyclone II)
[Diagram: logic element with a 16-bit LUT, or two 8-bit LUTs, and a D flip-flop (D, Q, ENA, CLRN).]
Xilinx Look-Up Table
LUT4: 4-input Look-Up Table
SRL16: 16-bit Shift Register
RAM16: 16-bit Distributed RAM
[Diagram: look-up table followed by a D flip-flop (D, Q, ENA, CLRN).]
4-Input NAND, 4-Input NOR, 4-Input NAOR
8 transistors each
[Schematics: one transistor-level circuit per gate, each with inputs A, B, C, D and output Y.]
The Mirror Adder (Weste93)
24-28 transistors
[Schematic: mirror adder with inputs A, B, Ci and outputs Cob, Sb.]
Transistor Usage of Logic Element
6-transistor RAM bit x 16: at least 96 transistors
[Diagram: logic element with a 16-bit LUT and a D flip-flop (D, Q, ENA, CLRN).]
Transistor Counts
FPGA 1 LE (4-in LUT + DFF): > 96
4-input NAND, NOR, etc.: 8
2-to-1 MUX: 6
Static RAM: 6 per bit
Full Adder: 24-28
N-bit Multiplier: > N^2/2 full adders (FA)
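The arithmetic behind these counts is simple enough to check directly. The sketch below uses only the figures from the table; the lower bound for the multiplier takes the smaller full-adder count (24) and is a floor, not an estimate of a real design.

```python
SRAM_TRANSISTORS_PER_BIT = 6
FULL_ADDER_TRANSISTORS_MIN = 24


def lut_ram_transistors(lut_bits=16):
    """Transistors in the RAM cells of a LUT alone (no DFF, no MUX tree)."""
    return lut_bits * SRAM_TRANSISTORS_PER_BIT


def multiplier_min_transistors(n_bits):
    """Lower bound for an N-bit multiplier built from more than N^2/2 full adders."""
    full_adders = n_bits * n_bits // 2
    return full_adders * FULL_ADDER_TRANSISTORS_MIN


print(lut_ram_transistors())            # 16 cells x 6 transistors
print(multiplier_min_transistors(8))    # 32 full adders x 24 transistors
```

The comparison shows why a logic element used as a single 4-input gate is expensive: its LUT memory alone takes 96 transistors, against 8 for the dedicated gate.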
Revision: Non-Triggered TDC
Example: Multi-Channel TDC, Non-Triggered
When there is no clearly defined trigger system and the hit rate is not very high, the digitizer can operate in non-triggered mode.
Multi-hits are eliminated after the encoder.
The encoded times of valid hits are temporarily stored in the hit buffer.
Data are continuously read out from the hit buffers and sent into the time frame buffer.
Data in the time frame buffer are sent out continuously via an LVDS serial link.
[Diagram: inputs IN0 to IN3 and clock CK250; each channel has a Delay Line & Sampling Register Array, Encoder & Calibration, Multi-Hits Elimination and a Hit Buffer; these feed a Data Load/Transfer Register, a Time Frame Buffer w/ Zero Suppression, and Serialize Data Output (Channel, Header).]
Multiple Hits Elimination
Hit Buffer
Single Slope ADC ?= Wilkinson ADC
In a Wilkinson ADC, a capacitor is charged and then discharged. The two schemes are suitable for different applications.
Voltage Measurement
Charge Measurement
[Diagram: two schemes, each with an INPUT and a TDC; one uses a ramping reference voltage.]