May. 2009, Wu Jinyuan, Fermilab jywu168@fnal.gov IEEE RT09 Short Course 1 FPGA Structure, Programming Principals and Applications: Part II Wu, Jinyuan Fermilab IEEE Real Time Conference Short Course May, 2009


FPGA Structure, Programming Principals and Applications: Part II

Wu, Jinyuan

Fermilab

IEEE Real Time Conference Short Course

May, 2009


Outline

Counting:
  Example: LED brightness and DAC
  Simple Sequencing

Bandwidth and Noise Issues:
  General Remarks on Sampling Theorem and Dithering
  Example: Huffman Coding
  Example: Decimation & Dynamic Decimation

After-fact Calibration:
  Several Topics on FPGA Based TDC
  Serial Communication with Independent Crystals
  Minimum Synchronization


Flashing LED, The First Thing First

[Figure: counter with output Q[23..0].]

Design at least one LED into every FPGA board.
When a board is first powered up, test the LED flashing function first.
Many things have to be right for the LED to flash:
  All power pins must be connected.
  Configuration devices must be in the correct mode.
  The design software must be working correctly.
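As a rough sketch of why a 24-bit counter gives a visible blink: each counter bit divides the clock by a power of two. The 50 MHz clock below is an assumed value, not one given in the slides, and `blink_frequency_hz` is a hypothetical helper name.

```python
def blink_frequency_hz(clock_hz: float, bit: int) -> float:
    """Toggle frequency of counter bit `bit` (bit 0 is the LSB).

    Bit k of a free-running binary counter completes one full
    on/off cycle every 2**(k+1) clock cycles.
    """
    return clock_hz / (1 << (bit + 1))


# Assumed 50 MHz board clock; Q[23] is the MSB of the 24-bit counter.
CLOCK_HZ = 50e6
for bit in (23, 22, 21):
    print(f"Q[{bit}] blinks at {blink_frequency_hz(CLOCK_HZ, bit):.2f} Hz")
```

With these assumed numbers the MSB blinks a few times per second, which is easy to see by eye; a faster or slower clock simply shifts which bit is the most convenient one to drive the LED.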


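LED Brightness Variation

[Figure: two versions of the circuit. In both, a counter output Q[23..0] and a brightness value feed an A<B comparator; in the second version a look-up table (LUT) is placed ahead of comparator input A.]

The LED brightness is varied by changing the duty cycle of the output pulse.

Comparator input A is the brightness and input B is the clock cycle count.

A look-up table can be added at input A to obtain a different brightness variation curve.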


LED Brightness Exponential Drop

[Figure: counter with carry-out CO, an A<B comparator, and a register Q with SET and D inputs. The register is updated by the rule below.]

if (CO==1) {Q = Q - Q/32;}

Narrow pulses are typically stretched for LED display at a fixed brightness.

The circuit here instead dims the LED gradually for a better visual effect.

Possible Student Lab


Exponential Sequence Generator

[Figure: register Q with SET and D inputs, updated by the rule below; plot of the generated sequence, vertical axis 0 to 70000, horizontal axis 0 to 160.]

if (CO==1) {Q = Q - Q/32;}

An exponential sequence is generated using the accumulator shown above.

Note that not even one multiplier is used.

Other function sequences (sine, cosine, tangent, cotangent, etc.) can also be generated similarly.
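The update rule Q = Q - Q/32 can be tried out in software before building it. The sketch below is a minimal integer model; the starting value 65535 and the function name `exponential_sequence` are assumptions chosen for illustration, and the division by 32 is modeled as a 5-bit right shift, which is how it would typically cost nothing in logic.

```python
def exponential_sequence(start: int, steps: int, shift: int = 5) -> list[int]:
    """Iterate Q = Q - Q/32 using only a shift and a subtraction."""
    q = start
    out = [q]
    for _ in range(steps):
        q -= q >> shift  # Q/32 is a 5-bit right shift, no multiplier needed
        out.append(q)
    return out


seq = exponential_sequence(start=65535, steps=160)
print(seq[:4], "...", seq[-1])
```

Each step multiplies Q by roughly 31/32, so the sequence decays with a time constant of about 32 steps. Because the shift truncates, the integer sequence stops changing once Q drops below 32.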


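Duty-Cycle Based Single-Pin DAC (1)

The duty cycle, or pulse width, of the comparator output is proportional to the DAC input at port A.

Use an external RC circuit as the low-pass filter.

The output voltage of an ideal low-pass filter is proportional to the DAC input.

[Figure: the DAC input (port A) and the counter output Q (port B) feed an A>B comparator; plot of the output waveform, vertical axis 0 to 4, horizontal axis 896 to 1024.]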
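A small software model may help make the comparator-based DAC concrete. It is a sketch under assumptions: a 4-bit counter is used so the waveform is short enough to print, the function name `pwm_dac` is hypothetical, and the ideal low-pass filter is stood in for by a plain average over one counter period.

```python
def pwm_dac(dac_input: int, n_bits: int, n_clocks: int) -> list[int]:
    """Comparator output A>B, with A = DAC input and B = free-running counter."""
    period = 1 << n_bits
    return [1 if dac_input > (clk % period) else 0 for clk in range(n_clocks)]


wave = pwm_dac(dac_input=3, n_bits=4, n_clocks=16)
duty_cycle = sum(wave) / len(wave)  # what an ideal low-pass filter would output
print(wave, duty_cycle)
```

In this model the output is high for exactly `dac_input` clock cycles per counter period, all grouped together at the start of the period.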


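Duty-Cycle Based Single-Pin DAC (2)

[Figure: accumulator with input D (the DAC input), register Q and carry-out CO; plot of the output waveform, vertical axis 0 to 4, horizontal axis 896 to 1024.]

Possible Student Lab

Use the carry-out of the accumulator as the output.
The number of pulses is proportional to the DAC input.
The rounding error is carried to later cycles.
The output is smoother.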
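The accumulator version can be modeled the same way. This is a sketch assuming a 4-bit accumulator that starts at zero; `accumulator_dac` is a hypothetical name, and the model requires the DAC input to be smaller than the accumulator modulus.

```python
def accumulator_dac(dac_input: int, n_bits: int, n_clocks: int) -> list[int]:
    """Add the DAC input every clock; the adder carry-out is the output pin."""
    modulus = 1 << n_bits
    acc = 0
    out = []
    for _ in range(n_clocks):
        acc += dac_input
        carry = acc >= modulus  # carry-out of the n-bit adder
        if carry:
            acc -= modulus      # the remainder stays in the register
        out.append(1 if carry else 0)
    return out


wave = accumulator_dac(dac_input=3, n_bits=4, n_clocks=16)
print(wave, sum(wave))
```

Over one full period of 16 clocks an input of 3 again produces 3 output pulses, but they are spread across the period instead of being grouped together, because the remainder left in the register is carried into the following cycles.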


The Frequency Spectrum of DAC (2)

[Figure: accumulator-based DAC (input D, register Q, carry-out CO); output waveform, horizontal axis 896 to 1024; three spectra, vertical axis 0 to 100, horizontal axis "Frequency" 0 to 512.]

The first harmonic may be suppressed.

This output works better with regular low-pass filters.


The Frequency Spectrum of DAC (1)

[Figure: comparator-based DAC (DAC input at A, counter output Q at B, output A>B); output waveform, horizontal axis 896 to 1024; three spectra, vertical axis 0 to 100, horizontal axis "Frequency" 0 to 512.]

The first harmonic dominates the spectrum.

This output works better with a notch filter.
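The difference between the two spectra can be checked numerically on a toy case. The sketch below assumes a 16-clock period and a DAC input of 4 (both illustrative), rebuilds the two output waveforms, and evaluates single DFT bins directly; all function names are hypothetical.

```python
import cmath


def pwm_wave(d: int, n: int) -> list[int]:
    """Comparator DAC: high for the first d clocks of an n-clock period."""
    return [1 if d > k else 0 for k in range(n)]


def accumulator_wave(d: int, n: int) -> list[int]:
    """Accumulator DAC: carry-out of a modulo-n accumulator adding d per clock."""
    acc, out = 0, []
    for _ in range(n):
        acc += d
        out.append(1 if acc >= n else 0)
        acc %= n
    return out


def harmonic_magnitude(wave: list[int], k: int) -> float:
    """Magnitude of DFT bin k over one period of the waveform."""
    n = len(wave)
    return abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                   for i, x in enumerate(wave)))


for name, wave in (("comparator", pwm_wave(4, 16)),
                   ("accumulator", accumulator_wave(4, 16))):
    print(name, [round(harmonic_magnitude(wave, k), 2) for k in (1, 2, 3, 4)])
```

In this particular example the accumulator output has evenly spaced pulses, so its energy moves from the first harmonic up to the fourth, where an ordinary low-pass filter attenuates it more strongly. Other input values give less regular pulse patterns, so the toy case should not be read as a general result.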


Start, Count: A Single Layer Loop

[Timing diagram: signals ST, CLK, QA[5] and QA[4..0]; QA[4..0] counts 0, 1, ..., 30, 31 and then returns to 0.]

The ST signal starts the sequence.

Counting is enabled.

Counting stops.
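One way to obtain this behavior is sketched below. The mechanism is an assumption, not something the slide spells out: a 6-bit counter is preset to 32 by ST, its top bit QA[5] serves as its own count enable, and counting stops when the counter rolls over to 0. The function name `run_sequence` is hypothetical.

```python
def run_sequence(n_clocks: int, st_at: int) -> list[tuple[int, int]]:
    """Return (QA[5], QA[4..0]) after each clock edge.

    Assumed scheme: ST synchronously sets the counter to 32, and the
    counter increments only while QA[5] is 1.
    """
    qa = 0
    trace = []
    for clk in range(n_clocks):
        if clk == st_at:
            qa = 32                  # ST starts the sequence
        elif qa & 0x20:
            qa = (qa + 1) & 0x3F     # counting is enabled while QA[5] = 1
        trace.append((qa >> 5, qa & 0x1F))
    return trace


trace = run_sequence(n_clocks=40, st_at=2)
print(trace[2], trace[3], trace[33], trace[34])
```

The attraction of this arrangement is that no separate state flip-flop is needed: the enable bit is simply the most significant bit of the loop counter.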


A Double-Layer + Single-Layer Sequencer

A double-layer loop is followed by a single-layer loop.

[Schematic, in three sections labeled Inner Loop, Outer Loop and State Control. Inputs: CLK, ST. Outputs: QAA[7..0], QBA[7..0]. Components: a 2-bit up counter (lpm_counter25, sync set to 2, cnt_en, q[1..0]) producing QC[1..0]; two 8-bit up counters (lpm_counter26, sclr, cnt_en, q[7..0]); two decoders (lpm_decode2, data[7..0], outputs eq254 and eq255); DFF, AND2, OR2 and NOT gates. Net labels include CNTA, CNTB, CNTC, SCLRA, SCLRB, AAeqFF, BAeqFF and QC0QQ.]

[Sequence table with columns BA and AA. Double-layer loop: for each BA value 0, 1, 2, 3, 4, ..., 255, AA runs 0, 1, 2, 3, 4, ..., 255. Single-layer loop rows: (0, 0), (1, 0), (2, 0), (3, 1), (4, 2), ..., (255, 253), (0, 254), (0, 255), (0, 0).]


An Array Adder

[Schematic. Inputs: XD[15..0], XA[7..0], XWE, ST, CLK. Output: OUT[15..0]. Components: two dual-port RAM blocks of 256 words by 16 bits (lpm_ram_dp3, block type M4K; ports data_a[15..0], address_a[7..0], wren_a, data_b[15..0], address_b[7..0], wren_b, clock, q_a[15..0], q_b[15..0]); a 16-bit adder (lpm_add_sub15, A+B); a 9-bit up counter (lpm_counter27, sync set to 256, cnt_en, q[8..0]); an 8-bit up counter (lpm_counter28, sclr, q[7..0]); DFFs and a NOT gate. Net labels include CEA, WE, RA[7..0], WA[7..0], SAX[15..0], MQX[15..0] and MQA[15..0].]


Care Must Be Taken Outside the FPGA (1)

[Figure: signal chain Shaper / LP Filter -> ADC -> FPGA -> DAC -> LP Filter, with band limiting on both the input and the output side. Spectrum sketches: spectrum of original signal, LP filter, ADC input, sampling in ADC, aliasing without LP filtering, spectrum of DAC output, LP filter, output of LP filter.]

Signal bandwidth < Nyquist frequency = (1/2) × sampling frequency


The “Trend” vs. The Sampling Theorem

There will be no hardware analog processing. Everything is done digitally in software.

It sounds very stylish.

A shaper/low-pass filter is a minimum requirement.
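The reason the filter cannot be replaced by software is that, once sampled, an out-of-band tone is indistinguishable from an in-band one. The sketch below computes where a tone lands after sampling; the frequencies are illustrative values and `alias_frequency` is a hypothetical helper name.

```python
def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency of a sampled tone, folded into 0 .. f_sample/2."""
    f = f_signal % f_sample
    return min(f, f_sample - f)


# Illustrative numbers: a 10 MHz sampling clock has a 5 MHz Nyquist frequency.
for f in (3.0, 7.0, 12.0):
    print(f"{f} MHz tone appears at {alias_frequency(f, 10.0)} MHz")
```

A tone above half the sampling frequency folds back on top of legitimate in-band content, so it has to be removed by an analog filter before the ADC.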


Care Must Be Taken Outside the FPGA (2)

[Figure: signal chain Shaper / LP Filter -> ADC (n bits) -> FPGA -> DAC -> LP Filter, with dither added at the ADC input. Two plots of ADC value (51 to 54) versus sampling index (0 to 150). Without dither: Signal, ADC(signal), Threshold. With dither: Signal, Signal+Noise, ADC(signal+noise), Weighted Average, Threshold.]

Resolution finer than the ADC LSB can be achieved by adding noise at the ADC input and filtering digitally.
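The effect is easy to reproduce in a few lines. The sketch below makes several assumptions that are not in the slides: an ideal ADC that rounds to the nearest code, uniform noise of 1 LSB peak to peak, a constant input of 52.3 LSB, and a plain average in place of the weighted average shown in the plot.

```python
import math
import random


def adc(value: float) -> int:
    """Ideal ADC with a step of 1 LSB: round to the nearest integer code."""
    return math.floor(value + 0.5)


def dithered_estimate(signal: float, n_samples: int, rng: random.Random) -> float:
    """Average many conversions of signal + uniform noise (1 LSB peak to peak)."""
    total = 0
    for _ in range(n_samples):
        total += adc(signal + rng.uniform(-0.5, 0.5))
    return total / n_samples


rng = random.Random(1)  # fixed seed so the run is repeatable
signal = 52.3           # illustrative value lying between two ADC codes
print("without dither:", adc(signal))
print("with dither   :", round(dithered_estimate(signal, 10_000, rng), 2))
```

Without noise every conversion returns the same code, so no amount of averaging can recover the fraction. With noise, the proportion of conversions that land on the upper code carries the sub-LSB information.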


Adding Noise for Finer Resolution

Photo Credit: www.telegraph.co.uk, trinities.org

Mechanical pressure gauges usually do not track small pressure changes well.

The gauge readers may lightly tap the gauges to get a more accurate reading.

The idea of dithering at the ADC input is similar.


Some Notes on Philosophy

[Figure: two cases, "Wideband, Low Noise" and "Narrowband, Noisy", with the labels "Good" and "Bad".]

Something good in one condition can be bad in another condition.

And vice versa.


Why Are Band Limiting & Dithering Ignored?

Pre-amplifiers usually have a naturally limited bandwidth and an intrinsic noise larger than the LSB of the ADC.

So much of the time, band limiting and dithering can be “safely” ignored, since both requirements are satisfied automatically.

High-bandwidth, low-noise devices have now become easily accessible. A design can be too fast and too quiet.

Do not forget to review the band limiting and dithering requirements for each design.


Data Reduction on Liquid Argon TPC Data

Hit waveforms in the TPC carry useful information.
Digitizing the waveforms creates a large volume of data.
Data reduction without losing useful information is necessary.

[Figure: event display with axes Drift Time and Wire Number; data from BO detector of FNAL; axis ranges 0 to 700 and 0 to 2000.]


Slow Variation of Raw Data

[Figure: raw waveform, values 140 to 160, horizontal axis 1100 to 1400.]

More than 99% of the points differ from the previous point by -1, 0 or +1.

Huffman coding can be applied to the differences between successive data points.

[Figure: probability P (0 to 0.8) of the difference u(n+1)-u(n) for values -8 to 7, for wire groups wire0_15, wire16_31 and wire32_95.]

[Figure: the input U(n+1) feeds a DFF and the A input of an A-B subtractor; the DFF output feeds the B input, so the subtractor output is U(n+1)-U(n).]


The Huffman Coding

The U(n+1)-U(n) value with the highest probability is assigned the shortest code, i.e., the single bit 1.

Values with lower probabilities are assigned longer codes, e.g., 01, 001, 0001, etc.

Huffman coded words and regular words are distinguished by bit 15.

U(n+1)-U(n)      Code
-4 and others    Full 16-bit word
-3               000001
-2               0001
-1               01
 0               1
+1               001
+2               00001
+3               0000001

Regular word: 0 0, ADC value (13-bit). Regular ADC data is used for the first point or when U(n+1)-U(n) is outside +-3.

Huffman coded word: the codes are packed one after another, followed by padding or continued in the next word. In this example, 6 differences of the data samples (-1, 0, 0, 0, +1, +2) are packed in the 16-bit data word.

[Figure: probability P (0 to 0.8) of u(n+1)-u(n) for values -8 to 7, for wire groups wire0_15, wire16_31 and wire32_95.]
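The code table can be exercised in software. The sketch below covers only the bit-string level: it uses the codes from the table, encodes the differences of a short run of samples, and decodes them again. The sample values are made up, the function names are hypothetical, and the packing into 16-bit words with the bit-15 flag is deliberately left out.

```python
# Codes from the table: k leading zeros followed by a single 1.
HUFFMAN_CODES = {0: "1", -1: "01", 1: "001", -2: "0001",
                 2: "00001", -3: "000001", 3: "0000001"}
ZEROS_TO_DIFF = {len(code) - 1: diff for diff, code in HUFFMAN_CODES.items()}


def encode_differences(samples: list[int]) -> tuple[int, str]:
    """Return the first sample and the concatenated codes of the differences."""
    bits = []
    for prev, cur in zip(samples, samples[1:]):
        diff = cur - prev
        if diff not in HUFFMAN_CODES:
            # The real block falls back to a full 16-bit word here.
            raise ValueError(f"difference {diff} needs a regular word")
        bits.append(HUFFMAN_CODES[diff])
    return samples[0], "".join(bits)


def decode_differences(first: int, bits: str) -> list[int]:
    """Rebuild the samples by counting zeros up to each terminating 1."""
    out, zeros = [first], 0
    for b in bits:
        if b == "0":
            zeros += 1
        else:
            out.append(out[-1] + ZEROS_TO_DIFF[zeros])
            zeros = 0
    return out


samples = [150, 149, 149, 149, 149, 150, 152]  # differences -1, 0, 0, 0, +1, +2
first, bits = encode_differences(samples)
print(bits, len(bits), decode_differences(first, bits) == samples)
```

Because every code is a run of zeros ended by a 1, the decoder needs no lookup tree: it only counts zeros. The six differences of the slide's example occupy 13 bits in this model.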


The Huffman Coding Block

The block can operate at clock rates of up to 250 MHz in Altera Cyclone III FPGA devices.

The block uses 245 logic cells, about 0.6% of an EP3C40F484C6 device ($129), which contains 39600 logic cells.

[Figure: block symbol HuffmanCoding1. Inputs: D[15..0] (raw data), DV, D1st, DLast, CK (driven by CK250). Outputs: Q[15..0] (Huffman coded data), QRDY, DV6Q, D1st6Q, DLast6Q.]

Cost of 245 logic cells: (245/39600) * $129 = $0.80

[Figure: word formats as on the previous slide: regular word with the 13-bit ADC value, and Huffman coded word for the differences -1, 0, 0, 0, +1, +2.]


The Schematics of the Huffman Coding Block

[Schematic in four functional sections: Difference of Data Points; Huffman Code Lookup Table; Huffman Code Composer; Huffman Code or Raw Data Selector. Inputs: D[15..0], DV, CK, D1st, DLast. Outputs: Q[15..0], QRDY, DV6Q, D1st6Q, DLast6Q. Components: a clocked 14-bit subtractor (lpm_add_sub0, A-B) producing DIFF[13..0]; 12-input AND gates on DIFF[13..2]; an 8-input, 3-bit multiplexer (lpm_mux0) with select DIFF[2..0]; two 4-bit adders with carry (lpm_add_sub1); a 4-to-16 decoder (lpm_decode0); a 2-input, 16-bit multiplexer (lpm_mux1); DFF and DFFE registers; AND, OR, NOR and NOT gates. Net labels include BADHC, ROVR, NEWWRD, CLRDATA, NBHC[2..0], SST[1..15], D1Q[15..0] and D2VQ[15..0].]

Annotations on the schematic:

Number of bits for Huffman Codes 0: 0(+1), 1: 2(+1), 2: 4(+1), 3: 6(+1)

Number of bits for Huffman Codes -1: 1(+1), -2: 3(+1), -3: 5(+1), -4: 7(+1)

If (NBHC+1+HCSS)>=16, HCSS.d=(0xf&(NBHC+1+HCSS))+1

e.g. NBHC=2, HCSS=14 --> HCSS.d=1

+1 when rollover since 15 bits/word are used for data


The Compression Ratio of Huffman Coding

On typical TPC events, a compression ratio of about 10 can be achieved.

The compression ratio is sensitive to high-frequency noise.

[Figure: the HuffmanCoding1 block, clocked by CK250, reduces N input words to N/(10.7) output words; event display with axis ranges 0 to 700 and 0 to 2000.]


A “Mystery” of Huffman Coding Ratios on Down-Sampled Data

The 5 MHz data is down-sampled to 1 MHz.
The Huffman coding compression ratio drops from 10.7 to 7.5 when the data is down-sampled.

[Figure: two HuffmanCoding1 blocks. At 5 MHz: N words in, N/(10.7) words out. After down-sampling to 1 MHz: (N/5) words in, (N/5)/(7.5) words out.]


Averaging in Decimation: A Re-discovery

[Figure: two plots, vertical axis 0 to 1, horizontal axis 0 to 64.]

Simple “down-sampling” is not good.
When the decimation factor is D, averaging over D samples is not good either.
Averaging over 2*D samples is necessary.
There is still aliasing when averaging over 2*D samples, but it is less severe than when averaging over D samples.

Signal bandwidth < Nyquist frequency = (1/2) × sampling frequency


Weighted Average, The CIC-2 Filter

Filter performance can be further improved with a weighted average over 4*D samples.
The filter is called a Cascaded Integrator-Comb filter of order 2 (CIC-2).
The CIC-1 filter is the moving average.

[Figure: two plots, vertical axis 0 to 1, horizontal axis 0 to 64.]
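The weights of such a filter can be written down directly. The sketch below assumes that the CIC-2 weights are a moving average over 2*D samples convolved with itself, which gives a triangular window of 4*D - 1 non-zero weights; that reading of "weighted average over 4*D samples" is an assumption, and the function names are hypothetical. Real CIC hardware would use integrators and combs rather than explicit weights.

```python
def convolve(a: list, b: list) -> list:
    """Full discrete convolution of two weight lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cic2_weights(d: int) -> list[float]:
    """Triangular weights: a 2*d moving average convolved with itself."""
    box = [1] * (2 * d)
    tri = convolve(box, box)   # 4*d - 1 non-zero weights
    total = sum(tri)           # equals (2*d)**2
    return [w / total for w in tri]


def decimate(samples: list[float], weights: list[float], d: int) -> list[float]:
    """Weighted average over a sliding window, one output per d inputs."""
    n = len(weights)
    return [sum(w * s for w, s in zip(weights, samples[i:i + n]))
            for i in range(0, len(samples) - n + 1, d)]


weights = cic2_weights(5)                       # 5 MHz -> 1 MHz means D = 5
print(len(weights), round(max(weights), 3))
print(decimate([3.0] * 40, weights, 5))
```

The weights sum to 1, so a constant input passes through unchanged, while the triangular shape attenuates the frequencies that would otherwise alias after decimation more strongly than a flat average does.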


Huffman Coding Ratios for 5 MHz to 1 MHz

The Huffman coding compression ratio improves as the filter in Dynamic Decimation improves.

[Figure: bar chart of Huffman coding compression ratio (0 to 12) for the cases no deci, no filter, AV5, AV10 and CIC2_20; events R089_E104, R089_E175, R089_E174, R089_E178, R089_E179 and R089_E110.]


Dynamic Decimation (DD)

[Figure: waveform, values 400 to 540, horizontal axis 0 to 2000.]

Only small time intervals, i.e., regions of interest (ROI), must be sampled at a high rate.
Most time intervals can be sampled at a lower rate without losing useful information.


A Mystery of Dynamic Decimation & Huffman Coding

Dynamic Decimation reduces the number of samples by a factor of 10.
Huffman Coding reduces the number of bits from raw data by a factor of 10.
When cascaded, the combination reduces the number of bits by a factor of 60.

[Figure: Dynamic Decimation alone: N -> N/10.6. Huffman Coding alone: N -> N/10.7. Dynamic Decimation followed by Huffman Coding: N -> N/60.]


Huffman Coding Ratios for Dynamic Decimation

The Huffman coding compression ratio improves as the filter in Dynamic Decimation improves.

[Figure: bar chart of Huffman coding compression ratio (0 to 12) for the cases no deci, no filter, AV16, AV32 and CIC2_64; events R089_E104, R089_E175, R089_E174, R089_E178, R089_E179 and R089_E110.]


Any Differences?

[Figure: two event displays, labeled "Raw" and "With Dynamic Decimation", each with axis ranges 0 to 700 and 0 to 2000.]


TDC Using FPGA Logic Chain Delay

This scheme uses current FPGA technology.

Low-cost chip families can be used (e.g., EP2C8T144C6, $31.68).

Fine TDC precision can be implemented in slow devices (e.g., 20 ps in a 400 MHz chip).

[Figure: delay chain with inputs IN and CLK.]


Two Major Issues In a Free Operating FPGA

[Figure: bin width (ps), 0 to 180, versus bin number, 0 to 64.]

1. The widths of the bins differ and vary with supply voltage and temperature.

2. Some bins are ultra-wide due to LAB boundary crossing.


Digital Calibration Using Twice-Recording Method

[Figure: delay chain with inputs IN and CLK.]

Use a longer delay line.
Some signals may be registered twice, at two consecutive clock edges.

N2 - N1 = (1/f)/t

1/f: Clock Period
t: Average Bin Width

The two measurements can be used:
  to calibrate the delay.
  to reduce digitization errors.
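Rearranging the relation gives the average bin width from a single twice-recorded hit. The sketch below only restates that arithmetic; the bin numbers are made-up illustrative values and `average_bin_width_ps` is a hypothetical helper name.

```python
def average_bin_width_ps(n1: int, n2: int, clock_hz: float) -> float:
    """Average bin width t from N2 - N1 = (1/f)/t, returned in picoseconds."""
    period_ps = 1e12 / clock_hz
    return period_ps / (n2 - n1)


# Illustrative numbers: the same edge is seen at bin 10 and, one 400 MHz
# clock period (2500 ps) later, at bin 60.
print(average_bin_width_ps(n1=10, n2=60, clock_hz=400e6))
```

Since the clock period is known from the crystal, the bin count between the two recordings tracks the delay speed as temperature and supply voltage drift.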


Digital Calibration Result

[Figure: "TDC Output at Different PS Voltage", shown twice: TDC outputs N1 and N2 (0 to 25) versus VCCINT (1.5 V to 2.5 V); the second panel adds the corrected time Tc.]

The power supply voltage changes from 2.5 V to 1.8 V (about the same effect as 100 °C to 0 °C).

The delay speed changes by 30%.

The difference of the two TDC numbers reflects the delay speed.

Corrected time, with T the clock period:

$$T_c = T\,\frac{N_1 - N_0}{N_2 - N_1}$$

Warning: the calibration is based on the average bin width, not bin-by-bin widths.


Auto Calibration Using Histogram Method

It provides a bin-by-bin calibration at a certain temperature.
It is a turn-key solution (bin in, ps out).
It is semi-continuous (the LUT is updated automatically every 16K events).

[Figure: a DNL histogram accumulated over 16K events fills a LUT that converts the input (bin) into the output (ps). Plots: bin width (ps), 0 to 180, and bin time (ps), 0 to 2500, both versus bin number, 0 to 64.]
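The histogram method can be sketched as follows. It rests on assumptions that the slide does not state: hits are spread uniformly over the clock period, so the share of hits in a bin is proportional to its width, and the LUT returns the time at the center of each bin. The hit counts are made up and `build_calibration_lut` is a hypothetical name.

```python
def build_calibration_lut(hit_counts: list[int], clock_period_ps: float) -> list[float]:
    """Convert a per-bin hit histogram into bin-center times in picoseconds."""
    total = sum(hit_counts)
    lut = []
    edge = 0.0                                   # start time of the current bin
    for count in hit_counts:
        width = clock_period_ps * count / total  # wide bins collect more hits
        lut.append(edge + width / 2)
        edge += width
    return lut


# Toy example: three bins covering an 800 ps span, the last one the widest.
print(build_calibration_lut([1, 3, 4], 800.0))
```

After each batch of events the histogram is turned into a fresh table, which is how a bin number in becomes a time in picoseconds out.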


Good, However

Auto calibration solved some problems.
However, it won’t eliminate the ultra-wide bins.

[Figure: bin width (ps), 0 to 180, versus bin number, 0 to 64.]


Cell Delay-Based TDC + Wave Union Launcher

[Figure: a Wave Union Launcher placed between the input In and the delay chain, which is sampled by CLK.]

The wave union launcher creates multiple logic transitions after receiving an input logic step.

The wave union launchers can be classified into two types:
  Finite Step Response (FSR)
  Infinite Step Response (ISR)

This is similar to the classification of filters or other linear systems:
  Finite Impulse Response (FIR)
  Infinite Impulse Response (IIR)


Wave Union Launcher A (FSR Type)

[Figure: Wave Union Launcher A with inputs In and CLK; control input: 1: Unleash, 0: Hold.]


Wave Union Launcher A: 2 Measurements/hit

1: Unleash


Sub-dividing Ultra-wide Bins

[Figure: launcher in the "1: Unleash" state, with the two wave edges labeled 1 and 2.]

Device: EP2C8T144C6

Plain TDC:
  Max. bin width: 160 ps.
  Average bin width: 60 ps.

Wave Union TDC A:
  Max. bin width: 65 ps.
  Average bin width: 30 ps.

[Figure: bin width (ps), 0 to 180, versus bin number, 0 to 128, for Plain TDC and Wave Union TDC A.]


Measurement Result for Wave Union TDC A

[Figure: test setup. Block labels: Wave Union, TDC + LUT, Raw, Histogram, 53 MHz Separate Crystal.]

Plain TDC: delta t RMS width: 40 ps. 25 ps single hit.

Wave Union TDC A: delta t RMS width: 25 ps. 17 ps single hit.

[Plot: histogram of dt (ps), 1000 to 1500, for Un-calibrated, Plain TDC and Wave Union TDC A]


More Measurements

Two measurements are better than one. Let’s try 16 measurements?


Wave Union Launcher B (ISR Type)

[Figure: Wave Union Launcher B with inputs In and CLK. Control input: 1: Oscillate, 0: Hold.]


Wave Union Launcher B: 16 Measurements/hit

1 Hit, 16 Measurements @ 400 MHz

VCCINT=1.20V

VCCINT=1.18V


Delay Correction

[Plot: T0 (ps), 0 to 3000, versus measurement index m, 0 to 16]

[Plot: TDC (bin), 16 to 64, versus measurement index m, 0 to 16]

Delay Correction Process:

Raw hits TN(m) in bins are first calibrated into TM(m) in picoseconds.

Jumps are compensated for in the FPGA so that TM(m) become T0(m), which have the same value for each hit.

Take the average of T0(m) to get better resolution:

$$t_{0,\mathrm{av}} = \frac{1}{16} \sum_{m=0}^{15} t_0(m)$$

The raw data contains:

U-Type Jumps: [48-63] to [16-31]

V-Type Jumps: other small jumps.

W-Type Jumps: [16-31] to [48-63]

The processes are all done in the FPGA.
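The three steps of the delay correction (calibrate, compensate, average) can be sketched in a few lines. This is a simplified software model, not the FPGA logic of the slides: the linear lookup table, the per-edge offsets and the names `bin_to_ps`, `offsets_ps` and `delay_correct` are assumptions made for illustration, and the U, V and W type jumps are folded into a single known offset per measurement.

```python
def delay_correct(raw_bins, bin_to_ps, offsets_ps):
    """Return the averaged hit time from the 16 measurements of one hit.

    raw_bins   -- raw TDC bins TN(m), one per measurement m
    bin_to_ps  -- calibration lookup table: bin number -> time in ps
    offsets_ps -- assumed known delay of measurement m relative to m = 0
    """
    # Step 1: calibrate raw bins TN(m) into times TM(m) in picoseconds.
    tm = [bin_to_ps[b] for b in raw_bins]
    # Step 2: compensate the jumps so that every T0(m) estimates the same hit time.
    t0 = [t - off for t, off in zip(tm, offsets_ps)]
    # Step 3: average the T0(m) values.
    return sum(t0) / len(t0)


# Hypothetical numbers: 64 bins of 50 ps, edges 100 ps apart, hit at 1000 ps.
lut = [50 * b for b in range(64)]
offsets = [100 * m for m in range(16)]
raw = [(1000 + off) // 50 for off in offsets]
print(delay_correct(raw, lut, offsets))
```

In a real device the bins are not uniform, so each T0(m) carries a different quantization error and the average is what improves the resolution.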


The Test Module

Two NIM inputs

FPGA with 8ch TDC

Data Output via Ethernet

BNC Adapter to add delay @ 150ps step.


Test Result

[Figure: test setup. NIM Inputs go through a LeCroy 429A NIM Fan-out and NIM/LVDS converters to eight Wave Union TDC B channels. BNC adapters add delays @ 140ps step.]

[Plot: results labeled 0, 1 and 2. Annotations: 140ps, RMS 10ps.]


Multi-Sampling TDC FPGA

[Figure: block diagram. Multiple Sampling with clock phases c0, c90, c180 and c270 (Q0, Q1, Q2, Q3), Clock Domain Changing to c0 (QF, QE, QD), Trans. Detection & Encode, and a Coarse Time Counter. Outputs: DV, T0, T1, TS.]

Ultra low-cost: 48 channels in $18.27 EP2C5Q208C7.

Sampling rate: 360 MHz x4 phases = 1.44 GHz.

LSB = 0.69 ns.

Logic elements with non-critical timing are freely placed by the fitter of the compiler.

[Figure: placement in a Cyclone FPGA, with a group of 4 channels (4Ch) marked]
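The sampling arithmetic and the transition detection of the multi-sampling TDC can be sketched as follows. This is a simplified behavioral model under assumptions: only a single rising edge is handled, and the function names are hypothetical rather than taken from the design.

```python
CLK_MHZ = 360.0
PHASES = 4  # c0, c90, c180, c270

# Effective sampling rate is 360 MHz x 4 = 1.44 GHz, so one LSB is about 0.69 ns.
LSB_NS = 1e3 / (CLK_MHZ * PHASES)


def sample_input(edge_time_ns, n_clocks):
    """Sample a rising step at edge_time_ns with four phases per clock cycle."""
    return [1 if k * LSB_NS >= edge_time_ns else 0
            for k in range(n_clocks * PHASES)]


def detect_and_encode(samples):
    """Find the first 0 -> 1 transition.

    Returns (coarse, fine): coarse is the clock cycle count, fine is the
    phase index 0..3 within that cycle. Returns None if there is no edge.
    """
    for k in range(1, len(samples)):
        if samples[k - 1] == 0 and samples[k] == 1:
            return divmod(k, PHASES)
    return None


coarse, fine = detect_and_encode(sample_input(5.0, 4))
print(round(LSB_NS, 2), coarse, fine, (coarse * PHASES + fine) * LSB_NS)
```

The reconstructed time differs from the true edge time by less than one LSB, which is the quantization error expected from this scheme.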


Issues of Coarse Time Counter

There are some common misunderstandings on coarse time counters in a TDC:

Two coarse time counters are needed, driven by clocks with 180 degree phase difference.

The coarse time counter should be a Gray code counter.

Dual counters and/or Gray code counters are only needed in one ASIC TDC architecture.

In the architectures used by FPGA TDCs and some ASIC TDCs, only one plain binary counter is needed as the coarse time counter.

[Figure: coarse time counter arrangements, including dual Coarse Time Counters and a Gray Code Counter with the sequence 000, 001, 011, 010, 110, 111, 101, 100]
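For reference, the Gray code sequence shown in the figure can be reproduced with the standard binary-to-Gray conversion. The function names are illustrative only. The slide's point stands: this kind of counter is needed only in one ASIC TDC architecture, not in the FPGA TDC described here.

```python
def binary_to_gray(n):
    """Standard reflected binary Gray code of n."""
    return n ^ (n >> 1)


def gray_sequence(bits):
    """All Gray codes of the given width, in counting order, as bit strings."""
    return [format(binary_to_gray(i), "0{}b".format(bits))
            for i in range(1 << bits)]


print(gray_sequence(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']
```

Adjacent codes differ in exactly one bit, which is why a Gray counter can be sampled by an asynchronous signal without producing large errors.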


Delay Line Based TDC Architectures

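[Figure: delay line based TDC architectures with HIT and CLK inputs, labeled Delay Hit, Delay CLK and Delay Both. Annotations: "CLK is used as clock" and "HIT is used as clock". One architecture is marked: "Only this architecture needs dual coarse time counters."]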


Implementation of Coarse Time Counter

[Figure: block diagram with Coarse Time Counter, Fine Time Encoder and Hit Detect Logic. Signals: In, CLK, ENA, Fine Time, Coarse Time, Data Ready.]


Outline

Counting:

Example: LED brightness and DAC

Simple Sequencing

Bandwidth and Noise Issues:

General Remarks on Sampling Theorem and Dithering.

Example: Huffman Coding

Example: Decimation & Dynamic Decimation

After-fact Calibration:

Several Topics on FPGA Based TDC

Serial Communication with Independent Crystals

Minimum Synchronization


Classical Picture of Serial Communications

The parallel data is converted to serial bits driven by crystal oscillator X1 in the transmitter device.

The serial data stream is used to generate a recovered clock at the receiver device with a phase-locked loop (PLL).

The recovered clock is used to drive the serial-to-parallel converter and store the data into a first-in-first-out (FIFO) buffer.

The FIFO buffer is used to transfer data from the recovered clock domain to the local clock domain generated by crystal oscillator X2.

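[Figure: Parallel-to-Serial Converter driven by crystal X1. At the receiver, a PLL generates the Recovered Clock for the Serial-to-Parallel Converter, followed by a FIFO and Local Logic driven by crystal X2.]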


Serial Data Receiving Without PLL etc.

Generating a recovered clock with a PLL, VCO, VCXO etc. is an analog process that is not convenient to implement in an FPGA, especially for applications with multiple receiving channels.

There are pure digital methods to receive the serial data:

Digital Phase Follower: 1bit/CLK

The Two-Cycle Serial IO: 1bit/(2CLK)

FM Encoder and Decoder: 1bit/(2-16CLK)

Clock-Command Combined Carrier Coding (C5): 4bits/(20CLK)

The transmitter and receiver can be driven by two independent free-running crystal oscillators.

[Figure: Parallel-to-Serial Converter driven by crystal X1. Digital Serial-to-Parallel Converter and Local Logic driven by crystal X2.]


Digital Phase Follower

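[Figure: block diagram. The input In is sampled with clock phases c0, c90, c180 and c270 (Multiple Sampling: Q0, Q1, Q2, Q3), moved to the c0 domain (Clock Domain Changing: QF, QE, QD), and passed to Trans. Detection. A selector (SEL) picks the data bits b0 and b1 for a Tri-speed Shift Register with controls Shift2 and Shift0, driven by the conditions was3is0 and was0is3. Frame Detection produces Data Out.]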

The input data rate is 1bit/clock cycle. Four clock phases, c0, c90, c180 and c270 are used to detect input transition edge. The phase for data sample follows the variation of the transition edge.
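The idea that the sampling phase follows the transition edge can be illustrated with a simplified software model. This is not the schematic of the following slide: it re-centers the sampling point two samples after every detected transition and does not model the tri-speed shift register. The function names and the 4.1 and 3.9 samples-per-bit test values are assumptions.

```python
def oversample(tx_bits, samples_per_bit):
    """Line samples seen by a receiver taking 4 samples per local clock.

    samples_per_bit differs from 4.0 when the two crystals disagree.
    """
    n = int(len(tx_bits) * samples_per_bit)
    return [tx_bits[int(k / samples_per_bit)] for k in range(n)]


def recover_bits(samples, oversampling=4):
    """Recover bits by sampling mid-bit, re-centered on every transition."""
    bits = []
    next_sample = None  # no sampling until the first transition is seen
    for k in range(1, len(samples)):
        if samples[k] != samples[k - 1]:
            # A transition marks a bit boundary: sample half a bit later.
            next_sample = k + oversampling // 2
        if k == next_sample:
            bits.append(samples[k])
            next_sample += oversampling
    return bits


payload = [1, 0, 1, 1, 0, 0, 1, 0]
tx = [0, 0, 1] + payload + [0, 0, 0]  # idle, start bit = 1, data, idle
for spb in (3.9, 4.0, 4.1):
    print(spb, recover_bits(oversample(tx, spb))[:9])
```

The model tolerates a frequency offset as long as the drift accumulated between two transitions stays well below half a bit.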


Schematics of Digital Phase Follower

[Schematic: digital phase follower. Top level: block phtrk1 (inputs IN1, CLK0, CLK90, CLK180, CLK270; outputs EN, QQ[11..0], BT, JMP, WTN, EE[3..0]), block DS5B (inputs BB, BX, JMP, EN, CLK; outputs Q[4..0], C1, C0) and block Word24_13z (inputs D[4..0], C1, C0, CLK; outputs M[23..20], Q[27..0], QQ[23..0], DV, S[1..0], ERR). Lower levels: DFF sampling registers on the four clock phases, AND/OR/XOR transition detection gates, an lpm_counter and a multiplexed shift register built from DFFE registers.]

CLK: 375MHz

Data Rate: 375Mbits/s


The Two-Cycle Serial IO

This scheme is slower than the digital phase follower but the logic is simpler. CLK1 and CLK2 can be generated with two free-running crystal oscillators.

[Figure: timing diagram. Transmitter: CLK1 and Data Out (start bit = 1, b15, b14, ...). Receiver: CLK2 and Data In (start bit = 1, X, b15, X, b14, ...).]

One data bit is transmitted every 2 clock cycles.

A logic transition is detected between these two falling edges.

Input data are stable at these clock edges.
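The timing argument can be checked with a small behavioral model. It is a sketch of one reading of the diagram, not the schematic that follows: time is counted in receiver half-cycles, falling edges are taken to sit at odd counts and rising edges at even counts, and the names `make_frame`, `line_level` and `receive` are hypothetical.

```python
def make_frame(word, nbits=16):
    """Start bit (1) followed by the data word, MSB first (b15, b14, ...)."""
    return [1] + [(word >> i) & 1 for i in range(nbits - 1, -1, -1)]


def line_level(frame, t, t0, bit_time=4.0):
    """Line level at time t; the frame starts at t0 and the line idles at 0.

    bit_time is 4 half-cycles, i.e. one bit every 2 clock cycles.
    """
    if t < t0:
        return 0
    i = int((t - t0) // bit_time)
    return frame[i] if i < len(frame) else 0


def receive(frame, t0, nbits=16, bit_time=4.0):
    """Locate the start bit with falling edges, then sample on rising edges."""
    h = 1  # falling edges at odd half-cycle counts
    while line_level(frame, h, t0, bit_time) == 0:
        h += 2
    # The start transition lies between falling edges h - 2 and h, so the
    # rising edge h + 1 is at least one half-cycle inside the start bit.
    first = h + 1
    word = 0
    for j in range(1, nbits + 1):
        word = (word << 1) | line_level(frame, first + 4 * j, t0, bit_time)
    return word


for t0 in (10.0, 10.3, 11.0, 11.9, 12.5):
    print(t0, hex(receive(make_frame(0xA5C3), t0)))
```

Because every sampling point stays at least one half-cycle away from a bit boundary, the word is received correctly for any arrival phase, and a small frequency offset between the two crystals does not matter over one 17-bit frame.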


Schematics of the Two-Cycle Serial IO

[Schematic: two-cycle serial IO. Inputs: CK200, CK100, DD[15..0], DRDY, SDIN, DV. Outputs: QQ[15..0], SDOUT, POPCMD, QQOK. The logic is built from two left-shift registers (lpm_shiftreg), two counters (lpm_counter, one with modulus 36 and one with synchronous set to 32) and DFF registers clocked by CK200 and its inverse CK200N. A timing diagram shows SDIN, SEQ counting from 32 to 43, SDINQ, and QQ collecting SD15, SD15,14, SD15..13, SD15..12, with ENAS=SEQ[0].]

CLK: 200MHz

Data Rate: 100Mbits/s


The FM coding

A bit is transmitted in two unit time intervals, usually two internal clock cycles at frequency f.

For bit=1, the output toggles every cycle, i.e., with frequency (f/2); for bit=0, the output toggles every two cycles, i.e., with frequency (f/4).

When not transmitting data, the output toggles at frequency (f/4), until the start bit is seen.

The data stream is naturally DC balanced, which makes it suitable for AC coupled transmission.

The polarity of the interconnection doesn’t matter.

[Figure: FM-coded waveform for the bit sequence 0, start bit = 1, 0, 0, 1, 1]
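The coding rule can be written down directly. The sketch below is a minimal software model with hypothetical function names; it assumes the decoder already knows where the bit boundaries are, whereas a real decoder has to find them from the idle pattern and the start bit.

```python
def fm_encode(bits, level=0):
    """Two unit intervals per bit.

    The output always toggles at the start of a bit; for bit = 1 it toggles
    again in the middle of the bit.
    """
    out = []
    for b in bits:
        level ^= 1          # toggle at every bit boundary
        out.append(level)
        if b:
            level ^= 1      # extra mid-bit toggle encodes a 1
        out.append(level)
    return out


def fm_decode(wave):
    """A bit is 1 if the two halves of the bit cell differ, else 0."""
    return [wave[i] ^ wave[i + 1] for i in range(0, len(wave) - 1, 2)]


bits = [0, 1, 0, 0, 1, 1]            # idle 0, start bit = 1, then data
wave = fm_encode(bits)
inverted = [1 - w for w in wave]     # swapped polarity of the interconnection
print(wave)
print(fm_decode(wave), fm_decode(inverted))
```

Since only the presence of a mid-bit toggle carries information, decoding the inverted waveform gives the same bits, which is the polarity independence stated above.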


Schematics of FM Decoder

[Schematic: FM decoder. Inputs: CK212, INA. Outputs: DV, DQ[17..0], PQ. Input DFFs and an XOR gate generate INATOG; counters (lpm_counter, with synchronous set values 8 and 360) produce TOGCNT[3..0], BTK[2..0] and BitCNT[4..0]; decoders (lpm_decode) generate CNTSHFT and OKSample, which shifts the data DD into DQ[17..0], PQ. A timing diagram shows INAQ, INATOG, TOGCNT[2..0], SSETFCNT, BTK, CNTSHFT, OKSample, BitCNT (13, 14, ... 31), DV and the captured bits DQ[17], DQ[16], ... DQ[0], PQ.]

Logic 0: INA: 13.25MHz or 8xCK212

BitCNT: 13..31, Init to 13x8+256=360

CLK: 212MHz

Data Rate: 26.5Mbits/s

The ratio 8 CLK cycles/bit in this design is not an intrinsic limit.


The Clock-Command Combined Carrier Coding (C5)

[Plot: C5 pulse trains; vertical axis 0 to 16, horizontal axis -1 to 6]

A data train contains 5 pulses and each pulse is transmitted in four unit time intervals, usually four internal clock cycles at frequency f.

Information is carried with wide, normal and narrow pulses, and the first pulse is always wide or narrow.

When not transmitting data, all pulses have normal width.

The data stream is DC balanced over 5 pulses, which makes it suitable for AC coupled transmission.

All leading edges are evenly spaced so that the pulse train can be used to directly drive the receiver side logic or PLL.


Schematics of C5 Decoder

[Schematic: C5 decoder. The Delay block takes CC and C40 and produces T38 and T58. The Composer block takes CC, T38 and T58 and produces Y[0..4], CmdValid and CmdBit[3..0], using DFF registers, NAND2 gates, an lpm_dff register and a modulus 5 counter. An altpll (Cyclone, inclk0 period 36.000 ns, Normal mode; c0 ratio 4/1, e0 ratio 1/1, both 0.00 degree phase and 50% duty cycle) generates the internal clock from CC.]

Data Rate: 36ns/bit or 27.7Mbits/s

Internal clock: 111MHz


Outline

Counting:

Example: LED brightness and DAC

Simple Sequencing

Bandwidth and Noise Issues:

General Remarks on Sampling Theorem and Dithering.

Example: Huffman Coding

Example: Decimation & Dynamic Decimation

After-fact Calibration:

Several Topics on FPGA Based TDC

Serial Communication with Independent Crystals

Minimum Synchronization


Fixed Latency Everywhere?

In a classical trigger system, all cables must have fixed propagation delay.

Serial links intrinsically do not have fixed latency.

Do we need fixed latency at all? No.

[Figure: two trigger systems, each with several FrontEnd modules feeding a Trigger. In one of them the links use serializers (SER) and deserializers (DESER). A Timing Reference is shown with a question mark.]


Hit Time Coding and Transmitting

Hits in each channel are coded as bits representing small time intervals.

Bit patterns are merged in a front-end module.

[Figure: Detector and Processing Board. Labels: Hit, 5ns, 40ns, CLK&CMD. Rows of bit patterns such as 0 1 0 0 0 0 0 0, with a 1 marking the time interval of a hit.]
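The coding and merging steps can be sketched as follows, assuming the 5 ns intervals and 40 ns words suggested by the figure labels, i.e. 8 bits per word. The bit ordering (bit 0 is the earliest interval) and the function names are assumptions made for this illustration.

```python
BIT_NS = 5                         # time interval represented by one bit
WORD_NS = 40                       # time span of one transmitted word
BITS_PER_WORD = WORD_NS // BIT_NS  # 8 bits per word


def encode_hits(hit_times_ns, n_words):
    """Code the hits of one channel as a list of 8-bit words."""
    words = [0] * n_words
    for t in hit_times_ns:
        slot = int(t // BIT_NS)                # index of the 5 ns interval
        w, b = divmod(slot, BITS_PER_WORD)     # word number, bit in the word
        if 0 <= w < n_words:
            words[w] |= 1 << b
    return words


def merge(*channels):
    """Merge the bit patterns of several channels with a bitwise OR."""
    merged = []
    for words in zip(*channels):
        value = 0
        for word in words:
            value |= word
        merged.append(value)
    return merged


ch_a = encode_hits([7, 52], n_words=2)   # hits at 7 ns and 52 ns
ch_b = encode_hits([3, 51], n_words=2)   # hits at 3 ns and 51 ns
print(ch_a, ch_b, merge(ch_a, ch_b))
```

After merging, two hits that fall into the same 5 ns interval are no longer distinguishable, which is acceptable when only the hit time matters.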


Cable Delay Self Timing

At system initialization, all the Detector Processing Boards send out a special word in the same clock cycle as a start mark.

At the receiving end, the absolute arrival time from each board can be unknown and different. However, the start mark is recognized and stored at address 0 of the corresponding receiving buffer. The words after the start mark are stored in sequence.

[Figure: three Detector Processing Boards connected to a Processing Support Board]
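The self-timing scheme amounts to aligning each stream on its start mark rather than on its arrival time. A minimal sketch, in which the start mark value `START_MARK`, the idle word 0 and the function name are hypothetical:

```python
START_MARK = 0xFFFF  # hypothetical special word sent at initialization


def fill_receive_buffer(link_words, depth=8):
    """Store the start mark at address 0 and the following words in sequence.

    Words that arrive before the start mark are ignored, so the cable delay
    of the link does not affect the buffer addresses.
    """
    buffer = []
    for word in link_words:
        if not buffer and word != START_MARK:
            continue  # still waiting for the start mark
        if len(buffer) < depth:
            buffer.append(word)
    return buffer


# Two boards send in the same clock cycle but arrive with different delays.
link_a = [0, 0, START_MARK, 11, 12, 13]
link_b = [0, 0, 0, 0, 0, START_MARK, 21, 22, 23]
buf_a = fill_receive_buffer(link_a)
buf_b = fill_receive_buffer(link_b)
print(list(zip(buf_a, buf_b)))
```

Words at the same buffer address were sent in the same clock cycle, so downstream logic can combine them without knowing the cable lengths.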


An Example

[Figure: two captured data streams, each showing an Initial Marker followed by Data]


Hit Merging and Coincidence

Hits from different inputs in the Processing Support Board are merged together with an OR function and sent out as a serial data stream.

The Coincidence Module re-aligns the different streams in the receiver buffers.

Inside the Coincidence Module, the coincidence is searched as AND functions of the hit streams from opposite detector sectors.

Very likely, a boundary coverage logic is applied, e.g.: Trigger T[N] = HA[N] && (HB[N] || HC[N]).

Boundary coverage in the time domain is also necessary. This is satisfied by checking adjacent bits in the buffered words, e.g.: Trigger T[N] = (HA[N+1] || HA[N] || HA[N-1]) && (HB[N] || HC[N]).

[Figure: two Processing Support Boards feeding a Coincidence Module]
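The trigger expression with time-domain boundary coverage can be evaluated bit by bit on the re-aligned streams. In this sketch the streams are plain lists of 0 and 1, bits outside the buffer are treated as 0, and the function name is hypothetical.

```python
def trigger(ha, hb, hc):
    """T[N] = (HA[N+1] || HA[N] || HA[N-1]) && (HB[N] || HC[N])."""
    n = len(ha)

    def at(stream, i):
        # Bits outside the buffered range are taken as 0 (assumption).
        return stream[i] if 0 <= i < n else 0

    return [
        (at(ha, i + 1) | at(ha, i) | at(ha, i - 1)) & (at(hb, i) | at(hc, i))
        for i in range(n)
    ]


ha = [0, 1, 0, 0, 0]   # hit in sector A at time bin 1
hb = [0, 0, 1, 0, 0]   # hit in opposite sector B, one time bin later
hc = [0, 0, 0, 0, 1]   # hit in neighboring sector C, too late to match
print(trigger(ha, hb, hc))
```

The hit in B is accepted although it is one time bin away from the hit in A, while the hit in C is rejected because no hit in A lies within one bin of it.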


Post-Scripts

Some Extra Words for the Young & Old


About FPGA: Myths & Thinking

We commonly hear about FPGA:

FPGA is cheap. (FPGA: $16-$1500, Micro-Processor: $100-$500.)

FPGA is fast. (FPGA: 500MHz, Micro-Processor: 1-3GHz.)

FPGA is large. (FPGA logic consumes more transistors.)

FPGA can do anything. (Only if the information is collected in FPGA.)

Not really. At least it is not always the case.

Good design tricks are needed in order to take full advantage of FPGA devices and to avoid their drawbacks.


Moore’s Law

Number of transistors in a package:

x2 /18months

Taken from www.intel.com


Status of Moore’s Law: an Inconvenient Truth

# of transistors Yes, via multi-core.

Clock Speed ?

Taken from www.intel.com


Complexity in FPGA Designs

Excessive Complexity in FPGA Designs

= Fevers of Moore’s Law + Myths + No Thinking

Complexity causes higher FPGA cost.

Complexity creates indirect costs such as PCB layout, assembly, power consumption, cooling etc.

Complexity confuses people, including designers.


Indirect Cost of Complexity

If something like this can do the job…

… why do these?


The Winning Line of FPGA Design

We commonly hear:

FPGA devices contain millions of gates.

High parallelism can be implemented in FPGA.

FPGA cost drops by half every 18 months.

We want to emphasize, especially to our young students:

1. Creativity,

2. Creativity,

3. Creativity, on Arithmetic ops, on Algorithms, on Architectures & on All Aspects.

O Freunde, nicht diese Töne!


The End

Thanks