A Bus Architecture for Crosstalk Elimination in High Performance Processor Design
Wen-Wen Hsieh
Advisor: Ting Ting Hwang
Outline
Introduction
Motivation and Observation
The Proposed Bus Architecture
Experimental Results
Conclusion
Introduction
Crosstalk is the effect due to coupling capacitances.
Crosstalk causes additional delay, extra power consumption, and incorrect results in a circuit.
The crosstalk effect becomes much more serious in long on-chip buses.
Crosstalk Type
Crosstalk is classified into 4 types [Duan2001]: 1C, 2C, 3C, and 4C.
[Figure: four panels (1C, 2C, 3C, 4C), each showing the transitions on adjacent wires W(j-1), W(j), and W(j+1) between cycles T(i-1) and T(i).]
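The slides name the crosstalk types without spelling out the rule. A minimal sketch, assuming the commonly used coupling model in which the effective coupling seen by a switching wire is the sum of |d(j) - d(neighbor)| over its two neighbors, where d is the transition (-1, 0, or +1) between two cycles. The function names are illustrative and do not come from the slides.

```python
def coupling_class(prev, cur, j):
    """Coupling factor (0..4, in units of C) seen by wire j.

    prev and cur are lists of 0/1 bus values at cycles T(i-1) and T(i).
    Assumption: a wire that does not switch is reported as 0, because
    only delay on switching wires is of interest here.
    """
    d = cur[j] - prev[j]                 # -1 falling, 0 quiet, +1 rising
    if d == 0:
        return 0
    total = 0
    for k in (j - 1, j + 1):             # left and right neighbors
        if 0 <= k < len(prev):           # edge wires have one neighbor
            total += abs(d - (cur[k] - prev[k]))
    return total


def worst_class(prev, cur):
    """Largest coupling factor over all wires of the bus."""
    return max(coupling_class(prev, cur, j) for j in range(len(prev)))


print(worst_class([0, 1, 0], [1, 0, 1]))   # middle wire opposes both neighbors
print(worst_class([0, 0, 0], [1, 1, 1]))   # all wires switch together
```

Under this model a result of 3 or 4 corresponds to the 3C and 4C cases that the proposed bus is meant to avoid.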
Delay with / without Crosstalk
Delay comparison for bus length 10mm in 100 nm process [Duan2001]
[Figure: bar chart of delay, Time (ps) on an axis from 0 to 1000, for crosstalk types 0C, 1C, 2C, 3C, and 4C.]
Bit Ratio of 3C and 4C
3C and 4C types of crosstalk cause a serious delay penalty but account for only a small portion of the total transmitted data.

benchmark      bits of instruction   bits of 3C and 4C   ratio of 3C and 4C (%)
multiply       180736                6430                3.6
update         576480                20256               3.5
convolution    168192                5914                3.5
dot_product    108256                4070                3.8
fir2dim        195296                7500                3.8
fir            134016                5048                3.8
irr_nsection   301440                10698               3.5
matrix         107424                3983                3.6
lms            2036064               73427               3.6
Fetch Rate and Commit Rate
In a superscalar architecture, the instruction fetch rate is much higher than the instruction commit rate in bus transmission.
[Figure: bar chart of commit rate (0% to 100%) for multiply, dot_product, update, matrix, convolution, irr_nsection, fir, lms, and fir2dim; average 36.03%.]
Basic Architecture
[Figure: block diagram with Memory, de-assembler, bus, assembler, Prefetch unit, and Processor; signal widths are labeled m, b, and b+n.]
bus width = 128, channel number = 4, channel size = 32
Bus Structure
[Figure: Memory connected to the Prefetch unit by a bus divided into channel1, channel2, channel3, and channel4, carrying data(T, 1), data(T, 2), data(T, 3), and data(T, 4).]
An Example at Cycle t
Data sent at cycle t-1 are recorded.
[Figure: the bus between Memory and the Prefetch unit at cycle t. Channel1 to channel4 hold data(t-1, 1) to data(t-1, 4) from the previous cycle. Each pending segment data(t, 1) to data(t, 4) is checked for crosstalk against a channel; the checks are labeled "crosstalk" or "no crosstalk", and the channel contents shown at the end are NOP, data(t, 1), NOP, data(t, 2).]
bus width = 128, channel number = 4, channel size = 32
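The example can be read as a per-channel decision: send the next pending segment if it causes no 3C or 4C crosstalk against what the channel carried in the previous cycle, otherwise send a NOP. The sketch below is one possible reading, not the authors' logic. It assumes that segments are sent in order, that a NOP leaves the channel quiet, and that crosstalk across channel boundaries is handled separately by the separation bits. All names are hypothetical.

```python
def coupling(prev, cur, j):
    """Coupling factor on wire j (assumed model: sum of |d_j - d_neighbor|)."""
    d = cur[j] - prev[j]
    if d == 0:
        return 0
    return sum(abs(d - (cur[k] - prev[k]))
               for k in (j - 1, j + 1) if 0 <= k < len(prev))


def has_3c_or_4c(prev, cur):
    """True if any wire inside the channel would see 3C or 4C coupling."""
    return any(coupling(prev, cur, j) >= 3 for j in range(len(prev)))


def schedule_cycle(prev_channels, pending):
    """Decide what each channel sends at cycle t.

    prev_channels: bit lists recorded at cycle t-1, one per channel.
    pending: bit lists waiting to be sent, in program order.
    Returns (sent, remaining); sent[i] is a segment, or None for NOP.
    """
    queue = list(pending)
    sent = []
    for prev in prev_channels:
        if queue and not has_3c_or_4c(prev, queue[0]):
            sent.append(queue.pop(0))    # safe to place on this channel
        else:
            sent.append(None)            # NOP keeps the channel quiet
    return sent, queue


# Toy 3-bit channels instead of the 32-bit channels on the slide.
prev = [[0, 1, 0], [0, 0, 0], [1, 0, 1], [1, 1, 1]]
sent, rest = schedule_cycle(prev, [[1, 0, 1], [0, 1, 0]])
print(sent, rest)
```

With these toy values the first and third channels receive NOPs and the two segments land on the second and fourth channels, which mirrors the pattern of the slide's example.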
Separation Bits
Crosstalk elimination between adjacent data segments.
Distinguish data segment from NOP segment.
[Figure: data(t, i) and data(t, i+1) with separation bits between them; the annotations ask "data or NOP?" and mark the boundary "NO crosstalk".]
Crosstalk Free Connection
[Figure: data(t, i) and data(t, i+1) joined through separation bits, with unknown boundary bits marked X and ?.]
[Figure: graph over the eight 3-bit patterns 000, 001, 010, 011, 100, 101, 110, and 111, connecting the patterns that can follow one another without crosstalk.]
Crosstalk free cyclic: any pair of patterns in a crosstalk free cyclic incurs no crosstalk.
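The transcript loses the edges of the pattern graph, so they cannot be recovered here. As an illustration only, the sketch below enumerates which pairs of 3-bit patterns can follow one another without 3C or 4C coupling on the middle wire. It assumes the same coupling model as the usual 0C to 4C classification, and the threshold of 3 is an assumption. The resulting sets are not claimed to match the figure.

```python
from itertools import product

# All eight 3-bit patterns as strings, "000" through "111".
PATTERNS = ["".join(bits) for bits in product("01", repeat=3)]


def middle_coupling(a, b):
    """Coupling factor on the middle wire when pattern a is followed by b."""
    d = [int(y) - int(x) for x, y in zip(a, b)]   # per-wire transition
    if d[1] == 0:
        return 0                                  # middle wire is quiet
    return abs(d[1] - d[0]) + abs(d[1] - d[2])


def crosstalk_free(a, b):
    """Assumed criterion: no 3C or 4C coupling on the middle wire."""
    return middle_coupling(a, b) < 3


def compatible_with(pattern):
    """Patterns that may follow (or precede) the given pattern."""
    return [p for p in PATTERNS if crosstalk_free(pattern, p)]


for p in PATTERNS:
    print(p, compatible_with(p))
```

Because reversing a transition negates every d value, the relation is symmetric, so it can be drawn as an undirected graph like the one on the slide.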
Data Segment Combination
[Figure: the boundary-bit combinations between data(t, i) and data(t, i+1), shown separately for the case where data(t, i) is REAL DATA and the case where data(t, i) is NOP.]
Separation Bits Assignment
[Figure: the same combinations of data(t, i) and data(t, i+1), for data(t, i) being REAL DATA and for data(t, i) being NOP, each annotated with its assigned separation bits; the two assignments shown are "separation bits is 1 0" and "separation bits is 0 0".]
De-Assembler Architecture
[Figure: block diagram with a NOP input, four reg blocks, ten cross-detectors, Sel_logic, four MUX1 blocks, four MUX2 blocks, and a separation unit.]
Input data: data1 [127:96], data2 [95:64], data3 [63:32], data4 [31:0]
Output bus: channel1 [134:103], separation bits [102:101], channel2 [100:69], separation bits [68:67], channel3 [66:35], separation bits [34:33], channel4 [32:1], separation bits [0]
Assembler Architecture
[Figure: block diagram with DSel_logic controlling MUX1, MUX2, MUX3, and MUX4, which feed the Prefetch unit (buffer queue).]
Input bus: channel1 [134:103], separation bits [102:101], channel2 [100:69], separation bits [68:67], channel3 [66:35], separation bits [34:33], channel4 [32:1], separation bits [0]
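The bit ranges on the de-assembler and assembler slides describe a 135-bit bus word: four 32-bit channels interleaved with 2, 2, 2, and 1 separation bits. A minimal sketch of that layout follows, using Python integers as bus words. The function names are hypothetical, and the sketch covers only the field positions, not how the separation bits are chosen.

```python
# Least significant bit of each channel field; each field is 32 bits wide.
CHANNEL_LSB = [103, 69, 35, 1]                       # channel1 .. channel4
# (least significant bit, width) of each separation-bit field.
SEP_FIELDS = [(101, 2), (67, 2), (33, 2), (0, 1)]


def pack_bus(channels, seps):
    """Place four 32-bit channel values and the separation bits on the bus."""
    word = 0
    for value, lsb in zip(channels, CHANNEL_LSB):
        assert 0 <= value < (1 << 32)
        word |= value << lsb
    for value, (lsb, width) in zip(seps, SEP_FIELDS):
        assert 0 <= value < (1 << width)
        word |= value << lsb
    return word


def unpack_bus(word):
    """Recover the channel values and separation bits from a bus word."""
    channels = [(word >> lsb) & 0xFFFFFFFF for lsb in CHANNEL_LSB]
    seps = [(word >> lsb) & ((1 << width) - 1) for lsb, width in SEP_FIELDS]
    return channels, seps


word = pack_bus([0xDEADBEEF, 0, 0x12345678, 0xFFFFFFFF], [0b10, 0b00, 0b10, 0b1])
print(word.bit_length(), unpack_bus(word))
```

The fields tile bits 0 to 134 without gaps, which accounts for the 7 extra wires reported for a 128-bit bus with channel size 32.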
Performance Improvement

tech                     100nm                    70nm
bus length               10mm    15mm    20mm     10mm    15mm    20mm
0C                       1.00    1.00    1.00     1.00    1.00    1.00
1C                       1.94    1.89    1.73     1.61    1.57    1.74
2C                       5.91    6.08    5.21     4.28    4.49    4.84
3C                       6.64    7.14    6.62     5.11    6.39    7.58
4C                       7.57    8.50    7.66     5.87    8.04    9.86
deassembler              0.51    0.24    0.12     0.26    0.12    0.08
assembler                0.22    0.10    0.05     0.11    0.05    0.03
improvement ratio (%)    12.15   24.40   29.73    20.83   41.98   49.77
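The slides do not state how the improvement ratio is computed. One reading, offered only as an assumption, is that the worst-case delay drops from the 4C delay to the 2C delay plus the de-assembler and assembler delays. With the rounded table entries this reproduces the reported ratios to within about 0.15 percentage points, so the sketch below is a plausibility check rather than the authors' formula.

```python
# Values copied from the Performance Improvement table.
# Column order: 100nm at 10/15/20 mm, then 70nm at 10/15/20 mm.
DELAY_2C = [5.91, 6.08, 5.21, 4.28, 4.49, 4.84]
DELAY_4C = [7.57, 8.50, 7.66, 5.87, 8.04, 9.86]
DEASSEMBLER = [0.51, 0.24, 0.12, 0.26, 0.12, 0.08]
ASSEMBLER = [0.22, 0.10, 0.05, 0.11, 0.05, 0.03]
REPORTED = [12.15, 24.40, 29.73, 20.83, 41.98, 49.77]


def improvement_ratio(d2c, d4c, deasm, asm):
    """Assumed formula: (4C - (2C + deassembler + assembler)) / 4C, in percent."""
    new_worst = d2c + deasm + asm
    return 100.0 * (d4c - new_worst) / d4c


for cols in zip(DELAY_2C, DELAY_4C, DEASSEMBLER, ASSEMBLER, REPORTED):
    estimate = improvement_ratio(*cols[:4])
    print(f"estimated {estimate:6.2f}%   reported {cols[4]:6.2f}%")
```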
Extra Wires Number Comparison
The number of extra wires compared with Victor's work [Victor2001].

bus width   Ours, by channel size      Victor's      Victor's
            4     8     16    32       theoretical   practical
32          15    7     3     1        14            21
64          31    15    7     3        28            45
128         63    31    15    7        59            85
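The "Ours" column is consistent with the separation-bit layout shown for the 128-bit bus: two separation bits between adjacent channels and one after the last channel. The following sketch generalizes that count to the other configurations. The generalization is an assumption that happens to match every entry in the table, and the function name is illustrative.

```python
def extra_wires(bus_width, channel_size):
    """Separation wires: 2 between adjacent channels, 1 after the last."""
    channels = bus_width // channel_size
    return 2 * (channels - 1) + 1


for width in (32, 64, 128):
    print(width, [extra_wires(width, size) for size in (4, 8, 16, 32)])
```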
Cycle Count Overhead Ratio

benchmark           channel size
                    4        8        16       32
complex_multiply    0.17%    0.04%    0.09%    0.26%
complex_update      0.13%    0.04%    0.17%    0.50%
Convolution         0.13%    0.32%    0.06%    0.28%
dot_product         0.08%    0%       0.13%    0.21%
Fir2dim             0.03%    0.08%    0.06%    0.18%
Fir                 0.11%    0.01%    0.02%    0.08%
iir_Nsection        0.14%    0.11%    0.06%    0.37%
iir_1section        0.17%    0.08%    0.13%    0.43%
Lms                 0.09%    0.07%    0.12%    0.15%
Matrix              0.01%    0.01%    0.06%    0.02%
Matrix1x3           0.14%    0.07%    0.11%    0.18%
n_complex_update    0.05%    0%       0.07%    0.21%
n_real_update       0.08%    0.02%    0.08%    0.28%
real_update         0.05%    0.88%    0.18%    0.39%
average             0.10%    0.12%    0.09%    0.25%
Conclusion
A novel bus structure to eliminate 3C and 4C crosstalk.
49.77% performance improvement ratio in the best case.
With only 7 extra wires as compared with 85 [Victor2001].
Appendix
The area overhead for 128-bit bus width with channel size 32

area type                                             Ours        Victor's
logic circuit
  deassembler / encoder   gate count                  9794        885
                          area (μm²)                  14792.97    2359.39
                          # storage element (bits)    128         0
  assembler / decoder     gate count                  879         1402
                          area (μm²)                  2053.85     3381.22
# extra wires (bits)                                  7           85
Appendix
The overall improvement on bus transmission.
[Figure: chart of the improvement rate, on an axis from 0 to 2.5, for the 0.1 μm and 0.07 μm processes.]