Fully LTE-A compliant Turbo Decoder for Base Station Applications
Dipl.-Ing. Stefan Weithoffer, weithoffer@eit.uni-kl.de
4.11.14, Brest
Evolution
Demand for high throughput
How to Increase Throughput
Use multiple slow decoders
• PRO: Easy to implement
• CON: Low efficiency, large memory
• CON: Large latency
Use monolithic high-speed decoder
• PRO: Higher efficiency
• PRO: Lower latency
• CON: Challenging architecture due to iterative decoding
State-of-the-Art Turbo-Code Decoders
• P MAP decoders in parallel
• LTE: conflict-free interleaver up to a parallelism of 64
• Subblock size B/P
• Windowing inside the MAP to reduce memory (sliding window of size WL)
• Acquisition necessary
• Soft decoder is inherently serial: serial forward/backward recursion on the complete block
• Challenge: parallelize MAP decoding
[Figure: MAP 1 ... MAP P process subblocks 1 ... P and write to / read from the interleaver/deinterleaver network; a block of size B is split into subblocks of size B/P and windows of size WL]
$L_{MAP} = L_{ACQ} + L_{WL} + L_{pipeline}$
$T_P = \frac{B}{n_{iter\_half} \cdot \left(\frac{B}{P} + L_{MAP}\right)} \cdot f$
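The conflict-free property can be illustrated with a short sketch. It builds the LTE quadratic permutation polynomial (QPP) interleaver pi(i) = (f1*i + f2*i^2) mod K and checks that P MAP decoders, each working on its own subblock of size B/P, never address the same memory bank in the same step. The parameters f1 = 263 and f2 = 480 for K = 6144 are the values listed in 3GPP TS 36.212 and are not taken from the slides; the function names are hypothetical.

```python
def qpp_interleaver(K, f1, f2):
    """Return the QPP permutation pi(i) = (f1*i + f2*i^2) mod K as a list."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


def is_conflict_free(pi, P):
    """Check that P parallel MAP decoders never hit the same bank in one step.

    Decoder t processes position i + t*W (W = K/P) in step i. The interleaved
    address pi[i + t*W] is assumed to live in memory bank pi[i + t*W] // W.
    """
    K = len(pi)
    if K % P != 0:
        raise ValueError("P must divide the block size")
    W = K // P  # subblock size B/P
    for i in range(W):
        banks = {pi[i + t * W] // W for t in range(P)}
        if len(banks) != P:  # two decoders addressed the same bank
            return False
    return True


if __name__ == "__main__":
    K = 6144  # largest LTE code block size
    pi = qpp_interleaver(K, f1=263, f2=480)
    for P in (1, 2, 4, 8, 16, 32, 64):
        print(f"P={P:2d} conflict-free: {is_conflict_free(pi, P)}")
```

The bank mapping (address divided by subblock size) is one common choice; other memory partitionings would need a different check.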
Turbo-Code Decoder
High throughput (high code rates): large P
• Communications performance decreases for high code rates with small L_ACQ
• Increase L_ACQ to counterbalance the decrease in communications performance
• L_MAP dominates: saturation in throughput
Smaller P: radix-2 to radix-4, only P/2 needed for the same throughput
Smaller L_ACQ: next iteration initialization (NII), L_ACQ = 0
Smaller L_WL: no windowing inside the MAP, L_WL = 0
• Improves communications performance
• But a second LLR unit is mandatory, and memory increases
• Re-computation: only every n-th state metric is stored; an additional state metric unit recalculates the other n-1 metrics. Optimum n = [expression in B and P, not recoverable from the transcript]
• E.g. LTE: reduces memory storage from B = 6144 to 768 state metrics
Reference: E. Boutillon, J.-L. Sanchez-Rojas, C. Marchand, "Simplified Compression of Redundancy Free Trellis Sections in Turbo Decoder," IEEE Communications Letters, vol. 18, no. 6, pp. 941-944, June 2014.
$L_{MAP} = L_{ACQ} + L_{WL} + L_{pipeline}$
$T_P = \frac{B}{n_{iter\_half} \cdot \left(\frac{B}{P} + L_{MAP}\right)} \cdot f$
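The re-computation idea can be sketched on a toy recursion: only every n-th state metric is kept, and the n-1 metrics in between are regenerated from the nearest stored one when they are needed. The update function `step` is a stand-in and not a real MAP state metric recursion; all names are hypothetical.

```python
def forward_full(alpha0, gammas, step):
    """Reference: run the recursion and keep every state metric (steps 0..B-1)."""
    alphas = [alpha0]
    for g in gammas:
        alphas.append(step(alphas[-1], g))
    return alphas[:-1]


def forward_checkpoints(alpha0, gammas, step, n):
    """Run the same recursion but store only every n-th metric."""
    checkpoints = []
    alpha = alpha0
    for k, g in enumerate(gammas):
        if k % n == 0:
            checkpoints.append(alpha)
        alpha = step(alpha, g)
    return checkpoints


def recompute_segment(checkpoints, gammas, step, n, seg):
    """Regenerate the metrics of segment `seg` from its stored checkpoint."""
    start = seg * n
    end = min(start + n, len(gammas))
    alpha = checkpoints[seg]
    out = [alpha]
    for k in range(start, end - 1):
        alpha = step(alpha, gammas[k])  # work of the additional state metric unit
        out.append(alpha)
    return out


if __name__ == "__main__":
    def toy_step(a, g):
        return (3 * a + g) % 251  # toy update, not a trellis recursion

    gammas = [(7 * k) % 13 for k in range(48)]
    n = 8
    cps = forward_checkpoints(0, gammas, toy_step, n)
    rebuilt = []
    for seg in range(len(cps)):
        rebuilt.extend(recompute_segment(cps, gammas, toy_step, n, seg))
    assert rebuilt == forward_full(0, gammas, toy_step)
    print(len(cps), "metrics stored instead of", len(gammas))
```

The trade-off is visible in the sketch: storage shrinks by roughly a factor of n, while each segment costs up to n-1 extra update steps.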
Throughput dependent on Parallelism
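A small numeric sketch shows why throughput saturates as P grows. It assumes the throughput model T_P = B * f / (n_iter_half * (B/P + L_MAP)), which is how the formula on the preceding slides is read here. The value used for L_MAP is an illustrative placeholder and not a figure from the talk.

```python
def throughput(B, P, f_hz, n_half_iter, L_map):
    """Throughput in bit/s: B bits every n_half_iter * (B/P + L_map) clock cycles."""
    cycles = n_half_iter * (B / P + L_map)
    return B * f_hz / cycles


if __name__ == "__main__":
    B = 6144          # largest LTE code block size
    f_hz = 700e6      # assumed clock frequency
    n_half_iter = 14  # assumed: 7 full iterations
    L_map = 100       # assumed MAP latency in cycles (placeholder)
    limit = B * f_hz / (n_half_iter * L_map)  # bound as P grows without limit
    for P in (1, 2, 4, 8, 16, 32, 64):
        t = throughput(B, P, f_hz, n_half_iter, L_map)
        print(f"P={P:2d}: {t / 1e6:8.1f} Mbit/s ({t / limit:.0%} of the limit)")
```

Once B/P becomes comparable to L_MAP, doubling P no longer doubles the throughput, which is the saturation effect named on the slides.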
LTE Turbo-Code Decoder
High throughput: iteration control
• Various iteration control schemes are known
• LTE comes with a CRC attached to the code blocks
• If the CRC on the decoded code block succeeds: stop decoding
• For high code rates the decoding process can oscillate
• CRC after every half iteration is mandatory
Reference: M. May, T. Ilnseher, N. Wehn, W. Raab, "A 150Mbit/s 3GPP LTE Turbo Code Decoder," in Proc. IEEE Conference Design, Automation and Test in Europe (DATE), pp. 1420-1425, March 2010, Dresden, Germany.
$L_{MAP} = L_{ACQ} + L_{WL} + L_{pipeline}$
$T_P = \frac{B}{n_{iter\_half} \cdot \left(\frac{B}{P} + L_{MAP}\right)} \cdot f$
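CRC-based iteration control can be sketched as a loop that checks the CRC after every half iteration and stops at the first success. Both callables are hypothetical: `half_iteration` stands for one MAP pass and is assumed to return hard decisions in linear order, and `crc_ok` stands for the code block CRC check.

```python
def decode_with_crc_stop(half_iteration, crc_ok, llrs, max_iter=7):
    """Run half iterations until the CRC passes or max_iter is reached.

    half_iteration(llrs, interleaved) -> (llrs, hard_bits) is a hypothetical
    callable wrapping one MAP pass; crc_ok(hard_bits) -> bool checks the CRC.
    Returns (hard_bits, half_iterations_used, crc_passed).
    """
    hard_bits = None
    for half in range(2 * max_iter):
        interleaved = (half % 2 == 1)  # odd half iterations work on interleaved data
        llrs, hard_bits = half_iteration(llrs, interleaved)
        if crc_ok(hard_bits):  # checked after EVERY half iteration
            return hard_bits, half + 1, True
    return hard_bits, 2 * max_iter, False
```

Checking after every half iteration matters when decoding oscillates: a correct result that appears after an odd half iteration would otherwise be overwritten before the next check.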
Cyclic Redundancy Check
Example: 8-bit CRC (polynomial x^8 + x^2 + x + 1)
The CRC of a code block can be evaluated very efficiently if the data bits are in linear order
• OK for non-interleaved half iterations
• BUT: what about interleaved half iterations?
[Figure label: CRC OK]
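For bits in linear order, the 8-bit example CRC can be computed with a bit-serial shift register, one bit per step. The sketch below assumes a zero initial register and no final XOR, which the slides do not specify; the function names are hypothetical. LTE itself uses 24-bit CRCs, so the 8-bit polynomial serves only as the example from the slide.

```python
POLY = 0x07  # x^8 + x^2 + x + 1 without the leading x^8 term


def crc8_bits(bits):
    """Bit-serial CRC-8 over bits given in linear order (first bit first)."""
    reg = 0
    for b in bits:
        feedback = ((reg >> 7) & 1) ^ (b & 1)
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= POLY
    return reg


def append_crc(bits):
    """Return the message followed by its 8 CRC bits (most significant first)."""
    crc = crc8_bits(bits)
    return list(bits) + [(crc >> i) & 1 for i in range(7, -1, -1)]


def crc_ok(codeword):
    """A codeword with a correct CRC leaves the register at zero."""
    return crc8_bits(codeword) == 0


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
    codeword = append_crc(message)
    print("CRC OK:", crc_ok(codeword))
    codeword[3] ^= 1  # a single bit error
    print("CRC OK after bit flip:", crc_ok(codeword))
```

The register update depends on the previous register state, which is why this form needs the bits in linear order.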
Iteration Control: CRC timing
• Additional latency
• Additional memory
• Two CRC units necessary
LTE Turbo-Code Decoder
Iteration control based on code block CRC
• Significant additional latency for interleaved half iterations
• Additional memory (B bit)
• Second CRC unit mandatory
On-the-fly CRC calculation
• Same (small) L_CRC for non-interleaved and interleaved half iterations
• No additional memory needed
• Easier I/O control
• No second CRC unit needed
$L_{MAP} = L_{ACQ} + L_{WL} + L_{pipeline}$
$T_P = \frac{B}{n_{iter\_half} \cdot \left(\frac{B}{P} + L_{MAP}\right) + L_{CRC}} \cdot f$
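One way to see that a CRC can be accumulated while bits arrive out of order is its linearity over GF(2): the CRC of a block is the XOR of the contributions of its individual 1 bits, and each contribution depends only on the bit position. The sketch below illustrates that general property with the 8-bit example polynomial, assuming a zero initial register. It is not necessarily the circuit used in the presented decoder, and the weight table is used only for readability; all names are hypothetical.

```python
POLY = 0x07  # x^8 + x^2 + x + 1 without the leading x^8 term


def crc8_bits(bits):
    """Reference: bit-serial CRC-8 over bits in linear order."""
    reg = 0
    for b in bits:
        feedback = ((reg >> 7) & 1) ^ (b & 1)
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= POLY
    return reg


def position_weights(n_bits):
    """weights[i] = CRC of a block whose only 1 bit is at linear position i."""
    weights = [0] * n_bits
    w = POLY  # contribution of the last bit: x^8 mod g(x)
    for i in range(n_bits - 1, -1, -1):
        weights[i] = w
        # multiply by x modulo g(x) to move one position towards the front
        w = ((w << 1) & 0xFF) ^ (POLY if w & 0x80 else 0)
    return weights


def crc8_any_order(indexed_bits, weights):
    """Accumulate the CRC from (position, bit) pairs arriving in any order."""
    acc = 0
    for pos, bit in indexed_bits:
        if bit:
            acc ^= weights[pos]
    return acc


if __name__ == "__main__":
    bits = [(5 * k * k + 3 * k) % 7 % 2 for k in range(40)]
    weights = position_weights(len(bits))
    order = [(7 * i) % 40 for i in range(40)]  # an arbitrary permutation
    scrambled = [(pos, bits[pos]) for pos in order]
    assert crc8_any_order(scrambled, weights) == crc8_bits(bits)
    print("order-independent CRC matches the linear-order CRC")
```

Because XOR is commutative, the accumulated value does not depend on the arrival order, so interleaved and non-interleaved half iterations can be treated the same way.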
1 Gbit/s LTE-A TC Decoder @ 28 nm
• Support for all LTE code block sizes
• On-the-fly CRC calculation
• Transport Block (TB) CRC calculation for up to 4 TBs
• Soft-output calculation
Footnote: w/o memories
Architecture Parameters
Radix / Parallelism: 4 / 16
Max. window length: 192
Max. iterations: 7
Acquisition: NII
Input quantization: 8
Technology: 28 nm
Area (post synthesis): ~0.65 mm^2
Throughput: 1000 Mbit/s
Frequency: 700 MHz
Thank you for your attention
For more information please visit
http://ems.eit.uni-kl.de
2.15 Gbit/s LTE TC Decoder @ 65 nm
Generic Decoder Structure
Parallelize block processing
• Turbo-Code decoder: soft decoder is inherently serial
• LDPC decoder: inherently parallel, since node processing is independent
Network (interleaver, Tanner graph): no locality
• Routing congestion, access conflicts
• Impact on communication standards (UMTS/HSPA versus LTE)
Most advanced channel codes: Turbo-Codes (HSPA, LTE), LDPC (DVB-S2)
Iterative decoding algorithm performed on the complete block, with interleaved data exchange between processing blocks
[Figure: Processing Block 1 and Processing Block 2 connected by the Network. Turbo: Softdecoder 1 (MAP), Softdecoder 2 (MAP). LDPC: Check_n1 ... Check_nN, Variable_n1 ... Variable_nN]
Turbo Decoder Design Space
LTE Turbo-Code Decoder
High Throughput iteration control
Various iteration control schemes known
LTE comes with CRC attached to code blocks
If CRC on the decoded code block succeeds stop decoding
For high code rates decoding process can oscillate
CRC after every half iteration mandatory
___ A 150Mbits 3GPP LTE Turbo Code Decoder - M May T Ilnseher N Wehn W Raab In Proc IEEE Conference Design Automation and Test in Europe (DATE) pages 1420-1425 March 2010 Dresden Germany
ACQWLpipelineMAP
iterhalfMAP
MAP LLLLfnLPB
BTP
)( _
Cyclic Redundancy Check
Example 8 bit CRC (Polynomial x8+x2+x+1)
The CRC of a code block can be very efficiently evaluated if the data bits are in linear order
Ok for non-interleaved half iterations
BUT
What about interleaved half iterations
9
CRC OK
Iteration Control CRC timing
10
Additional latency
Additional memory
Two CRC units necessary
LTE Turbo-Code Decoder
Iteration control based on code block CRC
Significant additional latency for interleaved half iterations
Additional memory (B-bit)
Second CRC unit mandatory
On-the-fly CRC calculation
Same (small) LCRC for non-interleaved and interleaved half iterations
No additional memory needed
Easier IO control
No second CRC unit needed
ACQWLpipelineMAP
CRCiterhalfMAP
MAP LLLLfLnLPB
BTP
))(( _
1 Gbits LTE-A TC Decoder28nm
bull Support for all LTE code block sizes
bull On-the-fly CRC calculation
bull Transport Block (TB) CRC calculation for
up to 4 TBs
bull Softout calculation
___
wo memories
12
Architecture Parameters
Radix / Parallelism: 4 / 16
Max. window length: 192
Max. iterations: 7
Acquisition: NII
Input quantization: 8
Technology: 28 nm
Area (post synthesis): ~0.65 mm^2
Throughput (Mbps): 1000
Frequency (MHz): 700
Thank you for your attention
For more information please visit
http://ems.eit.uni-kl.de
2.15 Gbit/s LTE TC Decoder @ 65 nm
Generic Decoder Structure
Parallelize block processing
Turbo-code decoder: soft decoder is inherently serial
LDPC decoder: inherently parallel, since node processing is independent
Network (interleaver, Tanner graph): no locality
Routing congestion, access conflicts
Impact on communication standards (UMTS/HSPA versus LTE)
[Figure: turbo decoder with Softdecoder 1 (MAP) and Softdecoder 2 (MAP); LDPC decoder with check nodes Check_n1 ... Check_nN and variable nodes Variable_n1 ... Variable_nN]
Most advanced channel codes: turbo codes (HSPA, LTE), LDPC (DVB-S2)
Iterative decoding algorithm performed on the complete block, with interleaved data exchange between processing blocks
[Figure: Processing Block 1 and Processing Block 2 connected through the Network]
Turbo Decoder Design Space