The Case for Optimum Detection Algorithms in MIMO Wireless Systems
Helmut Bolcskei
joint work with A. Burg, C. Studer, and M. Borgmann
ETH Zurich
Data rates in wireless double every 18 months
[Figure: throughput (log scale, 1 kbps to 1 Gbps) versus year (1990 to 2015). Plotted standards: GSM, EDGE, UMTS, HSDPA-1, HSDPA-2, 3GPP LTE, 802.11, 802.11b, 802.11g, 802.11n 2-stream, 802.11n 4-stream]
Need for higher throughput cannot be met by simply allocating more bandwidth
[Figure: 20 MHz and 40 MHz channels; label: Interference]
Achieving higher throughput requires higher spectral efficiency
Spatial multiplexing: Transmit multiple data streams simultaneously and in the same frequency band
MIMO gains carry through to system level
Advantages of MIMO
• Larger range
• Better quality of service
• Higher peak throughput
• Higher system capacity
[Figure: throughput [Mbps] versus range for 1 stream, 2 streams, and 4 streams, with two gaps annotated "2x". IEEE 802.11n PHY, 40 MHz bandwidth, TGn-C channel]
MIMO is part of IEEE 802.11n, IEEE 802.16e, and 3GPP LTE
The “Digital Home”: A challenging application for MIMO wireless systems
Ensure a wire-like experience throughout the entire home
Meeting user expectations requires 4 spatial streams
• Requirement: 4 HDTV video streams @ 25 Mbps each
• Aggregate throughput requirement: 100 Mbps at a range of 30 m
[Figure: range [m] versus application layer throughput [Mbps] (60% MAC efficiency) for 802.11g (SISO), 802.11n 2-stream, and 802.11n 4-stream, with the aggregate throughput requirement marked]
• Current IEEE 802.11n solutions support only 2 spatial streams
• Products with three spatial streams have just been announced
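As a quick check of the requirement on this slide, the aggregate rate is 4 × 25 Mbps = 100 Mbps at the application layer. The sketch below assumes that the 60% MAC efficiency quoted on the plot axis applies uniformly, which is a simplification; the function name is hypothetical. Under that assumption the PHY would have to deliver roughly 167 Mbps.

```python
def required_phy_rate_mbps(streams, rate_per_stream_mbps, mac_efficiency):
    """Return (aggregate application-layer rate, PHY rate needed), both in Mbps."""
    aggregate = streams * rate_per_stream_mbps
    # Assumption: application throughput = PHY rate * MAC efficiency
    return aggregate, aggregate / mac_efficiency


aggregate, phy = required_phy_rate_mbps(4, 25.0, 0.6)
print(f"aggregate: {aggregate:.0f} Mbps, PHY rate needed: {phy:.1f} Mbps")
```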
Maximum likelihood (ML) MIMO detection
[Figure: MIMO link; the transmitter performs modulation and mapping, the receiver performs demodulation and separation]
$y = Hs + n$
$\hat{s} = \arg\min_{s \in \mathcal{O}^{M_T}} \|y - Hs\|^2$
ML detection through exhaustive search
Exhaustive search: Enumerate all possible candidate vectors
• Number of candidate vectors grows exponentially in the number of antennas
• A 4×4 system with 64-QAM modulation requires consideration of 16'777'216 candidates
[Figure: size comparison of exhaustive-search ML detectors with a 4x4 IEEE 802.11n baseband ASIC [ETH Zurich, 2008]]
Block                            Size                 Gate count
4x4 IEEE 802.11n baseband ASIC   5 mm x 5 mm          1.7M GE
2x2 ML detector, 64-QAM          1.4 mm x 1.4 mm      0.3M GE
3x3 ML detector, 64-QAM          11.3 mm x 11.3 mm    20M GE
4x4 ML detector, 64-QAM          91 mm x 91 mm        1'300M GE
Exhaustive search is not economic for more than two spatial streams
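The exhaustive search described on this slide can be sketched in a few lines. This is a minimal illustration, not the detector on the ASIC: the function name, the 2x2 channel values, and the QPSK alphabet are assumptions chosen to keep the example small. The last line reproduces the candidate count quoted for the 4×4, 64-QAM case.

```python
from itertools import product


def ml_detect(H, y, alphabet):
    """Exhaustive-search ML detection: minimize ||y - Hs||^2 over alphabet^M_T."""
    m_r, m_t = len(H), len(H[0])
    best_s, best_metric = None, float("inf")
    for s in product(alphabet, repeat=m_t):  # |alphabet|^M_T candidates
        metric = sum(
            abs(y[i] - sum(H[i][j] * s[j] for j in range(m_t))) ** 2
            for i in range(m_r)
        )
        if metric < best_metric:
            best_s, best_metric = s, metric
    return best_s, best_metric


# Hypothetical 2x2 example with a QPSK alphabet (values chosen for illustration).
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
H = [[1.0 + 0.2j, 0.4 - 0.1j],
     [0.3 + 0.5j, 0.9 - 0.3j]]
s_true = (1 - 1j, -1 + 1j)
y = [sum(H[i][j] * s_true[j] for j in range(2)) for i in range(2)]  # noise-free

s_hat, metric = ml_detect(H, y, QPSK)
print(s_hat, metric)

# Candidate count for a 4x4 system with 64-QAM: 64^4
print(64 ** 4)
```

The loop body is cheap; the cost lies in the number of iterations, which is what makes the approach uneconomic beyond two streams.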
Soft-output (APP) MIMO detection
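[Figure: block diagram; MIMO channel followed by MIMO detector]
$y = Hs + n$
MIMO detector computes log-likelihood ratios (LLR) for each bit
$L(x_{j,b}) = \log\left(\frac{P(x_{j,b} = 1 \mid y)}{P(x_{j,b} = 0 \mid y)}\right)$
Max-log approximation for LLRs
$L(x_{j,b}) = \min_{s \in \mathcal{X}_{j,b}^{(0)}} \|y - Hs\|^2 - \min_{s \in \mathcal{X}_{j,b}^{(1)}} \|y - Hs\|^2$
$\mathcal{X}_{j,b}^{(0)}, \mathcal{X}_{j,b}^{(1)}$ ... sets of vector symbols for which $x_{j,b} = 0, 1$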
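The max-log expression on this slide can be evaluated directly by brute force for a small system. The sketch below assumes QPSK with a hypothetical bit-to-symbol mapping (bit 0 maps to +1, bit 1 to -1 on each axis) and, like the slide, omits any noise-variance scaling. All names and channel values are illustrative.

```python
from itertools import product


def qpsk(b0, b1):
    """Hypothetical mapping: bit 0 -> +1, bit 1 -> -1 on each axis."""
    return complex(1 - 2 * b0, 1 - 2 * b1)


def distance(H, y, s):
    """Squared Euclidean distance ||y - Hs||^2."""
    return sum(
        abs(y[i] - sum(H[i][j] * s[j] for j in range(len(s)))) ** 2
        for i in range(len(H))
    )


def max_log_llrs(H, y):
    """Max-log LLRs by exhaustive search (QPSK, 2 bits per stream)."""
    m_t = len(H[0])
    n_bits = 2 * m_t
    inf = float("inf")
    # best[k][v]: smallest distance among candidates whose k-th bit equals v
    best = [[inf, inf] for _ in range(n_bits)]
    for bits in product((0, 1), repeat=n_bits):
        s = [qpsk(bits[2 * j], bits[2 * j + 1]) for j in range(m_t)]
        d = distance(H, y, s)
        for k, v in enumerate(bits):
            if d < best[k][v]:
                best[k][v] = d
    # L = (min over X^(0)) - (min over X^(1)), matching the max-log expression
    return [b[0] - b[1] for b in best]


H = [[1.0 + 0.2j, 0.4 - 0.1j],
     [0.3 + 0.5j, 0.9 - 0.3j]]
tx_bits = (0, 1, 1, 0)
s_tx = [qpsk(tx_bits[0], tx_bits[1]), qpsk(tx_bits[2], tx_bits[3])]
y = [sum(H[i][j] * s_tx[j] for j in range(2)) for i in range(2)]  # noise-free
llrs = max_log_llrs(H, y)
print(llrs)
```

With this sign convention a positive LLR favors bit value 1, in line with the definition $L(x_{j,b}) = \log(P(x_{j,b}=1 \mid y)/P(x_{j,b}=0 \mid y))$.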
Linear equalization decomposes the MIMO channel into parallel SISO channels
[Figure: linear equalizer followed by one soft-metric detector per stream; LLRs are computed for each stream separately]
• Compared to the remaining baseband processing, complexity of equalization is very low even for a large number of streams
• Complexity of LLR computation is negligible
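To illustrate how a linear equalizer separates the streams, here is a minimal sketch of an MMSE equalizer for a 2x2 channel using the textbook form W = (H^H H + ρI)^{-1} H^H, where ρ is the noise-to-signal power ratio. The closed-form 2x2 inverse, the function names, the channel values, and the per-stream QPSK slicer are assumptions made for this example; setting ρ = 0 gives the zero-forcing equalizer.

```python
def mmse_equalize_2x2(H, y, noise_to_signal):
    """Apply W = (H^H H + noise_to_signal * I)^-1 H^H to y for a 2x2 channel."""
    # Hermitian transpose of H
    Hh = [[H[0][0].conjugate(), H[1][0].conjugate()],
          [H[0][1].conjugate(), H[1][1].conjugate()]]
    # Regularized Gram matrix G = H^H H + noise_to_signal * I
    G = [[sum(Hh[i][k] * H[k][j] for k in range(2))
          + (noise_to_signal if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    G_inv = [[G[1][1] / det, -G[0][1] / det],
             [-G[1][0] / det, G[0][0] / det]]
    # Matched-filter output z = H^H y, then per-stream estimates G^-1 z
    z = [sum(Hh[i][k] * y[k] for k in range(2)) for i in range(2)]
    return [sum(G_inv[i][k] * z[k] for k in range(2)) for i in range(2)]


def slice_qpsk(z):
    """Per-stream hard decision onto the nearest point of {+-1 +-1j}."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


H = [[1.0 + 0.2j, 0.4 - 0.1j],
     [0.3 + 0.5j, 0.9 - 0.3j]]
s_tx = [1 - 1j, -1 + 1j]
y = [sum(H[i][j] * s_tx[j] for j in range(2)) for i in range(2)]  # noise-free

zf_est = mmse_equalize_2x2(H, y, 0.0)       # zero-forcing special case
mmse_est = mmse_equalize_2x2(H, y, 0.05)    # regularized (slightly biased) estimates
decisions = [slice_qpsk(z) for z in mmse_est]
print(zf_est, mmse_est, decisions)
```

After equalization each stream is handled on its own, which is why the subsequent LLR computation is inexpensive.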
MMSE is ill-suited for highly integrated devices
• Mini-PCI and half-mini PCI are becoming the de facto standard
• Spacing of printed antennas can easily be below λ/4
• Reduced antenna spacing leads to (severe) spatial correlation
[Figure: card with two printed antennas (Antenna 1, Antenna 2); marked dimensions 18 mm and 54 mm]
[Figure: frame error rate versus received power [dBm] for soft-output MMSE and close-to-optimum APP. IEEE 802.11n, MCS 27, 40 MHz, TGn-D (MT = MR = 4, 16-QAM, rate 1/2)]
MMSE detection suffers significantly from spatial correlation
MMSE fails to provide robustness against varying propagation conditions
[Figure: signal power (dB) versus location]
[Figure: BER versus SNR [dB] for 4x4 MMSE, 4x5 MMSE, and 4x4 maximum likelihood; slopes annotated "MMSE diversity order" and "ML diversity order"]
Diversity: Resilience against bad channels ⇒ more reliable operation
The “business case” for high-end MIMO receivers
[Figure: throughput [Mbps] versus range [m] for 4x4 MMSE, 4x5 MMSE, and 4x4 APP; ranges of 30.4 m, 35.7 m, and 41.2 m are marked]
Additional receive antennas can partially compensate for sub-optimal receiver algorithms
• Each additional antenna costs 0.7 USD–1.0 USD
• Overall manufacturing chipset cost is ≈ 9 USD
• Space limitations can become critical (antenna spacing)
Boosting MMSE performance by using additional antennas is expensive and not always possible
Impact of RF non-idealities
RF limitations: SNR is limited to approximately 35 dB–40 dB
[Figure: mean SNR [dB] versus received power [dBm]; annotation: "SNR limited by poor RF noise figure"]
[Figure: frame error rate versus average received power [dBm] for soft-output MMSE and close-to-optimum APP. IEEE 802.11n, MCS 31 (600 Mbps), MT = MR = 4, Greenfield, 20 MHz bandwidth, 1000 B packets]
In IEEE 802.11n, APP detection is needed for operation in the highest rate modes
Performance of MMSE receiver is sensitive to interference
Consider a 4×5 MIMO system interfered by a single-stream system
• Information-theoretic arguments: Interference “knocks out” one receive antenna
• Reduction to an effective 4×4 system
MMSE detector
• Diversity is lost and robustness is reduced
Optimum APP detector
• Receiver performs well even with an effectively symmetric antenna configuration
• Graceful performance degradation
Sphere decoding: Exploiting the structure of the detection problem
[Figure: transmitter, MIMO channel, receiver]
$\hat{s} = \arg\min_{s \in \mathcal{O}^{M_T}} \|y - Hs\|^2$
The MIMO ML-detection problem corresponds to finding the closest point in a skewed, finite lattice
A brief history of the sphere decoding algorithm
• 1981: M. Pohst describes an algorithm to efficiently identify the closest point in an infinite lattice
• 1993: E. Viterbo and E. Biglieri apply the Pohst algorithm to lattice decoding and introduce the sphere constraint
• 1999: E. Viterbo and J. Boutros employ sphere decoding for lattice decoding in fading channels
• 2000: M. O. Damen et al. describe the application of sphere decoding to space-time codes
A brief history of the sphere decoding algorithm cont’d
• 2003: B. Hochwald and S. ten Brink propose the first soft-output sphere decoder
• 2005: A. Burg et al. provide the first VLSI implementation of hard-output sphere decoding
• 2008: C. Studer et al. develop single tree search soft-output sphere decoding and provide a corresponding VLSI implementation
Sphere decoding reduces to a tree-search problem
1 Translate the problem into a tree search (triangularization)
2 Nodes are associated with Partial Euclidean Distances (PEDs) $d(s)$
3 Update rule: $d_i(s^{(i)}) = d_{i+1}(s^{(i+1)}) + |e_i|^2$, $i = M_T, \ldots, 1$ (tree level)
4 ML detection corresponds to finding the leaf with the smallest PED
[Figure: search tree with nodes labeled by partial Euclidean distance]
A branch-and-bound strategy realized through a sphere constraint leads to efficient tree pruning
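The four steps on this slide can be sketched as a depth-first search with a shrinking sphere radius. This is an illustrative sketch, not the VLSI architecture from the talk: it assumes the triangularization has already been done, so the inputs are an upper-triangular matrix R and the rotated receive vector (for example from a QR decomposition H = QR with ỹ = Q^H y). Function names and numerical values are hypothetical, and no symbol ordering is applied within a level.

```python
from itertools import product


def sphere_decode(R, y_t, alphabet):
    """Depth-first tree search on an upper-triangular system y_t = R s + noise.

    Returns (best vector, its squared distance, number of visited nodes).
    """
    m = len(R)
    s = [0j] * m
    state = {"best_s": None, "radius": float("inf"), "visited": 0}

    def search(level, ped_parent):
        for symbol in alphabet:
            s[level] = symbol
            # e_i depends only on symbols already fixed at levels >= i
            e = y_t[level] - sum(R[level][j] * s[j] for j in range(level, m))
            ped = ped_parent + abs(e) ** 2  # d_i = d_{i+1} + |e_i|^2
            state["visited"] += 1
            if ped >= state["radius"]:
                continue  # sphere constraint violated: prune this subtree
            if level == 0:
                # Leaf reached: new best candidate, shrink the sphere radius
                state["best_s"], state["radius"] = list(s), ped
            else:
                search(level - 1, ped)

    search(m - 1, 0.0)
    return state["best_s"], state["radius"], state["visited"]


def brute_force(R, y_t, alphabet):
    """Reference: exhaustive minimum of ||y_t - R s||^2."""
    m = len(R)
    return min(
        sum(abs(y_t[i] - sum(R[i][j] * c[j] for j in range(m))) ** 2
            for i in range(m))
        for c in product(alphabet, repeat=m)
    )


QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[1.2, 0.3 - 0.4j, 0.1 + 0.2j],
     [0.0, 0.9, -0.2 + 0.5j],
     [0.0, 0.0, 0.7]]
s_tx = [1 - 1j, -1 - 1j, 1 + 1j]
noise = [0.05 - 0.02j, -0.03 + 0.04j, 0.01 + 0.06j]
y_t = [sum(R[i][j] * s_tx[j] for j in range(3)) + noise[i] for i in range(3)]

s_hat, dist, visited = sphere_decode(R, y_t, QPSK)
print(s_hat, dist, visited)
```

Because PEDs never decrease along a path, discarding a node whose PED already exceeds the current radius cannot remove the ML leaf; the full tree in this example has 4 + 16 + 64 = 84 nodes, which bounds the number of visited nodes.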
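Computing the LLRs by applying the sphere decoding algorithm
$L(x_{j,b}) = \underbrace{\min_{s \in \mathcal{X}_{j,b}^{(0)}} \|y - Hs\|^2}_{\lambda^{\mathrm{ML}}} - \underbrace{\min_{s \in \mathcal{X}_{j,b}^{(1)}} \|y - Hs\|^2}_{\lambda^{\overline{\mathrm{ML}}}_{j,b}}$ (shown for $x^{\mathrm{ML}}_{j,b} = 0$)
Repeated Tree Search (RTS) [Wang and Giannakis, 2004]
1 Use the sphere decoding algorithm to find $\lambda^{\mathrm{ML}}$
2 Restart the search to identify the $Q M_T$ remaining minima and constrain the search to $\mathcal{X}_{j,b}^{(\overline{x^{\mathrm{ML}}_{j,b}})}$ by operating on pre-pruned trees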
The single tree search (STS) philosophy [Studer et al., 2006]
Repeated tree search is highly inefficient
• For example, a 4-stream system employing 64-QAM modulation requires 24+1 sphere decoder runs
• A given node may be visited more than once in consecutive runs
STS algorithm: Ensure that each node is visited at most once
• Search for the ML solution and all counterhypotheses concurrently
• Maintain a list containing
  - the ML hypothesis $x^{\mathrm{ML}}$ and its metric $\lambda^{\mathrm{ML}}$
  - the metrics of the counterhypotheses $\lambda^{\overline{\mathrm{ML}}}_{j,b}$
• Search a subtree only if the result can lead to an update of either $\lambda^{\mathrm{ML}}$ or of at least one of the metrics $\lambda^{\overline{\mathrm{ML}}}_{j,b}$
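The "24+1" figure quoted for repeated tree search follows from one run for the ML solution plus one run per transmitted bit, i.e. Q·M_T + 1 runs with Q bits per symbol. A minimal sketch of that count (the function name is hypothetical, and the constellation size is assumed to be a power of two):

```python
import math


def rts_runs(constellation_size, streams):
    """Sphere-decoder runs for repeated tree search: Q * M_T + 1."""
    bits_per_symbol = int(math.log2(constellation_size))  # Q
    return bits_per_symbol * streams + 1


print(rts_runs(64, 4))  # 6 bits x 4 streams + 1 = 25
```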
VLSI implementation of the STS algorithm [Studer et al., 2008]
                 Hard-output     STS
Technology       0.25 µm, 1P/5M
System           4×4, 16-QAM
Decoding norm    ℓ∞              ℓ2
Clock freq.      87 MHz          71 MHz
Area             36 kGE          57 kGE
MHz/kGE          2.41            1.25
Hardware complexity of STS is only 30% of that of RTS based on hard-output sphere decoding
LLR clipping reduces complexity and provides scalability
In practice, wordwidth of LLRs must be constrained
[Figure: LLR clipping]
LLR clipping can be built into the STS algorithm ⇒ additional constraint for pruning the tree
LLR clipping allows a performance/complexity tradeoff to be realized at run-time
[Figure: average number of visited nodes versus required SNR [dB] for 1% FER; curves for STS and for the list sphere decoder [Hochwald and ten Brink, 2003], with points labeled by parameter values]
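A minimal sketch of LLR clipping, assuming a symmetric limit ±L_max (called `l_max` here, a hypothetical name). The second helper illustrates one way the limit can act as a pruning constraint in a max-log setting: bounding a counterhypothesis metric by λ^ML + L_max keeps the resulting LLR magnitude within the limit. This is an illustration under those assumptions, not the exact rule used in the STS implementation.

```python
def clip_llr(llr, l_max):
    """Limit an LLR to the interval [-l_max, +l_max]."""
    return max(-l_max, min(l_max, llr))


def clip_counter_metric(lambda_ml, lambda_counter, l_max):
    """Bound a counterhypothesis metric so that |LLR| cannot exceed l_max."""
    return min(lambda_counter, lambda_ml + l_max)


llrs = [-7.3, -0.4, 0.0, 2.5, 12.0]
clipped = [clip_llr(v, 4.0) for v in llrs]
print(clipped)
print(clip_counter_metric(1.0, 9.0, 4.0))
```

A smaller `l_max` tightens the bound on the counterhypothesis metrics, so more subtrees can be discarded; that is the run-time knob trading performance for complexity.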
Early termination and scheduling
Sphere decoding has variable detection effort
Achieving fixed throughput under latency constraints
• A scheduler with FIFO distributes runtime across symbols
• Latency constraints: Need to constrain the decoding effort through early termination
[Figure: block diagram with scheduler, FIFO, STS units, and collector; signals "terminate" and "terminated early"]
[Figure: comparison of "early termination" and "early termination + scheduling"]
Application of STS to IEEE 802.11n
• Data rates range from 6 Mbps to 600 Mbps
MMSE
• MMSE is set to operate at a certain highest rate mode
• No performance improvement possible for lower-rate modes
STS: Adjust the decoding effort at runtime
• Use LLR clipping to reduce complexity in the highest rate modes ⇒ graceful performance degradation, but still better than MMSE
• LLR clipping adjusts decoding effort to achieve close-to-optimum performance for lower-rate modes
Application of STS to IEEE 802.11n
Instantiation of 10 STS units
• Meet throughput and latency requirements for 40 MHz bandwidth
• Enable 600 Mbps operation with real-world RF
[Figure: 4x4 IEEE 802.11n baseband ASIC [ETH Zurich, 2008], 5 mm x 5 mm; labels: 1.7M GE, 4x4 MMSE detector 0.05M GE, 4x4 STS detector 0.6M GE, 2.3M GE (estimated)]
Commercially available 2-stream solutions require roughly 2M GEs
Headaches
STS exploits (finite-alphabet) structure of transmitted vectors
• RF non-idealities limit transmit SNR to 32 dB. Transmit noise appears spatially colored at the receiver
• Interference appears as spatially colored noise
• Phase noise and residual frequency offset distort the discrete locations of the constellation points
MMSE
• Linear detection suffers from fixed-point effects
• MMSE detection requires accurate noise estimation
Iterative detection and decoding
Iterate between MIMO detector and FEC decoder
[Figure: iterative receiver loop; vector symbols enter the MIMO detector, whose LLRs pass through the deinterleaver to the FEC decoder (BCJR, LDPC); the decoder's LLRs return to the detector through the interleaver]
• Strong channel code: More iterations can compensate for suboptimal MIMO detector
In practice, the code is given by the standard and code rates can be close to one for the highest (data) rate modes
Tradeoff between detector complexity and number of iterations
Guaranteed throughput requirement
• Need multiple instantiations of MIMO detectors and FEC decoders
• Area scales linearly with the number of iterations
Maximum latency constraints
• Increase throughput of the MIMO detector and the FEC decoder
• Additional area increase due to latency constraints
• Maximum throughput of the sequential FEC decoder is limited
Tradeoff between detector complexity and number of iterations cont’d
Additional hardware overhead
• Iterations require additional storage for baseband samples
• For strong codes, hardware complexity for FEC decoding is high
• Complexity of soft-in soft-out MIMO and FEC decoders is higher than for non-iterative schemes
• Iterative detection and decoding leads to significant increase in hardware complexity compared to one-shot operation
• If iterations are needed, the number of iterations must be kept low
High-performance MIMO detector is key for efficient implementation of iterative receiver
[Figure: frame error rate versus SNR [dB] for STS (I=1, I=2) and MMSE (I=1, I=2, I=4), where I is the number of iterations]
For the same performance, MMSE detection requires more iterations than soft-in soft-out STS sphere decoding