
Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2008, Article ID 478396, 10 pages
doi:10.1155/2008/478396

Research Article
An Iterative Decoding Algorithm for Fusion of Multimodal Information

Shankar T. Shivappa, Bhaskar D. Rao, and Mohan M. Trivedi

Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA

Correspondence should be addressed to Shankar T. Shivappa, [email protected]

Received 16 February 2007; Revised 19 July 2007; Accepted 26 October 2007

Recommended by Eric Pauwels

Human activity analysis in an intelligent space is typically based on multimodal informational cues. The use of multiple modalities offers several advantages, but fusing information from different sources is a problem that must be addressed. In this paper, we propose an iterative algorithm to fuse information from multimodal sources. We draw inspiration from the theory of turbo codes, drawing an analogy between the redundant parity bits of the constituent codes of a turbo code and the information from different sensors in a multimodal system. A hidden Markov model is used to model the sequence of observations of each individual modality. The decoded state likelihoods from one modality are used as additional information in decoding the states of the other modalities. This procedure is repeated until a certain convergence criterion is met. The resulting iterative algorithm is shown to have lower error rates than the individual models alone. The algorithm is then applied to a real-world problem of speech segmentation using audio and visual cues.

Copyright © 2008 Shankar T. Shivappa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTELLIGENT SPACES AND MULTIMODAL SYSTEMS

Intelligent environments facilitate a natural and efficient mechanism for human-computer interaction and human activity analysis. An intelligent space can be any physical space that satisfies the following requirements [1].

(i) Intelligent spaces should facilitate normal human activities taking place in these spaces.

(ii) Intelligent spaces should automatically capture and maintain awareness of the events and activities taking place in these spaces.

(iii) Intelligent spaces should be responsive to specific events and triggers.

(iv) Intelligent spaces should be robust and adaptive to various dynamic changes.

An intelligent space can be a room in a building or an outdoor environment. Designing algorithms for such spaces involves the real-world challenges of real-time, reliable, and robust performance over the wide range of events and activities that can occur in these spaces. In this paper, we consider an indoor meeting room scenario. Though the high-level framework presented below can be applied in other scenarios, we have chosen to restrict our initial investigations to a meeting room. Much of the framework in this case has been summarized in [2].

Intelligent spaces have sensor-based systems that allow for natural and efficient human-computer interaction. In order to achieve this goal, the intelligent space needs to analyze the events that take place and maintain situational awareness. In order to analyze such events automatically, it is essential to develop mathematical models for representing different kinds of events and activities.

Research efforts in the field of human activity analysis have increasingly come to rely on multimodal sensors. Analyzing multimodal signals is a necessity in most scenarios and has added advantages in others [1, 3].

Human activity is essentially multimodal. Voice and gesture, for example, are intimately connected [4]. Researchers in automatic speech recognition (ASR) have used the multimodal nature of human speech to enhance the ASR accuracy and robustness [5]. Recent research efforts in human activity analysis have increasingly come to include multimodal sensors [6–8].

Certain tasks that are very difficult to handle with unimodal sensors might become tractable with the use of multimodal sensors. The limitations of audio analysis in reverberant environments have been discussed in [9], but the addition of the video modality can solve problems like source localization in reverberant environments. In fact, video analysis can even provide a more detailed description of the subject's state, such as emotion, as shown in [10]. But the use of video alone has some disadvantages, as seen in [11]. Even in the simple task of speech segmentation, it is conceivable that some nasal sounds can be produced without any movement of the mouth, and conversely, movement of the mouth alone, as in yawning, might not signify the presence of speech. By combining the strengths of each modality, a multimodal solution for a set of tasks might be simpler than putting together the unimodal counterparts.

Multiple modalities carry redundant information on complementary channels, and hence they provide robustness to environmental and sensor noises that might affect each of these channels differently. For these reasons, we focus our attention on building multimodal systems for human activity analysis.

1.1. Fusion of information in multimodal systems

Fusion of information from different streams is a major challenge in multimodal systems. No standard fusion technique has yet been widely accepted in the published literature. Graphical models have been widely discussed as the most suitable candidates for modeling and fusing information in multimodal systems [12].

Information fusion can occur at various levels of a multimodal system. A sensor-level fusion of video signals from normal and infrared cameras is used for stereo analysis in [13]. At a higher level is feature-level fusion. The audio and visual features used together in the ASR system built at the Johns Hopkins University 2000 workshop [5] are a good example of feature-level fusion. Fusion at higher levels of abstraction (the decision level) has also been proposed, and graphical models have frequently been used for this task [12]. Fusion at the sensor level is appropriate when the modalities to be fused are similar. As we proceed to the feature level, fusing more disparate sources becomes possible. At the decision level, all the information is represented in the form of probabilities, and hence it is possible to fuse information from a wide variety of sensors. In this paper, we develop a general fusion algorithm at the decision level.

In this paper, we develop a fusion technique in the hidden Markov model (HMM) framework. HMMs are a class of graphical models that have traditionally been used in speech recognition and human activity analysis [14]. We plan to extend our algorithm to more general graphical models in the future. Our scheme uses HMMs trained on unimodal data and merges the decisions (a posteriori probabilities) from the different modalities. Our fusion algorithm is motivated by the theory of iterative decoding.

1.2. Advantages of the iterative decoding scheme

A good fusion scheme should have lower error rates than those obtained from the unimodal models. Both the joint modeling framework and the iterative decoding framework have this property. Multimodal training data is hard to obtain. Iterative decoding overcomes this problem by utilizing models trained on unimodal data. Building joint models, on the other hand, requires significantly greater amounts of multimodal data than training unimodal models, due to the increase in dimensionality or complexity of the joint model or both. Working with unimodal models also makes it possible to use a well-learned model in one modality to segment and generate training data for the other modalities, thus overcoming the problem of the lack of training data to a great extent.

In many applications, like ASR, well-trained unimodal models might already be available. Iterative decoding utilizes such models directly. Thus, extending already existing unimodal systems to multimodal ones is easier. Another common scheme used to integrate unimodal HMMs is the product HMM [15]. In our simulations, we see that the product rule performs as well as the joint model. But the product rule has the added disadvantage that it assumes a one-to-one correspondence between the hidden states of the two modalities. The generalized multimodal version of the iterative decoding algorithm (see Section 5) relaxes this requirement. Moreover, the iterative decoding algorithm performs better than the joint model and the product HMM in the presence of background noise, even in cases where there is a one-to-one correspondence between the two modalities.

In noisy environments, the frames affected by noise in different modalities are at best nonoverlapping and at worst independent. The joint models are not able to separate out the noisy modalities from the clean ones. For this reason, the iterative decoding algorithm outperforms the joint model at low SNR. In the case of other decision-level fusion algorithms, like multistream HMMs [16] and the reliability-weighted summation rule [17], one has to estimate the quality (SNR) of the individual modalities to obtain good performance. Iterative decoding does not need such a priori information. This is a very significant advantage of the iterative decoding scheme, because the quality of the modalities is in general time-varying. For example, if the speaker keeps turning away from the camera, video features are very unreliable for speech segmentation. The exponential weighting scheme of multistream HMMs requires real-time monitoring of the quality of the modalities, which is in itself a very complex problem.

2. TURBO CODES AND ITERATIVE DECODING

Turbo codes are a class of convolutional codes that perform close to the Shannon limit of channel capacity. The seminal paper by Berrou et al. [18] introduced the concept of iterative decoding to the field of channel coding. Turbo codes achieve their high performance by using two simple codes working in parallel to match the performance of a single complex code. The iterative decoding scheme is a method to combine the decisions from the two decoders at the receiver and achieve high performance. In other words, two simple codes working in parallel perform as well as a highly complex code that in practice cannot be used due to complexity issues.

Figure 1: Illustrating the forward recursion of the BCJR algorithm. (Recursion variables: $\alpha_t = P(O_1^t, q_t)$, $\beta_t = P(O_{t+1}^T \mid q_t)$, $\gamma_t = P(q_t, o_t \mid q_{t-1})$.)

Figure 2: Joint model for a bimodal scenario, in which modalities 1 and 2 are used together in a one-step decoding.

We draw an analogy between the redundant information of the two channels of a turbo code and the redundant information in the multiple modalities of a multimodal system. We develop a modified version of the iterative decoding algorithm to extract and fuse the information from parallel streams of multimodal data.

3. FORMALIZATION OF THE PROBLEM

Let us consider a multimodal system to recognize certain patterns of activity in an intelligent space [1]. It consists of multimodal sensors at the fundamental level. From the signals captured by these sensors, we extract feature vectors that encapsulate the information contained in the signals in finite dimensions. Once the features are selected, we statistically model the activity to be recognized. For an activity that involves temporal variation, hidden Markov models (HMMs) are a popular modeling framework [14].

3.1. Hidden Markov models

Let $\lambda = (A, \pi, B)$ represent the parameters of an HMM with $N$ hidden states that models a particular activity. Now, the decoding problem is to estimate the optimal state sequence $Q_1^T = \{q_1, q_2, \ldots, q_T\}$ of the HMM based on the sequence of observations $O_1^T = \{o_1, o_2, \ldots, o_T\}$.

The maximum a posteriori probability state sequence is provided by the BCJR algorithm [19]. The MAP estimate for the hidden state at time $t$ is given by $\hat{q}_t = \arg\max_{q_t} P(q_t, O_1^T)$. The BCJR algorithm computes this using the forward (see Figure 1) and backward recursions.

Define
$$
\begin{aligned}
\lambda_t(m) &= P\bigl(q_t = m, O_1^T\bigr), \\
\alpha_t(m) &= P\bigl(q_t = m, O_1^t\bigr), \\
\beta_t(m) &= P\bigl(O_{t+1}^T \mid q_t = m\bigr), \\
\gamma_t(m', m) &= P\bigl(q_t = m, o_t \mid q_{t-1} = m'\bigr), \\
&\quad m = 1, 2, \ldots, N, \quad m' = 1, 2, \ldots, N.
\end{aligned}
\tag{1}
$$

Then establish the recursions
$$
\begin{aligned}
\alpha_t(m) &= \sum_{m'} \alpha_{t-1}(m') \cdot \gamma_t(m', m), \\
\beta_t(m) &= \sum_{m'} \beta_{t+1}(m') \cdot \gamma_{t+1}(m, m'), \\
\lambda_t(m) &= \alpha_t(m) \cdot \beta_t(m).
\end{aligned}
\tag{2}
$$

These enable us to solve for the MAP state sequence given appropriate initial conditions for $\alpha_1(m)$ and $\beta_T(m)$.
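As a concrete illustration (ours, not from the original paper), the forward-backward recursions above can be implemented in a few lines of Python/NumPy, assuming the per-frame observation likelihoods $P(o_t \mid q_t = m)$ have already been evaluated; the per-step normalization is a standard numerical safeguard, and all names are illustrative.

```python
import numpy as np

def bcjr_posteriors(A, pi, b):
    """Forward-backward (BCJR) recursions of Section 3.1.

    A:  (N, N) transition matrix, A[m', m] = P(q_t = m | q_{t-1} = m').
    pi: (N,)   initial state distribution.
    b:  (T, N) observation likelihoods, b[t, m] = P(o_t | q_t = m).
    Returns lam: (T, N), lam[t, m] proportional to P(q_t = m, O_1^T).
    """
    T, N = b.shape
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))

    # Forward recursion: alpha_t(m) = sum_{m'} alpha_{t-1}(m') gamma_t(m', m),
    # with gamma_t(m', m) = A[m', m] * b[t, m]; normalized for stability.
    alpha[0] = pi * b[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * b[t]
        alpha[t] /= alpha[t].sum()

    # Backward recursion: beta_t(m) = sum_{m'} beta_{t+1}(m') gamma_{t+1}(m, m').
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (beta[t + 1] * b[t + 1])
        beta[t] /= beta[t].sum()

    lam = alpha * beta                     # lambda_t(m) = alpha_t(m) * beta_t(m)
    lam /= lam.sum(axis=1, keepdims=True)  # per-frame state posteriors
    return lam                             # per-frame MAP: lam.argmax(axis=1)
```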

3.2. Multimodal scenario

For the sake of clarity, let us consider a bimodal system. There are observations $O_1^T$ from one modality and observations $\Theta_1^T = \{\theta_1, \theta_2, \ldots, \theta_T\}$ from the other modality. The MAP solution in this case would be $\hat{q}_t = \arg\max_{q_t} P(q_t, O_1^T, \Theta_1^T)$. In order to apply the BCJR algorithm to this case, we can concatenate the observations (feature-level fusion) and train a new HMM in the joint feature space. Instead of building a joint model, we develop an iterative decoding algorithm that allows us to approach the performance of the joint model by iteratively exchanging information between the simpler models and updating their posterior probabilities.

4. ITERATIVE DECODING ALGORITHM

This is a direct application of the turbo decoding algorithm [18]. In this section, it is assumed that the hidden states in the two modalities have a one-to-one correspondence. This requirement is relaxed in the generalized solution presented in the next section.

In the first iteration of the iterative algorithm, we decode the hidden states of the HMM using the observations from the first modality, $O_1^T$. We obtain the a posteriori probabilities $\lambda_t^{(1)}(m) = P(q_t = m, O_1^T)$.

In the second iteration, these a posteriori probabilities, $\lambda_t^{(1)}(m)$, are utilized as extrinsic information in decoding the hidden states from the observations of the second modality, $\Theta_1^T$ (see Figure 3). Thus, the a posteriori probabilities in the second stage of decoding are given by $\lambda_t^{(2)}(m) = P(q_t = m, \Theta_1^T, Z_1^{(1)T})$, where $Z_t^{(1)} = \lambda_t^{(1)}$ is the extrinsic information from the first iteration.

Figure 3: First two steps of the iterative decoding algorithm. (Step 1: decode the hidden states using modality 1 only; step 2: use the extrinsic information from step 1 together with modality 2.)

4.1. Modified BCJR algorithm for incorporating the extrinsic information

In order to evaluate $\lambda_t^{(2)}$, we modify the BCJR algorithm as follows:
$$
\begin{aligned}
\lambda_t^{(2)}(m) &= P\bigl(q_t = m, \Theta_1^T, Z_1^{(1)T}\bigr), \\
\alpha_t^{(2)}(m) &= P\bigl(q_t = m, \Theta_1^t, Z_1^{(1)t}\bigr), \\
\beta_t^{(2)}(m) &= P\bigl(\Theta_{t+1}^T, Z_{t+1}^{(1)T} \mid q_t = m\bigr), \\
\gamma_t^{(2)}(m', m) &= P\bigl(q_t = m, \theta_t, Z_t^{(1)} \mid q_{t-1} = m'\bigr).
\end{aligned}
\tag{3}
$$

Then the recursions do not change, except for the computation of $\gamma_t^{(2)}(m', m)$.

Since the extrinsic information is independent of the observations from the second modality,
$$
\gamma_t^{(2)}(m', m) = P(q_t = m \mid q_{t-1} = m') \cdot P(\theta_t \mid q_t = m) \cdot P\bigl(Z_t^{(1)} \mid q_t = m\bigr).
$$

Here $Z_t^{(1)} = [z_{1t}^{(1)}, z_{2t}^{(1)}, \ldots, z_{Nt}^{(1)}]'$ is a vector of probability values. A histogram of each component of $Z_t^{(1)}$ for $q_t = 2$ in an $N = 4$ state HMM synthetic problem is shown in Figure 4. From the histogram, one can see that a simple parametric probability model for $P(Z_t^{(1)} \mid q_t = m)$ is obtained as
$$
P\bigl(Z_t^{(1)} \mid q_t = m\bigr) = f\bigl(1 - z_{mt}^{(1)}; \rho\bigr) \cdot \prod_{i \neq m} f\bigl(z_{it}^{(1)}; \rho\bigr),
\tag{4}
$$
where
$$
f(x; \rho) =
\begin{cases}
\dfrac{1}{\rho} e^{-x/\rho}, & x \geq 0, \\
0, & x < 0
\end{cases}
\tag{5}
$$
is an exponential distribution with rate parameter $1/\rho$. Other distributions, like the beta distribution, could also be used. The exponential distribution is chosen due to its simplicity.

Figure 4: A histogram of each component of $Z_t$ for $q_t = 2$ in an $N = 4$ state HMM synthetic problem.
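For illustration (ours), the model in (4)-(5) can be evaluated as follows, assuming the parameter $\rho$ has been fit beforehand on held-out decoder outputs; the function names are ours.

```python
import numpy as np

def f_exp(x, rho):
    """Exponential density f(x; rho) = (1/rho) exp(-x/rho) for x >= 0, per (5)."""
    return np.where(x >= 0, np.exp(-x / rho) / rho, 0.0)

def extrinsic_likelihood(Z_t, m, rho):
    """P(Z_t | q_t = m) per (4): the m-th component of Z_t should be near 1
    and the others near 0, each deviation scored by the exponential density."""
    terms = f_exp(np.asarray(Z_t, dtype=float), rho)
    terms[m] = f_exp(1.0 - Z_t[m], rho)
    return float(np.prod(terms))
```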

In the third iteration, the extrinsic information to be passed back to decoder 1 is the a posteriori probabilities $\lambda_t^{(2)}(m)$. But part of this information, $\lambda_t^{(1)}(m)$, came from decoder 1 itself. If we were to use $\lambda_t^{(2)}$ as the extrinsic information in the third iteration, it would destroy the independence between the observations from the first modality and the extrinsic information. We overcome this difficulty by choosing another formulation for the extrinsic information, based on the following observation:
$$
\begin{aligned}
\lambda_t^{(2)}(m) &= \alpha_t^{(2)}(m) \cdot \beta_t^{(2)}(m), \\
\alpha_t^{(2)}(m) &= \sum_{m'} \alpha_{t-1}^{(2)}(m') \cdot \gamma_t^{(2)}(m', m), \\
\lambda_t^{(2)}(m) &= \sum_{m'} \alpha_{t-1}^{(2)}(m') \cdot \gamma_t^{(2)}(m', m) \cdot \beta_t^{(2)}(m), \\
\lambda_t^{(2)}(m) &= P\bigl(Z_t^{(1)} \mid q_t = m\bigr) \sum_{m'} \alpha_{t-1}^{(2)}(m') \cdot P(q_t = m \mid q_{t-1} = m') \cdot P(\theta_t \mid q_t = m) \cdot \beta_t^{(2)}(m), \\
\lambda_t^{(2)}(m) &= P\bigl(Z_t^{(1)} \mid q_t = m\bigr) \cdot Y_t^{(2)}.
\end{aligned}
\tag{6}
$$

Note that $Y_t^{(2)}$ does not depend on $Z_t^{(1)}$, and it is hence uncorrelated with $o_t$. This argument follows the same principles used in the turbo coding literature [18]. Hence, we normalize $Y_t^{(2)}$ to sum to 1 and consider the normalized vector to be the extrinsic information passed on to decoder 1 in the third iteration.

The normalized extrinsic information
$$
Z_t^{(2)}(m) = \frac{\lambda_t^{(2)}(m) \,/\, P\bigl(Z_t^{(1)} \mid q_t = m\bigr)}{\sum_{m'} \lambda_t^{(2)}(m') \,/\, P\bigl(Z_t^{(1)} \mid q_t = m'\bigr)}
$$
is passed back to decoder 1.

The iterations are continued until the state sequences converge in both modalities or a fixed number of iterations is reached.

Figure 5: A more generalized bimodal problem: a loose correlation between modalities 1 and 2, represented by the joint probabilities $P(q_t, r_t)$.

Figure 6: Error rate at different iterations for a 4-state HMM problem with one-to-one correspondence between the two modalities. Note the convergence of the error rate to that of the joint model. (Curves: iterative algorithm starting from modality 1, starting from modality 2, and the joint model.)
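The steps above can be sketched as one routine (our illustration, reusing bcjr_posteriors and extrinsic_likelihood from the earlier sketches): because $\gamma_t^{(2)}$ factors into transition, emission, and extrinsic terms, folding $P(Z_t \mid q_t)$ into the emission likelihoods reproduces the modified BCJR recursion, and dividing it back out of $\lambda_t^{(2)}$ before normalizing yields the extrinsic information to pass back.

```python
import numpy as np

def iterate_once(A, pi, b, Z_prev, rho):
    """One decoding pass on a modality with observation likelihoods b (T, N),
    given extrinsic information Z_prev (T, N) from the other modality.
    Returns (lam, Z_next): the posteriors, and the normalized extrinsic
    information to pass back (lam with its own extrinsic term divided out)."""
    T, N = b.shape
    # P(Z_t | q_t = m) for every frame and state, per (4)-(5).
    pz = np.array([[extrinsic_likelihood(Z_prev[t], m, rho) for m in range(N)]
                   for t in range(T)])
    lam = bcjr_posteriors(A, pi, b * pz)         # extrinsic folded into emissions
    Z_next = lam / pz                            # divide out P(Z_t | q_t = m)
    Z_next /= Z_next.sum(axis=1, keepdims=True)  # normalize to sum to 1
    return lam, Z_next

# Usage sketch: start with lam1 = bcjr_posteriors(A, pi, b1), then alternate
# iterate_once() on modality 2 and modality 1, feeding each pass the Z_next of
# the previous one, until the decoded state sequences stop changing.
```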

5. GENERAL MULTIMODAL PROBLEM

In the previous section, we assumed that the hidden states in the two modalities of a multimodal system are the same. In this section, we loosen this restriction and allow the hidden states in the individual modalities to merely have a known prior co-occurrence probability (see Figure 5). In particular, if $q_t$ and $r_t$ represent the hidden states in modalities 1 and 2 at time $t$, then we know the joint probability distribution $P(q_t = m, r_t = m')$ and assume it to be stationary.

This corresponds to the case where there is a loose but definite interaction between the two modalities, as seen very clearly in the case of phonemes and visemes in audiovisual speech recognition. There is no one-to-one correspondence between visemes and phonemes, but the occurrence of one phoneme corresponds to the occurrence of a few specific visemes, and vice versa.

5.1. Iterative decoding algorithm in the general case

This is an extension of the iterative decoding algorithm as presented in the turbo coding scenario. In this case, we have the same steps as in the iterative algorithm of Section 4. But at the $j$th iteration of the modified BCJR algorithm, in the computation of $\gamma_t^{(j)}(m', m) = P(q_t = m, \theta_t, Z_t^{(j-1)} \mid q_{t-1} = m')$, we now need to compute
$$
\begin{aligned}
\gamma_t^{(j)}(m', m) &= P\bigl(r_t = m, \theta_t, Z_t^{(j-1)} \mid r_{t-1} = m'\bigr), \\
\gamma_t^{(j)}(m', m) &= P(r_t = m \mid r_{t-1} = m') \cdot P(\theta_t \mid r_t = m) \cdot P\bigl(Z_t^{(j-1)} \mid r_t = m\bigr), \\
\gamma_t^{(j)}(m', m) &= P(r_t = m \mid r_{t-1} = m') \cdot P(\theta_t \mid r_t = m) \cdot \sum_n P\bigl(Z_t^{(j-1)} \mid q_t = n\bigr) \, P(q_t = n \mid r_t = m),
\end{aligned}
\tag{7}
$$
where the last factor can be computed from the joint probability distribution $P(q_t = m, r_t = m')$.

The rest of the iterative algorithm remains the same as before.
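For illustration (ours, under the stated stationarity assumption), the marginalization in the last line of (7) amounts to mapping the extrinsic likelihoods from the state space of modality 1 into that of modality 2 through $P(q_t \mid r_t)$:

```python
import numpy as np

def extrinsic_in_r(pz_q, joint_qr):
    """Map extrinsic likelihoods into the state space of modality 2, per (7).

    pz_q:     (Nq,)     pz_q[n] = P(Z_t | q_t = n).
    joint_qr: (Nq, Nr)  joint_qr[n, m] = P(q_t = n, r_t = m), assumed stationary.
    Returns   (Nr,)     sum_n P(Z_t | q_t = n) * P(q_t = n | r_t = m).
    """
    p_q_given_r = joint_qr / joint_qr.sum(axis=0, keepdims=True)  # P(q | r)
    return p_q_given_r.T @ pz_q
```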

6. EXPERIMENTAL VERIFICATION OF THE ITERATIVE DECODING ALGORITHM

In this section, we present the results of applying the iterative decoding algorithm to a synthetic problem. We choose a synthetic problem in order to validate our algorithm before applying it to a real-world problem, so as to isolate the performance characteristics of our algorithm from the complexities of real-world data, which are dealt with in Section 7.

We generate observations from an HMM with 4 states whose observation densities are 4-dimensional Gaussian distributions. We also construct a joint model by concatenating the feature vectors. The goal of the experiment is to decode the state sequence from the observations and compare it with the true state sequence in order to obtain the error rates. The experiment is repeated several times and the average error rates are obtained.
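A minimal sketch of such a synthetic setup (ours; the transition matrix, means, and noise scale are placeholders, not the parameters actually used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, D = 4, 1000, 4                         # states, frames, dims per modality
A = np.full((N, N), 0.05) + 0.8 * np.eye(N)  # rows sum to 1 ("sticky" chain)
pi = np.full(N, 1.0 / N)
means = rng.normal(size=(N, 2 * D))          # one 8-dim mean per joint state

# Sample a state sequence and state-conditioned Gaussian observations.
states = [rng.choice(N, p=pi)]
for _ in range(T - 1):
    states.append(rng.choice(N, p=A[states[-1]]))
states = np.array(states)
obs = means[states] + rng.normal(size=(T, 2 * D))

obs1, obs2 = obs[:, :D], obs[:, D:]          # split into the two modalities
```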

In the first case, the joint model with 8 dimensions and 4 states was used to generate the state and observation sequences. We use the joint model to decode the state sequence from the observations. Next, we consider the observations to be generated by two modalities with 4 dimensions each. We now consider the product rule [15] as another alternative to the joint model, but in our simulations we found its error rates to be the same as those of the joint model. Hence, we take the joint model to give us the baseline performance. The iterative decoding algorithm described in Section 4 is applied to decode the state sequence, which is compared with the true state sequence. The results are plotted in Figure 6. We can see that the iterative decoding algorithm converges to the baseline performance and reduces the error rate by almost 50% compared to the unimodal case (iteration 1). Figure 6 also shows the standard deviation of error, from which it can be seen that the performance is indeed close to the baseline. Since the two modalities have similar unimodal error rates, the error dynamics of the iterative algorithm are independent of the starting modality.

Figure 7: Error rate at different iterations for a generalized multimodal problem. Note that the performance follows the same trend as in the previous case.

Figure 8: Error rate at different iterations in the case of noisy modalities. Note that the iterative algorithm performs better than the joint model at low SNR. (Curves: iterative decoding starting with modality 1, starting with modality 2, and the joint model.)

Figure 9: Testbed and the associated audio and video sensors (cameras and microphone array).

In the second example, we generate observations from two independent HMMs such that the state sequence follows a known joint distribution. We then apply the generalized iterative decoding algorithm described in Section 5. The results are shown in Figure 7. In this case, we do not have a baseline experiment for comparison, as the two streams are only loosely coupled, but the general trend in average error rate with each iteration is similar to the case shown in Figure 6.

In the presence of noise, the iterative algorithm outperforms the joint model, as shown in Figure 8. Based on the standard deviation of error, a standard t-test reveals that the difference between the joint model and the iterative decoding algorithm is statistically significant after the third iteration. In this case, we added additive white Gaussian noise to the features of one of the modalities. No a priori information about the noise statistics is assumed to be available. Note that in this case, the individual modalities have varying noise levels, and hence the convergence of the iterative algorithm is dependent on the starting modality. But in both cases, we see that the iterative algorithm converges to the same performance after the third iteration. This illustrates the advantage of iterative decoding over joint modeling, as mentioned in Section 1.2.

7. EXPERIMENTAL TESTBED

In this section, we describe an experimental testbed that is set up at the Computer Vision and Robotics Research (CVRR) lab at the University of California, San Diego. The goal of this exercise is to develop and evaluate human activity analysis algorithms in a meeting room scenario. Figure 9 shows a detailed view of the sensors deployed.


Figure 10: Different head poses and backgrounds for one subject out of 20 subjects in our database.

Figure 11: Face detection using the Viola-Jones face detector with various subjects.

Figure 12: Some snapshots of the lip region during a typical utterance. Observe the variations in pose and facial characteristics of the three different subjects, which limit the performance of a video-only system.

Figure 13: Audio waveform of speech in background noise. The short pauses between words, which can be confused by an audio-only system for background noise, will be detected as speech by the video modality, based on the lip movement.

7.1. Hardware

7.1.1. Audio sensors

The sensors consist of a microphone array. The audio signals are captured at 16 kHz on a Linux workstation using the Advanced Linux Sound Architecture (ALSA) drivers. JACK, an audio server, is used here to capture and process multiple channels of audio data in real time (as required).

7.1.2. Video sensors

We use a synchronized pair of wide-angle cameras to capture the majority of the panorama around the table. The cameras are placed off the center of the table in order to increase their field of view, as shown in the enlarged portion of Figure 9.

7.1.3. Synchronization

In order to facilitate synchronization, the video capture module generates a short audio pulse after capturing every frame. One of the channels in the microphone array is used to record this audio sequence and synchronize the audio and video frames.
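As an illustrative sketch (ours, assuming the emitted pulse waveform is known), the frame boundaries can be recovered by cross-correlating the recorded sync channel with the pulse template and picking local peaks:

```python
import numpy as np

def frame_boundaries(sync_channel, pulse, threshold):
    """Return sample indices where the sync pulse is detected, by
    cross-correlating the recorded channel with the pulse template and
    keeping local maxima above a threshold (simple peak picking)."""
    corr = np.correlate(sync_channel, pulse, mode="valid")
    is_peak = (corr > threshold) \
        & (corr >= np.roll(corr, 1)) & (corr >= np.roll(corr, -1))
    return np.flatnonzero(is_peak)
```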

7.2. Preliminary experiments and results

In order to evaluate the performance of the iterative decoding algorithm on a real-world problem, we consider a simplified version of the meeting room conversation, with one speaker. The goal of the experiment is to segment the speech data into speech and silence parts. The traditional approach to the problem is to use the energy in the speech signal as a feature and maintain an adaptive threshold for the energy of the background noise. This is not accurate in the presence of nonstationary background noise, like overlapping speech from multiple speakers. In our experiment, we use the audio and video modalities to build a multimodal speech segmentation system that is robust to background noise and performs better than the video-only model or the joint model.

Figure 14: Audio waveform from a typical utterance in background noise. The speech and silence parts are hand-labeled to be used as ground truth.

7.2.1. Data collection

We collected 4 minutes of audiovisual data from each of 20 different speakers. This included 12 different head poses and 2 different backgrounds, as shown in Figure 10. We used 1 minute of data from each speaker, that is, a total of 20 minutes of audiovisual data, to estimate the HMM model parameters. The remaining 3 minutes from each speaker were included in the testing set; that is, a total of 60 minutes of testing data was used.

7.2.2. Feature extraction

Each time step corresponds to one frame of the video signal. The cameras capture video at 15 fps. We use the energy of the microphone signal in the time window corresponding to each frame as the audio feature. We track the face of the speaker using the Viola-Jones face detector [20]. Figure 11 shows some sample frames of the face detector output for different subjects. We consider the mouth region to be the lower half of the face. The motion in the mouth region is estimated by subtracting the mouth region pixels of consecutive frames and summing the absolute values of these differences. This sum is our video feature. Thus, a smooth and stable face tracker is essential for accurate video feature extraction. Figure 12 shows the different positions of the lips during a typical utterance.
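A sketch of this feature extraction (ours; OpenCV's bundled Haar cascade stands in for the face detector of [20], and the detector parameters are placeholders):

```python
import cv2
import numpy as np

# OpenCV's bundled frontal-face Haar cascade (a Viola-Jones style detector).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_motion_feature(prev_gray, curr_gray):
    """Sum of absolute pixel differences over the mouth region (lower half of
    the detected face) between two consecutive grayscale frames; None if no
    face is detected in the current frame."""
    faces = face_cascade.detectMultiScale(curr_gray, scaleFactor=1.1,
                                          minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    prev_mouth = prev_gray[y + h // 2 : y + h, x : x + w].astype(np.int32)
    curr_mouth = curr_gray[y + h // 2 : y + h, x : x + w].astype(np.int32)
    return int(np.abs(curr_mouth - prev_mouth).sum())
```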

7.2.3. Modeling and testing

We then train HMMs in the audio and video domains using labeled speech and silence parts of the speech data. We also construct the joint model by concatenating the features. The results of the experiment on a typical noisy segment of speech are shown in Figure 15. The ground truth is shown in Figure 14. From the numerical results in Figure 16, we see that by the third iteration, the iterative decoding algorithm performs slightly better than the joint model. This improvement, however, is not statistically significant, because the background noise in the audio and video domains is not so severe. Though building the joint model is straightforward in this case, it is not so easy in more complex situations, as explained in the introductory sections. Thus, the iterative algorithm appears to be a good fusion framework in the multimodal scenario.

Figure 15: The decoded states of the HMM after each iteration. Note the errors in the first iteration being corrected in the subsequent iterations. (Panels: decoded state, speech versus silence, after iterations 1, 2, and 3.)

Figure 16: Results showing the error rates for the iterative decoding scheme for the speech segmentation problem. (Curves: iterative algorithm starting from audio, starting from video, and the joint model.)

8. CONCLUDING REMARKS

We have developed a general information fusion framework based on the principle of iterative decoding used in turbo codes. We have adapted the iterative decoding algorithm to the case of multimodal systems and demonstrated its performance on synthetic data as well as a practical problem. In the future, we plan to further investigate its performance under different real-world scenarios and also apply the fusion algorithm to more complex systems.

We have also described the setup of an experimental testbed in the CVRR lab at UCSD. In the future, we plan to extend the experiments on the testbed to include many more features, like speaker identification, affect analysis, and keyword spotting. This will lead to more complex human activity analysis tasks with more complex models. We will evaluate the effectiveness of the iterative decoding scheme on these complex real-world problems.

ACKNOWLEDGMENTS

The work described in this paper was funded by the RESCUE project at UCSD, NSF Award no. 0331690. We also thank the UC Discovery Program Digital Media Grant for assistance with the SHIVA lab testbed at the Computer Vision and Robotics Research Laboratory (CVRR), UCSD. We acknowledge the assistance and cooperation of our colleagues from the CVRR Laboratory. We also thank the reviewers, whose valuable comments helped us improve the clarity of the paper.

REFERENCES

[1] M. M. Trivedi, K. S. Huang, and I. Mikic, "Dynamic context capture and distributed video arrays for intelligent spaces," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 35, no. 1, pp. 145–163, 2005.

[2] M. M. Trivedi, K. S. Huang, and I. Mikic, "Activity monitoring and summarization for an intelligent meeting room," in Proceedings of the IEEE International Workshop on Human Motion, 2000.

[3] J. Ploetner and M. M. Trivedi, "A multimodal approach for dynamic event capture of vehicles and pedestrians," in Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, 2006.

[4] S. Oviatt, R. Coulston, and R. Lunsford, "When do we interact multimodally? Cognitive load and multimodal communication patterns," in Proceedings of the 6th International Conference on Multimodal Interfaces, State College, Pa, USA, October 2004.

[5] C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, and D. Vergyri, "Large-vocabulary audio-visual speech recognition: a summary of the Johns Hopkins summer 2000 workshop," in Proceedings of the IEEE Workshop on Multimedia Signal Processing, Cannes, France, 2001.

[6] T. Choudhury, B. Clarkson, T. Jebara, and A. Pentland, "Multimodal person recognition using unconstrained audio and video," in Proceedings of the 2nd International Conference on Audio-Visual Biometric Person Authentication, Washington, DC, USA, March 1999.

[7] L. Chen, R. Malkin, and J. Yang, "Multimodal detection of human interaction events in a nursing home environment," in Proceedings of the 6th International Conference on Multimodal Interfaces, Pittsburgh, Pa, USA, October 2002.

[8] I. McCowan, D. Gatica-Perez, S. Bengio, G. Lathoud, M. Barnard, and D. Zhang, "Automatic analysis of multimodal group actions in meetings," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 305–317, March 2005.

[9] T. Gustafsson, B. D. Rao, and M. M. Trivedi, "Source localization in reverberant environments: modeling and statistical analysis," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 791–803, 2003.

[10] J. C. McCall and M. M. Trivedi, "Facial action coding using multiple visual cues and a hierarchy of particle filters," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop, 2006.

[11] N. Nikolaidis, S. Siatras, and I. Pitas, "Visual speech detection using mouth region intensities," in Proceedings of the European Signal Processing Conference, Florence, Italy, September 2006.

[12] A. Jaimes and N. Sebe, "Multimodal human computer interaction: a survey," in Proceedings of the IEEE International Workshop on Human Computer Interaction in Conjunction with ICCV, Beijing, China, October 2005.

[13] S. J. Krotosky and M. M. Trivedi, "Mutual information based registration of multimodal stereo videos for person tracking," Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 270–287, 2006, special issue on Advances in Vision Algorithms and Systems beyond the Visible Spectrum.

[14] N. Oliver, E. Horvitz, and A. Garg, "Layered representations for human activity recognition," in Proceedings of the International Conference on Multimodal Interfaces, Pittsburgh, Pa, USA, October 2002.

[15] J. Huang, Z. Liu, Y. Wang, Y. Chen, and E. K. Wong, "Integration of multimodal features for video scene classification based on HMM," in Proceedings of the IEEE Workshop on Multimedia Signal Processing, Copenhagen, Denmark, 1999.

[16] S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Transactions on Multimedia, vol. 2, no. 3, pp. 141–151, September 2000.

[17] E. Erzin, Y. Yemez, A. M. Tekalp, A. Ercil, H. Erdogan, and H. Abut, "Multimodal person recognition for human-vehicle interaction," IEEE MultiMedia, vol. 13, no. 2, pp. 18–31, 2006.

[18] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo-codes," in Proceedings of the IEEE International Conference on Communications, Geneva, Switzerland, May 1993.

[19] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. 20, no. 2, pp. 284–287, March 1974.

[20] P. Viola and M. Jones, "Robust real-time object detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2002.
