Maximum Likelihood Failure Diagnosis in
Finite State Machines under Unreliable Observations
Eleftheria Athanasopoulou, Lingxi Li, and Christoforos N. Hadjicostis
Abstract
In this paper we develop a probabilistic methodology for failure diagnosis in finite state machines based on a
sequence of unreliable observations. Given prior knowledge of the input probability distribution but without actual
knowledge of the applied input sequence, the core problem we consider aims to choose from a pool of known,
deterministic finite state machines (FSMs) the one that most likely matches the given sequence of observations.
The problem becomes challenging because of sensor failures which may corrupt the output sequence by inserting,
deleting, and transposing symbols with certain probabilities (that are assumed known). We propose an efficient
recursive algorithm for obtaining the most likely underlying FSM, given the possibly erroneous observed sequence.
The proposed algorithm essentially allows us to perform online maximum likelihood failure diagnosis and is applicable
to more general settings where one is required to choose the most likely underlying hidden Markov model (HMM)
based on a sequence of observations that may get corrupted with known probabilities. The algorithm generalizes
existing recursive algorithms for likelihood calculation in HMMs by allowing loops in the associated trellis diagram.
We illustrate the proposed methodology using an example of diagnosis (classification) of communication protocols.
Index terms: Finite state machines, failure diagnosis, maximum likelihood model classification, insertions,
deletions, transpositions, discrete event systems, probabilistic automata.
I. INTRODUCTION
Failure diagnosis is an important aspect in modern system and network operation, particularly in applications
that are life-critical and require high reliability (e.g., medical, transportation or military systems). In this paper
we focus on diagnosis based on discrete event system (DES) formulations. More specifically, we consider failure
diagnosis in systems that have discrete state spaces and event-driven evolutions (such systems take state transitions
only when a certain set of discrete events occurs). Any large-scale dynamic system, such as a computer system, a
manufacturing system, a chemical process, or a semiconductor manufacturing process can be modeled as a DES
at some level of abstraction [1]. Much work has been done in failure diagnosis of discrete event systems (DESs)
including deterministic diagnosis [2]–[7], probabilistic diagnosis [8], [9] or diagnosis in stochastic finite automata
This material is based upon work supported in part by the National Science Foundation under NSF Career Award No 0092696 and NSF ITR
Award No 0426831, and in part by the Air Force Office of Scientific Research under Award No AFOSR DoD F49620-01-1-0365URI. Any
opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the
views of NSF or AFOSR.
The authors are with the Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, University of Illinois at
Urbana-Champaign, IL 61801–2307, USA. Corresponding author: C. N. Hadjicostis, 357 CSL, 1308 West Main Street, Urbana, IL 61801–2307,
USA (e-mail: [email protected]).
[10], [11]. Related problems of failure diagnosis and conformance testing of communication systems/protocols
(modeled by finite state machines) are explored in [12]–[14].
In this paper, we develop a recursive algorithm for maximum likelihood calculation of two (or more) finite
state machines (FSMs) based on a sequence of possibly erroneous observations. This algorithm is very general
and can be applied in many settings, such as the evaluation problem in hidden Markov models (HMMs) [16],
the parsing problem in probabilistic automata (PA) [20], [21], and the trellis-based decoding of variable length
codes (VLC) [24], [25]. In this paper, we focus on failure diagnosis applications and motivate our approach by
considering the following problem: given two known deterministic finite state machine models (one corresponding
to the fault-free version of the underlying system and the other corresponding to a faulty version of the system), we
want to determine which of the two competing models has most likely produced a given sequence of observations.
We assume that the input sequence is unknown but the a priori input distribution is known; the diagnoser needs
to make its decision based on the observed outputs which are associated (not necessarily exclusively) with FSM
transitions. The additional challenge in our diagnosis formulation is that the observed sequence may be corrupted
due to sensor malfunctions. For example, the information that the sensors provide may be corrupted due to inaccurate
measurements, limited resolution, degraded sensor performance because of aging, or hardware failures. In this work,
we are interested in unreliable sensors that may cause outputs to be deleted, inserted or transposed with certain
known, possibly time-varying probabilities.
One way to view the above problem is in terms of Figure 1: the system under diagnosis is driven by a randomly
generated input sequence x_1^{Ls} = < x[1], x[2], ..., x[Ls] > (with known a priori input probability distribution^1).
The output sequence y_1^{Ls} = < y[1], y[2], ..., y[Ls] > that is generated can become corrupted due to sensor failures;
hence, the observed sequence z_1^L = < z[1], z[2], ..., z[L] > may be erroneous and its length L will generally not
equal the output sequence length Ls (L ≠ Ls). Given an observed sequence z_1^L = < z[1], z[2], ..., z[L] >, our
goal is to decide which model has most likely generated this observed sequence. The assumption of known input
statistics essentially reduces our problem to classification between two (or more) known hidden Markov models
(HMMs) based on the observation of an output sequence [16], [17]. What makes our task here more challenging
than traditional classification of Markov models are the sensor failures which may corrupt the true output sequence
y_1^{Ls}. Note that (although unlikely in a diagnosis scenario) the assumption that the distribution of the input fed to
each FSM is the same can be easily relaxed (we can have different inputs feeding to each FSM model as long as
their a priori probability distributions are known). Thus, the assumption that we are given deterministic FSMs with
known input statistics is not restrictive because, as an alternative, we can start with HMMs and perform model
classification based on our (possibly corrupted) observations.
If the observed sequence were not corrupted, i.e., if it were the same as the output sequence, the likelihood that the
observed sequence comes from a particular model can be calculated as the sum of the probabilities of all possible
1The input sequence could be white (i.e., each input could be generated with identical distribution at each time step) or it could be generated
based on some underlying Markov model. The latter case leads to a more complicated probabilistic description (HMM) but can be handled
essentially using the same techniques (applied, however, on the more complex HMM).
[Figure: the input sequence x_1^{Ls} drives the system under diagnosis, producing the output sequence y_1^{Ls}; sensor-induced uncertainty corrupts it into the observed sequence z_1^L, from which the diagnoser decides fault-free or faulty.]
Fig. 1. Problem formulation.
state sequences that are consistent with the observations. This can be done using a recursive algorithm similar
to the forward algorithm [16], which solves the evaluation problem in HMMs and is used frequently in speech
recognition applications (e.g., in [16]–[19]). Given a model and an observed sequence, the (standard) evaluation
problem consists of computing the probability that the observed sequence was produced by the model. When
there are multiple competing models, these probabilities can be used to choose the model which best matches the
observations. The forward algorithm is also used in pattern recognition applications [20], [21] to solve the syntax
analysis or parsing problem (i.e., to recognize a pattern by classifying it to the appropriate generating grammar),
and in bioinformatics [22], [23] to evaluate whether a DNA or protein sequence belongs to a particular family of
sequences.
When sensor failures are possible, several output sequences may correspond to a given observed sequence and
one would need to identify all possible state sequences and the probabilities with which they agree with both the
underlying FSM model and the observations. If the output sequence is corrupted by deletions, the standard forward
algorithm is insufficient because there are potentially infinitely many output sequences that agree with a given observed
sequence. To address this limitation of the standard forward algorithm, we propose in this paper a recursive algorithm
that allows us to efficiently compute the probability that a given FSM model matches the observed sequence: every
time a new observation is made, the algorithm simply has to update the information it keeps track of and can output
on demand the probability that a given model has produced the sequence observed so far. The recursive algorithm
we develop relates to (and generalizes) recursive algorithms for the evaluation problem in HMMs [16], the parsing
problem in probabilistic automata (PA) [20], [21] and the trellis-based decoding of variable length codes (VLC)
[24], [25]; all of these existing techniques can be modified to deal with some types of sensor failures but, unlike our
algorithm, cannot handle deletions or other combinations of sensor failures that lead to loops of silent transitions
in the associated trellis diagram. We elaborate on the relationship of our approach to these earlier approaches in a
discussion in Section VI.C, after we describe our recursive algorithm in detail.
The contribution of this paper is three-fold: (i) We formulate a maximum likelihood diagnosis problem where
systems are modeled by FSMs and the diagnosis decision is based on an observed, possibly corrupted sequence.
(ii) We construct an observation FSM to capture all possible output sequences that are associated with the observed
sequence. (iii) We propose a recursive algorithm that can be used online to compute the probability that a given
FSM model matches the observed sequence. The recursion is in terms of the number of observations in the sequence
and our algorithm extends existing likelihood evaluation algorithms because it can handle loops of silent transitions
that appear in the trellis diagram due to output deletions.
The remainder of the paper is organized as follows. In Section II we introduce the necessary notation for our
development. In Section III we formulate the diagnosis problem and present the sensor failure model that includes
insertions and deletions. In Section IV we propose a way to construct an observation FSM that captures the
observed sequence along with the pairs of compatible output sequence and error patterns (and their corresponding
probabilities). In Section V we propose an algorithm to compute the likelihood of the observations given the model
and in Section VI we develop an efficient recursive version of this algorithm. In Section VII we present an example
of failure diagnosis in a communication protocol. Conclusions and future work are discussed in Section VIII.
II. PRELIMINARIES
A deterministic finite state machine (FSM) model S can be described by a six-tuple S = (Q, X, Y, δ, λ, q0),
where Q is the finite set of states (without loss of generality we will also denote each state by its index, i.e.,
Q = {0, 1, ..., |Q| − 1}, where |Q| denotes the size of Q); X = {x1, x2, ..., x|X|} is the finite set of inputs;
Y = {y1, y2, ..., y|Y|} is the finite set of outputs; δ is the state transition function; λ is the output function; and q0
is the initial state. The state q[n + 1] at time epoch n + 1 of the FSM is specified by its state q[n] at time epoch n
and its input x[n + 1] via the state transition function δ as q[n + 1] = δ(q[n], x[n + 1]). The output of the FSM is
associated with the FSM transition and is specified via the output function λ as y[n + 1] = λ(q[n], x[n + 1]). Note
that the FSMs we consider here are event-driven and we use n to denote the time epoch between the occurrence
of the nth and (n + 1)st input.
We assume for simplicity2 that the inputs applied to the given FSM are chosen according to a probability
distribution determined by the current FSM state. Thus, the FSM behaves as a homogeneous Markov chain, i.e.,
a Markov chain in which the transition probabilities are not a function of time [35]. This Markov chain can be
obtained from the given FSM by assigning to each transition a probability that depends on the probabilities of the
inputs that cause it. If we denote the state transition probabilities by a_{jk} = P{(q[n + 1] = j) | (q[n] = k)}, the
state transition matrix of the Markov chain associated with the given system is captured by A = (a_{jk}), 0 ≤ j, k ≤ |Q| − 1.
Note that, in order to keep subsequent notation clean, the rows and columns of all matrices and vectors are indexed
starting from 0 (not 1). The state transition matrix A captures how state probabilities evolve in time via the evolution
equation

π[n + 1] = A π[n],     (1)

where π[n] is a |Q|-dimensional vector whose jth entry denotes the probability that the Markov chain is in state j
at time step n. Note that the columns of the |Q| × |Q| nonnegative matrix A sum to 1; similarly, the |Q|-dimensional
2Although not discussed in this paper, more complicated input statistics can also be handled perhaps by enlarging the state space of the
resulting HMMs. Alternatively, as we discuss in the next section, one could start with a probabilistic description of HMMs as opposed to FSMs
driven by inputs with known statistics.
probability vector π[n] has elements that are nonnegative and sum to 1.
To make the connection with HMMs more transparent, we will denote the FSM state at time step n by a |Q|-
dimensional binary indicator vector q[n] which has exactly one nonzero entry with value equal to “1.” This single
nonzero entry denotes the state of the system (i.e., if the jth entry of q[n] equals “1,” then the FSM is in state j
at time step n). If input xi is applied at time step n + 1 (i.e., x[n + 1] = xi), then the state evolution of the system
can be captured by an equation of the form
q[n + 1] = A_i q[n],     (2)

where A_i is the |Q| × |Q| state transition matrix associated with input x_i. Specifically, A_i is such that each of its
columns has at most one nonzero entry with value “1” (i.e., matrix Ai has a total of at most |Q| nonzero entries,
all with value “1”). A nonzero entry at the (j, k)th position of Ai denotes a transition from state k to state j
under input xi. (Clearly, the constraint that each column of Ai has at most one nonzero entry simply reflects the
requirement that there can only be at most one transition out of a particular state under a particular input.) Note
that if the inputs applied to a given FSM are statistically independent from one time step to another and their
probability distribution is fixed (so that, at any given time step n, from a given state q[n], input x[n + 1] = x_i takes
place with probability p_i, where Σ_{i=1}^{|X|} p_i = 1), then the probability p_i of input x_i does not depend on the current state
of the FSM and the corresponding matrix A can be written as

A = Σ_{i=1}^{|X|} p_i A_i,     (3)

where A_i is the state transition matrix^3 corresponding to input x_i.
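As a concrete illustration of Eq. (3), the following Python sketch builds A from per-input transition matrices. The 3-state FSM and the input probabilities below are hypothetical, chosen only for illustration; they are not taken from the paper's running example.

```python
# Sketch of Eq. (3) for a hypothetical 3-state FSM with two inputs.
# Each A_i is deterministic: exactly one "1" per column, where column k
# marks the successor of state k under input x_i.
A1 = [[0, 0, 1],
      [1, 0, 0],
      [0, 1, 0]]          # x1: 0 -> 1, 1 -> 2, 2 -> 0
A2 = [[1, 1, 0],
      [0, 0, 0],
      [0, 0, 1]]          # x2: 0 -> 0, 1 -> 0, 2 -> 2
p = [0.6, 0.4]            # assumed state-independent input probabilities

# Eq. (3): A = sum_i p_i * A_i gives the Markov chain transition matrix.
A = [[p[0] * A1[j][k] + p[1] * A2[j][k] for k in range(3)] for j in range(3)]

# A must be column-stochastic: each column of A sums to 1.
assert all(abs(sum(A[j][k] for j in range(3)) - 1.0) < 1e-12 for k in range(3))
```

Because each A_i has exactly one "1" per column, every column of A is a convex combination of standard basis vectors and therefore sums to one automatically.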
Since the diagnoser has no access to the inputs that drive the FSM but only observes the (possibly corrupted)
outputs it produces, the FSM state sequence is not completely known and the resulting probabilistic system is an
HMM [16]. An HMM is described by a five-tuple (Q, Y, Δ, Λ, ρ[0]), where Q is the set of states, Y is the set of
outputs, Δ captures the state transition probabilities, Λ captures the output probabilities associated with transitions,
and ρ[0] is the initial state probability distribution vector. We define the |Q| × |Q| matrix A_σ, associated with output
σ ∈ Y of the FSM, as follows: an entry at the (j, k)th position of A_σ captures the probability of a transition from
state k to state j that produces output σ. Note that

Σ_{σ ∈ Y} A_σ = A.     (4)
The joint probability of the state at time step n and the observations y[1], . . . , y[n] is captured by the vector ρ[n],
where the entry ρ[n](j) denotes the probability that the HMM is in state j at step n and y[1], . . . , y[n] have been
observed. More formally, the vector ρ[n] is defined as ρ[n](j) = P(Q[n] = j, Y_1^n = y_1^n), where the capital letters
denote random variables and the small letters denote values of these variables. When y[n + 1] becomes available
3The assumption that the input distribution is independent of the system state implies that each Ai is a state transition matrix that has exactly
one nonzero (“1”) entry at each of its columns.
at time epoch n + 1, we can update the joint probability of the state of the HMM and y[1], . . . , y[n + 1] as

ρ[n + 1] = A_{y[n+1]} ρ[n].     (5)

Note that ρ[n + 1] is not necessarily a probability vector; its jth entry denotes the joint probability of observing
y[1], . . . , y[n], y[n + 1] and being in state j at time step n + 1. If we normalize ρ[n + 1] to ρ'[n + 1] so that its
entries sum to one, then ρ'[n + 1](j) is the conditional probability that the HMM is in state j at time step n + 1
given the observation of y[1], y[2], . . . , y[n + 1], i.e., ρ'[n + 1](j) = P(Q[n + 1] = j | Y_1^{n+1} = y_1^{n+1}).
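The forward update of Eq. (5) and the normalization step can be sketched as follows. The 2-state output matrices and the observation sequence are hypothetical, chosen only so that A_a + A_b is column-stochastic; they are not the matrices of the paper's example.

```python
# Sketch of the forward update of Eq. (5): rho[n+1] = A_{y[n+1]} * rho[n].
# The output matrices below are illustrative only; note that their sum
# has columns summing to 1, as Eq. (4) requires.
A = {"a": [[0.5, 0.2],
           [0.1, 0.3]],
     "b": [[0.2, 0.1],
           [0.2, 0.4]]}

def forward_step(M, rho):
    """One application of Eq. (5): multiply rho by the output matrix M."""
    return [sum(M[j][k] * rho[k] for k in range(len(rho)))
            for j in range(len(M))]

rho = [1.0, 0.0]                 # assume the chain starts in state 0
for z in "aba":                  # an example observation sequence
    rho = forward_step(A[z], rho)

likelihood = sum(rho)            # P(y[1..3] = a b a | model)
rho_prime = [r / likelihood for r in rho]   # conditional state distribution
```

The entries of rho are joint probabilities, so their sum is exactly the likelihood of the observations under the model; dividing by it recovers the conditional distribution ρ'.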
The above discussion described how starting from a given deterministic FSM with known input statistics, one
can develop a corresponding HMM and use it to perform failure diagnosis. Although in this work we assume we
are given deterministic FSMs with known input statistics, our development can just as easily be applied to the case
where we start with HMMs (possibly with different numbers of states) and our goal is to perform model classification
based on our observations.
III. PROBLEM FORMULATION
In this section, we describe in detail the problem we consider in this paper, including the system under diagnosis,
the sensor failure model, and the maximum likelihood formulation.
A. System Under Diagnosis
Using the notation introduced in the previous section, the fault-free and faulty FSM models that we are interested
in are given by S1 = (Q, X, Y, δ1, λ1, q10) and S2 = (Q, X, Y, δ2, λ2, q20) respectively. The set of states, the set of
inputs, and the set of outputs are taken to be identical for both FSMs and our focus is on detecting changes in the
state transition functionality of the FSM (this would be the case, for instance, if one aims to detect an incorrectly
implemented version of a given FSM). Note that these assumptions are not essential and can be relaxed in a
straightforward manner (in fact, as mentioned in the introduction, one could even use the techniques we develop
to classify different HMMs, e.g., different FSM models driven by different input distributions). Apart from the
uncertainty in the observed sequence due to sensor failures (which will be discussed next), there is uncertainty due
to the fact that the input sequence is not exactly known.
Example 1: We are given two competing FSM models, S1 and S2 (corresponding to the fault-free and the
faulty model respectively); these models will be used as a running example throughout the paper. The left parts
of Figures 2 and 3 show the state transition diagrams of FSMs S1 and S2 with four states Q = {0, 1, 2, 3}, three
inputs X = {x1, x2, x3}, and three outputs Y = {a, b, c}. Each transition is labeled as x_i | σ, where x_i ∈ X
denotes the input that drives the FSM and σ ∈ Y denotes the output produced by the FSM. For simplicity, we
assign equal a priori probabilities to the inputs, i.e., each input has probability 1/3 of occurring. The right parts of
Figures 2 and 3 show the HMMs that correspond to FSMs S1 and S2 under this input distribution. Each transition
in the HMM is labeled as p | σ, where p denotes the probability of the transition and σ ∈ Y denotes the output
Fig. 2. State transition diagram of fault-free FSM S1 (left) and HMM corresponding to S1 (right).
Fig. 3. State transition diagram of faulty FSM S2 (left) and HMM corresponding to S2 (right).
produced. Note that according to the notation of the previous section the matrices A_{1,σ}, σ ∈ Y, for the HMM that
corresponds to S1 are the following:

A_{1,a} = [  0    0    0    0
            1/3  1/3   0    0
             0    0    0    0
            1/3   0    0    0  ],

A_{1,b} = [  0    0    0    0
             0    0    0   2/3
             0   1/3   0    0
             0    0   1/3   0  ],

A_{1,c} = [  0   1/3  1/3   0
             0    0   1/3   0
             0    0    0    0
            1/3   0    0   1/3 ].     □
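As a sanity check, the matrices above can be entered numerically and verified against Eq. (4): summing them yields the Markov chain matrix A of S1, whose columns must each sum to one. A minimal Python sketch, using exact fractions:

```python
from fractions import Fraction as F

# The matrices A_{1,a}, A_{1,b}, A_{1,c} of the example, where entry (j, k)
# is the probability of a transition from state k to state j producing
# the corresponding output.
A1a = [[0, 0, 0, 0], [F(1, 3), F(1, 3), 0, 0],
       [0, 0, 0, 0], [F(1, 3), 0, 0, 0]]
A1b = [[0, 0, 0, 0], [0, 0, 0, F(2, 3)],
       [0, F(1, 3), 0, 0], [0, 0, F(1, 3), 0]]
A1c = [[0, F(1, 3), F(1, 3), 0], [0, 0, F(1, 3), 0],
       [0, 0, 0, 0], [F(1, 3), 0, 0, F(1, 3)]]

# Eq. (4): the output matrices sum to the Markov chain matrix A of S1.
A = [[A1a[j][k] + A1b[j][k] + A1c[j][k] for k in range(4)] for j in range(4)]

# Each column of A sums to exactly 1.
assert all(sum(A[j][k] for j in range(4)) == 1 for k in range(4))
```

Using Fraction avoids floating-point round-off, so the column sums can be checked for exact equality with 1.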
B. Sensor Failure Model
The output sequence y_1^{Ls} = < y[1], y[2], ..., y[Ls] > produced by the system under diagnosis may become
corrupted due to sensor failures; hence, the observed sequence z_1^L = < z[1], z[2], ..., z[L] > may contain erroneous
information and, in general, its length will not be equal to the output sequence length (i.e., L ≠ Ls). We consider
errors due to transient conditions that occur independently at each observation step with certain (known) probabilities
that could depend on the observation step, i.e., the probability of a particular transient error could vary as a function
of the observation step. (Note that the case of permanent errors is a special case of what we consider here, because
in that case the probability of sensor failures would be equal to one at all observation steps.) We also make the
reasonable assumption that given the output sequence, sensor failures at a particular observation step are statistically
independent of sensor failures at other observation steps and of the system model (in particular, sensor failures are
independent of the inputs that drive the system).
We start by focusing on deletions and insertions; later on, we introduce another type of error, namely transpositions.
If an output σ ∈ Y is deleted by the sensors, then we do not observe anything, i.e., the deletion causes
σ → ε, where ε denotes the empty label. Similarly, if an output σ ∈ Y is inserted by the sensors, then we observe
σ, i.e., the insertion causes^4 ε → σ.
For notational purposes, the sensor failures are captured by the set of failures F = D ∪ I, where D =
{d_{σ1}, d_{σ2}, ..., d_{σ|D|} | σ1, σ2, ..., σ|D| ∈ Y} is the set of |D| possible deletions and I = {i_{σ1}, i_{σ2}, ..., i_{σ|I|} |
σ1, σ2, ..., σ|I| ∈ Y} is the set of |I| possible insertions. We also define a function out which allows us to recover
the corresponding output in the set Y given a deletion or an insertion, i.e., out(d_σ) = σ and out(i_σ) = σ for
d_σ ∈ D and i_σ ∈ I. We use E to denote the error pattern, i.e., the sequence of non-erroneous or erroneous events.
The following example is provided to clarify our notation.
Example 1 (continued): For our example, we assume that the set of outputs is given by Y = {a, b, c} and that
the set of failures is F = {d_b, i_a}, i.e., sensors may delete b from the output sequence or they may insert a to the
output sequence. Suppose that the observed sequence is z_1^L = < a b c b c a c >. Some possible output sequences
y_1^{Ls} are the following: < ε b c b c a c > ≡ < b c b c a c >, < ε b c b b b c a c > ≡ < b c b b b c a c >, and
< b a b b c b b c b a b c b >. For example, the second possible output sequence corresponds to the following error
pattern: E = {insertion of a, no error, no error, no error, deletion of b, deletion of b, no error, no error, no error}.
Note that the set of possible output sequences corresponding to the observed sequence has infinite cardinality. □
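The correspondence between an error pattern E and a possible output sequence can be made concrete with a small sketch. The encoding of error events as ('ok',) / ('del', σ) / ('ins', σ) tuples and the helper apply_errors are our own illustrative choices, not notation from the paper.

```python
def apply_errors(y, pattern):
    """Corrupt an output sequence y into an observed sequence according
    to an error pattern: ('ok',) passes a symbol through, ('del', s)
    consumes a symbol of y without emitting it, and ('ins', s) emits a
    symbol without consuming one."""
    z, i = [], 0
    for op in pattern:
        if op[0] == 'ins':
            z.append(op[1])
        elif op[0] == 'del':
            assert y[i] == op[1], "can only delete the symbol actually produced"
            i += 1
        else:                          # 'ok': symbol observed as produced
            z.append(y[i])
            i += 1
    assert i == len(y)                 # every output symbol accounted for
    return z

# The error pattern E from the example: insertion of a, three correct
# observations, two deletions of b, three correct observations.
y = list("bcbbbcac")
E = [('ins', 'a')] + [('ok',)] * 3 + [('del', 'b')] * 2 + [('ok',)] * 3
assert apply_errors(y, E) == list("abcbcac")
```

Running apply_errors on the second candidate output sequence of the example indeed reproduces the observed sequence < a b c b c a c >.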
Insertions and deletions occur with probabilities that vary as a function of the observation step. We assume that
deletion d_σ occurs with known probability p_{d_σ}[m] at observation step m; similarly, an insertion of σ (i_σ ∈ I) occurs
with known probability p_{i_σ}[m] at step m when σ is observed. The probability for the absence of sensor failures at
a particular observation step depends on the observed output at that particular step and is the complement of the total
probability of sensor failures (so that the sum of the probabilities of sensor failures and the absence of errors is
one). In Section IV, we elaborate on the probability model we use for sensor failures, and describe explicitly how
to capture concisely the set of all possible output sequences that correspond to the observed sequence under the
sensor failure model, as well as how to calculate the probability of sensor failures that corrupt each possible output
sequence to the observed sequence.
4A related model for the case of unreliable observations due to transmission errors is presented in [36] by introducing an unreliable mask
function. The work in [37] considers communication in channels with insertions and a bounded number of deletions. Our failure model here
captures those cases and also allows for an infinite number of deletions. Note, however, that our primary motivation here is to handle errors
caused by sensor failures.
C. Likelihood Calculation
Given the observed sequence z_1^L = < z[1], z[2], ..., z[L] > and assuming known probability distributions for the
input and initial state of the system under diagnosis, and for the sensor failures, our objective is to compare the
probability that the system under diagnosis is fault-free against the probability that it is faulty. More specifically,
to minimize the probability of incorrect diagnosis,5 we need to use the maximum a posteriori probability (MAP)
rule, i.e., we need to compare
P(S1 | z_1^L) ≷ P(S2 | z_1^L).     (6)
Clearly, if the probability of S1 given the observed sequence is larger than the probability of S2 given the observed
sequence, we should declare that the system under diagnosis is fault-free, whereas if it is smaller we should declare
that the machine is faulty (see [38] for more details). Our formulation is essentially a model selection task, where
the candidate models correspond to the fault-free and the faulty operation of the system under diagnosis.
Assuming known priors for FSMs S1 and S2 given by P1 and P2 respectively (with P1 + P2 = 1), the above
comparison can be reduced to

P(z_1^L | S1) · P1 ≷ P(z_1^L | S2) · P2.     (7)
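Once the two likelihoods are available, the MAP comparison of Eq. (7) is a one-line decision rule. The function name, the tie-breaking convention, and the likelihood values below are hypothetical, for illustration only.

```python
def map_diagnosis(lik1, lik2, P1=0.5, P2=0.5):
    """MAP rule of Eq. (7): declare fault-free (S1) iff
    P(z | S1) * P1 >= P(z | S2) * P2 (ties broken toward S1)."""
    return "fault-free" if lik1 * P1 >= lik2 * P2 else "faulty"

# Hypothetical likelihood values of the observed sequence under S1 and S2.
assert map_diagnosis(2e-4, 5e-5) == "fault-free"
assert map_diagnosis(1e-6, 5e-5) == "faulty"
```

In practice the likelihoods shrink exponentially with the sequence length, so an implementation would compare log-likelihoods (plus log-priors) instead of raw products.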
Therefore, our task reduces to calculating the probability of the observed sequence given that the system under
diagnosis is S, i.e., the likelihood P(z_1^L | S) of the observations given S, where S is either S1 or S2. If sensor
failures were not present, the observed sequence would be the same as the output sequence and the likelihood of
the observed sequence given S could be calculated as the sum of the probabilities of all possible state sequences
that are consistent with the observations. To ensure this consistency, we would need to identify the possible state
sequences that agree with both the observations and S. With sensor failures, however, several output sequences
correspond to the observed sequence and, for each one, we would have to identify all consistent state sequences and
their associated probabilities. Note that if we use E to denote a sensor failure pattern, then we can write P(z_1^L | S)
as follows:

P(z_1^L | S) = Σ_{E, y_1^{Ls}} P(y_1^{Ls} | S) · P(z_1^L, E | y_1^{Ls}, S)     (8)
            = Σ_{E, y_1^{Ls}} P(y_1^{Ls} | S) · P(z_1^L, E | y_1^{Ls}).     (9)

Notice that P(z_1^L, E | y_1^{Ls}, S) = P(z_1^L, E | y_1^{Ls}) because, given the output sequence, the error pattern and the
observed sequence are jointly independent of the model S. This follows from our reasonable assumption that sensor
failures are independent of the underlying system.
In the next section we develop a concise representation of all possible output sequences that may produce the
observed sequence along with the probabilities P(z_1^L, E | y_1^{Ls}) for a given z_1^L.
5Other criteria (e.g. Neyman-Pearson) can also be used but are not discussed here due to space limitations.
IV. OBSERVATION FSM
Due to sensor failures, the set of possible output sequences that match a given observed sequence may have
infinite cardinality (as shown in our example at the end of Section III.B). In this section we present a compact
way to represent the infinite set of possible output sequences, along with the probability of each possible sequence
leading to the observed sequence. In the next section, we explain how to use this representation to efficiently check
which of the possible output sequences are consistent with the underlying FSM model (S1 or S2).
We will represent the set of all possible output sequences y_1^{Ls} that correspond to the observed sequence z_1^L
by the allowable behavior of an FSM that we call the observation FSM. More specifically, the observation FSM,
denoted by So = (Qo, Xo, Yo, δo, λo, qo0), has L + 2 states, starts from initial state qo0, and transitions to a new
state every time an observation is made. Notice that, by construction, the observation machine produces no outputs
(Yo = ∅) and λo is a mapping that maps to the empty output. The set Qo = {qo0, qo1, ..., qo,L+1} ≡ {0, 1, ..., L + 1}
represents the set of states with qo0 ≡ 0 being the initial state. The set of inputs Xo is the union of the set of
outputs Y of the system under diagnosis, the empty label ε, and the failures F, i.e., Xo = Y ∪ {ε} ∪ F. The state
transition function δo is defined by the following three steps:
(i) Starting from state 0, So transitions to a new state every time an observation occurs, i.e.,

δo(m, z[m + 1]) = m + 1, m = 0, 1, ..., L − 1.     (10)

(ii) From state L, So transitions to state L + 1 under input ε, i.e., δo(L, ε) = L + 1, and there is a self-loop under
input ε at the last state, i.e., δo(L + 1, ε) = L + 1. (This (L + 1)st state of the observation machine can be thought
of as its only accepting state.)
(iii) We account for sensor failures by adding transitions that correspond to errors as follows:
• For each state m ∈ Qo \ {L + 1}, add a self-transition under input d_σ for all outputs of S that are allowed to
be deleted (i.e., for all d_σ ∈ D). In other words, for each state m ∈ Qo (except L + 1), let δo(m, d_σ) = m,
∀ d_σ ∈ D.
• For each state m ∈ Qo \ {L + 1} with a valid input z[m + 1] = σ ∈ Y (so that δo(m, z[m + 1]) = m + 1), if σ
is allowed to be inserted (i.e., if i_σ ∈ I with out(i_σ) = σ = z[m + 1]), add a transition under insertion i_σ so
that δo(m, i_σ) = m + 1.
If there were no sensor failures, then FSM So would be constructed following only the first two steps of the
previous procedure. In order to account for sensor failures, we introduce the third step and, as a result, the observation
FSM So may have additional one-step transitions (due to insertions) and self-loops (due to deletions). From the
structure of the observation FSM So we can construct the (deterministic) state transition matrix that corresponds
to each input of So, i.e., Ao,σ for all σ ∈ Y, Ao,ε for the empty label, Ao,d_σ for all d_σ ∈ D, and Ao,i_σ for all
i_σ ∈ I. (The observation machine for the observed sequence in Example 1 can be seen in Figure 4; this figure is
discussed in detail in the example that follows.) Equivalently, the transition function δo is defined for each input
σ' ∈ Xo as follows:

For m = 0, 1, ..., L − 1:
  δo(m, σ') = m + 1, if σ' = z[m + 1],
  δo(m, σ') = m, if σ' ∈ D,
  δo(m, σ') = m + 1, if σ' ∈ I and out(σ') = z[m + 1],
  δo(m, σ') undefined, otherwise;
δo(L, σ') = L, if σ' ∈ D,
δo(L, ε) = L + 1,
δo(L + 1, ε) = L + 1.     (11)
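Steps (i)-(iii) can be sketched in Python for the running example (z = a b c b c a c, D = {b}, I = {a}). The dictionary encoding of δo and the input labels 'del_b', 'ins_a', 'eps' are our own illustrative conventions, not notation from the paper.

```python
def build_observation_fsm(z, deletions, insertions):
    """Sketch of Eq. (11): states 0..L+1, inputs Y, the empty label and F."""
    L = len(z)
    delta = {}                            # (state, input) -> next state
    for m in range(L):
        delta[(m, z[m])] = m + 1          # step (i): normal observation
        if z[m] in insertions:            # step (iii): z[m+1] may be spurious
            delta[(m, 'ins_' + z[m])] = m + 1
    for m in range(L + 1):                # step (iii): deletion self-loops
        for s in deletions:               # (at every state except L + 1)
            delta[(m, 'del_' + s)] = m
    delta[(L, 'eps')] = L + 1             # step (ii): move to accepting state
    delta[(L + 1, 'eps')] = L + 1         # step (ii): self-loop at L + 1
    return delta

delta_o = build_observation_fsm("abcbcac", {"b"}, {"a"})
assert delta_o[(0, 'a')] == 1             # reading z[1] = a
assert delta_o[(0, 'ins_a')] == 1         # the observed a may be an insertion
assert delta_o[(3, 'del_b')] == 3         # a b may have been deleted anywhere
assert delta_o[(7, 'eps')] == 8           # accepting state L + 1 = 8
```

Any path from state 0 to state L + 1 in this machine spells out one (output sequence, error pattern) pair compatible with the observed sequence; the deletion self-loops are what make the set of such pairs infinite.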
As mentioned earlier, the observation FSM So captures all possible output sequences that may get corrupted to
the observed sequence, as well as the probability with which a possible output sequence results in the observed
sequence. We assume that sensor failures at a particular observation step occur independently from other observation
steps. This implies that the probability of sensor failures depends on the particular state of So (which is equivalent
to an observation step).
Next, we explain how we assign probabilities to the transitions of the observation FSM So, which, in effect,
results in a Markov chain that captures the probability P (zL1 , E | yLs
1 ) where E denotes a given error pattern and
yLs1 denotes the matching output sequence. Since deletions can occur at any step, state m of So (at observation
step m) has a self-loop with probability of occurrence pd! [m] for each # such that d! $ D. An insertion of # may
only occur at observation steps where # is actually observed, i.e., when a forward transition of So has input #.
In such case, i! $ I is assigned probability pi! [m], where m corresponds to the observation step at which # may
have been deleted (this implies that z[m+1] = #). Fault-free (normal) transitions are assigned probabilities so that
from each state of So the sum of the probabilities of transitions leaving that state is equal to one.
Example 1 (continued): In this example, we assume that the observed sequence is $z_1^7 = \langle a\, b\, c\, b\, c\, a\, c\rangle$ and we construct the observation FSM $S_o$ shown in Figure 4. In fact, we can also use a regular expression [1] to capture the set of all possible output sequences that may have resulted in $z_1^7$, namely
$$b^*\,(a \cup \varepsilon)\; b^*\; b\; b^*\; c\; b^*\; b\; b^*\; c\; b^*\,(a \cup \varepsilon)\; b^*\; c\; b^*,$$
where $b^*$ denotes the Kleene closure of $b$. For the purposes of this example, we assume that after we have observed $z_1^5 = \langle a\, b\, c\, b\, c\rangle$ the sensors become more susceptible to noise; thus, we assign probability $p_{d_b}[m]$ to deletion $d_b$ and probability $p_{i_a}[m]$ to insertion $i_a$ as
$$p_{d_b}[m] = \begin{cases} p_{d_b}, & \text{for } m = 0, 1, 2, 3, 4,\\ p'_{d_b}, & \text{for } m = 5, 6, 7, \end{cases} \qquad p_{i_a}[m] = \begin{cases} p_{i_a}, & \text{for } m = 1, 2, 3, 4,\\ p'_{i_a}, & \text{for } m = 5, 6, 7, \end{cases}$$
where $p'_{d_b} > p_{d_b}$ and $p'_{i_a} > p_{i_a}$.

Fig. 4. State transition diagram of the observation FSM $S_o$ in Example 1 (a chain of states 0 through 8 with $d_b$ self-loops and forward arcs labeled $a/i_a$, $b$, $c$, $b$, $c$, $a/i_a$, $c$, $\varepsilon$).

Fig. 5. Markov chain corresponding to FSM $S_o$ in Example 1.

The resulting Markov chain corresponding to FSM $S_o$ is shown in Figure 5 and its state transition matrix is given by

$$A_o = \begin{bmatrix}
p_{d_b} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
1-p_{d_b} & p_{d_b} & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1-p_{d_b} & p_{d_b} & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1-p_{d_b} & p_{d_b} & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1-p_{d_b} & p_{d_b} & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1-p_{d_b} & p'_{d_b} & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1-p'_{d_b} & p'_{d_b} & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1-p'_{d_b} & p'_{d_b} & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1-p'_{d_b} & 1
\end{bmatrix}. \qquad \Box$$
In the previous example, if we follow the trajectory $d_b\; d_b\; a\; d_b\; b\; c\; b\; c\; i_a\; c\; d_b\; \varepsilon$, we discover a "matching" output sequence $y_1^{L_s} = \langle b\, b\, a\, b\, b\, c\, b\, c\, c\, b\rangle$ (with $L_s = 10$) that is associated with the error pattern $E = \{$deletion of $b$, deletion of $b$, no error, deletion of $b$, no error, no error, no error, no error, insertion of $a$, no error, deletion of $b\}$. Note that for this particular $E$ and $y_1^{L_s}$ the probability $P(z_1^7, E \mid y_1^{L_s})$ can be read off by simply taking the product of the corresponding probabilities along the trajectory $d_b\; d_b\; a\; d_b\; b\; c\; b\; c\; i_a\; c\; d_b\; \varepsilon$.
Notice that the matrix $A_o$ of the example does not separate out the probability of insertion; for instance, $A_o(1,0) = p_{i_a} + (1 - p_{d_b} - p_{i_a}) = 1 - p_{d_b}$ and $A_o(6,5) = p'_{i_a} + (1 - p'_{d_b} - p'_{i_a}) = 1 - p'_{d_b}$. In fact, matrix $A_o$ does not contain all the information that is necessary to assign probabilities $P(z_1^7, E \mid y_1^{L_s})$. To ensure that we have this information, we next construct the (probabilistic) state transition matrices capturing the probabilities of transitions corresponding to each sensor failure separately.
The (probabilistic) state transition matrix $A_o$ of the observation FSM $S_o$ can be written as

$$A_o = \sum_{\sigma' \in X_o} A_{o,\sigma'} = \sum_{\sigma \in Y} A_{o,\sigma} + A_{o,\varepsilon} + \sum_{d_\sigma \in D} A_{o,d_\sigma} + \sum_{i_\sigma \in I} A_{o,i_\sigma} = \sum_{\sigma \in Y} A_{o,\sigma} + A_{o,\varepsilon} + A_{o,e}, \qquad (12)$$

where $A_{o,\sigma}$ denotes the (probabilistic) state transition matrix corresponding to input $\sigma \in Y$ ($Y$ is the set of outputs of the FSM under diagnosis), $A_{o,\varepsilon}$ the one corresponding to the empty string $\varepsilon$, $A_{o,d_\sigma}$ the one corresponding to deletion $d_\sigma \in D$, $A_{o,i_\sigma}$ the one corresponding to insertion $i_\sigma \in I$, and

$$A_{o,e} \triangleq \sum_{d_\sigma \in D} A_{o,d_\sigma} + \sum_{i_\sigma \in I} A_{o,i_\sigma} \qquad (13)$$
is the state transition matrix corresponding to sensor failures. Since sensor failures are assumed to be independent between observation steps, we can compute the (probabilistic) state transition matrices corresponding to deletions and insertions by setting

$$A_{o,d_\sigma}(m,m) = p_{d_\sigma}[m], \quad m = 0, 1, \ldots, L, \; d_\sigma \in D,$$
$$A_{o,i_\sigma}(m+1,m) = p_{i_\sigma}[m+1] \cdot A_{o,\sigma}(m+1,m), \quad m = 0, 1, \ldots, L-1, \; i_\sigma \in I, \qquad (14)$$

(where $A_{o,\sigma}$ on the right-hand side denotes the deterministic 0/1 transition matrix for input $\sigma$) and keeping all other entries of $A_{o,d_\sigma}$ and $A_{o,i_\sigma}$ zero. Note that transitions captured by $A_{o,d_\sigma}$ correspond to self-loops in the observation machine, and transitions captured by $A_{o,i_\sigma}$ correspond to one-step forward arcs. The (probabilistic) state transition matrix corresponding to the empty label $\varepsilon$ has only the following nonzero elements:

$$A_{o,\varepsilon}(L+1,L) = 1 - A_{o,e}(L+1,L) - A_{o,e}(L,L), \qquad A_{o,\varepsilon}(L+1,L+1) = 1, \qquad (15)$$

which ensures that the last state of $S_o$ (state $L+1$) is an accepting state (an absorbing state in the Markov chain). To compute the (probabilistic) state transition matrix $A_{o,n}$ corresponding to normal transitions, we make sure that the only nonzero elements of $A_{o,n}$ correspond to one-step forward arcs. Hence, $A_{o,n}$ satisfies

$$A_{o,n}(m+1,m) = 1 - \sum_{j=0}^{|Q_o|-1} A_{o,e}(j,m), \quad m = 0, 1, \ldots, L-1. \qquad (16)$$

Given the matrix $A_{o,n}$ that represents the probabilities of normal transitions, we can use the deterministic state transition matrix corresponding to input $\sigma$ to derive the matrix that captures the probabilities of normal transitions for each normal input $\sigma \in Y$ by performing element-wise multiplication as follows:

$$A_{o,\sigma}(k,j) = A_{o,n}(k,j) \cdot A_{o,\sigma}(k,j), \quad \forall j, k = 0, 1, \ldots, L, \qquad (17)$$

where, again, $A_{o,\sigma}$ on the right-hand side is the deterministic matrix.
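As a concrete check of the decomposition in Equations 12 and 14-16, the following Python sketch assembles $A_o$ for the running example ($z_1^7 = \langle a\,b\,c\,b\,c\,a\,c\rangle$, $D=\{d_b\}$, $I=\{i_a\}$), using the numeric values given later in the example ($p_{d_b}=0.05$, $p'_{d_b}=0.1$, $p_{i_a}=0.1$, $p'_{i_a}=0.2$). Keying the insertion probability by the source state of each arc is our own bookkeeping simplification:

```python
import numpy as np

# Decomposition (12) of A_o for the running example.
z = "abcbcac"; L = len(z)                 # L = 7, states 0..8
p_db = [0.05] * 5 + [0.10] * 3            # deletion prob. at states 0..7
p_ia = {0: 0.1, 5: 0.2}                   # insertion prob. where 'a' is observed
N = L + 2
A_d = np.zeros((N, N)); A_i = np.zeros((N, N))
A_norm = np.zeros((N, N)); A_eps = np.zeros((N, N))
for m in range(L + 1):
    A_d[m, m] = p_db[m]                   # Eq. (14): deletion self-loops
for m, p in p_ia.items():
    A_i[m + 1, m] = p                     # Eq. (14): insertion forward arcs
for m in range(L):
    A_norm[m + 1, m] = 1 - A_d[m, m] - A_i[m + 1, m]   # Eq. (16)
A_eps[L + 1, L] = 1 - A_d[L, L]           # Eq. (15): epsilon into accepting state
A_eps[L + 1, L + 1] = 1.0
Ao = A_d + A_i + A_norm + A_eps           # Eq. (12)
```

The assembled $A_o$ is column-stochastic and reproduces the lumped entries noted above, e.g. $A_o(1,0) = p_{i_a} + (1-p_{d_b}-p_{i_a}) = 1-p_{d_b}$.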
The reasons for establishing the above notation for the transition matrices corresponding to each input of the
observation machine become more obvious with the construction of FSM H which is presented in the next section.
V. LIKELIHOOD CALCULATION UNDER SENSOR FAILURES
We ignore probabilities for now and focus on identifying all possible state sequences of $S$ that can produce a possible output sequence (i.e., one that can correspond to the observed sequence, as captured by $S_o$). To do that, we use FSMs $S$ and $S_o$ to construct a new FSM $H$ defined as $H = (Q_H, X_H, \delta_H, q_{H0})$, where $Q_H$ contains subsets of the states in $Q_o \times Q$, $X_H = X_o$ (recall that $X_o = Y \cup \{\varepsilon\} \cup F$), and $q_{H0} = (q_{o0}, q_0)$. FSM $H$ has no outputs and is generally non-deterministic, with $\delta_H$ defined for each input $\sigma' \in X_H$ as follows:

$$\delta_H\big((m,j), \sigma'\big) = \begin{cases} \bigcup_{x_i \in X \text{ s.t. } \lambda(j,x_i)=\sigma'} \big(\delta_o(m,\sigma'),\, \delta(j,x_i)\big), & \text{if } \sigma' \in Y,\\ \big(\delta_o(m,\sigma'),\, j\big), & \text{if } \sigma' \in I \cup \{\varepsilon\},\\ \bigcup_{x_i \in X \text{ s.t. } \lambda(j,x_i)=\mathrm{out}(\sigma')} \big(m,\, \delta(j,x_i)\big), & \text{if } \sigma' \in D,\\ \text{undefined}, & \text{otherwise.} \end{cases} \qquad (18)$$
Notice that FSM $H$, obtained in the above construction based on FSMs $S$ and $S_o$, is neither the standard product nor the parallel composition of $S$ and $S_o$. The states of $H$ can be described in terms of pairs of the form $(m,j)$, where $m$ denotes the state of the observation FSM $S_o$ and $j$ denotes the state of $S$. The union of states is used in Equation 18 because FSM $H$ is a non-deterministic machine and its current state may consist of a set of states. For example, at observation step $m$ and state $j$ of FSM $S$, if $\sigma' \in Y$ and both $x_1$ and $x_2$ satisfy the constraint under the union (i.e., $\lambda(j,x_1) = \lambda(j,x_2) = \sigma'$), then the set of possible states for FSM $H$ is given by $\{(\delta_o(m,\sigma'), \delta(j,x_1)),\, (\delta_o(m,\sigma'), \delta(j,x_2))\}$. The state transition diagram of FSM $H$ has a special structure that becomes more apparent if we draw it so that states of the form $(m,0), (m,1), \ldots, (m,|Q|-1)$, for $m \in Q_o$, are in a column and states of the form $(0,j), (1,j), \ldots, (|Q_o|-1,j)$, for $j \in Q$, are in a row. From now on, since each forward transition corresponds to a new observation, we will call each column of the transition diagram a stage, to reflect the notion of the observation step (see Figures 6 and 7, which are discussed in more detail in an example that follows).
After obtaining the set of sequences that are consistent with both $S_o$ and $S$, the next step is to assign probabilities to the transitions of $H$ and hence construct a probabilistic FSM $H$ with (probabilistic) state transition matrix $A_H$. The state transition matrix $A_H$ has dimension $(L+2)|Q| \times (L+2)|Q|$ and can be obtained as the sum $A_H = \sum_{\sigma' \in X_H} A_{H,\sigma'}$, where each matrix $A_{H,\sigma'}$ captures the probabilities of the transitions associated with a particular input $\sigma' \in X_H = Y \cup \{\varepsilon\} \cup I \cup D$. If we arrange the states of $H$ as $(0,0), (0,1), \ldots, (0,|Q|-1), (1,0), (1,1), \ldots, (1,|Q|-1), \ldots, (L+1,0), (L+1,1), \ldots, (L+1,|Q|-1)$, the probabilities of the transitions associated with $\sigma'$ can be obtained via the following state transition matrices:

$$A_{H,\sigma'} = \begin{cases} A_{o,\sigma'} \otimes A_{\sigma'}, & \text{if } \sigma' \in Y,\\ A_{o,\sigma'} \otimes I, & \text{if } \sigma' \in \{\varepsilon\} \cup I,\\ A_{o,\sigma'} \otimes A_{\mathrm{out}(\sigma')}, & \text{if } \sigma' \in D, \end{cases} \qquad (19)$$
where $A_\sigma$ captures the probabilities of transitions in $S$ that output $\sigma$. In the above, $A \otimes B$ represents the Kronecker product$^6$ of matrices $A$ and $B$, and its use is justified by our choice of ordering for the states of $H$ and by the fact that, given the output sequence, the error pattern and the observed sequence are statistically independent of the inputs of $S$. For instance, considering that deletions may occur only when the system under diagnosis is in a state that can produce at least one of the outputs that may be deleted, we take the Kronecker product of $A_{o,d_\sigma}$ and $A_\sigma$, which results in a matrix that captures the probability of each transition in $H$ that is associated with $d_\sigma$ and $\sigma$ (this probability is obtained by multiplying the corresponding probability of a deletion in the observation machine $S_o$ with that of an input that outputs $\sigma$ in the underlying machine $S$). Note that, since insertions may occur at any time, we take the Kronecker product of $A_{o,i_\sigma}$ with the identity matrix $I$. The overall probabilistic state transition matrix $A_H$ is given by

$$A_H = \sum_{\sigma \in Y} A_{o,\sigma} \otimes A_\sigma + A_{o,\varepsilon} \otimes I + \sum_{d_\sigma \in D} A_{o,d_\sigma} \otimes A_\sigma + \sum_{i_\sigma \in I} A_{o,i_\sigma} \otimes I. \qquad (20)$$

Fig. 6. State transition diagram of $H_1$ in Example 1.
Note that for $\sigma' = d_\sigma \in D$ we sometimes write $A_{o,d_\sigma} \otimes A_\sigma = A_{o,\sigma'} \otimes A_{\mathrm{out}(\sigma')}$. In particular, the transition from state $(i,j)$ to state $(k,l)$ corresponds to the $(k|Q|+l,\; i|Q|+j)$th entry of matrix $A_H$.
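As a small numerical illustration of the Kronecker structure in Equation 19 (the 2-state $S$ and 3-state $S_o$ below are toy stand-ins we made up, not the example's machines):

```python
import numpy as np

# A_{H,sigma} = A_{o,sigma} (x) A_sigma for an output symbol sigma in Y.
A_sigma = np.array([[0.0, 0.6],
                    [0.7, 0.0]])        # transitions of S that output sigma
Ao_sigma = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],   # forward arc 0 -> 1 of S_o under sigma
                     [0.0, 0.0, 0.0]])
AH_sigma = np.kron(Ao_sigma, A_sigma)   # states (m, j) ordered as m*|Q| + j
# transition (0, j) -> (1, k) sits at entry (1*|Q| + k, 0*|Q| + j):
assert AH_sigma[2, 1] == Ao_sigma[1, 0] * A_sigma[0, 1]
```

The assertion checks exactly the indexing rule stated above: the $(k|Q|+l,\; i|Q|+j)$th entry of the Kronecker product is the product of the $S_o$-probability and the $S$-probability.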
Example 1 (continued): FSMs $H_1$ and $H_2$ have $4 \cdot 9 = 36$ states each, which we name as pairs $(m,j)$ for $m \in Q_o$ and $j \in Q$. The structures of FSMs $H_1$ and $H_2$ are indicated in Figures 6 and 7, respectively (the dashed
$^6$The Kronecker product [39] of an $N_1 \times M_1$ matrix $A$ with an $N_2 \times M_2$ matrix $B$ is denoted by $A \otimes B$ and is defined as the partitioned matrix
$$A \otimes B = \begin{bmatrix} a_{00}B & a_{01}B & \cdots & a_{0(M_1-1)}B\\ a_{10}B & a_{11}B & \cdots & a_{1(M_1-1)}B\\ \vdots & \vdots & \ddots & \vdots\\ a_{(N_1-1)0}B & a_{(N_1-1)1}B & \cdots & a_{(N_1-1)(M_1-1)}B \end{bmatrix},$$
where $a_{jk}$ is the entry at the $j$th row, $k$th column position of matrix $A$. Note that $A \otimes B$ is of dimension $N_1N_2 \times M_1M_2$.
Fig. 7. State transition diagram of $H_2$ in Example 1.
arcs represent transitions due to sensor failures — the majority of the inputs are not indicated in the figures for
clarity). Note that all transitions in FSMs H1 and H2 follow either a forward direction (that spans one column) or
a vertical direction. For example, FSM H1 (Figure 6) takes a transition from state (0,2) to state (1,0) under input
a (i.e., S1 moves from state 2 to state 0 and produces a which is observed). The transition from state (0,2) to
state (1,2) under input ia represents the insertion of a. In this case, the diagnoser observed a although FSM S1
did not move from state 2 (and it did not produce a). Finally, the transition from state (0,2) to (0,3) under input
db represents the deletion of b: FSM S1 took a transition from state 2 to state 3 and produced b, however, the
diagnoser did not observe b because it was deleted by the sensors.
The (probabilistic) state transition matrices for FSMs $H_1$ and $H_2$ are given by

$$A_{H_1} = A_{o,a} \otimes A_{1,a} + A_{o,b} \otimes A_{1,b} + A_{o,c} \otimes A_{1,c} + A_{o,\varepsilon} \otimes I + A_{o,d_b} \otimes A_{1,b} + A_{o,i_a} \otimes I,$$
$$A_{H_2} = A_{o,a} \otimes A_{2,a} + A_{o,b} \otimes A_{2,b} + A_{o,c} \otimes A_{2,c} + A_{o,\varepsilon} \otimes I + A_{o,d_b} \otimes A_{2,b} + A_{o,i_a} \otimes I.$$

For example, the transition from state (0,2) to state (0,3) under input $d_b$ in FSM $H_1$ (Figure 6) occurs with probability $p_{d_b} \cdot A_{1,b}(3,2)$, which is equal to the probability that FSM $S_1$ took a transition from state 2 to state 3 producing $b$ and then output $b$ was deleted by the sensor (i.e., error $d_b$ occurred). Note that this is exactly the entry $(3,2)$ of $A_{o,d_b} \otimes A_{1,b}$. The transition from state (0,2) to state (1,2) under input $i_a$ in FSM $H_1$ occurs with probability $p_{i_a}$, which is equal to the probability that FSM $S_1$ did not take any transition but $a$ was inserted by the sensor (i.e., error $i_a$ occurred). This is exactly the entry $(6,2)$ of $A_{o,i_a} \otimes I$. Finally, the normal transition under input $a$ from state (0,2) to state (1,0) is assigned probability $(1 - p_{d_b} - p_{i_a}) \cdot A_{1,a}(0,2)$, which is equal to the probability that $S_1$ took a transition from state 2 to state 0 (producing output $a$) and no sensor failure occurred. This corresponds to entry $(4,2)$ of $A_{o,a} \otimes A_{1,a}$. $\Box$
FSM $H$ captures behavior that is consistent with the system under diagnosis and that forms a prefix of one of the possible output sequences. Note that the accepting states of $H$ are of the form $(L+1,j)$, $j \in Q$, and capture the behavior that is consistent with both the system under diagnosis and one of the possible output sequences. In general, the entries of each column of the (probabilistic) state transition matrix $A_H$ do not sum to one; however, we can easily build a proper Markov chain $H'$ by modifying $H$ so that the assigned transition probabilities in each column sum to one. More specifically, we can append to $H$ a new state $q_{in}$, which represents the inconsistent state (i.e., if $H$ is in state $q_{in}$, then the observations are not consistent with the system under diagnosis and the sensor failures that are allowed). To achieve this, we add a transition from each state of FSM $H$ to the inconsistent state $q_{in}$, with probability such that the sum of the transition probabilities leaving that particular state is equal to one; we also add a self-loop at state $q_{in}$ with probability one.
The resulting Markov chain $H'$ has $|Q_{H'}| = (L+2) \cdot |Q| + 1$ states. The only self-loops in $H'$ with probability one are those at the consistent states (of the form $(L+1,j)$) and at the inconsistent state $q_{in}$. In fact, due to the particular structure of $H'$ (and given that there is a nonzero probability of leaving the vertical loop at each stage), the consistent and inconsistent states are the only absorbing states, while the rest of the states are transient. Therefore, when the absorbing Markov chain $H'$ reaches its stationary distribution, these absorbing states are the only states with nonzero probabilities (summing up to one). We are interested in the stationary distribution of $H'$ so that we can account for output sequences $y_1^{L_s}$ of any length that correspond to the observed sequence $z_1^L$. (Recall that without sensor failures we have $L_s = L$.)
More formally, we arrange the states of $H'$ in the order $(0,0), (0,1), \ldots, (0,|Q|-1), (1,0), (1,1), \ldots, (1,|Q|-1), \ldots, (L+1,0), (L+1,1), \ldots, (L+1,|Q|-1), q_{in}$. Let $\pi_{H'}[0]$ be a vector with $|Q_{H'}|$ entries, each of which represents the initial probability of the corresponding state of $H'$. We are interested in the stationary probability distribution of $H'$, captured by

$$\pi_{H'} = \lim_{n\to\infty} \pi_{H'}[n] = \lim_{n\to\infty} A_{H'}^n \cdot \pi_{H'}[0], \qquad (21)$$

where the state transition matrix $A_{H'}$ of $H'$ is in its canonical form given by

$$A_{H'} = \begin{bmatrix} T & 0\\ R & I \end{bmatrix}. \qquad (22)$$
Recall that the state transition matrix $A_H$ (without state $q_{in}$) has dimension $(L+2)|Q| \times (L+2)|Q|$. Matrix $T$ consists of the first $(L+1)\cdot|Q|$ rows and the first $(L+1)\cdot|Q|$ columns of $A_H$ and captures the behavior of the transient states of $H'$; the $(|Q|+1) \times (L+1)\cdot|Q|$ matrix $R$ captures the transitions from the transient states to the absorbing states; $0$ is an $(L+1)\cdot|Q| \times (|Q|+1)$ matrix with all zero entries; and $I$ is the identity matrix of dimension $(|Q|+1) \times (|Q|+1)$. Note that, since $H'$ is an absorbing Markov chain, the limit $\lim_{n\to\infty} A_{H'}^n$ exists and (with our column-stochastic convention) is given by

$$\lim_{n\to\infty} A_{H'}^n = \begin{bmatrix} 0 & 0\\ R\,(I-T)^{-1} & I \end{bmatrix}, \qquad (23)$$

where $(I-T)^{-1}$ is called the fundamental matrix [15].
The only nonzero entries of $\pi_{H'}$ are those that correspond to the consistent states and the inconsistent state, i.e., the absorbing states. In fact, the probability that $H'$ ends up in a consistent state is equal to the complement of the probability that $H'$ ends up in the inconsistent state, which is also equal to the probability of the observed sequence $z_1^L$ given the FSM model $S$, i.e.,

$$P(z_1^L \mid S) = \sum_{j=(L+1)\cdot|Q|}^{(L+2)\cdot|Q|-1} \pi_{H'}(j) = 1 - \pi_{H'}(q_{in}), \qquad (24)$$

where $\pi_{H'}(q_{in})$ denotes the entry of $\pi_{H'}$ corresponding to the inconsistent state.
Proposition 1: The likelihood of the observed sequence $z_1^L$ given model $S$ in the presence of sensor failures (as defined earlier) is given by

$$P(z_1^L \mid S) = 1 - \pi_{H'}(q_{in}), \qquad (25)$$

where $\pi_{H'}$ is the stationary distribution of the absorbing Markov chain $H'$ (which can be constructed from model $S$ and the observation machine $S_o$).
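The absorption computation behind Proposition 1 can be sketched generically. The following toy example (a made-up 2-transient, 2-absorbing chain, not the paper's $H'$) shows the column-stochastic version of Equations 22-23:

```python
import numpy as np

# Column-stochastic absorbing chain A = [[T, 0], [R, I]]:
# applying lim A^n to pi[0] moves all mass to the absorbing states
# through the matrix R (I - T)^{-1}.
T = np.array([[0.2, 0.0],
              [0.5, 0.3]])          # transient -> transient transitions
R = np.array([[0.3, 0.1],
              [0.0, 0.6]])          # transient -> absorbing transitions
pi0 = np.array([1.0, 0.0])          # start in the first transient state
absorb = R @ np.linalg.inv(np.eye(2) - T) @ pi0
# all probability mass is eventually absorbed:
assert np.isclose(absorb.sum(), 1.0)
```

In the diagnosis setting, the entries of `absorb` corresponding to the consistent states sum to $P(z_1^L \mid S)$, and the remaining entry is the inconsistent-state probability $\pi_{H'}(q_{in})$.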
To gain some more intuition, let us consider the case of reliable sensors, where the arcs in $H$ capture only normal one-step forward transitions (except for the self-loops of the accepting states). Let $\pi_{H'}[0]$ be a vector with $|Q_{H'}|$ entries, each of which represents the initial probability of the corresponding state of $H'$, and let $\pi_{H'}[L+1]$ represent the state probabilities of $H'$ after $L+1$ steps. Then,

$$\pi_{H'}[L+1] = A_{H'}^{L+1} \cdot \pi_{H'}[0], \qquad (26)$$

where $A_{H'}^{L+1}$ denotes the matrix $A_{H'}$ raised to the $(L+1)$st power. The probability of the observed sequence given FSM $S$ would then be the sum of the entries of the state probability vector at step $L+1$ that correspond to the accepting (consistent) states, i.e.,

$$P(z_1^L \mid S) = \sum_{j=(L+1)\cdot|Q|}^{(L+2)\cdot|Q|-1} \pi_{H'}[L+1](j) = \sum_{j=(L+1)\cdot|Q|}^{(L+2)\cdot|Q|-1} \big(A_{H'}^{L+1} \cdot \pi_{H'}[0]\big)(j). \qquad (27)$$

Note that, in this case, $A_{H'}^n \cdot \pi_{H'}[0] = A_{H'}^{L+1} \cdot \pi_{H'}[0]$ for $n \geq L+1$, due to the transient nature of the first $L$ stages and the self-loops that occur with probability one at the consistent and inconsistent states.
VI. RECURSIVE LIKELIHOOD CALCULATION UNDER SENSOR FAILURES
In this section we exploit the structure of matrix $A_{H'}$ (which captures the transition probabilities of $H'$) to perform the posterior probability calculations in an efficient manner. We first define the following submatrices, which will be used to express $A_{H'}$.
• Matrices $B_{m,m+1}$, $m = 0, 1, \ldots, L$, capture the transitions from any state of $H'$ at stage $m$ to any state of $H'$ at stage $m+1$. They can be obtained from $A_{H'}$ as $B_{m,m+1}(k,j) = A_{H'}\big((m+1)\cdot|Q|+k,\; m\cdot|Q|+j\big)$, where $k, j = 0, 1, \ldots, |Q|-1$.
• Matrices $B_m$, $m = 0, 1, \ldots, L$, capture the vertical transitions (i.e., transitions from stage $m$ to the same stage) and account for deletion errors. They can be obtained from $A_{H'}$ as $B_m(k,j) = A_{H'}\big(m\cdot|Q|+k,\; m\cdot|Q|+j\big)$, where $k, j = 0, 1, \ldots, |Q|-1$. (Note that if deletions occur at each observation step with the same probability, then $B_m = B$, $m = 0, 1, \ldots, L$, for some constant matrix $B$.)
• $C^T$ is a row vector with entries $C^T(j) = 1 - \sum_{k=0}^{|Q_H|-1} A_H(k,j)$, for $j = 0, 1, \ldots, |Q_H|-1$; i.e., $C^T$ ensures that the sum of each column of $A_{H'}$ is equal to one.
We should note here that an alternative way to compute the block matrices $B_{m,m+1}$ and $B_m$ directly, without the help of $H'$, is to use the following equations:

$$B_m(k,j) = \sum_{\substack{d_{\sigma'} \in D \text{ s.t.}\\ \delta(j,\,\mathrm{out}(d_{\sigma'})) = k}} p_{d_{\sigma'}}[m] \cdot A_{\sigma'}(k,j),$$
$$B_{m,m+1}(k,j) = \Big(1 - \sum_{\substack{d_{\sigma'} \in D \text{ s.t.}\\ \delta(j,\,\mathrm{out}(d_{\sigma'})) \neq \emptyset}} p_{d_{\sigma'}}[m]\Big) \cdot A_{z[m+1]}(k,j), \qquad (28)$$

where $k, j = 0, 1, \ldots, |Q|-1$, and $p_{d_{\sigma'}}[m]$, $A_\sigma$ were defined earlier. Notice here that matrix $B_{m,m+1}$ captures the forward transitions, from one stage to the next, which can be due either to insertions or to normal transitions (without sensor failures).
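The block definitions above amount to simple index arithmetic; a minimal sketch (toy sizes and a placeholder matrix, purely for illustration):

```python
import numpy as np

# Extract the |Q| x |Q| blocks of A_H' by slicing:
# B_{m,m+1}(k, j) = A_H'((m+1)|Q| + k, m|Q| + j), B_m(k, j) analogous.
Q = 2                                   # toy |Q|; 3 stages -> 6 x 6 matrix
AH = np.arange(36.0).reshape(6, 6)      # placeholder stand-in for A_H'

def block(A, r, c, q=Q):
    """Return the (r, c)th q x q block of A."""
    return A[r*q:(r+1)*q, c*q:(c+1)*q]

B0 = block(AH, 0, 0)                    # vertical block B_0
B01 = block(AH, 1, 0)                   # forward block B_{0,1}
assert B01[1, 0] == AH[1*Q + 1, 0*Q + 0]   # matches the definition above
```

In practice one would slice the actual $A_{H'}$ assembled from the Kronecker products of Equation 20 (plus the $q_{in}$ row).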
With the above notation at hand, we have the following block decomposition for matrix $A_{H'}$:

$$A_{H'} = \begin{bmatrix}
B_0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0\\
B_{0,1} & B_1 & 0 & \cdots & 0 & 0 & 0 & 0\\
0 & B_{1,2} & B_2 & \cdots & 0 & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & B_{L-1} & 0 & 0 & 0\\
0 & 0 & 0 & \cdots & B_{L-1,L} & B_L & 0 & 0\\
0 & 0 & 0 & \cdots & 0 & I-B_L & I & 0\\
\multicolumn{6}{c}{C^T} & 0 & 1
\end{bmatrix}. \qquad (29)$$
Notice that the matrix $A_{H'}$ is in its canonical form (see Equation 22), with submatrices $T$ and $R$ given by

$$T = \begin{bmatrix}
B_0 & 0 & 0 & \cdots & 0 & 0\\
B_{0,1} & B_1 & 0 & \cdots & 0 & 0\\
0 & B_{1,2} & B_2 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & B_{L-1} & 0\\
0 & 0 & 0 & \cdots & B_{L-1,L} & B_L
\end{bmatrix}, \qquad
R = \begin{bmatrix}
0 & \cdots & 0 & I-B_L\\
\multicolumn{4}{c}{C^T}
\end{bmatrix}. \qquad (30)$$
The only nonzero entries in the initial probability distribution vector $\pi_{H'}[0]$ are its first $|Q|$ entries, i.e., $\pi_{H'}[0] = (\rho^T[0]\; 0\; \ldots\; 0)^T$, where $\rho[0]$ denotes the initial probability distribution of $S$ (i.e., it is a $|Q|$-dimensional vector whose $j$th entry denotes the probability that $S$ is initially in state $j$). Recall that $\pi_{H'}$ denotes the stationary probability distribution vector of $H'$ and has nonzero entries only at the absorbing states, i.e., its last $|Q|+1$ entries. Hence, for the observed sequence $z_1^L$, we can express $\pi_{H'}$ as $\pi_{H'} = (0\; \ldots\; 0\; \rho^T[L+1]\; p_{in}[L+1])^T$, where the vector $\rho[L+1]$ captures the joint probabilities of the consistent states and the observed sequence $z_1^L$, and the scalar $p_{in}[L+1]$ denotes the joint probability of the inconsistent state and the observed sequence $z_1^L$. Notice that we get joint probabilities of state occupancies and the observed sequence because FSM $H$ was constructed for the particular observed sequence. The following equations hold:

$$\pi_{H'} = \lim_{n\to\infty} A_{H'}^n \cdot \pi_{H'}[0],$$
$$(0\; \ldots\; 0\; \rho^T[L+1]\; p_{in}[L+1])^T = \lim_{n\to\infty} A_{H'}^n \cdot (\rho^T[0]\; 0\; \ldots\; 0)^T,$$
$$\rho[L+1] = \lim_{n\to\infty} A_{H'}^n(L+1, 0) \cdot \rho[0] \qquad (31)$$

(here, we stretch notation a bit so that $A_{H'}^n(L+1,0)$ denotes the $(L+1,0)$th block of matrix $A_{H'}^n$, as opposed to its $(L+1,0)$th entry). Therefore, in order to calculate the probability of the consistent states (jointly with the observed sequence $z_1^L$), we only need the initial probability distribution of the states of $S$ and the $(L+1,0)$th block of the matrix $\lim_{n\to\infty} A_{H'}^n$.
Next, we argue that, by using induction on the power $n$, we can compute $\lim_{n\to\infty} A_H^n(L+1,0)$ with much lower complexity than the standard computation in Equation 23. Equation 32 shows the state transition matrix $A_H$ for the case when $L = 2$. We suppose that $A_H^k$ is given by Equation 33 below, where the indices $j_1, j_2, j_3, j_4$ in the summations are nonnegative integers; we can prove that $A_H^{k+1}$ satisfies Equation 34 below by performing the multiplication $A_H^{k+1} = A_H A_H^k$. For example, the calculation for the $(1,0)$th block of $A_H^{k+1}$ is performed in Equation 35 below. Also note that when $k = 1$, the base case for $A_H$ is clearly satisfied.

$$A_H = \begin{bmatrix} B_0 & 0 & 0\\ B_{0,1} & B_1 & 0\\ 0 & B_{1,2} & B_2 \end{bmatrix} \qquad (32)$$
$$A_H^k = \begin{bmatrix}
B_0^k & 0 & 0\\[4pt]
\displaystyle\sum_{j_1+j_2=k-1} B_1^{j_2} B_{0,1} B_0^{j_1} & B_1^k & 0\\[10pt]
\displaystyle\sum_{j_1+j_2+j_3=k-2} B_2^{j_3} B_{1,2} B_1^{j_2} B_{0,1} B_0^{j_1} & \displaystyle\sum_{j_1+j_2=k-1} B_2^{j_2} B_{1,2} B_1^{j_1} & B_2^k
\end{bmatrix} \qquad (33)$$
$$A_H^{k+1} = \begin{bmatrix}
B_0^{k+1} & 0 & 0\\[4pt]
\displaystyle\sum_{j_1+j_2=k} B_1^{j_2} B_{0,1} B_0^{j_1} & B_1^{k+1} & 0\\[10pt]
\displaystyle\sum_{j_1+j_2+j_3=k-1} B_2^{j_3} B_{1,2} B_1^{j_2} B_{0,1} B_0^{j_1} & \displaystyle\sum_{j_1+j_2=k} B_2^{j_2} B_{1,2} B_1^{j_1} & B_2^{k+1}
\end{bmatrix} \qquad (34)$$
$$A_H^{k+1}(1,0) = \Big(\sum_{j_1+j_2=k-1} B_1^{j_2} B_{0,1} B_0^{j_1}\Big) B_0 + B_1^k B_{0,1} = \sum_{j_1+j_2=k-1} B_1^{j_2} B_{0,1} B_0^{j_1+1} + B_1^k B_{0,1} = \sum_{j_1+j_2=k} B_1^{j_2} B_{0,1} B_0^{j_1}. \qquad (35)$$

Using this approach, we can prove by induction that the expression for $A_H^n$ is as in Equation 33.
As explained earlier, we are interested in the state probabilities of the consistent states (jointly with the observed sequence $z_1^L$), which are given by the bottom $|Q|$ entries of the vector $\pi_H$, i.e., the entries of the vector $\rho[3] = \lim_{m\to\infty} A_H^m(3,0) \cdot \rho[0]$ for the case when $L = 2$. If we manipulate further the $(3,0)$th block of $\lim_{m\to\infty} A_H^m$, we have

$$\lim_{m\to\infty} A_H^m(3,0) = \lim_{m\to\infty} \sum_{j_1+j_2+j_3+j_4=m-3} I^{j_4}\, I\, B_2^{j_3} B_{1,2} B_1^{j_2} B_{0,1} B_0^{j_1} = (I + B_2 + B_2^2 + \cdots)\, B_{1,2}\, (I + B_1 + B_1^2 + \cdots)\, B_{0,1}\, (I + B_0 + B_0^2 + \cdots) = \Big(\sum_{j=0}^{\infty} B_2^j\Big) B_{1,2} \Big(\sum_{j=0}^{\infty} B_1^j\Big) B_{0,1} \Big(\sum_{j=0}^{\infty} B_0^j\Big). \qquad (36)$$

From the above equation, we get

$$\rho[3] = (I - B_2)^{-1} B_{1,2}\, (I - B_1)^{-1} B_{0,1}\, (I - B_0)^{-1}\, \rho[0]. \qquad (37)$$

Note that Equation 37 can also be obtained by directly inverting the matrix $(I - A_H)$ in Equation 21. To simplify notation, let us define $B'_{m,m+1} = (I - B_{m+1})^{-1} B_{m,m+1}$. Hence, $\rho[3] = B'_{1,2} B'_{0,1} (I - B_0)^{-1}\, \rho[0]$.
Generalizing the above result to any number of observations $L$, the vector that describes the probabilities of the consistent states (jointly with the observed sequence $z_1^L$) satisfies

$$\rho[L+1] = \left[\prod_{m=0}^{L-1} B'_{m,m+1}\right] (I - B_0)^{-1}\, \rho[0], \qquad (38)$$

where $B'_{m,m+1} = (I - B_{m+1})^{-1} B_{m,m+1}$.
By inspection of Equation 38, we notice that the computation of $\rho[L+1]$ can be performed recursively as follows:

$$\rho[1] = B'_{0,1}\, (I - B_0)^{-1}\, \rho[0], \qquad \rho[m+1] = B'_{m,m+1}\, \rho[m], \quad m = 1, 2, \ldots, L, \qquad (39)$$

where $\rho[m+1]$ represents the joint probability of the consistent states and the observed sequence $z_1^m$. The probability that the observations were produced by the particular FSM $S$ is equal to the sum of the elements of the state probability distribution vector $\rho[L+1]$, i.e.,

$$P(z_1^L \mid S) = \sum_{j=0}^{|Q|-1} \rho[L+1](j). \qquad (40)$$
Proposition 2: The likelihood of the observed sequence $z_1^L$ given model $S$ and allowing for sensor failures (as defined earlier) is given by

$$P(z_1^L \mid S) = \sum_{j=0}^{|Q|-1} \rho[L+1](j), \qquad (41)$$

where $\rho[m]$ is calculated recursively via the equations

$$\rho[1] = B'_{0,1}\, (I - B_0)^{-1}\, \rho[0], \qquad \rho[m+1] = B'_{m,m+1}\, \rho[m], \quad m = 1, 2, \ldots, L, \qquad (42)$$

with $\rho[0]$ representing the initial probability distribution vector of $S$ and the matrices $B$ representing the blocks of matrix $A_{H'}$ as defined in Equation 28.
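The recursion of Proposition 2 is easy to express in code. Below is a minimal Python sketch (our own helper, not the paper's implementation), assuming the block matrices have already been formed as in Equation 28 and that any final $\varepsilon$-stage bookkeeping is folded into the last forward block:

```python
import numpy as np

def likelihood(B_diag, B_fwd, rho0):
    """Recursive likelihood computation (Prop. 2, Eq. 42).

    B_diag[m] holds the vertical (deletion) block B_m, and B_fwd[m]
    holds the forward block B_{m,m+1}; rho0 is the initial probability
    distribution of S.  Returns P(z | S) = sum_j rho[L+1](j).
    """
    n = len(rho0)
    # rho after absorbing any vertical moves at stage 0:
    rho = np.linalg.inv(np.eye(n) - B_diag[0]) @ rho0
    for m in range(len(B_fwd)):
        # apply B'_{m,m+1} = (I - B_{m+1})^{-1} B_{m,m+1}
        rho = np.linalg.inv(np.eye(n) - B_diag[m + 1]) @ (B_fwd[m] @ rho)
    return rho.sum()
```

With all $B_m = 0$ (reliable sensors) this degenerates to the plain forward recursion discussed below.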
To gain intuition regarding the recursion, let us consider for now the case of reliable sensors. This case corresponds to the matrices $B_m$ in $A_{H'}$ being zero, which means that there are no vertical transitions (transitions within the same stage) in the trellis diagram. The recursion (Equation 42) becomes $\rho[m+1] = B_{m,m+1} \cdot \rho[m]$. In fact, for the case of reliable sensors we can replace $B_{m,m+1}$ with $A_{z[m+1]}$ and perform the recursion

$$\rho[m+1] = A_{z[m+1]} \cdot \rho[m], \quad m = 0, 1, \ldots, L-1. \qquad (43)$$

Intuitively, every time we get a new observation, we update the current probability vector by multiplying it with the state transition matrix of $S$ that corresponds to the new observation. This corresponds to the standard version of the forward algorithm.
With the above intuition at hand, we now return to the case of sensor failures. Here, we also need to take into consideration the fact that any number of vertical transitions may occur. Therefore, every time we get a new observation $z[m+1]$, we multiply the current probability vector with the state transition matrix of $S$ that corresponds to the new observation (as before) and also with $(I - B_{m+1})^{-1} = \sum_{j=0}^{\infty} B_{m+1}^j$, thereby taking into account the vertical transitions (possibly an infinite number of them) that can take place at stage $m+1$.
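The identity $(I - B)^{-1} = \sum_{j=0}^{\infty} B^j$ used above can be verified numerically on a small toy block (values are illustrative; convergence assumes the spectral radius of $B$ is below one, which holds when deletion probabilities are below one):

```python
import numpy as np

# Geometric series of vertical moves: (I - B)^{-1} sums over any number
# of within-stage (deletion) transitions.
B = np.array([[0.0, 0.05],
              [0.05, 0.0]])
series = sum(np.linalg.matrix_power(B, j) for j in range(50))
assert np.allclose(series, np.linalg.inv(np.eye(2) - B))
```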
The matrices $B_{m,m+1}$ have dimension $|Q| \times |Q|$, while the matrix $A_H$ has dimension $|Q_H| \times |Q_H|$ with $|Q_H| = (L+2) \cdot |Q|$. If we calculate $\pi_H$ without taking advantage of the structure of $A_H$, the computational complexity is proportional to $O\big(((L+2) \cdot |Q|)^3\big) = O(L^3 \cdot |Q|^3)$. If we use the recursive approach instead, the computational complexity reduces significantly to $O\big((L+2) \cdot (|Q|^2 + |Q|^3)\big) = O(L \cdot |Q|^3)$ (each stage requires the inversion of a new $|Q| \times |Q|$ matrix, which has complexity $O(|Q|^3)$ and dominates the computational complexity associated with that particular stage). In fact, if the sensor failure probabilities remain invariant at each stage, then the matrix $B_m$ only needs to be inverted once, and the complexity of the recursive approach is $O\big((L+2) \cdot |Q|^2 + |Q|^3\big) = O(L \cdot |Q|^2 + |Q|^3)$. In addition to the complexity gains, the recursive nature of the calculations allows us to monitor the system under diagnosis online and calculate the probability of the observed sequence at each observation step, by first updating the state probabilities and then summing them up.
Example 1 (continued): In our example, if we assume that there are no sensor failures, the likelihoods for each model are given by $P(z_1^7 \mid S_1) = 5.7156 \times 10^{-4}$ and $P(z_1^7 \mid S_2) = 12.5743 \times 10^{-4}$. For instance, if the priors are $P_1 = P_2 = 1/2$, then we compute the posterior of each model given the observations as $P(S_1 \mid z_1^7) = 0.3125$ and $P(S_2 \mid z_1^7) = 0.6875$. Clearly, it is most likely that the machine that produced the observed sequence $z_1^7 = \langle a\, b\, c\, b\, c\, a\, c\rangle$ conforms to FSM model $S_2$, i.e., $P(S_1 \mid z_1^7) < P(S_2 \mid z_1^7)$.
If sensor failures occur with probabilities $p_{i_a} = 0.1$, $p'_{i_a} = 2\, p_{i_a}$, $p_{d_b} = 0.05$, and $p'_{d_b} = 2\, p_{d_b}$ (as described in this section), we can follow the procedure described earlier in this section to construct $B_m$, $m = 0, 1, \ldots, 7$, and $B_{m,m+1}$, $m = 0, 1, \ldots, 6$, for each of the two machines. For example, the state transition matrices that capture the vertical transitions at the first five stages under model $S_1$ are given by

$$B_0 = B_1 = B_2 = B_3 = B_4 = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & p_{d_b}\\ 0 & p_{d_b} & 0 & 0\\ 0 & 0 & p_{d_b} & 0 \end{bmatrix}.$$

The probability that the given observed sequence was produced by FSM $S_1$ and the probability that it was produced by FSM $S_2$ are calculated to be

$$P(z_1^7 = \langle a\, b\, c\, b\, c\, a\, c\rangle \mid S_1) = 5.0289 \times 10^{-4},$$
$$P(z_1^7 = \langle a\, b\, c\, b\, c\, a\, c\rangle \mid S_2) = 12.7681 \times 10^{-4}.$$

With priors $P_1 = P_2 = 1/2$, we compute the posteriors as $P(S_1 \mid z_1^7) = 0.2826$ and $P(S_2 \mid z_1^7) = 0.7174$. Hence, it is still more probable that the machine that produced the observed sequence $\langle a\, b\, c\, b\, c\, a\, c\rangle$ conforms to FSM model $S_2$, i.e., $P(S_1 \mid z_1^7) < P(S_2 \mid z_1^7)$; we thus conclude that the underlying system is faulty. $\Box$
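The posteriors quoted above follow from Bayes' rule with equal priors; numerically:

```python
# Bayes' rule with equal priors: P(S_i | z) is proportional to P(z | S_i).
l1, l2 = 5.0289e-4, 12.7681e-4          # likelihoods from the example above
post1, post2 = l1 / (l1 + l2), l2 / (l1 + l2)
```

This reproduces $P(S_1 \mid z_1^7) \approx 0.2826$ and $P(S_2 \mid z_1^7) \approx 0.7174$ to the precision reported in the example.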
A. Numerical Stability
It is obvious from the recursive equation that the entries of the vector $\rho[m]$ decrease with $m$. One way to avoid numerical errors is to normalize the vector $\rho[m]$ at every step so that its entries sum to one. Since the likelihood of the observations given a model depends on the sum of the entries of $\rho[m]$ (before normalization), we need to keep track of the normalization factor at each step. Moreover, numerical errors may arise because the likelihood $P(z_1^L \mid S)$ for each model decreases as the number of observations grows. To address this, we keep track of the negative logarithm of the likelihood, which we denote by $\Lambda = -\log P(z_1^L \mid S)$. The following algorithm introduces $\hat{\rho}[m]$ and shows how to compute the (negative) log-likelihood of the observations given a model $S$.
Algorithm. Input: matrices $B_m$ and $B_{m,m+1}$ for $m = 0, 1, \ldots, L$ (the matrices correspond to the observed sequence $z_1^L$) and the initial probability distribution $\rho[0]$.
1. Initialization ($m = 0$):
compute $B'_{0,1} = (I - B_1)^{-1} B_{0,1}$,
compute $\hat{\rho}[1] = B'_{0,1}\, (I - B_0)^{-1}\, \rho[0]$,
compute $\lambda = \sum_{j=0}^{|Q|-1} \hat{\rho}[1](j)$,
compute $\rho[1] = \frac{1}{\lambda}\, \hat{\rho}[1]$,
compute $\Lambda = -\log \lambda$.
2. For $m = 1 : L$, do
consider the observation $z[m+1]$,
compute $B'_{m,m+1} = (I - B_{m+1})^{-1} B_{m,m+1}$,
compute $\hat{\rho}[m+1] = B'_{m,m+1}\, \rho[m]$,
compute $\lambda = \sum_{j=0}^{|Q|-1} \hat{\rho}[m+1](j)$,
compute $\rho[m+1] = \frac{1}{\lambda}\, \hat{\rho}[m+1]$,
compute $\Lambda = \Lambda - \log \lambda$.
end.
3. Set $-\log P(z_1^L \mid S) = \Lambda$. $\Box$
The overhead of the modifications introduced to avoid numerical errors is not significant. More specifically, at each observation step we need to perform two additional operations, as well as keep track of the log-likelihood of the observations so far given the model. Notice that the operation of inverting matrices of the form $(I - B_m)$ is stable because such matrices are non-singular. Furthermore, these matrices have diagonal elements close (or equal) to one and off-diagonal elements close (or equal) to zero. In fact, the smaller the probabilities of deletions at observation step $m$, the less likely it is that we run into numerical stability problems when inverting $(I - B_m)$.
B. Transpositions
In addition to deletions and insertions, our approach can be modified to handle transpositions in a straightforward
manner. A transposition is denoted by t_{σj,σk} and represents the corruption of the subsequence ⟨σ_k σ_j⟩ to ⟨σ_j σ_k⟩.
We allow errors to overlap; however, as we illustrate in Example 2 below, we require that an output
involved in a transposition not simultaneously suffer a deletion, an insertion, or a different transposition.
Example 2: In this example we illustrate our assumption that errors can overlap but each output cannot be
involved in two different transpositions. Suppose that the possible sensor failures are d_b, t_ab, and t_ac, and
we observe the sequence ⟨a b c⟩. Then the set of possible output sequences is the following: ⟨b* a b b* c b*⟩
and ⟨b* b a b* c b*⟩. The observation FSM So for this example is shown in Figure 8. Notice that ⟨b c a⟩ is
not a possible output sequence because it would require two transpositions involving the same output a,
namely t_ab and t_ac.
Due to space limitations, we do not explain in detail how to construct FSM H for the case of transpositions (the
entire scheme for constructing the matrices H and H′ can be found in [40]). Since FSM H is constructed using FSMs
S and So (which may include transitions from state m to state m + 2), FSM H may have transitions that span two
stages.

Fig. 8. State transition diagram of So for observed sequence ⟨a b c⟩ in Example 2.

Hence, the state transition matrix A_{H′} includes submatrices of the form B_{m,m+2}, in addition to the B_{m,m+1}
and B_m defined earlier. More specifically, the matrices B_{m,m+2}, m = 0, 1, ..., L − 1, include transitions in
the state transition diagram that span two stages and account for transpositions. Using the matrices B and the row
vector C^T, we can express A_{H′} as
A_{H′} =

  [ B_0      0        0        0      ...  0            0            0          0        0  0
    B_{0,1}  B_1      0        0      ...  0            0            0          0        0  0
    B_{0,2}  B_{1,2}  B_2      0      ...  0            0            0          0        0  0
    0        B_{1,3}  B_{2,3}  B_3    ...  0            0            0          0        0  0
    ...      ...      ...      ...    ...  ...          ...          ...        ...      ... ...
    0        0        0        0      ...  B_{L-3,L-1}  B_{L-2,L-1}  B_{L-1}    0        0  0
    0        0        0        0      ...  0            B_{L-2,L}    B_{L-1,L}  B_L      0  0
    0        0        0        0      ...  0            0            0          I − B_L  I  0
    0        0        0        0      ...  0            0            0          C^T      0  1 ].   (44)
Notice that the matrix A_{H′} remains lower triangular even with transpositions. As in the case of insertions and deletions,
we can find a closed-form expression for A^n_{H′} due to the special structure of A_{H′}. To simplify notation, let us
define B′_{m−1,m+1} = (I − B_{m+1})^{-1} B_{m−1,m+1}. Then we can follow a similar approach as before and compute the
probability distribution vector at time m + 1 recursively from the vectors at the previous two stages.
Proposition 3: The likelihood of the observed sequence z_1^L given model S and allowing for sensor failures
(including transpositions) is given by

P(z_1^L | S) = Σ_{j=0}^{|Q|−1} ρ[L+1](j),   (45)

where ρ[0] is the initial probability distribution vector of S and ρ[m] is calculated recursively by the
equations

ρ[1] = B′_{0,1} (I − B_0)^{-1} ρ[0],
ρ[m+1] = B′_{m,m+1} ρ[m] + B′_{m−1,m+1} ρ[m−1],   m = 1, 2, ..., L,   (46)

with the matrices B and B′ as defined earlier, i.e., B′_{m,m+1} = (I − B_{m+1})^{-1} B_{m,m+1} and
B′_{m−1,m+1} = (I − B_{m+1})^{-1} B_{m−1,m+1}.
Note that, if needed, we can apply the techniques in Section VI.A to avoid numerical errors in our computation.
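Under the same illustrative conventions as before (caller-supplied lists `B` and `B_step`, plus a hypothetical list `B_skip` holding the two-stage matrices B_{m−1,m+1}; all names are ours), the two-term recursion (46) can be sketched as:

```python
import numpy as np

def likelihood_with_transpositions(B, B_step, B_skip, rho0):
    """Recursion (46): rho[m+1] = B'_{m,m+1} rho[m] + B'_{m-1,m+1} rho[m-1].

    B[m]      -- same-stage matrix B_m, m = 0, ..., L+1
    B_step[m] -- one-stage matrix B_{m,m+1}, m = 0, ..., L
    B_skip[m] -- two-stage matrix B_{m,m+2}, m = 0, ..., L-1
    Returns P(z_1^L | S) = sum_j rho[L+1](j).
    """
    I = np.eye(rho0.size)
    L = len(B_step) - 1
    rho_prev = rho0                                    # rho[0]
    # rho[1] = B'_{0,1} (I - B_0)^{-1} rho[0]
    rho = np.linalg.solve(I - B[1], B_step[0] @ np.linalg.solve(I - B[0], rho0))
    for m in range(1, L + 1):
        # Both B'_{m,m+1} and B'_{m-1,m+1} share the factor (I - B_{m+1})^{-1},
        # so one linear solve per step suffices.
        rho_next = np.linalg.solve(I - B[m + 1],
                                   B_step[m] @ rho + B_skip[m - 1] @ rho_prev)
        rho_prev, rho = rho, rho_next
    return rho.sum()                                   # rho now holds rho[L+1]
```

Setting all B_skip matrices to zero recovers the insertion/deletion recursion of Section VI.A; the normalization technique described there can be layered on top in the same way.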
C. Connections and Comparisons with Previous Work
Now that we have presented our recursive algorithm, we discuss how it differs from existing algorithms and how
it generalizes some of them. The techniques we use relate to the evaluation problem in HMMs, or the parsing
problem in probabilistic automata, when the resulting trellis diagram contains vertical loops. The forward algorithm is used
to evaluate the probability that a given sequence of observations is produced by a certain hidden Markov model (HMM).
To do so, the standard forward algorithm uses the HMM to build a trellis diagram based on the given sequence
of observations and performs the likelihood calculation online. However, the standard forward algorithm cannot
handle vertical cycles in the trellis diagram. Ways around vertical cycles have
been suggested in speech recognition applications, where HMMs that model speech patterns [16]–[19]
may include null transitions (i.e., the HMM may move from the current state to the next state without producing
any output [17], [26]), as well as in the area of pattern recognition, where one may have to deal with null transitions
when solving the parsing problem for a given probabilistic finite-state automaton [21].
While in most HMM formulations one deals with state observations, several authors have also studied the
evaluation problem in HMMs with transition observations, including null transitions (i.e., transitions with no outputs).
For instance, the authors of [17], [26], [27] develop HMM models that capture the generation of codewords in speech
recognition applications via observations that are associated with transitions rather than states. These HMMs also
include null transitions, i.e., transitions that change the state without producing outputs. The authors of [26] eliminate
loops in the resulting trellis diagram via an appropriate modification of the underlying HMM before constructing the
trellis diagram. In [21], an algorithm is presented to solve the parsing problem in pattern recognition applications
for the case where null transitions exist in a probabilistic finite-state automaton (PFSA) model (as pointed out in
[28], HMMs are equivalent to PFSAs with no final probabilities). The authors evaluate recursively the probability
that a sequence is produced by a λ-PFSA (i.e., a PFSA that includes null transitions), and their approach can be
shown, after some manipulation, to be a special case of the algorithm we develop here. In particular, in contrast to
our algorithm, the λ-PFSA approach cannot handle the case of time-varying sensor failures.
Also related to our likelihood computation algorithm is the well-known Viterbi algorithm [30], [31], which
solves the related problem of maximum-likelihood decoding of convolutional codes by choosing the most likely
state sequence based on a given sequence of observations. In fact, the Viterbi algorithm is a dynamic programming
algorithm which is amenable to online use and has found applications in various fields, e.g., in HMMs it is used
to find the most likely (hidden) state sequence corresponding to the observed output sequence [16]. Note that, in
contrast to the Viterbi algorithm, the maximum likelihood approach in this paper requires the total probability of
all paths that can be generated from the initial state(s) to the final state(s), rather than the probability of the
most likely path alone. As a consequence of this requirement, the Viterbi algorithm or variations of it cannot obtain a solution
to the problem considered here. However, it is worth pointing out that the Viterbi algorithm has been frequently
suggested as a suboptimal alternative for likelihood evaluation in some applications [16]. Also note that a modified
Viterbi algorithm was proposed in [32] to identify the correct strings of data given an FSM representation of a
possibly erroneous output sequence; in [33] the same authors proposed a channel inversion algorithm for correcting
symbol sequences that have been corrupted by errors that can be described in terms of finite state automata (whose
transitions are weighted with costs representing the likelihood of different errors). The work in [34] proposes an
efficient implementation of the Viterbi algorithm to perform error-correcting parsing using an FSM and an error
model. The Viterbi algorithm can handle vertical cycles by unwrapping cycles so that each state on the cycle
is visited at most once (to avoid adding cost or decreasing the probability of the path — recall that the Viterbi
algorithm only searches for the most likely path).
Before closing this discussion, it is worth pointing out that the techniques used to solve our problem also relate
to maximum a posteriori (MAP) decoding of variable length codes (VLC). In MAP decoding of VLC, symbols that
are generated by a source may give rise to a different number of output bits and, given an observed bit sequence,
one has to recover the symbols that are transmitted according to the source codewords. The authors in [24], [25]
constructed a two-dimensional (symbol and bit) trellis diagram representation of the variable length coded data
and then applied the BCJR algorithm [29] to do either symbol or bit decoding. This setup resembles our setup
when only a finite number of sensor failures exists in the observed sequence (in such a case, one can appropriately
enlarge the underlying model since, unlike our formulation, no vertical cycles can be present). More specifically,
if we assume a finite number of sensor failures, we could modify the models S1 and S2 to account for possible
sensor failures. However, since the probabilities of sensor failures can change with time (i.e., they can depend on
the observation step), the models for S1 and S2 would need to include these time variations. Even if we assume
that the probabilities of sensor failures for each observation step are known a priori (so that we are able to modify
a priori the models of S1 and S2 to account for sensor failures), the modified models will have an extended state
space. The next step would then involve the construction of the trellis diagram of these extended models and the
application of the standard forward algorithm. Our approach allows us to use a recursive algorithm and operate on
the original models S1 and S2, thereby dramatically reducing computational complexity and storage requirements.
To summarize, our approach is more general than the aforementioned approaches because it can handle different
kinds of loops at different stages of the trellis diagram (loops in our setup are not introduced by null transitions in
the underlying model but rather by errors in the observed sequence which can occur with time-varying probabilities).
Thus, the associated probabilities in the trellis diagram can be changing with time (which cannot be handled as
effectively using the techniques in [21] or in [26]). The difficulty is that modifying the underlying model to
match the requirements of these earlier approaches results in a quite complex HMM (for which the evaluation
problem can still benefit from the techniques we propose here). Therefore, the attractive feature of the proposed
recursive algorithm for likelihood calculation is that it can handle a time-varying and unbounded number of sensor failures
(or, equivalently, vertical cycles in the trellis diagram) with reduced complexity.
Fig. 9. FSM model for part of the 802.2 protocol responsible for data link establishment, disconnection, and resetting states.
VII. A FAILURE DIAGNOSIS APPLICATION
As an example we consider the logical link control sublayer in the IEEE/Std 802.2 local area network protocol
[41], which is a peer protocol for use in a multi-station, multi-access environment. More specifically, we consider
the part of the protocol which is responsible for data link establishment, disconnection, and link resetting, and
which is modeled as the six-state FSM shown in Figure 9 with state transition functionality as defined in Table I.
The system to be diagnosed in our example is a system that supposedly complies with the 802.2 standard. The
model of the protocol and the model of a faulty (bogus) implementation of the protocol are known a priori. In order
to formulate our probabilistic framework, we assume that the probability distribution of the inputs is known (e.g.,
these probabilities have been obtained from empirical measurements). In order to keep the example simple
(and without any loss of generality), we associate three outputs with the inputs, i.e., the output set is Y = {a, b, c},
so that the resulting FSM, denoted by Sff, has six states and three outputs, as shown in Figure 10. We assume that
input x1 appears with probability 1/3 and that the remaining four inputs (namely x2, x3, x4, x5) appear with probability
1/6 each. Notice that any input probability distribution could be used instead of the one we assume here.7
The particular fault we are interested in is a faulty transition from state “NORMAL” to state “ERROR” instead
of state “D CONN” under the transition with output a; this could be, for example, the result of a hardware fault,
a design error, or a software bug. This, together with the nominal (fault-free) model description in Figure 9, fully
describes the faulty model, denoted by Sf. Our goal is to determine whether the underlying FSM is executing Sff
or Sf when the observed sequence is z_1^6 = ⟨a b a b a c⟩. The additional challenge is that the output sequence
can be corrupted due to sensor failures. For this example, we consider that a can be deleted or inserted, b can be
inserted, and the sequence ⟨c d⟩ can be transposed, i.e., the failure set is defined as F = {da, ia, ib, tcd}. The
7We can always apply our algorithm to classify between two HMMs (as opposed to FSMs with i.i.d. inputs); the assumption on i.i.d. inputs
in this example is only made for convenience.
Fig. 10. State transition diagram of FSM Sff .
probabilities of errors are assumed to be time-invariant in this example and are given by p_da = 0.15, p_ia = 0.2,
p_ib = 0.2, and p_tcd = 0.2.
We follow the recursive approach described in Section VI to compute the probability that the given output
sequence is produced by the fault-free FSM and the probability that it is produced by the faulty (bogus) FSM as
shown in the following tables.
m "T1 [m]
!6j=1 "1[m](j)
0 [1/6 1/6 1/6 1/6 1/6 1/6] 1
1 [0.0861 0.0278 0 0.0015 0.0086 0.0278] 0.1518
2 [0.0227 0 0 0.0108 0.0596 0 ] 0.0931
3 [0.0015 0 0 0.0004 0.0200 0.0076] 0.0295
4 [0.0001 0 0.0025 0.0001 0 0] 0.0027
m "T2 [m]
!6j=1 "2[m](j)
0 [1/6 1/6 1/6 1/6 1/6 1/6] 1
1 [0.2323 0 0 0.1502 0.1343 0] 0.5168
2 [0.0794 0 0 0.0812 0.1628 0] 0.3234
3 [0.0860 0 0 0.0439 0.0615 0] 0.1914
4 [0.0353 0 0 0.0237 0.0609 0] 0.1199
As long as the prior probabilities P_ff and P_f of the fault-free and faulty models satisfy
P_ff / P_f < 0.1199/0.0027 = 44.4074, the MAP rule dictates that, in order to minimize the probability of error,
we decide that the underlying implementation of the protocol is the faulty one.
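The decision above is a direct instance of the MAP rule; a minimal sketch (with the likelihoods hard-coded from the m = 4 rows of the tables above, and the function name and prior arguments being our own illustrative choices) is:

```python
# MAP classification between the fault-free and faulty models.
# The likelihood values below come from the tables above; the function
# name and the prior arguments are illustrative, not from the paper.
P_ff = 0.0027   # P(z | S_ff), fault-free model likelihood
P_f = 0.1199    # P(z | S_f), faulty model likelihood

def map_decide(prior_ff, prior_f):
    # Decide "faulty" whenever prior_ff * P_ff < prior_f * P_f,
    # i.e., whenever prior_ff / prior_f < P_f / P_ff (about 44.4 here).
    return "faulty" if prior_ff * P_ff < prior_f * P_f else "fault-free"
```

With equal priors the faulty model wins by a wide margin; only a prior ratio above 0.1199/0.0027 ≈ 44.4 in favor of the fault-free model reverses the decision.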
TABLE I
STATE TRANSITION FUNCTIONALITY OF FSM SHOWN IN FIGURE 9: EACH ENTRY SHOWS THE INPUTS THAT TAKE THE FSM FROM THE
CURRENT STATE (CORRESPONDING TO THE ENTRY’S ROW) TO THE NEXT STATE (CORRESPONDING TO THE ENTRY’S COLUMN).
Next State ADM RESET ERROR D CONN SETUP NORMAL
Current State
ADM Con Req R Sabm
RESET R Disc Cmd R Sabm R Ua Rsp
ERROR R Dm Rsp, TExp>N2 TExp, R Frmr R Sabm
R Disc Cmd R Ua Rsp
D CONN R Dm Rsp, R Disc Cmd
R Sabm,
R Ua Rsp,
TExp>N2
SETUP TExp>N2 R Sabm, R Ua Rsp
TExp
NORMAL R Ua Rsp, R Frmr Rsp, R Sabm
R Invi Cmd R Disc Req
VIII. CONCLUSIONS
In this paper we propose a probabilistic approach to the problem of failure diagnosis based on the observation of
a possibly corrupted output sequence. The a priori probability distribution of the input sequence of two given FSMs
is assumed known (equivalently, we are given two known hidden Markov models) and our goal is to determine
(e.g., with minimal probability of error) which of the two models has generated an observed sequence of outputs.
We assume that there are three types of errors (insertions, deletions, and transpositions) that can corrupt the
output sequence and that, given the observed sequence, the probabilities of these errors are independent of the
inputs. We construct an observation FSM that includes all possible output sequences that correspond to the given
observed sequence produced by the FSMs under diagnosis. Based on this observation machine, we develop a
recursive algorithm that can efficiently compute the total probability with which each FSM model, together with
a combination of sensor failures, can generate the observed sequence. Our algorithm is able to deal with cycles
which are present in the trellis diagram due to output deletions.
In this work, we considered the fault-free operation of the system and one mode of a fault-prone operation (given
information from unreliable sensors). Multiple fault-prone operation modes can be handled in a straightforward
manner by our proposed algorithm (by evaluating how well the observations match each possible model so that
we can eventually choose the best match). However, following our current approach, we would need to invoke the
algorithm as many times as the number of operation modes of the system. We plan to extend our current approach to
situations where we can take advantage of the system structure to evaluate the likelihood of multiple faulty models
more efficiently. One possible extension is to introduce a more structured fault-free model as well as more structured
fault-prone models. For example, we could consider systems that consist of some independent or loosely coupled
components. Another direction would be to use factorial hidden Markov models by imposing constraints on the state
transition functionality of the system so that each state variable evolves independently of, and is a priori decoupled
from, the remaining variables. Our results can be easily extended to the classification of several hidden
Markov models with applications in various fields such as document or image classification, pattern recognition,
and bioinformatics. An interesting extension of this work would be to modify the algorithm to be able to diagnose
a system based on a partially observable output sequence. It would also be interesting to study the sensitivity of
this approach to the probabilities of the inputs and/or the sensor failures.
REFERENCES
[1] C. G. Cassandras and S. Lafortune, Introduction to Discrete-Event Systems, Kluwer, 1999.
[2] F. Lin, “Diagnosability of discrete event systems and its applications,” Discrete Event Dynamic Systems: Theory and Applications, vol. 4,
no. 2, pp. 197–212, May 1994.
[3] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis, “Diagnosability of discrete-event systems,” IEEE Trans.
Automatic Control, vol. 40, no. 9, pp. 1555–1575, September 1995.
[4] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis, “Failure diagnosis using discrete-event models,” IEEE Trans.
Control Systems Technology, vol. 4, no. 2, pp. 105–124, February 1996.
[5] A. Benveniste, E. Fabre, S. Haar, and C. Jard, “Diagnosis of asynchronous discrete-event systems: a net unfolding approach,” IEEE Trans.
Automatic Control, vol. 48, no. 5, pp. 714–727, May 2003.
[6] S. H. Zad, R. H. Kwong, and W. M. Wonham, “Fault diagnosis in discrete-event systems: framework and model reduction,” IEEE Trans.
Automatic Control, vol. 48, no. 7, pp. 1199–1212, July 2003.
[7] Y. Wu and C. N. Hadjicostis, “Algebraic approaches for fault identification in discrete-event systems,” IEEE Trans. Automatic Control, vol.
50, no. 12, pp. 2048–2055, December 2005.
[8] A. Benveniste, E. Fabre, and S. Haar, “Markov nets: probabilistic models for distributed and concurrent systems,” IEEE Trans. Automatic
Control, vol. 48, no. 11, pp. 1936–1950, November 2003.
[9] C. N. Hadjicostis, “Probabilistic detection of FSM single state-transition faults based on state occupancy measurements,” IEEE Trans.
Automatic Control, vol. 50, no. 12, pp. 2078–2083, December 2005.
[10] M. Blanke, M. Kinnaert, J. Lunze, M. Staroswiecki, Diagnosis and Fault-Tolerant Control. Springer-Verlag, 2003.
[11] D. Thorsley and D. Teneketzis, “Diagnosability of stochastic discrete-event systems,” IEEE Trans. Automatic Control, vol. 50, no. 4, pp.
476–492, April 2005.
[12] A. T. Bouloutas, G. W. Hart, and M. Schwartz, “Simple finite-state fault detectors for communication networks,” IEEE Trans.
Communications, vol. 40, no. 3, pp. 477–479, March 1992.
[13] A. T. Bouloutas, G. W. Hart, and M. Schwartz, “Fault identification using a finite state machine model with unreliable partially observed
data sequences,” IEEE Trans. Communications, vol. 41, no. 7, pp. 1074–1083, July 1993.
[14] S. H. Low, “Probabilistic conformance testing of protocols with unobservable transitions,” in Proc. IEEE Int. Conf. on Network protocols,
pp. 368–375, October 1993.
[15] J. G. Kemeny, J. L. Snell, and A. W. Knapp, Denumerable Markov Chains. 2nd ed., New York: Springer-Verlag, 1976.
[16] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. of the IEEE, vol. 77, no. 2,
pp. 257–286, February 1989.
[17] F. Jelinek, Statistical Methods for Speech Recognition, The MIT Press, 1997.
[18] A. M. Poritz, “Hidden Markov models: A guided tour,” Proc. 1988 IEEE Conf. Acoustics, Speech, and Signal Processing, vol. 1, pp. 7–13,
April 1988.
[19] Y. Ephraim and N. Merhav, “Hidden Markov processes,” IEEE Trans. Information Theory, vol. 48, no. 6, pp. 1518–1569, June 2002.
[20] K. S. Fu, Syntactic Pattern Recognition and Applications. Prentice-Hall, 1982.
[21] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R. C. Carrasco, “Probabilistic finite-state machines–part I,” IEEE Trans. Pattern
Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1013–1025, July 2005.
[22] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.
Cambridge University Press, 1998.
[23] T. Koski, Hidden Markov Models of Bioinformatics. Kluwer Academic Publishers, 2001.
[24] R. Bauer and J. Hagenauer, “Symbol-by-symbol MAP decoding of variable length codes,” in Proc. 3rd ITG Conf. on Source and Channel
Coding, pp. 111–116, January 2000.
[25] A. Guyader, E. Fabre, C. Guillemot, and M. Robert, “Joint source-channel turbo decoding of entropy-coded sources,” IEEE Journal on
Sel. Areas in Comm., vol. 19, no. 9, pp. 1680–1696, September 2001.
[26] L. R. Bahl and F. Jelinek, “Decoding for channels with insertions, deletions and substitutions with applications to speech recognition,”
IEEE Trans. Information Theory, vol. IT-21, no. 4, pp. 404–411, July 1975.
[27] L. R. Bahl, F. Jelinek, and R. L. Mercer, “A maximum likelihood approach to continuous speech recognition,” IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. PAMI-5, no. 2, pp. 179–190, March 1983.
[28] P. Dupont, F. Denis, and Y. Esposito, “Links between probabilistic automata and hidden Markov models: probability distributions, learning
models and induction algorithms,” Pattern Recognition, vol. 38, no. 9, pp. 1349–1371, September 2005.
[29] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Information
Theory, vol. IT-20, no. 2, pp. 284–287, March 1974.
[30] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Trans. Information Theory,
vol. IT-13, no. 2, pp. 260–269, April 1967.
[31] G. D. Forney, Jr., “The Viterbi algorithm,” Proc. of the IEEE, vol. 61, no. 3, pp. 268–278, March 1973.
[32] A. Bouloutas, G. W. Hart, and M. Schwartz, “Two extensions of the Viterbi algorithm,” IEEE Trans. Information Theory, vol. 37, no. 2,
pp. 430–436, March 1991.
[33] G. W. Hart and A. T. Bouloutas, “Correcting dependent errors in sequences generated by finite-state processes,” IEEE Trans. Information
Theory, vol. 39, no. 4, pp. 1249–1260, July 1993.
[34] J. C. Amengual and E. Vidal, “Efficient error-correcting Viterbi parsing,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20,
no. 10, pp. 1109–1116, October 1998.
[35] P. Bremaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer-Verlag, 1999.
[36] T. Yoo and H. E. Garcia, “New results on discrete-event counting under reliable and unreliable observation information,” Proc. IEEE Conf.
on Networking, Sensing, and Control, pp. 688–693, March 2005.
[37] M. C. Davey and D. J. C. MacKay, “Reliable communication over channels with insertions, deletions, and substitutions,” IEEE Trans.
Inform. Theory, vol. 47, no. 2, pp. 687–698, February 2001.
[38] E. Athanasopoulou and C. N. Hadjicostis, “Maximum likelihood diagnosis in partially observable finite-state machines,” in Proc. IEEE
Intl. Symp. on Intelligent Control, pp. 896–901, 2005.
[39] A. Graham, Kronecker Products and Matrix Calculus with Applications. Mathematics and its Applications, Chichester, UK: Ellis Horwood
Ltd, 1981.
[40] E. Athanasopoulou, “Diagnosis of finite state models under partial or unreliable observations,” Ph.D. thesis, University of Illinois at
Urbana-Champaign, 2007.
[41] The Institute of Electrical and Electronics Engineers, “Logical link control,” American National Standards Institute, ANSI/IEEE Std.
802.2-1985.