ON THE ADVERSE EFFECT OF INCREASING THE NUMBER OF BINARY SYMPTOMS IN MEDICAL
DIAGNOSIS USING THE KERNEL METHOD
E. Girelli Bruni
Department of Statistics
University College London
London, England
SUMMARY
The modern tendency in the diagnostic process is to use as many technologies as
possible to investigate the maximum number of biological functions. This tendency
is justified by the opinion that a greater quantity of information must correspond
to a better comprehension and analysis of the patient's state of health
(The Lancet, 1976; Lindley, 1977).
In this work we challenge such a relationship, showing that a larger number of
examinations can lead to a poorer statistical identification of the patient's
state of health. The statistical method we analysed for the diagnostic allocation
is the one suggested by Aitchison (1976).
P.S.
This research has been supported by the National Research Council of Italy by
contract no. 203.10.11.
B. Barber et al. (eds.), Medical Informatics Berlin 1979, pp. 658-667. © Online Conferences Ltd., Uxbridge, England 1979
INTRODUCTION
The development of the present paper finds its origins in the attempt to solve a
statistical problem which at the present moment has not been completely and
formally solved.
The problem in question is the belief that the larger the number of variables
considered, the greater the expected percentage of correct allocations of new
evidence. For practical purposes such as medical diagnosis, this belief would lead
us to support the use of more and more complex multi-dimensional statistical
methods, for example global analysis (Gremy et al., 1977). The belief finds support
in a paper by Lindley (1977), which opposes the results obtained by Hughes (1968).
These results were also discussed by Chandrasekaran (1971) and by Chandrasekaran
and Jain (1975).
It is in studying such a problem that the author has considered the kernel method,
suggested by Aitchison (1976) and subsequently developed by Aitchison (1977). Even
though this paper does not provide a general answer for the above problem it reveals
an unsatisfactory feature of the kernel method when the number of variables is
increased.
In this paper we will refer to the problem of medical diagnosis with $J$ diseases,
where $j = 1, 2, \ldots, J$ indexes all possible diseases. The symbol $S$ is the total
amount of information provided by the data bank for the $J$ diseases, such that
$S = (S_1, S_2, \ldots, S_J)$, where $S_j$ is the information in the data bank for disease $j$.
For each patient in the data bank we suppose we have observed $I$ variables (symptoms),
indexed by $i = 1, 2, \ldots, I$, and we also assume that each symptom has only
two possible facets (usually coded $k_i = 1$ and $k_i = 0$), which are the presence
and the absence of symptom $i$, respectively.
In Bayesian terms, the solution of the diagnostic problem is to define
$$p(j \mid k, S, A) = \frac{\pi(j \mid A)\, p(k \mid j, S)}{\sum_{j'=1}^{J} \pi(j' \mid A)\, p(k \mid j', S)},$$
the posterior probability that a new patient, not included in the data bank, has
disease $j$, conditioned on the vector of facets of the observed symptoms
$k = (k_1, k_2, \ldots, k_I)$ and on the information given by the data bank, where $\pi(j \mid A)$ is
the prior probability for disease $j$ and where $A$ represents all the doctor's relevant
knowledge, including the information in $S$.
It is important to note that the probability of $k$ might differ if computed on $S$ or
on $S_j$. The probability $p(k \mid j, S)$ indicates that the relevant information for $k$ when
disease $j$ occurs is contained not only in $S_j$ but also in other partitions of the
data bank referring to different diseases.
In the present paper, however, we assume that the vector $k$ is differently and
exclusively determined by each disease, such that the previous expression can be
rewritten
$$p(j \mid k, S, A) = \frac{\pi(j \mid A)\, p(k \mid j, S_j)}{\sum_{j'=1}^{J} \pi(j' \mid A)\, p(k \mid j', S_{j'})}$$
and we will only concentrate on the problem of defining the likelihoods $p(k \mid j, S_j)$.
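As a minimal sketch of the Bayesian allocation step described above, the following Python fragment turns priors $\pi(j \mid A)$ and likelihoods $p(k \mid j, S_j)$ into posterior probabilities. The function name `posterior` and the numerical values are illustrative assumptions, not part of the original paper.

```python
from typing import List, Sequence


def posterior(priors: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Posterior over diseases from priors pi(j|A) and likelihoods p(k|j,S_j)."""
    if len(priors) != len(likelihoods):
        raise ValueError("one prior and one likelihood per disease are required")
    # Numerator of Bayes' theorem for each disease j.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("the observed vector has zero probability under every disease")
    # Normalise so that the posterior probabilities sum to one.
    return [v / total for v in joint]


# Hypothetical numbers: two diseases, equal priors, likelihoods in the ratio 1 to 3.
post = posterior([0.5, 0.5], [0.02, 0.06])
print(post)
```

With equal priors the posterior is driven entirely by the likelihoods, which is why the rest of the paper concentrates on how $p(k \mid j, S_j)$ is estimated.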
THE KERNEL METHOD
Using the kernel method suggested by Aitchison (1976) and Aitchison (1977), the
probability $p(k \mid j, S_j)$ is defined as follows
$$p(k \mid j, S_j, M) = \frac{1}{F_j} \sum_{t=1}^{F_j} M_t(k \mid S_j, \lambda_j) \qquad (1)$$
where $F_j$ is the number of patients with disease $j$, where $M$ on the left-hand
side of (1) is a reminder of the function $M_t(\cdot)$, this being the adopted kernel
model indexed for the $t$th patient (though having the same form for any disease),
and where $\lambda_j$ ($j = 1, \ldots, J$) are the parameters of the kernel. In particular, the
kernel model for binary data is
$$M_t(k \mid S_j, \lambda_j) = \lambda_j^{\,I - d(k, s_{jt})} (1 - \lambda_j)^{d(k, s_{jt})} \qquad (2)$$
in which
$$d(k, s_{jt}) = \sum_{i=1}^{I} (k_i - s_{jti})^2 \qquad (3)$$
is a measure of distance between the two multi-dimensional points $k$ and
$s_{jt} = (s_{jt1}, s_{jt2}, \ldots, s_{jtI})$, where $s_{jti}$ is the facet of symptom $i$ of patient $t$
with disease $j$ in the data bank, so that $s_{jt}$ is the facet vector of patient $t$ with
disease $j$ in the data bank.
In (2) $\lambda_j$ is the smoothing parameter between 0 and 1. For $\lambda_j = 1/2$ a uniform
distribution is obtained whatever the data, and for $\lambda_j = 1$ the method estimates
the density simply by the corresponding relative frequencies. Since there is a problem
of estimating the parameters $\lambda_j$, the jack-knife likelihood method (leaving one
out) suggested by Habbema (1974) could provide the estimates $\hat{\lambda}_j$ of the $\lambda_j$.
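The kernel estimate can be sketched in a few lines of Python. The sketch below assumes the binary kernel $\lambda^{I-d}(1-\lambda)^{d}$ with $d$ the number of disagreeing symptoms. The function names, the small data bank and the grid search over $\lambda$ are hypothetical illustrations of the leave-one-out idea, not the procedure of Habbema (1974) in detail.

```python
import math
from typing import List, Sequence

Vector = Sequence[int]


def distance(k: Vector, s: Vector) -> int:
    """Number of symptoms on which the two binary vectors disagree."""
    return sum((a - b) ** 2 for a, b in zip(k, s))


def kernel(k: Vector, s: Vector, lam: float) -> float:
    """Binary kernel lam^(I - d) * (1 - lam)^d centred on patient s."""
    d = distance(k, s)
    return lam ** (len(k) - d) * (1.0 - lam) ** d


def density(k: Vector, bank: List[Vector], lam: float) -> float:
    """Average of the kernels centred on each patient of one disease class."""
    return sum(kernel(k, s, lam) for s in bank) / len(bank)


def loo_log_likelihood(bank: List[Vector], lam: float) -> float:
    """Leave-one-out log-likelihood of lam for one disease class."""
    total = 0.0
    for t, s in enumerate(bank):
        rest = bank[:t] + bank[t + 1:]  # data bank without patient t
        total += math.log(density(s, rest, lam))
    return total


# Hypothetical data bank for one disease: three patients, four symptoms.
bank = [(1, 1, 0, 0), (1, 1, 1, 0), (1, 0, 0, 0)]

uniform = density((1, 1, 0, 1), bank, 0.5)    # 2**-4 whatever the data
frequency = density((1, 1, 0, 0), bank, 1.0)  # relative frequency in the bank
grid = [0.55 + 0.05 * i for i in range(9)]    # candidate values below 1
best = max(grid, key=lambda lam: loo_log_likelihood(bank, lam))
print(uniform, frequency, best)
```

The two limiting calls illustrate the remarks on $\lambda_j = 1/2$ and $\lambda_j = 1$: the first returns the same value for any vector, the second returns the fraction of patients in the bank that match the vector exactly.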
A CRITICAL EXAMPLE
The above kernel method gives good results if the permutations of the 0,1 facets of
the I symptoms are very similar among patients with the same disease, and they
differ significantly from patients with a different disease. In fact, if the data
bank is not particularly good in the above sense, i.e. not very homogeneous, the
kernel method will give poor results, as shown in the following example. At this
point it is also important to note that if the data bank is not very homogeneous in
terms of symptoms - this peculiarity being often considered by any mathematical
model attempting to handle real data - this is not necessarily a bad thing. In fact
it is quite common to have situations in which a non-homogeneous data bank for
various classes of disease could be a good data bank in the medical sense if used
directly by a doctor.
As an example, let us consider two disease classes, a data bank with three patients
in each class and 22 recorded binary symptoms. This situation can be realized as
shown in Table 1. In Table 1 we can see that disease 1 could be defined by a high
probability that the first three symptoms are present while the second three are
absent (the 0 and 1 standing for absence and presence of the symptom respectively),
whereas disease 2 is, instead, recognisable by the high probability of absence of the
first three symptoms and of presence of the second three. In both diseases the
facets of the symptoms from the seventh to the twenty-second were chosen at random
so that they would be meaningless in discriminating between the two diseases.
We are aware that knowledge of the data bank structure is
important extra information which cannot be used by the kernel method, and it is
made available to the reader only for the sake of critically judging the kernel method.
Facets of the Symptoms in the Data Bank

                    Symptoms
Disease  Patient    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
   1        1       1  1  1  0  0  0  0  1  1  1  1  0  0  1  0  0  0  1  0  0  1  1
   1        2       1  1  1  0  0  0  1  1  1  0  1  0  0  0  0  1  0  0  1  0  1  1
   1        3       1  1  0  1  0  0  1  1  1  1  1  1  0  0  1  0  1  1  1  0  0  1
   2        1       0  0  0  1  1  1  0  1  1  0  0  0  0  0  1  1  0  1  1  1  0  0
   2        2       0  0  0  1  1  1  0  1  1  1  1  1  0  1  0  1  1  0  1  0  1  1
   2        3       0  0  1  0  1  1  0  0  1  0  0  0  1  1  0  1  0  0  0  1  1  0

Table 1
Let us now allocate a new patient to one of the two diseases, after we have observed
his vector of facets
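$k = (1\ 1\ 1\ 0\ 0\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 1\ 1\ 0\ 1\ 0\ 0\ 0\ 1\ 1\ 0)$
In order to do so we have to compute (1) and, if we consider the estimates of
lambda to be $\hat{\lambda}_1 = \hat{\lambda}_2 = \hat{\lambda} = 0.80$, the ratio of expression (1) for the two diseases
is
$$\frac{p(k \mid j=1, S_1, M)}{p(k \mid j=2, S_2, M)} = 0.00781 \qquad (4)$$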
showing that the kernel method provides a higher probability for disease 2 rather
than for disease 1, contrary to expectation.
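The ratio in (4) can be checked with a short Python sketch. The strings below encode Table 1 as read from the printed table. The new patient's vector is an assumption: its first six facets are taken as the typical disease 1 pattern and its last sixteen as those of the 3rd patient with disease 2, as the text describes. All names are illustrative.

```python
from typing import List

LAM = 0.80  # smoothing parameter used in the example

# Table 1, one string of 22 facets per patient (assumed reading of the table).
DISEASE_1 = [
    "1110000111100100010011",
    "1110001110100001001011",
    "1101001111110010111001",
]
DISEASE_2 = [
    "0001110110000011011100",
    "0001110111110101101011",
    "0010110010001101000110",
]
# Assumed new patient: disease 1 pattern on symptoms 1-6,
# random facets of the 3rd disease 2 patient on symptoms 7-22.
NEW_PATIENT = "111000" + "0010001101000110"


def distance(a: str, b: str) -> int:
    """Number of symptoms on which two facet strings disagree."""
    return sum(x != y for x, y in zip(a, b))


def density(k: str, bank: List[str], lam: float) -> float:
    """Kernel estimate: average of lam^(I - d) * (1 - lam)^d over the bank."""
    size = len(k)
    total = 0.0
    for s in bank:
        d = distance(k, s)
        total += lam ** (size - d) * (1.0 - lam) ** d
    return total / len(bank)


ratio = density(NEW_PATIENT, DISEASE_1, LAM) / density(NEW_PATIENT, DISEASE_2, LAM)
print(round(ratio, 5))  # prints 0.00781
```

Under these assumptions the distances to the three disease 1 patients are 8, 8 and 17, while the distance to the 3rd disease 2 patient is only 4, and that single close neighbour dominates the ratio.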
What has happened in the above example is that the information relevant to
allocating the patient to one of the two diseases has been annulled by the presence
of the sixteen irrelevant pieces of information. The new patient, through the
analysis of his first six symptoms, should have been allocated to disease 1, but
because the remaining sixteen facets match those of the 3rd patient of the data
bank with disease 2, the likelihood has expressed a higher 'conformity' (Pompilj,
1968) for disease 2 rather than disease 1.
The same conclusion would be obtained for any value of $\lambda$ within the interval
(0.5, 1) and for any combination of the sixteen random facets of the new patient
that is appreciably closer to one of the three vectors of random facets in the
data bank for disease 2 than to those for disease 1.
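The two claims above can be illustrated with a hedged Python sketch: restricting the comparison to the six informative symptoms, and sweeping $\lambda$ over a grid inside (0.5, 1) with all 22 symptoms. The data are the same assumed reading of Table 1 and the same assumed new patient vector as in the text's example, and the helper names are hypothetical.

```python
from typing import Dict, List

# Assumed reading of Table 1 (22 facets per patient) and of the new patient.
DISEASE_1 = [
    "1110000111100100010011",
    "1110001110100001001011",
    "1101001111110010111001",
]
DISEASE_2 = [
    "0001110110000011011100",
    "0001110111110101101011",
    "0010110010001101000110",
]
NEW_PATIENT = "1110000010001101000110"


def density(k: str, bank: List[str], lam: float) -> float:
    """Kernel estimate with the binary kernel lam^(I - d) * (1 - lam)^d."""
    total = 0.0
    for s in bank:
        d = sum(a != b for a, b in zip(k, s))
        total += lam ** (len(k) - d) * (1.0 - lam) ** d
    return total / len(bank)


def ratio(lam: float, n_symptoms: int = 22) -> float:
    """Likelihood ratio disease 1 / disease 2 using the first n_symptoms only."""
    k = NEW_PATIENT[:n_symptoms]
    bank1 = [s[:n_symptoms] for s in DISEASE_1]
    bank2 = [s[:n_symptoms] for s in DISEASE_2]
    return density(k, bank1, lam) / density(k, bank2, lam)


# Only the six informative symptoms: the ratio favours disease 1.
six_only = ratio(0.80, n_symptoms=6)

# All 22 symptoms, lambda from 0.55 to 0.95: the ratio stays below 1.
sweep: Dict[float, float] = {v / 100: ratio(v / 100) for v in range(55, 100, 5)}

print(six_only)
for lam, r in sweep.items():
    print(lam, r)
```

The contrast between the two results is the point of the example: the same method allocates the patient to disease 1 when only the discriminating symptoms are used, and to disease 2 once the sixteen random symptoms are added.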
MATHEMATICAL CONSIDERATIONS
In this section the mathematical structure of the kernel method is considered. We
shall prove that the phenomenon shown in the previous section represents a general
feature of the kernel method.
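Let us still consider two diseases. If, for simplicity, $\hat{\lambda}_1 = \hat{\lambda}_2 = \lambda$, $F_1 = F_2 = T$
and we also write $d(k, s_{jt}) = d_{jt}$, (1) becomes
$$p(k \mid j, S_j, M) = \frac{1}{T} \sum_{t=1}^{T} \lambda^{I - d_{jt}} (1 - \lambda)^{d_{jt}} \qquad (5)$$
so that if the vector $k$ is thought to come from disease 1 and we compute (5) for
$j = 1$ and $j = 2$, we obtain the following ratio of likelihoods for the two diseases
$$R(k \mid H) = \frac{p(k \mid j=1, S_1, M)}{p(k \mid j=2, S_2, M)} \qquad (6)$$
where $H$ represents $(j = 1, j = 2, S_1, S_2, M)$. If we now consider the simplest
case where $T = 1$, using (5) and (6) we obtain
$$R(k \mid H) = \left(\frac{\lambda}{1 - \lambda}\right)^{d_{21} - d_{11}} = \varphi^{x} \qquad (7)$$
We can see that the power of $\varphi = \lambda/(1 - \lambda)$, $x = d_{21} - d_{11}$, has a symmetrical pdf,
obtained from a sum of binomial density functions, when both $d_{21}$ and $d_{11}$ have a
binomial pdf of the type $B(I, 1/2)$.
If $c = 0$, that is, if no symptom discriminates between the two diseases and there
are only $I$ random symptoms (with probability 1/2 of being either present
or absent), the variable $x$ varies between $-I$ and $+I$ with probability
$$\begin{aligned}
p(x) &= P(d_{21} = x, d_{11} = 0) + P(d_{21} = x + 1, d_{11} = 1) + \ldots + P(d_{21} = I, d_{11} = I - x) \\
&= p(-x) = P(d_{21} = 0, d_{11} = x) + P(d_{21} = 1, d_{11} = x + 1) + \ldots + P(d_{21} = I - x, d_{11} = I) \\
&= \sum_{i=0}^{I-|x|} P(d_{21} = |x| + i, d_{11} = i) \\
&= \sum_{i=0}^{I-|x|} P(d_{21} = |x| + i)\, P(d_{11} = i) \\
&= \sum_{i=0}^{I-|x|} \binom{I}{|x|+i} \left(\frac{1}{2}\right)^{I} \binom{I}{i} \left(\frac{1}{2}\right)^{I} \\
&= \left(\frac{1}{2}\right)^{2I} \sum_{i=0}^{I-|x|} \binom{I}{|x|+i} \binom{I}{i}
\end{aligned} \qquad (8)$$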
from which we can obtain $E\{x\} = 0$. Note also that $p(x) = p(R)$, where $R = R(k \mid H)$.
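The pdf of $x$ can be evaluated directly. The following Python sketch assumes the closed form $(1/2)^{2I} \sum_i \binom{I}{|x|+i}\binom{I}{i}$ for the difference of two independent $B(I, 1/2)$ variables, and checks numerically that it sums to one, is symmetrical and has zero mean. The function name `pdf_x` and the choice $I = 5$ are illustrative.

```python
from math import comb


def pdf_x(x: int, size: int) -> float:
    """P(d21 - d11 = x) for independent d21, d11 ~ B(size, 1/2)."""
    if abs(x) > size:
        return 0.0
    terms = sum(comb(size, abs(x) + i) * comb(size, i)
                for i in range(size - abs(x) + 1))
    return terms / 4 ** size  # (1/2)^(2 * size)


I = 5  # hypothetical number of random symptoms
probs = {x: pdf_x(x, I) for x in range(-I, I + 1)}
total = sum(probs.values())
mean = sum(x * p for x, p in probs.items())
print(total, mean)
```

For $I = 1$ the same function gives probabilities 0.25, 0.5 and 0.25 for $x = -1, 0, +1$.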
For example, if $I = 1$ and the symptom has probability 1/2 of being either present
or absent in both diseases, there would only be the four combinations of the data
bank shown in Table 2, in which $x$, $p(x)$ and $R$ are also computed for $\lambda = 0.80$.

Combinations of the data bank      1        2        3        4
Disease j                        1   2    1   2    1   2    1   2
d_j1                             0   0    0   1    1   0    1   1
x                                  0        1       -1        0
p(x)                             0.25     0.25     0.25     0.25
R                                  1        4      0.25       1

Table 2

In Figures 1 and 2 we have plotted $p(x)$ against $x$ and $p(R)$ against $R$,
where $\bar{R} = E\{R\}$.

[Figure 1: $p(x)$ plotted against $x$ for $x = -1, 0, +1$.]
[Figure 2: $p(R)$ plotted against $R$, with $\bar{R} = 1.562$ marked on the $R$ axis.]
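The four equally likely combinations for $I = 1$ can be enumerated in a few lines. This is a sketch under the stated setting ($T = 1$, $\lambda = 0.80$, so $\varphi = 4$), and the variable names are illustrative.

```python
from itertools import product

LAM = 0.80
PHI = LAM / (1.0 - LAM)  # phi = lambda / (1 - lambda), close to 4

rows = []
for d11, d21 in product((0, 1), repeat=2):
    x = d21 - d11            # exponent of phi in the likelihood ratio
    r = PHI ** x             # R = phi^x
    rows.append((d11, d21, x, 0.25, r))

# Expected likelihood ratio over the four combinations.
mean_r = sum(p * r for *_, p, r in rows)

for row in rows:
    print(row)
print(mean_r)
```

The mean comes out at about 1.5625, above 1, even though $x$ is symmetrical around 0. This shows that $E\{R\}$ alone does not summarise which disease the method favours.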
In case the new patient has disease $j = 2$, and there are $c$ symptoms that have
probability 1 of being present if $j = 1$ and of being absent if $j = 2$, and the
remaining $I - c$ symptoms are random for both diseases, expression (7) can be written
$$R = \varphi^{x} \qquad (9)$$
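where $x = r_{21} - r_{11} - c$ and $r_{jt}$ is the distance $d_{jt}$ computed on the $I - c$ random symptoms only.
If we write $y = r_{21} - r_{11}$, we have that $y$ varies between $-(I - c)$ and $(I - c)$ and
has a symmetrical pdf given by
$$p(y) = \left(\frac{1}{2}\right)^{2(I-c)} \sum_{i=0}^{I-c-|y|} \binom{I-c}{|y|+i} \binom{I-c}{i} \qquad (10)$$
such that $E\{y\} = 0$.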
From the above results we then obtain
$$p\{R\} = p(\varphi^{x}) = p(x) = p(y - c) = p(y) \qquad (11)$$
and
$$E\{x\} = E\{y - c\} = E\{y\} - c = -c \qquad (12)$$
$x$ also having a symmetrical pdf between $-I$ and $I - 2c$.
Because the pdf of $x$ is symmetrical around $-c$ we would also have
$$P(R < \varphi^{-c}) = P(R > \varphi^{-c}) \qquad (13)$$
and, moreover, for any fixed value of $c$,
$$\lim_{I \to \infty} P(\varphi^{-c} \le R \le \varphi^{0}) = 0 \qquad (14)$$
so that
$$\lim_{I \to \infty} P(R < \varphi^{0}) = \lim_{I \to \infty} P(R > \varphi^{0}) \qquad (15)$$
even though, at the same time, $E\{R\}$ will go to $\infty$, as empirically shown in Table 3,
and as can be proved theoretically.
Because for $R > \varphi^{0}$ or $R < \varphi^{0}$ we obtain higher 'conformity' for disease 1 or 2,
respectively (independently of the degree of 'conformity' for either disease),
expression (15) tells us that there is an equal probability of 'conforming' better to
disease 1 or 2 as $I$ increases to infinity.
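The limiting behaviour can be explored numerically. The sketch below relies on two reformulations that are assumptions of this illustration rather than statements from the paper. First, if $r_{21}$ and $r_{11}$ are independent $B(n, 1/2)$ with $n = I - c$, then $r_{21} - r_{11} + n$ is $B(2n, 1/2)$. Second, by independence, $E\{R\} = \varphi^{-c}\,[(1+\varphi)^2/(4\varphi)]^{I-c}$. Function names are hypothetical.

```python
from math import comb
from typing import Tuple


def tail_probabilities(size: int, c: int) -> Tuple[float, float]:
    """Return (P(R > 1), P(R < 1)) for `size` symptoms, c of them informative."""
    n = size - c
    total = 4 ** n  # 2^(2n) equally likely outcomes
    # R > 1 means x = y - c > 0, i.e. a B(2n, 1/2) count strictly above n + c.
    above = sum(comb(2 * n, k) for k in range(n + c + 1, 2 * n + 1)) / total
    # R < 1 means x < 0, i.e. a count strictly below n + c.
    below = sum(comb(2 * n, k) for k in range(0, n + c)) / total
    return above, below


def expected_ratio(size: int, c: int, lam: float) -> float:
    """E{R} under the independence assumptions stated in the lead-in."""
    phi = lam / (1.0 - lam)
    per_symptom = (1.0 + phi) * (1.0 + 1.0 / phi) / 4.0
    return phi ** (-c) * per_symptom ** (size - c)


for size in (10, 100, 1000):
    above, below = tail_probabilities(size, c=1)
    print(size, above, below, below - above, expected_ratio(size, 1, 0.80))
```

As $I$ grows, the gap between the two probabilities shrinks towards zero while the expected ratio grows without bound. This matches the qualitative statements in (15) and in the remark on $E\{R\}$.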
In the following table, the probabilities $P(R > \varphi^{0})$, $P(R < \varphi^{0})$, $P(\varphi^{-1} \le R \le \varphi^{0})$, the
ratio $P(R < \varphi^{0})/P(R > \varphi^{0})$ and $E\{R\}$ have been computed for $c = 1$, $I = 2, 3, 10$
respectively, and for $\lambda = 0.80$.
                                    I = 2    I = 3    I = 10   ...   I = ∞
P(R > φ^0)                          0.0      0.0625   0.2403   ...   0.50
P(R < φ^0)                          0.75     0.6875   0.5927   ...   0.50
P(φ^-1 ≤ R ≤ φ^0)                   0.75     0.6250   0.3520   ...   0.0
P(R < φ^0) / P(R > φ^0)             ∞        11       2.467    ...   1
E{R}                                0.391    0.610    13.877   ...   ∞

Table 3
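The quantities in Table 3 can be recomputed with a short Python sketch. It assumes $T = 1$, $c = 1$, $\lambda = 0.80$ and the pdf of $y$ as the difference of two independent $B(I - c, 1/2)$ variables, with $x = y - c$ and $R = \varphi^{x}$. The function names are illustrative, and the printed values are meant for comparison with the table rather than as a replacement for it.

```python
from math import comb
from typing import Tuple

LAM = 0.80
PHI = LAM / (1.0 - LAM)


def pdf_y(y: int, n: int) -> float:
    """P(y) for y the difference of two independent B(n, 1/2) variables."""
    if abs(y) > n:
        return 0.0
    terms = sum(comb(n, abs(y) + i) * comb(n, i) for i in range(n - abs(y) + 1))
    return terms / 4 ** n


def table3_column(size: int, c: int = 1) -> Tuple[float, float, float, float]:
    """Return P(R > 1), P(R < 1), P(phi^-c <= R <= 1) and E{R}."""
    n = size - c
    dist = {y - c: pdf_y(y, n) for y in range(-n, n + 1)}  # keyed by x = y - c
    p_gt = sum(p for x, p in dist.items() if x > 0)
    p_lt = sum(p for x, p in dist.items() if x < 0)
    p_mid = sum(p for x, p in dist.items() if -c <= x <= 0)
    mean_r = sum(p * PHI ** x for x, p in dist.items())
    return p_gt, p_lt, p_mid, mean_r


for size in (2, 3, 10):
    print(size, table3_column(size))
```

Because $P(R > \varphi^{0})$ is exactly zero for $I = 2$, the ratio of the two probabilities is undefined there and is best computed only for $I \ge 3$.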
It could be easily proved that for any value of T, expression (9) would become
(16 ) R = -c [(1 + 4»2]I-C[ L T 4> l 44> 1 + t.:1, (T-1)
where $y_t = r_{2u} - r_{2t}$ ($t = 1, \ldots, u-1, u+1, \ldots, T$), and where $u$ refers to a specific
patient. It is then possible to notice that for $I \to \infty$, $y_t \to 0$ and expression
(16) will go to $\infty$.
It seems also interesting to end this paragraph by noticing that E{R} is a bad
parameter in representing the problem of this paper.
CONCLUSIONS
The results of this paper are conditional on the assumption that there is only a
finite number of symptoms enabling the discrimination of two or more states of
health, and that any increase in the number of symptoms is due only to the increase
of the random symptoms.
Even though we only analysed the kernel method, this paper tries to support the idea
that, although statistical multi-dimensional approaches are increasingly regarded
as important for a better understanding of nature in its complexity, this does not
imply that it is always worthwhile to increase the number of dimensions in order to
solve a diagnostic problem. The main reasons for this are that:
(i) doctors are not easily capable of understanding states of health with a
great number of simultaneous relationships, and
(ii) quite often it is more important to know which are the meaningful
symptoms for defining the state of health of the patients, rather than
increasing the number of symptoms to be considered without having a
complete understanding of their use.
REFERENCES
Gremy F., Goldberg M. (1977), "Decision Making Method in Medicine", in Informatics and Medicine - An Advance Course, edited by P. L. Reichertz and G. Goos, Springer-Verlag, Berlin, Heidelberg.
Lindley D. V. (1977), "The concept of coherence in inference", meeting on 'I fondamenti dell'inferenza statistica', 20-30 April 1977. Published by the Dipartimento Statistico, Universita degli studi di Firenze (1978), pp. 178-207.
Hughes G. F. (1968), "On the mean accuracy of statistical pattern recognizers", IEEE Trans. Information Theory, 14, pp. 55-63.
Chandrasekaran B. (1971), "Independence of measurements and the mean recognition accuracy", IEEE Trans. Information Theory, 17, pp. 452-456.
Chandrasekaran B. and Jain A. K. (1975), "Independence, measurement complexity and classification performance", IEEE Trans. Systems Man Cybernet., 5, pp. 240-244.
Aitchison J. and Aitken C. G. (1976), "Multivariate binary discrimination by the kernel method", Biometrika, 63, pp. 413-420.
Aitchison J., Habbema J. D. F. and Kay J. W. (1977), "A critical comparison of the two methods of statistical discrimination", Applied Statistics, 26, pp.
Habbema J. D. F., Hermans J. and Van den Broek K. (1974), "A stepwise discriminant analysis program using density estimation", Compstat 1974, edited by G. Bruckman, Vienna: Physica Verlag.
Pompilj G. (1968), "Teoria della conformita", Teorie dei Campioni, Roma.
The Lancet (1976), "Admission Multiphasic Screening", Lancet, 2, p. 7997.
ACKNOWLEDGEMENT
I am grateful to A.F.M. Smith and D.V. Lindley for their helpful comments.