
Research Article

Research on Diagnosis Prediction of Traditional Chinese Medicine Diseases Based on Improved Bayesian Combination Model

Zhulv Zhang, Jinghua Li, Wanting Zheng, Shaolei Tian, Yang Wu, Qi Yu, and Ling Zhu

Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China

Correspondence should be addressed to Ling Zhu; jjzhuling@163.com

Received 8 January 2021; Accepted 13 May 2021; Published 10 June 2021

Academic Editor Yanggang Yuan

Copyright © 2021 Zhulv Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Traditional Chinese Medicine (TCM) clinical intelligent decision-making assistance has been a research hotspot in recent years. However, because the names of TCM diseases are ambiguous, recommending a TCM disease diagnosis from the current symptoms rarely achieves a good accuracy rate. The medical record data downloaded from the ancient and modern medical records cloud platform developed by the Institute of Medical Information on TCM of the China Academy of Chinese Medical Sciences (CACMS), together with the practice guideline data in the TCM clinical decision support system, were used as the corpus. Based on empirical analysis, a variety of improved Naïve Bayes algorithms are presented. The findings show that the Naïve Bayes algorithm with main symptoms weighted and equal probability achieves the best result, an accuracy rate of 84.2%, which is 15.2 percentage points higher than the 69% of the classic Naïve Bayes algorithm (without prior probability). The performance of the Naïve Bayes classifier is greatly improved, and the model has certain clinical practicability. The model is currently available at http://tcmcdsmvc.yiankb.com.

1 Introduction

Disease diagnosis in TCM has a long history. More than 100 disease names are recorded in the “Huangdi Neijing”, and 13 formulas are specially designed for diseases [1], which shows that the field of TCM pays great attention to disease diagnosis. A “disease” in TCM is a generalization of the basic regularities and contradictions in the entire evolution of the disease, including certain specific symptoms and the corresponding syndromes [2]. TCM disease diagnosis refers to the complex process in which physicians collect the patient's clinical information through inspection, listening and smelling examination, inquiry, and palpation, analyze that information based on the theoretical knowledge of TCM, and finally confirm the patient's disease. Disease diagnosis is a key link in physicians' diagnosis and treatment of diseases, and its accuracy directly affects the effect and standardization of clinical diagnosis and treatment. In this study, TCM disease prediction is modelled as a text classification task in natural language processing, a domain known for the challenge of a high-dimensional feature space [3].

In recent years, deep learning has been a major research direction in machine learning; it seeks classification schemes with higher predictive performance based on multiple layers of nonlinear information processing. Many studies in sentiment analysis [4], topic identification, and genre classification [5–8] have shown that deep learning and ensemble learning, such as recurrent neural networks combined with GloVe embeddings or attention mechanisms, achieve higher accuracy than conventional supervised learning methods. However, because of the particularity of the Chinese medicine field, a large amount of real clinical records is very difficult to collect. Furthermore, conventional supervised learning is more interpretable than deep learning. Therefore, Naïve Bayes is chosen as the research method in this study. In disease diagnosis, mathematical algorithm models can often achieve good results [9]. The Bayesian classification algorithm is a typical statistical method that can be used for reasoning and forecasting research. It builds on Bayes' formula, proposed by the British mathematician Thomas Bayes in the 18th century for the “inverse probability” problem: probabilistic reasoning is used to calculate the probability that a sample belongs to a particular class, assuming that all feature variables $X_k$ are conditionally independent of each other given the class. This assumption seems somewhat unreasonable, but many studies have shown that the method performs well in classification tasks [10] and can effectively handle reasoning with uncertain knowledge [11]. Because of its high practicability, the Bayesian classification algorithm is widely used in biology [12], transportation [13], meteorology [14], economics [15], medicine [16], and other fields.

Hindawi
Evidence-Based Complementary and Alternative Medicine
Volume 2021, Article ID 5513748, 9 pages
https://doi.org/10.1155/2021/5513748

In 1980, researchers [17] put forward the idea of applying Bayesian algorithms to TCM disease diagnosis. Qin [18] improved the traditional Naïve Bayes classification method, applied it to the TCM diagnosis of asthma, and achieved good experimental results. Du [19] applied an improved weighted hidden Naïve Bayes classification algorithm to the actual TCM diagnosis of infertility, providing a useful idea and method for modelling TCM infertility diagnosis. Many other related works have also achieved outstanding results [20–23]. This work has accelerated diagnostic research in TCM, improved the accuracy, speed, and efficiency of clinical disease diagnosis, and laid a good foundation for artificial intelligence research in TCM. However, owing to limitations in data quality, terminology standards, computing power, and so forth, TCM disease diagnosis models based on Bayesian algorithms still have certain shortcomings and need further upgrading and improvement to meet the growing needs of TCM clinical practice and scientific research.

The Big Health TCM Intelligent R&D Center of the Institute of Information, CACMS, has more than ten years of research foundation in TCM informatization, software development, TCM algorithm research, ontology construction, and TCM data. Building on the center's work, this study explores the diagnosis and prediction of TCM diseases based on a modified Bayesian combination model, as described below.

2 Basic Data Preparation

Owing to the complexity of TCM diseases, the medical records of some diseases are too scarce and their guidelines are missing, which leads to serious data imbalance and degrades machine learning performance. Therefore, this study is based on the top 100 common diseases in Dongzhimen Hospital of Beijing University of TCM (see Table 1). The data mainly come from the medical records of the ancient and modern medical record cloud platform (http://www.yiankb.com, total data volume 300,000+) and the clinical practice guidelines of the TCM clinical decision support system (https://www.tcmcds.com, total data volume 4,000+), both developed by the Institute of Information, CACMS. The medical records and guideline data of the 100 common diseases in Table 1 were extracted, and items with multiple disease diagnoses were removed. There are 37,103 items in total, of which 2/3 are training data and 1/3 are test data.

3 Data Cleaning

It is well acknowledged that data cleaning is basic work in machine learning and deep learning. In this study, the ontology data (Table 2) behind the TCM clinical auxiliary decision support system, covering more than 80,000 entries of TCM diseases, symptoms, and signs, are used as the data standard, and the TCM disease diagnoses and symptoms in the medical records and guideline data are standardized against it. For example, “Menstrual period” is standardized as “late menstrual period”, and “Easy to wake up early”, “Wake up midnight”, “Wake up frequently every night”, “Difficulty falling asleep”, and similar expressions are standardized as “Insomnia”. Standardizing symptom and TCM disease names is very important for the intelligent diagnosis of TCM diseases. Because the Bayesian TCM disease diagnosis prediction model does not restrict input to established symptom terms but lets the doctor enter symptoms in natural language, the recognition of symptom words and their matching rate against the existing corpus have a large impact on decision accuracy.
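The standardization step can be pictured as a lookup from free-text variants to a standard term. The sketch below is illustrative only: the synonym table is a hypothetical fragment built from the insomnia examples above (the real system maps onto the full ontology of Table 2), and `normalize_symptom` is a name introduced here, not part of the described system.

```python
from typing import Optional

# Hypothetical fragment of a synonym table; the real system uses the
# domain ontology of Table 2, which is not reproduced here.
SYNONYMS = {
    "easy to wake up early": "insomnia",
    "wake up midnight": "insomnia",
    "wake up frequently every night": "insomnia",
    "difficulty falling asleep": "insomnia",
}
STANDARD_TERMS = set(SYNONYMS.values())


def normalize_symptom(term: str) -> Optional[str]:
    """Map a free-text symptom to its standard term, or None if uncovered."""
    key = " ".join(term.lower().split())  # case- and whitespace-insensitive match
    if key in STANDARD_TERMS:
        return key
    return SYNONYMS.get(key)
```

Returning `None` for an uncovered term mirrors the limitation discussed in the Conclusion: symptoms that the ontology does not cover cannot be matched against the corpus.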

Given how symptoms are described in the medical record corpus, the traditional dictionary-based, statistical, and machine-learning-based word segmentation methods were abandoned, and the medical record corpus is segmented using commas as delimiters.
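Comma-based segmentation amounts to splitting each record at its commas. A minimal sketch follows; which comma characters appear in the records is an assumption here (the ASCII comma, the full-width Chinese comma, and the enumeration comma are all treated as delimiters), and `segment_record` is a hypothetical name.

```python
import re

# Assumption: records may use the ASCII comma, the full-width Chinese comma,
# or the enumeration comma; the text only says "a comma".
COMMA_PATTERN = re.compile(r"[,，、]")


def segment_record(text: str) -> list[str]:
    """Split a symptom description at commas and drop empty fragments."""
    return [piece.strip() for piece in COMMA_PATTERN.split(text) if piece.strip()]
```

Each resulting fragment would then be matched against the standard terms described above.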

4 Method

This project uses the Naïve Bayes method for modelling. Naïve Bayes is a simplification of the Bayesian method: it assumes that the features are conditionally independent of each other given the label, whereas the full Bayesian method requires the joint probability of all features and the label.

For a sample $D$ to be classified, with sample attributes $X = \{X_1, X_2, \ldots, X_n\}$ and categorical variable $C = \{C_1, C_2, \ldots, C_m\}$, Bayes' theorem expresses the posterior probability in terms of the prior probability $P(C)$, the class-conditional probability $P(X \mid C)$, and the normalizing constant $P(X)$:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \tag{1}$$

Naïve Bayes (NB) assumes that all feature variables $X_k$ are independent of each other given the category $C$. For sample attributes $X$, this conditional independence assumption can be expressed as

$$P(X \mid C = c) = \prod_{k=1}^{n} P(X_k \mid C = c) \tag{2}$$
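Formulas (1) and (2) can be sketched directly from count tables. This is a minimal illustration under stated assumptions, not the study's implementation: training items are hypothetical `(disease, set_of_symptoms)` pairs, probabilities are unsmoothed maximum-likelihood estimates, and only the observed symptoms enter the product. It also exhibits the failure mode discussed next: when no disease co-occurs with every input symptom, all posteriors are zero.

```python
from collections import Counter, defaultdict


def naive_bayes_posterior(records, symptoms):
    """Posterior P(C | X) for each disease C via formulas (1) and (2).

    records: list of (disease, set_of_symptoms) training items.
    Maximum-likelihood estimates are used with no smoothing, and only the
    observed symptoms enter the product, so P(X_k | C) is zero whenever a
    symptom never co-occurs with the disease.
    """
    disease_count = Counter(disease for disease, _ in records)
    pair_count = defaultdict(Counter)  # pair_count[disease][symptom]
    for disease, syms in records:
        pair_count[disease].update(set(syms))
    total = len(records)

    joint = {}
    for disease, n in disease_count.items():
        p = n / total  # prior P(C)
        for s in symptoms:
            p *= pair_count[disease][s] / n  # P(X_k | C), formula (2)
        joint[disease] = p  # P(X | C) P(C), numerator of formula (1)

    evidence = sum(joint.values())  # P(X) as the normalizing constant
    if evidence == 0.0:
        # No disease co-occurs with every input symptom: no usable result.
        return {disease: 0.0 for disease in joint}
    return {disease: value / evidence for disease, value in joint.items()}
```

For example, with two "common cold" records and one "insomnia" record, the input `["cough", "difficulty falling asleep"]` yields zero for both diseases, because neither disease co-occurs with both symptoms.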

According to the above formula, to calculate the probability $P(\text{DiseaseX} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})$ that a patient with Symptoms A, B, and C is diagnosed with Disease X, one needs the joint probability of Symptom A, Symptom B, Symptom C, and Disease X in the data set. If Symptoms A, B, and C never co-occur with a given disease in the data set, the Bayesian method cannot give a result.

To exploit the good classification performance of Naïve Bayes while avoiding such non-diagnoses and keeping the classification results accurate, this study uses an improved Naïve Bayes model to calculate the conditional probability. Namely, when calculating $P(\text{DiseaseX} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})$, one only needs to calculate $P(\text{SymptomA} \mid \text{DiseaseX})/P(\text{SymptomA})$, $P(\text{SymptomB} \mid \text{DiseaseX})/P(\text{SymptomB})$, and $P(\text{SymptomC} \mid \text{DiseaseX})/P(\text{SymptomC})$; when Disease X and Symptom A never co-occur in the data set, the corresponding term is assigned a very small number. See formulas (3) and (4).

The Bayesian formula is as follows:

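$$P(\text{DiseaseX} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC}) = \frac{P(\text{SymptomA}, \text{SymptomB}, \text{SymptomC} \mid \text{DiseaseX})}{P(\text{SymptomA}, \text{SymptomB}, \text{SymptomC})} \cdot P(\text{DiseaseX}) \tag{3}$$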

The Naïve Bayes form is as follows:

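$$P(\text{DiseaseX} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC}) = \frac{P(\text{SymptomA} \mid \text{DiseaseX})\,P(\text{SymptomB} \mid \text{DiseaseX})\,P(\text{SymptomC} \mid \text{DiseaseX})}{P(\text{SymptomA})\,P(\text{SymptomB})\,P(\text{SymptomC})} \cdot P(\text{DiseaseX}) \tag{4}$$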
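The improved score of formula (4) can be sketched from counts as follows. Assumptions: training items are `(disease, set_of_symptoms)` pairs, `improved_score` is a hypothetical name, and the value of the "very small number" (`floor=1e-6`) is invented for illustration, since the text does not give it. Setting `use_prior=False` gives the variant without the prior probability.

```python
def improved_score(records, disease, symptoms, floor=1e-6, use_prior=True):
    """Score a disease for the input symptoms following formula (4).

    records: list of (disease, set_of_symptoms) training items.
    Each symptom contributes P(S | D) / P(S). A (disease, symptom) pair that
    never co-occurs contributes `floor`, a hypothetical "very small number",
    so the product never collapses to zero. use_prior=False drops P(D).
    """
    n_total = len(records)
    n_disease = sum(1 for d, _ in records if d == disease)
    if n_disease == 0:
        raise ValueError(f"disease not in training data: {disease!r}")

    score = n_disease / n_total if use_prior else 1.0  # P(D) or no prior
    for s in symptoms:
        n_symptom = sum(1 for _, syms in records if s in syms)
        n_pair = sum(1 for d, syms in records if d == disease and s in syms)
        if n_pair == 0:
            score *= floor  # no co-occurrence in the data set
        else:
            # P(S | D) / P(S) = (n_pair / n_disease) / (n_symptom / n_total)
            score *= (n_pair / n_disease) / (n_symptom / n_total)
    return score
```

Unlike the plain product of formula (2), every candidate disease now receives a positive score, so a ranking can always be produced.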

Table 2: Domain ontology status.

Classification              Quantity
Western medicine disease    3,041
TCM disease                 2,212
Syndromes                   844
Symptom                     69,649
Tongue and pulse            8,307
Total                       84,053

Table 1: Top 100 common diseases in TCM.

Cough                             Postpartum depression        Heat stranguria               Summer nonacclimation
Insomnia                          Amenorrhea                   Spontaneous sweating          Enterobiasis
Menstrual disorder                Pelvic mass in woman         Night sweating                Dysphagia
Common cold                       Metrorrhagia                 Asthma syndrome               Stranguria due to hematuria
Constipation                      Leukorrheal diseases         Prospermia                    Qi goiter
Lumbodynia                        Advanced menstruation        Impotence                     Regurgitation
Headache                          Menorrhagia                  Postpartum hypogalactia       Somnolence
Consumptive disease               Delayed menstruation         Acute mastitis                Consumptive thirst involving kidney
Chest discomfort                  Lump in breast               Acute appendicitis            Dementia
Palpitation                       Menostaxis                   Enuresis                      Hemoptysis
Stomach ache                      Apoplexy                     Eczema                        Mumps
Stomach distension                Consumptive thirst           Nodule in breast              Hysteria
Arthralgia syndrome               Vomiting                     Anorexia                      Heat stroke and sunstroke
Tinnitus                          Oral aphthae in children     Epistaxis                     Epilepsy
Abdominal pain                    Gastric discomfort           Purpura                       Lung distention
Bone bi disease                   Acute and chronic sinusitis  Jaundice                      Lung abscess
Depression syndrome               Globus hystericus            Tympanites                    Gall
Hypomenorrhea                     Diarrhea                     Urolithic stranguria          Sallow complexion
Wind-warm disease with lung heat  Hypochondriac pain           Frozen shoulder               Dacryocystitis
Vertigo                           Facial palsy                 Thrush                        Cold tear induced by wind
Fever                             Edema                        Snake-like sores              Manic-depressive psychosis
Infertility                       Aphtha                       Deafness                      Lung-wind acne
Dysmenorrhea                      Frequent micturition         Stiff neck                    Hemorrhoidal disease
Menopausal syndrome               Infantile malnutrition       Neck arthralgia               Hidden rashes
Premenstrual syndrome             Irregular menstrual cycle    Stranguria due to overstrain  Dysentery


As mentioned earlier, Naïve Bayes requires the features to be conditionally independent of each other, but in the real world it is difficult for all features to be mutually independent. Some studies have shown that Naïve Bayes performs well not only in the classic situation where the features are independent but also in other situations [24, 25], which motivated us to broaden the use of Bayesian methods and to look for suitable methods for the auxiliary diagnosis of TCM diseases.

In TCM disease diagnosis, the various symptoms of each disease are known to be related. To obtain better generalization, this study nevertheless uses formula (5) as the calculation method, although this may sacrifice some accuracy. From formula (4), we obtain


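$$P(\text{DiseaseX} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC}) = \frac{P(\text{SymptomA} \mid \text{DiseaseX})}{P(\text{SymptomA})} \cdot \frac{P(\text{SymptomB} \mid \text{DiseaseX})}{P(\text{SymptomB})} \cdot \frac{P(\text{SymptomC} \mid \text{DiseaseX})}{P(\text{SymptomC})} \cdot P(\text{DiseaseX}) \tag{5}$$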

Formula (5) is equivalent to formula (4). After this rearrangement, each (disease, symptom) co-occurrence pair is regarded as a feature item, and every feature item has the same weight.

In the diagnosis and prediction of TCM diseases, a group of presenting symptoms may correspond to two candidate diagnoses belonging to two categories. With Symptoms A, B, and C as input and Disease X1 and Disease X2 as the two categories, classification is equivalent to judging whether

$$\frac{P(\text{DiseaseX1} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})}{P(\text{DiseaseX2} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})} > 1 \tag{6}$$

that is, whether the probability of Disease X1 is higher than that of Disease X2. According to the Naïve Bayes formula, we obtain



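$$\frac{P(\text{DiseaseX1} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})}{P(\text{DiseaseX2} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})} = \frac{P(\text{SymptomA} \mid \text{DiseaseX1})\,P(\text{SymptomB} \mid \text{DiseaseX1})\,P(\text{SymptomC} \mid \text{DiseaseX1})\,P(\text{DiseaseX1})}{P(\text{SymptomA} \mid \text{DiseaseX2})\,P(\text{SymptomB} \mid \text{DiseaseX2})\,P(\text{SymptomC} \mid \text{DiseaseX2})\,P(\text{DiseaseX2})} \tag{7}$$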

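Since the products in formula (7) easily produce extremely small numbers, we take the logarithm of both sides:

$$\log\frac{P(\text{DiseaseX1} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})}{P(\text{DiseaseX2} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})} = \log\frac{P(\text{SymptomA} \mid \text{DiseaseX1})}{P(\text{SymptomA} \mid \text{DiseaseX2})} + \log\frac{P(\text{SymptomB} \mid \text{DiseaseX1})}{P(\text{SymptomB} \mid \text{DiseaseX2})} + \log\frac{P(\text{SymptomC} \mid \text{DiseaseX1})}{P(\text{SymptomC} \mid \text{DiseaseX2})} + \log\frac{P(\text{DiseaseX1})}{P(\text{DiseaseX2})} \tag{8}$$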

If the left side of formula (8) is greater than 0, the sample is classified as Disease X1, which yields the classification result. The above disease prediction example can also be related to the logistic regression model, which uses the output of a linear regression model to approximate the log odds of the posterior probability. We then have the following formula:


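$$\log\frac{P(\text{DiseaseX1} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})}{P(\text{DiseaseX2} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})} = w_1 \cdot \text{SymptomA} + w_2 \cdot \text{SymptomB} + w_3 \cdot \text{SymptomC} + b \tag{9}$$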

In formula (9), $w_k$ is the weight of the corresponding symptom feature item. If the feature items are binary (taking values in $\{0, 1\}$) and all three symptoms are present, formula (9) becomes formula (10):

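$$\log\frac{P(\text{DiseaseX1} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})}{P(\text{DiseaseX2} \mid \text{SymptomA}, \text{SymptomB}, \text{SymptomC})} = w_1 + w_2 + w_3 + b \tag{10}$$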

Formulas (8) and (10) are clearly very similar: the feature items are summed, and an independent term is added. The term $\log[P(\text{DiseaseX1})/P(\text{DiseaseX2})]$ in formula (8) plays the role of $b$ in formula (10). This establishes the relationship between Naïve Bayes and logistic regression. The difference is that in logistic regression each feature item has its own fitted weight $w_1, w_2, \ldots$, whereas Naïve Bayes (formula (8)) gives each feature item equal weight, or a weight determined only by the ratio of that feature's conditional probabilities. For example, the weight of the Symptom A feature item is calculated as $P(\text{SymptomA} \mid \text{DiseaseX})/P(\text{SymptomA})$. The log-linear forms of Naïve Bayes and logistic regression therefore behave differently.
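The correspondence can be checked numerically. The sketch below, using hypothetical probabilities, computes the left side of formula (8) as a sum of per-symptom weights plus a bias, exactly in the linear shape of formula (10). Unlike logistic regression, the weights are fixed by probability ratios rather than fitted. The function names are introduced here for illustration.

```python
import math


def nb_log_odds(p_given_d1, p_given_d2, prior_d1, prior_d2):
    """Left-hand side of formula (8) for two candidate diseases D1 and D2.

    p_given_d1[k] = P(S_k | D1) and p_given_d2[k] = P(S_k | D2) for the
    observed symptoms. Written in the linear form of formula (10): each
    present symptom adds a fixed weight w_k = log(P(S_k|D1) / P(S_k|D2)),
    and b = log(P(D1) / P(D2)) plays the role of the bias term.
    """
    weights = [math.log(a / b) for a, b in zip(p_given_d1, p_given_d2)]
    bias = math.log(prior_d1 / prior_d2)
    return sum(weights) + bias


def pick_disease(p_given_d1, p_given_d2, prior_d1, prior_d2):
    """Classify as D1 when the log odds are positive, otherwise as D2."""
    return "D1" if nb_log_odds(p_given_d1, p_given_d2, prior_d1, prior_d2) > 0 else "D2"
```

With $P(S \mid D1) = (0.6, 0.5, 0.4)$, $P(S \mid D2) = (0.3, 0.5, 0.2)$, and priors 0.2 and 0.4, the weights are $\log 2$, $0$, and $\log 2$, the bias is $\log 0.5$, and the total $\log 2 > 0$ selects D1, matching the direct ratio of formula (7).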

The data set in this study mainly comes from clinical medical records. According to expert experience, the first three symptoms recorded in the clinic are more likely to be the main symptoms and carry the largest weight in diagnosis prediction, that is, they contribute most to diagnosing the disease. Therefore, this article adds a weight coefficient greater than 1 to the first three (main) symptoms. When the feature item operator of each symptom is calculated, formula (5) is followed if the symptom and the disease co-occur in the data set; if they do not co-occur, Laplacian smoothing gives the feature operator a very small value, so that every input symptom has a feature operator value. If the symptom is a main symptom (among the first three inputs), its feature item operator is multiplied by a coefficient greater than 1 to increase its weight. See Figure 1.
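The main-symptom weighting can be sketched as follows, under explicit assumptions. The text does not give the weight coefficient or the exact smoothing formula, so `main_weight=1.5` is a hypothetical value and the no-co-occurrence operator uses an assumed Laplace form with pseudo-count `alpha` over a vocabulary of `vocab_size` symptom terms. Following the header of Table 3, the weight is applied only to the first three inputs and only when they co-occur with the disease. Function names are hypothetical.

```python
def feature_operator(records, disease, symptom, vocab_size, alpha=1.0):
    """Return (operator, co_occurs) for one (disease, symptom) pair.

    Co-occurrence: P(S | D) / P(S) as in formula (5).
    No co-occurrence: assumed Laplace-smoothed form
        [alpha / (n_D + alpha * V)] / [(n_S + alpha) / (N + alpha * V)],
    where V = vocab_size, the number of distinct symptom terms.
    """
    n_total = len(records)
    n_disease = sum(1 for d, _ in records if d == disease)
    n_symptom = sum(1 for _, syms in records if symptom in syms)
    n_pair = sum(1 for d, syms in records if d == disease and symptom in syms)
    if n_pair > 0:
        return (n_pair / n_disease) / (n_symptom / n_total), True
    p_s_given_d = alpha / (n_disease + alpha * vocab_size)
    p_s = (n_symptom + alpha) / (n_total + alpha * vocab_size)
    return p_s_given_d / p_s, False


def weighted_score(records, disease, symptoms, vocab_size,
                   main_weight=1.5, n_main=3, prior=None):
    """Main-symptom-weighted score of a disease for ordered input symptoms.

    The first n_main inputs are treated as main symptoms; their operator is
    multiplied by main_weight (> 1, hypothetical value) when they co-occur
    with the disease. prior=None omits P(D); pass P(D), or 1/100 for the
    equal-probability variant.
    """
    score = 1.0 if prior is None else prior
    for position, symptom in enumerate(symptoms):
        operator, co_occurs = feature_operator(records, disease, symptom, vocab_size)
        if position < n_main and co_occurs:
            operator *= main_weight
        score *= operator
    return score
```

Setting `main_weight=1.0` recovers the unweighted variants, which makes it easy to compare the two branches of Figure 2 on the same data.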

The symptom set $X_i$ is input, and all diseases $Y_i$ associated with these symptoms are retrieved. For each disease, $P(Y_i \mid X_1, X_2, \ldots)$ is calculated, giving the result set of disease posterior probabilities $P(Y_1 \mid X), P(Y_2 \mid X), \ldots, P(Y_i \mid X)$, and the top 3 in the result set are taken as the recommended results.
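The candidate-and-rank step can be sketched as below. Assumptions: training items are `(disease, set_of_symptoms)` pairs, candidates are the diseases co-occurring with at least one input symptom, and `score_fn` is a hypothetical placeholder for any of the scoring variants.

```python
import heapq


def recommend_top3(records, symptoms, score_fn, k=3):
    """Return the k highest-scoring candidate diseases for the input symptoms.

    Candidates are diseases that co-occur with at least one input symptom in
    records (list of (disease, set_of_symptoms)). score_fn(disease, symptoms)
    stands for any of the scoring variants; it is a placeholder callback.
    """
    wanted = set(symptoms)
    candidates = {d for d, syms in records if wanted & set(syms)}
    return heapq.nlargest(k, candidates, key=lambda d: score_fn(d, symptoms))
```

`heapq.nlargest` avoids sorting the full candidate list, which matters little for 100 diseases but keeps the intent (top 3 only) explicit.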

In this paper, formula (5) is used to calculate the posterior probability of a disease. From formula (5), two calculation methods, with weighted and with unweighted main symptoms, are derived through transformation and data processing. The Bayesian formula can also be understood from another perspective:

$$P(Y \mid X) = \frac{P(X \mid Y)}{P(X)} \cdot P(Y) \tag{11}$$

where $P(Y)$ is the prior probability of $Y$; the term $P(X \mid Y)/P(X)$ is regarded as the feature term operator, whose numerator $P(X \mid Y)$, the conditional probability of $X$ given $Y$, is the likelihood and whose denominator $P(X)$ is the normalization term; and $P(Y \mid X)$ on the left side is the posterior probability of $Y$ given that $X$ has occurred. Observing $X$ thus changes the probability of $Y$ from $P(Y)$ to $P(Y \mid X)$; in this study, $P(Y)$ is the prior probability of a disease in the data set. Dividing both sides of formula (11) by $P(Y)$ gives

$$\frac{P(Y \mid X)}{P(Y)} = \frac{P(X \mid Y)}{P(X)} \tag{12}$$
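To see what formula (12) does when prior probabilities are imbalanced, consider a small worked example with hypothetical numbers (not taken from the study's data): one observed symptom $X$ with $P(X) = 0.05$, a rare disease $Y_1$ with $P(Y_1) = 0.01$ and $P(X \mid Y_1) = 0.3$, and a common disease $Y_2$ with $P(Y_2) = 0.10$ and $P(X \mid Y_2) = 0.1$:

$$
\begin{aligned}
P(Y_1 \mid X) &= \frac{0.3}{0.05} \cdot 0.01 = 0.06, &\qquad \frac{P(Y_1 \mid X)}{P(Y_1)} &= \frac{0.3}{0.05} = 6,\\
P(Y_2 \mid X) &= \frac{0.1}{0.05} \cdot 0.10 = 0.20, &\qquad \frac{P(Y_2 \mid X)}{P(Y_2)} &= \frac{0.1}{0.05} = 2.
\end{aligned}
$$

With the prior of formula (11), the common disease $Y_2$ ranks first ($0.20 > 0.06$); with the rate of change of formula (12), the rare disease $Y_1$ ranks first ($6 > 2$), because its symptom evidence is stronger relative to its base rate.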

The left side of formula (12) can be regarded as the rate of change from the prior probability of $Y$ to its posterior probability $P(Y \mid X)$, which also neatly avoids the problem of imbalanced prior probabilities $P(Y)$ in the data set. We therefore modified the Naïve Bayes formula into two variants, a method that adds the prior probability and a method that does not, which are discussed later. This is the first algorithm used in this article. All eight Bayesian algorithm variants used in this article are shown in Figure 2, and the log form is shown in Figure 3.

Figure 1: Main symptom weighted diagram (a symptom is either a main symptom, weighted, or not the main symptom, unweighted; a symptom that co-occurs with the disease uses the feature operator, and one with no co-occurrence uses the Laplacian smoothing operator).

Above, we transformed formula (5) into formula (12). Formulas (8) and (9) are the logarithmic forms of Naïve Bayes and logistic regression, respectively, and their linear functions differ. The basic assumption of Naïve Bayes is that the dimensions of the sample are conditionally independent, that is, $P(X_1, X_2, X_3 \mid Y) = P(X_1 \mid Y)\,P(X_2 \mid Y)\,P(X_3 \mid Y)$. To avoid floating-point underflow, we apply the log function, which does not change monotonicity, and obtain formula (8). With a log base greater than 1 (Figure 3), the prior term $\log P(Y)$ is compressed by the log function. Furthermore, the product becomes a sum, which reduces the influence of the prior probability to a certain extent. For example, when a disease has few records in the data set, its prior probability is very small, and the product, and hence the posterior, becomes very small; the algorithm therefore adds a log-form branch.
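The underflow that motivates the log form can be reproduced with hypothetical numbers. Two hundred factors of 0.01 or of 0.02 both underflow to 0.0 in double precision, so the two candidates become indistinguishable, while the sums of logarithms still rank them. Real inputs have far fewer symptoms; the long list only exaggerates the effect.

```python
import math


def product_score(probs):
    """Plain product of probabilities; underflows to 0.0 for long inputs."""
    return math.prod(probs)


def log_score(probs):
    """Sum of logarithms: monotone in the product, so rankings agree,
    but it stays finite where the product underflows."""
    return sum(math.log(p) for p in probs)
```

For `a = [0.01] * 200` and `b = [0.02] * 200`, both products are 0.0, whereas `log_score(b) > log_score(a)`.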

To address the imbalanced prior probabilities, we also adopted an oversampling method so that all 100 diseases in the data set are processed with equal probability. Here, we assume that the prior probability of each disease is 1/100 and then apply the main-symptom weighted and unweighted methods.
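The text does not specify the oversampling scheme. The sketch below assumes simple random oversampling, duplicating records of smaller disease classes until every disease has as many records as the largest one, which yields an equal prior of 1/(number of diseases), i.e., 1/100 for Table 1. `oversample_to_equal` is a hypothetical name.

```python
import random
from collections import defaultdict


def oversample_to_equal(records, seed=0):
    """Randomly duplicate records of smaller disease classes until every
    disease has as many records as the largest one, giving equal priors.

    records: list of (disease, set_of_symptoms). The original records are
    kept; only duplicates are added.
    """
    if not records:
        return []
    rng = random.Random(seed)
    by_disease = defaultdict(list)
    for record in records:
        by_disease[record[0]].append(record)
    target = max(len(group) for group in by_disease.values())
    balanced = []
    for group in by_disease.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced
```

Note that duplicating records also shifts the estimated symptom marginals $P(\text{Symptom})$, not only the disease priors.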

5 Results and Discussion

In the experiment, we apply the 8 calculation methods of Naïve Bayes and its variants shown in Figure 2 with 3-fold cross-validation. For each piece of test data, we obtain the list of diseases associated with all of its symptoms and rank these diseases by probability under each of the 8 algorithms. The diseases with the top 3 probabilities are used as the recommended results. In the evaluation, if the recommended results include the disease recorded for that data item, the prediction is counted as correct; the accuracy rate calculated by this rule is shown in Table 3.

Figure 2: Eight different Bayesian algorithms. Naïve Bayes is divided into main symptoms weighted and main symptoms not weighted, and each branch has four variants: add prior probability, no prior probability, log form, and equal probability.

Figure 3: Log form (the curve $y = \log_a x$ for $a > 1$).

Table 3: Accuracy of 8 algorithms.

Calculation method | Calculation formula | Accuracy (%)

Main symptoms weighted (for the first 3 symptoms, if the disease has co-occurrence, multiply by the weight coefficient)
  Add prior probability | P(DiseaseX|SymptomA, SymptomB, SymptomC) = P(SymptomA|DiseaseX)/P(SymptomA) · P(SymptomB|DiseaseX)/P(SymptomB) · P(SymptomC|DiseaseX)/P(SymptomC) · P(DiseaseX) | 73
  No prior probability | P(DiseaseX|SymptomA, SymptomB, SymptomC)/P(DiseaseX) = P(SymptomA|DiseaseX)/P(SymptomA) · P(SymptomB|DiseaseX)/P(SymptomB) · P(SymptomC|DiseaseX)/P(SymptomC) | 69
  Log function form | log[P(DiseaseX1|SymptomA, SymptomB, SymptomC)/P(DiseaseX2|SymptomA, SymptomB, SymptomC)] = log[P(SymptomA|DiseaseX1)/P(SymptomA|DiseaseX2)] + log[P(SymptomB|DiseaseX1)/P(SymptomB|DiseaseX2)] + log[P(SymptomC|DiseaseX1)/P(SymptomC|DiseaseX2)] + log[P(DiseaseX1)/P(DiseaseX2)] | 77
  Equal probability | P(DiseaseX|SymptomA, SymptomB, SymptomC) = P(SymptomA|DiseaseX)/P(SymptomA) · P(SymptomB|DiseaseX)/P(SymptomB) · P(SymptomC|DiseaseX)/P(SymptomC) · P(DiseaseX) | 84.2

Main symptoms not weighted
  Add prior probability | P(DiseaseX|SymptomA, SymptomB, SymptomC) = P(SymptomA|DiseaseX)/P(SymptomA) · P(SymptomB|DiseaseX)/P(SymptomB) · P(SymptomC|DiseaseX)/P(SymptomC) · P(DiseaseX) | 73
  No prior probability | P(DiseaseX|SymptomA, SymptomB, SymptomC)/P(DiseaseX) = P(SymptomA|DiseaseX)/P(SymptomA) · P(SymptomB|DiseaseX)/P(SymptomB) · P(SymptomC|DiseaseX)/P(SymptomC) | 67
  Log function form | log[P(DiseaseX1|SymptomA, SymptomB, SymptomC)/P(DiseaseX2|SymptomA, SymptomB, SymptomC)] = log[P(SymptomA|DiseaseX1)/P(SymptomA|DiseaseX2)] + log[P(SymptomB|DiseaseX1)/P(SymptomB|DiseaseX2)] + log[P(SymptomC|DiseaseX1)/P(SymptomC|DiseaseX2)] + log[P(DiseaseX1)/P(DiseaseX2)] | 76
  Equal probability | P(DiseaseX|SymptomA, SymptomB, SymptomC) = P(SymptomA|DiseaseX)/P(SymptomA) · P(SymptomB|DiseaseX)/P(SymptomB) · P(SymptomC|DiseaseX)/P(SymptomC) · P(DiseaseX) | 83
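The evaluation protocol can be sketched as follows, assuming a shuffled 3-fold split (each fold trains on 2/3 and tests on 1/3 of the items) and a top-3 hit rule. `recommend` is a hypothetical placeholder for any of the eight ranking variants; the fold assignment used in the study is not specified.

```python
import random


def three_fold_splits(records, seed=0):
    """Yield (train, test) pairs for 3-fold cross-validation.

    Each fold uses 2/3 of the records for training and 1/3 for testing.
    """
    order = list(range(len(records)))
    random.Random(seed).shuffle(order)
    folds = [order[i::3] for i in range(3)]
    for k in range(3):
        held_out = set(folds[k])
        train = [records[i] for i in order if i not in held_out]
        test = [records[i] for i in folds[k]]
        yield train, test


def top3_accuracy(test, recommend):
    """Share of test records whose recorded disease is among the top 3
    recommendations; recommend(symptoms) is a placeholder returning a
    ranked list of diseases."""
    if not test:
        return 0.0
    hits = sum(1 for disease, symptoms in test if disease in recommend(symptoms)[:3])
    return hits / len(test)
```

Averaging `top3_accuracy` over the three folds gives one accuracy figure per algorithm, in the form reported in Table 3.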

6 Conclusion

As Table 3 shows, this study, based on the classic TCM idea of syndrome differentiation, proposes an algorithm improvement that weights the main symptoms. Among all 8 modified Naïve Bayes algorithms, the main-symptom-weighted, equal-probability algorithm has the highest accuracy, 84.2%, which is 15.2 percentage points higher than the 69% of the classic Naïve Bayes algorithm (without prior probability). This greatly improves the performance of the Naïve Bayes classifier and has certain clinical practicability. The model is currently available at http://tcmcdsmvc.yiankb.com.

However, owing to the privacy of TCM medical records, it is difficult to obtain a large-scale, real, valid, and high-quality medical record corpus. Moreover, TCM disease diagnosis is vague, and the boundary between diseases and symptoms is not very clear; for example, “cough” is both a disease name and a symptom name, which makes it difficult to improve the accuracy of TCM diagnosis prediction and recommendation. There is also room for improvement in this research. For example, segmenting the text by punctuation alone is too coarse. In addition, matching user-input symptoms to the symptoms in the Bayesian corpus depends too heavily on the domain ontology, so accuracy drops sharply for symptoms the ontology does not cover. Both issues need to be addressed in the next version.

Secondly, the main-symptom weight coefficient is set manually, which introduces a certain degree of arbitrariness and uncontrollability. In the future, with more labeled corpora, we can try newer algorithms to provide methodological support for optimizing the performance of the TCM clinical decision-making system. Furthermore, schemes based on conventional machine learning and ensemble learning methods (such as Boosting, Bagging, and Random Subspace) have achieved good performance in text genre classification and sentiment analysis [26–28], and they are a promising direction for subsequent studies [29]. Meanwhile, data mining and feature selection methods [30, 31] can help discover the relationship between diseases and symptoms and improve the accuracy of TCM disease diagnosis recommendation. Further research that explores more of these methods may yield more promising results.

Data Availability

The medical case data used to support the findings of this study have not been made available because of patient privacy.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by grants from the 13th Five-Year Plan National Key R&D Program of China (2018YFC1705401), literature mining and evidence-based research on ulcerative colitis; the Beijing Natural Science Foundation (7202144), sequential decision-making optimization of traditional Chinese medicine treatment of ulcerative colitis based on deep intensive learning; the State Natural Science Fund Project (81873390), study on pedigree construction of ancient knowledge of acupuncture and moxibustion based on text vector; CKCEST-2019-2-12, China Knowledge Centre for Engineering Sciences and Technology construction project, TCM knowledge service system; the State Natural Science Fund Project (81873200), research on key diagnosis and treatment factors of spleen and stomach disease and clinical optimization decision based on deep learning; and the basic scientific research business expense independent topic selection project of China Academy of Chinese Medical Sciences (ZZ140316), construction and application study on decision support system for gynecological diseases of traditional Chinese medicine based on electronic medical records.

References

[1] Z. Wang, “On disease differentiation and dialectics,” Modern Medicine and Health, vol. 12, pp. 1105–1106, 2002.

[2] Z. You and Y. Zhao, “A brief analysis of the function and connection of disease differentiation and syndrome differentiation in TCM,” Guangming Chinese Medicine, vol. 35, no. 12, pp. 1908–1909, 2020.

[3] A. Onan, S. Korukoglu, H. Bulut et al., “Ensemble of keyword extraction methods and classifiers in text classification,” Expert Systems with Applications, vol. 57, pp. 232–247, 2016.

[4] A. Onan, S. Korukoglu, and H. Bulut, “A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification,” Expert Systems with Applications, vol. 62, pp. 1–16, 2016.

[5] A. Onan, “Mining opinions from instructor evaluation reviews: a deep learning approach,” Computer Applications in Engineering Education, vol. 28, no. 1, pp. 117–138, 2020.

[6] A. Onan, “Sentiment analysis on massive open online course evaluations: a text mining and deep learning approach,” Computer Applications in Engineering Education, vol. 29, no. 3, pp. 572–589, 2020.

[7] A. Onan and M. A. Tocoglu, “Weighted word embeddings and clustering-based identification of question topics in MOOC discussion forum posts,” Computer Applications in Engineering Education, pp. 1–15, 2020.

[8] A. Onan and M. A. Tocoglu, “A term weighted neural language model and stacked bidirectional LSTM based framework for sarcasm identification,” IEEE Access, vol. 9, pp. 7701–7722, 2021.

[9] J. Hong and L. Huang, “Bayesian analysis in medical diagnosis,” Journal of Xianning Medical College, vol. 14, no. 3, pp. 179–180, 2000.


[10] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian network classifiers,” Machine Learning, vol. 29, pp. 131–163, 1997.

[11] G. Hacken, “Bayesian statistics: an introduction (4th ed.),” Computing Reviews, vol. 55, no. 3, pp. 167–168, 2014.

[12] H. Zhang, C. Shen, R.-Z. Liu, J. Mao, C.-T. Liu, and B. Mu, “Developing novel in silico prediction models for assessing chemical reproductive toxicity using the naïve Bayes classifier method,” Journal of Applied Toxicology, vol. 40, no. 9, pp. 1198–1209, 2020.

[13] T. Liu, S. Shi, and X. Gu, “Naïve Bayes classifier based driving habit prediction scheme for VANET stable clustering,” Artificial Intelligence for Communications and Networks, vol. 25, pp. 1708–1714, 2019.

[14] Y. Shen, X. Huang, S. Huang, Y. Shen, and X. Chen, “Wave echo recognition and effect inspection of Doppler weather radar based on Bayesian classifier,” Marine Science, vol. 44, no. 6, pp. 83–90, 2020.

[15] D. Troy and A. S. Hall, “Recession forecasting using Bayesian classification,” International Journal of Forecasting, vol. 35, no. 3, pp. 848–867, 2019.

[16] G. Wang, Application of Bayesian Algorithm in Human Physiological State Recognition, Dalian University of Technology, Dalian, China, 2008.

[17] S. Ye and Y. Ye, “Computer diagnosis and treatment of TCM,” Shanghai Journal of Traditional Chinese Medicine, no. 6, p. 33, 1980.

[18] H. Qin, Research and Application of Several Improved Naïve Bayes Classification Algorithms, Shandong University of Science and Technology, Qingdao, China, 2018.

[19] T. Du, Research and Application of Naïve Bayes Classification Based on Attribute Selection, University of Science and Technology of China, Hefei, China, 2016.

[20] C.-S. Mu, P. Zhang, C.-Y. Kong, and Y.-N. Li, “Application of Bayes probability model in differentiation of yin and yang jaundice syndromes in neonates,” Chinese Journal of Integrated Traditional and Western Medicine, vol. 35, no. 9, pp. 1078–1082, 2015.

[21] B. Pang, D. Zhang, N. Li, and K. Wang, “Computerized tongue diagnosis based on Bayesian networks,” IEEE Transactions on Bio-Medical Engineering, vol. 51, no. 10, pp. 1803–1810, 2004.

[22] L. Yuan, Research on Some Key Technologies of TCM Syndrome Differentiation and Diagnosis of Spleen and Stomach Diseases, Zhejiang University of Science and Technology, Hangzhou, China, 2013.

[23] Y. Wang, H. Wang, and T. Yang, “Research on online diagnosis of diseases in TCM based on Bayesian algorithm,” Software Guide, vol. 9, no. 12, pp. 97–99, 2010.

[24] A. McCallum and K. Nigam, “Comparison of event models for naïve Bayes text classification,” in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Dortmund, Germany, July 1998.

[25] I. Rish, “An empirical study of the naïve Bayes classifier,” Journal of Universal Computer Science, vol. 1, no. 2, p. 127, 2001.

[26] A. Onan, “Hybrid supervised clustering based ensemble scheme for text classification,” Kybernetes: The International Journal of Systems & Cybernetics, vol. 46, no. 2, 2017.

[27] A. Onan, “Sentiment analysis on Twitter based on ensemble of psychological and linguistic feature sets,” Balkan Journal of Electrical and Computer Engineering, vol. 6, pp. 1–9, 2018.

[28] A. Onan, S. Korukoglu, and H. Bulut, “LDA-based topic modelling in text sentiment classification: an empirical analysis,” International Journal of Computational Linguistics and Applications, vol. 7, no. 1, pp. 101–119, 2016.

[29] A. Onan, “An ensemble scheme based on language function analysis and feature engineering for text genre classification,” Journal of Information Science Principles & Practice, vol. 44, no. 1, pp. 28–47, 2018.

[30] A. Onan, V. Bal, and B. Yanar Bayam, “The use of data mining for strategic management: a case study on mining association rules in student information system,” Croatian Journal of Education - Hrvatski Casopis Za Odgoj I Obrazovanje, vol. 18, no. 1, 2016.

[31] A. Onan and S. Korukoglu, “A feature selection model based on genetic rank aggregation for text sentiment classification,” Journal of Information Science, vol. 43, no. 1, pp. 25–38, 2017.


forecasting research, which was proposed by the British mathematician Thomas Bayes in the 18th century to address the "inverse probability" problem. It is built on Bayes' formula: probabilistic reasoning is used to calculate the probability that a sample belongs to a particular class, under the assumption that all feature variables $X_k$ are conditionally independent of each other given the class. This assumption seems somewhat unrealistic, but many studies have shown that the method performs well in classification tasks [10] and can effectively handle reasoning with uncertain knowledge [11]. Because of its high practicability, the Bayesian classification algorithm is widely used in biology [12], transportation [13], meteorology [14], economics [15], medicine [16], and other fields.

In 1980, a researcher [17] proposed applying Bayesian algorithms to disease diagnosis in TCM. Qin [18] improved the traditional Naïve Bayes classification method, applied it to the TCM diagnosis of asthma, and achieved good experimental results. Du [19] applied an improved weighted hidden Naïve Bayes classification algorithm to real-world TCM infertility diagnosis, providing a useful approach to modelling TCM infertility diagnosis. Many other related works have also achieved notable results [20–23]. This body of work has accelerated diagnostic research in TCM, improved the accuracy, speed, and efficiency of clinical disease diagnosis, and laid a good foundation for artificial intelligence research in TCM. However, owing to limitations in data quality, terminology standards, computing power, and so forth, Bayesian TCM disease diagnosis models still have shortcomings and need further improvement to meet the growing demands of TCM clinical practice and scientific research.

The Big Health TCM Intelligent R&D Center of the Institute of Information, CACMS, has more than ten years of research experience in TCM informatization, software development, TCM algorithm research, ontology construction, and TCM data. Building on the center's work, this study explores the diagnosis and prediction of TCM diseases with an improved Bayesian combination model, as described below.

2 Basic Data Preparation

Due to the complexity of TCM diseases, medical records for some diseases are too scarce and guidelines are missing, which causes serious data imbalance and degrades machine learning performance. Therefore, this study is based on the top 100 common diseases at Dongzhimen Hospital of Beijing University of TCM (see Table 1). The data mainly come from the medical records of the ancient and modern medical record cloud platform (http://www.yiankb.com, total data volume 300,000+) and the practical clinical guidelines of the TCM clinical decision support system (https://www.tcmcds.com, total data volume 4,000+) developed by the Institute of Information, CACMS. The medical records and guideline data for the 100 common diseases in Table 1 were extracted, and records with multiple disease diagnoses were removed. This leaves a total of 37,103 items, of which 2/3 are training data and 1/3 are test data.
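The text gives only the 2:1 ratio; how the split was drawn (purely random, stratified by disease, or otherwise) is not described. A minimal sketch, assuming a seeded random shuffle and a hypothetical helper named `split_two_to_one`, shows how 37,103 items divide under that ratio:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_two_to_one(records: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and return (train, test) with about 2/3 for training.

    The seeded random shuffle is an assumption; the study does not state its split procedure.
    """
    items = list(records)
    random.Random(seed).shuffle(items)
    cut = (2 * len(items)) // 3  # integer cut point for the 2:1 ratio
    return items[:cut], items[cut:]


train, test = split_two_to_one(range(37103))
print(len(train), len(test))
```

With 37,103 items the cut falls at 24,735 training and 12,368 test records. If some of the 100 diseases are rare, a per-disease (stratified) split would keep each disease represented in both parts; the study does not say which was used.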

3 Data Cleaning

It is well acknowledged that data cleaning is foundational work in machine learning and deep learning. In this study, ontology data (Table 2) comprising more than 80,000 fields on TCM diseases, symptoms, and signs from the backend of the TCM clinical auxiliary decision support system serve as the data standard, and the TCM disease diagnoses and symptom data in the medical records and guideline data are standardized against it. For example, "Menstrual period" is standardized as "late menstrual period", and "Easy to wake up early", "Wake up midnight", "Wake up frequently every night", "Difficulty falling asleep", and similar expressions are standardized as "Insomnia". Standardizing symptom and TCM disease names is an important aid to intelligent TCM disease diagnosis. Because the Bayesian TCM disease diagnosis prediction model does not require doctors to tick predefined symptom terms but lets them enter symptoms in natural language, the recognition of symptom words and their match rate against the existing corpus have a large impact on decision accuracy.
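The standardization step amounts to a lookup from raw phrases to standard terms. A minimal sketch follows, assuming a hypothetical `SYNONYMS` table built only from the examples quoted above (the real 80,000+ field ontology is not reproduced) and assuming unmatched phrases pass through unchanged:

```python
from typing import Dict

# Hypothetical synonym table built from the examples in the text only.
SYNONYMS: Dict[str, str] = {
    "Menstrual period": "late menstrual period",
    "Easy to wake up early": "Insomnia",
    "Wake up midnight": "Insomnia",
    "Wake up frequently every night": "Insomnia",
    "Difficulty falling asleep": "Insomnia",
}


def standardize(term: str) -> str:
    """Map a raw symptom phrase to its standard term; unknown phrases are returned stripped but unchanged."""
    cleaned = term.strip()
    return SYNONYMS.get(cleaned, cleaned)


print([standardize(t) for t in ["Wake up midnight", " Difficulty falling asleep ", "Headache"]])
```

Passing unknown phrases through is a design choice for the sketch; a production system might instead flag them for review, since unmatched terms lower the corpus match rate discussed above.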

Given how symptoms are described in the medical record corpus, the conventional dictionary-based, statistical, and machine-learning-based word segmentation methods were abandoned; instead, the medical record corpus is segmented using the comma as the delimiter.
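Comma-based segmentation can be sketched in a few lines. This assumes that records may use either the ASCII comma or the full-width Chinese comma (U+FF0C) and that empty pieces should be dropped; the helper name `segment` is hypothetical:

```python
import re
from typing import List

# Treat both the ASCII comma and the full-width comma as delimiters (an assumption
# about how the records are punctuated).
_COMMA = re.compile(r"[,\uff0c]")


def segment(record: str) -> List[str]:
    """Split a free-text record into symptom phrases at commas, dropping empty pieces."""
    return [piece.strip() for piece in _COMMA.split(record) if piece.strip()]


print(segment("Headache, insomnia\uff0cdry mouth,, "))
```

Each resulting phrase can then be mapped to a standard term before it is used as a feature.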

4 Method

This project uses the Naïve Bayes method for modelling. Naïve Bayes is a simplification of the Bayesian method: the full Bayesian method requires the joint probability of the features and the label, whereas Naïve Bayes assumes that the features are conditionally independent of each other given the label.

For a sample $D$ to be classified, with sample attributes $X = \{X_1, X_2, \ldots, X_n\}$ and categorical variable $C = \{C_1, C_2, \ldots, C_m\}$, Bayes' theorem expresses the posterior probability in terms of the prior probability $P(C)$, the class-conditional probability $P(X \mid C)$, and the normalizing constant $P(X)$:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \tag{1}$$
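Equation (1) can be checked numerically. The sketch below uses illustrative numbers (not from the study) and computes the normalizing constant $P(X)$ as the sum of $P(X \mid c)\,P(c)$ over all classes:

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Equation (1): P(C|X) = P(X|C) P(C) / P(X), with P(X) = sum over c of P(X|c) P(c)."""
    joint = {c: likelihood[c] * prior[c] for c in prior}
    evidence = sum(joint.values())  # the normalizing constant P(X)
    return {c: value / evidence for c, value in joint.items()}


# Illustrative numbers only; they are not taken from the study.
post = posterior(prior={"C1": 0.7, "C2": 0.3}, likelihood={"C1": 0.1, "C2": 0.4})
print(post)
```

Here C2 ends up with the higher posterior (0.12 / 0.19) despite its lower prior, because its likelihood is four times larger; the posteriors sum to 1 because of the normalizing constant.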

Naïve Bayes assumes that all feature variables $X_k$ are conditionally independent of each other given the category $C$. For a sample with attributes $X$, this conditional independence assumption can be expressed as

$$P(X \mid C = c) = \prod_{k=1}^{n} P(X_k \mid C = c) \tag{2}$$
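A minimal Naïve Bayes sketch in the spirit of equations (1) and (2): priors and per-symptom conditional probabilities are estimated by counting, and a disease is scored by multiplying its prior with the conditionals of the observed symptoms. Several choices here are assumptions not stated in the text: each record is a set of standardized symptoms, only observed symptoms enter the product (mirroring the three-symptom formulas in Table 3), and unseen symptoms get a hypothetical `floor` probability instead of zero. Because $P(X)$ is the same for every disease, it is skipped when ranking.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Prior = Dict[str, float]
Cond = Dict[str, Dict[str, float]]


def train(records: Iterable[Tuple[Set[str], str]]) -> Tuple[Prior, Cond]:
    """Estimate P(C=c) and P(symptom present | C=c) by counting (symptom set, disease) pairs."""
    class_counts: Counter = Counter()
    symptom_counts: Dict[str, Counter] = defaultdict(Counter)
    for symptoms, disease in records:
        class_counts[disease] += 1
        symptom_counts[disease].update(set(symptoms))  # each symptom counted once per record
    total = sum(class_counts.values())
    prior = {c: n / total for c, n in class_counts.items()}
    cond = {
        c: {s: k / class_counts[c] for s, k in symptom_counts[c].items()}
        for c in class_counts
    }
    return prior, cond


def score(symptoms: List[str], disease: str, prior: Prior, cond: Cond,
          floor: float = 1e-6) -> float:
    """Unnormalized P(C=c) * prod_k P(X_k | C=c) over the observed symptoms.

    `floor` is a hypothetical stand-in for symptoms never seen with this disease.
    """
    p = prior[disease]
    for s in symptoms:
        p *= cond[disease].get(s, floor)
    return p


def predict(symptoms: List[str], prior: Prior, cond: Cond) -> str:
    """Return the highest-scoring disease; P(X) is shared by all diseases, so it is omitted."""
    return max(prior, key=lambda d: score(symptoms, d, prior, cond))


records = [
    ({"Symptom A", "Symptom B"}, "Disease X1"),
    ({"Symptom A"}, "Disease X1"),
    ({"Symptom C"}, "Disease X2"),
]
prior, cond = train(records)
print(predict(["Symptom A", "Symptom B"], prior, cond))
```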

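According to the above formulas, the probability $P(\text{Disease A} \mid \text{Symptom A}, \text{Symptom B}, \text{Symptom C})$ of Disease A given that Symptom A, Symptom B, and Symptom C are observed is

$$P(\text{Disease A} \mid \text{Symptom A}, \text{Symptom B}, \text{Symptom C}) = \frac{P(\text{Symptom A} \mid \text{Disease A})\,P(\text{Symptom B} \mid \text{Disease A})\,P(\text{Symptom C} \mid \text{Disease A})\,P(\text{Disease A})}{P(\text{Symptom A}, \text{Symptom B}, \text{Symptom C})} \tag{3}$$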















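$$P(\text{Disease A} \mid \text{Symptom A}, \text{Symptom B}, \text{Symptom C}) = \frac{P(\text{Symptom A} \mid \text{Disease A})}{P(\text{Symptom A})} \cdot \frac{P(\text{Symptom B} \mid \text{Disease A})}{P(\text{Symptom B})} \cdot \frac{P(\text{Symptom C} \mid \text{Disease A})}{P(\text{Symptom C})} \cdot P(\text{Disease A}) \tag{5}$$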





(6)






(7)










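$$\log\frac{P(\text{Disease } X_1 \mid \text{Symptom A}, \text{Symptom B}, \text{Symptom C})}{P(\text{Disease } X_2 \mid \text{Symptom A}, \text{Symptom B}, \text{Symptom C})} = \log\frac{P(\text{Symptom A} \mid \text{Disease } X_1)}{P(\text{Symptom A} \mid \text{Disease } X_2)} + \log\frac{P(\text{Symptom B} \mid \text{Disease } X_1)}{P(\text{Symptom B} \mid \text{Disease } X_2)} + \log\frac{P(\text{Symptom C} \mid \text{Disease } X_1)}{P(\text{Symptom C} \mid \text{Disease } X_2)} + \log\frac{P(\text{Disease } X_1)}{P(\text{Disease } X_2)} \tag{8}$$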
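Comparing two candidate diseases in log form, as in the log-function-form row of Table 3, reduces to summing per-symptom log-likelihood ratios plus the log prior ratio; the shared evidence term cancels. A minimal sketch, assuming all conditional probabilities involved are nonzero and using a hypothetical `log_odds` helper with toy numbers:

```python
import math
from typing import Dict, List


def log_odds(symptoms: List[str], d1: str, d2: str,
             prior: Dict[str, float], cond: Dict[str, Dict[str, float]]) -> float:
    """log[P(d1|S) / P(d2|S)] = sum_k log[P(s_k|d1) / P(s_k|d2)] + log[P(d1) / P(d2)].

    Positive values favour d1; assumes every probability used is nonzero.
    """
    total = math.log(prior[d1] / prior[d2])
    for s in symptoms:
        total += math.log(cond[d1][s] / cond[d2][s])
    return total


# Toy numbers for illustration only.
prior = {"Disease X1": 0.5, "Disease X2": 0.5}
cond = {
    "Disease X1": {"Symptom A": 0.8, "Symptom B": 0.5},
    "Disease X2": {"Symptom A": 0.2, "Symptom B": 0.5},
}
print(log_odds(["Symptom A", "Symptom B"], "Disease X1", "Disease X2", prior, cond))
```

With equal priors and equal probabilities for Symptom B, only Symptom A contributes, giving log 4 in favour of Disease X1.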






(9)




$$w_1 + w_2 + w_3 + b \tag{10}$$






$$P(Y \mid X) = \frac{P(X \mid Y)}{P(X)} \cdot P(Y) \tag{11}$$


$$\frac{P(Y \mid X)}{P(Y)} = \frac{P(X \mid Y)}{P(X)} \tag{12}$$
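The ratio form of Bayes' rule, posterior over prior equals likelihood over evidence, underlies the "no prior probability" variant in Table 3. A small exact check on a made-up joint distribution of two binary variables (all numbers hypothetical) confirms that both sides equal $P(X, Y) / (P(X)\,P(Y))$:

```python
from fractions import Fraction

# A small, made-up joint distribution P(X = x, Y = y) over two binary variables.
joint = {
    (0, 0): Fraction(1, 10),
    (0, 1): Fraction(2, 10),
    (1, 0): Fraction(3, 10),
    (1, 1): Fraction(4, 10),
}


def marginal_x(x: int) -> Fraction:
    """P(X = x), summing the joint over y."""
    return sum((p for (xi, _), p in joint.items() if xi == x), Fraction(0))


def marginal_y(y: int) -> Fraction:
    """P(Y = y), summing the joint over x."""
    return sum((p for (_, yi), p in joint.items() if yi == y), Fraction(0))


for (x, y), p_xy in joint.items():
    lhs = (p_xy / marginal_x(x)) / marginal_y(y)  # P(Y|X) / P(Y)
    rhs = (p_xy / marginal_y(y)) / marginal_x(x)  # P(X|Y) / P(X)
    assert lhs == rhs  # both equal P(X, Y) / (P(X) P(Y)) exactly
print("ratio identity holds for every cell")
```

Exact fractions are used so the equality check is not affected by floating-point rounding.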


[Figure: main symptoms, weighted or unweighted]

[Figure: Naive Bayes variants. Main symptom weighted and main symptoms not weighted, each computed with add prior probability, no prior probability, log form, and equal probability]


Figure 3: Log form (plot of $y = \log_a x$ for $a > 1$, crossing the x-axis at $x = 1$).
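Because $y = \log_a x$ is strictly increasing for $a > 1$, ranking diseases by the log of their scores gives the same order as ranking by the scores themselves, while turning products of probabilities into sums. One common practical benefit, sketched below with illustrative numbers, is avoiding floating-point underflow when many small probabilities are multiplied:

```python
import math

probs = [0.01] * 400  # 400 illustrative conditional probabilities (not from the study)

product = 1.0
for p in probs:
    product *= p  # the true value, 1e-800, is below the smallest positive double

log_sum = sum(math.log(p) for p in probs)  # stays finite: 400 * ln(0.01)

print(product, log_sum)
```

The product collapses to 0.0, so every disease would tie, whereas the sum of logs (about -1842.07) still supports comparison.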


Table 3: Accuracy of 8 algorithms.

In the formulas, $S_A$, $S_B$, $S_C$ denote Symptom A, Symptom B, and Symptom C, and $D_X$ ($D_{X_1}$, $D_{X_2}$) denotes Disease X (Disease X1, Disease X2).

| Calculation method | Variant | Calculation formula | Accuracy (%) |
|---|---|---|---|
| Main symptom weighted (for the first 3 symptoms, multiply by the weight coefficient) | Add prior probability | $P(D_X \mid S_A, S_B, S_C) = \frac{P(S_A \mid D_X)}{P(S_A)} \cdot \frac{P(S_B \mid D_X)}{P(S_B)} \cdot \frac{P(S_C \mid D_X)}{P(S_C)} \cdot P(D_X)$ | 73 |
| | No prior probability | $\frac{P(D_X \mid S_A, S_B, S_C)}{P(D_X)} = \frac{P(S_A \mid D_X)}{P(S_A)} \cdot \frac{P(S_B \mid D_X)}{P(S_B)} \cdot \frac{P(S_C \mid D_X)}{P(S_C)}$ | 69 |
| | Log function form | $\log\frac{P(D_{X_1} \mid S_A, S_B, S_C)}{P(D_{X_2} \mid S_A, S_B, S_C)} = \log\frac{P(S_A \mid D_{X_1})}{P(S_A \mid D_{X_2})} + \log\frac{P(S_B \mid D_{X_1})}{P(S_B \mid D_{X_2})} + \log\frac{P(S_C \mid D_{X_1})}{P(S_C \mid D_{X_2})} + \log\frac{P(D_{X_1})}{P(D_{X_2})}$ | 77 |
| | Equal probability | $P(D_X \mid S_A, S_B, S_C) = \frac{P(S_A \mid D_X)}{P(S_A)} \cdot \frac{P(S_B \mid D_X)}{P(S_B)} \cdot \frac{P(S_C \mid D_X)}{P(S_C)} \cdot P(D_X)$ | 84.2 |
| Main symptoms not weighted | Add prior probability | $P(D_X \mid S_A, S_B, S_C) = \frac{P(S_A \mid D_X)}{P(S_A)} \cdot \frac{P(S_B \mid D_X)}{P(S_B)} \cdot \frac{P(S_C \mid D_X)}{P(S_C)} \cdot P(D_X)$ | 73 |
| | No prior probability | $\frac{P(D_X \mid S_A, S_B, S_C)}{P(D_X)} = \frac{P(S_A \mid D_X)}{P(S_A)} \cdot \frac{P(S_B \mid D_X)}{P(S_B)} \cdot \frac{P(S_C \mid D_X)}{P(S_C)}$ | 67 |
| | Log function form | $\log\frac{P(D_{X_1} \mid S_A, S_B, S_C)}{P(D_{X_2} \mid S_A, S_B, S_C)} = \log\frac{P(S_A \mid D_{X_1})}{P(S_A \mid D_{X_2})} + \log\frac{P(S_B \mid D_{X_1})}{P(S_B \mid D_{X_2})} + \log\frac{P(S_C \mid D_{X_1})}{P(S_C \mid D_{X_2})} + \log\frac{P(D_{X_1})}{P(D_{X_2})}$ | 76 |
| | Equal probability | $P(D_X \mid S_A, S_B, S_C) = \frac{P(S_A \mid D_X)}{P(S_A)} \cdot \frac{P(S_B \mid D_X)}{P(S_B)} \cdot \frac{P(S_C \mid D_X)}{P(S_C)} \cdot P(D_X)$ | 83 |
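Several of the Table 3 variants can be expressed as one scoring function with switches, sketched below under stated assumptions. Scores are computed in log space as $\sum_k \log[P(s_k \mid d)/P(s_k)]$, plus $\log P(d)$ when the prior is included; the difference of two diseases' scores equals the log-function-form expression in Table 3. How the main-symptom weight coefficient is applied is not specified in the text; multiplying every candidate's product by the same constant would not change the ranking, so this sketch assumes the weight multiplies the log-ratio terms of the first `n_main` symptoms (equivalently, an exponent on those ratios). The `floor` value for unseen symptoms, all function names, and the toy probabilities are hypothetical. The "equal probability" variant is not sketched, because its formula in Table 3 matches the add-prior row and the text here does not say how the two differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Prior = Dict[str, float]
Cond = Dict[str, Dict[str, float]]


def log_score(symptoms: Sequence[str], disease: str, prior: Prior, cond: Cond,
              marginal: Dict[str, float], use_prior: bool = True,
              main_weight: float = 1.0, n_main: int = 3,
              floor: float = 1e-6) -> float:
    """log of prod_k [P(s_k|d) / P(s_k)], plus log P(d) when use_prior is True.

    The first n_main symptoms count as main symptoms; their log-ratio terms are
    multiplied by main_weight (an assumed way of applying the weight coefficient).
    floor is a hypothetical stand-in for unseen symptoms.
    """
    total = math.log(prior[disease]) if use_prior else 0.0
    for i, s in enumerate(symptoms):
        ratio = cond[disease].get(s, floor) / marginal.get(s, floor)
        weight = main_weight if i < n_main else 1.0
        total += weight * math.log(ratio)
    return total


def predict(symptoms: Sequence[str], prior: Prior, cond: Cond,
            marginal: Dict[str, float], **options) -> str:
    """Pick the disease with the largest log score under the chosen options."""
    return max(prior, key=lambda d: log_score(symptoms, d, prior, cond, marginal, **options))


def accuracy(test: List[Tuple[List[str], str]], prior: Prior, cond: Cond,
             marginal: Dict[str, float], **options) -> float:
    """Fraction of test records whose predicted disease matches the recorded one."""
    hits = sum(predict(s, prior, cond, marginal, **options) == d for s, d in test)
    return hits / len(test)


# Toy probabilities for illustration only.
prior = {"Disease X1": 0.5, "Disease X2": 0.5}
cond = {
    "Disease X1": {"Symptom A": 0.9, "Symptom B": 0.2},
    "Disease X2": {"Symptom A": 0.3, "Symptom B": 0.8},
}
marginal = {"Symptom A": 0.6, "Symptom B": 0.5}
query = ["Symptom A", "Symptom B"]

print(predict(query, prior, cond, marginal))                             # unweighted, with prior
print(predict(query, prior, cond, marginal, main_weight=3.0, n_main=1))  # first symptom weighted
print(predict(query, prior, cond, marginal, use_prior=False))            # no prior
```

With these toy numbers the unweighted score favours Disease X2, while tripling the weight of the first symptom (Symptom A) switches the prediction to Disease X1, which illustrates how main-symptom weighting can change a diagnosis. An `accuracy` call over a held-out test set is how the per-variant percentages in Table 3 would be computed under this sketch.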



6 Conclusion




Data Availability




Acknowledgments


References









































