a data-driven model for predicting the course of covid-19 ... · 3/28/2020 · 45 epidemic using...
TRANSCRIPT
![Page 1: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/1.jpg)
1
Adata-drivenmodelforpredictingthecourseofCOVID-19epidemicwith1applicationsforChina,Korea,Italy,Germany,Spain,UKandUSA23(Revised:April5,2020)45Authors:6NordenE.Huang#,FangliQiao#andKa-KitTung*78Affiliations:9#DataAnalysisLaboratory,FIO,Qingdao266061,China10*DepartmentofAppliedMathematics,UniversityofWashington,Seattle,WA9819511#Co-firstauthors:Theseauthorscontributeequallytothiswork.12FQ:[email protected]:[email protected]*Correspondingauthor:[email protected]
15KEYWORDS16Covid-19;Epidemiology;Covid-19predictionsforItaly,Germany,Spain,UKand17USA;Data-drivenapproach;predictionofturningpoints;peakactiveinfected18number;Covid-19recoveryperiod;endofCovid-19epidemic;local-in-timemetric19forepidemicmanagement.2021 22
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
![Page 2: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/2.jpg)
2
ABSTRACT2324Foranemergentdisease,suchasCovid-19,withnopastepidemiologicaldata25toguidemodels,modelersstruggletomakepredictionsofthecourseofthe26epidemic(Cyranoski,NatureNews18February2020).Thewildlyvarying27predictionsmakeitdifficulttobasepolicydecisionson.Ontheotherhand28muchempiricalinformationisalreadycontainedindataofevolving29epidemiologicalprofiles.Weofferanadditionaltool,basedongeneral30theoreticalprinciplesandvalidatedwithdata,fortrackingtheturningpoints,31peakandaccumulatedcasenumbersofinfectedandrecoveredforan32epidemic,andtopredictitscourse.Abilitytopredicttheturningpointsand33theepidemic’sendisofcrucialimportanceforfightingtheepidemicand34planningforareturntonormalcy.Theaccuracyofthepredictionofthepeaks35oftheepidemicisvalidatedusingdataindifferentregionsinChinashowing36theeffectsofdifferentlevelsofquarantine.Thevalidatedtoolcanbeapplied37toothercountrieswhereCovid-19hasspread,andgenerallytofuture38epidemics.USisfoundtohavethelargestnetinfectionrate,andispredicted39tohavethelargesttotalinfectedcases(708K)andwilltaketwoweekslonger40thanWuhantoreachitsturningpoint,andoneweeklongerthanItalyand41Germany.4243SIGNIFICANCE:Weofferapracticaltoolfortrackingandpredictingthecourseofan44epidemicusingthedailydataontheinfectionandrecovery.Thisdata-driventool45canpredicttheturningpointstwoweeksinadvance,withanaccuracyof2-3days,46validatedusingdatafromvariousregionsinChinaselectedtoshowtheeffectsof47quarantine.Italsogivesinformationonhowrapidtheriseandfallofthecase48numbersare,andwhatthepeakandtotalnumberofinfectedare.Although49empirical,thisapproachhasasoundtheoreticalfoundation;themaincomponents50oftheresultsarevalidatedaftertheepidemicisnearanend,asisthecaseforChina,51andthereforeisgenerallyapplicabletofutureepidemics. 52
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 3: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/3.jpg)
3
531.Introduction.54ThecurrentCOVID-19epidemiciscausedbyanovelcoronavirus,designated55officiallyasSARS-CoV-2,spreadingfromWuhan,thecapitalcityofHubeiprovincein56China(2-4).ThenewvirusseemstohavecharacteristicsdifferentfromSARS57(severeacuterespiratorysyndrome)(5,6):itislessdeadlybutspreadsmorewidely58(7-10).Modelingtheepidemicasitdevelopshasbeendifficult(1).Dependingon59themodelassumptions,predictionsofwhenit“turnsacorner”forChinavaries60greatly(11-21),uptoafter650millionpeoplehavebeeninfectedbeforepeaking;61manyhavenowbeenshowntobeinaccurate(22).Nowastheepidemichas62subsidedinChinaandbecomeaglobalpandemic(23,24),areliableforecastofthe63courseoftheoutbreakineachregioniscriticalforthemanagementand64containmentoftheepidemic,andforbalancingtheimpactfromthepublichealth65crisisvstheeconomiccrisis.Chinahasinstitutedsomeofthestrictestquarantine66measuresaroundWuhanandHubei,whichmayormaynotbeadoptableinother67countries(25-27).Itwouldbeusefultoextractthedependenceoftheepidemic’s68evolutiononthedegreeofquarantinetoguidepolicydecisions,whilealsoto69characterizepropertiesofCovid-19thatareapplicabletoothercountries.7071MainstreamepidemiologicalmodelshavetheiroriginintheSIR(Susceptible,72Infected,andRecoveredorRemoved)model(28)anditsmanyvariations.We73explainintheSupplementaryInformationwhyexistingmodelpredictionsvaryso74widelybycommentingontheassumptionsunderlyingthesemodels.7576TheseSIR-typemodels,however,servesacriticalpurposeforlong-rangepolicy77planning,suchaswarningpolicy-decisionmakersofthegravityofthepotential78impactandpromptingthemtotakeproperactionsbeforeitistoolate.Afterthe79breakout,moreinformationisneededformoredetailedplanning,suchasthe80arrivalofthecriticalturningpoints,thenumberofhospitalbedwemightneedat81thepeak,andtheestimateforwhentoliftthequarantine,andwhentoreturnto82normalcy.Weofferhereanadditionaltoolthathastheadvantagethatithasdoes83notdependontheelusiveinfectionrateorthesusceptiblepopulation,information84neededformostmodels,buthasthedisadvantagethatitcannotbeusedwhenthe85epidemicfirststartedandthedataareinaccurateorincomplete.Itisbasedondaily86casenumbers(i.e.newlyconfirmedcases),N(t),andrecoveredcases,R(t).8788Withoutuniversaltesting,theconfirmedcasenumbermightbeonlyasubsetofthe89truetotalinfectednumber,whichmayneverbeknownunlessfrequentuniversal90testingisinstituted.Theasymptomaticinfectedwhoarenottestedandthen91recoverontheirowndonotgetcountedbuttheyalsodonottaxhospitalresources.92Nevertheless,theycaninfectothersandsomeofthelattermaydevelopmore93serioussymptomsthatrequirehospitalization.Thenthesesecondaryinfectionsare94includedinourcasedata.Ouraimistoprovideatoolthatcanbeusedforthe95managementofmedicalresources.Sincewedonotuseamodeltocalculatehowthe96asymptomaticinfectivesinfectothers,wedonotneedtoknoweithertheinfection97rateortheasymptomaticinfectivenumbers.Sincethosewhoareadmittedtothe98
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 4: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/4.jpg)
4
hospitaleitherrecoverafterahospitalstayofTdays,ordeadafterasimilarnumber99ofdays,thereshouldbeadelayedrelationshipbetweenN(t)andR(t),whichwewill100exploreintheTheorysection.NowthattheepidemicinChinaappearstohavecome101toanend,thedatafromvariousregionsinChinacanbeusedtovalidatethemodel.102Aftervalidationwethenapplyittootherregionsintheworld.103104Ourestimateoftheenddateoftheepidemicisnotbasedonthenumberof105susceptibles,S,approachingzeroasinmostmodels(i.e.mostofthepopulationis106infected,henceacquiringimmunity),butN(t)approachingzeroandremainingso107fortwoincubationperiods.Thefirstincubationperiodistoallowtheasymptomatic108infectedtoshowsymptomsandthesecondperiodtoallowthosethatareinfected109bytheasymptomaticinfectedtoshowsymptoms.Forpredictionpurpose,thedate110whentheN(t)iszeroisestimatedby3standarddeviationsfromitspeak.These111twoquantitiescanbeextractedfromthedataastheepidemicisdeveloping.Our112estimateoftheendoftheepidemicisearlierthanmostmodelpredictions,usually113significantlyso,becauseitdoesnotdependontheherdimmunityconcept.114115Asistrueforalldata-drivenapproaches,ourresultinevitablydependsonthe116qualityofthedataused,andsomeoftheearlydataoftheepidemicarenotasgood117asthelaterdata,whenbetterdiagnosticmethodsandmorecompletereportingare118established.However,manyofthemetricscommonlyinuserequireaccumulation119ofdatafromthebeginningoftheepidemic,andconsequentlyareaffectedbypoor120dataorchangeofdiagnosticmethodsalongtheway.Wetrytoavoidaccumulation121anduselocal-in-timemetrics.Nevertheless,dataproblemscannotbeavoided.122Sensitivityofourconclusionondataproblemsisextensivelydiscussedinthiswork.123FigureS1displaysexamplesofdatausedinthisstudy.Oneproblemimmediately124becomesobviousfortheChinesedata:On12February,whenHubeichangedits125definitionofconfirmedinfectionfromthegoldstandardofnucleicacidgene-126sequencingteststoclinicalobservationsandradiologicalchestscans,over14,000127newlyinfectedcaseswereaddedthatday,creatingapeakthathasnotbeen128exceededsince.OverwhelmeddoctorsinWuhanpleadedforthechangesothat129theydidnothavetowaitforthereturnedteststoconfirmtheinfection.Outside130Hubei,therewasnochangeindefinitionforthe“infected”.Howthisartifactaffects131ourconclusionwillbediscussed.1321332.Modelanditsvalidationusingdata134Definition:LetI(t) bethenumberofactiveinfectedattimet . Itschangeisgiven135by;136
ddtI =N(t)−R(t), 137
whereN(t) isthenumberofnewlyinfected,andR(t) ,designatedasremoved,is138thesumofthedailyrecoveredanddead.ForadiseasesuchasCovid-19withlow139fatalityrate,R(t)consistsofmainlyrecovered.Howeverevenforthisdisease,the140fatalityrateinsomeregions,suchasinNorthernItaly,approaches10%.Forthese141regionsR(t)shouldincludethedeadaswell.Notethatforthetheorypart,N(t)142
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 5: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/5.jpg)
5
includesbothconfirmedandunconfirmedcases.Theterm:ExistingInfectedCase143(EIC)numberisusedtodenotetheconfirmedI(t)whenwedealwithdata.144145Let
tp ,theturningpointdefinedasthepeakoftheactiveinfectednumber.Atthis146pointmaximummedicalresourceisneeded.Thismaximumoccurswhen147
ddtI =0,implyingN(tp)= R(tp). 148
Thisisalocal-in-timemetric.ThereisthereforenoneedtofirstfindI(t)tolocate149thispeak.Aftertheturningpoint,thenewlyrecoveredstartstoexceedthenewly150infected.Thedemandformedicalresources,suchashospitalbeds,isolationwards151andrespirators,startstodecrease.152153ThetheoreticalfoundationforourmodelisgiveninSupplementaryInformation.154Herewediscussthemainresultsandoffervalidationoftheseresultsusingdata155fromChina.156157MainResult:Thedailynewlyrecovered/removednumberR(t),isrelatedtothe158dailynewlyinfectednumberN(t)as:159
160R(t)=N(t −T), 161
fort>T,whereTisthemeanrecoveryperiod.162Thisresultcanberigorouslyderivedusinganage-structuredpopulationmodel(see163reference(36)).Itisalsocommonsense:theinfectedeventuallyrecoveredaftera164numberofdays,ordeadafterasimilarnumberofdays.Ofcourse,thenumberof165daysapatientstaysinthehospitalbeforedischargedependsontheefficacyof166treatmentandsovariessomewhat,andthetimeittakesforapatienttodiemayalso167dependontheageandunderlyingconditions.Tisthereforeastatisticalquantity.168Validation:Thisfundamentalrelationshipcanbevalidatedstatisticallywithdata.169Figures1,obtainedusingdatafromChinaduringtheCovid-19epidemic,showsthat170N(t)andR(t)arehighlycorrelated:withcorrelationcoefficientof0.95whenboth171distributionsaresmoothedwith5-pointboxcar.Theunsmootheddailydataalso172yieldahighcorrelationcoefficientof0.80,withR(t)laggingN(t)byT~15days.Both173correlationcoefficientsarestatisticallysignificant.Asimilarresultisfoundfor174Hubei(FigureS2)andotherregions(notshown).Thisisoneofthewaysthemean175recoveryperiodisdeterminedstatisticallyfromdata,butitisnotpracticalinthe176earlyphaseoftheepidemic.Wewillgivedifferentmethodsforthelatterpurpose.177TheresultonTisconsistentwiththatestimatedorpredictedlaterusingtheslopeof178thedistributioninFigure4.Thelatter,obtainedbytheinterceptofthestraightline,179islessaccuratebecauseoftheslopeisrathershallow.180181
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 6: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/6.jpg)
6
182Figure1.LaggedcorrelationofcasenumbersR(t)andN(t)forChinaasawhole.183184MainResult:ThenaturallogarithmoftheratioofNandRisalinearfunctionof185timefort >T .Thisrelationshipisimportantforthepurposeofforecastbecauseitis186easytoextrapolatefromastraightlineintothefuture.187188ValidationoflogNRasastraightline:189Fromdataweusethereportnewlyconfirmedcasenumberandtherecoveredcase190numbertodefineNRratioas191
NR(t)=N(t)/R(t).192Attp,NR=1.193194WeshowinFigure2,usingthedataoftheepidemicforCOVID-19,thatthe195logarithmofNR(t)liesonastraightline,withsmallscatter,passingthroughthe196turningpointtp.Anddataforvariousstagesoftheepidemic,fromtheinitial197exponentialgrowthstage,tonearthepeakofEIC,andthenpastthepeak,alllieon198thesamestraightline.TheinterceptwithlogNR=0yieldstheturningpoint.This199line,obtainedbylinear-least-squarefitinthesemi-logplot,islittleaffectedbythe200ratherlargeartificialspikeinthedataon12Februarybecauseofitsshortduration201andthelogarithmicvalue.Thatreportingproblemisnecessarilyofshortduration202because,onthedateofdefinitionchange,previousweek’scasesofinfected203accordingtothenewcriteriawerereportedinoneday.Afterthat,thebookis204cleared,andN(t)returnedtoitsnormalrange.205206
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 7: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/7.jpg)
7
207208Figure2.Logarithmoftheratioofdailynewlyinfectedtonewlyrecovered.Theylie209onstraightlineswithsomesmallscatter.Thedottedstraightlinesareobtainedby210linear-leastsquaresfitis.Theslopesofthelinesarealmostthesamebutwith211differentintercept;thetrendlinescrosszero(theblacksolidline)atdifferenttime212fordifferentregionsindicatingdifferentpeakingtimeforEIC.TheepicenterWuhan213(green)haslatestturningpointthanitsprovinceHubei(pink),whichhasalater214turningpointthanChinaasawhole(cyan).215216ThetheoreticalresultinSIsuggeststhattheslopeofthelinearlineis-T/ σR
2,where217σR is the standard deviation of the R(t) profile. Ingeneral,theslopecanbedifferentfor218differentregionswithdifferentlevelsofquarantineandepidemiccharacteristics.219ThehospitaltreatmentefficacywouldinfluenceTdirectly.Theeffectofquarantine220wouldinfluencethevalueofσN,thestandarddeviationofthenewlyinfected,andso221indirectlyR(t)andσR.OurempiricalresultfromFig.2howevershowsthattheslope222isthealmostthesamefordifferentregionsinChina,implyingthatefficacyof223treatmentandlevelofquarantineaffectTandσ2proportionally.224225Result:ThederivativeoflogN(t)andoflogR(t)iseachalinearfunctionoftime,226withknownslope.Theirinterceptwiththezeroderivativelineyieldsthetimefor227theirrespectivepeak.228
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 8: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/8.jpg)
8
229Validation230Empirically,thederivativeoflogN(t)orlogR(t)liesonastraightline,asshownin231Fig.3(althoughthescatterislargerastobeexpectedforanydifferentiationof232empiricaldata).Thepositiveandnegativeoutliersonedaybeforeandafter12Feb233arecausedbythespikeupandthendown,withlittleeffectonthefittedlineartrend234(butincreasesitsvarianceandthereforeuncertainty).Moreover,thestraightline235extendswithoutappreciablechangeinslopebeyondthepeakofN(t),suggesting236thatthedistributionofthenewlyinfectednumberisapproximatelyGaussian.The237meanrecoverytimeTcanbepredictedastR −tN ,wheretRisthepeakofR(t)andtN238isthepeakofN(t).Thesetwopeaktimescanbeobtainedbyextendingthestraight239lineinFig.3tointersectthezeroline.Thispredictedresultcanbeverified240statisticallyafterthefactbythelaggedcorrelationofR(t)andN(t).Ifthe241distributionisindeedGaussianorevenapproximatelyso,theslopeinFig.3would242beproportionaltothereciprocalofthesquareofitsstandarddeviation,σ,as(See243SI):244
d logN(t)
dt=−(t −tN )
σ N2 . 245
246Similarlyresultholdsforthedailynumberofrecovered,R(t).247248AftertheepidemicisnearingtheendasisthecaseinChina,fittingthedatatoa249Gaussiancanbedoneafterthefact(seeFiguresS3andS4).Thefitissatisfactory250evenwithoutusinganydisposalparameters.Theparametersusedaredetermined251usingslopesoflogNandlogR(seeTableS1)252253TheinferredstatisticalcharacteristicsoftheCovid-19epidemicaresummarizedin254TableS1forvariousregions.ThemeanrecoverytimeT,isabout13daysforChina255asawhole.ForWuhan,thecityattheepicenterwhosehospitalsweremore256overwhelmedandthepatientsadmittedintohospitalsmoreseriouslyillthanthose257inotherprovinces,T~16days,whilethatforHubeiis14days.Thestandard258deviation,σ,isfoundtobearound8days,withslightdifferencebetweenthatfor259N(t)andforR(t),withoneexceptionforHubeioutsideWuhan.Suchafine260subdivisionmaynotbepracticalforthedataqualitywehave.Theσ tendstobe261smallerforChinaasawholethanWuhan.OnecanseethatTandσ2 indeed varying 262approximately in proportion. 263 264The peak infected cases: 265
Writing N(tN )=N(tB )exp{ n(t)dt}
tB
tN∫ , and noting n(t)= d
dtlogN(t)= − (t −tN )
σ N2 , the 266
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 9: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/9.jpg)
9
exponent is
n(t)dt = (tN −tB )2
2σ N2 = 12tB
tN∫ n(tB )(tN −tB ). Hence the peak infected case number 267
can be predicted, using the predicted value for tN starting from a conveniently chosen 268
time tB , such as the latest time with data available, as: 269
N(tN )=N(tB )exp{
n(tB )(tN −tB )2 } . 270
271Accumulatedquantities.272TocalculateI(t)usingreporteddata,onlyconfirmedcasesareused(WecallitEIC).273Itisgivenbytheaccumulatednewlyconfirmedcasesminustheaccumulated274confirmedrecovered.Sincetheaccumulationofearlypoordatacanintroduce275errorsamorelocal-in-timeformulaisgivenas:276277
I(t) = N (t)dt−−∞
t
∫ R(t)−∞
t
∫ dt = N (t)dt− N (t−T )−∞
t
∫−∞
t
∫ dt
= N (t)dt.t−T
t
∫278
Thatis,tofindIattimet,oneonlyneedstoaddupthedailynewlyinfectedcase279numbersforaperiodofTprecedingt.Thisisanalmostlocal-in-timepropertyeven280forthisaccumulatedquantity.Forvalidation,weestimatethepeakoftheIcase281numberon18Februarybycomputingthesumofdailynewlyinfectedcasenumbers282for15days,fromFebruary4toFebruary18,whichyieldsapeakvalueforthetotal283infectedcaseson18Februaryof54,747.Thisiswithin10%ofthereportednumber284of57,805,evenaftertakingintoaccountthedeaths(bysubtractingthe285accumulateddeathsof2,004fromourestimate).286287
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 10: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/10.jpg)
10
288Figure3Thederivativeofthelogarithmofdailynewlyinfectedorrecovered.289Noticetheclearseparationofthenewandrecoveredcasesandalsothesubtle290differenceoftheirslopes.Thezerocrossingsofthetrendlinegivethepeakdatesof291thenewandrecoveredcaserespectively.Andtheslopesgiveanestimateofσ 292values. InthisFigure,thefollowingabbreviationsareused:C=China;H=Hubei;293N=NewCase;R=Recovered.2942953.Predictability296PredictionoftheturningpointusingNRratio.297Wefirstdiscusshowthetrueturningpointcanbedeterminedfromdataafterithas298occurred.Thenwegiveamethodforpredictingthistruevalueinadvanceand299assesstheaccuracyoftheforecastsasfunctionofdaysinadvancewhenthe300predictionismade.Anote:afterthismanuscriptwassubmittedforreviewthe301predictionsthatwemadepreviouslyhavecometopass.Althoughconsequentlythe302valueofourpredictionshasgreatlydiminished,itgivesusachancetocompareour303predictionsagainstthetruths.Thismodelvalidationprocessisimportantifweare304toapplythesamemethodforpredictiontootherregions.305306Theturningpointandtheendoftheepidemicarethetwomostwatchedmarkerson307itsdevelopment(28,29),alongwiththenumberofinfectedateachstageofthe308epidemic.Therearevariousdefinitionsoftheturningpoint.Acommononedefines309theturningpointoftheepidemicasthereporteddailynumberofnewlyinfected310reachingapeakandthendeclining.Thisistheonetoutedinthevariousnews311announcements,andalsousedbysomeresearchgroups(22).Thefactthatthe312numberofnewlyinfectedreachingapeakandthendecliningdoesnotnecessarily313
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 11: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/11.jpg)
11
implythattheepidemichas“turnedacorner”,becausethetotalnumberofactive314infectedcanstillberisingwiththeassociatedurgentneedforadditionalmedical315resources,suchashospitalbeds,isolationwardsandventilators.Furthermore,316locatingthispeakishighlysusceptibletodataglitchesandchangeindiagnostic317definition.Amoremeaningfulturningpointshouldbebasedonthenumberof318confirmedinfectedindividuals,designatedasEIC)(15),reachingapeakandthen319startingtodecline.EICisintheoryobtainablefromdataofthedailynumberofnew320confirmedcases,N(t),andthedailynumberofnewlyrecovered,R(t),bysubtracting321theaccumulatedsumofR(t)fromtheaccumulatedsumofN(t).Analysisofthis322accumulatedquantityissensitivelyaffectedbyaccumulationofpoorerearlydataof323reportedcases,includingunder-reportingandunder-detectionofthenumberof324infectedcausedbyinsufficienttestkits,inadditiontothehistoryofchanging325diagnosticcriteria.Moreoverinpracticeitspeakisoftennotdetecteduntilseveral326weeksafterithasoccurred.327328SincethemaximumofEIC,canbelocatedbythezeroofitsderivative,wepropose329usingalocal-in-timemetricofN(tp)=R(tp)atthepeakofEIC,tp.330331ReferringtoFigureS1,forChinaaswhole,tpisfoundtobeFebruary18;forHubei,332theprovinceoftheepicenterWuhan,tpisfoundtobe19February,andforChina333outsideHubei(ChinaexHubei),12February,coincidentallyonthesamedayasthe334Hubeidataspike.HoweverthereisnosuchbumpinthedataoutsideHubei,andso335isnotlikelytheresultofthedataartifact.Theseresults,evenincludingthatfor336Hubei,arenotaffectedbythehistoricaldataproblemsbecauseofourlocal-in-time337methodfordeterminingtheturningpoint.338339Cansuchaturningpointbepredictedbeforeithappened,andifsobyhowmany340daysinadvance?SincethelogarithmofNRliesonastraightlinepassingthrough341theturningpointofEIC,itwouldbeinterestingtoexploreiftheturningpointcanbe342predictedbyextrapolationusingdataweeksbeforeithappenedbyextrapolation343alongthestraightline(seeFigureS5).Howfarinadvancethiscanbedoneappears344tobelimitedbythepoorqualityoftheinitialdata.Fig.4showstheresultsofsuch345predictions.Thehorizontalaxisindicatesthelastdateofthedatausedinthe346prediction.Thebeginningdateofthedatausedis24Januaryforallexperiments.347Priortothatday,dataqualitywaspoorandthenewlyrecoverednumberwaszero348insomedays,givinganinfiniteNRratio.349350ForChinaoutsideHubei,thepredictionmadeon6Februarygivestheturningpoint351as14February,twodayslaterthanthetruth.Apredictionmadeon8February352alreadyconvergedtothetruthof12February,andstaysnearthetruth,differingby353nomorethanfractionsofadaywithmoredata.354355Thehugedataglitchon12FebruaryinHubeiaffectedthepredictionforHubei,for356Chinaaswhole,andforHubei-exWuhan.Thesethreecurvesallshowabumpup357starting12February,astheslopeofN(t)isartificiallylifted.Ironically,predictions358madeearlierthan12Februaryareactuallybetter.Forexample,forChinaasawhole,359
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 12: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/12.jpg)
12
predictionsmadeon9Februaryand10Februarybothgive19Februaryasthe360turningpoint,onlyonedayoffthetruthof18February.Apredictionmadeon11361Februaryactuallygivesthecorrectturningpointthatwouldoccuroneweeklater.362Atthetimethesepredictionsaremade,thenewlyinfectedcaseswererisingrapidly,363byover2,000eachday,andlaterbyover14,000.Itwouldhavebeenincredulousif364oneweretoannounceatthattimethattheepidemicwouldturnthecorneraweek365later.366367EvenwiththehugespikefortheregionsaffectedbytheHubei’schangingof368diagnosiscriteria,becauseofitsshortdurationtheartifactaffectsthepredicted369valuebynomorethan3days,andthepredictionaccuracysoonrecoversforChina370asawhole.ForHubei,thepredictionneverconvergestothetruevalue,butthe371over-predictionisonly2days.Thissmallnessoftheerrorisremarkablegiventhat372othermodelpredictionsdifferbyweeksormonths.373374TableS2liststhemeanandstandarddeviationofthepredictions.Forapplications375toothercountriesandtofutureepidemicswithoutachangeinthedefinitionofthe376“infection”tosuchalargeextent,weexpectevenbetterpredictionaccuracy.377378379
380381Figure4PredictionoftheturningpointinEICbyextrapolatingthetrendin382logarithmofNR.Thehorizontalaxisindicatesthedatethepredictionismadeusing383datapriortothatdate.Theverticalaxisgivesthedatesofthepredictedturning384point.Dashedhorizontallinesindicatedthetruedatesfortheturningpoint,as385determinedfromFig.S1.386 387
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 13: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/13.jpg)
13
Estimateof“allclear”declaration388Wecannowestimateatimeforadeclarationof“allclear”.Noverificationisyet389possibleasthepredicteddatehasnotoccurred.Attheturningpoint,theEICisstill390atitspeak.Forthediseasetohaverunitscourse,andan“allclear”declarationcan391beannounced,werequirethatthenewlyinfectedcasenumbertodroptozero.For392predictionpracticethis“zero”ismeasuredbythreestandarddeviationsfromthe393peakofN(t).Thenwewaitfortwoincubationperiods,each14days,topass,before394wedeclare“allclear”.UsingtheinferreddiseasecharacteristicsinTableS1,our395predictionis,forChinaoutsideHubei:thelastweekofMarch.ForChinaasawhole:396thefirstweekofApril,barring“imports”ofinfectedfromabroad.Atthispointthere397maystillbesomepatientsinthehospitalwhoareinfectedwiththevirus.The“all398clear”callassumesthatthesepatientsarenotroamingfreelytocausenew399infections.400401PredictionforSouthKorea402FigureS6summarizedtheavailabledataforKoreaatthepresent.Therecovered403casenumbershoveredaround1and2dailyuptoMarch1st.Itonlypickedup404towardtheend.Startingfrom19February,thereseemstobeenoughnewdaily405infectedcases.TheSouthKoreaGovernmenthasidentifiedthattheepiccenterof406theepidemicwasatchurchgatheringsinthecityofDaeguandNorthGyeongsang407province,where90%ofthecasesarefound.Specifically,aconfirmedCOVID-19408patientwasreportedtohaveattendtheShincheonjiChurchofJesusservicestwice409onFebruary9thand16th.Giventheincubationperiodof7to14days,theinitial410explosionatFebruary19thandthefirstpeakvaluearoundFebruary24tharenot411accidents.412413Ifweusetheavailabledailynewcasesdata,wecangetthestatisticalcharacteristics414ofthedistributionofthedailynewcasesfromFigureS7,whichgivesthetNasMarch4153rdandaσ N valueof4.5days.Ifwefurtherusetheturningpointasapproximately416tN+T/2,thentheturningpointshouldfallonMarch10,assumingTas14daysbased417ontheoverallmeanfromdifferentregionsinChina.418419FortheNRratio,itislimitedbytheavailabilityofrecoveredcasenumber.Ifweuse420thelimitedrecoveredcasesstartingfromMarch1st,wehave7daysofdata.The421computedtheNRratiotogetherwiththetrendisgiveninFigureS8.Theturning422point,atthezero-crossingoftheextendedtrendline,wouldoccurbetweenMarch42311thand12th.ThisapproachdoesnotneedtouseavalueforT.424425AnestimateoftheendoftheepidemiccanbegivenasthesecondweekofApril,426usingtheestimatedvaluefortN=3March,σ=4.5days.Remarkably,thisdateis427aroundthesametimeasforWuhan,China.SouthKoreaowesitsquickturningpoint428andendoftheepidemicdatetoitsabilitytoidentitythefirstinfectionandthe429secondaryinfectionsatShincheonjiChurch(31),wheremostoftheinfectedwere430concentrated.Thisisreflectedinthedata:σforSouthKoreaisonlyhalfthatof431China,withamorerapidriseandfallofthenewlyinfected.Itsdataforthenewly432
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 14: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/14.jpg)
14
infectedareprobablymoreaccuratecomparedtoothercountriesinsimilarstageof433theepidemic,duetoitsmassiveandspeedy(within6hours)testingofthe434populationinits“trace,testandtreat”policy.4354364.Effectsofquarantinejudgingfromdataonthenetinfectionrate.437Following(15),wedefineatime-dependentnetinfectionrateas:438
α(t)= dI dt
I= ddtlog I(t). 439
Intraditionalmodels,suchastheSIRmodel,thereisalsoatime-dependent440infectionrate,whichatt=0isrelatedtotheBasicReproductiveNumberR0.Ifthis441numberisgreaterthan1thenanepidemicwillensue,i.e.theinfectedpopulation442willincreaseexponentiallyaftertheintroductiontoasusceptiblepopulationSatt=0443someinitialinfected.Thatis,fromtheSIRmodelequation:444
dI dt = aSI −bI = bI(aS /b−1), 445
whereaS(t)istheinfectionrateandb=1/T isthemeanrecoveryrate. 446
Thereforeα(t)= dI /dt
I= b(aS
b−1).α(0)= b(R0 −1),withR0 =
aS(0)b
. 447
Ourtime-dependentnetinfectionrategeneralizesthisconcepttobeindependentof448theSIRorothermodelsandbeapplicableatlatertimesaswell:Ifinthecourseofan449epidemic,α(t) ispositive,thenumberofinfectedwillgrowexponentially,reaching450apeaknumberofinfectedwhenα(t)=0att = tp .Thenthetotalnumberofactive451
infectedwilldecreaseexponentially.OnecouldinanalogytoR0 ,defineatime-452
dependentReproductiveNumberRt =α(t)T −1 ,sothatifthisnumberisgreater453(less)than1thenumberofinfectedwillgrow(decrease)attimet .Wewillhere454useα(t) directly.455456Sinceα(t)hasazeroanditsfirstderivativenearthezeroisnonzero(viznegative)457because
tp isthemaximumofI(t),itisalinearfunctionoftwithnegativeslopein458thatneighborhood.Thisexpectationisverifiedempirically,usingdataforthetotal459existingcasenumbers.Wefindthatthistimedependentinfectionrateis460approximatelyalinearfunctionoftimeintheneighborhoodofitszerooveraperiod461ofafewweeks.462463Thisgivesanotherwaytopredicttheturningpoint
tpwhichcanbeusedinsteadof464(butislessaccuratethan)theNRratio,duringtheearlystageoftheepidemicwhen465notenoughR(t)dataisavailable,asisthecasecurrentlyforUS.466467ThepeakEICnumbercanbepredictedas468
I(tp)= I(tB )exp{12α(tB )(tB −tp)}, wheretB isthelastavailabledatabeforethe469
turningpoint.Itisassumedthatα(t) liesonastraightlinebetweentB andtp .470
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 15: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/15.jpg)
15
Predictingthepeakactiveinfectedcases471Sincetheturningpointcanbepredictedtwoweeksinadvance,theaboveformula472canbeusedtopredictthepeakEICnumbers.473474Predictingthetotalinfectedcases(TIC):475
TIC(t)= N(t)dt .0
t
∫ 476
Topredictthetotalinfectedcasesfortheepidemicforaregion,weneedtodothe477aboveaccumulationofN(t)intothefuture,totheenddateoftheepidemic.That478totalisapproximately:479
TIC∞ =2⋅TIC(tN ) ,480assumingthatN(t)isapproximatelysymmetricaboutitspeakattN.InrealityN(t)481maynotbesymmetricandlikelyhasalongtail.However,sincethenumberofcases482alongthetailissmall,theaboveapproximationforthetotalisstillgood.Ifthe483presenttimetBisbeforetN,weneedawaytopredictTIC(tN ). Letthetotalinfection484ratebedefinedas:485
β(t)= d
dtlogTIC(t)=
ddtTIC(t)TIC(t) = N(t)
TIC(t). 486
Byextrapolatethetotalinfectionrateforwardintimewecanpredict:487
TIC(tN )=TIC(tB )exp{ β(t)dt}.
tB
tN∫ 488
USA:tNispredictedtobe2.2daysfromtoday,April5.Today’sTICis308,850.Sothe489predictedtotalinfectedcasesfortheepidemicwhenitisoverispredictedtobe490708,750.Thisisaverylargenumber,ninetimeslargerthanthatofChina,whichhas491amuchlargerpopulation,butisneverthelessmuchlowerthansomeother492predictionsofafewmillioninfected.ItscurrentEICis285,000,andispredictedto493peak12daysfromnowat547,000.Thisisthepeakdemandforhospitalbeds.494Germany:Germanyhasgooddata.ItstNwas2daysago.OnthatdayitsTICwas49585,000.Sothetotalinfectedcasesfortheepidemicwhenitisoverispredictedtobe4962timesthat:170,000.ItscurrentEICis68,248.ThepeakEICis69,627,whichwill497occurtwodaysfromnow.498Spain:tNoccurredonMarch31.OnthatdayitsTICwas94,417.ThetotalTICwhen499theepidemicisoverispredictedtobetwicethat,188,800.ItspeakEICispredicted500tobe84,250,tooccur4daysfromnow.501UK:ItscurrentTICis41,000.ItstNis7.5daysfromnow.CalculatingitsTICatthat502timeandthendoublingityieldsatotalTICof134,000whentheepidemicisover.503ThepeakEICispredictedtobe83,200,16daysfromnow.Thisisthepeakdemand504forhospitalbeds.(ItshouldbenotedthattherecoveredcasenumberforUKis505unusuallylow,currentlyat209,whilethedeadismuchhigher,at4,900.Thedata506maybedoubtful.)507508Commentsontheeffectsofquarantine509
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 16: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/16.jpg)
16
Additionallythenetinfectionraterevealstheeffectofmeasurestakenwithsocial510distancingandquarantine.Figure5showsthetime-dependentnetinfectionratefor511eachregionstartingwhenthenewlyconfirmedcasesexceed100.Thiswayof512plottingfacilitatescomparisonofdifferentregionsatthesamestageoftheepidemic.513First,ChinaoutsideHubeihasthelowesttime-dependentinfectionrateafterHubei514waslockdown.GermanyandItalyhavesimilarexponentialgrowthrateofthenet515infectedcasenumbers,bothhigherthanevenWuhan,theepicenterinChina.More516surprisingly,UShasthehighestexponentialnetinfectionrate,higherthanGermany517andItalyandChina.ThiscanbeattributedtothefactthatUSsofardoesnothavea518nation-wideshutdown,unliketheseothercountries.Secondly,ChinaoutsideHubei519reacheditsturningpointearly,infact20daysearlierthantheepicenter,Wuhan.520Wehadpreviouslypredictedthis,whichisqualitativelydifferentthanmanymodel521predictions,whichhadtheepicenterachievingitsturningpoint1-2weeksearlier522thanChinaoutsideHubei(13).ItalyandGermanyarepredictedtotakeaweek523longertoreachtheirturningpoint,whileUSwilltakeanotherweeklongerthanit524willtakeGermanyandItaly.525526ThatItalywouldtakethesameamountoftimetoreachitsturningpointasGermany527andhasapproximatelythesamenetinfectionratemaybeduetotworeasons:Italy528doesnottestaswidelyasGermanyandsothecasenumbersrepresentasmaller529portionoftheinfected.SecondlyGermanyhasthelowestfatalityratewhileItalyone530ofthehighest.Thenumberofthedeadisaboutthesameasthenumberofcuredfor531Italy.Thedeadisincludedinrecovered/removed.Ifitwerenotincluded,EIC532wouldhavebeenhigherforItaly.Nevertheless,whetherdeadorcured,thehospital533bedisvacated.534535536
537 538539
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 17: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/17.jpg)
17
Figure5.Thetime-dependentnetinfectionrate(inunitsof1/day)asafunctionof540timestartingonthedate(listedintheinset)whenthenewlyconfirmedcase541numberexceeds100foreachregion.Toobtaintheactualcalendardate,addthe542datesonthehorizontalaxistothestartingdateindicatedintheinset.Thenumberof543confirmedcasesonthestartingdateislistedatthetop.5445455.Conclusion.546Weofferanadditionaldata-drivenapproachtotrackandpredictthecourseofthe547epidemic.Manyparameterscharacterizinganepidemiccanbedeterminedfrom548local-in-timedata.Validatedbyrealdata,wesuggestthatourapproachcouldbe549appliednotjusttothecurrentCovid-19epidemic,butalsogenerallytofuture550epidemics.Itcouldalsobeusedasapracticaltoolforepidemicmanagement551decisionssuchasquarantineinstitutionandmedicalresourceplanningand552allocations(32-35).553554Tworesultsareofspecialsignificanceforfuturepolicymakers.Firsttheturning555pointfortheepidemicinChinaexHubeioccurredaweekearlierthanthatfor556Wuhan.SecondtheUSwilltake2weekslongertoreachitsturningpointthaneven557Wuhan.Afteritslockdown,Wuhan,withalargesusceptiblepopulationof11million,558enforcedstraightsocialdistancing,whichismorestrictthanthatadoptedintheUS..559Asaconsequence,evenwithalargepoolofpotentialsusceptible,theoutbreakcould560endsooner,ascomparedtothetimeitwilltakeUSandEurope.ForChina,the561lockdownofWuhanandHubeiwasthereasonwhytheepidemicoutsideHubeiwas562undercontrol,andtheturningpointoccurredearlier.InWuhan,withhospitals563facingthenumberofinfectedpatientsfarexceedingavailablehospitalbedsinthe564initialperiod,someinfectedpatientswerenotadequatelyisolated.Theinfected565weresenthomeandcausedsecondaryinfectionsamongfamilymembers.This566mighthaveplayedaroleindelayingtheturningpoint.Ontheotherhand,outside567Hubei,hospitalswerenotasoverwhelmedbecauseofthestrictquarantineplaced568onHubei,whichdrasticallyreducedtheimportofthediseaseoriginatingfrom569Hubei.Theinfectedwerebetterisolated,reducingfurtherspread,andtreatedin570hospitals,resultinginshortertimetorecovery(seeTableS1).Thisisevidenceofthe571effectivenessofthecityandprovince-widelockdownin“flatteningthecurve”572outside.573574TheadditionalandsurprisingfindingthatthenetinfectionratesinItaly,Germany575andtheUSarehigherthanevenWuhanalsomanifeststheeffectoftheenforcement576oflockdown,stay-at-homeandstrictsocialdistancingpolicyinWuhan,whichwas577muchstricterthanthoseadoptedcurrentlyinEuropeandUS.Themorelaxlatitude578andlackofenforcementintheUSwillleadtoalongerperiodoftheepidemic,longer579thanevenItaly,andthelargestnumberoftotalinfectedcasesof710,000beforeitis580over.581582583584 585
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 18: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/18.jpg)
18
References5865871. DavidAdam,ModelersStruggletoPredicttheFutureoftheCOVID-19Pandemic.588
The Scientist, https://www.the-scientist.com/news-opinion/modelers-struggle-589to-predict-the-future-of-the-covid-19-pandemic-67261(2020).590
2. WHO,Laboratorytestingofhumansuspectedcasesofnovelcoronavirus(nCoV)591infection:interimguidance,WorldHealthOrganization,Geneva(2020).592
3. N.Zhu,D.Zhang,W.Wang,X.Li,B.Yang,J.Song,X.Zhao,B.Huang,W.Shi,R.Lu,593P. Niu, F. Zhan, A novel coronavirus from patients with pneumonia in China,5942019.N.Engl.J.Med.382,727-733(2020).595
4. R.Lu,X.Zhao,J.Li,P.Niu,B.Yang,H.Wu,W.Wang,H.Song,B.Huang,N.Zhu,Y.596Bi,X.Ma,F.Zhan,L.Wang,T.Hu,H.Zhou,Z.Hu,W.Zhou,L.Zhao,….,W.Tan,597Genomic characterisation and epidemiology of 2019 novel coronavirus:598implications for virus origins and receptor binding. The Lancet 395(10224),599565-574(2020).600
5. Y. Liu, A. A. Gayle, A. Wilder-Smith, J. Rocklöv, The reproductive number of601COVID-19 is higher compared to SARS coronavirus. J. Travel Med. taaa021602(2020).603
6. J.W.Glasser,N.Hupert,M.M.McCauley,R.Hatchett,Modelingandpublichealth604emergency responses: Lessons from SARS. Epidemics 3: 32-37 (2011),605doi:10.1016/j.epidem.2011.01.001.606
7. P.Zhou,X.Yang,X.Wang,B.Hu,L.Zhang,W.Zhang,H.Si,Y.Zhu,B.Li,C.Huang,607H. Chen, J. Chen, …, Z. Shi, A pneumonia outbreak associated with a new608coronavirusofprobablebatorigin.Nature579,270-273(2020).609
8. C.Huang,Y.Wang,X. Li, L.Ren, J. Zhao,Y.Hu, L. Zhang,G. Fan, J. Xu,X.Gu, Z.610Cheng,T.Yu,J.Xia,Y.Wei,W.Wu,X.Xie,W.Yin,H.Li,M.Liu,Y.Xiao,H.Gao,L.611Guo, J. Xie,G.Wang,R. Jiang, Z.Gao,Q. Jin, J.Wang,B.Cao,Clinical featuresof612patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet613395(10223),497-506(2020).614
9. J.F-K.Chan,S.Yuan,K.-H.Kok,K.K.-W.To,H.Chu,J.Yang,F.Xing,J.L.BNurs,C.615C.-Y. Yip,R.W.-S. Poon,H.-W.Tsoi, S. S.-F. Lo,K.-H.Chan,V.K.-M.Poon,W.-M.616Chan,J.D.Lp,J.-P.Cai,V.C.-C.Cheng,H.Chen,C.K.-M.Hui,K-Y.Yuen,Afamilial617cluster of pneumonia associated with the 2019 novel coronavirus indicating618person-to-person transmission: a study of a family cluster. The Lancet619395(10223),514-523(2020).620
10. X.Xu,P.Chen,J.Wang,J.Feng,H.Zhou,X.Li,W.Zhong,P.Hao,Evolutionofthe621novelcoronavirusfromtheongoingWuhanoutbreakandmodelingof itsspike622proteinforriskofhumantransmission.Sci.ChinaLifeSci.63,457-460(2020).623
11. Z.Chen,W.Zhang,Y.Lu.C.Guo,Z.Guo,C.Liao,X.Zhang,Y.Zhang,X.Han,Q.Li,W.624lanLipkin, J. Lu, FromSARS-CoV toWuhan2019-nCoVOutbreak: Similarity of625Early Epidemic and Prediction of Future Trends. Biorxiv preprint (2020),626doi:https://doi.org/10.1101/2020.01.24.919241.627
12. J. M. Read, J. R. E. Bridgen, D. A. T. Cummings, A. Ho, C. P. Jewell, Novel628coronavirus 2019-nCoV: early estimation of epidemiological parameters and629epidemic predictions. medRxiv preprint (2020), doi:630https://doi.org/10.1101/2020.01.23.20018549.631
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 19: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/19.jpg)
19
13. J. T. Wu, K. Leung, G. M. Leung, Nowcasting and forecasting the potential632domestic and international spread of the 2019-nCoV outbreak originating in633Wuhan,China:amodellingstudy.TheLancet395(10225),689-697(2020).634
14. S.Zhao,S.S.Musa,Q.Lin,J.Ran,G.Yang,W.Wang,Y.Lou,L.Yang,D.Gao,D.He,635M. S. Wang, Estimating the Unreported Number of Novel Coronavirus (2019-636nCoV)CasesinChinaintheFirstHalfofJanuary2020:AData-DrivenModelling637AnalysisoftheEarlyOutbreak.J.Clin.Med.9,388(2020).638
15. N.E.Huang,F.Qiao,Adatadriventime-dependenttransmissionratefortracking639an epidemic: a case study of 2019-nCoV. Sci. Bull. 65, 425-427(2020) ,640https://doi.org/10.1016/j.scib.2020.02.005.641
16. Q. Li, W. Feng, Trend and forecasting of the COVID-19 outbreak in China. J.642InfectionarXiv:2002.05866v1,(2020).643
17. H.Xiong,H.Yan,Simulating the infectedpopulationandspread trendof2019-644nCov under different policy by EIR model. medRxiv preprint (2020), doi:645https://doi.org/10.1101/2020.02.10.20021519.646
18. L.Damon,E.Brooks-Pollock,M.Bailey,M.J.Keeling,AspatialmodelofCoVID-19647transmission in England and Wales: early spread and peak timing. medRxiv648preprint(2020),doi:https://doi.org/10.1101/2020.02.12.20022566.649
19. H.Sun,Y.Qiu,H.Yan,Y.Huang,Y.Zhu,S.Chen,TrackingandPredictingCOVID-65019 Epidemic in China Mainland. Medrxiv preprint (2020),651doi:https://doi.org/10.1101/2020.02.17.20024257.652
20. Q. Liu, Z. Liu,D. Li, Z. Gao, J. Zhu, J. Yang,Q.Wang,Assessing theTendency of6532019-nCoV (COVID-19) Outbreak in China. medRxiv preprint (2020), doi:654https://doi.org/10.1101/2020.02.09.20021444.655
21. L.Peng,W.Yang,D.Zhang,C.Zhuge,L.Hong,EpidemicanalysisofCOVID-19in656Chinabydynamicalmodeling.arXiv2002.06563,(2020).657
22. D.Cyranoski,Whenwillthecoronavirusoutbreakpeak?Naturenews(2020).65823. C. R. MacIntyre, Global spread of COVID-19 and pandemic potential.Global659
Biosecurity1(3),(2020).66024. WHO, Coronavirus latest:WHO describes outbreak as pandemic, Nature news661
(2020),https://www.nature.com/articles/d41586-020-00154-w.66225. K. Kupferschmidt, J. Cohen, Can China’s COVID-19 strategy work elsewhere?663
Science367(6482),1061-1062(2020).66426. J. M. Read,J. R.Bridgen,D. A.Cummings,A.Ho,C. P.Jewell, Novel coronavirus665
2019-nCoV: early estimation of epidemiological parameters and epidemic666predictions.medRxiv(2020),doi:10.1101/2020.01.23.20018549.667
27. S.Zhao,Q.Lin,J.Ran,S.S.Musa,G.Yang,W.Wang,Y.Lou,D.Gao,L.Yang,D.He,668M.H.Wang,Preliminaryestimationof thebasic reproductionnumberofnovel669coronavirus(2019-nCoV)inChina,from2019to2020:Adata-drivenanalysisin670the early phase of the outbreak. Int. J. Infect. Dis.,92: 214-217(2020),671https://doi.org/10.1016/j.ijid.2020.01.050.672
28. W. Kermack, A. McKendrick, A contribution to the mathematical theory of673epidemics.Proc.Roy.Soc.LondonA115,700-721(1927).674
29. D. L.Heymann,N.Shindo, COVID-19: what is next for public health?675Lancet,395(10224):542-545(2020), https://doi.org/10.1016/S0140-6766736(20)30374-3.677
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 20: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/20.jpg)
20
30. The Novel Coronavirus Pneumonia Emergency Response Epidemiology Team,678The Epidemiological characteristics of an outbreak of 2019 novel coronavirus679disease(COVID-19)-China,2020,ChinaCDCWeekly(2020).680
31. E.Shim,A.Tariq,W.Choi,Y.Lee,G.Chowell,TransmissionpotentialofCOVID-19681in South Korea. medRxiv preprint, (2020), doi:682https://doi.org/10.1101/2020.02.27.20028829.683
32. C. M.Peak,L. M.Childs,Y. H.Grad,C. O.Buckee, Comparing nonpharmaceutical684interventions forcontainingemergingepidemics.Proc.Natl.Acad.Sci.114(15):6854023-4028(2017),doi:10.1073/pnas.1616438114.686
33. R. S.Dhillon,D.Srikrishna, When is contact tracing not enough to stop an687outbreak? Lancet Infect. Dis.,18:1302-1304 (2018),688https://doi.org/10.1016/S1473-3099(18)30656-X.689
34. X.Pang,Z.Zhu,F.Xu,J. Guo, X. Gong, D. Liu, Z. Liu, D. P. Chin, D. R. Feikin,690Evaluation of control measures implemented in the severe acute respiratory691syndromeoutbreakinBeijing,2003.JAMA,290(24):3215-3221(2003).692
35. G.Wang,N.E.Huang,F.Qiao,Quantitativeevaluationoncontrolmeasuresforan693epidemic:AcasestudyofCOVID-19.Sci.Bull.65(2020),doi:10.1360/TB-2020-6940159.695
36. J.D.Murray:MathematicalBiologyI.ThirdEdition.Springer,551pp.696697
698699700 701
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 21: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/21.jpg)
21
Acknowledgements:NEHandFQaresupportedbytheNationalNaturalScience702FoundationofChinaunderGrant41821004.KKT’sresearchissupportedbythe703FredericandJuliaWanEndowedProfessorship.704CompetingInterests:Theauthorsdeclarenocompetinginterests.705DataAvailability:AlldatainthisstudyarepubliclyavailablefromWorldHealth706Organization(WHO)athttps://www.who.int/emergencies/diseases/novel-707coronavirus-2019/situation-reports/708andontheDailyBriefsiteoftheChina’sNationalHealthCommissionat709http://en.nhc.gov.cn/710TheKoreandataisavailableat711https://sa.sogou.com/new-weball/page/sgs/epidemic712CoronavirusCOVID-19GlobalCasesbyJohnsHopkinsCSSE713https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594747140fd40299423467b48e9ecf6715716 717
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 22: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/22.jpg)
22
SupplementaryInformation:718719CommentsonSIR-typeofmodels720MainstreamepidemiologicalmodelshavetheiroriginintheSIR(Susceptible,721Infected,andRecoveredorRemoved)model(28)anditsmanyvariations.We722discussherewhyexistingmodelpredictionsvarysowidelybycommentingonthe723assumptionsunderlyingthesemodels,usingtheclassicalSIRmodelasanexample.724725Thenumberoftheinfectedisincreasedbytherateofinfection,whichisassumedto726beaIS ,anddecreasedbytherateofrecovery/removed,whoareassumedtobe727immunetofurtherinfection,bI .(dI /dt = aIS −bI ).Thislattercategory,R,increases728attherateofbI. (dR/dt = bI ).Bothoftheparameters,especiallytheinfectionrate729a,islargelyunknownforanemergentdisease(1),andhavetobeeitherestimated730basedonlimitedstatisticsthenumberofcontactsasingleinfectedpersonmayhave731andhowmanyofthecontactedpeoplewillbeinfected,orobtainedbycurvefitting732ofreportedI(t).Thisapproachhasproblems.First,forCovid-19,thereisa733populationofasymptomaticinfected,whichmaybelargerthanthe734reported/confirmedIduringtheearlystagesoftheepidemic,whenwide-scale735testingisnotavailable.Sincethisasymptomaticinfectedisalsoinfectious,and736someofthoseinfectedbythemcouldlaterbecomeconfirmedinfected,fittingthe737rateofincreaseofreportedItoestimatetheinfectionratewouldinevitablyyielda738muchlargera.Consequentlythemodelpredictionsusingthisestimatedparameter739valuemayyieldaverylargepeakinfectionnumber.Secondly,since740dI /dt = aI(S −b/a) ,thenumberofinfectedwillincrease(approximately741exponentially)forS >b/a,butforanewvirustowhichthepopulationhasno742immunity,thesusceptiblepopulationcouldbeverylarge.ForChinaitcouldbeas743largeas1.4billion.ForHubeiitis65million.Thesusceptiblepopulationshouldbe744lowerifthequarantineofWuhanweretight,butitisdifficulttoestimatewhatitis745underrealisticconditions.Thirdly,predictionsoftheendoftheepidemicvary746widelybecauseofthebasicassumptioninsomeSIRtypeofmodelsthatifmostof747thepopulationisinfectedandrecovered(andhenceacquiredimmunity)ordead748(andhencenolongerinfectious).Theconceptof“herdimmunity”isrootedinthe749ideathatwithenoughofthepopulationacquiringimmunitythisway(orfroma750vaccineifoneisdevelopedintime),thesusceptiblepopulationisreducedto751S <b/a ,andtherateofincreaseofIwillturnnegative.Thiswillrequireasmuchas75270%-90%ofthepopulationbeinfected,ahugenumber.Modernmodelstakeinto753accountoftheeffectofquarantineandisolationinreducingthenumberofpeople754eachinfectedcancontact,thusreducingtheinfectionratea,sothatSdoesnothave755tobereducedtosuchanextentfortheepidemictopeakandIstartstodecrease.756However,suchestimatesareverydependentonmodelassumptions.757758OfcoursethemodernmodelsofepidemiologyaremoresophisticatedthantheSIR759model(12-14,17,21).760761
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 23: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/23.jpg)
23
Weofferhereanadditionaltoolthathastheadvantagethatithasdoesnotdepend762ontheelusiveinfectionrateorthesusceptiblepopulation,informationneededfor763mostmodels,buthasthedisadvantagethatitcannotbeusedwhentheepidemic764firststartedandthedataareinaccurateorincomplete.Itisbasedondailycase765numbers(i.e.newlyconfirmedcases),N(t),andrecoveredcases,R(t).766767Ourestimateoftheenddateoftheepidemicisnotbasedonthenumberof768susceptibles,S,approachingzeroasinmostmodels(i.e.mostofthepopulationis769infected,henceacquiringimmunity),butN(t)approachingzeroandremainingso770fortwoincubationperiods.Thefirstincubationperiodistoallowtheasymptomatic771infectedtoshowsymptomsandthesecondperiodtoallowthosethatareinfected772bytheasymptomaticinfectedtoshowsymptoms.Forpredictionpurpose,thedate773whentheN(t)iszeroisestimatedby3standarddeviationsfromitspeak.These774twoquantitiescanbeextractedfromthedataastheepidemicisdeveloping.Our775estimateoftheendoftheepidemicisearlierthanmostmodelpredictions,usually776significantlyso,becauseitdoesnotdependontheherdimmunityconcept.777778779THEORY:780Definition:LetI(t) bethenumberofactiveinfectedattimet . Itschangeisgiven781by;782
ddtI =N(t)−R(t), 783
whereN(t) isthenumberofnewlyinfected,andR(t) thatofthenewlyrecoveredor784removed(dead).Notethatforthetheorypart,N(t)includesbothconfirmedand785unconfirmedcases.Theterm:ExistingInfectedCase(EIC)numberisusedtodenote786theconfirmedI(t)whenwedealwithdata.787788Let
tp ,theturningpointdefinedasthepeakoftheactiveinfectednumber.Atthis789pointmaximummedicalresourceisneeded.Thismaximumoccurswhen790
ddtI =0,implyingN(tp)= R(tp). 791
ThereisnoneedtofirstfindI(t)tolocatethispeak.Aftertheturningpoint,the792newlyrecoveredstartstoexceedthenewlyinfected.Thedemandformedical793resources,suchashospitalbeds,isolationwardsandrespirators,startstodecrease.794795Lett =0 bewhenthefirstinfectionbegan.ForWuhan,China,thisdateisnearthe796endof2019,perhapsevenearlier.LettB bethebeginningofthebetterqualitydata.797Thistimeisbeyondtheinitialincubationperiodofthediseaseanditcanbe798assumedthatatthattimethereisalreadyapopulationofinfected,someofthem799asymptomaticbutneverthelessinfectious.800801
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 24: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/24.jpg)
24
LetX(t ,s) bethenumberofinfectedcasesattimet ,withs beingthe“age”802distribution,i.e.numberofdayssick.803Thetotalnumberofinfectedisgivenby:804
I(t)= X(t ,s)ds0
T
∫ .805
AfterbeingsickforTdays,apatienteitherrecoversorisremoved(dead).Tiscalled806therecoveryperiod(orremovalperiod).Itisalsocalledtheinfectioustimeifthe807patientisinfectiousduringthisperiod.Ofcourseitsvaluevariesbypatientandby808theefficacyoftreatmentforeachhospital.Fortheremoveditalsodependsonthe809ageofthepatientandwhetherthereareunderlyingmedicalconditions.Onlya810meanrecoveryperiodisobtainablefromdata,andsothisisinrealityastatistical811quantity.Wewilldiscusslaterhowthisstatisticalquantitycanbeobtainedfrom812data.813814Conservationlaw(seeMurray:MathematicalBiologyPartI,Chapter1(36)):815Afterfirstinfectedanduntilremovedorcured,wehave:816
dX(t ,s)= ∂∂tX idt + ∂
∂sX ids =0,0< s <T .
So,sinceds /dt =1,∂∂tX + ∂
∂sX =0.
817
ThisequationistobesolvedusingthemethodofcharacteristicsasX(t ,s)= constantalongcharacteristicsdefinedbyds dt =1. 818
Boundarycondition:X(t ,0),specifiesthe"birth"process,i.e.howthediseasespawnsnewlyinfected(with"age"s =0).Initialcondition:X(0,s)=0fors >0,specifiestheinitialagedistributionatt =0
819
Therearetwotypesofcharacteristics:820(i)s > t , (ii)s < t . 821Thefirsttypeofcharacteristicsintersectsthet =0 axis,andsincetheinitial822conditioniszero,wehavethesolution:823824
X(t ,s)≡0fors > t . 825826Thatis,thereisnoinfectedpopulationwhoissickformoredaysthanthelapsed827timesincethefirstinfectionoccurred.828829Forthesecondtypeofcharacteristics,t > s thesolutionis830831
X(t ,s)= f (s −t) 832withtheformoff tobedeterminedbytheboundarycondition.Evenwithout833determiningtheformoff wehavethefollowinggeneralresults:834Fort >T , andthereforet > s : 835
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 25: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/25.jpg)
25
I(t)= X(t ,s)ds0
T
∫ = f0
T
∫ (s −t)ds
= ft−T
t
∫ (p)dp.836
ddtI = f (t)− f (t −T). 837
838SincetherateofincreaseofconfirmedI(t) isbydefinitionequaltothenewly839confirmedinfectednumber,N(t),minusthenewlyrecovered(orremoved)number,840R(t),wehave:841
N(t)−R(t)= d
dtI = f (t)− f (t −T). 842
843Forafataldiseasewithlowfatalityrate,wherealmostallinfectedcaseseventually844recoverafterahospitalstayofTdays,wecanidentify845f (t)withN(t),andf (t −T)withR(t). 846Ifthediseasehasanon-negligiblefatalityrate,weincludethedeadinR(t). 847MainResult:Thedailynewlyrecovered/removednumberR(t),isrelatedtothe848dailynewlyinfectednumberN(t)as,fort >T :849
850R(t)=N(t −T). 851
Validation:Thisfundamentalrelationshipcanbevalidatedstatisticallywithdata.852Figures1,obtainedusingdatafromChinaduringtheCovid-19epidemic,showsthat853N(t)andR(t)arehighlycorrelated:withcorrelationcoefficientof0.95whenboth854distributionsaresmoothedwith5-pointboxcar.Theunsmootheddailydataalso855yieldahighcorrelationcoefficientof0.80,withR(t)laggingN(t)byT~15days.Both856correlationcoefficientsarestatisticallysignificant.Asimilarresultisfoundfor857Hubei(FigureS2)andotherregions(notshown).Thisisoneofthewaysthemean858recoveryperiodisdeterminedstatisticallyfromdata,butitisnotpracticalinthe859earlyphaseoftheepidemic.Wewillgivedifferentmethodsforthelatterpurpose.860TheresultonTisconsistentwiththatestimatedorpredictedlaterusingtheslopeof861thedistributioninFigure4.Thelatter,obtainedbytheinterceptofthestraightline,862islessaccuratebecauseoftheslopeisrathershallow.Fortheregionsconsideredin863Figure1,thefatalityrateissmallandsothedeadarenotincludedinR(t)for864convenience.865866Thesecondtypeofcharacteristicsintersectstheboundarys =0 .Theboundary867conditionitselfneedstobesolvedasafunctionofttodescribehownewinfection868(ats =0 )occurs.Thiscanbedoneusingabirthmodel,suchasEq.(1.56)in(36).869Forourpurposeweassumethatthesolutionofthismodelyieldsadistributionwith870agethathasafullspectrum0< s <T ofinfectivesatatimetB ,longafterafull871incubationperiodhaspassed.872
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 26: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/26.jpg)
26
X(tB ,s)= f0(s)= Aexp{−(s − s0)22b2 };Aindependentofs.
b= 12T .
873
874Thereforethesolutionis,fort > tB >0: 875
X(t ,s)= X(tB ,s −t)= f0(s −t)
= Aexp{− (s − s0 −t)2
2b2 }.876
877
N(t)= Aexp{− (t − s0)2
2b2 }= Aexp{− (t −tN )2
2b2 }
R(t)=N(t −T)= Aexp{− (t −tR )2
2b2 },878
wheretN isthepeakofN(t), andtR = tN +T isthepeakofR(t). 879BothdistributionsareGaussians.880881
FortB < t <T ,
882
I(t)= X(t ,s)ds + X(t ,s)dst
T
∫0
t
∫ = f (s −t)ds + 0ds
t
T
∫0
t
∫ = f (p)dp
−t
0∫
883
ddtI = f (−t)= Aexp[− (t − s0)
2
T2 ].
884
N(t)= Aexp[− (t −tN )2
T2 ]
R(t)=0.885
Again,N(t)isGaussian,butthereisnorecoveredorremovedduringthisearlystage.886887MainResult:ThenaturallogarithmoftheratioofNandRisalinearfunctionof888timefort >T :889890
NR(t)= N(t)R(t) =
N(t)N(t −T)
= exp{− (t −tN )2
2b2 +(t −tN −T)2
2b2 }= exp{ T2
2b2 −T(t −tN )b2
}.
logNR = −T(t −tN )−12T
2
b2
891
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 27: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/27.jpg)
27
alinearfunctionoft . Thisrelationshipisimportantforthepurposeofforecast892becauseitiseasytoextrapolatefromastraightlineintothefuture.893894Itintersects0att −tN =
12T . Thisyieldstheturningpoint,whentheNRratiois1,and895
thereforeitslogarithmiszero.896897Result:Theturningpoint,definedasthemaximumofI(t),isgivenbytp = tN +
12T . 898
Result:TheslopeoflogNRisequalto4/T . 899900WhentimeisnormalizedbyT , thederivativeisgivenby:901
d logNRd(t /T) =
T2
b2= 4,adimensionlessconstant.
902
903HeterogeneousData904Theaboveresultsareobtainedforthecaseofasingleintroductionintoaregionof905infectedatt=0andwesolveforthesubsequentdevelopmentoftheepidemicfrom906thatsinglesource.Considernowalargeregionconsistingofanumberofsmall907regions,andthe“seeding”oftheinfectedoccursatdifferenttimesfordifferent908regions.ThelargeregioncouldbeChina,andthefirstinfectioncouldbeWuhan,909HubeiandthentheregionsoutsideHubei.ThenwemayhavefortheChinaasa910wholedataforthenewlyinfectedasumofseveralGaussiansstaggeredintime.As911longastheGaussiansarenotseparatedsomuchthattherearedifferentpeaksinthe912combineddata,thecombineddatacanstillbeconsideredasGaussian,asisthecase913intherealdata.However,thestandarddeviationσ ofthecombinedGaussianis914inevitablylargerandisnolongergivenbyb:915
N(t)= B
2πσ N
exp{− (t −tN )2
σ N2 } .916
WestillhaveR(t)=N(t −T) sincethisresultholdsforeachsub-region.Theresult917
thatlogNR isalinearfunctionoftimestillholds:918
logNR(t)= log N(t)
N(t −T) = −T(t −tN )− 1
2T2
σ N2 .919
TheslopeofthestraightlineisT σ N2.920
921SincethehospitalstatecanaddasasmoothingfilteronN(t)toyieldR(t),the922standarddeviationforR(t)couldbeslightlywiderthanthatforN(t).Sowecould923havetwodifferentGaussians(buttheirintegraloveralltimeshouldbethesame):924
N(t)= B
2πσ N
exp{− (t −tN )2
2σ N2 };R(t)= B
2πσ R
exp{− (t −tR )2
2σ R2 }. 925
Takingthisintoaccount,wehave,denotingT = tR −tN :926
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 28: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/28.jpg)
28
logNR− logσ R
σ N
= −{(t −tN )2
2σ N2 −
(t −tN −T)22σ R
2 }= − (σ R2 −σ N
2 )(t −tN )2 +2σ N2T(t −tN )−σ N
2T2
2σ N2σ R
2 927
928AsthevaluesofσN and σR areveryclosebasedontheempiricaldata,thequadratic929termisalwayssmallcomparingtotheothertermsforthelengthoftimeweare930consideringhere.Hence.931
logNR(t)= 1
σ R2 {−T(t −tN )+ 1
2T2} ,932
alinearfunctionoftime.Itsslopeis− Tσ R
2 . 933
Result:ThenaturallogarithmoftheratiooftwoGaussiansofslightlydifferent934standarddeviationisapproximatelyastraightline.935936ValidationoflogNRasastraightline:937Fromdataweusethereportnewlyconfirmedcasenumberandtherecoveredcase938numbertodefineNRratioas939
NR(t)=N(t)/R(t).940Attp,NR=1.941942WeshowinFigure2,usingthedataoftheepidemicforCOVID-19,thatthe943logarithmofNR(t)liesonastraightline,withsmallscatter,passingthroughthe944turningpointtp.Anddataforvariousstagesoftheepidemic,fromtheinitial945exponentialgrowthstage,tonearthepeakofEIC,andthenpastthepeak,alllieon946thesamestraightline.TheinterceptwithlogNR=0yieldstheturningpoint.This947line,obtainedbylinear-least-squarefitinthesemi-logplot,islittleaffectedbythe948ratherlargeartificialspikeinthedataon12Februarybecauseofitsshortduration949andthelogarithmicvalue.Thatreportingproblemisnecessarilyofshortduration950because,onthedateofdefinitionchange,previousweek’scasesofinfected951accordingtothenewcriteriawerereportedinoneday.Afterthat,thebookis952cleared,andN(t)returnedtoitsnormalrange.953954955Thetheoreticalresultsuggeststhattheslopeofthelinearlineis-T/ σR
2,whereσR is 956the standard deviation of the R(t) profile. Ingeneral,theslopecanbedifferentfor957differentregionswithdifferentlevelsofquarantineandepidemiccharacteristics.958ThehospitaltreatmentefficacywouldinfluenceTdirectly.Theeffectofquarantine959wouldinfluencethevalueofσN,thestandarddeviationofthenewlyinfected,andso960indirectlyR(t)andσR.OurempiricalresultfromFig.2howevershowsthattheslope961isthealmostthesamefordifferentregionsinChina,implyingthatefficacyof962treatmentandlevelofquarantineaffectTandσ2proportionally.963964ValidationoftheslopeoflogN(t)andlogR(t)965Interestingly,thederivativeoflogN(t)orlogR(t)alsoliesonastraightline,as966showninFig.3(althoughthescatterislargerastobeexpectedforany967
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 29: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/29.jpg)
29
differentiationofempiricaldata).Thepositiveandnegativeoutliersonedaybefore968andafter12Febarecausedbythespikeupandthendown,withlittleeffectonthe969fittedlineartrend(butincreasesitsvarianceandthereforeuncertainty).Moreover,970thestraightlineextendswithoutappreciablechangeinslopebeyondthepeakof971N(t),suggestingthatthedistributionofthenewlyinfectednumberisapproximately972Gaussian.ThemeanrecoverytimeTcanbepredictedastR −tN ,wheretRisthepeak973ofR(t)andtNisthepeakofN(t).Thesetwopeaktimescanbeobtainedbyextending974thestraightlineinFig.3tointersectthezeroline.Thispredictedresultcanbe975verifiedstatisticallyafterthefactbythelaggedcorrelationofR(t)andN(t).Ifthe976distributionisindeedGaussianorevenapproximatelyso,theslopeinFig.3would977beproportionaltothereciprocalofthesquareofitsstandarddeviation,σ,as:978
d logN(t)
dt=−(t −tN )
σ N2 . 979
980Similarlyresultholdsforthedailynumberofrecovered,R(t).981982AftertheepidemicisnearingtheendasisthecaseinChina,fittingthedatatoa983Gaussiancanbedoneafterthefact(seeFiguresS3andS4).Thefitissatisfactory984evenwithoutusinganydisposalparameters.Theparametersusedaredetermined985usingslopesoflogNandlogR(seeTableS1)986987TheinferredstatisticalcharacteristicsoftheCovid-19epidemicaresummarizedin988TableS1forvariousregions.ThemeanrecoverytimeT,isabout13daysforChina989asawhole.ForWuhan,thecityattheepicenterwhosehospitalsweremore990overwhelmedandthepatientsadmittedintohospitalsmoreseriouslyillthanthose991inotherprovinces,T~16days,whilethatforHubeiis14days.Thestandard992deviation,σ,isfoundtobearound8days,withslightdifferencebetweenthatfor993N(t)andforR(t),withoneexceptionforHubeioutsideWuhan.Suchafine994subdivisionmaynotbepracticalforthedataqualitywehave.Theσ tendstobe995smallerforChinaasawholethanWuhan.OnecanseethatTandσ2 indeed varying 996approximately in proportion. 997998
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 30: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/30.jpg)
30
9991000FigureS1.Thedailynewlyinfected(inblue)andthedailynewlyrecovered(inred),1001asafunctionoftimeforChinaasawhole(insolidlines)andHubei(indottedlines).1002Theturningpointisdeterminedbywhentheredandbluecurvescross.1003Inset:ForChinaoutsideHubei.10041005
10061007FigureS2.LaggedcorrelationofR(t)withN(t)forHubeiprovince.10081009
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 31: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/31.jpg)
31
1010FigureS3.GaussianfitofN(t)andR(t),forChinaasawhole.10111012
1013FigureS4.GaussianfitofN(t)andR(t),forHubeiProvince.10141015 Crossing t0 t1 T Sigma N Sigma R China
2/18 2/11 2/24 13 7.5 7.7
Hubei
2/20 2/12 2/26 14 7.9 8.5
Wuhan
2/21 2/14 3/01 16 8.8 8.8
C exHubei
2/12 2/08 2/21 13 7.5 7.1
H exWuhan 2/16 2/13 2/27 14 5.0 8.8
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 32: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/32.jpg)
32
TableS1:StatisticalcharacteristicsoftheCOVID-19epidemicindifferentregionsin1016Chinainferredfromdata,forN(t),thedailynumberofnewlyinfectedandforR(t),1017thedailynumberofrecovered.10181019 China Hubei China-Hubei Hubei-WuhanTruth(data) 18 19 12 15NRRatio
20.3±1.6(Feb20nd)
22.3±1.0(Feb22rd)
12.4±0.9(Feb12th)
16.0±1.2(Feb16th)
1020TableS2:Predictedturningpointdates.Shownarethemeanandstandard1021deviationofthepredictionsoverthepredictionperiod,usingtheNRratiomethod10221023
1024FigureS5.PredictionoftheturningpointofEICusinglinearleast-squarestrends1025usingvariousdatalengthsforChinaexHubei.Alldatausedstartfrom24January.1026Differentcoloredstraightlinesshowthelineartrendcalculatedfrom24Januarytoa1027particulardate.Thespreadisoveraverysmallrange.Thenthesetrendsare1028extrapolated(extrapolationsnotshown)tointersectthezerolinetoyielda1029predictionfortheturningpoint.Thebluedotsarethedata.10301031103210331034
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 33: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/33.jpg)
33
10351036FigureS6:TheavailabledatafromSouthKorea(asofMarch7th).Thesporadic1037recoveredcasenumbersaremostlyinthesingledigit.Ifweusethesuddenincrease1038ofrecoveredcasematchingwiththesuddenexplosiveincreaseofnewinfected,the1039distanceisapproximately14days,areasonableTvaluewhencomparedtothe1040meanvalueinChina.Forourdataanalysis,weuseddailynewlycasesstarting1041February19th,forthederivativeofindividualdistributionstudy;weuseddatacase1042fromMarch1st,fortheNRratiostudy,inordertohaveenoughrecoveredcases.1043
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint
![Page 34: A data-driven model for predicting the course of COVID-19 ... · 3/28/2020 · 45 epidemic using the daily data on the infection and recovery. This data-driven tool 46 can predict](https://reader034.vdocuments.us/reader034/viewer/2022050507/5f986103306c3d38b744e0a0/html5/thumbnails/34.jpg)
34
1044FigureS7:Thederivativeofthelogarithmicvalueofdailynewinfectedcase1045distribution.10461047
1048FigureS8:TheNRratiofrom7daysofdatafromMarch1stto7th.Theestimated1049zero-crossingtimewouldoccurbetweenMarch11thand12th,avalueconsistent1050withthestatisticsfromthedailynewcasedistributiononMarch10th.1051
. CC-BY-NC-ND 4.0 International licenseIt is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted April 12, 2020. ; https://doi.org/10.1101/2020.03.28.20046177doi: medRxiv preprint