A data-driven model for predicting the course of COVID-19 epidemic with applications for China, Korea, Italy, Germany, Spain, UK and USA

(Revised: April 5, 2020)

Authors:
Norden E. Huang#, Fangli Qiao# and Ka-Kit Tung*

Affiliations:
#Data Analysis Laboratory, FIO, Qingdao 266061, China
*Department of Applied Mathematics, University of Washington, Seattle, WA 98195
#Co-first authors: These authors contribute equally to this work.
FQ: [email protected]. NEH: [email protected]
*Corresponding author: [email protected]

KEYWORDS
Covid-19; Epidemiology; Covid-19 predictions for Italy, Germany, Spain, UK and USA; Data-driven approach; prediction of turning points; peak active infected number; Covid-19 recovery period; end of Covid-19 epidemic; local-in-time metric for epidemic management.

medRxiv preprint, doi: https://doi.org/10.1101/2020.03.28.20046177; this version posted April 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.



ABSTRACT

For an emergent disease, such as Covid-19, with no past epidemiological data to guide models, modelers struggle to make predictions of the course of the epidemic (Cyranoski, Nature News 18 February 2020). The wildly varying predictions make it difficult to base policy decisions on them. On the other hand, much empirical information is already contained in data of evolving epidemiological profiles. We offer an additional tool, based on general theoretical principles and validated with data, for tracking the turning points, peak and accumulated case numbers of infected and recovered for an epidemic, and for predicting its course. The ability to predict the turning points and the epidemic’s end is of crucial importance for fighting the epidemic and planning for a return to normalcy. The accuracy of the prediction of the peaks of the epidemic is validated using data from different regions in China showing the effects of different levels of quarantine. The validated tool can be applied to other countries where Covid-19 has spread, and generally to future epidemics. The US is found to have the largest net infection rate, and is predicted to have the largest total infected cases (708K) and to take two weeks longer than Wuhan to reach its turning point, and one week longer than Italy and Germany.

SIGNIFICANCE: We offer a practical tool for tracking and predicting the course of an epidemic using the daily data on infection and recovery. This data-driven tool can predict the turning points two weeks in advance, with an accuracy of 2-3 days, validated using data from various regions in China selected to show the effects of quarantine. It also gives information on how rapid the rise and fall of the case numbers are, and what the peak and total number of infected are. Although empirical, this approach has a sound theoretical foundation; the main components of the results are validated after the epidemic is near an end, as is the case for China, and the approach is therefore generally applicable to future epidemics.


1. Introduction.

The current COVID-19 epidemic is caused by a novel coronavirus, designated officially as SARS-CoV-2, spreading from Wuhan, the capital city of Hubei province in China (2-4). The new virus seems to have characteristics different from SARS (severe acute respiratory syndrome) (5, 6): it is less deadly but spreads more widely (7-10). Modeling the epidemic as it develops has been difficult (1). Depending on the model assumptions, predictions of when it “turns a corner” for China vary greatly (11-21), up to after 650 million people have been infected before peaking; many have now been shown to be inaccurate (22). Now that the epidemic has subsided in China and become a global pandemic (23, 24), a reliable forecast of the course of the outbreak in each region is critical for the management and containment of the epidemic, and for balancing the impact of the public health crisis against that of the economic crisis. China has instituted some of the strictest quarantine measures around Wuhan and Hubei, which may or may not be adoptable in other countries (25-27). It would be useful to extract the dependence of the epidemic’s evolution on the degree of quarantine to guide policy decisions, while also characterizing properties of Covid-19 that are applicable to other countries.

Mainstream epidemiological models have their origin in the SIR (Susceptible, Infected, and Recovered or Removed) model (28) and its many variations. We explain in the Supplementary Information why existing model predictions vary so widely by commenting on the assumptions underlying these models.

These SIR-type models, however, serve a critical purpose for long-range policy planning, such as warning policy-decision makers of the gravity of the potential impact and prompting them to take proper actions before it is too late. After the breakout, more information is needed for more detailed planning, such as the arrival of the critical turning points, the number of hospital beds we might need at the peak, and the estimate for when to lift the quarantine and when to return to normalcy. We offer here an additional tool that has the advantage that it does not depend on the elusive infection rate or the susceptible population, information needed for most models, but has the disadvantage that it cannot be used when the epidemic first starts and the data are inaccurate or incomplete. It is based on daily case numbers (i.e. newly confirmed cases), N(t), and recovered cases, R(t).

Without universal testing, the confirmed case number might be only a subset of the true total infected number, which may never be known unless frequent universal testing is instituted. The asymptomatic infected who are not tested and then recover on their own do not get counted, but they also do not tax hospital resources. Nevertheless, they can infect others and some of the latter may develop more serious symptoms that require hospitalization. These secondary infections are then included in our case data. Our aim is to provide a tool that can be used for the management of medical resources. Since we do not use a model to calculate how the asymptomatic infectives infect others, we do not need to know either the infection rate or the asymptomatic infective numbers. Since those who are admitted to the
hospital either recover after a hospital stay of T days, or die after a similar number of days, there should be a delayed relationship between N(t) and R(t), which we will explore in the Theory section. Now that the epidemic in China appears to have come to an end, the data from various regions in China can be used to validate the model. After validation we then apply it to other regions in the world.

Our estimate of the end date of the epidemic is not based on the number of susceptibles, S, approaching zero as in most models (i.e. most of the population is infected, hence acquiring immunity), but on N(t) approaching zero and remaining so for two incubation periods. The first incubation period is to allow the asymptomatic infected to show symptoms and the second period to allow those that are infected by the asymptomatic infected to show symptoms. For prediction purposes, the date when N(t) is zero is estimated by 3 standard deviations from its peak. These two quantities can be extracted from the data as the epidemic is developing. Our estimate of the end of the epidemic is earlier than most model predictions, usually significantly so, because it does not depend on the herd immunity concept.

As is true for all data-driven approaches, our result inevitably depends on the quality of the data used, and some of the early data of the epidemic are not as good as the later data, when better diagnostic methods and more complete reporting are established. However, many of the metrics commonly in use require accumulation of data from the beginning of the epidemic, and consequently are affected by poor data or change of diagnostic methods along the way. We try to avoid accumulation and use local-in-time metrics. Nevertheless, data problems cannot be avoided. The sensitivity of our conclusions to data problems is extensively discussed in this work. Figure S1 displays examples of data used in this study. One problem immediately becomes obvious for the Chinese data: on 12 February, when Hubei changed its definition of confirmed infection from the gold standard of nucleic acid gene-sequencing tests to clinical observations and radiological chest scans, over 14,000 newly infected cases were added that day, creating a peak that has not been exceeded since. Overwhelmed doctors in Wuhan pleaded for the change so that they did not have to wait for the returned tests to confirm the infection. Outside Hubei, there was no change in definition for the “infected”. How this artifact affects our conclusion will be discussed.

2. Model and its validation using data

Definition: Let I(t) be the number of active infected at time t. Its change is given by:

$$\frac{d}{dt} I = N(t) - R(t),$$

where N(t) is the number of newly infected, and R(t), designated as removed, is the sum of the daily recovered and dead. For a disease such as Covid-19 with a low fatality rate, R(t) consists mainly of recovered. However, even for this disease, the fatality rate in some regions, such as Northern Italy, approaches 10%. For these regions R(t) should include the dead as well. Note that for the theory part, N(t)
includes both confirmed and unconfirmed cases. The term Existing Infected Case (EIC) number is used to denote the confirmed I(t) when we deal with data.

Let t_p be the turning point, defined as the peak of the active infected number. At this point maximum medical resource is needed. This maximum occurs when

$$\frac{d}{dt} I = 0, \quad \text{implying} \quad N(t_p) = R(t_p).$$

This is a local-in-time metric. There is therefore no need to first find I(t) to locate this peak. After the turning point, the newly recovered starts to exceed the newly infected. The demand for medical resources, such as hospital beds, isolation wards and respirators, starts to decrease.

The theoretical foundation for our model is given in Supplementary Information. Here we discuss the main results and offer validation of these results using data from China.

Main Result: The daily newly recovered/removed number R(t) is related to the daily newly infected number N(t) as:

$$R(t) = N(t - T),$$

for t > T, where T is the mean recovery period.

This result can be rigorously derived using an age-structured population model (see reference (36)). It is also common sense: the infected eventually recover after a number of days, or die after a similar number of days. Of course, the number of days a patient stays in the hospital before discharge depends on the efficacy of treatment and so varies somewhat, and the time it takes for a patient to die may also depend on age and underlying conditions. T is therefore a statistical quantity.

Validation: This fundamental relationship can be validated statistically with data. Figure 1, obtained using data from China during the Covid-19 epidemic, shows that N(t) and R(t) are highly correlated, with a correlation coefficient of 0.95 when both distributions are smoothed with a 5-point boxcar. The unsmoothed daily data also yield a high correlation coefficient of 0.80, with R(t) lagging N(t) by T ~ 15 days. Both correlation coefficients are statistically significant. A similar result is found for Hubei (Figure S2) and other regions (not shown). This is one of the ways the mean recovery period is determined statistically from data, but it is not practical in the early phase of the epidemic. We will give different methods for the latter purpose. The result on T is consistent with that estimated or predicted later using the slope of the distribution in Figure 4. The latter, obtained by the intercept of the straight line, is less accurate because the slope is rather shallow.
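The lagged-correlation estimate of T described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `pearson` and `best_lag` are hypothetical, the two series are synthetic (a Gaussian N(t) and an exact copy delayed by 15 days), and no boxcar smoothing is applied.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(new_cases, removed, max_lag):
    """Return (lag, r) maximizing corr(N(t - lag), R(t)) for lag = 0..max_lag days."""
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        n_shifted = new_cases[: len(new_cases) - lag]  # N(t - lag)
        r_aligned = removed[lag:]                      # R(t) on the same days
        r = pearson(n_shifted, r_aligned)
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic illustration only (not the paper's data):
# Gaussian N(t) peaking on day 25 with sigma = 8, and R(t) = N(t - 15).
days = range(80)
N = [1000 * math.exp(-((t - 25) ** 2) / (2 * 8 ** 2)) for t in days]
R = [1000 * math.exp(-((t - 15 - 25) ** 2) / (2 * 8 ** 2)) for t in days]
lag, r = best_lag(N, R, max_lag=30)
print(lag, round(r, 3))
```

On real case counts the maximum correlation would fall below 1, and smoothing both series first (the text uses a 5-point boxcar) would be a reasonable preprocessing step.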


Figure 1. Lagged correlation of case numbers R(t) and N(t) for China as a whole.

Main Result: The natural logarithm of the ratio of N and R is a linear function of time for t > T. This relationship is important for the purpose of forecasting because it is easy to extrapolate from a straight line into the future.

Validation of log NR as a straight line:
From data we use the reported newly confirmed case number and the recovered case number to define the NR ratio as

$$NR(t) = N(t)/R(t).$$

At t_p, NR = 1.

We show in Figure 2, using the data of the epidemic for COVID-19, that the logarithm of NR(t) lies on a straight line, with small scatter, passing through the turning point t_p. Data for various stages of the epidemic, from the initial exponential growth stage, to near the peak of EIC, and then past the peak, all lie on the same straight line. The intercept with log NR = 0 yields the turning point. This line, obtained by a linear least-squares fit in the semi-log plot, is little affected by the rather large artificial spike in the data on 12 February because of its short duration and the logarithmic value. That reporting problem is necessarily of short duration because, on the date of the definition change, the previous week’s cases of infected according to the new criteria were reported in one day. After that, the book was cleared, and N(t) returned to its normal range.
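The straight-line extrapolation of log NR to its zero crossing can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' implementation: `fit_line`, `predict_turning_point` and `gauss` are hypothetical helper names, the data are synthetic Gaussians with R(t) = N(t - 15), and days with a zero count are simply skipped because the ratio or its logarithm is undefined there.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def predict_turning_point(days, new_cases, removed):
    """Day at which the fitted log(N/R) line crosses zero, i.e. where N = R."""
    # Skip days with zero counts, where log(N/R) is undefined.
    pts = [(d, math.log(n / r)) for d, n, r in zip(days, new_cases, removed)
           if n > 0 and r > 0]
    slope, intercept = fit_line([d for d, _ in pts], [v for _, v in pts])
    return -intercept / slope

def gauss(t, peak_day, sigma=8.0):
    """Synthetic daily count: Gaussian in time with peak value 1000."""
    return 1000.0 * math.exp(-((t - peak_day) ** 2) / (2 * sigma ** 2))

# Synthetic illustration only: N(t) peaks on day 25 and R(t) = N(t - 15).
# Only days 10..24, all before the turning point, are used for the fit.
days = list(range(10, 25))
N = [gauss(t, 25) for t in days]
R = [gauss(t, 25 + 15) for t in days]
t_p = predict_turning_point(days, N, R)
print(round(t_p, 2))
```

For these idealized inputs the fitted zero crossing sits halfway between the peaks of N and R; with reported counts the scatter around the line determines how far ahead the extrapolation can be trusted.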


Figure 2. Logarithm of the ratio of daily newly infected to newly recovered. They lie on straight lines with some small scatter. The dotted straight lines are obtained by linear least-squares fits. The slopes of the lines are almost the same but with different intercepts; the trend lines cross zero (the black solid line) at different times for different regions, indicating different peaking times for EIC. The epicenter Wuhan (green) has a later turning point than its province Hubei (pink), which has a later turning point than China as a whole (cyan).

The theoretical result in SI suggests that the slope of the straight line is −T/σ_R², where σ_R is the standard deviation of the R(t) profile. In general, the slope can be different for different regions with different levels of quarantine and epidemic characteristics. The hospital treatment efficacy would influence T directly. The effect of quarantine would influence the value of σ_N, the standard deviation of the newly infected, and so indirectly R(t) and σ_R. Our empirical result from Fig. 2 however shows that the slope is almost the same for different regions in China, implying that efficacy of treatment and level of quarantine affect T and σ² proportionally.

Result: The derivative of log N(t) and of log R(t) is each a linear function of time, with known slope. Their intercept with the zero derivative line yields the time for their respective peak.
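The slope of the log NR line quoted above can be checked with a short worked derivation. It is a sketch under simplifying assumptions that are not spelled out in this section: N(t) is taken to be exactly Gaussian with peak time t_N, N and R share a single standard deviation σ (so σ_N = σ_R = σ), and N_max is a notation introduced here for the peak value.

```latex
\begin{align*}
N(t) &= N_{\max}\exp\!\left[-\frac{(t-t_N)^2}{2\sigma^2}\right], \qquad
R(t) = N(t-T) = N_{\max}\exp\!\left[-\frac{(t-T-t_N)^2}{2\sigma^2}\right],\\
\log NR(t) &= \log\frac{N(t)}{R(t)}
  = \frac{(t-T-t_N)^2-(t-t_N)^2}{2\sigma^2}
  = -\frac{T}{\sigma^2}\left(t-t_N-\frac{T}{2}\right).
\end{align*}
```

Under these assumptions log NR is linear in t with slope −T/σ², and it crosses zero at t_p = t_N + T/2, halfway between the peak of N(t) and the peak of R(t).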


Validation

Empirically, the derivative of log N(t) or log R(t) lies on a straight line, as shown in Fig. 3 (although the scatter is larger, as is to be expected for any differentiation of empirical data). The positive and negative outliers one day before and after 12 Feb are caused by the spike up and then down, with little effect on the fitted linear trend (but they increase its variance and therefore its uncertainty). Moreover, the straight line extends without appreciable change in slope beyond the peak of N(t), suggesting that the distribution of the newly infected number is approximately Gaussian. The mean recovery time T can be predicted as t_R − t_N, where t_R is the peak of R(t) and t_N is the peak of N(t). These two peak times can be obtained by extending the straight line in Fig. 3 to intersect the zero line. This predicted result can be verified statistically after the fact by the lagged correlation of R(t) and N(t). If the distribution is indeed Gaussian or even approximately so, the slope in Fig. 3 would be proportional to the reciprocal of the square of its standard deviation, σ, as (see SI):

$$\frac{d}{dt}\log N(t) = -\frac{(t - t_N)}{\sigma_N^2}.$$
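As an illustration of how this relation can be used, the sketch below fits a straight line to a finite-difference derivative of log N(t) and reads off t_N from the zero crossing and σ_N from the slope. The helper names `fit_line` and `peak_and_sigma` are hypothetical, the centered difference is one possible choice of derivative estimate (the text does not say which was used), and the input is a synthetic Gaussian observed only before its peak.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def peak_and_sigma(days, new_cases):
    """Estimate (t_N, sigma_N) from a straight-line fit to d/dt log N(t).

    Assumes daily spacing, strictly positive counts, and a negative fitted slope.
    """
    logs = [math.log(n) for n in new_cases]
    mid_days = days[1:-1]
    # Centered finite difference of log N on the interior days.
    deriv = [(logs[i + 1] - logs[i - 1]) / 2.0 for i in range(1, len(logs) - 1)]
    slope, intercept = fit_line(mid_days, deriv)
    t_peak = -intercept / slope      # zero crossing of the trend line
    sigma = math.sqrt(-1.0 / slope)  # slope = -1 / sigma_N**2
    return t_peak, sigma

# Synthetic illustration only: Gaussian N(t) with peak on day 25 and sigma = 8,
# observed on days 5..20, i.e. before the peak has been reached.
days = list(range(5, 21))
N = [1000.0 * math.exp(-((t - 25) ** 2) / (2 * 8.0 ** 2)) for t in days]
t_N, sigma_N = peak_and_sigma(days, N)
print(round(t_N, 2), round(sigma_N, 2))
```

Differencing amplifies noise, so on reported counts the fitted slope and zero crossing carry more uncertainty than this idealized example suggests.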

A similar result holds for the daily number of recovered, R(t).

After the epidemic is nearing the end, as is the case in China, fitting the data to a Gaussian can be done after the fact (see Figures S3 and S4). The fit is satisfactory even without using any disposable parameters. The parameters used are determined using slopes of log N and log R (see Table S1).

The inferred statistical characteristics of the Covid-19 epidemic are summarized in Table S1 for various regions. The mean recovery time T is about 13 days for China as a whole. For Wuhan, the city at the epicenter, whose hospitals were more overwhelmed and whose admitted patients were more seriously ill than those in other provinces, T ~ 16 days, while that for Hubei is 14 days. The standard deviation, σ, is found to be around 8 days, with a slight difference between that for N(t) and for R(t), with one exception for Hubei outside Wuhan. Such a fine subdivision may not be practical for the data quality we have. The σ tends to be smaller for China as a whole than for Wuhan. One can see that T and σ² indeed vary approximately in proportion.

The peak infected cases:

Writing

$$N(t_N) = N(t_B)\exp\left\{\int_{t_B}^{t_N} n(t)\,dt\right\},$$

and noting

$$n(t) = \frac{d}{dt}\log N(t) = -\frac{(t - t_N)}{\sigma_N^2},$$

the exponent is

$$\int_{t_B}^{t_N} n(t)\,dt = \frac{(t_N - t_B)^2}{2\sigma_N^2} = \frac{1}{2}\, n(t_B)(t_N - t_B).$$

Hence the peak infected case number can be predicted, using the predicted value for t_N starting from a conveniently chosen time t_B, such as the latest time with data available, as:

$$N(t_N) = N(t_B)\exp\left\{\frac{n(t_B)(t_N - t_B)}{2}\right\}.$$
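A small numerical check of this peak formula is sketched below. The function name `predict_peak_cases` and all input values are hypothetical; the inputs are generated from an exact Gaussian so that the formula should recover the known peak of 1000.

```python
import math

def predict_peak_cases(n_tb, rate_tb, t_b, t_n):
    """Peak daily cases: N(t_N) = N(t_B) * exp{ n(t_B) * (t_N - t_B) / 2 }."""
    return n_tb * math.exp(rate_tb * (t_n - t_b) / 2.0)

# Synthetic illustration only: Gaussian N(t) with peak value 1000 on day 25,
# sigma = 8, observed on day t_B = 15.
sigma, t_n, t_b = 8.0, 25.0, 15.0
n_tb = 1000.0 * math.exp(-((t_b - t_n) ** 2) / (2 * sigma ** 2))  # N(t_B)
rate_tb = -(t_b - t_n) / sigma ** 2                               # n(t_B) = d/dt log N at t_B
peak = predict_peak_cases(n_tb, rate_tb, t_b, t_n)
print(round(peak, 1))
```

In practice n(t_B) and t_N would themselves come from the fitted trend line, so their uncertainty propagates into the predicted peak through the exponential.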

Accumulated quantities.

To calculate I(t) using reported data, only confirmed cases are used (we call it EIC). It is given by the accumulated newly confirmed cases minus the accumulated confirmed recovered. Since the accumulation of early poor data can introduce errors, a more local-in-time formula is given as:

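$$\begin{aligned}
I(t) &= \int_{-\infty}^{t} N(t)\,dt - \int_{-\infty}^{t} R(t)\,dt = \int_{-\infty}^{t} N(t)\,dt - \int_{-\infty}^{t} N(t-T)\,dt \\
&= \int_{t-T}^{t} N(t)\,dt.
\end{aligned}$$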

That is, to find I at time t, one only needs to add up the daily newly infected case numbers for a period of T preceding t. This is an almost local-in-time property even for this accumulated quantity. For validation, we estimate the peak of the I case number on 18 February by computing the sum of daily newly infected case numbers for 15 days, from February 4 to February 18, which yields a peak value for the total infected cases on 18 February of 54,747. This is within 10% of the reported number of 57,805, even after taking into account the deaths (by subtracting the accumulated deaths of 2,004 from our estimate).
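The windowed sum described above amounts to a rolling total over the most recent T days. A minimal sketch follows; the function name `active_infected` and the counts are made up for illustration, and the window is assumed to include day t itself, which matches the 15-day count from February 4 to February 18.

```python
def active_infected(new_cases, T):
    """I(t) approximated as the sum of N over the T most recent days, including day t."""
    return [sum(new_cases[max(0, t - T + 1): t + 1]) for t in range(len(new_cases))]

# Made-up daily counts (not the paper's data), with a 3-day recovery period.
daily_new = [1, 2, 3, 4, 5]
I = active_infected(daily_new, T=3)
print(I)
```

Early entries use a shorter window simply because fewer than T days of data exist; they correspond to the start of the record, where the text already cautions that data are less reliable.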


Figure 3. The derivative of the logarithm of daily newly infected or recovered. Notice the clear separation of the new and recovered cases and also the subtle difference of their slopes. The zero crossings of the trend lines give the peak dates of the new and recovered cases respectively, and the slopes give an estimate of σ values. In this Figure, the following abbreviations are used: C = China; H = Hubei; N = New Case; R = Recovered.

3. Predictability

Prediction of the turning point using NR ratio.

We first discuss how the true turning point can be determined from data after it has occurred. Then we give a method for predicting this true value in advance and assess the accuracy of the forecasts as a function of the number of days in advance at which the prediction is made. A note: after this manuscript was submitted for review the predictions that we made previously have come to pass. Although consequently the value of our predictions has greatly diminished, it gives us a chance to compare our predictions against the truths. This model validation process is important if we are to apply the same method for prediction to other regions.

The turning point and the end of the epidemic are the two most watched markers of its development (28, 29), along with the number of infected at each stage of the epidemic. There are various definitions of the turning point. A common one defines the turning point of the epidemic as the reported daily number of newly infected reaching a peak and then declining. This is the one touted in the various news announcements, and also used by some research groups (22). The fact that the number of newly infected reaches a peak and then declines does not necessarily
imply that the epidemic has “turned a corner”, because the total number of active infected can still be rising with the associated urgent need for additional medical resources, such as hospital beds, isolation wards and ventilators. Furthermore, locating this peak is highly susceptible to data glitches and changes in diagnostic definition. A more meaningful turning point should be based on the number of confirmed infected individuals, designated as EIC (15), reaching a peak and then starting to decline. EIC is in theory obtainable from data of the daily number of new confirmed cases, N(t), and the daily number of newly recovered, R(t), by subtracting the accumulated sum of R(t) from the accumulated sum of N(t). Analysis of this accumulated quantity is sensitively affected by accumulation of poorer early data of reported cases, including under-reporting and under-detection of the number of infected caused by insufficient test kits, in addition to the history of changing diagnostic criteria. Moreover, in practice its peak is often not detected until several weeks after it has occurred.

Since the maximum of EIC can be located by the zero of its derivative, we propose using a local-in-time metric of N(t_p) = R(t_p) at the peak of EIC, t_p.

Referring to Figure S1, for China as a whole, t_p is found to be 18 February; for Hubei, the province of the epicenter Wuhan, t_p is found to be 19 February, and for China outside Hubei (China ex Hubei), 12 February, coincidentally on the same day as the Hubei data spike. However there is no such bump in the data outside Hubei, and so this is not likely the result of the data artifact. These results, even including that for Hubei, are not affected by the historical data problems because of our local-in-time method for determining the turning point.

Can such a turning point be predicted before it happens, and if so by how many days in advance? Since the logarithm of NR lies on a straight line passing through the turning point of EIC, it would be interesting to explore whether the turning point can be predicted, using data from weeks before it happened, by extrapolation along the straight line (see Figure S5). How far in advance this can be done appears to be limited by the poor quality of the initial data. Fig. 4 shows the results of such predictions. The horizontal axis indicates the last date of the data used in the prediction. The beginning date of the data used is 24 January for all experiments. Prior to that day, data quality was poor and the newly recovered number was zero on some days, giving an infinite NR ratio.

For China outside Hubei, the prediction made on 6 February gives the turning point as 14 February, two days later than the truth. A prediction made on 8 February already converged to the truth of 12 February, and stays near the truth, differing by no more than fractions of a day with more data.

The huge data glitch on 12 February in Hubei affected the prediction for Hubei, for China as a whole, and for Hubei-ex Wuhan. These three curves all show a bump up starting 12 February, as the slope of N(t) is artificially lifted. Ironically, predictions made earlier than 12 February are actually better. For example, for China as a whole,
predictions made on 9 February and 10 February both give 19 February as the turning point, only one day off the truth of 18 February. A prediction made on 11 February actually gives the correct turning point that would occur one week later. At the time these predictions were made, the newly infected cases were rising rapidly, by over 2,000 each day, and later by over 14,000. It would have seemed incredible if one were to announce at that time that the epidemic would turn the corner a week later.

Even with the huge spike for the regions affected by Hubei’s change of diagnostic criteria, because of its short duration the artifact affects the predicted value by no more than 3 days, and the prediction accuracy soon recovers for China as a whole. For Hubei, the prediction never converges to the true value, but the over-prediction is only 2 days. This smallness of the error is remarkable given that other model predictions differ by weeks or months.

Table S2 lists the mean and standard deviation of the predictions. For applications to other countries and to future epidemics without a change in the definition of the “infection” to such a large extent, we expect even better prediction accuracy.

Figure 4. Prediction of the turning point in EIC by extrapolating the trend in logarithm of NR. The horizontal axis indicates the date the prediction is made using data prior to that date. The vertical axis gives the dates of the predicted turning point. Dashed horizontal lines indicate the true dates for the turning point, as determined from Fig. S1.


Estimate of “all clear” declaration

We can now estimate a time for a declaration of “all clear”. No verification is yet possible as the predicted date has not occurred. At the turning point, the EIC is still at its peak. For the disease to have run its course, so that an “all clear” declaration can be announced, we require the newly infected case number to drop to zero. For prediction practice this “zero” is measured by three standard deviations from the peak of N(t). Then we wait for two incubation periods, each 14 days, to pass, before we declare “all clear”. Using the inferred disease characteristics in Table S1, our prediction is, for China outside Hubei: the last week of March. For China as a whole: the first week of April, barring “imports” of infected from abroad. At this point there may still be some patients in the hospital who are infected with the virus. The “all clear” call assumes that these patients are not roaming freely to cause new infections.

Prediction for South Korea

Figure S6 summarizes the data available for Korea at present. The recovered case numbers hovered around 1 and 2 daily up to March 1st, and only picked up toward the end. Starting from 19 February, there seem to be enough new daily infected cases. The South Korean Government has identified that the epicenter of the epidemic was at church gatherings in the city of Daegu and North Gyeongsang province, where 90% of the cases are found. Specifically, a confirmed COVID-19 patient was reported to have attended the Shincheonji Church of Jesus services twice, on February 9th and 16th. Given the incubation period of 7 to 14 days, the initial explosion on February 19th and the first peak value around February 24th are not accidents.

If we use the available daily new cases data, we can get the statistical characteristics of the distribution of the daily new cases from Figure S7, which gives t_N as March 3rd and a σ_N value of 4.5 days. If we further use the turning point as approximately t_N + T/2, then the turning point should fall on March 10, assuming T as 14 days based on the overall mean from different regions in China.

The NR ratio is limited by the availability of recovered case numbers. If we use the limited recovered cases starting from March 1st, we have 7 days of data. The computed NR ratio together with the trend is given in Figure S8. The turning point, at the zero-crossing of the extended trend line, would occur between March 11th and 12th. This approach does not need to use a value for T.

An estimate of the end of the epidemic can be given as the second week of April, using the estimated values t_N = 3 March and σ = 4.5 days. Remarkably, this date is around the same time as for Wuhan, China. South Korea owes its quick turning point and end of the epidemic date to its ability to identify the first infection and the secondary infections at Shincheonji Church (31), where most of the infected were concentrated. This is reflected in the data: σ for South Korea is only half that of China, with a more rapid rise and fall of the newly infected. Its data for the newly
infectedareprobablymoreaccuratecomparedtoothercountriesinsimilarstageof433theepidemic,duetoitsmassiveandspeedy(within6hours)testingofthe434populationinits“trace,testandtreat”policy.4354364.Effectsofquarantinejudgingfromdataonthenetinfectionrate.437Following(15),wedefineatime-dependentnetinfectionrateas:438

α(t)= dI dt

I= ddtlog I(t). 439
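As a rough illustration of this definition, α(t) can be estimated from a daily series of existing infected cases by differencing log I(t). The sketch below is not the authors' code: the function name `net_infection_rate`, the choice of finite differences, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

def net_infection_rate(eic, dt_days=1.0):
    """Estimate alpha(t) = d/dt log I(t) from existing infected case counts.

    Assumes equally spaced samples (dt_days apart) and strictly positive counts.
    """
    log_i = np.log(np.asarray(eic, dtype=float))
    # Central differences in the interior, one-sided differences at the two ends.
    return np.gradient(log_i, dt_days)

# Synthetic example (not real data): I(t) grows at a constant rate of 0.1 per day.
days = np.arange(10)
eic = 100.0 * np.exp(0.1 * days)
alpha = net_infection_rate(eic)
```

With real, noisy counts one would likely smooth the series before differencing; that step is left out here to keep the sketch minimal.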

In traditional models, such as the SIR model, there is also a time-dependent infection rate, which at t=0 is related to the Basic Reproductive Number $R_0$. If this number is greater than 1 then an epidemic will ensue, i.e. the infected population will increase exponentially after some initial infected are introduced at t=0 into a susceptible population S. That is, from the SIR model equation:

$$\frac{dI}{dt} = aSI - bI = bI\left(\frac{aS}{b} - 1\right),$$

where aS(t) is the infection rate and b = 1/T is the mean recovery rate.

Therefore

$$\alpha(t) = \frac{dI/dt}{I} = b\left(\frac{aS}{b} - 1\right), \qquad \alpha(0) = b(R_0 - 1), \quad \text{with } R_0 = \frac{aS(0)}{b}.$$

Our time-dependent net infection rate generalizes this concept to be independent of the SIR or other models and to be applicable at later times as well: if in the course of an epidemic α(t) is positive, the number of infected will grow exponentially, reaching a peak number of infected when α(t)=0 at $t = t_p$. Then the total number of active infected will decrease exponentially. One could, in analogy to $R_0$, define a time-dependent Reproductive Number $R_t = \alpha(t)T + 1$ (which follows from $\alpha(0) = b(R_0 - 1)$ with b = 1/T), so that if this number is greater (less) than 1 the number of infected will grow (decrease) at time t. We will here use α(t) directly.

Since α(t) has a zero and its first derivative near the zero is nonzero (viz. negative) because $t_p$ is the maximum of I(t), it is a linear function of t with negative slope in that neighborhood. This expectation is verified empirically, using data for the total existing case numbers. We find that this time-dependent infection rate is approximately a linear function of time in the neighborhood of its zero over a period of a few weeks.

This gives another way to predict the turning point $t_p$, which can be used instead of (but is less accurate than) the NR ratio during the early stage of the epidemic when not enough R(t) data is available, as is the case currently for US.

The peak EIC number can be predicted as

$$I(t_p) = I(t_B)\exp\left\{\tfrac{1}{2}\alpha(t_B)(t_p - t_B)\right\},$$

where $t_B$ is the time of the last available data before the turning point. It is assumed that α(t) lies on a straight line between $t_B$ and $t_p$, so the exponent is the area of the triangle under α(t) between those two times.
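The peak prediction described above can be sketched as follows, under stated assumptions: α(t) is fitted by a straight line over the last few days, its zero-crossing is taken as $t_p$, and log I is advanced by the area of the triangle under that line between $t_B$ and $t_p$ (a positive quantity, since $t_p > t_B$ and $\alpha(t_B) > 0$). The function name, the 7-day fitting window and the synthetic series are hypothetical choices, not taken from the paper.

```python
import numpy as np

def predict_peak_eic(days, eic, fit_window=7):
    """Extrapolate a linear alpha(t) to its zero and predict the peak EIC.

    days, eic: equally spaced observation times and existing infected cases.
    fit_window: number of most recent points used for the line fit (assumed).
    """
    days = np.asarray(days, dtype=float)
    eic = np.asarray(eic, dtype=float)
    # alpha(t) = d/dt log I(t); second-order one-sided differences at the ends.
    alpha = np.gradient(np.log(eic), days, edge_order=2)
    slope, intercept = np.polyfit(days[-fit_window:], alpha[-fit_window:], 1)
    if slope >= 0:
        raise ValueError("alpha(t) is not decreasing; no turning point to extrapolate")
    t_p = -intercept / slope              # zero-crossing of the fitted line
    t_b = days[-1]                        # last available data point
    alpha_b = slope * t_b + intercept     # fitted alpha at t_B
    # Area of the triangle under the line between t_B and t_p.
    peak = eic[-1] * np.exp(0.5 * alpha_b * (t_p - t_b))
    return t_p, peak

# Synthetic check (not real data): a Gaussian-shaped I(t) peaking at day 30 with value 1000,
# observed only up to day 20.
days = np.arange(21)
eic = 1000.0 * np.exp(-(days - 30.0) ** 2 / 200.0)
t_p, peak = predict_peak_eic(days, eic)
```

For this synthetic series α(t) is exactly linear, so the sketch recovers the peak day and height; with real data the result depends on how well the linear regime holds.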


Predicting the peak active infected cases
Since the turning point can be predicted two weeks in advance, the above formula can be used to predict the peak EIC numbers.

Predicting the total infected cases (TIC):

$$\mathrm{TIC}(t) = \int_0^t N(t')\,dt'.$$

To predict the total infected cases for the epidemic for a region, we need to do the above accumulation of N(t) into the future, to the end date of the epidemic. That total is approximately:

$$\mathrm{TIC}_\infty = 2\cdot\mathrm{TIC}(t_N),$$

assuming that N(t) is approximately symmetric about its peak at $t_N$. In reality N(t) may not be symmetric and likely has a long tail. However, since the number of cases along the tail is small, the above approximation for the total is still good. If the present time $t_B$ is before $t_N$, we need a way to predict $\mathrm{TIC}(t_N)$. Let the total infection rate be defined as:

$$\beta(t) = \frac{d}{dt}\log \mathrm{TIC}(t) = \frac{\frac{d}{dt}\mathrm{TIC}(t)}{\mathrm{TIC}(t)} = \frac{N(t)}{\mathrm{TIC}(t)}.$$

By extrapolating the total infection rate forward in time we can predict:

$$\mathrm{TIC}(t_N) = \mathrm{TIC}(t_B)\exp\left\{\int_{t_B}^{t_N}\beta(t)\,dt\right\}.$$
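A minimal sketch of the two TIC formulas follows. The text does not say how β(t) is extrapolated, so the linear extrapolation used here (a current value plus a constant slope) is an assumption, as are the function and parameter names. The usage line takes the Germany figures quoted in this section, where $t_N$ has already passed and only the doubling rule applies.

```python
import math

def forecast_total_cases(tic_now, beta_now, beta_slope, days_to_peak):
    """Return (TIC at t_N, final TIC) from the doubling rule TIC_inf = 2 * TIC(t_N).

    tic_now:      TIC at the last data point t_B
    beta_now:     total infection rate beta(t_B), in 1/day
    beta_slope:   assumed constant d(beta)/dt used to extrapolate beta (hypothetical)
    days_to_peak: t_N - t_B in days (0 if t_N is the reference day itself)
    """
    # Integral of a linearly extrapolated beta(t) from t_B to t_N.
    integral = beta_now * days_to_peak + 0.5 * beta_slope * days_to_peak ** 2
    tic_at_peak = tic_now * math.exp(integral)
    return tic_at_peak, 2.0 * tic_at_peak

# Germany as quoted in the text: TIC on the day of t_N was 85,000, so no extrapolation is needed.
tic_peak, tic_total = forecast_total_cases(85_000, 0.0, 0.0, 0.0)
```

Here `tic_total` is 170,000, matching the doubling quoted for Germany; for regions where $t_N$ lies in the future the result is only as good as the assumed extrapolation of β(t).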

USA: $t_N$ is predicted to be 2.2 days from today, April 5. Today’s TIC is 308,850. So the total infected cases for the epidemic when it is over is predicted to be 708,750. This is a very large number, nine times larger than that of China, which has a much larger population, but is nevertheless much lower than some other predictions of a few million infected. Its current EIC is 285,000, and is predicted to peak 12 days from now at 547,000. This is the peak demand for hospital beds.
Germany: Germany has good data. Its $t_N$ was 2 days ago. On that day its TIC was 85,000. So the total infected cases for the epidemic when it is over is predicted to be 2 times that: 170,000. Its current EIC is 68,248. The peak EIC is 69,627, which will occur two days from now.
Spain: $t_N$ occurred on March 31. On that day its TIC was 94,417. The total TIC when the epidemic is over is predicted to be twice that, 188,800. Its peak EIC is predicted to be 84,250, to occur 4 days from now.
UK: Its current TIC is 41,000. Its $t_N$ is 7.5 days from now. Calculating its TIC at that time and then doubling it yields a total TIC of 134,000 when the epidemic is over. The peak EIC is predicted to be 83,200, 16 days from now. This is the peak demand for hospital beds. (It should be noted that the recovered case number for UK is unusually low, currently at 209, while the number of dead is much higher, at 4,900. The data may be doubtful.)

Comments on the effects of quarantine


Additionally, the net infection rate reveals the effect of measures taken with social distancing and quarantine. Figure 5 shows the time-dependent net infection rate for each region starting when the newly confirmed cases exceed 100. This way of plotting facilitates comparison of different regions at the same stage of the epidemic. First, China outside Hubei has the lowest time-dependent infection rate after Hubei was locked down. Germany and Italy have similar exponential growth rates of the net infected case numbers, both higher than even Wuhan, the epicenter in China. More surprisingly, US has the highest exponential net infection rate, higher than Germany and Italy and China. This can be attributed to the fact that US so far does not have a nation-wide shutdown, unlike these other countries. Secondly, China outside Hubei reached its turning point early, in fact 20 days earlier than the epicenter, Wuhan. We had previously predicted this, which is qualitatively different from many model predictions, which had the epicenter achieving its turning point 1-2 weeks earlier than China outside Hubei (13). Italy and Germany are predicted to take a week longer to reach their turning point, while US will take another week longer than it will take Germany and Italy.

That Italy would take the same amount of time to reach its turning point as Germany and has approximately the same net infection rate may be due to two reasons. First, Italy does not test as widely as Germany and so the case numbers represent a smaller portion of the infected. Secondly, Germany has the lowest fatality rate while Italy has one of the highest. The number of the dead is about the same as the number of cured for Italy. The dead are included in recovered/removed. If they were not included, EIC would have been higher for Italy. Nevertheless, whether dead or cured, the hospital bed is vacated.


Figure 5. The time-dependent net infection rate (in units of 1/day) as a function of time starting on the date (listed in the inset) when the newly confirmed case number exceeds 100 for each region. To obtain the actual calendar date, add the days on the horizontal axis to the starting date indicated in the inset. The number of confirmed cases on the starting date is listed at the top.

5. Conclusion.
We offer an additional data-driven approach to track and predict the course of the epidemic. Many parameters characterizing an epidemic can be determined from local-in-time data. Because the approach has been validated by real data, we suggest that it could be applied not just to the current Covid-19 epidemic, but also generally to future epidemics. It could also be used as a practical tool for epidemic management decisions such as quarantine institution and medical resource planning and allocations (32-35).

Two results are of special significance for future policymakers. First, the turning point for the epidemic in China ex Hubei occurred a week earlier than that for Wuhan. Second, the US will take 2 weeks longer to reach its turning point than even Wuhan. After its lockdown, Wuhan, with a large susceptible population of 11 million, enforced strict social distancing, stricter than that adopted in the US. As a consequence, even with a large pool of potential susceptibles, the outbreak could end sooner, as compared to the time it will take US and Europe. For China, the lockdown of Wuhan and Hubei was the reason why the epidemic outside Hubei was under control, and the turning point occurred earlier. In Wuhan, with hospitals facing a number of infected patients far exceeding available hospital beds in the initial period, some infected patients were not adequately isolated. The infected were sent home and caused secondary infections among family members. This might have played a role in delaying the turning point. On the other hand, outside Hubei, hospitals were not as overwhelmed because of the strict quarantine placed on Hubei, which drastically reduced the import of the disease originating from Hubei. The infected were better isolated, reducing further spread, and treated in hospitals, resulting in shorter time to recovery (see Table S1). This is evidence of the effectiveness of the city and province-wide lockdown in “flattening the curve” outside.

The additional and surprising finding that the net infection rates in Italy, Germany and the US are higher than even Wuhan also manifests the effect of the enforcement of lockdown, stay-at-home and strict social distancing policy in Wuhan, which was much stricter than those adopted currently in Europe and US. The more lax latitude and lack of enforcement in the US will lead to a longer period of the epidemic, longer than even Italy, and the largest number of total infected cases of 710,000 before it is over.


References
1. David Adam, Modelers Struggle to Predict the Future of the COVID-19 Pandemic. The Scientist, https://www.the-scientist.com/news-opinion/modelers-struggle-to-predict-the-future-of-the-covid-19-pandemic-67261 (2020).

2. WHO, Laboratory testing of human suspected cases of novel coronavirus (nCoV) infection: interim guidance, World Health Organization, Geneva (2020).

3. N. Zhu, D. Zhang, W. Wang, X. Li, B. Yang, J. Song, X. Zhao, B. Huang, W. Shi, R. Lu, P. Niu, F. Zhan, A novel coronavirus from patients with pneumonia in China, 2019. N. Engl. J. Med. 382, 727-733 (2020).

4. R. Lu, X. Zhao, J. Li, P. Niu, B. Yang, H. Wu, W. Wang, H. Song, B. Huang, N. Zhu, Y. Bi, X. Ma, F. Zhan, L. Wang, T. Hu, H. Zhou, Z. Hu, W. Zhou, L. Zhao, …, W. Tan, Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The Lancet 395(10224), 565-574 (2020).

5. Y. Liu, A. A. Gayle, A. Wilder-Smith, J. Rocklöv, The reproductive number of COVID-19 is higher compared to SARS coronavirus. J. Travel Med. taaa021 (2020).

6. J. W. Glasser, N. Hupert, M. M. McCauley, R. Hatchett, Modeling and public health emergency responses: Lessons from SARS. Epidemics 3, 32-37 (2011), doi:10.1016/j.epidem.2011.01.001.

7. P. Zhou, X. Yang, X. Wang, B. Hu, L. Zhang, W. Zhang, H. Si, Y. Zhu, B. Li, C. Huang, H. Chen, J. Chen, …, Z. Shi, A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270-273 (2020).

8. C. Huang, Y. Wang, X. Li, L. Ren, J. Zhao, Y. Hu, L. Zhang, G. Fan, J. Xu, X. Gu, Z. Cheng, T. Yu, J. Xia, Y. Wei, W. Wu, X. Xie, W. Yin, H. Li, M. Liu, Y. Xiao, H. Gao, L. Guo, J. Xie, G. Wang, R. Jiang, Z. Gao, Q. Jin, J. Wang, B. Cao, Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet 395(10223), 497-506 (2020).

9. J. F-K. Chan, S. Yuan, K.-H. Kok, K. K.-W. To, H. Chu, J. Yang, F. Xing, J. L. BNurs, C. C.-Y. Yip, R. W.-S. Poon, H.-W. Tsoi, S. S.-F. Lo, K.-H. Chan, V. K.-M. Poon, W.-M. Chan, J. D. Lp, J.-P. Cai, V. C.-C. Cheng, H. Chen, C. K.-M. Hui, K-Y. Yuen, A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet 395(10223), 514-523 (2020).

10. X. Xu, P. Chen, J. Wang, J. Feng, H. Zhou, X. Li, W. Zhong, P. Hao, Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission. Sci. China Life Sci. 63, 457-460 (2020).

11. Z. Chen, W. Zhang, Y. Lu, C. Guo, Z. Guo, C. Liao, X. Zhang, Y. Zhang, X. Han, Q. Li, W. Ian Lipkin, J. Lu, From SARS-CoV to Wuhan 2019-nCoV Outbreak: Similarity of Early Epidemic and Prediction of Future Trends. bioRxiv preprint (2020), doi: https://doi.org/10.1101/2020.01.24.919241.

12. J. M. Read, J. R. E. Bridgen, D. A. T. Cummings, A. Ho, C. P. Jewell, Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions. medRxiv preprint (2020), doi: https://doi.org/10.1101/2020.01.23.20018549.


13. J. T. Wu, K. Leung, G. M. Leung, Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. The Lancet 395(10225), 689-697 (2020).

14. S. Zhao, S. S. Musa, Q. Lin, J. Ran, G. Yang, W. Wang, Y. Lou, L. Yang, D. Gao, D. He, M. S. Wang, Estimating the Unreported Number of Novel Coronavirus (2019-nCoV) Cases in China in the First Half of January 2020: A Data-Driven Modelling Analysis of the Early Outbreak. J. Clin. Med. 9, 388 (2020).

15. N. E. Huang, F. Qiao, A data driven time-dependent transmission rate for tracking an epidemic: a case study of 2019-nCoV. Sci. Bull. 65, 425-427 (2020), https://doi.org/10.1016/j.scib.2020.02.005.

16. Q. Li, W. Feng, Trend and forecasting of the COVID-19 outbreak in China. J. Infection arXiv:2002.05866v1, (2020).

17. H. Xiong, H. Yan, Simulating the infected population and spread trend of 2019-nCov under different policy by EIR model. medRxiv preprint (2020), doi: https://doi.org/10.1101/2020.02.10.20021519.

18. L. Damon, E. Brooks-Pollock, M. Bailey, M. J. Keeling, A spatial model of CoVID-19 transmission in England and Wales: early spread and peak timing. medRxiv preprint (2020), doi: https://doi.org/10.1101/2020.02.12.20022566.

19. H. Sun, Y. Qiu, H. Yan, Y. Huang, Y. Zhu, S. Chen, Tracking and Predicting COVID-19 Epidemic in China Mainland. medRxiv preprint (2020), doi: https://doi.org/10.1101/2020.02.17.20024257.

20. Q. Liu, Z. Liu, D. Li, Z. Gao, J. Zhu, J. Yang, Q. Wang, Assessing the Tendency of 2019-nCoV (COVID-19) Outbreak in China. medRxiv preprint (2020), doi: https://doi.org/10.1101/2020.02.09.20021444.

21. L. Peng, W. Yang, D. Zhang, C. Zhuge, L. Hong, Epidemic analysis of COVID-19 in China by dynamical modeling. arXiv 2002.06563, (2020).

22. D. Cyranoski, When will the coronavirus outbreak peak? Nature news (2020).

23. C. R. MacIntyre, Global spread of COVID-19 and pandemic potential. Global Biosecurity 1(3), (2020).

24. WHO, Coronavirus latest: WHO describes outbreak as pandemic, Nature news (2020), https://www.nature.com/articles/d41586-020-00154-w.

25. K. Kupferschmidt, J. Cohen, Can China’s COVID-19 strategy work elsewhere? Science 367(6482), 1061-1062 (2020).

26. J. M. Read, J. R. Bridgen, D. A. Cummings, A. Ho, C. P. Jewell, Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions. medRxiv (2020), doi:10.1101/2020.01.23.20018549.

27. S. Zhao, Q. Lin, J. Ran, S. S. Musa, G. Yang, W. Wang, Y. Lou, D. Gao, L. Yang, D. He, M. H. Wang, Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. Int. J. Infect. Dis. 92, 214-217 (2020), https://doi.org/10.1016/j.ijid.2020.01.050.

28. W. Kermack, A. McKendrick, A contribution to the mathematical theory of epidemics. Proc. Roy. Soc. London A 115, 700-721 (1927).

29. D. L. Heymann, N. Shindo, COVID-19: what is next for public health? Lancet 395(10224), 542-545 (2020), https://doi.org/10.1016/S0140-6736(20)30374-3.


30. The Novel Coronavirus Pneumonia Emergency Response Epidemiology Team, The Epidemiological characteristics of an outbreak of 2019 novel coronavirus disease (COVID-19) - China, 2020. China CDC Weekly (2020).

31. E. Shim, A. Tariq, W. Choi, Y. Lee, G. Chowell, Transmission potential of COVID-19 in South Korea. medRxiv preprint (2020), doi: https://doi.org/10.1101/2020.02.27.20028829.

32. C. M. Peak, L. M. Childs, Y. H. Grad, C. O. Buckee, Comparing nonpharmaceutical interventions for containing emerging epidemics. Proc. Natl. Acad. Sci. 114(15), 4023-4028 (2017), doi:10.1073/pnas.1616438114.

33. R. S. Dhillon, D. Srikrishna, When is contact tracing not enough to stop an outbreak? Lancet Infect. Dis. 18, 1302-1304 (2018), https://doi.org/10.1016/S1473-3099(18)30656-X.

34. X. Pang, Z. Zhu, F. Xu, J. Guo, X. Gong, D. Liu, Z. Liu, D. P. Chin, D. R. Feikin, Evaluation of control measures implemented in the severe acute respiratory syndrome outbreak in Beijing, 2003. JAMA 290(24), 3215-3221 (2003).

35. G. Wang, N. E. Huang, F. Qiao, Quantitative evaluation on control measures for an epidemic: A case study of COVID-19. Sci. Bull. 65 (2020), doi:10.1360/TB-2020-0159.

36. J. D. Murray, Mathematical Biology I. Third Edition. Springer, 551 pp.


Acknowledgements: NEH and FQ are supported by the National Natural Science Foundation of China under Grant 41821004. KKT’s research is supported by the Frederic and Julia Wan Endowed Professorship.
Competing Interests: The authors declare no competing interests.
Data Availability: All data in this study are publicly available from World Health Organization (WHO) at https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports/ and on the Daily Brief site of China’s National Health Commission at http://en.nhc.gov.cn/
The Korean data is available at https://sa.sogou.com/new-weball/page/sgs/epidemic
Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE: https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6


Supplementary Information:

Comments on SIR-type of models
Mainstream epidemiological models have their origin in the SIR (Susceptible, Infected, and Recovered or Removed) model (28) and its many variations. We discuss here why existing model predictions vary so widely by commenting on the assumptions underlying these models, using the classical SIR model as an example.

The number of the infected is increased by the rate of infection, which is assumed to be aIS, and decreased by the rate of recovery/removal, bI, of those who are assumed to be immune to further infection (dI/dt = aIS − bI). This latter category, R, increases at the rate of bI (dR/dt = bI). Both of the parameters, especially the infection rate a, are largely unknown for an emergent disease (1), and have to be either estimated from limited statistics on the number of contacts a single infected person may have and how many of the contacted people will be infected, or obtained by curve fitting of reported I(t). This approach has problems. First, for Covid-19, there is a population of asymptomatic infected, which may be larger than the reported/confirmed I during the early stages of the epidemic, when wide-scale testing is not available. Since these asymptomatic infected are also infectious, and some of those infected by them could later become confirmed infected, fitting the rate of increase of reported I to estimate the infection rate would inevitably yield a much larger a. Consequently the model predictions using this estimated parameter value may yield a very large peak infection number. Secondly, since dI/dt = aI(S − b/a), the number of infected will increase (approximately exponentially) for S > b/a, but for a new virus to which the population has no immunity, the susceptible population could be very large. For China it could be as large as 1.4 billion. For Hubei it is 65 million. The susceptible population should be lower if the quarantine of Wuhan were tight, but it is difficult to estimate what it is under realistic conditions. Thirdly, predictions of the end of the epidemic vary widely because of the basic assumption in some SIR type of models that the epidemic ends only when most of the population is infected and recovered (and hence has acquired immunity) or dead (and hence is no longer infectious). The concept of “herd immunity” is rooted in the idea that with enough of the population acquiring immunity this way (or from a vaccine if one is developed in time), the susceptible population is reduced to S < b/a, and the rate of increase of I will turn negative. This will require as much as 70%-90% of the population to be infected, a huge number. Modern models take into account the effect of quarantine and isolation in reducing the number of people each infected can contact, thus reducing the infection rate a, so that S does not have to be reduced to such an extent for the epidemic to peak and I to start to decrease. However, such estimates are very dependent on model assumptions.

Of course the modern models of epidemiology are more sophisticated than the SIR model (12-14,17,21).
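To make the threshold S = b/a concrete, the classical SIR equations quoted above can be integrated with a simple forward-Euler scheme. This is only a sketch: the parameter values (a = 0.5, b = 0.1 per day, with S, I, R expressed as population fractions) are illustrative assumptions, not estimates for Covid-19, and the function name is hypothetical.

```python
def simulate_sir(s0, i0, a, b, days, dt=0.1):
    """Forward-Euler integration of dS/dt = -aIS, dI/dt = aIS - bI, dR/dt = bI.

    Returns a list of (t, S, I, R) tuples, with S, I, R as population fractions.
    """
    s, i, r = float(s0), float(i0), 0.0
    history = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = a * s * i * dt
        new_removals = b * i * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((step * dt, s, i, r))
    return history

# Illustrative values only: R0 = a*S(0)/b is about 5 here.
a, b = 0.5, 0.1
history = simulate_sir(s0=0.999, i0=0.001, a=a, b=b, days=120)
# The infected fraction peaks when S has fallen to about b/a.
t_peak, s_peak, i_peak, _ = max(history, key=lambda row: row[2])
```

In this run the infected fraction stops growing only once S has dropped to roughly b/a = 0.2, i.e. after about 80% of the population has left the susceptible pool, which is the kind of herd-immunity requirement the paragraph above describes.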


We offer here an additional tool that has the advantage that it does not depend on the elusive infection rate or the susceptible population, information needed for most models, but has the disadvantage that it cannot be used when the epidemic first starts and the data are inaccurate or incomplete. It is based on daily case numbers (i.e. newly confirmed cases), N(t), and recovered cases, R(t).

Our estimate of the end date of the epidemic is not based on the number of susceptibles, S, approaching zero as in most models (i.e. most of the population is infected, hence acquiring immunity), but on N(t) approaching zero and remaining so for two incubation periods. The first incubation period is to allow the asymptomatic infected to show symptoms and the second period to allow those that are infected by the asymptomatic infected to show symptoms. For prediction purposes, the date when N(t) is zero is estimated by 3 standard deviations from its peak. These two quantities (the peak date of N(t) and its standard deviation) can be extracted from the data as the epidemic is developing. Our estimate of the end of the epidemic is earlier than most model predictions, usually significantly so, because it does not depend on the herd immunity concept.

THEORY:
Definition: Let I(t) be the number of active infected at time t. Its change is given by:

$$\frac{d}{dt}I = N(t) - R(t),$$

where N(t) is the number of newly infected, and R(t) that of the newly recovered or removed (dead). Note that for the theory part, N(t) includes both confirmed and unconfirmed cases. The term Existing Infected Case (EIC) number is used to denote the confirmed I(t) when we deal with data.

Let $t_p$ be the turning point, defined as the peak of the active infected number. At this point maximum medical resource is needed. This maximum occurs when

$$\frac{d}{dt}I = 0, \quad \text{implying } N(t_p) = R(t_p).$$

There is no need to first find I(t) to locate this peak. After the turning point, the newly recovered starts to exceed the newly infected. The demand for medical resources, such as hospital beds, isolation wards and respirators, starts to decrease.

Let t=0 be when the first infection began. For Wuhan, China, this date is near the end of 2019, perhaps even earlier. Let $t_B$ be the beginning of the better quality data. This time is beyond the initial incubation period of the disease and it can be assumed that at that time there is already a population of infected, some of them asymptomatic but nevertheless infectious.
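The two rules stated in this section, the turning point where N(t) = R(t) and the end date placed three standard deviations past the peak of N(t) plus two incubation periods, can be sketched as below. The function names and the toy series are hypothetical; the 14-day incubation period and the South Korea inputs (peak of N(t) on 3 March, σ = 4.5 days) are the values quoted in the main text, and rounding the waiting time up to whole days is an assumption.

```python
import math
from datetime import date, timedelta

def turning_point(days, new_cases, new_removed):
    """Return the time at which N(t) - R(t) first crosses from positive to non-positive.

    Uses linear interpolation between the two bracketing observations;
    returns None if no crossing is present in the data.
    """
    for k in range(1, len(days)):
        prev_gap = new_cases[k - 1] - new_removed[k - 1]
        gap = new_cases[k] - new_removed[k]
        if prev_gap > 0 >= gap:
            frac = prev_gap / (prev_gap - gap)
            return days[k - 1] + frac * (days[k] - days[k - 1])
    return None

def all_clear_date(peak_date, sigma_days, incubation_days=14):
    """Peak of N(t) + 3 standard deviations + two incubation periods."""
    wait = 3 * sigma_days + 2 * incubation_days
    return peak_date + timedelta(days=math.ceil(wait))  # rounded up to whole days

# Toy series (not real data): N falls while R rises, crossing between day 1 and day 2.
t_p = turning_point([0, 1, 2, 3], [10, 8, 6, 4], [2, 4, 8, 10])

# South Korea inputs from the main text: t_N = 3 March 2020, sigma = 4.5 days.
end = all_clear_date(date(2020, 3, 3), 4.5)  # date(2020, 4, 14)
```

The resulting date falls in the second week of April, in line with the estimate given for South Korea in the main text.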


Let X(t,s) be the number of infected cases at time t, distributed by “age” s, i.e. the number of days sick.
The total number of infected is given by:

$$I(t) = \int_0^T X(t,s)\,ds.$$

After being sick for T days, a patient either recovers or is removed (dead). T is called the recovery period (or removal period). It is also called the infectious time if the patient is infectious during this period. Of course its value varies by patient and by the efficacy of treatment for each hospital. For the removed it also depends on the age of the patient and whether there are underlying medical conditions. Only a mean recovery period is obtainable from data, and so this is in reality a statistical quantity. We will discuss later how this statistical quantity can be obtained from data.

Conservation law (see Murray: Mathematical Biology Part I, Chapter 1 (36)):
After first infected and until removed or cured, we have:

$$dX(t,s) = \frac{\partial X}{\partial t}\,dt + \frac{\partial X}{\partial s}\,ds = 0, \quad 0 < s < T.$$

So, since ds/dt = 1,

$$\frac{\partial X}{\partial t} + \frac{\partial X}{\partial s} = 0.$$

This equation is to be solved using the method of characteristics, as X(t,s) = constant along characteristics defined by ds/dt = 1.

Boundary condition: X(t,0) specifies the "birth" process, i.e. how the disease spawns newly infected (with "age" s=0).
Initial condition: X(0,s) = 0 for s > 0 specifies the initial age distribution at t=0.

There are two types of characteristics:
(i) s > t, (ii) s < t.
The first type of characteristics intersects the t=0 axis, and since the initial condition is zero, we have the solution:

$$X(t,s) \equiv 0 \quad \text{for } s > t.$$

That is, there is no infected population who is sick for more days than the lapsed time since the first infection occurred.

For the second type of characteristics, t > s, the solution is

$$X(t,s) = f(s-t),$$

with the form of f to be determined by the boundary condition. Even without determining the form of f we have the following general results:
For t > T, and therefore t > s:


$$I(t) = \int_0^T X(t,s)\,ds = \int_0^T f(s-t)\,ds = \int_{t-T}^{t} f(p)\,dp.$$

$$\frac{d}{dt}I = f(t) - f(t-T).$$

Since the rate of increase of confirmed I(t) is by definition equal to the newly confirmed infected number, N(t), minus the newly recovered (or removed) number, R(t), we have:

$$N(t) - R(t) = \frac{d}{dt}I = f(t) - f(t-T).$$

For a disease with a low fatality rate, where almost all infected cases eventually recover after a hospital stay of T days, we can identify f(t) with N(t), and f(t−T) with R(t).
If the disease has a non-negligible fatality rate, we include the dead in R(t).
Main Result: The daily newly recovered/removed number R(t) is related to the daily newly infected number N(t) as, for t > T:

$$R(t) = N(t-T).$$
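One way to check the relation R(t) = N(t − T) against data is to look for the lag at which the two daily series are most strongly correlated. The sketch below uses a plain Pearson correlation scanned over integer lags; the function name, the lag range and the synthetic Gaussian series are assumptions for illustration and do not reproduce the authors' analysis.

```python
import numpy as np

def estimate_recovery_period(new_cases, new_removed, max_lag=30):
    """Return (lag, correlation) maximizing corr(N(t), R(t + lag)) over integer lags.

    Both inputs are daily series of equal length starting on the same day.
    """
    n = np.asarray(new_cases, dtype=float)
    r = np.asarray(new_removed, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        n_part = n[: len(n) - lag]   # N(t)
        r_part = r[lag:]             # R(t + lag)
        if len(n_part) < 3:
            break
        corr = np.corrcoef(n_part, r_part)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

# Synthetic series (not real data): R(t) is N(t) delayed by exactly 15 days.
t = np.arange(90, dtype=float)
new_cases = 500.0 * np.exp(-(t - 30.0) ** 2 / (2 * 7.0 ** 2))
new_removed = 500.0 * np.exp(-(t - 15.0 - 30.0) ** 2 / (2 * 7.0 ** 2))
lag, corr = estimate_recovery_period(new_cases, new_removed)
```

On real counts the maximum is broader and noisier, so smoothing before correlating may help, at the cost of blurring the estimated lag.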

Validation: This fundamental relationship can be validated statistically with data. Figure 1, obtained using data from China during the Covid-19 epidemic, shows that N(t) and R(t) are highly correlated, with a correlation coefficient of 0.95 when both distributions are smoothed with a 5-point boxcar. The unsmoothed daily data also yield a high correlation coefficient of 0.80, with R(t) lagging N(t) by T~15 days. Both correlation coefficients are statistically significant. A similar result is found for Hubei (Figure S2) and other regions (not shown). This is one of the ways the mean recovery period is determined statistically from data, but it is not practical in the early phase of the epidemic. We will give different methods for the latter purpose. The result on T is consistent with that estimated or predicted later using the slope of the distribution in Figure 4. The latter, obtained by the intercept of the straight line, is less accurate because the slope is rather shallow. For the regions considered in Figure 1, the fatality rate is small and so the dead are not included in R(t) for convenience.

The second type of characteristics intersects the boundary s=0. The boundary condition itself needs to be solved as a function of t to describe how new infection (at s=0) occurs. This can be done using a birth model, such as Eq. (1.56) in (36). For our purpose we assume that the solution of this model yields a distribution with age that has a full spectrum 0 < s < T of infectives at a time $t_B$, long after a full incubation period has passed.


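$$X(t_B, s) = f_0(s) = A \exp\left\{-\frac{(s - s_0)^2}{2b^2}\right\}, \qquad A \text{ independent of } s, \qquad b = \tfrac{1}{2}T.$$

Therefore the solution is, for $t > t_B > 0$:

$$X(t, s) = X(t_B, s - t) = f_0(s - t) = A \exp\left\{-\frac{(s - s_0 - t)^2}{2b^2}\right\}.$$

$$N(t) = A \exp\left\{-\frac{(t - s_0)^2}{2b^2}\right\} = A \exp\left\{-\frac{(t - t_N)^2}{2b^2}\right\}$$

$$R(t) = N(t - T) = A \exp\left\{-\frac{(t - t_R)^2}{2b^2}\right\},$$

where $t_N$ is the peak of N(t), and $t_R = t_N + T$ is the peak of R(t). Both distributions are Gaussians.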

For $t_B < t < T$,

$$I(t) = \int_0^t X(t,s)\,ds + \int_t^T X(t,s)\,ds = \int_0^t f(s - t)\,ds + \int_t^T 0\,ds = \int_{-t}^{0} f(p)\,dp$$

$$\frac{d}{dt} I = f(-t) = A \exp\left[-\frac{(t - s_0)^2}{2b^2}\right].$$

$$N(t) = A \exp\left[-\frac{(t - t_N)^2}{2b^2}\right], \qquad R(t) = 0.$$

Again, N(t) is Gaussian, but there are no recovered or removed cases during this early stage.

Main Result: The natural logarithm of the ratio of N and R is a linear function of time for t > T:

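$$NR(t) = \frac{N(t)}{R(t)} = \frac{N(t)}{N(t - T)} = \exp\left\{-\frac{(t - t_N)^2}{2b^2} + \frac{(t - t_N - T)^2}{2b^2}\right\} = \exp\left\{\frac{T^2}{2b^2} - \frac{T(t - t_N)}{b^2}\right\}.$$

$$\log NR = \frac{-T(t - t_N) + \tfrac{1}{2}T^2}{b^2},$$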


a linear function of t. This relationship is important for the purpose of forecasting because it is easy to extrapolate a straight line into the future.

It intersects 0 at $t - t_N = \tfrac{1}{2}T$. This yields the turning point, when the NR ratio is 1 and therefore its logarithm is zero.

Result: The turning point, defined as the maximum of I(t), is given by $t_p = t_N + \tfrac{1}{2}T$.

Result: The slope of log NR is $-T/b^2 = -4/T$, that is, of magnitude 4/T.

When time is normalized by T, the derivative is given by:

$$\frac{d \log NR}{d(t/T)} = -\frac{T^2}{b^2} = -4, \quad \text{a dimensionless constant.}$$
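The two results above can be checked numerically. The sketch below assumes illustrative values for T, $t_N$ and A (none are taken from the data), builds N(t) and R(t) = N(t − T) with b = T/2, and compares the slope and zero crossing of log NR with −4/T and $t_N + T/2$. It also locates the maximum of the active infected number, approximated here as the running sum of N − R on a daily grid.

```python
import math

T = 14.0          # assumed mean recovery period in days (illustrative)
B_WIDTH = T / 2   # b = T/2, as in the single-source solution
T_N = 20.0        # assumed peak day of N(t) (illustrative)
A = 1000.0        # arbitrary amplitude

def n_new(t):
    """Daily newly infected: Gaussian centred at T_N."""
    return A * math.exp(-((t - T_N) ** 2) / (2 * B_WIDTH**2))

def r_new(t):
    """Daily newly recovered: N delayed by T."""
    return n_new(t - T)

def log_nr(t):
    """Natural logarithm of the NR ratio."""
    return math.log(n_new(t) / r_new(t))

# Slope of log NR from two arbitrary days (the relation is exactly linear here).
slope = (log_nr(30.0) - log_nr(10.0)) / 20.0
# Zero crossing of the straight line through (10, log_nr(10)).
t_zero = 10.0 - log_nr(10.0) / slope

# Active infected, approximated as cumulative N minus cumulative R.
active, total = [], 0.0
for day in range(80):
    total += n_new(day) - r_new(day)
    active.append(total)
peak_day = max(range(len(active)), key=active.__getitem__)

print(slope, -4 / T)
print(t_zero, T_N + T / 2)
print(peak_day)
```

On the daily grid the maximum of the running sum can only be located to within about a day of the zero crossing.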

Heterogeneous Data

The above results are obtained for the case of a single introduction of infected into a region at t = 0, and we solve for the subsequent development of the epidemic from that single source. Consider now a large region consisting of a number of small regions, where the “seeding” of the infected occurs at different times for different regions. The large region could be China, and the first infection could be in Wuhan, Hubei, and then the regions outside Hubei. Then, for China as a whole, the data for the newly infected may be a sum of several Gaussians staggered in time. As long as the Gaussians are not separated so much that there are different peaks in the combined data, the combined data can still be considered as Gaussian, as is the case in the real data. However, the standard deviation σ of the combined Gaussian is inevitably larger and is no longer given by b:

$$N(t) = \frac{B}{\sqrt{2\pi}\,\sigma_N} \exp\left\{-\frac{(t - t_N)^2}{2\sigma_N^2}\right\}.$$
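The widening can be illustrated with a short sketch. It sums three staggered Gaussians of common width b, with hypothetical weights and peak days, and compares the standard deviation of the combined curve with the value given by the law of total variance for a mixture. It also counts local maxima to confirm that the combined curve remains single-peaked for this choice of staggering.

```python
import math

B_WIDTH = 7.0  # common width b of each sub-region (assumed)
# (weight, peak day) for each sub-region; hypothetical values.
SEEDS = [(1.0, 30.0), (0.6, 34.0), (0.4, 38.0)]

def combined(t):
    """Sum of staggered Gaussians, one per sub-region."""
    return sum(w * math.exp(-((t - c) ** 2) / (2 * B_WIDTH**2)) for w, c in SEEDS)

days = [d * 0.1 for d in range(-500, 1500)]  # fine grid from day -50 to day 150
vals = [combined(t) for t in days]

# Moments of the combined curve, treated as a distribution in time.
mass = sum(vals)
mean = sum(t * v for t, v in zip(days, vals)) / mass
sigma = math.sqrt(sum((t - mean) ** 2 * v for t, v in zip(days, vals)) / mass)

# Law of total variance for a mixture with equal component widths.
w_sum = sum(w for w, _ in SEEDS)
c_mean = sum(w * c for w, c in SEEDS) / w_sum
spread = sum(w * (c - c_mean) ** 2 for w, c in SEEDS) / w_sum
sigma_theory = math.sqrt(B_WIDTH**2 + spread)

# Count local maxima of the combined curve.
n_peaks = sum(
    1 for i in range(1, len(vals) - 1) if vals[i - 1] < vals[i] >= vals[i + 1]
)

print(round(sigma, 2), round(sigma_theory, 2), n_peaks)
```

The combined variance is b² plus the weighted variance of the peak days, so any staggering in time makes σ larger than b.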

We still have R(t) = N(t − T), since this result holds for each sub-region. The result that log NR is a linear function of time still holds:

$$\log NR(t) = \log\frac{N(t)}{N(t - T)} = \frac{-T(t - t_N) + \tfrac{1}{2}T^2}{\sigma_N^2}.$$

The slope of the straight line is $-T/\sigma_N^2$.

Since the hospital stay can act as a smoothing filter on N(t) to yield R(t), the standard deviation for R(t) could be slightly wider than that for N(t). So we could have two different Gaussians (but their integral over all time should be the same):

$$N(t) = \frac{B}{\sqrt{2\pi}\,\sigma_N} \exp\left\{-\frac{(t - t_N)^2}{2\sigma_N^2}\right\}; \qquad R(t) = \frac{B}{\sqrt{2\pi}\,\sigma_R} \exp\left\{-\frac{(t - t_R)^2}{2\sigma_R^2}\right\}.$$

Taking this into account, we have, denoting $T = t_R - t_N$:


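$$\log NR - \log\frac{\sigma_R}{\sigma_N} = -\left\{\frac{(t - t_N)^2}{2\sigma_N^2} - \frac{(t - t_N - T)^2}{2\sigma_R^2}\right\} = -\frac{(\sigma_R^2 - \sigma_N^2)(t - t_N)^2 + 2\sigma_N^2 T (t - t_N) - \sigma_N^2 T^2}{2\sigma_N^2 \sigma_R^2}$$

As the values of $\sigma_N$ and $\sigma_R$ are very close based on the empirical data, the quadratic term is always small compared to the other terms for the length of time we are considering here. Hence,

$$\log NR(t) = \frac{1}{\sigma_R^2}\left\{-T(t - t_N) + \tfrac{1}{2}T^2\right\},$$

a linear function of time. Its slope is $-T/\sigma_R^2$.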
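The size of the neglected quadratic term can be checked directly. The sketch below uses the values of T, $\sigma_N$ and $\sigma_R$ listed for China as a whole in Table S1 and evaluates the exact expression, the linear approximation and the quadratic term at a few values of $t - t_N$. The function names are illustrative.

```python
# Values listed for China as a whole in Table S1 (days).
T, SIGMA_N, SIGMA_R = 13.0, 7.5, 7.7

def exact(x):
    """log NR - log(sigma_R / sigma_N) at x = t - t_N, from the two Gaussians."""
    return -(x**2) / (2 * SIGMA_N**2) + (x - T) ** 2 / (2 * SIGMA_R**2)

def linear(x):
    """The same expression with the quadratic term dropped."""
    return (-T * x + 0.5 * T**2) / SIGMA_R**2

def quadratic_term(x):
    """The term proportional to (sigma_R^2 - sigma_N^2) (t - t_N)^2."""
    return -(SIGMA_R**2 - SIGMA_N**2) * x**2 / (2 * SIGMA_N**2 * SIGMA_R**2)

for x in (-10, 0, 10, 20):
    print(x, round(exact(x), 3), round(linear(x), 3), round(quadratic_term(x), 3))
```

For these values the quadratic term stays well below the linear part except near the zero crossing, where the linear part itself vanishes.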

Result: The natural logarithm of the ratio of two Gaussians of slightly different standard deviation is approximately a straight line.

Validation of log NR as a straight line:

From data we use the reported newly confirmed case number and the recovered case number to define the NR ratio as

$$NR(t) = N(t)/R(t).$$

At $t_p$, NR = 1.

We show in Figure 2, using the data of the COVID-19 epidemic, that the logarithm of NR(t) lies on a straight line, with small scatter, passing through the turning point $t_p$. Data for the various stages of the epidemic, from the initial exponential growth stage, to near the peak of EIC, and then past the peak, all lie on the same straight line. The intercept with log NR = 0 yields the turning point. This line, obtained by a linear least-squares fit in the semi-log plot, is little affected by the rather large artificial spike in the data on 12 February because of its short duration and the logarithmic value. That reporting problem is necessarily of short duration because, on the date of the definition change, the previous week's cases of infected according to the new criteria were reported in one day. After that, the books were cleared, and N(t) returned to its normal range.

The theoretical result suggests that the slope of the line is $-T/\sigma_R^2$, where $\sigma_R$ is the standard deviation of the R(t) profile. In general, the slope can be different for different regions with different levels of quarantine and epidemic characteristics. The hospital treatment efficacy would influence T directly. The effect of quarantine would influence the value of $\sigma_N$, the standard deviation of the newly infected, and so indirectly R(t) and $\sigma_R$. Our empirical result from Fig. 2, however, shows that the slope is almost the same for different regions in China, implying that efficacy of treatment and level of quarantine affect T and $\sigma^2$ proportionally.

Validation of the slope of log N(t) and log R(t)

Interestingly, the derivative of log N(t) or log R(t) also lies on a straight line, as shown in Fig. 3 (although the scatter is larger, as is to be expected for any differentiation of empirical data). The positive and negative outliers one day before and after 12 Feb are caused by the spike up and then down, with little effect on the fitted linear trend (although they increase its variance and therefore its uncertainty). Moreover, the straight line extends without appreciable change in slope beyond the peak of N(t), suggesting that the distribution of the newly infected number is approximately Gaussian. The mean recovery time T can be predicted as $t_R - t_N$, where $t_R$ is the peak of R(t) and $t_N$ is the peak of N(t). These two peak times can be obtained by extending the straight line in Fig. 3 to intersect the zero line. This predicted result can be verified statistically after the fact by the lagged correlation of R(t) and N(t). If the distribution is indeed Gaussian or even approximately so, the slope in Fig. 3 would be proportional to the reciprocal of the square of its standard deviation, σ, as:

$$\frac{d \log N(t)}{dt} = -\frac{(t - t_N)}{\sigma_N^2}.$$
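A minimal sketch of this slope-based estimate follows. It takes the centred difference of the logarithm of the daily counts, fits a straight line by ordinary least squares, and reads the peak time from the zero crossing and the standard deviation from the slope. The data are synthetic Gaussians with hypothetical parameters, truncated before either peak, and the helper names are assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def peak_and_sigma(daily):
    """Estimate (peak day, sigma) from the centred difference of log(daily)."""
    logs = [math.log(v) for v in daily]
    ts = list(range(1, len(daily) - 1))
    dlog = [(logs[t + 1] - logs[t - 1]) / 2 for t in ts]
    slope, intercept = fit_line(ts, dlog)
    # d log N / dt = -(t - t_N) / sigma^2: slope = -1 / sigma^2, zero at t_N.
    return -intercept / slope, math.sqrt(-1 / slope)

def gaussian(t, peak, sigma):
    """Synthetic daily count with an arbitrary amplitude."""
    return 1000 * math.exp(-((t - peak) ** 2) / (2 * sigma**2))

# Synthetic illustration only: 20 days of data, all before either peak.
N = [gaussian(t, 25.0, 7.5) for t in range(20)]
R = [gaussian(t, 38.0, 7.7) for t in range(20)]

t_n, sigma_n = peak_and_sigma(N)
t_r, sigma_r = peak_and_sigma(R)
print(round(t_n, 2), round(t_r, 2), round(t_r - t_n, 2))
```

Reported counts are noisy and can be zero on some days, so in practice the logarithm needs guarding and the fitted line carries an uncertainty that this noiseless example does not show.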

A similar result holds for the daily number of recovered, R(t).

After the epidemic is nearing the end, as is the case in China, fitting the data to a Gaussian can be done after the fact (see Figures S3 and S4). The fit is satisfactory even without using any disposable parameters. The parameters used are determined using the slopes of log N and log R (see Table S1).

The inferred statistical characteristics of the Covid-19 epidemic are summarized in Table S1 for various regions. The mean recovery time, T, is about 13 days for China as a whole. For Wuhan, the city at the epicenter, whose hospitals were more overwhelmed and whose admitted patients were more seriously ill than those in other provinces, T ~ 16 days, while that for Hubei is 14 days. The standard deviation, σ, is found to be around 8 days, with a slight difference between that for N(t) and for R(t), with one exception for Hubei outside Wuhan. Such a fine subdivision may not be practical for the data quality we have. The σ tends to be smaller for China as a whole than for Wuhan. One can see that T and $\sigma^2$ indeed vary approximately in proportion.


Figure S1. The daily newly infected (in blue) and the daily newly recovered (in red), as a function of time for China as a whole (in solid lines) and Hubei (in dotted lines). The turning point is determined by when the red and blue curves cross. Inset: For China outside Hubei.

Figure S2. Lagged correlation of R(t) with N(t) for Hubei province.


Figure S3. Gaussian fit of N(t) and R(t), for China as a whole.

Figure S4. Gaussian fit of N(t) and R(t), for Hubei Province.

Region       Crossing   t0     t1     T (days)   Sigma N (days)   Sigma R (days)
China        2/18       2/11   2/24   13         7.5              7.7
Hubei        2/20       2/12   2/26   14         7.9              8.5
Wuhan        2/21       2/14   3/01   16         8.8              8.8
C exHubei    2/12       2/08   2/21   13         7.5              7.1
H exWuhan    2/16       2/13   2/27   14         5.0              8.8

Table S1: Statistical characteristics of the COVID-19 epidemic in different regions in China inferred from data, for N(t), the daily number of newly infected, and for R(t), the daily number of recovered.

                China                 Hubei                 China-Hubei           Hubei-Wuhan
Truth (data)    18                    19                    12                    15
NR Ratio        20.3±1.6 (Feb 20th)   22.3±1.0 (Feb 22nd)   12.4±0.9 (Feb 12th)   16.0±1.2 (Feb 16th)

Table S2: Predicted turning point dates. Shown are the mean and standard deviation of the predictions over the prediction period, using the NR ratio method.
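The entries of Table S1 can be cross-checked with a few lines of Python. The sketch below assumes that t0 and t1 are the peak dates of N(t) and R(t) in 2020, which is an interpretation of the column headers rather than something the table states. It compares t1 − t0 with the tabulated T and evaluates $T/\sigma_R^2$, the magnitude of the log NR slope, for each region.

```python
from datetime import date

# Rows transcribed from Table S1: (region, t0, t1, T, sigma_R).
# Dates are (month, day); the year 2020 is assumed.
TABLE_S1 = [
    ("China",     (2, 11), (2, 24), 13, 7.7),
    ("Hubei",     (2, 12), (2, 26), 14, 8.5),
    ("Wuhan",     (2, 14), (3, 1),  16, 8.8),
    ("C exHubei", (2, 8),  (2, 21), 13, 7.1),
    ("H exWuhan", (2, 13), (2, 27), 14, 8.8),
]

def days_between(start, end, year=2020):
    """Whole days from start to end, both given as (month, day)."""
    return (date(year, *end) - date(year, *start)).days

checks = []
for region, t0, t1, t_rec, sigma_r in TABLE_S1:
    lag = days_between(t0, t1)      # assumed to equal T = t_R - t_N
    ratio = t_rec / sigma_r**2      # magnitude of the log NR slope
    checks.append((region, lag, t_rec, ratio))
    print(f"{region:10s} t1 - t0 = {lag:2d} days, T = {t_rec:2d}, "
          f"T/sigma_R^2 = {ratio:.3f}")
```

Under this reading the date differences reproduce the tabulated T for all five regions, and $T/\sigma_R^2$ ranges from roughly 0.18 to 0.26 per day.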

Figure S5. Prediction of the turning point of EIC using linear least-squares trends with various data lengths for China ex Hubei. All data used start from 24 January. Different colored straight lines show the linear trend calculated from 24 January to a particular date. The spread is over a very small range. These trends are then extrapolated (extrapolations not shown) to intersect the zero line to yield a prediction for the turning point. The blue dots are the data.
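The expanding-window procedure described in this caption, whose spread is what Table S2 summarizes as a mean and standard deviation, can be sketched as follows. The series is synthetic: the theoretical straight line for log NR with hypothetical parameters, plus a small deterministic oscillation standing in for reporting noise. Nothing here uses the reported case counts.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

# Hypothetical parameters (days); day 0 is the first day of the record.
T, SIGMA_R, T_N = 13.0, 7.7, 18.0

def log_nr(t):
    """Straight line for log NR plus a small oscillation mimicking noise."""
    line = (-T * (t - T_N) + 0.5 * T**2) / SIGMA_R**2
    return line + 0.1 * math.sin(1.7 * t)

days = list(range(20))
series = [log_nr(t) for t in days]

# Refit with progressively longer records, all starting on day 0,
# and extrapolate each trend to log NR = 0.
predictions = []
for end in range(8, 20):
    slope, intercept = fit_line(days[: end + 1], series[: end + 1])
    predictions.append(-intercept / slope)

mean_tp = statistics.mean(predictions)
std_tp = statistics.stdev(predictions)
print(round(mean_tp, 1), round(std_tp, 1), T_N + T / 2)
```

The noise-free turning point in this example is $t_N + T/2$; the spread of the window-by-window predictions around it gives a rough sense of how stable the extrapolation is as more data arrive.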


Figure S6: The available data from South Korea (as of March 7th). The sporadic recovered case numbers are mostly in the single digits. If we match the sudden increase of recovered cases with the sudden explosive increase of new infected, the distance is approximately 14 days, a reasonable T value when compared to the mean value in China. For our data analysis, we used daily new cases starting February 19th for the study of the derivative of the individual distributions; we used case data from March 1st for the NR ratio study, in order to have enough recovered cases.


Figure S7: The derivative of the logarithmic value of the daily new infected case distribution.

Figure S8: The NR ratio from 7 days of data from March 1st to 7th. The estimated zero-crossing time would occur between March 11th and 12th, a value consistent with the statistics from the daily new case distribution on March 10th.
