

Peer Review File

Manuscript Title: Reconstruction of the full transmission dynamics of COVID-19 in

Wuhan

Redactions – Mention of other journals

This document only contains reviewer comments, rebuttal and decision letters for versions

considered at Nature. Mentions of the other journal have been redacted.

Reviewer Comments & Author Rebuttals

Reviewer Reports on the Initial Version:

Referee #1 (Remarks to the Author):

Thank you for the opportunity to review this work. I have a few comments:

1. I think my major comment relates to overdispersion of R0 in COVID-19. Such

overdispersion, which is reflected in both "dead end" cases that don't transmit, and

superspreader events, is an important manifestation of this disease, and one which may

make it challenging to control.

It seems as though there might be room in the stochastic simulations on re-emergence to

factor in an overdispersed R0. That in turn might provide some insights into how "luck" and

superspreading may result in very different outcomes, from stochastic extinction to

explosive resurgent epidemics.

2. Lines 3-15 really do reflect official narratives related to COVID-19 emergence; some

might question these. There should be references for these statements, or this section

should be revised.

3. Page 5, line 16: "For example, misspecification of the infectious period by 25% lower

would lead to ~20% overestimation in the transmission rate but only ~10%

underestimation in 𝑅0, both for the first period". It is predictable that lower serial interval

would result in lower R0 estimation. This sentence is misleading, as the contribution of R0

to growth is exponential, so these changes aren't proportionate on a linear scale.

4. The authors may wish to note that their estimates of unrecognized case fraction are

consistent with those seen in many emerging serological studies, including those completed

this week in New York, California and Switzerland.

Referee #2 (Remarks to the Author):


I enjoyed reading this paper. It is the first study that incorporates our most up-to-date

understanding of the epidemiology of the novel coronavirus including the role of subclinical

and asymptomatic infections on the transmission dynamics together with the

comprehensive dataset of confirmed cases from Wuhan, the pandemic's epicenter. Overall,

their analysis and results are sound. It highlights the large R0 from the epicenter and the

role of undetected infections, and the consequences on the dynamics and control of the

epidemic. I have a few comments:

1. Some of the estimates reported should be accompanied by confidence intervals.

2. Authors compare R0 estimates derived from different areas. Their references need to be

updated to account for more recent studies including estimates from Hong Kong, Singapore,

and Korea.

3. Strictly speaking R0 applies to the early growth phase of the epidemic in the absence of

interventions. Some reproduction number estimates from some settings are influenced by

the early onset of social distancing interventions. This is the case for countries like Taiwan,

Singapore, Korea, and Hong Kong. I do not think this was made sufficiently clear in the

discussion.

Referee #3 (Remarks to the Author):


Overall this is a very interesting analysis that attempts to understand the effect that Wuhan

interventions had on the course of the epidemic, as well as how these dynamics might

influence the course of the epidemic once interventions are lifted. The analysis is predicated

on a simplistic epidemiological model that appears to lack known and important features

related to COVID-19 transmission dynamics, and there may be key issues with the current

statistical fitting approach. These major issues are outlined below, and the results will be

significant, if they can be properly addressed. We also would like to take the time to thank

the authors for the rigorous sensitivity analysis to explore possible biases, as well as for

making their code and data available. These resources were extremely important for helping

us properly understand the methodology the authors used.

Major Concerns

1. Data

a. From ref.1, we see that COVID-19 case definitions changed multiple times,

dramatically impacting the observed Wuhan epidemic curve. Seven

different case definitions for COVID-19 were used in Wuhan throughout the


Jan-March 2020 study period. Each revision changed the case total, for

example cases increased by 7.1 times from version 1 to 2, 2.8 times from

version 2 to 4, and 4.2 times from version 4 to 5. The authors should more

explicitly consider these case definitions in their analysis, as they will have

dramatic repercussions for the results of the analysis.

2. Parameterization

a. The authors use a somewhat extended incubation period where there is

no transmission (5.2 days). This period of time before someone is

infectious is known to be shorter now, and important for epidemiological

dynamics (refs. 2,3). Either the authors should include a non-transmitting

incubation period that is short, followed by a pre-symptomatic

infectiousness period, or reformulate their model in a different appropriate

manner.

b. The infectious period is extremely short (2.3 days) compared to other

estimates (ref. 2). The authors attempt to fit the proper serial interval by

combining the incubation period and the infectious period, however recent

studies suggest a shorter incubation period and longer period of

infectiousness than the authors assume. This is important in the analysis,

because the infectious period of individuals connects the effects of 1 period

being analyzed on the other periods, so changes to this duration could

influence the resulting epidemic dynamics and estimated R0.

c. The ratio of transmission rate between ascertained and unascertained cases is

assumed to be 1 in the main scenario. There is now fairly significant

evidence that this is likely not to be the case (refs. 3,4). I appreciate your

sensitivity analyses in S5 and S6 for this, but it further highlights the issue,

because of the large effect that this parameter can have on the overall

results. I would suggest choosing the value for this parameter in a more

systematic way.

3. Model formulation

a. It is unclear why the authors are modeling a hospitalized compartment.

Individuals move from the infectious compartment to being hospitalized at

a rate that is set based on half the time between symptom onset and the

case reporting date. It appears the hospitalization compartment is more

similar to a quarantine compartment than hospitalized. This has a major

impact on transmission dynamics, because in the final time period

considered, the modeled infectious period has a duration of roughly 1.4

days, with a 69% chance of ascertained cases being hospitalized. Is this

suggested by the data, or is there a specific reason the authors model

hospitalized individuals? Also, it would be helpful if the authors could

explain why this time period is determined to be 50% the median time


between symptom onset and reporting date.

b. People transition from the exposed compartment (E) to ascertained or

unascertained infectious groups. This assumes that cases are reported right

away and doesn’t allow for changes in how quickly cases are detected nor

how the infectiousness of reported cases might change over time -- as

quarantine and isolation improves, ascertained cases may be unlikely to

transmit. Maybe this is contained within the speed in which people become

hospitalized mentioned above, but if the authors have data on the timing of

symptom onset and the timing of detection then it would seem cleaner to

model these dynamics more explicitly.

4. Fitting methodology

a. One of the main concerns from the paper is that it doesn’t seem that the

authors may have the necessary data or power to detect the true

ascertainment rates for each of the five periods. The ascertainment rate is

basically the reporting rate for the number of cases, and the authors are

fitting to the number of cases. In such a situation, one could imagine a

number of combinations of transmission rates and reporting rates that

could give rise to similar detected epidemic trajectories. It’s unclear if this

is the case for this paper, and the authors should do more to ensure

proper fitting. For example there is high correlation between the MCMC

sampling of a number of important parameters (Figure 1), which should be

resolved to ensure proper sampling for the posterior distributions of these

parameters.

b. One other consideration is the simulation study the authors propose to

validate their fitting procedure. We greatly appreciate the authors’

determination to validate their fitting procedure, but would request that the

simulated data be generated in a slightly more complex fashion.

Specifically, the authors could run the same study with a different

ascertainment rate in the second period. It is also fairly clear that these

parameter estimates can be dramatically impacted if parameters are

misspecified (Extended Data Fig. 2). We recommend running a similar

analysis to better understand the impact that misspecification of Dq and

alpha might have on eventual fits.


Figure 1: Pairwise scatterplot from traceplots of a single MCMC chain, showing extremely

high correlation between five of the estimated parameters.

Refs.

1. Tsang, T. K. et al. Effect of changing case definitions for COVID-19 on the epidemic

curve and transmission parameters in mainland China: a modelling study. Lancet

Public Health (2020) doi:10.1016/S2468-2667(20)30089-X.

2. He, X. et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat.

Med. (2020) doi:10.1038/s41591-020-0869-5.

3. Ferretti, L. et al. Quantifying SARS-CoV-2 transmission suggests epidemic control

with digital contact tracing. Science (2020) doi:10.1126/science.abb6936.

4. Liu, Y. et al. Viral dynamics in mild and severe cases of COVID-19. Lancet Infect. Dis.

(2020) doi:10.1016/S1473-3099(20)30232-2.


Author Rebuttals to Initial Comments:

Referee #1 (Remarks to the Author)

Thank you for the opportunity to review this work. I have a few comments:

1. I think my major comment relates to overdispersion of R0 in COVID-19.

Such overdispersion, which is reflected in both "dead end" cases that don't

transmit, and superspreader events, is an important manifestation of this

disease, and one which may make it challenging to control.

It seems as though there might be room in the stochastic simulations on

re-emergence to factor in an overdispersed R0. That in turn might provide some insights

into how "luck" and superspreading may result in very different outcomes, from

stochastic extinction to explosive resurgent epidemics.

Response: We thank the reviewer for this insightful comment. As the reviewer

pointed out, superspreading events would lead to overdispersion of the number of

secondary cases infected by each primary case (denoted as ν) and would influence

the probability of resurgence due to stochastic effects. This has been discussed in

detail by Lloyd-Smith et al. (Nature 2005, Superspreading and the effect of individual

variation on disease emergence). Our model assumes homogeneous transmission

and thus ν is expected to follow an exponential distribution. Unlike the

branching process model used by Lloyd-Smith et al. (Nature 2005), we used a

population-based differential-equation model, in which it is difficult to account for

heterogeneous individual effects. We have therefore added a discussion of the impact

of superspreading events and acknowledged this as a limitation of our analysis (page 10

lines 27-29). On the other hand, we expect superspreading events to have a small

impact on our estimates of the population-average R0 and the ascertainment rates,

because our model fit the observed data well and these estimates were derived

from an established outbreak, in which stochastic fluctuations caused by superspreading

events were less pronounced.
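
Illustrative sketch (not part of the authors' SAPHIRE analysis): the effect of overdispersion on stochastic extinction discussed above can be explored with a simple branching process of the kind used by Lloyd-Smith et al., in which each case generates a negative-binomially distributed number of secondary cases with mean R0 and dispersion parameter k. The function name, the values R0 = 2 and k = 0.1, 1, 10, and the 1,000-case threshold for calling an outbreak established are all assumptions chosen for illustration only.

```python
import numpy as np

def extinction_fraction(r0, k, n_runs=2000, max_cases=1000, seed=0):
    """Fraction of outbreaks seeded by a single case that die out.

    Offspring counts are negative binomial with mean r0 and dispersion k
    (hypothetical values; smaller k means stronger overdispersion).
    """
    rng = np.random.default_rng(seed)
    p = k / (k + r0)  # numpy's success probability that yields mean r0
    extinct = 0
    for _ in range(n_runs):
        cases = 1
        # Stop when the chain dies out or is large enough to be "established".
        while 0 < cases < max_cases:
            # The sum of `cases` i.i.d. NegBin(k, p) draws is NegBin(k * cases, p).
            cases = rng.negative_binomial(k * cases, p)
        extinct += int(cases == 0)
    return extinct / n_runs

if __name__ == "__main__":
    for k in (0.1, 1.0, 10.0):
        frac = extinction_fraction(2.0, k)
        print(f"k = {k:>4}: extinction fraction = {frac:.3f}")
```

For a fixed R0, a smaller k (stronger overdispersion) should give a higher extinction fraction, which is the role of "luck" that the referee describes.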

2. Lines 3-15 really do reflect official narratives related to COVID-19

emergence; some might question these. There should be references for these

statements, or this section should be revised.

Response: Thank you for the suggestion. We have added references to the statements.

3. Page 5, line 16: "For example, misspecification of the infectious period by

25% lower would lead to ~20% overestimation in the transmission rate but

only ~10% underestimation in 𝑅0, both for the first period". It is predictable

that lower serial interval would result in lower R0 estimation. This sentence is

misleading, as the contribution of R0 to growth is exponential, so these


changes aren't proportionate on a linear scale.

Response: Thank you for the suggestion. We have now removed the misleading

sentence and added more detailed discussion on the simulation results (see page 5

lines 10-16).

4. The authors may wish to note that their estimates of unrecognized case

fraction are consistent with those seen in many emerging serological studies,

including those completed this week in New York, California and Switzerland.

Response: Thank you for the nice suggestion. We noted the findings from recent

serological studies in the discussion (page 9 lines 6-8).

Referee #2 (Remarks to the Author)


I enjoyed reading this paper. It is the first study that incorporates our most

up-to-date understanding of the epidemiology of the novel coronavirus including the role of

subclinical and asymptomatic infections on the transmission dynamics together with

the comprehensive dataset of confirmed cases from Wuhan, the pandemic's

epicenter. Overall, their analysis and results are sound. It highlights the large R0 from

the epicenter and the role of undetected infections, and the consequences on the

dynamics and control of the epidemic. I have a few comments:

Response: We appreciate the highly positive comments on our work.

1. Some of the estimates reported should be accompanied by confidence

intervals.

Response: Thank you for the suggestion. We have included 95% credible intervals

for all our estimates except for the probability of resurgence. We calculated the

probability as the proportion of resurgence out of n=10,000 stochastic simulations,

similar to the approach by Lloyd-Smith et al. (Nature 2005, Superspreading and the

effect of individual variation on disease emergence). As n was large, our estimates of

the probability of resurgence approached the theoretical values with very tight 95% CI.

We thus did not report 95% CI for the probability of resurgence, as was done by

Lloyd-Smith et al. (Nature 2005).
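
As a rough check on the statement that n = 10,000 simulations give a very tight interval, the sketch below computes the half-width of a normal-approximation (Wald) 95% interval for a simulated proportion. The Wald formula is a swapped-in textbook approximation, not something the authors report using, and it is unreliable for proportions very close to 0 or 1; the function name is hypothetical.

```python
import math

def wald_half_width(p_hat, n, z=1.96):
    """Half-width of the normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

n = 10_000  # number of stochastic simulations stated in the response
worst_case = wald_half_width(0.5, n)  # p = 0.5 maximizes the width

print(f"widest 95% interval: +/- {worst_case:.4f}")
```

At the widest point (p = 0.5) the half-width is about 0.01, so with n = 10,000 the Monte Carlo error in the estimated probability is at most roughly one percentage point.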

2. Authors compare R0 estimates derived from different areas. Their

references need to be updated to account for more recent studies including

estimates from Hong Kong, Singapore, and Korea.

Response: Thank you for the suggestion. We did not compare R0 from different

areas, because R0 is expected to be different across regions due to different

demographic features and intervention strengths. For example, Singapore started

strong surveillance from early January, three weeks before the first imported case

was identified, and thus had much smaller R0 than Wuhan. Following your


suggestion, we compared the estimated R0 from different studies of the early

outbreak in Wuhan. We have added clarification on this point on page 9 lines 20-25.

3. Strictly speaking R0 applies to the early growth phase of the epidemic in the

absence of interventions. Some reproduction number estimates from some

settings are influenced by the early onset of social distancing interventions.

This is the case for countries like Taiwan, Singapore, Korea, and Hong Kong. I

do not think this was made sufficiently clear in the discussion.

Response: We apologize for the confusion. Please see our response to the comment

2 above – we did not compare R0 across regions for exactly the reason pointed out by

the reviewer. We have now clarified in the Discussion that our comparison was for the

basic reproduction number before interventions in Wuhan (page 9 lines 20-25).

Referee #3 (Remarks to the Author)


Overall this is a very interesting analysis that attempts to understand the effect that

Wuhan interventions had on the course of the epidemic, as well as how these

dynamics might influence the course of the epidemic once interventions are lifted. The

analysis is predicated on a simplistic epidemiological model that appears to lack

known and important features related to COVID-19 transmission dynamics, and there

may be key issues with the current statistical fitting approach. These major issues are

outlined below, and the results will be significant, if they can be properly addressed.

We also would like to take the time to thank the authors for the rigorous sensitivity

analysis to explore possible biases, as well as for making their code and data

available. These resources were extremely important for helping us properly

understand the methodology the authors used.

Response: We thank the reviewer for appreciating the importance of our work, and

the detailed and constructive comments. We have now substantially revised the

paper in response to your comments. Please see our point-by-point response below.

Major Concerns

1. Data

a. From ref.1, we see that COVID-19 case definitions changed multiple times,

dramatically impacting the observed Wuhan epidemic curve. Seven different case

definitions for COVID-19 were used in Wuhan throughout the Jan-March 2020

study period. Each revision changed the case total, for example cases increased by

7.1 times from version 1 to 2, 2.8 times from version 2 to 4, and 4.2 times

from version 4 to 5. The authors should more explicitly consider these

case definitions in their analysis, as they will have dramatic repercussions


for the results of the analysis.

Response: Yes, the case definition has changed multiple times as pointed out

by the reviewer. However, we believe our analysis should be robust because:

(1) We have explicitly modeled the ascertainment rate, which was

allowed to vary across different time periods;

(2) Based on key intervention events, we defined five time periods, which

turned out to largely align with the timeline of case definitions (period

2 corresponds to versions 1 and 2; period 3 corresponds to versions

3 and 4; period 4 corresponds to version 5; and period 5

corresponds to versions 6 and 7);

(3) We excluded clinically diagnosed cases without laboratory

confirmation (introduced by version 5), because many patients with flu

or other pneumonia displayed symptoms similar to COVID-19, which

could lead to a higher false-positive rate without laboratory

confirmation.

Therefore, we believe the impact of case definitions should have been largely

addressed by excluding clinically diagnosed patients and explicitly modeling

different ascertainment rates across time periods. In fact, variation in our

estimates of ascertainment rate should reflect a combined effect of the evolving

surveillance, interventions, medical resources, and case definitions across time

periods, which are difficult to separate from each other. We have added this

point in the Discussion (page 10 lines 19-24).

2. Parameterization

a. The authors use a somewhat extended incubation period where there

is no transmission (5.2 days). This period of time before someone is

infectious is known to be shorter now, and important for epidemiological

dynamics (refs. 2,3). Either the authors should include a non-transmitting

incubation period that is short, followed by a presymptomatic

infectiousness period, or reformulate their model in a different appropriate

manner.

Response: This is a very good point. We developed this model to analyze

Wuhan’s data in February when very little was known about COVID-19. As we

have more information now, the model should be updated accordingly. Following

the reviewer’s suggestion, we have incorporated a presymptomatic infectious

period in the revised model, as displayed in Figure R1 below. The parameters,

such as latent period, presymptomatic infectious period, and total infectious

period were set based on the most recent study pointed out by the reviewer (He

et al. [Redacted] 2020, Temporal dynamics in viral shedding and transmissibility

of COVID-19). Briefly, we set the latent period to 2.9 days, presymptomatic

infectious period to 2.3 days, and symptomatic infectious period to 2.9 days in

the main analysis, followed by a series of sensitivity analyses based on the 95%

credible intervals of these quantities reported in the literature. (Figure 1; page 4

lines 13-21; page 18 line 30, page 19 lines 1-26; page 20 lines 3-19; page 20


lines 25-26; page 21 lines 21-29)

Figure R1. Illustration of the SAPHIRE model. We extended the classic SEIR

model to include seven compartments, namely S (susceptible), E (exposed), P

(presymptomatic infectious), I (ascertained infectious), A (unascertained

infectious), H (isolated), and R (removed). (A) Relationship between different

compartments in the model. Two parameters of interest are r (ascertainment

rate) and b (transmission rate), which are assumed to be varying across time

periods. (B) Schematic timeline of an individual from being exposed to the virus

to recovery without isolation. In this model, the unascertained compartment A

includes asymptomatic and some mild symptomatic cases who were not

detected. While there is no presymptomatic phase for asymptomatic cases, we

treated asymptomatic as a special case of mild symptomatic and modeled both

with a “presymptomatic” phase for simplicity.
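
The compartment structure in the caption above can be sketched as a small system of differential equations. This is a minimal illustration under stated assumptions, not the authors' code: it ignores population movement in and out of the city, assumes presymptomatic (P) and unascertained (A) cases both transmit at the reduced rate alpha * b, uses forward Euler integration, and takes hypothetical values for b, r, Dq (onset to isolation) and Dh (duration of isolation). The values De = 2.9, Dp = 2.3, Di = 2.9 and alpha = 0.55 are those quoted elsewhere in this response. The function names are hypothetical.

```python
import numpy as np

def saphire_rhs(y, b, r, alpha=0.55, De=2.9, Dp=2.3, Di=2.9, Dq=10.0, Dh=30.0):
    """Right-hand side for S, E, P, I, A, H, R (durations in days)."""
    S, E, P, I, A, H, R = y
    N = y.sum()
    # Assumption: P and A transmit at alpha * b, ascertained I at b.
    new_infections = b * S * (alpha * (P + A) + I) / N
    dS = -new_infections
    dE = new_infections - E / De
    dP = E / De - P / Dp
    dI = r * P / Dp - I / Di - I / Dq        # ascertained: recover or get isolated
    dA = (1.0 - r) * P / Dp - A / Di         # unascertained: recover only
    dH = I / Dq - H / Dh                     # isolated cases no longer transmit
    dR = (I + A) / Di + H / Dh
    return np.array([dS, dE, dP, dI, dA, dH, dR])

def simulate(y0, days, b, r, dt=0.05, **params):
    """Forward-Euler integration; returns an array of shape (steps + 1, 7)."""
    y = np.asarray(y0, dtype=float)
    steps = int(round(days / dt))
    traj = np.empty((steps + 1, y.size))
    traj[0] = y
    for i in range(steps):
        y = y + dt * saphire_rhs(y, b, r, **params)
        traj[i + 1] = y
    return traj

if __name__ == "__main__":
    # Hypothetical initial state for a population of ten million.
    y0 = [1e7 - 100, 50, 20, 10, 20, 0, 0]
    traj = simulate(y0, days=30, b=1.0, r=0.2)
    print("final state (S, E, P, I, A, H, R):", np.round(traj[-1]))
```

Because every outflow from one compartment appears as an inflow to another, the total population is conserved, which is a convenient sanity check on any implementation of the diagram.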

b. The infectious period is extremely short (2.3 days) compared to other

estimates (ref. 2). The authors attempt to fit the proper serial interval by

combining the incubation period and the infectious period, however recent

studies suggest a shorter incubation period and longer period of

infectiousness than the authors assume. This is important in the analysis,

because the infectious period of individuals connects the effects of 1

period being analyzed on the other periods, so changes to this duration

could influence the resulting epidemic dynamics and estimated R0.

Response: Please see our response to the comment 2a above. We have


revised the model to account for presymptomatic infectiousness and updated

the parameter settings. According to He et al. ([redacted] 2020, Temporal

dynamics in viral shedding and transmissibility of COVID-19), we set

presymptomatic infectious period to 2.3 days, and symptomatic infectious period

to 2.9 days in the main analysis, which sums to a total infectious period of 5.2

days. The total infectious period was determined based on the estimate that

presymptomatic infectiousness accounted for 44% of the total infections (page

20 lines 3-8).
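
The link between the 44% figure and the chosen durations can be checked with a line of arithmetic, under the simplifying assumption (made here for illustration only) that infectiousness per day is the same before and after symptom onset, so that the presymptomatic share of transmission equals the presymptomatic share of infectious time. The variable names are hypothetical.

```python
presym_period = 2.3   # presymptomatic infectious period in days (from the response)
presym_share = 0.44   # share of transmission before symptom onset (from the response)

# Assumption: constant infectiousness per day, so share of time = share of transmission.
total_period = presym_period / presym_share
sym_period = total_period - presym_period

print(f"total infectious period ~ {total_period:.1f} days")
print(f"symptomatic infectious period ~ {sym_period:.1f} days")
```

Rounded to one decimal place, this reproduces the 5.2-day total and the 2.9-day symptomatic period stated above.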

c. The ratio of transmission rate between ascertained and unascertained

cases is assumed to be 1 in the main scenario. There is now fairly

significant evidence that this is likely not to be the case (refs. 3,4). I appreciate

your sensitivity analyses in S5 and S6 for this, but it further highlights the

issue, because of the large effect that this parameter can have on the

overall results. I would suggest choosing the value for this parameter in a

more systematic way.

Response: We agree with the reviewer that there is evidence

suggesting that mild cases are less infectious than severe cases. In addition to the

references pointed out by the reviewer, another study (Li et al. [REDACTED]

2020, Substantial undocumented infection facilitates the rapid dissemination of

novel coronavirus (SARS-CoV2)) has estimated that the transmissibility of

unascertained cases was 0.55 (95% CrI: 0.46-0.62) of the ascertained cases.

Accordingly, we have now set 𝛼 = 0.55 in the main analysis and 0.46 and 0.62 in

the sensitivity analyses. It is worth pointing out that our estimates of the effective

reproductive number R0 and the ascertainment rate r were robust to different

choices of 𝛼, as illustrated by both simulated data and sensitivity analyses.

(page 5 lines 6-7; page 20 lines 3-4; page 24 lines 5-8; Extended Data Tables

2, 4, 6; Extended Data Figs. 2, 7-8).

3. Model formulation

a. It is unclear why the authors are modeling a hospitalized compartment.

Individuals move from the infectious compartment to being hospitalized at

a rate that is set based on half the time between symptom onset and the

case reporting date. It appears the hospitalization compartment is more

similar to a quarantine compartment than hospitalized. This has a major

impact on transmission dynamics, because in the final time period

considered, the modeled infectious period has a duration of roughly 1.4

days, with a 69% chance of ascertained cases being hospitalized. Is this

suggested by the data, or is there a specific reason the authors model

hospitalized individuals? Also, it would be helpful if the authors could

explain why this time period is determined to be 50% the median time

between symptom onset and reporting date.

Response: We apologize for the confusion. We introduced compartment H to

distinguish ascertained cases and unascertained cases, considering that


ascertained cases would be isolated by hospitalization and thus had a shorter

effective infectious period. We have clarified by renaming compartment H as

“isolation” rather than “hospitalization” and explicitly stated our motivation (page

4 lines 19-21). We initially set the Dq to be 50% of the median time between

symptom onset and diagnosis date because we thought patients might have

been admitted to the hospital before confirmed diagnosis by laboratory test.

However, it might be more realistic to assume isolation upon confirmed

diagnosis. So, we have now changed Dq to be the median time between

symptom onset and diagnosis date (page 20 lines 8-11; Extended Data Table 2).

b. People transition from the exposed compartment (E) to ascertained or

unascertained infectious groups. This assumes that cases are reported

right away and doesn’t allow for changes in how quickly cases are

detected nor how the infectiousness of reported cases might change over

time -- as quarantine and isolation improves, ascertained cases may be

unlikely to transmit. Maybe this is contained within the speed in which

people become hospitalized mentioned above, but if the authors have data

on the timing of symptom onset and the timing of detection then it would

seem cleaner to model these dynamics more explicitly.

Response: We have now included a presymptomatic compartment (P) after the

exposed compartment (E) (see Figure R1). The ascertained and unascertained

cases in our data were defined retrospectively and we did not assume cases

were ascertained right after the onset of symptoms. Our analyses were based on

the date of symptom onset, which was collected retrospectively after confirmed

diagnosis. As the reviewer correctly pointed out, we have incorporated the effect

of isolation on ascertained cases by modeling compartment H and setting the

speed from compartment I to compartment H based on the time delay between

symptom onset and confirmed diagnosis, which became shorter as medical

resources improved (page 4 lines 19-21; page 20 lines 8-11).

4. Fitting methodology

a. One of the main concerns from the paper is that it doesn’t seem that the

authors may have the necessary data or power to detect the true

ascertainment rates for each of the five periods. The ascertainment rate is

basically the reporting rate for the number of cases, and the authors are

fitting to the number of cases. In such a situation, one could imagine a

number of combinations of transmission rates and reporting rates that

could give rise to similar detected epidemic trajectories. It’s unclear if this

is the case for this paper, and the authors should do more to ensure

proper fitting. For example, there is high correlation between the MCMC

sampling of a number of important parameters (Figure 1), which should be

resolved to ensure proper sampling for the posterior distributions of these

parameters.

Response: We have carefully considered this important comment. Based on


results from the extensive simulations (including new simulations added in this

revision), we believe our data and model can provide information for estimating

the effective reproduction number R0 and the ascertainment rates r across

different periods when the other model parameters are reasonably specified

(Extended Data Fig. 2). Our simulation results showed that the latent period De,

infectious periods Dp and Di, and the initial ascertainment rate r0 are key

parameters that can affect the results. In our real data analysis, we have

carefully specified the values of De, Dp and Di based on the most updated

literature suggested by the reviewer (page 20 lines 3-11), and performed

sensitivity analyses to test the robustness of our conclusions (page 23 lines 27-29;

page 24 lines 1-8).

For the initial ascertainment rate r0, we have added a new analysis based on

COVID-19 cases exported from Wuhan to Singapore to obtain a reasonable

estimate of the ascertainment rate in the early phase of the outbreak, and used

the point estimate and 95% CI to specify r0 in the main analysis and sensitivity

analyses, respectively (Extended Data Table 1; page 5 lines 20-22; page 18 lines

12-27; page 24 lines 9-12). We have also tested the theoretical limit by setting

r0=1 in a sensitivity analysis, which provides a lower bound estimate of the

proportion of unascertained cases (page 24 lines 13-14).

Regarding the correlation between MCMC samples of parameters, we have

made several technical improvements. First, we have incorporated the

information of ascertainment rate estimated from Singapore data into the prior

distribution of r12. Second, considering that ascertainment rates in different

periods are naturally correlated because the ending status of a period is the

starting status of the next period, we reparameterized r3, r4 and r5 as

logit(r3) = logit(r12) + δ3 (and analogously for r4 and r5, with δ4 and δ5),

where logit(r) = log(r / (1 − r)), and δ3, δ4, and δ5 were sampled from the prior of

N(0, 1). Under this new parameterization, we expect r12, δ3, δ4, and δ5 to be weakly

correlated. Third, we have replaced our original Metropolis-Hastings algorithm with a

more efficient Delayed Rejection Adaptive Metropolis (DRAM) algorithm implemented

in an established R package to ensure convergence and

proper MCMC sampling. See page 21 lines 1-11 for the changes.
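
The reparameterization described above can be illustrated with a few lines of code. This is a minimal sketch, not the authors' implementation (which uses an R package); the function names are hypothetical. As a worked example it uses the true values listed for the simulated data in Table R1 (r1 = 0.2, δ2 = 0.982), assuming by analogy that the simulated parameters follow the same form, logit(r2) = logit(r1) + δ2.

```python
import math

def logit(r):
    """Log-odds of a rate r in (0, 1)."""
    return math.log(r / (1.0 - r))

def expit(x):
    """Inverse of logit; maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def next_rate(r_prev, delta):
    """Ascertainment rate implied by logit(r_next) = logit(r_prev) + delta."""
    return expit(logit(r_prev) + delta)

# True values from Table R1 (simulated data): r1 = 0.2, delta2 = 0.982.
r2 = next_rate(0.2, 0.982)
print(f"implied r2 = {r2:.3f}")
```

The implied value comes out close to the r2 = 0.4 listed in Table R1. One practical benefit of sampling δ on the logit scale is that any real-valued δ maps back to a valid rate between 0 and 1, so the sampler never proposes an out-of-range ascertainment rate.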

The new sampling results are shown below (Figures R2 and R3). As we can see

from Figure R2, b12 and r12 are the only two parameters that remain highly

correlated. This, however, is expected because if we sample a higher transmission

rate b12, the total number of infections will become larger, so the ascertainment

rate r12 needs to be smaller in order to fit the observed data. Although b12 and r12 are

highly correlated, it is important to point out that their joint posterior distribution is

concentrated in a small region of the parameter space (Figure R2), suggesting we have

information to estimate these parameters.


In addition, as shown in Figure R3, the posterior distributions of 𝑟12, 𝛿3, 𝛿4, and 𝛿5

are much narrower than their prior distributions, indicating that we have gained

additional information for these parameters from the data. In our sensitivity analysis

S9, we found that the model without ascertainment rates fit the observed data

significantly worse than our full model with ascertainment rates (Extended Data Fig.

12, p-value = 0 based on likelihood ratio test), also suggesting that the data can

provide information about ascertainment rates.
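A likelihood ratio test of this kind can be sketched as follows. This is a minimal illustration using only the Python standard library. The log-likelihood values and the degrees of freedom are hypothetical placeholders, not numbers from the manuscript. It also shows how a very large test statistic can make the p-value underflow to exactly 0 in floating-point arithmetic.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q = math.erfc(math.sqrt(half))  # closed form for df = 1
        k = 1
    else:
        q = math.exp(-half)             # closed form for df = 2
        k = 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)

def likelihood_ratio_test(loglik_full: float, loglik_reduced: float, df: int):
    """Return (statistic, p-value) for nested models differing by df parameters."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods and df, chosen only to show the underflow.
stat, p_value = likelihood_ratio_test(-1200.0, -2100.0, df=4)
print(f"LR statistic = {stat:.1f}, p-value = {p_value}")
```

With a gap this large between the two log-likelihoods, the printed p-value is 0.0 because the tail probability is smaller than the smallest representable double, not because it is exactly zero.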

Finally, we examined the MCMC sampling from simulated datasets, for which we

know the true parameter values. We confirmed that our method can correctly

estimate the parameter values despite strong correlation between MCMC samples of

b1, r1, and δ2 (Figure R4 and Table R1).

Figure R2. Pairwise correlations between the MCMC samples of eight parameters.

Numbers in the upper triangle are Pearson correlations, while the lower triangle

displays scatter plots with colors indicating point density.

Figure R3. Prior (black) and posterior (red) distributions of four parameters related

to the ascertainment rates (𝒓𝟏𝟐, 𝜹𝟑, 𝜹𝟒 and 𝜹𝟓).

Figure R4. Pairwise correlations between the MCMC samples of four parameters

estimated from simulated datasets (Extended Data Fig. 2). The estimates were based on

simulation settings in which the other parameters were correctly specified. Numbers in the

upper triangle are Pearson correlations, while the lower triangle displays scatter plots

with colors indicating point density. Red horizontal and vertical lines indicate true values

of the corresponding parameters. The joint posterior distributions of the

parameters are centered around their true values.

Table R1. Comparison of parameter values estimated by MCMC and their true

values used to generate the simulated epidemic curves.

Parameter   True value   Posterior mean   95% CrI

b1          1.270        1.277            (1.135-1.432)

b2          0.406        0.406            (0.385-0.427)

r1          0.2          0.200            (0.153-0.254)

δ2          0.982        0.981            (0.680-1.359)

r2          0.4          0.401            (0.273-0.556)

(R0)1       3.5          3.516            (3.188-3.877)

(R0)2       1.2          1.201            (1.145-1.257)

Results in this table correspond to those shown in Figure R4.
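The agreement reported in Table R1 can be checked mechanically. The sketch below transcribes the table rows and tests whether each true value lies inside its 95% credible interval, and how far the posterior mean is from the truth in relative terms. The helper name `summarize` is hypothetical, and the check is an illustration rather than part of the authors' analysis.

```python
# Rows transcribed from Table R1:
# (parameter, true value, posterior mean, CrI lower, CrI upper)
TABLE_R1 = [
    ("b1", 1.270, 1.277, 1.135, 1.432),
    ("b2", 0.406, 0.406, 0.385, 0.427),
    ("r1", 0.2, 0.200, 0.153, 0.254),
    ("delta2", 0.982, 0.981, 0.680, 1.359),
    ("r2", 0.4, 0.401, 0.273, 0.556),
    ("(R0)1", 3.5, 3.516, 3.188, 3.877),
    ("(R0)2", 1.2, 1.201, 1.145, 1.257),
]

def summarize(rows):
    """For each row, report interval coverage and relative error of the mean."""
    out = []
    for name, true, mean, lo, hi in rows:
        covered = lo <= true <= hi
        rel_err = abs(mean - true) / true
        out.append((name, covered, rel_err))
    return out

summary = summarize(TABLE_R1)
for name, covered, rel_err in summary:
    print(f"{name:8s} covered={covered} relative error={rel_err:.2%}")
```

Every true value falls inside its interval, and each posterior mean is within 1% of the true value.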

b. One other consideration is the simulation study the authors propose to

validate their fitting procedure. We greatly appreciate the authors’

determination to validate their fitting procedure, but would request that the

simulated data be generated in a slightly more complex fashion. Specifically,

the authors could run the same study with a different ascertainment rate in the

second period. It is also fairly clear that these parameter estimates can be

dramatically impacted if parameters are misspecified (Extended Data Fig. 2).

We recommend running a similar analysis to better understand the impact

that misspecification of Dq and alpha might have on eventual fits.

Response: We thank the reviewer for appreciating our efforts and sharing our view on

the importance of rigorously validating the method before applying it to real

data. Because the model has been updated in the revision, we have redone all

the simulations and analyses using the new model. For the simulation studies, we

have generated simulated data of two periods with different parameter values of the

effective reproductive number R0 and the ascertainment rate r: (R0)1=3.5 and r1=0.2

for period 1 and (R0)2=1.2 and r2=0.4 for period 2 (page 22 lines 27-29; page 23 lines

1-2). We now focus on the estimation of R0 instead of the transmission rate b, because

R0 has direct implications for the epidemic trend and is a quantity of focus in the

research community. As suggested by the reviewer, we have added evaluation of the

impact of misspecification of Dq and α (page 23 lines 7-13). It turns out that

misspecification of these two parameters has little impact on the estimation of R0 and

r for both periods (Extended Data Fig. 2). We have also added sensitivity analyses for

different values of α for the real data, further confirming that our results are

insensitive to α (page 24 lines 5-8; Extended Data Tables 4 and 6; Extended Data

Figs. 7-8).

Reviewer Reports on the First Revision:


Thank you. I find the paper much improved.

I am satisfied with responses to my earlier comments.


My comments have been adequately addressed. I only have one additional comment on notation.

The concept of R0 only applies at the early transmission phase of the epidemic. For later time

periods, authors should refer to the reproduction number, frequently denoted by R.


Summary

The authors have gone above and beyond in attempting to address the concerns raised in the first

review, and we thank them for the extended and thorough analyses. Overall the authors have

satisfied the major concerns raised in the initial review. While there are still a couple of major

comments that should be addressed, these are less major than the earlier round and are

suggested for completeness. The results provide interesting retrospective results from the

epidemic in Wuhan and actionable insight for countries still attempting to control the pandemic.

Major comments

1. Model choice: In your equations 1 to 2, you treat ‘A’ and ‘P’ with the same relative ratio of

transmission rate. However, asymptomatic and presymptomatic infections have been estimated to

be different (See ref below). You’ve used the relative infectiousness of “undocumented” infections,

but some of the presymptomatic individuals go on to be ascertained, so it seems incongruent to use

this parameter as such. One option would be to have ascertained and unascertained compartments

completely separate, and you can model the presymptomatic phase separately for the ascertained

group, while the unascertained group is only a single compartment.

2. In your section ‘Estimation of ascertainment rate using cases exported to Singapore’, you try to

estimate the initial ascertainment rate of Jan. 1, 2020 as the starting of your first period for

analysis, but your data starts from the 21st of January to the 28th, which is in the third period of

the analysis. Since this initial rate is important, it seems like this assumption should be better

justified. Also, since the value is important as the denominator, why are flights only investigated

from January 18th to the 23rd?

3. The finding that the ascertainment rates were so low through time, and didn’t seem to change

substantially, should be discussed carefully. Media reports suggested that China dramatically

increased testing and case identification through time. So, it is a surprising result and is worthy of

speculation/explanation. Specifically, these results suggest Wuhan was able to control the

epidemic without vastly increasing case reporting, which may bear on future efforts to contain

spread worldwide.

Minor comments

1. In your new design (Figure 1), you state ‘no presymptomatic phase for asymptomatic cases’,

but you add it before the asymptomatic compartment. Fig1 also says

“Symptomatic/asymptomatic” but should actually say ascertained and unascertained infections.

2. How can someone move from infectious to recovered without becoming isolated? The arrow

from I to R is odd, because it would be unlikely for someone to be ascertained if they never wind

up being isolated.

3. The logic for the equation to calculate the duration in the presymptomatic vs infectious period

(page 20 line 7) is unclear. If presymptomatic infectiousness leads to 44% of total secondary

cases, then it’s important to normalize the transmission rate and duration of this compartment

with the infectiousness and duration of the I compartment, rather than just the durations.

4. The authors do a great job investigating many different models in the sensitivity analysis. Since

there are many different proposed models, it might be useful to be able to directly compare the

fits of these models using model comparison. I would suggest using a metric like Bayes factors,

BIC, or AIC and including the results as a new table or as an added column in one of the others.

Reference

Ferretti, L. et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital

contact tracing. Science 368, (2020).

Author Rebuttals to First Revision:


Thank you. I find the paper much improved.

I am satisfied with responses to my earlier comments.


My comments have been adequately addressed. I only have one additional comment on

notation. The concept of R0 only applies at the early transmission phase of the epidemic. For

later time periods, authors should refer to the reproduction number, frequently denoted by R.

Response: Thank you for the suggestion. We have changed R0 to Re, which is a common

notation for the effective reproduction number.


Summary

The authors have gone above and beyond in attempting to address the concerns raised in the

first review, and we thank them for the extended and thorough analyses. Overall the authors

have satisfied the major concerns raised in the initial review. While there are still a couple of

major comments that should be addressed, these are less major than the earlier round and

are suggested for completeness. The results provide interesting retrospective results from the

epidemic in Wuhan and actionable insight for countries still attempting to control the

pandemic.

Major comments

1. Model choice: In your equations 1 to 2, you treat ‘A’ and ‘P’ with the same relative ratio

of transmission rate. However, asymptomatic and presymptomatic infections have been

estimated to be different (See ref below). You’ve used the relative infectiousness of

“undocumented” infections, but some of the presymptomatic individuals go on to be

ascertained, so it seems incongruent to use this parameter as such. One option would be to

have ascertained and unascertained compartments completely separate, and you can model

the presymptomatic phase separately for the ascertained group, while the unascertained

group is only a single compartment.

Response: Thank you. We actually tried exactly the same model in our last round of

revision (please see the model and results in Fig. R1 below). But we chose not to use this

model for two reasons. First, unascertained cases (compartment A) include mild-

symptomatic cases, who have a presymptomatic phase. In fact, the definition of

symptoms has been evolving as our knowledge of COVID-19 increases, and there is no

clear fixed boundary between “asymptomatic” and “mild-symptomatic” cases. Some

COVID-related symptoms were not well known in February, e.g., loss of taste and

smell. Hence unascertained cases in Compartment A consist of those who are

asymptomatic and mild-symptomatic including those who have unknown symptoms.

Second, the model illustrated in Fig. R1A implies that ascertained cases and

unascertained cases could be distinguished at the presymptomatic phase, while in

reality, cases in Wuhan were identified after symptom onset. As a consequence, the

model in Fig. R1A could not capture the immediate spike of ascertained cases with

symptom onset on Feb 17-18, when the “universal symptom survey” was implemented

(Fig. R1C). In contrast, the model presented in the paper addresses both issues.

For these two reasons, we prefer to keep our current model.

The difference in transmissibility of presymptomatic and asymptomatic cases is

difficult to model directly, because unascertained cases in compartment A include

both asymptomatic cases and mild-symptomatic cases. We followed the reference

provided by the reviewer (Ferretti et al. 2020 [REDACTED]) to assume the same

transmissibility between the presymptomatic phase and symptomatic phase. Our

compartment P contains both ascertained and unascertained cases, who are assumed to

have transmission rates 𝒃 and 𝜶𝒃 respectively. We set the transmission rate of P to be

the same as that of the unascertained cases (i.e., 𝜶𝒃), because it has been reported that most of

the cases were unascertained (Li et al. 2020, Substantial undocumented infection

facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2) [REDACTED]).

We explained this in the Methods section (page 16, lines 425-428).

Fig. R1. Model and results with unascertained and ascertained cases completely

separate after the latent stage. (A) Illustration of the model. Ip and Is denote the

presymptomatic and the symptomatic phases of ascertained cases, respectively. (B)

Schematic timeline of an ascertained case from being exposed to the virus to recovery

without isolation. (C) Fitting the real data from January 1 to February 29 and predicting from

March 1 to 8.

2. In your section ‘Estimation of ascertainment rate using cases exported to Singapore’, you

try to estimate the initial ascertainment rate of Jan. 1, 2020 as the starting of your first

period for analysis, but your data starts from the 21st of January to the 28th, which is in the

third period of the analysis. Since this initial rate is important, it seems like this assumption

should be better justified. Also, since the value is important as the denominator, why are

flights only investigated from January 18th to the 23rd?

Response: We agree with the reviewer that it is not ideal to use the exported cases to

Singapore to estimate the initial ascertainment rate due to different timing. But these

are the best data we could find to estimate the ascertainment rate in the early outbreak.

For this reason, we have performed several sensitivity analyses considering different

values of the initial ascertainment rate ranging from 0.14 (sensitivity analysis S6) to 0.42

(sensitivity analysis S7), as well as an extreme scenario assuming the initial

ascertainment rate equal to 1 (sensitivity analysis S8). We have added our justification

to the end of the section (page 15 lines 381-384), as “Without direct information to

estimate the initial ascertainment rate before January 1, 2020, we used the results based

on the Singapore data to set the initial value and the prior distribution of the

ascertainment rate in our model, and performed sensitivity analyses under various

assumptions.”

We investigated the flights from January 18 to 23 because the earliest exported

case arrived at Singapore on January 18 and the lockdown of Wuhan City started on

January 23. Although there were some flights from Wuhan to Singapore after January

23, those flights were special evacuation flights rather than regular flights. We have also

added this information to the section (page 14 lines 365-366).

3. The finding that the ascertainment rates were so low through time, and didn’t seem to

change substantially, should be discussed carefully. Media reports suggested that China

dramatically increased testing and case identification through time. So, it is a surprising

result and is worthy of speculation/explanation. Specifically, these results suggest Wuhan

was able to control the epidemic without vastly increasing case reporting, which may bear on

future efforts to contain spread worldwide.

Response: Thank you for the suggestion. We have added the following sentence to the

Discussion (page 8 lines 209-212): “In particular, despite relatively low ascertainment

rates due to undetected symptoms of many cases, the outbreak could be controlled by

extensive interventions to block the transmission from unascertained cases, such as

wearing face masks, social distancing, and quarantining close contacts.”

Minor comments

1. In your new design (Figure 1), you state ‘no presymptomatic phase for asymptomatic

cases’, but you add it before the asymptomatic compartment. Fig1 also says

“Symptomatic/asymptomatic” but should actually say ascertained and unascertained

infections.

Response: We used “symptomatic/asymptomatic” to reflect the disease course from

presymptomatic phase to symptomatic phase. To avoid confusion, we have modified

Figure 1b as below, removing “asymptomatic”.

Fig. 1. Illustration of the SAPHIRE model. We extended the classic SEIR model to include

seven compartments, namely S (susceptible), E (exposed), P (presymptomatic infectious), I

(ascertained infectious), A (unascertained infectious), H (isolated), and R (removed). (a)

Relationship between different compartments. Two parameters of interest are r

(ascertainment rate) and b (transmission rate), which are assumed to be varying across time

periods. (b) Schematic disease course of a symptomatic case. In this model, the unascertained

compartment A includes asymptomatic and some mild-symptomatic cases who were not

detected. While there is no presymptomatic phase for asymptomatic cases, we treated

asymptomatic as a special case of mild-symptomatic and modelled both with a

“presymptomatic” phase for simplicity.

2. How can someone move from infectious to recovered without becoming isolated? The

arrow from I to R is odd, because it would be unlikely for someone to be ascertained if they

never wind up being isolated.

Response: Like in the classic SEIR model, compartment R stands for “removed”,

including all cases when they are no longer infectious. Given that the infectious period is

much shorter than the time course of full recovery from clinical symptoms, it is possible

that some ascertained cases may no longer be infectious by the time they were

hospitalized for medical treatment, especially during the early outbreak when there was

a long delay from symptom onset to confirmed diagnosis due to a lack of medical

resources.

3. The logic for the equation to calculate the duration in the presymptomatic vs infectious

period (page 20 line 7) is unclear. If presymptomatic infectiousness leads to 44% of total

secondary cases, then it’s important to normalize the transmission rate and duration of this

compartment with the infectiousness and duration of the I compartment, rather than just the

durations.

Response: The result that presymptomatic infectiousness leads to 44% of total

secondary cases is from reference 10 (He et al. 2020, Temporal dynamics in viral

shedding and transmissibility of COVID-19. Nat Med 26, 672-675), which only analyzed

ascertained cases. The equation to calculate the duration of symptomatic infectious

period was based on the assumption of a constant transmission rate across both

presymptomatic and symptomatic phases for the ascertained cases (please see our

response to major comment 1). We have added the justification to the Methods section

(page 16 line 432; page 17 line 433) as below: “Because presymptomatic infectiousness

was estimated to account for 44% of the total infections of ascertained cases,¹⁰ we set the

mean of total infectious period as $(D_p + D_i) = D_p / 0.44 = 5.2$ days assuming constant

infectiousness across presymptomatic and symptomatic phases for the ascertained cases,¹²

thus the mean symptomatic infectious period was $D_i = 2.9$ days.”
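The arithmetic behind the quoted durations can be reproduced in a few lines. This is a sketch under the stated assumption of constant infectiousness, so that the share of transmission occurring before symptom onset equals the share of infectious time spent in the presymptomatic phase. The variable names are illustrative.

```python
# Values taken from the quoted Methods text.
PRESYMPTOMATIC_SHARE = 0.44   # fraction of transmission before symptom onset
TOTAL_INFECTIOUS_DAYS = 5.2   # D_p + D_i

# Under constant infectiousness, D_p / (D_p + D_i) equals the transmission share.
d_p = PRESYMPTOMATIC_SHARE * TOTAL_INFECTIOUS_DAYS
d_i = TOTAL_INFECTIOUS_DAYS - d_p

print(f"D_p = {d_p:.1f} days, D_i = {d_i:.1f} days")
# prints "D_p = 2.3 days, D_i = 2.9 days"
```

If infectiousness differed between the two phases, the split would have to weight each duration by its transmission rate, which is the normalization the reviewer asks about.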

4. The authors do a great job investigating many different models in the sensitivity analysis.

Since there are many different proposed models, it might be useful to be able to directly

compare the fits of these models using model comparison. I would suggest using a metric like

Bayes factors, BIC, or AIC and including the results as a new table or as an added column in

one of the others.

Response: Thank you for the suggestion. We have added one column to our Extended Data

Table 4, listing the Deviance Information Criterion (DIC) of different models, as we

used the Bayesian methods for fitting the model.
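For readers unfamiliar with the DIC, a minimal sketch of the standard definition is given here: DIC equals the posterior mean deviance plus the effective number of parameters p_D, where p_D is the mean deviance minus the deviance evaluated at the posterior mean of the parameters. The response does not describe how the authors computed it, so the function and the numbers below are hypothetical.

```python
from statistics import fmean

def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, p_D) from per-draw deviances and the plug-in deviance.

    deviance_samples: -2 * log-likelihood evaluated at each MCMC draw.
    deviance_at_posterior_mean: -2 * log-likelihood at the posterior mean.
    """
    mean_dev = fmean(deviance_samples)
    p_d = mean_dev - deviance_at_posterior_mean
    return mean_dev + p_d, p_d

# Hypothetical deviances, for illustration only.
samples = [1002.0, 1004.0, 1006.0, 1008.0]
dic_value, p_d = dic(samples, deviance_at_posterior_mean=1000.0)
print(f"DIC = {dic_value:.1f}, p_D = {p_d:.1f}")
# prints "DIC = 1010.0, p_D = 5.0"
```

When models are fitted to the same data, a lower DIC indicates a better trade-off between fit and complexity.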

Reference

Ferretti, L. et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with

digital contact tracing. Science 368, (2020).
