Peer Review File
Manuscript Title: Reconstruction of the full transmission dynamics of COVID-19 in
Wuhan
Redactions – Mention of other journals
This document only contains reviewer comments, rebuttal and decision letters for versions
considered at Nature. Mentions of the other journal have been redacted.
Reviewer Comments & Author Rebuttals
Reviewer Reports on the Initial Version:
Referee #1 (Remarks to the Author):
Thank you for the opportunity to review this work. I have a few comments:
1. I think my major comment relates to overdispersion of R0 in COVID-19. Such
overdispersion, which is reflected in both "dead end" cases that don't transmit, and
superspreader events, is an important manifestation of this disease, and one which may
make it challenging to control.
It seems as though there might be room in the stochastic simulations on re-emergence to
factor in an overdispersed R0. That in turn might provide some insights into how "luck" and
superspreading may result in very different outcomes, from stochastic extinction to
explosive resurgent epidemics.
2. Lines 3-15 really do reflect official narratives related to COVID-19 emergence; some
might question these. There should be references for these statements, or this section
should be revised.
3. Page 5, line 16: "For example, misspecification of the infectious period by 25% lower
would lead to ~20% overestimation in the transmission rate but only ~10%
underestimation in 𝑅0, both for the first period". It is predictable that lower serial interval
would result in lower R0 estimation. This sentence is misleading, as the contribution of R0
to growth is exponential, so these changes aren't proportionate on a linear scale.
4. The authors may wish to note that their estimates of unrecognized case fraction are
consistent with those seen in many emerging serological studies, including those completed
this week in New York, California and Switzerland.
Referee #2 (Remarks to the Author):
I enjoyed reading this paper. It is the first study that incorporates our most up-to-date
understanding of the epidemiology of the novel coronavirus including the role of subclinical
and asymptomatic infections on the transmission dynamics together with the
comprehensive dataset of confirmed cases from Wuhan, the pandemic's epicenter. Overall,
their analysis and results are sound. It highlights the large R0 from the epicenter and the
role of undetected infections, and the consequences on the dynamics and control of the
epidemic. I have a few comments:
1. Some of the estimates reported should be accompanied by confidence intervals.
2. Authors compare R0 estimates derived from different areas. Their references need to be
updated to account for more recent studies including estimates from Hong Kong, Singapore,
and Korea.
3. Strictly speaking R0 applies to the early growth phase of the epidemic in the absence of
interventions. Some reproduction number estimates from some settings are influenced by
the early onset of social distancing interventions. This is the case for countries like Taiwan,
Singapore, Korea, and Hong Kong. I do not think this was made sufficiently clear in the
discussion.
Referee #3 (Remarks to the Author):
Overall this is a very interesting analysis that attempts to understand the effect that Wuhan
interventions had on the course of the epidemic, as well as how these dynamics might
influence the course of the epidemic once interventions are lifted. The analysis is predicated
on a simplistic epidemiological model that appears to lack known and important features
related to COVID-19 transmission dynamics, and there may be key issues with the current
statistical fitting approach. These major issues are outlined below, and the results will be
significant, if they can be properly addressed. We also would like to take the time to thank
the authors for the rigorous sensitivity analysis to explore possible biases, as well as for
making their code and data available. These resources were extremely important for helping
us properly understand the methodology the authors used.
Major Concerns
1. Data
a. From ref.1, we see that COVID-19 case definitions changed multiple times,
dramatically impacting the observed Wuhan epidemic curve. Seven
different case definitions for COVID-19 were used in Wuhan throughout the
Jan-March 2020 study period. Each revision changed the case total, for
example cases increased by 7.1 times from version 1 to 2, 2.8 times from
version 2 to 4, and 4.2 times from version 4 to 5. The authors should more
explicitly consider these case definitions in their analysis, as they will have
dramatic repercussions for the results of the analysis.
2. Parameterization
a. The authors use a somewhat extended incubation period where there is
no transmission (5.2 days). This period of time before someone is
infectious is known to be shorter now, and important for epidemiological
dynamics (refs. 2,3). Either the authors should include a non-transmitting
incubation period that is short, followed by a pre-symptomatic
infectiousness period, or reformulate their model in a different appropriate
manner.
b. The infectious period is extremely short (2.3 days) compared to other
estimates (ref. 2). The authors attempt to fit the proper serial interval by
combining the incubation period and the infectious period, however recent
studies suggest a shorter incubation period and longer period of
infectiousness than the authors assume. This is important in the analysis,
because the infectious period of individuals connects the effects of 1 period
being analyzed on the other periods, so changes to this duration could
influence the resulting epidemic dynamics and estimated R0.
c. The ratio of transmission rate between ascertained and unascertained cases is
assumed to be 1 in the main scenario. There is now fairly significant
evidence that this is likely not to be the case (refs. 3,4). I appreciate your
sensitivity analyses in S5 and S6 for this, but it further highlights the issue,
because of the large effect that this parameter can have on the overall
results. I would suggest choosing the value for this parameter in a more
systematic way.
3. Model formulation
a. It is unclear why the authors are modeling a hospitalized compartment.
Individuals move from the infectious compartment to being hospitalized at
a rate that is set based on half the time between symptom onset and the
case reporting date. It appears the hospitalization compartment is more
similar to a quarantine compartment than hospitalized. This has a major
impact on transmission dynamics, because in the final time period
considered, the modeled infectious period has a duration of roughly 1.4
days, with a 69% chance of ascertained cases being hospitalized. Is this
suggested by the data, or is there a specific reason the authors model
hospitalized individuals? Also, it would be helpful if the authors could
explain why this time period is determined to be 50% the median time
between symptom onset and reporting date.
b. People transition from the exposed compartment (E) to ascertained or
unascertained infectious groups. This assumes that cases are reported right
away and doesn’t allow for changes in how quickly cases are detected nor
how the infectiousness of reported cases might change over time -- as
quarantine and isolation improves, ascertained cases may be unlikely to
transmit. Maybe this is contained within the speed in which people become
hospitalized mentioned above, but if the authors have data on the timing of
symptom onset and the timing of detection then it would seem cleaner to
model these dynamics more explicitly.
4. Fitting methodology
a. One of the main concerns from the paper is that it doesn’t seem that the
authors may have the necessary data or power to detect the true
ascertainment rates for each of the five periods. The ascertainment rate is
basically the reporting rate for the number of cases, and the authors are
fitting to the number of cases. In such a situation, one could imagine a
number of combinations of transmission rates and reporting rates that
could give rise to similar detected epidemic trajectories. It’s unclear if this
is the case for this paper, and the authors should do more to ensure
proper fitting. For example there is high correlation between the MCMC
sampling of a number of important parameters (Figure 1), which should be
resolved to ensure proper sampling for the posterior distributions of these
parameters.
b. One other consideration is the simulation study the authors propose to
validate their fitting procedure. We greatly appreciate the authors’
determination to validate their fitting procedure, but would request that the
simulated data be generated in a slightly more complex fashion.
Specifically, the authors could run the same study with a different
ascertainment rate in the second period. It is also fairly clear that these
parameter estimates can be dramatically impacted if parameters are
misspecified (Extended Data Fig. 2). We recommend running a similar
analysis to better understand the impact that misspecification of Dq and
alpha might have on eventual fits.
Figure 1: Pairwise scatterplot from traceplots of a single MCMC chain, showing extremely
high correlation between five of the estimated parameters.
Refs.
1. Tsang, T. K. et al. Effect of changing case definitions for COVID-19 on the epidemic
curve and transmission parameters in mainland China: a modelling study. Lancet
Public Health (2020) doi:10.1016/S2468-2667(20)30089-X.
2. He, X. et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat.
Med. (2020) doi:10.1038/s41591-020-0869-5.
3. Ferretti, L. et al. Quantifying SARS-CoV-2 transmission suggests epidemic control
with digital contact tracing. Science (2020) doi:10.1126/science.abb6936.
4. Liu, Y. et al. Viral dynamics in mild and severe cases of COVID-19. Lancet Infect. Dis.
(2020) doi:10.1016/S1473-3099(20)30232-2.
Author Rebuttals to Initial Comments:
Referee #1 (Remarks to the Author)
Thank you for the opportunity to review this work. I have a few comments:
1. I think my major comment relates to overdispersion of R0 in COVID-19.
Such overdispersion, which is reflected in both "dead end" cases that don't
transmit, and superspreader events, is an important manifestation of this
disease, and one which may make it challenging to control.
It seems as though there might be room in the stochastic simulations on re-
emergence to factor in an overdispersed R0. That in turn might provide some insights
into how "luck" and superspreading may result in very different outcomes, from
stochastic extinction to explosive resurgent epidemics.
Response: We thank the reviewer for this insightful comment. As the reviewer
pointed out, superspreading events would lead to overdispersion of the number of
secondary cases infected by each primary case (denoted as ν) and would influence
the probability of resurgence due to stochastic effects. This has been discussed in
detail by Lloyd-Smith et al. (Nature 2005, Superspreading and the effect of individual
variation on disease emergence). Our model assumes homogeneous transmission
and thus ν is expected to follow an exponential distribution. Unlike the
branching process model used by Lloyd-Smith et al. (Nature 2005), we used a
population-based differential-equation model, in which it is difficult to account for
heterogeneity between individuals. We have therefore added a discussion of the impact
of superspreading events and acknowledged this as a limitation of our analysis (page 10 lines
27-29). On the other hand, we expect superspreading events to have a small
impact on our estimates of the population-average R0 and the ascertainment rates,
because our model fit the observed data well and because these estimates were derived
from an established outbreak, where stochastic fluctuations caused by superspreading
events are less pronounced.
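To make the role of overdispersion concrete, the branching-process argument referred to above can be sketched as follows. This is a minimal illustration and not part of the authors' differential-equation model: it assumes a negative binomial offspring distribution with mean R0 and dispersion parameter k (smaller k means stronger superspreading), and the function names and example values are hypothetical.

```python
def extinction_probability(r0, k, tol=1e-12, max_iter=100_000):
    """Probability that a chain started by one case dies out.

    Assumes a negative binomial offspring distribution with mean r0 and
    dispersion k, whose generating function is
    g(s) = (1 + (r0 / k) * (1 - s)) ** (-k).
    The extinction probability is the smallest fixed point of g in [0, 1],
    found here by iterating g from s = 0.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = (1.0 + (r0 / k) * (1.0 - q)) ** (-k)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def resurgence_probability(r0, k, n_seeds):
    """Probability that at least one of n_seeds independent chains persists."""
    return 1.0 - extinction_probability(r0, k) ** n_seeds


# Hypothetical comparison: same mean R0, different degrees of overdispersion.
for k in (0.1, 1.0, 10.0):
    print(k, resurgence_probability(r0=2.0, k=k, n_seeds=5))
```

With the same mean R0, a smaller k gives a higher chance of stochastic extinction per introduction, which is the "luck" effect the reviewer describes.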
2. Lines 3-15 really do reflect official narratives related to COVID-19
emergence; some might question these. There should be references for these
statements, or this section should be revised.
Response: Thank you for the suggestion. We have added references to the statements.
3. Page 5, line 16: "For example, misspecification of the infectious period by
25% lower would lead to ~20% overestimation in the transmission rate but
only ~10% underestimation in 𝑅0, both for the first period". It is predictable
that lower serial interval would result in lower R0 estimation. This sentence is
misleading, as the contribution of R0 to growth is exponential, so these
changes aren't proportionate on a linear scale.
Response: Thank you for the suggestion. We have now removed the misleading
sentence and added more detailed discussion on the simulation results (see page 5
lines 10-16).
4. The authors may wish to note that their estimates of unrecognized case
fraction are consistent with those seen in many emerging serological studies,
including those completed this week in New York, California and Switzerland.
Response: Thank you for the nice suggestion. We noted the findings from recent
serological studies in the discussion (page 9 lines 6-8).
Referee #2 (Remarks to the Author)
I enjoyed reading this paper. It is the first study that incorporates our most up-to-
date understanding of the epidemiology of the novel coronavirus including the role of
subclinical and asymptomatic infections on the transmission dynamics together with
the comprehensive dataset of confirmed cases from Wuhan, the pandemic's
epicenter. Overall, their analysis and results are sound. It highlights the large R0 from
the epicenter and the role of undetected infections, and the consequences on the
dynamics and control of the epidemic. I have a few comments:
Response: We appreciate the highly positive comments on our work.
1. Some of the estimates reported should be accompanied by confidence
intervals.
Response: Thank you for the suggestion. We have included 95% credible intervals
for all our estimates except for the probability of resurgence. We calculated this
probability as the proportion of n = 10,000 stochastic simulations in which resurgence
occurred, similar to the approach of Lloyd-Smith et al. (Nature 2005, Superspreading and the
effect of individual variation on disease emergence). Because n was large, our estimates of
the probability of resurgence approached the theoretical values with very tight 95% CIs.
We thus did not report 95% CIs for the probability of resurgence, as was done by
Lloyd-Smith et al. (Nature 2005).
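The claim that n = 10,000 simulations gives a tight interval can be checked with a short calculation. The sketch below assumes a normal-approximation (Wald) interval for a binomial proportion, which is an illustrative choice and not a method stated in the response. The function name is hypothetical.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation 95% interval for a simulated proportion.

    p_hat is the fraction of the n simulations in which resurgence occurred.
    The interval is clipped to [0, 1].
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# The half-width is largest at p_hat = 0.5, so this is the worst case.
low, high = wald_interval(0.5, 10_000)
print(low, high)
```

At n = 10,000 the half-width is at most about one percentage point under this approximation, whatever the estimated probability.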
2. Authors compare R0 estimates derived from different areas. Their
references need to be updated to account for more recent studies including
estimates from Hong Kong, Singapore, and Korea.
Response: Thank you for the suggestion. We did not compare R0 from different
areas, because R0 is expected to be different across regions due to different
demographic features and intervention strengths. For example, Singapore started
strong surveillance from early January, three weeks before the first imported case
was identified, and thus had much smaller R0 than Wuhan. Following your
suggestion, we compared the estimated R0 from different studies of the early
outbreak in Wuhan. We have added clarification on this point on page 9 lines 20-25.
3. Strictly speaking R0 applies to the early growth phase of the epidemic in the
absence of interventions. Some reproduction number estimates from some
settings are influenced by the early onset of social distancing interventions.
This is the case for countries like Taiwan, Singapore, Korea, and Hong Kong. I
do not think this was made sufficiently clear in the discussion.
Response: We apologize for the confusion. Please see our response to the comment
2 above – we did not compare R0 across regions for exactly the reason pointed out by
the reviewer. We have now clarified in the Discussion that our comparison was for the
basic reproduction number before interventions in Wuhan (page 9 lines 20-25).
Referee #3 (Remarks to the Author)
Overall this is a very interesting analysis that attempts to understand the effect that
Wuhan interventions had on the course of the epidemic, as well as how these
dynamics might influence the course of the epidemic once interventions are lifted. The
analysis is predicated on a simplistic epidemiological model that appears to lack
known and important features related to COVID-19 transmission dynamics, and there
may be key issues with the current statistical fitting approach. These major issues are
outlined below, and the results will be significant, if they can be properly addressed.
We also would like to take the time to thank the authors for the rigorous sensitivity
analysis to explore possible biases, as well as for making their code and data
available. These resources were extremely important for helping us properly
understand the methodology the authors used.
Response: We thank the reviewer for appreciating the importance of our work, and
the detailed and constructive comments. We have now substantially revised the
paper in response to your comments. Please see our point-by-point response below.
Major Concerns
1. Data
a. From ref.1, we see that COVID-19 case definitions changed multiple times,
dramatically impacting the observed Wuhan epidemic curve. Seven different case
definitions for COVID-19 were used in Wuhan throughout the Jan-March 2020
study period. Each revision changed the case total, for example cases increased by
7.1 times from version 1 to 2, 2.8 times from version 2 to 4, and 4.2 times
from version 4 to 5. The authors should more explicitly consider these
case definitions in their analysis, as they will have dramatic repercussions
for the results of the analysis.
Response: Yes, the case definition has changed multiple times as pointed out
by the reviewer. However, we believe our analysis should be robust because:
(1) We have explicitly modeled the ascertainment rate, which was
allowed to vary across different time periods;
(2) Based on key intervention events, we defined five time periods, which
turned out to largely align with the timeline of case definitions (period
2 corresponds to versions 1 and 2; period 3 corresponds to versions
3 and 4; period 4 corresponds to version 5; and period 5
corresponds to versions 6 and 7);
(3) We excluded clinically diagnosed cases without laboratory
confirmation (introduced by version 5), because many patients with flu
or other pneumonia displayed symptoms similar to COVID-19, which
could lead to a higher false-positive rate without laboratory
confirmation.
Therefore, we believe the impact of case definitions should have been largely
addressed by excluding clinically diagnosed patients and explicitly modeling
different ascertainment rates across time periods. In fact, variation in our
estimates of ascertainment rate should reflect a combined effect of the evolving
surveillance, interventions, medical resources, and case definitions across time
periods, which are difficult to separate from each other. We have added this
point in the Discussion (page 10 lines 19-24).
2. Parameterization
a. The authors use a somewhat extended incubation period where there
is no transmission (5.2 days). This period of time before someone is
infectious is known to be shorter now, and important for epidemiological
dynamics (refs. 2,3). Either the authors should include a non-transmitting
incubation period that is short, followed by a presymptomatic
infectiousness period, or reformulate their model in a different appropriate
manner.
Response: This is a very good point. We developed this model to analyze
Wuhan’s data in February when very little was known about COVID-19. As we
have more information now, the model should be updated accordingly. Following
the reviewer’s suggestion, we have incorporated a presymptomatic infectious
period in the revised model, as displayed in Figure R1 below. The parameters,
such as latent period, presymptomatic infectious period, and total infectious
period were set based on the most recent study pointed out by the reviewer (He
et al. [Redacted] 2020, Temporal dynamics in viral shedding and transmissibility
of COVID-19). Briefly, we set the latent period to 2.9 days, presymptomatic
infectious period to 2.3 days, and symptomatic infectious period to 2.9 days in
the main analysis, followed by a series of sensitivity analyses based on the 95%
credible intervals of these quantities reported in the literature. (Figure 1; page 4
lines 13-21; page 18 line 30, page 19 lines 1-26; page 20 lines 3-19; page 20
lines 25-26; page 21 lines 21-29)
Figure R1. Illustration of the SAPHIRE model. We extended the classic SEIR
model to include seven compartments, namely S (susceptible), E (exposed), P
(presymptomatic infectious), I (ascertained infectious), A (unascertained
infectious), H (isolated), and R (removed). (A) Relationship between different
compartments in the model. Two parameters of interest are r (ascertainment
rate) and b (transmission rate), which are assumed to vary across time
periods. (B) Schematic timeline of an individual from being exposed to the virus
to recovery without isolation. In this model, the unascertained compartment A
includes asymptomatic and some mild symptomatic cases who were not
detected. While there is no presymptomatic phase for asymptomatic cases, we
treated asymptomatic as a special case of mild symptomatic and modeled both
with a “presymptomatic” phase for simplicity.
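The compartment structure described in this caption can be sketched as a simple discrete-time update. This is an illustrative sketch only, not the authors' implementation: it ignores population movement, uses a plain Euler step, assumes that presymptomatic and unascertained cases transmit at alpha times the rate of ascertained cases, and uses hypothetical values for b, r, Dq (time from onset to isolation) and Dh (time in isolation). Only De = 2.9, Dp = 2.3 and Di = 2.9 days and alpha = 0.55 are taken from the responses in this letter.

```python
def saphire_step(state, b, r, alpha=0.55, De=2.9, Dp=2.3, Di=2.9,
                 Dq=5.0, Dh=30.0, dt=0.1):
    """Advance (S, E, P, I, A, H, R) by one Euler step of dt days.

    Dq and Dh are hypothetical placeholder durations; b and r are the
    transmission rate and ascertainment rate for the current period.
    """
    S, E, P, I, A, H, R = state
    N = S + E + P + I + A + H + R
    # New infections: P and A are assumed to be alpha times as infectious as I.
    new_inf = b * S * (alpha * P + alpha * A + I) / N
    dS = -new_inf
    dE = new_inf - E / De
    dP = E / De - P / Dp
    dI = r * P / Dp - I / Di - I / Dq        # ascertained: recover or get isolated
    dA = (1.0 - r) * P / Dp - A / Di         # unascertained: recover only
    dH = I / Dq - H / Dh
    dR = (I + A) / Di + H / Dh
    derivs = (dS, dE, dP, dI, dA, dH, dR)
    return tuple(x + dt * dx for x, dx in zip(state, derivs))


def simulate(state, days, b, r, dt=0.1):
    """Run the Euler scheme for the given number of days with fixed b and r."""
    for _ in range(int(round(days / dt))):
        state = saphire_step(state, b, r, dt=dt)
    return state


# Hypothetical run: 10 exposed individuals in a population of one million.
start = (999_990.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0)
print(simulate(start, days=30, b=1.0, r=0.2))
```

In the model described in the letter, b and r change between time periods. A piecewise version of this sketch would call `simulate` once per period with that period's values.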
b. The infectious period is extremely short (2.3 days) compared to other
estimates (ref. 2). The authors attempt to fit the proper serial interval by
combining the incubation period and the infectious period, however recent
studies suggest a shorter incubation period and longer period of
infectiousness than the authors assume. This is important in the analysis,
because the infectious period of individuals connects the effects of 1
period being analyzed on the other periods, so changes to this duration
could influence the resulting epidemic dynamics and estimated R0.
Response: Please see our response to the comment 2a above. We have
revised the model to account for presymptomatic infectiousness and updated
the parameter settings. According to He et al. ([redacted] 2020, Temporal
dynamics in viral shedding and transmissibility of COVID-19), we set
presymptomatic infectious period to 2.3 days, and symptomatic infectious period
to 2.9 days in the main analysis, which sums to a total infectious period of 5.2
days. The total infectious period was determined based on the estimate that
presymptomatic infectiousness accounted for 44% of the total infections (page
20 lines 3-8).
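As a quick consistency check on these numbers, and assuming for illustration only that infectiousness per day is the same in the presymptomatic and symptomatic phases, the presymptomatic share of the total infectious period is:

```latex
\frac{D_p}{D_p + D_i} = \frac{2.3}{2.3 + 2.9} = \frac{2.3}{5.2} \approx 0.44
```

Here D_p and D_i denote the presymptomatic and symptomatic infectious periods, so the 2.3-day and 2.9-day settings are consistent with the 44% figure under that assumption.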
c. The ratio of transmission rate between ascertained and unascertained
cases is assumed to be 1 in the main scenario. There is now fairly
significant evidence that this is likely not to be the case (refs. 3,4). I appreciate
your sensitivity analyses in S5 and S6 for this, but it further highlights the
issue, because of the large effect that this parameter can have on the
overall results. I would suggest choosing the value for this parameter in a
more systematic way.
Response: We agree with the reviewer that there is evidence
suggesting that mild cases are less infectious than severe ones. In addition to the
references pointed out by the reviewer, another study (Li et al. [Redacted]
2020, Substantial undocumented infection facilitates the rapid dissemination of
novel coronavirus (SARS-CoV2)) estimated that the transmissibility of
unascertained cases was 0.55 (95% CrI: 0.46-0.62) times that of ascertained cases.
Accordingly, we have now set 𝛼 = 0.55 in the main analysis and 𝛼 = 0.46 and 0.62 in
the sensitivity analyses. It is worth pointing out that our estimates of the effective
reproductive number R0 and the ascertainment rate r were robust to different
choices of 𝛼, as illustrated by both simulated data and sensitivity analyses.
(page 5 lines 6-7; page 20 lines 3-4; page 24 lines 5-8; Extended Data Tables
2, 4, 6; Extended Data Figs. 2, 7-8).
3. Model formulation
a. It is unclear why the authors are modeling a hospitalized compartment.
Individuals move from the infectious compartment to being hospitalized at
a rate that is set based on half the time between symptom onset and the
case reporting date. It appears the hospitalization compartment is more
similar to a quarantine compartment than hospitalized. This has a major
impact on transmission dynamics, because in the final time period
considered, the modeled infectious period has a duration of roughly 1.4
days, with a 69% chance of ascertained cases being hospitalized. Is this
suggested by the data, or is there a specific reason the authors model
hospitalized individuals? Also, it would be helpful if the authors could
explain why this time period is determined to be 50% the median time
between symptom onset and reporting date.
Response: We apologize for the confusion. We introduced compartment H to
distinguish ascertained cases and unascertained cases, considering that
ascertained cases would be isolated by hospitalization and thus had a shorter
effective infectious period. We have clarified by renaming compartment H as
“isolation” rather than “hospitalization” and explicitly stated our motivation (page
4 lines 19-21). We initially set Dq to be 50% of the median time between
symptom onset and diagnosis date because we thought patients might have
been admitted to the hospital before confirmed diagnosis by laboratory test.
However, it might be more realistic to assume isolation upon confirmed
diagnosis. So, we have now changed Dq to be the median time between
symptom onset and diagnosis date (page 20 lines 8-11; Extended Data Table 2).
b. People transition from the exposed compartment (E) to ascertained or
unascertained infectious groups. This assumes that cases are reported
right away and doesn’t allow for changes in how quickly cases are
detected nor how the infectiousness of reported cases might change over
time -- as quarantine and isolation improves, ascertained cases may be
unlikely to transmit. Maybe this is contained within the speed in which
people become hospitalized mentioned above, but if the authors have data
on the timing of symptom onset and the timing of detection then it would
seem cleaner to model these dynamics more explicitly.
Response: We have now included a presymptomatic compartment (P) after the
exposed compartment (E) (see Figure R1). The ascertained and unascertained
cases in our data were defined retrospectively and we did not assume cases
were ascertained right after the onset of symptoms. Our analyses were based on
the date of symptom onset, which was collected retrospectively after confirmed
diagnosis. As the reviewer correctly pointed out, we have incorporated the effect
of isolation on ascertained cases by modeling compartment H and setting the
speed from compartment I to compartment H based on the time delay between
symptom onset and confirmed diagnosis, which became shorter as medical
resources improved (page 4 lines 19-21; page 20 lines 8-11).
4. Fitting methodology
a. One of the main concerns from the paper is that it doesn’t seem that the
authors may have the necessary data or power to detect the true
ascertainment rates for each of the five periods. The ascertainment rate is
basically the reporting rate for the number of cases, and the authors are
fitting to the number of cases. In such a situation, one could imagine a
number of combinations of transmission rates and reporting rates that
could give rise to similar detected epidemic trajectories. It’s unclear if this
is the case for this paper, and the authors should do more to ensure
proper fitting. For example, there is high correlation between the MCMC
sampling of a number of important parameters (Figure 1), which should be
resolved to ensure proper sampling for the posterior distributions of these
parameters.
Response: We have carefully considered this important comment. Based on
results from the extensive simulations (including new simulations added in this
revision), we believe our data and model can provide information for estimating
the effective reproduction number R0 and the ascertainment rates r across
different periods when the other model parameters are reasonably specified
(Extended Data Fig. 2). Our simulation results showed that the latent period De,
infectious periods Dp and Di, and the initial ascertainment rate r0 are key
parameters that can affect the results. In our real data analysis, we have
carefully specified the values of De, Dp and Di based on the most updated
literature suggested by the reviewer (page 20 lines 3-11), and performed
sensitivity analyses to test the robustness of our conclusions (page 23 lines 27-
29; page 24 lines 1-8).
For the initial ascertainment rate r0, we have added a new analysis based on
COVID-19 cases exported from Wuhan to Singapore to obtain a reasonable
estimate of the ascertainment rate in the early phase of the outbreak, and used
the point estimate and 95% CI to specify r0 in the main analysis and sensitivity
analyses, respectively (Extended Data Table 1; page 5 lines 20-22; page 18 lines
12-27; page 24 lines 9-12). We have also tested the theoretical limit by setting
r0=1 in a sensitivity analysis, which provides a lower bound estimate of the
proportion of unascertained cases (page 24 lines 13-14).
Regarding the correlation between MCMC samples of parameters, we have
made several technical improvements. First, we have incorporated the
information of ascertainment rate estimated from Singapore data into the prior
distribution of r12. Second, considering that ascertainment rates in different
periods are naturally correlated because the ending status of a period is the
starting status of the next period, we reparameterized r3, r4 and r5 as
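$$\operatorname{logit}(r_3) = \operatorname{logit}(r_{12}) + \delta_3$$
$$\operatorname{logit}(r_4) = \operatorname{logit}(r_3) + \delta_4$$
$$\operatorname{logit}(r_5) = \operatorname{logit}(r_4) + \delta_5$$
where $\operatorname{logit}(r) = \log\left(\frac{r}{1-r}\right)$, and $\delta_3$, $\delta_4$, and $\delta_5$ were sampled from the prior of
$N(0, 1)$. Under this new parameterization, we expect $r_{12}$, $\delta_3$, $\delta_4$, and $\delta_5$ to be weakly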
correlated. Third, we have replaced our original Metropolis-Hastings algorithm with a
more efficient Delayed Rejection Adaptive Metropolis (DRAM) algorithm implemented
in an established R package to ensure convergence and
proper MCMC sampling. See page 21 lines 1-11 for the changes.
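The reparameterization above maps the sampled quantities back to ascertainment rates through the inverse logit. A minimal sketch of that mapping follows. The function names are hypothetical, and the MCMC sampler itself is not shown.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ascertainment_rates(r12, delta3, delta4, delta5):
    """Recover (r12, r3, r4, r5) from r12 and the logit-scale increments."""
    r3 = expit(logit(r12) + delta3)
    r4 = expit(logit(r3) + delta4)
    r5 = expit(logit(r4) + delta5)
    return r12, r3, r4, r5


# Hypothetical values, for illustration only.
print(ascertainment_rates(0.2, 0.5, 0.0, -0.3))
```

Because each rate is built from the previous one on the logit scale, every rate stays strictly between 0 and 1 for any real increment, and an increment of zero means no change between periods. As a check against Table R1, applying the same mapping to r1 = 0.2 and δ2 = 0.982 gives r2 ≈ 0.40.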
The new sampling results are shown below (Figures R2 and R3). As we can see
from Figure R2, b12 and r12 are the only two parameters that remain highly
correlated. This, however, is expected: if we sample a higher transmission
rate b12, the total number of infections becomes larger, so the ascertainment
rate r12 needs to be smaller in order to fit the observed data. Although b12 and r12 are
highly correlated, it is important to point out that their joint posterior distribution is
concentrated in a small region of the parameter space (Figure R2), suggesting that we have
enough information to estimate these parameters.
In addition, as shown in Figure R3, the posterior distributions of 𝑟12, 𝛿3, 𝛿4, and 𝛿5
are much narrower than their prior distributions, indicating that we have gained
additional information for these parameters from the data. In our sensitivity analysis
S9, we found that the model without ascertainment rates fit the observed data
significantly worse than our full model with ascertainment rates (Extended Data Fig.
12, p-value = 0 based on likelihood ratio test), also suggesting that the data can
provide information about ascertainment rates.
Finally, we examined the MCMC sampling from simulated datasets, for which we
know the true parameter values. We confirmed that our method can correctly
estimate the parameter values despite strong correlation between MCMC samples of
b1, r1, and δ2 (Figure R4 and Table R1).
Figure R2. Pairwise correlations between the MCMC samples of eight parameters.
Numbers in the upper triangle are Pearson correlations, while the lower triangle
displays scatter plots with colors indicating point density.
Figure R3. Prior (black) and posterior (red) distributions of four parameters related
to the ascertainment rates (𝒓𝟏𝟐, 𝜹𝟑, 𝜹𝟒 and 𝜹𝟓).
Figure R4. Pairwise correlations between the MCMC samples of four parameters
estimated from simulated datasets (Extended Data Fig. 2). The estimates were based on
simulation settings when other parameters were correctly specified. Numbers in the
upper triangle are Pearson correlations, while the lower triangle displays scatter plots
with colors indicating point density. Red horizontal and vertical lines indicate true values
of the corresponding parameters. We see that the joint posterior distributions of the
parameters are centered correctly around their true values.
Table R1. Comparison of parameter values estimated by MCMC and their true
values used to generate the simulated epidemic curves.
Parameter   True value   Posterior mean   95% CrI
b1          1.270        1.277            (1.135-1.432)
b2          0.406        0.406            (0.385-0.427)
r1          0.2          0.200            (0.153-0.254)
δ2          0.982        0.981            (0.680-1.359)
r2          0.4          0.401            (0.273-0.556)
(R0)1       3.5          3.516            (3.188-3.877)
(R0)2       1.2          1.201            (1.145-1.257)
Results in this table correspond to those shown in Figure R4.
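The recovery check summarized in Table R1 can be expressed as a short script: for each parameter, confirm that the true value lies inside the 95% credible interval and compute the relative bias of the posterior mean. The numbers are copied from Table R1; the variable and function names are illustrative only.

```python
# (true value, posterior mean, CrI lower, CrI upper), copied from Table R1.
table_r1 = {
    "b1":    (1.270, 1.277, 1.135, 1.432),
    "b2":    (0.406, 0.406, 0.385, 0.427),
    "r1":    (0.2,   0.200, 0.153, 0.254),
    "delta2": (0.982, 0.981, 0.680, 1.359),
    "r2":    (0.4,   0.401, 0.273, 0.556),
    "(R0)1": (3.5,   3.516, 3.188, 3.877),
    "(R0)2": (1.2,   1.201, 1.145, 1.257),
}

def summarize_recovery(table):
    """For each parameter, report CrI coverage of the true value and relative bias."""
    summary = {}
    for name, (true, post_mean, lower, upper) in table.items():
        covered = lower <= true <= upper
        relative_bias = abs(post_mean - true) / true
        summary[name] = (covered, relative_bias)
    return summary

summary = summarize_recovery(table_r1)
for name, (covered, relative_bias) in summary.items():
    print(f"{name}: covered={covered}, relative bias={relative_bias:.4f}")
```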
b. One other consideration is the simulation study the authors propose to
validate their fitting procedure. We greatly appreciate the authors’
determination to validate their fitting procedure, but would request that the
simulated data be generated in a slightly more complex fashion. Specifically,
the authors could run the same study with a different ascertainment rate in the
second period. It is also fairly clear that these parameter estimates can be
dramatically impacted if parameters are misspecified (Extended Data Fig. 2).
We recommend running a similar analysis to better understand the impact
that misspecification of Dq and alpha might have on eventual fits.
Response: We thank the reviewer for appreciating our efforts and sharing our view on
the importance of validating the method rigorously before applying it to real data
analysis. Because the model has been updated in the revision, we have redone all
the simulations and analyses using the new model. For the simulation studies, we
have generated simulated data of two periods with different parameter values of the
effective reproductive number R0 and the ascertainment rate r: (R0)1=3.5 and r1=0.2
for period 1 and (R0)2=1.2 and r2=0.4 for period 2 (page 22 lines 27-29; page 23 lines
1-2). We now focus on the estimation of R0 instead of the transmission rate b, because
R0 has direct implications for the epidemic trend and is a quantity of focus in the
research community. As suggested by the reviewer, we have added evaluation of the
impact of misspecification of Dq and α (page 23 lines 7-13). It turns out that
misspecification of these two parameters has little impact on the estimation of R0 and
r for both periods (Extended Data Fig. 2). We have also added sensitivity analyses for
different values of α for the real data, further confirming that our results are
insensitive to α (page 24 lines 5-8; Extended Data Tables 4 and 6; Extended Data
Figs. 7-8).
Reviewer Reports on the First Revision:
Referee #1 (Remarks to the Author):
Thank you. I find the paper much improved.
I am satisfied with responses to my earlier comments.
Referee #2 (Remarks to the Author):
My comments have been adequately addressed. I only have one additional comment on notation.
The concept of R0 only applies at the early transmission phase of the epidemic. For later time
periods, authors should refer to the reproduction number, frequently denoted by R.
Referee #3 (Remarks to the Author):
Summary
The authors have gone above and beyond in attempting to address the concerns raised in the first
review, and we thank them for the extended and thorough analyses. Overall the authors have
satisfied the major concerns raised in the initial review. While there are still a couple of major
comments that should be addressed, these are less major than the earlier round and are
suggested for completeness. The results provide interesting retrospective results from the
epidemic in Wuhan and actionable insight for countries still attempting to control the pandemic.
Major comments
1. Model choice: In your equations 1 to 2, you treat ‘A’ and ‘P’ with the same relative ratio of
transmission rate. However, asymptomatic and presymptomatic infections have been estimated to
be different (See ref below). You’ve used the relative infectiousness of “undocumented” infections,
but some of the presymptomatic individuals go on to be ascertained, so it seems incongruent to use
this parameter as such. One option would be to have ascertained and unascertained compartments
completely separate, and you can model the presymptomatic phase separately for the ascertained
group, while the unascertained group is only a single compartment.
2. In your section ‘Estimation of ascertainment rate using cases exported to Singapore’, you try to
estimate the initial ascertainment rate of Jan. 1, 2020 as the starting of your first period for
analysis, but your data starts from the 21st of January to the 28th, which is in the third period of
the analysis. Since this initial rate is important, it seems like this assumption should be better
justified. Also, since the value is important as the denominator, why are flights only investigated
from January 18th to the 23rd?
3. The finding that the ascertainment rates were so low through time, and didn’t seem to change
substantially, should be discussed carefully. Media reports suggested that China dramatically
increased testing and case identification through time. So, it is a surprising result and is worthy of
speculation/explanation. Specifically, these results suggest Wuhan was able to control the
epidemic without vastly increasing case reporting, which may bear on future efforts to contain
spread worldwide.
Minor comments
1. In your new design (Figure 1), you state ‘no presymptomatic phase for asymptomatic cases’,
but you add it before the asymptomatic compartment. Fig1 also says
“Symptomatic/asymptomatic” but should actually say ascertained and unascertained infections.
2. How can someone move from infectious to recovered without becoming isolated? The arrow
from I to R is odd, because it would be unlikely for someone to be ascertained if they never wind
up being isolated.
3. The logic for the equation to calculate the duration in the presymptomatic vs infectious period
(page 20 line 7) is unclear. If presymptomatic infectiousness leads to 44% of total secondary
cases, then it’s important to normalize the transmission rate and duration of this compartment
with the infectiousness and duration of the I compartment, rather than just the durations.
4. The authors do a great job investigating many different models in the sensitivity analysis. Since
there are many different proposed models, it might be useful to be able to directly compare the
fits of these models using model comparison. I would suggest using a metric like Bayes factors,
BIC, or AIC and including the results as a new table or as an added column in one of the others.
Reference
Ferretti, L. et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital
contact tracing. Science 368, (2020).
Author Rebuttals to First Revision:
Referee #1 (Remarks to the Author):
Thank you. I find the paper much improved.
I am satisfied with responses to my earlier comments.
Referee #2 (Remarks to the Author):
My comments have been adequately addressed. I only have one additional comment on
notation. The concept of R0 only applies at the early transmission phase of the epidemic. For
later time periods, authors should refer to the reproduction number, frequently denoted by R.
Response: Thank you for the suggestion. We have changed R0 to Re, which is a common
notation for the effective reproduction number.
Referee #3 (Remarks to the Author):
Summary
The authors have gone above and beyond in attempting to address the concerns raised in the
first review, and we thank them for the extended and thorough analyses. Overall the authors
have satisfied the major concerns raised in the initial review. While there are still a couple of
major comments that should be addressed, these are less major than the earlier round and
are suggested for completeness. The results provide interesting retrospective results from the
epidemic in Wuhan and actionable insight for countries still attempting to control the
pandemic.
Major comments
1. Model choice: In your equations 1 to 2, you treat ‘A’ and ‘P’ with the same relative ratio
of transmission rate. However, asymptomatic and presymptomatic infections have been
estimated to be different (See ref below). You’ve used the relative infectiousness of
“undocumented” infections, but some of the presymptomatic individuals go on to be
ascertained, so it seems incongruent to use this parameter as such. One option would be to
have ascertained and unascertained compartments completely separate, and you can model
the presymptomatic phase separately for the ascertained group, while the unascertained
group is only a single compartment.
Response: Thank you. We actually tried exactly the same model in our last round of
revision (please see the model and results in Fig. R1 below). But we chose not to use this
model for two reasons. First, unascertained cases (compartment A) include mild-
symptomatic cases, who have a presymptomatic phase. In fact, the definition of
symptoms has been evolving as our knowledge of COVID-19 increases, and there is no
clear fixed boundary between “asymptomatic” and “mild-symptomatic” cases. Some
COVID-related symptoms were not well known in February, e.g., loss of taste and
smell. Hence unascertained cases in Compartment A consist of those who are
asymptomatic and mild-symptomatic including those who have unknown symptoms.
Second, the model illustrated in Fig. R1A implies that ascertained cases and
unascertained cases could be distinguished at the presymptomatic phase, while in
reality, cases in Wuhan were identified after symptom onset. Because of this second issue, the
model in Fig. R1A could not capture the immediate spike of ascertained cases with
symptom onset on Feb 17-18, when the “universal symptom survey” was implemented
(Fig. R1C). In contrast, the model presented in the paper addresses both issues.
For these two reasons, we prefer to keep our current model.
The difference in transmissibility between presymptomatic and asymptomatic cases is
difficult to model directly, because unascertained cases in compartment A include
both asymptomatic cases and mild-symptomatic cases. We followed the reference
provided by the reviewer (Ferretti et al. 2020 [REDACTED]) to assume the same
transmissibility between the presymptomatic phase and symptomatic phase. Our
compartment P contains both ascertained and unascertained cases, who are assumed to
have transmission rates 𝒃 and 𝜶𝒃 respectively. We set the transmission rate of P to be
the same as the unascertained cases (i.e., 𝜶𝒃), because it has been reported that most of
the cases were unascertained (Li et al. 2020, Substantial undocumented infection
facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2) [REDACTED]).
We explained this in the Methods section (page 16, lines 425-428).
Fig. R1. Model and results with unascertained and ascertained cases completely
separate after the latent stage. (A) Illustration of the model. Ip and Is denote the
presymptomatic and the symptomatic phases of ascertained cases, respectively. (B)
Schematic timeline of an ascertained case from being exposed to the virus to recovery
without isolation. (C) Fitting the real data from January 1 to February 29 and predicting from
March 1 to 8.
2. In your section ‘Estimation of ascertainment rate using cases exported to Singapore’, you
try to estimate the initial ascertainment rate of Jan. 1, 2020 as the starting of your first
period for analysis, but your data starts from the 21st of January to the 28th, which is in the
third period of the analysis. Since this initial rate is important, it seems like this assumption
should be better justified. Also, since the value is important as the denominator, why are
flights only investigated from January 18th to the 23rd?
Response: We agree with the reviewer that it is not ideal to use the exported cases to
Singapore to estimate the initial ascertainment rate due to different timing. But these
are the best data we could find to estimate the ascertainment rate in the early outbreak.
For this reason, we have performed several sensitivity analyses considering different
values of the initial ascertainment rate ranging from 0.14 (sensitivity analysis S6) to 0.42
(sensitivity analysis S7), as well as an extreme scenario assuming the initial
ascertainment rate equal to 1 (sensitivity analysis S8). We have added our justification
to the end of the section (page 15 lines 381-384), as “Without direct information to
estimate the initial ascertainment rate before January 1, 2020, we used the results based
on the Singapore data to set the initial value and the prior distribution of the
ascertainment rate in our model, and performed sensitivity analyses under various
assumptions.”
We investigated the flights from January 18 to 23 because the earliest exported
case arrived in Singapore on January 18 and the lockdown of Wuhan City started on
January 23. Although there were some flights from Wuhan to Singapore after January
23, those were special evacuation flights rather than regular flights. We have also
added this information to the section (page 14 lines 365-366).
3. The finding that the ascertainment rates were so low through time, and didn’t seem to
change substantially, should be discussed carefully. Media reports suggested that China
dramatically increased testing and case identification through time. So, it is a surprising
result and is worthy of speculation/explanation. Specifically, these results suggest Wuhan
was able to control the epidemic without vastly increasing case reporting, which may bear on
future efforts to contain spread worldwide.
Response: Thank you for the suggestion. We have added the following sentence to the
Discussion (page 8 lines 209-212): “In particular, despite relatively low ascertainment
rates due to undetected symptoms of many cases, the outbreak could be controlled by
extensive interventions to block the transmission from unascertained cases, such as
wearing face masks, social distancing, and quarantining close contacts.”
Minor comments
1. In your new design (Figure 1), you state ‘no presymptomatic phase for asymptomatic
cases’, but you add it before the asymptomatic compartment. Fig1 also says
“Symptomatic/asymptomatic” but should actually say ascertained and unascertained
infections.
Response: We used “symptomatic/asymptomatic” to reflect the disease course from
presymptomatic phase to symptomatic phase. To avoid confusion, we have modified
Figure 1b as below, removing “asymptomatic”.
Fig. 1. Illustration of the SAPHIRE model. We extended the classic SEIR model to include
seven compartments, namely S (susceptible), E (exposed), P (presymptomatic infectious), I
(ascertained infectious), A (unascertained infectious), H (isolated), and R (removed). (a)
Relationship between different compartments. Two parameters of interest are r
(ascertainment rate) and b (transmission rate), which are assumed to be varying across time
periods. (b) Schematic disease course of a symptomatic case. In this model, the unascertained
compartment A includes asymptomatic and some mild-symptomatic cases who were not
detected. While there is no presymptomatic phase for asymptomatic cases, we treated
asymptomatic as a special case of mild-symptomatic and modelled both with a
“presymptomatic” phase for simplicity.
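The compartment structure described in this caption can be sketched as a system of ordinary differential equations. The code below is a simplified sketch under stated assumptions, not the authors' implementation: it ignores population inflow and outflow, uses plain forward Euler steps, and every numerical value (population size, durations, b, r, and the relative transmissibility alpha) is an illustrative placeholder. Following the rebuttal text, compartments P and A are assumed to transmit at rate alpha × b and compartment I at rate b.

```python
def saphire_derivatives(state, b, r, alpha, De, Dp, Di, Dq, Dh):
    """Time derivatives of (S, E, P, I, A, H, R) for a closed population."""
    S, E, P, I, A, H, R = state
    N = S + E + P + I + A + H + R
    new_infections = b * S * (alpha * P + I + alpha * A) / N
    dS = -new_infections
    dE = new_infections - E / De
    dP = E / De - P / Dp
    dI = r * P / Dp - I / Di - I / Dq          # ascertained: removed or isolated
    dA = (1.0 - r) * P / Dp - A / Di           # unascertained: removed only
    dH = I / Dq - H / Dh
    dR = (I + A) / Di + H / Dh
    return (dS, dE, dP, dI, dA, dH, dR)

def simulate(state, days, dt=0.1, **params):
    """Integrate the model forward with fixed-step forward Euler."""
    for _ in range(int(round(days / dt))):
        deriv = saphire_derivatives(state, **params)
        state = tuple(x + dt * dx for x, dx in zip(state, deriv))
    return state

# Illustrative placeholder values only (durations in days).
params = dict(b=1.3, r=0.2, alpha=0.55, De=2.9, Dp=2.3, Di=2.9, Dq=21.0, Dh=30.0)
initial = (1.0e7 - 100.0, 50.0, 30.0, 5.0, 15.0, 0.0, 0.0)
final = simulate(initial, days=30, **params)
print([round(x, 1) for x in final])
```

Because every outflow from one compartment is an inflow to another, the total population is conserved, which is a convenient sanity check on the bookkeeping.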
2. How can someone move from infectious to recovered without becoming isolated? The
arrow from I to R is odd, because it would be unlikely for someone to be ascertained if they
never wind up being isolated.
Response: As in the classic SEIR model, compartment R stands for “removed” and
includes all cases once they are no longer infectious. Given that the infectious period is
much shorter than the time course of full recovery from clinical symptoms, it is possible
that some ascertained cases were no longer infectious by the time they were
hospitalized for medical treatment, especially during the early outbreak when there was
a long delay from symptom onset to confirmed diagnosis due to a lack of medical
resources.
3. The logic for the equation to calculate the duration in the presymptomatic vs infectious
period (page 20 line 7) is unclear. If presymptomatic infectiousness leads to 44% of total
secondary cases, then it’s important to normalize the transmission rate and duration of this
compartment with the infectiousness and duration of the I compartment, rather than just the
durations.
Response: The result that presymptomatic infectiousness leads to 44% of total
secondary cases is from reference 10 (He et al. 2020, Temporal dynamics in viral
shedding and transmissibility of COVID-19. Nat Med 26, 672-675), which only analyzed
ascertained cases. The equation to calculate the duration of symptomatic infectious
period was based on the assumption of a constant transmission rate across both
presymptomatic and symptomatic phases for the ascertained cases (please see our
response to major comment 1). We have added the justification to the Methods section
(page 16 line 432; page 17 line 433) as below: “Because presymptomatic infectiousness
was estimated to account for 44% of the total infections of ascertained cases,¹⁰ we set the
mean of total infectious period as (𝑫𝒑 + 𝑫𝒊) = 𝑫𝒑/𝟎.𝟒𝟒 = 𝟓.𝟐 days assuming constant
infectiousness across presymptomatic and symptomatic phases for the ascertained
cases,¹² thus the mean symptomatic infectious period was 𝑫𝒊 = 𝟐.𝟗 days.”
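The arithmetic in the quoted sentence can be checked in a few lines. The presymptomatic duration of 2.3 days is not stated in this passage; it is inferred here as the difference between the quoted totals (5.2 − 2.9) and should be treated as an assumption.

```python
presymptomatic_share = 0.44   # share of transmission before symptom onset (quoted above)
Dp = 2.3                      # assumed mean presymptomatic infectious period, in days

# With constant infectiousness, the share of transmission equals the share of time:
# Dp / (Dp + Di) = 0.44, so the total infectious period is Dp / 0.44.
total_infectious_period = Dp / presymptomatic_share
Di = total_infectious_period - Dp

print(round(total_infectious_period, 1), round(Di, 1))
```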
4. The authors do a great job investigating many different models in the sensitivity analysis.
Since there are many different proposed models, it might be useful to be able to directly
compare the fits of these models using model comparison. I would suggest using a metric like
Bayes factors, BIC, or AIC and including the results as a new table or as an added column in
one of the others.
Response: Thank you for the suggestion. We have added one column to Extended Data
Table 4, listing the Deviance Information Criterion (DIC) of the different models, as we
used Bayesian methods for fitting the models.
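For readers unfamiliar with the DIC, the sketch below shows the commonly used definition computed from MCMC output: the posterior mean deviance plus an effective number of parameters. The function name and the toy numbers are hypothetical, and this is not claimed to be how the values in the authors' table were computed.

```python
def deviance_information_criterion(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, p_D) from per-draw deviances and the deviance at the posterior mean.

    Deviance is -2 * log-likelihood. p_D = mean deviance - deviance at the posterior
    mean of the parameters, and DIC = mean deviance + p_D. Lower DIC is preferred.
    """
    mean_deviance = sum(deviance_samples) / len(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d, p_d

# Toy inputs: deviances from three MCMC draws and the deviance at the posterior mean.
dic, p_d = deviance_information_criterion([10.0, 12.0, 14.0], 11.0)
print(dic, p_d)
```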
Reference
Ferretti, L. et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with
digital contact tracing. Science 368, (2020).