This option estimates the effective reproduction number Rt over time using a Gaussian process (GP) model. There are two GPs: one for the long-term trend in Rt, and one for short-term deviations from this trend.
Usage
R_estimate_gp(
R_intercept_prior_mu = 1,
R_intercept_prior_sigma = 0,
length_scale_prior_mu = 7 * 3,
length_scale_prior_sigma = 7/2,
magnitude_prior_mu = 0.2,
magnitude_prior_sigma = 0.05,
long_length_scale_prior_mu = 7 * 4 * 3,
long_length_scale_prior_sigma = 7,
long_magnitude_prior_mu = 0.4,
long_magnitude_prior_sigma = 0.1,
matern_nu = c(3/2, 5/2, 1/2),
boundary_factor = 3,
n_basis_factor = 3.42,
link = "inv_softplus",
R_max = 6,
modeldata = modeldata_init()
)Arguments
- R_intercept_prior_mu
Prior (mean) for the intercept of Rt. Should be set to 1 unless you have a clear a priori expectation of the average Rt during the modeled time period.
- R_intercept_prior_sigma
Prior (standard deviation) for the intercept of Rt. By default, we fix
R_interceptto 1 by settingR_intercept_prior_sigma=0. This is because deviations from Rt=1 are already captured by the long-term trend component (see below).- length_scale_prior_mu
Prior (mean) on the length scale of the short-term Gaussian process (in days). This influences the smoothness of Rt. A higher length scale means that the Rt will change more slowly. Choosing a length scale that is too short can lead to overfitting of the Rt trajectory, while choosing a length scale that is too long can result in unrealistically smooth Rt estimates.
- length_scale_prior_sigma
Prior (standard deviation) on the length scale of the short-term Gaussian process. Set to zero to fix the length scale.
- magnitude_prior_mu
Prior (mean) on the magnitude of the short-term Gaussian process. Can be approximately interpreted as the marginal standard deviation of Rt. A higher magnitude allows more extreme Rt values.
- magnitude_prior_sigma
Prior (standard deviation) on the magnitude of the short-term Gaussian process. Set to zero to fix the magnitude.
- long_length_scale_prior_mu
Prior (mean) on the length scale of the long-term Gaussian process (in days). This should be quite long, at least several times the mean shedding delay of the pathogen, and significantly larger than the mean prior for the short-term GP (
length_scale_prior_mu).- long_length_scale_prior_sigma
Prior (standard deviation) on the length scale of the long-term Gaussian process. Set to zero to fix the length scale.
- long_magnitude_prior_mu
Prior (mean) on the magnitude of the long-term Gaussian process. Can be approximately interpreted as the marginal standard deviation of the long-term Rt trend. A higher magnitude allows more extreme Rt values.
- long_magnitude_prior_sigma
Prior (standard deviation) on the magnitude of the long-term Gaussian process. Set to zero to fix the magnitude.
- matern_nu
The smoothness parameter of the Matern kernel. The default is 3/2, other possible choices are 5/2 (more smooth) and 1/2 (less smooth). However, we recommend tuning smoothness primarily using the length scale priors.
- boundary_factor
The boundary factor used in the Gaussian process approximation. The default (
boundary_factor = 3) is higher than the minimum recommendation from Riutort-Mayol et al. to ensure accurate real-time estimation. Lower values can lead to boundary effects close to the present and start of the time series and are thus not recommended. When modeling very long shedding delays, the boundary factor might need to be further increased. Note that this will also automatically increase the number of basis functions used and thereby slow down sampling.- n_basis_factor
Factor used to automatically determine
m, the number of basis functions in the Gaussian process approximation. Based on recommendations from Riutort-Mayol et al., we letm = n_basis_factor * c / (l / S), wherecis theboundary_factor,lis the length scale of the GP (we use the 5% quantile ofgp_length_priorfor a conservative result), andSis the maximum absolute value of the zero-centered input space, which is(n-1)/2wherenis the number of Rt time steps modeled. This means that the number of basis function automatically adapts to the length scale and number of observations in the model. For the default settings (boundary_factor = 3andn_basis_factor = 3.42) and a lower length scale of 3 weeks, this corresponds to approx. 0.25x the number of Rt time points modeled. Increasingn_basis_factorwill make the approximation more accurate by proportionally increasing the number of basis functions, but can slow down sampling.- link
Link function. Currently supported are
inv_softplus(default) andscaled_logit. Both of these links are configured to behave approximately like the identity function around R=1, but become increasingly non-linear below (and in the case ofscaled_logitalso above) R=1.- R_max
If
link=scaled_logitis used, a maximum reproduction number must be assumed. This should be higher than any realistic R value for the modeled pathogen. Default is 6.- modeldata
A
modeldataobject to which the above model specifications should be added. Default is an empty model given bymodeldata_init(). Can also be an already partly specified model returned by otherEpiSewermodeling functions.
Value
A modeldata object containing data and specifications of the model
to be fitted. Can be passed on to other EpiSewer modeling functions to
add further data and model specifications.
The modeldata object also includes information about parameter
initialization (.init), meta data (.metainfo), and checks to be
performed before model fitting (.checks).
Details
The estimated Rt trajectory is primarily influenced by the priors for the length scale and the magnitude of the short-term and long-term Gaussian processes. We recommend adjusting the magnitude when maximum Rt values seem to be too low or too high. In contrast, we recommend adjusting the length scale when the Rt trajectory seems to have too little resolution (try shorter length scales) or is overfitting on noise (try longer length scales). Note that, to ensure identifiability, the prior mean for the length scale of the long-term GP should always be significantly larger than the prior mean for the short-term GP.
The Gaussian process is modeled using a Hilbert space approximation
as described in Riutort-Mayol et al. (2023). This allows for fast inference
without significant loss of accuracy. See the reference for more detail on
choosing an adequate boundary factor and basis function factor. The
EpiSewer implementation of the approximate Gaussian process is strongly
inspired by the implementation in the
EpiNow2 package.
The priors of this component have the following functional form:
R_intercept (intercept):
Normalmagnitude (magnitude):
Truncated Normallength_scale (length scale):
Truncated Normallong_magnitude (magnitude for long-term trend):
Truncated Normallong_length_scale (length scale for long-term trend):
Truncated Normal
References
Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023). https://doi.org/10.1007/s11222-022-10167-2
Abbott S, Hellewell J, Thompson RN et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts. Wellcome Open Res 5, 112 (2020). https://doi.org/10.12688/wellcomeopenres.16006.2