Suggest load per case assumption using wastewater data and case numbers
Source: R/utils_preprocess.R
suggest_load_per_case.Rd
This helper function uses a crude heuristic to infer the load_per_case based on the relationship between measured concentrations and case counts. The goal is to obtain a load_per_case assumption that is of the right order of magnitude. This will not be sufficient for accurate prevalence estimation from wastewater, but it is fully sufficient for monitoring trends and estimating Rt.
Arguments
- measurements
A data.frame with each row representing one measurement. Must have at least a column with dates and a column with concentration measurements. If multiple measurements per date are provided, their arithmetic mean is used.
- cases
A data.frame with each row representing one day. Must have at least a column with dates and a column with case numbers.
- flows
A data.frame with each row representing one day. Must have at least a column with dates and a column with flow measurements.
- flow_constant
Fixed flow volume, as an alternative to flows, if no regular flow measurements are available.
- ascertainment_prop
Proportion of all infections that get detected and reported as cases. Can be used to account for underreporting of infections. Default is ascertainment_prop = 1, meaning that 100% of infections become confirmed cases.
- measurement_shift
The specific timing between wastewater concentrations and case numbers depends on reporting delays and shedding profiles and is typically uncertain. This argument allows shifting the concentration and case number time series relative to each other and averaging over several potential lags/leads, as specified by an integer vector. The default is measurement_shift = seq(-7,7), i.e. concentrations are shifted by up to one week before or after the case numbers.
- shift_weights
Weights for the shifted comparisons. Must be a numeric vector of the same length as measurement_shift. If NULL (default), the weights are chosen to be approximately inversely proportional to the shift distance.
- date_col
Name of the date column in all provided data frames.
- concentration_col
Name of the column containing the measured concentrations.
- flow_col
Name of the column containing the flows.
- case_col
Name of the column containing the case numbers.
- signif_fig
Significant figures to round to. Since this heuristic only provides crude estimates which should not be overinterpreted, the result gets rounded. Default is rounding to the 2 most significant figures.
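As an illustration of weights that are "approximately inversely proportional to the shift distance", the following minimal sketch computes one possible set of weights for the default shifts. The helper name inverse_distance_weights and the +1 offset are assumptions made for this example. They are not taken from the EpiSewer source, which may use a different formula.

```r
# Hypothetical helper, not part of EpiSewer: weights that decay with the
# absolute shift. The +1 offset is an assumption that keeps the weight for
# a shift of 0 finite.
inverse_distance_weights <- function(measurement_shift) {
  w <- 1 / (abs(measurement_shift) + 1)
  w / sum(w) # normalize so that the weights sum to 1
}

measurement_shift <- seq(-7, 7)
shift_weights <- inverse_distance_weights(measurement_shift)

# One weight per shift; a shift of 0 receives the largest weight
print(round(shift_weights, 3))
```

With weights of this kind, comparisons at small shifts dominate the average, while shifts of up to a week still contribute.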
Value
A suggested load_per_case that can be used as an assumption in load_per_case_assume().
Details
In the EpiSewer model, the load_per_case serves as a scaling factor describing how many pathogen particles are shed by the average infected individual overall and how much of this is detectable at the sampling site. This depends both on biological factors and on the specific sewage system. It is therefore almost always necessary to assume the load per case based on a comparison of measured concentrations/loads and case numbers.
The heuristic used here is to fit a linear regression model with loads (computed using concentrations and flows) as the dependent variable and case numbers as the independent variable over all measurements. This does not explicitly account for shedding profiles or reporting delays, but the measurement_shift argument allows averaging over a set of relative shifts between the two time series.
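The heuristic described above can be sketched in base R as follows. This is not the EpiSewer implementation. The toy data are simulated, the helper slope_for_shift is hypothetical, and three choices are assumptions made for this example: regression through the origin, weights of the form 1 / (|shift| + 1), and multiplying the slope by ascertainment_prop to convert from load per reported case to load per infection.

```r
set.seed(1)

# Simulated toy data (not real measurements)
dates <- seq(as.Date("2023-01-01"), by = "day", length.out = 60)
case_data <- data.frame(
  date = dates,
  cases = round(100 + 50 * sin(seq_along(dates) / 10))
)
true_load_per_case <- 1e11 # gc per case, arbitrary value for the simulation
flow_constant <- 5e10      # mL per day, arbitrary value for the simulation
ww_data <- data.frame(
  date = dates,
  concentration = case_data$cases * true_load_per_case / flow_constant *
    exp(rnorm(length(dates), sd = 0.1)) # multiplicative measurement noise
)

# Hypothetical helper: regression slope of load on cases for one shift
slope_for_shift <- function(shift) {
  shifted <- ww_data
  shifted$date <- shifted$date + shift # move concentrations in time
  df <- merge(shifted, case_data, by = "date")
  df$load <- df$concentration * flow_constant # gc/mL * mL/day = gc/day
  # Assumption: no intercept, so that zero cases imply zero load
  coef(lm(load ~ 0 + cases, data = df))[["cases"]]
}

measurement_shift <- seq(-7, 7)
shift_weights <- 1 / (abs(measurement_shift) + 1) # assumed weighting scheme
slopes <- vapply(measurement_shift, slope_for_shift, numeric(1))

# With cases = infections * ascertainment_prop, the slope per reported case
# equals (load per infection) / ascertainment_prop, hence the multiplication.
ascertainment_prop <- 1
signif_fig <- 2
load_per_case <- signif(
  weighted.mean(slopes, shift_weights) * ascertainment_prop,
  signif_fig
)
print(load_per_case)
```

Because the result is rounded to two significant figures, only the order of magnitude and the leading digits of the estimate should be interpreted.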
The volume unit of the flow should match that of the concentration measurements, e.g. if concentrations are measured in gc/mL, then the flow should be in mL as well.
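As a small worked example of the unit requirement, the sketch below converts a flow reported in cubic meters per day to mL per day before multiplying it with a concentration in gc/mL. The numbers and variable names are illustrative assumptions, not real data.

```r
# Illustrative values, not real data
concentration_gc_per_mL <- 250 # gc/mL
flow_m3_per_day <- 40000       # m^3/day, as a treatment plant might report it

# 1 m^3 = 1000 L = 1e6 mL
flow_mL_per_day <- flow_m3_per_day * 1e6

# gc/mL * mL/day = gc/day
load_gc_per_day <- concentration_gc_per_mL * flow_mL_per_day
print(load_gc_per_day)
```

Skipping the conversion in this example would understate the load by a factor of one million, which would carry over directly into the suggested load_per_case.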