Title: | Estimate Disease Severity and Case Ascertainment |
---|---|
Description: | Estimate the severity of a disease and ascertainment of cases, as discussed in Nishiura et al. (2009) <doi:10.1371/journal.pone.0006852>. |
Authors: | Pratik R. Gupte [aut, cph] , Adam Kucharski [aut, cph, cre] , Tim Russell [aut, cph] , Joshua W. Lambert [rev] , Hugo Gruson [rev] , Tim Taylor [rev] , James M. Azam [rev] , Abdoelnaser M. Degoot [rev] , Sebastian Funk [rev] |
Maintainer: | Adam Kucharski <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2 |
Built: | 2024-11-17 05:56:55 UTC |
Source: | https://github.com/epiverse-trace/cfr |
Calculates the CFR at each time point in the case and death time series supplied, using an expanding window of time. The static CFR is calculated for each time point, using the time series from the start to each time point, and increasing the number of time points included by one in each iteration.
cfr_rolling(data, delay_density = NULL, poisson_threshold = 100)
cfr_rolling(data, delay_density = NULL, poisson_threshold = 100)
data |
A Note that the Note also that the total number of cases must be greater than the total number of reported deaths. |
delay_density |
An optional argument that controls whether delay
correction is applied in the severity estimation.
May be |
poisson_threshold |
The case count above which to use Poisson approximation. Set to 100 by default. Must be > 0. |
When delay correction is applied by passing a delay distribution
density function to delay_density
, the internal function
.estimate_severity()
is used to calculate the rolling severity.
Note that in the naive method the severity estimate and confidence intervals
cannot be calculated for days on which the cumulative number of cases since
the start of the time-series, and for days on which the cumulative number of
deaths reported exceeds the cumulative reported cases, and is returned as
NA
.
cfr_rolling()
applies the internal function .estimate_severity()
to an
expanding time-series of total cases, total estimated outcomes, and total
deaths. The method used to generate a profile likelihood for each day depends
on the outbreak size and initial severity estimate for that day. This is
essentially the same as running cfr_static()
on each new day. The method
used for each day is not communicated to the user, in order to prevent
cluttering the terminal with messages.
A <data.frame>
with the date, maximum likelihood estimate and 95%
confidence interval of the daily severity estimates, named
"severity_estimate", "severity_low", and "severity_high", with one row for
each day in the original data.frame.
# load package data data("ebola1976") # estimate severity without correcting for delays cfr_static(ebola1976) # estimate severity for each day while correcting for delays # obtain onset-to-death delay distribution parameters from Barry et al. 2018 # The Lancet. <https://doi.org/10.1016/S0140-6736(18)31387-4> # view only the first values estimate <- cfr_rolling( ebola1976, delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33) ) head(estimate)
# load package data data("ebola1976") # estimate severity without correcting for delays cfr_static(ebola1976) # estimate severity for each day while correcting for delays # obtain onset-to-death delay distribution parameters from Barry et al. 2018 # The Lancet. <https://doi.org/10.1016/S0140-6736(18)31387-4> # view only the first values estimate <- cfr_rolling( ebola1976, delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33) ) head(estimate)
Calculates the severity of a disease, while optionally correcting for reporting delays using an epidemiological delay distribution of the time between symptom onset and death (onset-to-death).
Other delay distributions may be passed to calculate different disease severity measures such as the hospitalisation fatality risk.
cfr_static(data, delay_density = NULL, poisson_threshold = 100)
cfr_static(data, delay_density = NULL, poisson_threshold = 100)
data |
A Note that the Note also that the total number of cases must be greater than the total number of reported deaths. |
delay_density |
An optional argument that controls whether delay
correction is applied in the severity estimation.
May be |
poisson_threshold |
The case count above which to use Poisson approximation. Set to 100 by default. Must be > 0. |
A <data.frame>
with the maximum likelihood estimate and 95%
confidence interval of the severity estimates, named "severity_estimate",
"severity_low", and "severity_high".
The method used in cfr_static()
follows Nishiura et al.
(2009).
The function calculates a quantity for each day within the input
data, which represents the proportion of cases estimated to have
a known outcome on day
.
Following Nishiura et al.,
is calculated as:
where is the value of the probability mass function at time
and
,
are the number of new cases and new deaths at time
, (respectively).
We then use
at the end of the outbreak in the following likelihood
function to estimate the severity of the disease in question.
and
are the cumulative number of cases and deaths
(respectively) up until time
.
is the parameter we wish to estimate, the severity of the
disease. We estimate
using simple maximum-likelihood methods,
allowing the functions within this package to be quick and easy tools to use.
The precise severity measure — CFR, IFR, HFR, etc — that
represents depends upon the input data given by the user.
The epidemiological delay-distribution density function passed to
delay_density
is used to evaluate the probability mass function
parameterised by time; i.e. which
gives the probability that a case has a known outcome (usually death) at time
, parameterised with disease-specific parameters before it is supplied
here.
The naive CFR estimate (without delay correction) is the outcome of a
Binomial test on deaths and cases using stats::binom.test()
.
The confidence intervals around the estimate are also taken from the test.
The delay-corrected CFR estimates are however obtained by generating a
profile likelihood over the sequence seq(1e-4, 1.0, 1e-4)
. The method used
depends on the outbreak size and the initial expectation of disease severity.
This is implemented in the internal function .estimate_severity()
.
Delay correction, small outbreaks: For outbreaks where the total cases
are below the user-specified 'Poisson threshold' (poisson_threshold
,
default = 100), the CFR and uncertainty around it is taken from a profile
likelihood generated from a Binomial model of deaths (successes) and
estimated known outcomes (trials).
Delay correction, large outbreaks with low severity: For outbreaks
with total cases greater than the Poisson threshold (default = 100) and with
initial severity estimates < 0.05, the CFR and uncertainty are taken from a
Poisson approximation of the Binomial profile likelihood (taking
for
estimated outcomes and
as the severity
estimate).
Delay correction, large outbreaks with higher severity: For outbreaks
with total cases greater than the Poisson threshold (default = 100) and with
initial severity estimates 0.05, the CFR and uncertainty are taken
from a Normal approximation of the Binomial profile likelihood.
Nishiura, H., Klinkenberg, D., Roberts, M., & Heesterbeek, J. A. P. (2009). Early Epidemiological Assessment of the Virulence of Emerging Infectious Diseases: A Case Study of an Influenza Pandemic. PLOS ONE, 4(8), e6852. doi:10.1371/journal.pone.0006852
# load package data data("ebola1976") # estimate severity without correcting for delays cfr_static(ebola1976) # estimate severity for each day while correcting for delays # obtain onset-to-death delay distribution parameters from Barry et al. 2018 # The Lancet. <https://doi.org/10.1016/S0140-6736(18)31387-4> cfr_static( ebola1976, delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33) )
# load package data data("ebola1976") # estimate severity without correcting for delays cfr_static(ebola1976) # estimate severity for each day while correcting for delays # obtain onset-to-death delay distribution parameters from Barry et al. 2018 # The Lancet. <https://doi.org/10.1016/S0140-6736(18)31387-4> cfr_static( ebola1976, delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33) )
Calculates how the severity of a disease changes over time while optionally correcting for reporting delays using an epidemiological delay distribution of the time between symptom onset and outcome (e.g. onset-to-death for the fatality risk).
cfr_time_varying( data, delay_density = NULL, burn_in = 7, smoothing_window = NULL )
cfr_time_varying( data, delay_density = NULL, burn_in = 7, smoothing_window = NULL )
data |
A Note that the Note also that the total number of cases must be greater than the total number of reported deaths. |
delay_density |
An optional argument that controls whether delay
correction is applied in the severity estimation.
May be |
burn_in |
A single integer-like value for the number of time-points (typically days) to disregard at the start of the time-series, if a burn-in period is desired. Defaults to 7, which is a sensible default value that disregards the first week of cases and deaths, assuming daily data. To consider all case data including the start of the time-series, set this argument to 0. |
smoothing_window |
An odd number determining the smoothing window size
to use when smoothing the case and death time-series, using a rolling median
procedure (as the The default behaviour is to apply no smoothing. The minimum value of this argument is 1. |
A <data.frame>
with the date, maximum likelihood estimate and 95%
confidence interval of the daily severity estimates, named
"severity_estimate", "severity_low", and "severity_high", with one row for
each day in the original data.frame.
This function estimates the number of cases which have a known outcome over
time, following Nishiura et al. (2009).
The function calculates a quantity for each day within the input
data, which represents the number of cases estimated to have a known outcome,
on day
.
is calculated in the following way:
We then assume that the severity measure, for example CFR, of interest is binomially distributed, in the following way:
We use maximum likelihood estimation to determine the value of
for each
, where
represents the severity measure of
interest.
The epidemiological delay distribution passed to delay_density
is used to
obtain a probability mass function parameterised by time; i.e.
which gives the probability of the binary outcome of a case (survival or
death) being known by time
. The delay distribution is parameterised
with disease-specific parameters before it is supplied here.
Note that the function arguments burn_in
and smoothing_window
are not
explicitly used in this calculation. burn_in
controls how many estimates at
the beginning of the outbreak are replaced with NA
s — the calculation
above is not applied to the first burn_in
data points.
The calculation is applied to the smoothed data, if a smoothing_window
is specified.
Nishiura, H., Klinkenberg, D., Roberts, M., & Heesterbeek, J. A. P. (2009). Early Epidemiological Assessment of the Virulence of Emerging Infectious Diseases: A Case Study of an Influenza Pandemic. PLOS ONE, 4(8), e6852. doi:10.1371/journal.pone.0006852
# get data pre-loaded with the package data("covid_data") df_covid_uk <- covid_data[covid_data$country == "United Kingdom" & covid_data$date <= as.Date("2020-09-01"), ] # estimate time varying severity without correcting for delays cfr_time_varying <- cfr_time_varying( data = df_covid_uk, burn_in = 7L ) # View tail(cfr_time_varying) # estimate time varying severity while correcting for delays # obtain onset-to-death delay distribution parameters from Linton et al. 2020 # J. Clinical Medicine: <https://doi.org/10.3390/jcm9020538> # view only the first values cfr_time_varying <- cfr_time_varying( data = df_covid_uk, delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440), burn_in = 7L ) tail(cfr_time_varying)
# get data pre-loaded with the package data("covid_data") df_covid_uk <- covid_data[covid_data$country == "United Kingdom" & covid_data$date <= as.Date("2020-09-01"), ] # estimate time varying severity without correcting for delays cfr_time_varying <- cfr_time_varying( data = df_covid_uk, burn_in = 7L ) # View tail(cfr_time_varying) # estimate time varying severity while correcting for delays # obtain onset-to-death delay distribution parameters from Linton et al. 2020 # J. Clinical Medicine: <https://doi.org/10.3390/jcm9020538> # view only the first values cfr_time_varying <- cfr_time_varying( data = df_covid_uk, delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440), burn_in = 7L ) tail(cfr_time_varying)
Data adapted from the {covidregionaldata} package of daily cases and
deaths from the 19 countries with 100,000 or more deaths over the period
2020-01-01 to 2022-12-31. See the References for the publication which
links to data sources made available through {covidregionaldata}.
Included as {covidregionaldata} is no longer on CRAN.
Data are provided as a <data.frame>
.
covid_data
covid_data
covid_data
A <data.frame>
with 20,786 rows and 4 columns:
Calendar date in the format %Y-%m-%d
The country name in simple format, e.g. "United States" rather than "United States of America"
Number of cases reported on each date
Number of deaths reported on each date
Joseph Palmer, Katharine Sherratt, Richard Martin-Nielsen, Jonnie Bevan, Hamish Gibbs, Sebastian Funk and Sam Abbott (2021). covidregionaldata: Subnational data for COVID-19 epidemiology. doi:10.21105/joss.03290
An example epidemic outbreak dataset for use with the cfr
package.
This dataset comes from the first Ebola outbreak in Zaire in 1976 as analysed
in Camacho et al. (2014).
ebola1976
ebola1976
ebola1976
A <data.frame>
with 73 rows and 3 columns:
Calendar date
Number of cases reported
Number of deaths reported
doi:10.1016/j.epidem.2014.09.003
Camacho, A., Kucharski, A. J., Funk, S., Breman, J., Piot, P., & Edmunds, W. J. (2014). Potential for large outbreaks of Ebola virus disease. Epidemics, 9, 70–78. doi:10.1016/j.epidem.2014.09.003
Estimates the proportion of cases or infections that have been ascertained, given a time-series of cases and deaths, a delay distribution and a baseline severity estimate. The resulting ascertainment estimate is calculated as the ratio of the baseline severity estimate, which is assumed to be the 'true' disease severity, and the delay-adjusted severity estimate.
estimate_ascertainment(data, severity_baseline, delay_density = NULL)
estimate_ascertainment(data, severity_baseline, delay_density = NULL)
data |
A Note that the Note also that the total number of cases must be greater than the total number of reported deaths. |
severity_baseline |
A single number in the range 0.0 – 1.0 for the assumed true baseline severity estimate used to estimate the overall ascertainment ratio. Missing by default, which causes the function to error; must be supplied by the user. |
delay_density |
An optional argument that controls whether delay
correction is applied in the severity estimation.
May be |
estimate_ascertainment()
uses cfr_static()
internally to obtain a
severity estimate that is compared against the user-specified baseline
severity. The profile likelihood method used to obtain the severity estimate
is decided by the internal function .estimate_severity()
as used in
cfr_static()
, when delay correction is applied. See the cfr_static()
documentation for an explanation of the methods used depending on outbreak
size and initial severity guess.
A <data.frame>
containing the maximum likelihood estimate estimate
and 95% confidence interval of the corrected severity, named
"ascertainment_estimate" (for the central estimate), and "ascertainment_low"
and "ascertainment_high" for the lower and upper interval limits.
# get data pre-loaded with the package data("covid_data") df_covid_uk <- covid_data[covid_data$country == "United Kingdom", ] df_covid_uk_subset <- subset(df_covid_uk, date <= "2020-05-31") # use a severity baseline of 1.4% (0.014) taken from Verity et al. (2020) # Lancet Infectious Diseases: <https://doi.org/10.1016/S1473-3099(20)30243-7> # use onset-to-death distribution from Linton et al. (2020) # J. Clinical Medicine: <https://doi.org/10.3390/jcm9020538> # subset data until 30th June 2020 data <- df_covid_uk[df_covid_uk$date <= "2020-06-30", ] estimate_ascertainment( data = data, delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440), severity_baseline = 0.014 )
# get data pre-loaded with the package data("covid_data") df_covid_uk <- covid_data[covid_data$country == "United Kingdom", ] df_covid_uk_subset <- subset(df_covid_uk, date <= "2020-05-31") # use a severity baseline of 1.4% (0.014) taken from Verity et al. (2020) # Lancet Infectious Diseases: <https://doi.org/10.1016/S1473-3099(20)30243-7> # use onset-to-death distribution from Linton et al. (2020) # J. Clinical Medicine: <https://doi.org/10.3390/jcm9020538> # subset data until 30th June 2020 data <- df_covid_uk[df_covid_uk$date <= "2020-06-30", ] estimate_ascertainment( data = data, delay_density = function(x) dlnorm(x, meanlog = 2.577, sdlog = 0.440), severity_baseline = 0.014 )
Estimates the expected number of individuals with known outcomes from a case and outcome time series of outbreak data, and an epidemiological delay distribution of symptom onset to outcome. When calculating a case fatality risk, the outcomes must be deaths, the delay distribution must be an onset-to-death distribution, and the function returns estimates of the known death outcomes.
estimate_outcomes(data, delay_density)
estimate_outcomes(data, delay_density)
data |
A Note that the Note also that the total number of cases must be greater than the total number of reported deaths. |
delay_density |
An optional argument that controls whether delay
correction is applied in the severity estimation.
May be |
The ratio u_t
represents, for the outbreak, the overall proportion of
cases whose outcomes are expected to be known by each day $i$. For an ongoing
outbreak with relatively long delays between symptom onset and case outcome,
a u_t
value of 1.0 may indicate that the outbreak has ended, as the
outcomes of all cases are expected to be known.
A <data.frame>
with the columns in data
, and with two additional
columns:
"estimated_outcomes"
for the number of cases with an outcome of interest
(usually, death) estimated to be known on the dates specified in data
, and
u_t
for the ratio of cumulative number of estimated known outcomes
and the cumulative number of cases reported until each date specified in
data
.
# Load Ebola 1976 outbreak data data("ebola1976") # estimate severity for each day while correcting for delays # obtain onset-to-death delay distribution parameters from Barry et al. 2018 # examine the first few rows of the output estimated_outcomes <- estimate_outcomes( data = ebola1976, delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33) ) head(estimated_outcomes)
# Load Ebola 1976 outbreak data data("ebola1976") # estimate severity for each day while correcting for delays # obtain onset-to-death delay distribution parameters from Barry et al. 2018 # examine the first few rows of the output estimated_outcomes <- estimate_outcomes( data = ebola1976, delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33) ) head(estimated_outcomes)
This S3 generic has methods for classes commonly used for epidemiological data.
Currently, the only supported data format is <incidence2>
from the
incidence2 package. See incidence2::incidence()
. Grouped
<incidence2>
data are supported, see Details.
prepare_data(data, ...) ## S3 method for class 'incidence2' prepare_data( data, cases_variable = "cases", deaths_variable = "deaths", fill_NA = TRUE, ... )
prepare_data(data, ...) ## S3 method for class 'incidence2' prepare_data( data, cases_variable = "cases", deaths_variable = "deaths", fill_NA = TRUE, ... )
data |
A |
... |
Currently unused. Passing extra arguments will throw a warning. |
cases_variable |
A string for the name of the cases variable in the
"count_variable" column of |
deaths_variable |
A string for the name of the deaths variable in the
"count_variable" column of |
fill_NA |
A logical indicating whether |
The method for <incidence2>
data can replace NA
s in the case and death
data with 0s using the fill_NA
argument, which is TRUE
by default,
meaning that NA
s are replaced.
Keeping NA
s will cause downstream issues when calling functions such as
cfr_static()
on the data, as they cannot handle NA
s.
Setting fill_NA = TRUE
resolves this issue.
Passing a grouped <incidence2>
object to data
will result in the function
respecting the grouping and returning grouping variables in separate columns.
A <data.frame>
suitable for disease severity estimation functions
provided in cfr, with the columns "date", "cases", and "deaths".
Additionally, for grouped <incidence2>
data, columns representing the
grouping variables will also be present.
The result has a continuous sequence of dates between the start and end date
of data
; this is required if the data is to be passed to functions such as
cfr_static()
.
#### For <incidence2> data #### # load Covid-19 data from incidence2 covid_uk <- incidence2::covidregionaldataUK # convert to incidence2 object covid_uk_incidence <- incidence2::incidence( covid_uk, date_index = "date", counts = c("cases_new", "deaths_new"), count_names_to = "count_variable" ) # View tail of prepared data data <- prepare_data( covid_uk_incidence, cases_variable = "cases_new", deaths_variable = "deaths_new" ) tail(data) #### For grouped <incidence2> data #### # convert data to incidence2 object grouped by region covid_uk_incidence <- incidence2::incidence( covid_uk, date_index = "date", counts = c("cases_new", "deaths_new"), count_names_to = "count_variable", groups = "region" ) # View tail of prepared data data <- prepare_data( covid_uk_incidence, cases_variable = "cases_new", deaths_variable = "deaths_new" ) tail(data)
#### For <incidence2> data #### # load Covid-19 data from incidence2 covid_uk <- incidence2::covidregionaldataUK # convert to incidence2 object covid_uk_incidence <- incidence2::incidence( covid_uk, date_index = "date", counts = c("cases_new", "deaths_new"), count_names_to = "count_variable" ) # View tail of prepared data data <- prepare_data( covid_uk_incidence, cases_variable = "cases_new", deaths_variable = "deaths_new" ) tail(data) #### For grouped <incidence2> data #### # convert data to incidence2 object grouped by region covid_uk_incidence <- incidence2::incidence( covid_uk, date_index = "date", counts = c("cases_new", "deaths_new"), count_names_to = "count_variable", groups = "region" ) # View tail of prepared data data <- prepare_data( covid_uk_incidence, cases_variable = "cases_new", deaths_variable = "deaths_new" ) tail(data)