Title: | Compute, Handle, Plot and Model Incidence of Dated Events |
---|---|
Description: | Provides functions and classes to compute, handle and visualise incidence from dated events for a defined time interval. Dates can be provided in various standard formats. The class 'incidence' is used to store computed incidence and can be easily manipulated, subsetted, and plotted. In addition, log-linear models can be fitted to 'incidence' objects using 'fit'. This package is part of the RECON (<https://www.repidemicsconsortium.org/>) toolkit for outbreak analysis. |
Authors: | Thibaut Jombart [aut], Zhian N. Kamvar [aut] , Rich FitzJohn [aut], Tim Taylor [cre] , Jun Cai [ctb] , Sangeeta Bhatia [ctb], Jakob Schumacher [ctb], Juliet R.C. Pulliam [ctb] |
Maintainer: | Tim Taylor <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.7.5 |
Built: | 2024-11-05 04:43:12 UTC |
Source: | https://github.com/reconhub/incidence |
These functions convert incidence
objects into other classes.
## S3 method for class 'incidence' as.data.frame(x, ..., long = FALSE) as.incidence(x, ...) ## S3 method for class 'matrix' as.incidence( x, dates = NULL, interval = NULL, standard = TRUE, isoweeks = standard, ... ) ## S3 method for class 'data.frame' as.incidence(x, dates = NULL, interval = NULL, isoweeks = TRUE, ...) ## S3 method for class 'numeric' as.incidence(x, dates = NULL, interval = NULL, isoweeks = TRUE, ...)
## S3 method for class 'incidence' as.data.frame(x, ..., long = FALSE) as.incidence(x, ...) ## S3 method for class 'matrix' as.incidence( x, dates = NULL, interval = NULL, standard = TRUE, isoweeks = standard, ... ) ## S3 method for class 'data.frame' as.incidence(x, dates = NULL, interval = NULL, isoweeks = TRUE, ...) ## S3 method for class 'numeric' as.incidence(x, dates = NULL, interval = NULL, isoweeks = TRUE, ...)
x |
An |
... |
Further arguments passed to other functions (no used). |
long |
A logical indicating if the output data.frame should be 'long', i.e. where a single column containing 'groups' is added in case of data computed on several groups. |
dates |
A vector of dates, each corresponding to the (inclusive) lower limit of the bins. |
interval |
An integer indicating the time interval used in the computation of the incidence. If NULL, it will be determined from the first time interval between provided dates. If only one date is provided, it will trigger an error. |
standard |
A logical indicating whether standardised dates should be
used. Defaults to |
isoweeks |
Deprecated. Use standard. |
Conversion to incidence
objects should only be done when the
original dates are not available. In such case, the argument x
should be a matrix corresponding to the $counts
element of an
incidence
object, i.e. giving counts with time intervals in rows and
named groups in columns. In the absence of groups, a single unnamed columns
should be given. data.frame
and vectors will be coerced to a matrix.
Thibaut Jombart [email protected], Rich Fitzjohn
the incidence()
function to generate the 'incidence' objects.
## create fake data data <- c(0,1,1,2,1,3,4,5,5,5,5,4,4,26,6,7,9) sex <- sample(c("m","f"), length(data), replace=TRUE) ## get incidence per group (sex) i <- incidence(data, groups = sex) i plot(i) ## convert to data.frame as.data.frame(i) ## same, 'long format' as.data.frame(i, long = TRUE) ## conversion from a matrix of counts to an incidence object i$counts new_i <- as.incidence(i$counts, i$dates) new_i all.equal(i, new_i)
## create fake data data <- c(0,1,1,2,1,3,4,5,5,5,5,4,4,26,6,7,9) sex <- sample(c("m","f"), length(data), replace=TRUE) ## get incidence per group (sex) i <- incidence(data, groups = sex) i plot(i) ## convert to data.frame as.data.frame(i) ## same, 'long format' as.data.frame(i, long = TRUE) ## conversion from a matrix of counts to an incidence object i$counts new_i <- as.incidence(i$counts, i$dates) new_i all.equal(i, new_i)
This function can be used to bootstrap incidence
objects. Bootstrapping is
done by sampling with replacement the original input dates. See details
for
more information on how this is implemented.
bootstrap(x, randomise_groups = FALSE)
bootstrap(x, randomise_groups = FALSE)
x |
An |
randomise_groups |
A |
As original data are not stored in incidence
objects, the
bootstrapping is achieved by multinomial sampling of date bins weighted by
their relative incidence.
An incidence
object.
Thibaut Jombart [email protected]
find_peak to use estimate peak date using bootstrap
if (require(outbreaks) && require(ggplot2)) { withAutoprint({ i <- incidence(fluH7N9_china_2013$date_of_onset) i plot(i) ## one simple bootstrap x <- bootstrap(i) x plot(x) })}
if (require(outbreaks) && require(ggplot2)) { withAutoprint({ i <- incidence(fluH7N9_china_2013$date_of_onset) i plot(i) ## one simple bootstrap x <- bootstrap(i) x plot(x) })}
cumulate
is an S3 generic to compute cumulative numbers, with methods
for different types of objects:
cumulate(x) ## Default S3 method: cumulate(x) ## S3 method for class 'incidence' cumulate(x)
cumulate(x) ## Default S3 method: cumulate(x) ## S3 method for class 'incidence' cumulate(x)
x |
An incidence object. |
default method is a wrapper for cumsum
incidence
objects: computes cumulative incidence over time
projections
objects: same, for projections
objects,
implemented in the similarly named package; see ?cumulate.projections
for more information, after loading the package
Thibaut Jombart [email protected]
The incidence()
function to generate the 'incidence'
objects.
dat <- as.integer(c(0,1,2,2,3,5,7)) group <- factor(c(1, 2, 3, 3, 3, 3, 1)) i <- incidence(dat, groups = group) i plot(i) i_cum <- cumulate(i) i_cum plot(i_cum)
dat <- as.integer(c(0,1,2,2,3,5,7)) group <- factor(c(1, 2, 3, 3, 3, 3, 1)) i <- incidence(dat, groups = group) i plot(i) i_cum <- cumulate(i) i_cum plot(i_cum)
Access various elements of an incidence object
## S3 method for class 'incidence' dim(x) get_interval(x, ...) ## Default S3 method: get_interval(x, ...) ## S3 method for class 'incidence' get_interval(x, integer = TRUE, ...) get_n(x) ## Default S3 method: get_n(x) ## S3 method for class 'incidence' get_n(x) get_timespan(x) ## Default S3 method: get_timespan(x) ## S3 method for class 'incidence' get_timespan(x)
## S3 method for class 'incidence' dim(x) get_interval(x, ...) ## Default S3 method: get_interval(x, ...) ## S3 method for class 'incidence' get_interval(x, integer = TRUE, ...) get_n(x) ## Default S3 method: get_n(x) ## S3 method for class 'incidence' get_n(x) get_timespan(x) ## Default S3 method: get_timespan(x) ## S3 method for class 'incidence' get_timespan(x)
x |
an incidence object. |
... |
Unused |
integer |
When |
dim()
the dimensions in the number of bins and number of groups
get_interval()
if integer = TRUE
: an integer vector, otherwise: the
value stored in x$interval
get_n()
The total number of cases stored in the object
get_timespan()
: an integer
denoting the timespan represented by the
incidence object.
get_counts()
to access the matrix of counts
get_dates()
to access the dates on the right, left, and center of the
interval.
group_names()
to access and possibly re-name the groups
set.seed(999) dat <- as.Date(Sys.Date()) + sample(-3:50, 100, replace = TRUE) x <- incidence(dat, interval = "month") # the value stored in the interval element get_interval(x) # the numeric value of the interval in days get_interval(x, integer = FALSE) # the number of observations in the object get_n(x) # the length of time represented get_timespan(x) # the number of groups ncol(x) # the number of bins (intervals) nrow(x)
set.seed(999) dat <- as.Date(Sys.Date()) + sample(-3:50, 100, replace = TRUE) x <- incidence(dat, interval = "month") # the value stored in the interval element get_interval(x) # the numeric value of the interval in days get_interval(x, integer = FALSE) # the number of observations in the object get_n(x) # the length of time represented get_timespan(x) # the number of groups ncol(x) # the number of bins (intervals) nrow(x)
This function can be used to estimate the peak of an epidemic curve stored as
incidence
, using bootstrap. See bootstrap for more information
on the resampling.
estimate_peak(x, n = 100, alpha = 0.05)
estimate_peak(x, n = 100, alpha = 0.05)
x |
An |
n |
The number of bootstrap datasets to be generated; defaults to 100. |
alpha |
The type 1 error chosen for the confidence interval; defaults to 0.05. |
Input dates are resampled with replacement to form bootstrapped datasets; the peak is reported for each, resulting in a distribution of peak times. When there are ties for peak incidence, only the first date is reported.
Note that the bootstrapping approach used for estimating the peak time makes the following assumptions:
the total number of event is known (no uncertainty on total incidence)
dates with no events (zero incidence) will never be in bootstrapped datasets
the reporting is assumed to be constant over time, i.e. every case is equally likely to be reported
A list containing the following items:
observed
: the peak incidence of the original dataset
estimated
: the mean peak time of the bootstrap datasets
ci
: the confidence interval based on bootstrap datasets
peaks
: the peak times of the bootstrap datasets
Thibaut Jombart [email protected], with inputs on caveats from Michael Höhle.
bootstrap for the bootstrapping underlying this
approach and find_peak to find the peak in a single
incidence
object.
if (require(outbreaks) && require(ggplot2)) { withAutoprint({ i <- incidence(fluH7N9_china_2013$date_of_onset) i plot(i) ## one simple bootstrap x <- bootstrap(i) x plot(x) ## find 95% CI for peak time using bootstrap peak_data <- estimate_peak(i) peak_data summary(peak_data$peaks) ## show confidence interval plot(i) + geom_vline(xintercept = peak_data$ci, col = "red", lty = 2) ## show the distribution of bootstrapped peaks df <- data.frame(peak = peak_data$peaks) plot(i) + geom_density(data = df, aes(x = peak, y = 10 * ..scaled..), alpha = .2, fill = "red", color = "red") })}
if (require(outbreaks) && require(ggplot2)) { withAutoprint({ i <- incidence(fluH7N9_china_2013$date_of_onset) i plot(i) ## one simple bootstrap x <- bootstrap(i) x plot(x) ## find 95% CI for peak time using bootstrap peak_data <- estimate_peak(i) peak_data summary(peak_data$peaks) ## show confidence interval plot(i) + geom_vline(xintercept = peak_data$ci, col = "red", lty = 2) ## show the distribution of bootstrapped peaks df <- data.frame(peak = peak_data$peaks) plot(i) + geom_density(data = df, aes(x = peak, y = 10 * ..scaled..), alpha = .2, fill = "red", color = "red") })}
This function can be used to find the peak of an epidemic curve stored as an
incidence
object.
find_peak(x, pool = TRUE)
find_peak(x, pool = TRUE)
x |
An |
pool |
If |
The date of the (first) highest incidence in the data.
Thibaut Jombart [email protected], Zhian N. Kamvar [email protected]
estimate_peak()
for bootstrap estimates of the peak time
if (require(outbreaks) && require(ggplot2)) { withAutoprint({ i <- incidence(fluH7N9_china_2013$date_of_onset) i plot(i) ## one simple bootstrap x <- bootstrap(i) x plot(x) ## find 95% CI for peak time using bootstrap find_peak(i) ## show confidence interval plot(i) + geom_vline(xintercept = find_peak(i), col = "red", lty = 2) })}
if (require(outbreaks) && require(ggplot2)) { withAutoprint({ i <- incidence(fluH7N9_china_2013$date_of_onset) i plot(i) ## one simple bootstrap x <- bootstrap(i) x plot(x) ## find 95% CI for peak time using bootstrap find_peak(i) ## show confidence interval plot(i) + geom_vline(xintercept = find_peak(i), col = "red", lty = 2) })}
The function fit
fits two exponential models to incidence data, of the
form:
where 'y' is the incidence, 't' is time
(in days), 'r' is the growth rate, and 'b' is the origin. The function fit
will fit one model by default, but will fit two models on either side of a
splitting date (typically the peak of the epidemic) if the argument split
is provided. When groups are present, these are included in the model as main
effects and interactions with dates. The function fit_optim_split()
can be
used to find the optimal 'splitting' date, defined as the one for which the
best average R2 of the two models is obtained. Plotting can be done using
plot
, or added to an existing incidence plot by the piping-friendly
function add_incidence_fit()
.
fit(x, split = NULL, level = 0.95, quiet = FALSE) fit_optim_split( x, window = x$timespan/4, plot = TRUE, quiet = TRUE, separate_split = TRUE ) ## S3 method for class 'incidence_fit' print(x, ...) ## S3 method for class 'incidence_fit_list' print(x, ...)
fit(x, split = NULL, level = 0.95, quiet = FALSE) fit_optim_split( x, window = x$timespan/4, plot = TRUE, quiet = TRUE, separate_split = TRUE ) ## S3 method for class 'incidence_fit' print(x, ...) ## S3 method for class 'incidence_fit_list' print(x, ...)
x |
An incidence object, generated by the function
|
split |
An optional time point identifying the separation between the two models. If NULL, a single model is fitted. If provided, two models would be fitted on the time periods on either side of the split. |
level |
The confidence interval to be used for predictions; defaults to 95%. |
quiet |
A logical indicating if warnings from |
window |
The size, in days, of the time window either side of the split. |
plot |
A logical indicating whether a plot should be added to the
output ( |
separate_split |
If groups are present, should separate split dates be
determined for each group? Defaults to |
... |
currently unused. |
For fit()
, a list with the class incidence_fit
(for a
single model), or a list containing two incidence_fit
objects (when
fitting two models). incidence_fit
objects contain:
$model
: the fitted linear model
$info
: a list containing various information extracted from the model
(detailed further)
$origin
: the date corresponding to day '0'
The $info
item is a list containing:
r
: the growth rate
r.conf
: the confidence interval of 'r'
pred
: a data.frame
containing predictions of the model,
including the true dates (dates
), their numeric version used in the
model (dates.x
), the predicted value (fit
), and the lower
(lwr
) and upper (upr
) bounds of the associated confidence
interval.
doubling
: the predicted doubling time in days; exists only if 'r' is
positive
doubling.conf
: the confidence interval of the doubling time
halving
: the predicted halving time in days; exists only if 'r' is
negative
halving.conf
: the confidence interval of the halving time
For fit_optim_split
, a list containing:
df
: a data.frame
of dates that were used in the optimization
procedure, and the corresponding average R2 of the resulting models.
split
: the optimal splitting date
fit
: an incidence_fit_list
object containing the fit for each split.
If the separate_split = TRUE
, then the incidence_fit_list
object will
contain these splits nested within each group. All of the incidence_fit
objects can be retrieved with get_fit()
.
plot
: a plot showing the content of df
(ggplot2 object)
Thibaut Jombart [email protected], Zhian N. Kamvar [email protected].
the incidence()
function to generate the 'incidence'
objects. The get_fit()
function to flatten incidence_fit_list
objects to
a list of incidence_fit
objects.
if (require(outbreaks)) { withAutoprint({ dat <- ebola_sim$linelist$date_of_onset ## EXAMPLE WITH A SINGLE MODEL ## compute weekly incidence i.7 <- incidence(dat, interval=7) plot(i.7) plot(i.7[1:20]) ## fit a model on the first 20 weeks f <- fit(i.7[1:20]) f names(f) head(get_info(f, "pred")) ## plot model alone (not recommended) plot(f) ## plot data and model (recommended) plot(i.7, fit = f) plot(i.7[1:25], fit = f) ## piping versions if (require(magrittr)) { withAutoprint({ plot(i.7) %>% add_incidence_fit(f) ## EXAMPLE WITH 2 PHASES ## specifying the peak manually f2 <- fit(i.7, split = as.Date("2014-10-15")) f2 plot(i.7) %>% add_incidence_fit(f2) ## finding the best 'peak' date f3 <- fit_optim_split(i.7) f3 plot(i.7) %>% add_incidence_fit(f3$fit) })} })}
if (require(outbreaks)) { withAutoprint({ dat <- ebola_sim$linelist$date_of_onset ## EXAMPLE WITH A SINGLE MODEL ## compute weekly incidence i.7 <- incidence(dat, interval=7) plot(i.7) plot(i.7[1:20]) ## fit a model on the first 20 weeks f <- fit(i.7[1:20]) f names(f) head(get_info(f, "pred")) ## plot model alone (not recommended) plot(f) ## plot data and model (recommended) plot(i.7, fit = f) plot(i.7[1:25], fit = f) ## piping versions if (require(magrittr)) { withAutoprint({ plot(i.7) %>% add_incidence_fit(f) ## EXAMPLE WITH 2 PHASES ## specifying the peak manually f2 <- fit(i.7, split = as.Date("2014-10-15")) f2 plot(i.7) %>% add_incidence_fit(f2) ## finding the best 'peak' date f3 <- fit_optim_split(i.7) f3 plot(i.7) %>% add_incidence_fit(f3$fit) })} })}
Get counts from an incidence object
get_counts(x, groups = NULL) ## S3 method for class 'incidence' get_counts(x, groups = NULL)
get_counts(x, groups = NULL) ## S3 method for class 'incidence' get_counts(x, groups = NULL)
x |
an |
groups |
if there are groups, use this to specify a group or groups to
subset. Defaults to |
a matrix of counts where each row represents a date bin
if (require(outbreaks)) { withAutoprint({ dat <- ebola_sim$linelist$date_of_onset gend <- ebola_sim$linelist$gender i <- incidence(dat, interval = "week", groups = gend) ## Use with an object and no arguments gives the counts matrix head(get_counts(i)) ## Specifying a position or group name will return a matrix subset to that ## group head(get_counts(i, 1L)) head(get_counts(i, "f")) ## Specifying multiple groups allows you to rearrange columns head(get_counts(i, c("m", "f"))) ## If you want a vector, you can use drop drop(get_counts(i, "f")) })}
if (require(outbreaks)) { withAutoprint({ dat <- ebola_sim$linelist$date_of_onset gend <- ebola_sim$linelist$gender i <- incidence(dat, interval = "week", groups = gend) ## Use with an object and no arguments gives the counts matrix head(get_counts(i)) ## Specifying a position or group name will return a matrix subset to that ## group head(get_counts(i, 1L)) head(get_counts(i, "f")) ## Specifying multiple groups allows you to rearrange columns head(get_counts(i, c("m", "f"))) ## If you want a vector, you can use drop drop(get_counts(i, "f")) })}
Retrieve dates from an incidence object
get_dates(x, ...) ## Default S3 method: get_dates(x, ...) ## S3 method for class 'incidence' get_dates(x, position = "left", count_days = FALSE, ...)
get_dates(x, ...) ## Default S3 method: get_dates(x, ...) ## S3 method for class 'incidence' get_dates(x, position = "left", count_days = FALSE, ...)
x |
an incidence object |
... |
Unused |
position |
One of "left", "center", "middle", or "right" specifying what side of the bin the date should be drawn from. |
count_days |
If |
a vector of dates or numerics
set.seed(999) dat <- as.Date(Sys.Date()) + sample(-3:50, 100, replace = TRUE) x <- incidence(dat, interval = "month") get_dates(x) get_dates(x, position = "middle") set.seed(999) dat <- as.Date(Sys.Date()) + sample(-3:50, 100, replace = TRUE) x <- incidence(dat, interval = "month") get_dates(x) get_dates(x, "center") get_dates(x, "right") # Return dates by number of days from the first date get_dates(x, count_days = TRUE) get_dates(incidence(-5:5), count_days = TRUE)
set.seed(999) dat <- as.Date(Sys.Date()) + sample(-3:50, 100, replace = TRUE) x <- incidence(dat, interval = "month") get_dates(x) get_dates(x, position = "middle") set.seed(999) dat <- as.Date(Sys.Date()) + sample(-3:50, 100, replace = TRUE) x <- incidence(dat, interval = "month") get_dates(x) get_dates(x, "center") get_dates(x, "right") # Return dates by number of days from the first date get_dates(x, count_days = TRUE) get_dates(incidence(-5:5), count_days = TRUE)
incidence_fit
objectsAccessors for incidence_fit
objects
get_fit(x) ## S3 method for class 'incidence_fit' get_fit(x) ## S3 method for class 'incidence_fit_list' get_fit(x) get_info(x, what = "r", ...) ## S3 method for class 'incidence_fit' get_info(x, what = "r", ...) ## S3 method for class 'incidence_fit_list' get_info(x, what = "r", groups = NULL, na.rm = TRUE, ...)
get_fit(x) ## S3 method for class 'incidence_fit' get_fit(x) ## S3 method for class 'incidence_fit_list' get_fit(x) get_info(x, what = "r", ...) ## S3 method for class 'incidence_fit' get_info(x, what = "r", ...) ## S3 method for class 'incidence_fit_list' get_info(x, what = "r", groups = NULL, na.rm = TRUE, ...)
x |
an |
what |
the name of the item in the "info" element of the |
... |
currently unused. |
groups |
if |
na.rm |
when |
a list of incidence_fit
objects.
if (require(outbreaks)) { withAutoprint({ dat <- ebola_sim$linelist$date_of_onset ## EXAMPLE WITH A SINGLE MODEL ## compute weekly incidence sex <- ebola_sim$linelist$gender i.sex <- incidence(dat, interval = 7, group = sex) ## Compute the optimal split for each group separately fits <- fit_optim_split(i.sex, separate_split = TRUE) ## `fits` contains an `incidence_fit_list` object fits$fit ## Grab the list of `incidence_fit` objects get_fit(fits$fit) ## Get the predictions for all groups get_info(fits$fit, "pred", groups = 1) ## Get the predictions, but set `groups` to "before" and "after" get_info(fits$fit, "pred", groups = 2) ## Get the reproduction number get_info(fits$fit, "r") ## Get the doubling confidence interval get_info(fits$fit, "doubling.conf") ## Get the halving confidence interval get_info(fits$fit, "halving.conf") })}
if (require(outbreaks)) { withAutoprint({ dat <- ebola_sim$linelist$date_of_onset ## EXAMPLE WITH A SINGLE MODEL ## compute weekly incidence sex <- ebola_sim$linelist$gender i.sex <- incidence(dat, interval = 7, group = sex) ## Compute the optimal split for each group separately fits <- fit_optim_split(i.sex, separate_split = TRUE) ## `fits` contains an `incidence_fit_list` object fits$fit ## Grab the list of `incidence_fit` objects get_fit(fits$fit) ## Get the predictions for all groups get_info(fits$fit, "pred", groups = 1) ## Get the predictions, but set `groups` to "before" and "after" get_info(fits$fit, "pred", groups = 2) ## Get the reproduction number get_info(fits$fit, "r") ## Get the doubling confidence interval get_info(fits$fit, "doubling.conf") ## Get the halving confidence interval get_info(fits$fit, "halving.conf") })}
extract and set group names
group_names(x, value) group_names(x) <- value ## Default S3 method: group_names(x, value) ## Default S3 replacement method: group_names(x) <- value ## S3 method for class 'incidence' group_names(x, value = NULL) ## S3 replacement method for class 'incidence' group_names(x) <- value
group_names(x, value) group_names(x) <- value ## Default S3 method: group_names(x, value) ## Default S3 replacement method: group_names(x) <- value ## S3 method for class 'incidence' group_names(x, value = NULL) ## S3 replacement method for class 'incidence' group_names(x) <- value
x |
an |
value |
character vector used to rename groups |
This accessor will return a
an integer indicating the number of groups present in the incidence object.
i <- incidence(dates = sample(10, 100, replace = TRUE), interval = 1L, groups = sample(letters[1:3], 100, replace = TRUE)) i group_names(i) # change the names of the groups group_names(i) <- c("Group 1", "Group 2", "Group 3") i # example if there are mistakes in the original data, e.g. # something is misspelled set.seed(50) grps <- sample(c("child", "adult", "adlut"), 100, replace = TRUE, prob = c(0.45, 0.45, 0.05)) i <- incidence(dates = sample(10, 100, replace = TRUE), interval = 1L, groups = grps) colSums(get_counts(i)) # If you change the name of the mis-spelled group, it will be merged with the # correctly-spelled group gname <- group_names(i) gname[gname == "adlut"] <- "adult" # without side-effects print(ii <- group_names(i, gname)) colSums(get_counts(i)) # original still has three groups colSums(get_counts(ii)) # with side-effects group_names(i) <- gname colSums(get_counts(i))
i <- incidence(dates = sample(10, 100, replace = TRUE), interval = 1L, groups = sample(letters[1:3], 100, replace = TRUE)) i group_names(i) # change the names of the groups group_names(i) <- c("Group 1", "Group 2", "Group 3") i # example if there are mistakes in the original data, e.g. # something is misspelled set.seed(50) grps <- sample(c("child", "adult", "adlut"), 100, replace = TRUE, prob = c(0.45, 0.45, 0.05)) i <- incidence(dates = sample(10, 100, replace = TRUE), interval = 1L, groups = grps) colSums(get_counts(i)) # If you change the name of the mis-spelled group, it will be merged with the # correctly-spelled group gname <- group_names(i) gname[gname == "adlut"] <- "adult" # without side-effects print(ii <- group_names(i, gname)) colSums(get_counts(i)) # original still has three groups colSums(get_counts(ii)) # with side-effects group_names(i) <- gname colSums(get_counts(i))
This function computes incidence based on dates of events provided in various formats. A fixed interval, provided as numbers of days, is used to define time intervals. Counts within an interval always include the first date, after which they are labeled, and exclude the second. For instance, intervals labeled as 0, 3, 6, ... mean that the first bin includes days 0, 1 and 2, the second interval includes 3, 4 and 5 etc.
incidence(dates, interval = 1L, ...) ## Default S3 method: incidence(dates, interval = 1L, ...) ## S3 method for class 'Date' incidence( dates, interval = 1L, standard = TRUE, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'character' incidence( dates, interval = 1L, standard = TRUE, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'integer' incidence( dates, interval = 1L, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'numeric' incidence( dates, interval = 1L, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'POSIXt' incidence( dates, interval = 1L, standard = TRUE, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'incidence' print(x, ...)
incidence(dates, interval = 1L, ...) ## Default S3 method: incidence(dates, interval = 1L, ...) ## S3 method for class 'Date' incidence( dates, interval = 1L, standard = TRUE, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'character' incidence( dates, interval = 1L, standard = TRUE, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'integer' incidence( dates, interval = 1L, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'numeric' incidence( dates, interval = 1L, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'POSIXt' incidence( dates, interval = 1L, standard = TRUE, groups = NULL, na_as_group = TRUE, first_date = NULL, last_date = NULL, ... ) ## S3 method for class 'incidence' print(x, ...)
dates |
A vector of dates, which can be provided as objects of the
class: integer, numeric, Date, POSIXct, POSIXlt, and character. (See Note
about |
interval |
An integer or character indicating the (fixed) size of the time interval used for computing the incidence; defaults to 1 day. This can also be a text string that corresponds to a valid date interval: day, week, month, quarter, or year. (See Note). |
... |
Additional arguments passed to other methods (none are used). |
standard |
(Only applicable to Date objects) When |
groups |
An optional factor defining groups of observations for which incidence should be computed separately. |
na_as_group |
A logical value indicating if missing group (NA) should be treated as a separate group. |
first_date , last_date
|
optional first/last dates to be used in the
epicurve. When these are |
x |
An 'incidence' object. |
For details about the incidence class
, see the dedicated
vignette:vignette("incidence_class", package = "incidence")
An list with the class incidence
, which contains the
following items:
dates: The dates marking the left side of the bins used for counting
events. When standard = TRUE
and the interval represents weeks, months,
quarters, or years, the first date will represent the first standard date
(See Interval specification, below).
counts: A matrix of incidence counts, which one column per group (and a single column if no groups were used).
timespan: The length of the period for which incidence is computed, in days.
interval: The bin size. If it's an integer, it represents the number of days between each bin. It can also be a character, e.g. "2 weeks" or "6 months".
n: The total number of cases.
weeks: Dates in week format (YYYY-Www), where YYYY corresponds to the
year of the given week and ww represents the numeric week of the year.
This will be a produced from the function aweek::date2week()
. Note that
these will have a special "week_start"
attribute indicating which day of
the ISO week the week starts on (see Weeks, below).
isoweeks: ISO 8601 week format YYYY-Www, which is returned only when ISO week-based weekly incidence is computed.
dates
)Decimal (numeric) dates: will be truncated with a warning
Character dates should be in the unambiguous yyyy-mm-dd
(ISO 8601)
format. Any other format will trigger an error.
interval
)If interval
is a valid character (e.g. "week" or "1 month"), then
the bin will start at the beginning of the interval just before the first
observation by default. For example, if the first case was recorded on
Wednesday, 2018-05-09:
"week" : first day of the week (i.e. Monday, 2018-05-07) (defaults to ISO weeks, see "Week intervals", below)
"month" : first day of the month (i.e. 2018-05-01)
"quarter" : first day of the quarter (i.e. 2018-04-01)
"year" : first day of the calendar year (i.e. 2018-01-01)
These default intervals can be overridden with standard = FALSE
, which
sets the interval to begin at the first observed case.
As of incidence version 1.7.0, it is possible to construct standardized
incidence objects standardized to any day of the week thanks to the
aweek::date2week()
function from the aweek package. The default
state is to use ISO 8601 definition of weeks, which start on Monday. You can
specify the day of the week an incidence object should be standardised to by
using the pattern "{n} {W} weeks" where "{W}" represents the weekday in an
English or current locale and "{n}" represents the duration, but this can be
ommitted. Below are examples of specifying weeks starting on different days
assuming we had data that started on 2016-09-05, which is ISO week 36 of
2016:
interval = "2 monday weeks" (Monday 2016-09-05)
interval = "1 tue week" (Tuesday 2016-08-30)
interval = "1 Wed week" (Wednesday 2016-08-31)
interval = "1 Thursday week" (Thursday 2016-09-01)
interval = "1 F week" (Friday 2016-09-02)
interval = "1 Saturday week" (Saturday 2016-09-03)
interval = "Sunday week" (Sunday 2016-09-04)
It's also possible to use something like "3 weeks: Saturday"; In addition, there are keywords reserved for specific days of the week:
interval = "week", standard = TRUE (Default, Monday)
interval = "ISOweek" (Monday)
interval = "EPIweek" (Sunday)
interval = "MMWRweek" (Sunday)
The "EPIweek" specification is not strictly reserved for CDC epiweeks, but can be prefixed (or posfixed) by a day of the week: "1 epiweek: Saturday".
first_date
argumentPrevious versions of incidence had the first_date
argument override
standard = TRUE
. It has been changed as of incidence version 1.6.0 to
be more consistent with the behavior when first_date = NULL
. This, however
may be a change in behaviour, so a warning is now issued once and only once
if first_date
is specified, but standard
is not. To never see this
warning, use options(incidence.warn.first_date = FALSE)
.
The intervals for "month", "quarter", and "year" will necessarily vary in the number of days they encompass and warnings will be generated when the first date falls outside of a calendar date that is easily represented across the interval.
Thibaut Jombart, Rich Fitzjohn, Zhian Kamvar
The main other functions of the package include:
plot.incidence()
: Plot epicurves from an incidence object.
fit()
: Fit log-linear model to computed incidence.
fit_optim_split()
: Find the optimal peak of the epidemic
and fits log-linear models on either side of the peak.
subset()
: Handling of incidence
objects.
pool()
: Sum incidence over groups.
as.data.frame.incidence()
: Convert an incidence
object to a
data.frame
.
The following vignettes are also available:
overview
: Provides an overview of the package's features.
customize_plot
: Provides some tips on finer plot customization.
incidence_class
: Details the content of the incidence
class.
## toy example incidence(c(1, 5, 8, 3, 7, 2, 4, 6, 9, 2)) incidence(c(1, 5, 8, 3, 7, 2, 4, 6, 9, 2), 2) ## example using simulated dataset if(require(outbreaks)) { withAutoprint({ onset <- outbreaks::ebola_sim$linelist$date_of_onset ## daily incidence inc <- incidence(onset) inc plot(inc) ## weekly incidence inc.week <- incidence(onset, interval = 7, standard = FALSE) inc.week plot(inc.week) plot(inc.week, border = "white") # with visible border # Starting on Monday inc.isoweek <- incidence(onset, interval = "isoweek") inc.isoweek # Starting on Sunday inc.epiweek <- incidence(onset, interval = "epiweek") inc.epiweek # Starting on Saturday inc.epiweek <- incidence(onset, interval = "saturday epiweek") inc.epiweek ## use group information sex <- outbreaks::ebola_sim$linelist$gender inc.week.gender <- incidence(onset, interval = 7, groups = sex, standard = FALSE) inc.week.gender head(inc.week.gender$counts) plot(inc.week.gender, border = "grey90") inc.satweek.gender <- incidence(onset, interval = "2 epiweeks: saturday", groups = sex) inc.satweek.gender plot(inc.satweek.gender, border = "grey90") })} # Use of first_date d <- Sys.Date() + sample(-3:10, 10, replace = TRUE) # `standard` specified, no warning di <- incidence(d, interval = "week", first_date = Sys.Date() - 10, standard = TRUE) # warning issued if `standard` not specified di <- incidence(d, interval = "week", first_date = Sys.Date() - 10) # second instance: no warning issued di <- incidence(d, interval = "week", first_date = Sys.Date() - 10)
## toy example incidence(c(1, 5, 8, 3, 7, 2, 4, 6, 9, 2)) incidence(c(1, 5, 8, 3, 7, 2, 4, 6, 9, 2), 2) ## example using simulated dataset if(require(outbreaks)) { withAutoprint({ onset <- outbreaks::ebola_sim$linelist$date_of_onset ## daily incidence inc <- incidence(onset) inc plot(inc) ## weekly incidence inc.week <- incidence(onset, interval = 7, standard = FALSE) inc.week plot(inc.week) plot(inc.week, border = "white") # with visible border # Starting on Monday inc.isoweek <- incidence(onset, interval = "isoweek") inc.isoweek # Starting on Sunday inc.epiweek <- incidence(onset, interval = "epiweek") inc.epiweek # Starting on Saturday inc.epiweek <- incidence(onset, interval = "saturday epiweek") inc.epiweek ## use group information sex <- outbreaks::ebola_sim$linelist$gender inc.week.gender <- incidence(onset, interval = 7, groups = sex, standard = FALSE) inc.week.gender head(inc.week.gender$counts) plot(inc.week.gender, border = "grey90") inc.satweek.gender <- incidence(onset, interval = "2 epiweeks: saturday", groups = sex) inc.satweek.gender plot(inc.satweek.gender, border = "grey90") })} # Use of first_date d <- Sys.Date() + sample(-3:10, 10, replace = TRUE) # `standard` specified, no warning di <- incidence(d, interval = "week", first_date = Sys.Date() - 10, standard = TRUE) # warning issued if `standard` not specified di <- incidence(d, interval = "week", first_date = Sys.Date() - 10) # second instance: no warning issued di <- incidence(d, interval = "week", first_date = Sys.Date() - 10)
These functions are color palettes used in incidence.
incidence_pal1(n) incidence_pal1_light(n) incidence_pal1_dark(n)
incidence_pal1(n) incidence_pal1_light(n) incidence_pal1_dark(n)
n |
a number of colors |
Thibaut Jombart [email protected]
plot(1:4, cex=8, pch=20, col = incidence_pal1(4), main = "palette: incidence_pal1") plot(1:100, cex=8, pch=20, col = incidence_pal1(100), main ="palette: incidence_pal1") plot(1:100, cex=8, pch=20, col = incidence_pal1_light(100), main="palette: incidence_pal1_light") plot(1:100, cex=8, pch=20, col = incidence_pal1_dark(100), main="palette: incidence_pal1_dark")
plot(1:4, cex=8, pch=20, col = incidence_pal1(4), main = "palette: incidence_pal1") plot(1:100, cex=8, pch=20, col = incidence_pal1(100), main ="palette: incidence_pal1") plot(1:100, cex=8, pch=20, col = incidence_pal1_light(100), main="palette: incidence_pal1_light") plot(1:100, cex=8, pch=20, col = incidence_pal1_dark(100), main="palette: incidence_pal1_dark")
This function is used to visualise the output of the incidence()
function using the package ggplot2
. #'
## S3 method for class 'incidence' plot( x, ..., fit = NULL, stack = is.null(fit), color = "black", border = NA, col_pal = incidence_pal1, alpha = 0.7, xlab = "", ylab = NULL, labels_week = !is.null(x$weeks), labels_iso = !is.null(x$isoweeks), show_cases = FALSE, n_breaks = 6 ) add_incidence_fit(p, x, col_pal = incidence_pal1) ## S3 method for class 'incidence_fit' plot(x, ...) ## S3 method for class 'incidence_fit_list' plot(x, ...) scale_x_incidence(x, n_breaks = 6, labels_week = TRUE, ...) make_breaks(x, n_breaks = 6L, labels_week = TRUE)
## S3 method for class 'incidence' plot( x, ..., fit = NULL, stack = is.null(fit), color = "black", border = NA, col_pal = incidence_pal1, alpha = 0.7, xlab = "", ylab = NULL, labels_week = !is.null(x$weeks), labels_iso = !is.null(x$isoweeks), show_cases = FALSE, n_breaks = 6 ) add_incidence_fit(p, x, col_pal = incidence_pal1) ## S3 method for class 'incidence_fit' plot(x, ...) ## S3 method for class 'incidence_fit_list' plot(x, ...) scale_x_incidence(x, n_breaks = 6, labels_week = TRUE, ...) make_breaks(x, n_breaks = 6L, labels_week = TRUE)
x |
An incidence object, generated by the function
|
... |
arguments passed to |
fit |
An 'incidence_fit' object as returned by |
stack |
A logical indicating if bars of multiple groups should be stacked, or displayed side-by-side. |
color |
The color to be used for the filling of the bars; NA for invisible bars; defaults to "black". |
border |
The color to be used for the borders of the bars; NA for invisible borders; defaults to NA. |
col_pal |
The color palette to be used for the groups; defaults to
|
alpha |
The alpha level for color transparency, with 1 being fully opaque and 0 fully transparent; defaults to 0.7. |
xlab |
The label to be used for the x-axis; empty by default. |
ylab |
The label to be used for the y-axis; by default, a label will be generated automatically according to the time interval used in incidence computation. |
labels_week |
a logical value indicating whether labels x axis tick marks are in week format YYYY-Www when plotting weekly incidence; defaults to TRUE. |
labels_iso |
(deprecated) This has been superceded by |
show_cases |
if |
n_breaks |
the ideal number of breaks to be used for the x-axis labeling |
p |
An existing incidence plot. |
plot()
will visualise an incidence object using ggplot2
make_breaks()
calculates breaks from an incidence object that always
align with the bins and start on the first observed incidence.
scale_x_incidence()
produces and appropriate ggplot2
scale based on
an incidence object.
plot()
a ggplot2::ggplot()
object.
make_breaks()
a two-element list. The "breaks" element will contain the
evenly-spaced breaks as either dates or numbers and the "labels" element
will contain either a vector of weeks OR a ggplot2::waiver()
object.
scale_x_incidence()
a ggplot2 "ScaleContinuous" object.
Thibaut Jombart [email protected] Zhian N. Kamvar [email protected]
The incidence()
function to generate the 'incidence'
objects.
if(require(outbreaks) && require(ggplot2)) { withAutoprint({ onset <- outbreaks::ebola_sim$linelist$date_of_onset ## daily incidence inc <- incidence(onset) inc plot(inc) ## weekly incidence inc.week <- incidence(onset, interval = 7) inc.week plot(inc.week) # default to label x axis tick marks with isoweeks plot(inc.week, labels_week = FALSE) # label x axis tick marks with dates plot(inc.week, border = "white") # with visible border ## use group information sex <- outbreaks::ebola_sim$linelist$gender inc.week.gender <- incidence(onset, interval = "1 epiweek", groups = sex) plot(inc.week.gender) plot(inc.week.gender, labels_week = FALSE) ## show individual cases at the beginning of the epidemic inc.week.8 <- subset(inc.week.gender, to = "2014-06-01") p <- plot(inc.week.8, show_cases = TRUE, border = "black") p ## update the range of the scale lim <- c(min(get_dates(inc.week.8)) - 7*5, aweek::week2date("2014-W50", "Sunday")) lim p + scale_x_incidence(inc.week.gender, limits = lim) ## customize plot with ggplot2 plot(inc.week.8, show_cases = TRUE, border = "black") + theme_classic(base_size = 16) + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) ## adding fit fit <- fit_optim_split(inc.week.gender)$fit plot(inc.week.gender, fit = fit) plot(inc.week.gender, fit = fit, labels_week = FALSE) })}
if(require(outbreaks) && require(ggplot2)) { withAutoprint({ onset <- outbreaks::ebola_sim$linelist$date_of_onset ## daily incidence inc <- incidence(onset) inc plot(inc) ## weekly incidence inc.week <- incidence(onset, interval = 7) inc.week plot(inc.week) # default to label x axis tick marks with isoweeks plot(inc.week, labels_week = FALSE) # label x axis tick marks with dates plot(inc.week, border = "white") # with visible border ## use group information sex <- outbreaks::ebola_sim$linelist$gender inc.week.gender <- incidence(onset, interval = "1 epiweek", groups = sex) plot(inc.week.gender) plot(inc.week.gender, labels_week = FALSE) ## show individual cases at the beginning of the epidemic inc.week.8 <- subset(inc.week.gender, to = "2014-06-01") p <- plot(inc.week.8, show_cases = TRUE, border = "black") p ## update the range of the scale lim <- c(min(get_dates(inc.week.8)) - 7*5, aweek::week2date("2014-W50", "Sunday")) lim p + scale_x_incidence(inc.week.gender, limits = lim) ## customize plot with ggplot2 plot(inc.week.8, show_cases = TRUE, border = "black") + theme_classic(base_size = 16) + theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) ## adding fit fit <- fit_optim_split(inc.week.gender)$fit plot(inc.week.gender, fit = fit) plot(inc.week.gender, fit = fit, labels_week = FALSE) })}
This function pools incidence across all groups of an incidence
object. The resulting incidence()
object will contains counts
summed over all groups present in the input.
pool(x)
pool(x)
x |
An 'incidence' object. |
Thibaut Jombart [email protected]
The incidence()
function to generate the 'incidence'
objects.
dat <- as.integer(c(0,1,2,2,3,5,7)) group <- factor(c(1, 2, 3, 3, 3, 3, 1)) i <- incidence(dat, groups = group) i i$counts ## pool all groups pool(i) pool(i)$counts ## pool only groups 1 and 3 pool(i[,c(1,3)]) pool(i[,c(1,3)])$counts
dat <- as.integer(c(0,1,2,2,3,5,7)) group <- factor(c(1, 2, 3, 3, 3, 3, 1)) i <- incidence(dat, groups = group) i i$counts ## pool all groups pool(i) pool(i)$counts ## pool only groups 1 and 3 pool(i[,c(1,3)]) pool(i[,c(1,3)])$counts
Two functions can be used to subset incidence objects. The function
subset
permits to retain dates within a specified range and,
optionally, specific groups. The operator "[" can be used as for matrices,
using the syntax x[i,j]
where 'i' is a subset of dates, and 'j' is a
subset of groups.
## S3 method for class 'incidence' subset(x, ..., from = min(x$dates), to = max(x$dates), groups = TRUE) ## S3 method for class 'incidence' x[i, j]
## S3 method for class 'incidence' subset(x, ..., from = min(x$dates), to = max(x$dates), groups = TRUE) ## S3 method for class 'incidence' x[i, j]
x |
An incidence object, generated by the function
|
... |
Further arguments passed to other methods (not used). |
from |
The starting date; data strictly before this date are discarded. |
to |
The ending date; data strictly after this date are discarded. |
groups |
(optional) The groups to retained, indicated as subsets of the columns of x$counts. |
i |
a subset of dates to retain |
j |
a subset of groups to retain |
Thibaut Jombart [email protected]
The incidence()
function to generate the 'incidence'
objects.
## example using simulated dataset if(require(outbreaks)) { withAutoprint({ onset <- ebola_sim$linelist$date_of_onset ## weekly incidence inc <- incidence(onset, interval = 7) inc inc[1:10] # first 10 weeks plot(inc[1:10]) inc[-c(11:15)] # remove weeks 11-15 plot(inc[-c(11:15)]) })}
## example using simulated dataset if(require(outbreaks)) { withAutoprint({ onset <- ebola_sim$linelist$date_of_onset ## weekly incidence inc <- incidence(onset, interval = 7) inc inc[1:10] # first 10 weeks plot(inc[1:10]) inc[-c(11:15)] # remove weeks 11-15 plot(inc[-c(11:15)]) })}