--- title: "Getting started with EpiNow2" output: rmarkdown::html_vignette: toc: false number_sections: false bibliography: library.bib csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-numeric-superscript-brackets.csl vignette: > %\VignetteIndexEntry{Getting started with EpiNow2} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Quick start In the following section we give an overview of the simple use case for `epinow()` and `regional_epinow()`. The first step to using the package is to load it as follows. ``` r library(EpiNow2) ``` ### Reporting delays, incubation period and generation time Distributions can be supplied in two ways. First, one can supply delay data to `estimate_delay()`, where a subsampled bootstrapped lognormal will be fit to account for uncertainty in the observed data without being biased by changes in incidence (see `?EpiNow2::estimate_delay()`). Second, one can specify predetermined delays with uncertainty using the distribution functions such as `Gamma` or `Lognormal`. An arbitrary number of delay distributions are supported in `dist_spec()` with a common use case being an incubation period followed by a reporting delay. For more information on specifying distributions see (see `?EpiNow2::Distributions`). For example if data on the delay between onset and infection was available we could fit a distribution to it, using `estimate_delay()`, with appropriate uncertainty as follows (note this is a synthetic example), ``` r reporting_delay <- estimate_delay( rlnorm(1000, log(2), 1), max_value = 14, bootstraps = 1 ) ``` If data was not available we could instead specify an informed estimate of the likely delay using the distribution functions `Gamma` or `LogNormal`. To demonstrate, we choose a lognormal distribution with mean 2, standard deviation 1 and a maximum of 10. *This is just an example and unlikely to apply in any particular use case*. ``` r reporting_delay <- LogNormal(mean = 2, sd = 1, max = 10) reporting_delay #> - lognormal distribution (max: 10): #> meanlog: #> 0.58 #> sdlog: #> 0.47 ``` For the rest of this vignette, we will use inbuilt example literature estimates for the incubation period and generation time of Covid-19 (see [here](https://github.com/epiforecasts/EpiNow2/tree/main/data-raw) for the code that generates these estimates). *These distributions are unlikely to be applicable for your use case. We strongly recommend investigating what might be the best distributions to use in any given use case.* ``` r example_generation_time #> - gamma distribution (max: 14): #> shape: #> - normal distribution: #> mean: #> 1.4 #> sd: #> 0.48 #> rate: #> - normal distribution: #> mean: #> 0.38 #> sd: #> 0.25 example_incubation_period #> - lognormal distribution (max: 14): #> meanlog: #> - normal distribution: #> mean: #> 1.6 #> sd: #> 0.064 #> sdlog: #> - normal distribution: #> mean: #> 0.42 #> sd: #> 0.069 ``` Users can also pass a non-parametric delay distribution vector using the `NonParametric` option for both the generation interval and reporting delays. It is important to note that if doing so, both delay distributions are 0-indexed, meaning the first element corresponds to the probability mass at day 0 of an individual's infection. Because the discretised renewal equation doesn't support mass on day 0, the generation interval should be passed in as a 0-indexed vector with a mass of zero on day 0. ``` r example_non_parametric_gi <- NonParametric(pmf = c(0, 0.3, 0.5, 0.2)) example_non_parametric_delay <- NonParametric(pmf = c(0.01, 0.1, 0.5, 0.3, 0.09)) ``` These distributions are passed to downstream functions in the same way that the parametric distributions are. Now, to the functions. ### [epinow()](https://epiforecasts.io/EpiNow2/reference/epinow.html) This function represents the core functionality of the package and includes results reporting, plotting, and optional saving. It requires a data frame of cases by date of report and the distributions defined above. Load example case data from `{EpiNow2}`. ``` r reported_cases <- example_confirmed[1:60] head(reported_cases) #> date confirm #> #> 1: 2020-02-22 14 #> 2: 2020-02-23 62 #> 3: 2020-02-24 53 #> 4: 2020-02-25 97 #> 5: 2020-02-26 93 #> 6: 2020-02-27 78 ``` Estimate cases by date of infection, the time-varying reproduction number, the rate of growth, and forecast these estimates into the future by 7 days. Summarise the posterior and return a summary table and plots for reporting purposes. If a `target_folder` is supplied results can be internally saved (with the option to also turn off explicit returning of results). Here we use the default model parameterisation that prioritises real-time performance over run-time or other considerations. For other formulations see the documentation for `estimate_infections()`. ``` r estimates <- epinow( data = reported_cases, generation_time = gt_opts(example_generation_time), delays = delay_opts(example_incubation_period + reporting_delay), rt = rt_opts(prior = list(mean = 2, sd = 0.2)), stan = stan_opts(cores = 4, control = list(adapt_delta = 0.99)), verbose = interactive() ) names(estimates) #> [1] "estimates" "estimated_reported_cases" #> [3] "summary" "plots" #> [5] "timing" ``` Both summary measures and posterior samples are returned for all parameters in an easily explored format which can be accessed using `summary`. The default is to return a summary table of estimates for key parameters at the latest date partially supported by data. ``` r knitr::kable(summary(estimates)) ``` |measure |estimate | |:--------------------------------|:----------------------| |New infections per day |2235 (1258 -- 4079) | |Expected change in daily reports |Likely decreasing | |Effective reproduction no. |0.89 (0.69 -- 1.1) | |Rate of growth |-0.029 (-0.1 -- 0.049) | |Doubling/halving time (days) |-24 (14 -- -6.9) | Summarised parameter estimates can also easily be returned, either filtered for a single parameter or for all parameters. ``` r head(summary(estimates, type = "parameters", params = "R")) #> date variable strat type median mean sd lower_90 #> #> 1: 2020-02-22 R estimate 2.295630 2.300011 0.14226384 2.072498 #> 2: 2020-02-23 R estimate 2.254414 2.260309 0.12775375 2.058350 #> 3: 2020-02-24 R estimate 2.213859 2.218421 0.11582310 2.039173 #> 4: 2020-02-25 R estimate 2.167609 2.174463 0.10645004 2.011138 #> 5: 2020-02-26 R estimate 2.119388 2.128587 0.09939624 1.977938 #> 6: 2020-02-27 R estimate 2.072939 2.080978 0.09424020 1.942162 #> lower_50 lower_20 upper_20 upper_50 upper_90 #> #> 1: 2.199171 2.257930 2.331979 2.393862 2.536167 #> 2: 2.170840 2.220883 2.287825 2.346032 2.475402 #> 3: 2.137291 2.182338 2.242451 2.297515 2.410503 #> 4: 2.099378 2.138842 2.196990 2.246167 2.354895 #> 5: 2.058774 2.094525 2.147410 2.192443 2.298576 #> 6: 2.014358 2.047728 2.098028 2.142222 2.244473 ``` Reported cases are returned in a separate data frame in order to streamline the reporting of forecasts and for model evaluation. ``` r head(summary(estimates, output = "estimated_reported_cases")) #> date type median mean sd lower_90 lower_50 lower_20 #> #> 1: 2020-02-22 gp_rt 75 77.7545 22.45421 46 62.00 70 #> 2: 2020-02-23 gp_rt 87 89.3255 24.15807 53 73.00 82 #> 3: 2020-02-24 gp_rt 86 88.5850 23.75963 54 72.00 81 #> 4: 2020-02-25 gp_rt 80 82.0745 22.36889 49 66.00 75 #> 5: 2020-02-26 gp_rt 81 82.8155 22.64243 50 67.75 76 #> 6: 2020-02-27 gp_rt 107 110.9325 29.73030 66 90.00 101 #> upper_20 upper_50 upper_90 #> #> 1: 81 91 118 #> 2: 93 104 132 #> 3: 92 103 132 #> 4: 86 96 120 #> 5: 86 96 124 #> 6: 115 129 163 ``` A range of plots are returned (with the single summary plot shown below). These plots can also be generated using the following `plot` method. ``` r plot(estimates) ``` ![plot of chunk plot_estimates](EpiNow2-plot_estimates-1.png) ### [regional_epinow()](https://epiforecasts.io/EpiNow2/reference/regional_epinow.html) The `regional_epinow()` function runs the `epinow()` function across multiple regions in an efficient manner. Define cases in multiple regions delineated by the region variable. ``` r reported_cases <- data.table::rbindlist(list( data.table::copy(reported_cases)[, region := "testland"], reported_cases[, region := "realland"] )) head(reported_cases) #> date confirm region #> #> 1: 2020-02-22 14 testland #> 2: 2020-02-23 62 testland #> 3: 2020-02-24 53 testland #> 4: 2020-02-25 97 testland #> 5: 2020-02-26 93 testland #> 6: 2020-02-27 78 testland ``` Calling `regional_epinow()` runs the `epinow()` on each region in turn (or in parallel depending on the settings used). Here we switch to using a weekly random walk rather than the full Gaussian process model giving us piecewise constant estimates by week. ``` r estimates <- regional_epinow( data = reported_cases, generation_time = gt_opts(example_generation_time), delays = delay_opts(example_incubation_period + reporting_delay), rt = rt_opts(prior = list(mean = 2, sd = 0.2), rw = 7), gp = NULL, stan = stan_opts(cores = 4, warmup = 250, samples = 1000) ) #> INFO [2024-10-15 17:53:24] Producing following optional outputs: regions, summary, samples, plots, latest #> INFO [2024-10-15 17:53:24] Reporting estimates using data up to: 2020-04-21 #> INFO [2024-10-15 17:53:24] No target directory specified so returning output #> INFO [2024-10-15 17:53:24] Producing estimates for: testland, realland #> INFO [2024-10-15 17:53:24] Regions excluded: none #> INFO [2024-10-15 17:53:46] Completed estimates for: testland #> INFO [2024-10-15 17:54:09] Completed estimates for: realland #> INFO [2024-10-15 17:54:09] Completed regional estimates #> INFO [2024-10-15 17:54:09] Regions with estimates: 2 #> INFO [2024-10-15 17:54:09] Regions with runtime errors: 0 #> INFO [2024-10-15 17:54:09] Producing summary #> INFO [2024-10-15 17:54:09] No summary directory specified so returning summary output #> INFO [2024-10-15 17:54:10] No target directory specified so returning timings ``` Results from each region are stored in a `regional` list with across region summary measures and plots stored in a `summary` list. All results can be set to be internally saved by setting the `target_folder` and `summary_dir` arguments. Each region can be estimated in parallel using the `{future}` package (when in most scenarios `cores` should be set to 1). For routine use each MCMC chain can also be run in parallel (with `future` = TRUE) with a time out (`max_execution_time`) allowing for partial results to be returned if a subset of chains is running longer than expected. See the documentation for the `{future}` package for details on nested futures. Summary measures that are returned include a table formatted for reporting (along with raw results for further processing). Futures updated will extend the S3 methods used above to smooth access to this output. ``` r knitr::kable(estimates$summary$summarised_results$table) ``` |Region |New infections per day |Expected change in daily reports |Effective reproduction no. |Rate of growth |Doubling/halving time (days) | |:--------|:----------------------|:--------------------------------|:--------------------------|:-----------------------|:----------------------------| |realland |2107 (1035 -- 4584) |Likely decreasing |0.87 (0.62 -- 1.2) |-0.037 (-0.11 -- 0.049) |-19 (14 -- -6.2) | |testland |2084 (1062 -- 4392) |Likely decreasing |0.85 (0.61 -- 1.2) |-0.04 (-0.11 -- 0.048) |-18 (15 -- -6.4) | A range of plots are again returned (with the single summary plot shown below). ``` r estimates$summary$summary_plot ``` ![plot of chunk plot_regional_epinow_summary](EpiNow2-plot_regional_epinow_summary-1.png)