---
title: "Getting started with EpiNow2"
output:
  rmarkdown::html_vignette:
    toc: false
    number_sections: false
bibliography: library.bib
csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-numeric-superscript-brackets.csl
vignette: >
  %\VignetteIndexEntry{Getting started with EpiNow2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


## Quick start

In the following section we give an overview of the simple use case for `epinow()` and `regional_epinow()`.

The first step to using the package is to load it as follows.


``` r
library(EpiNow2)
```

### Reporting delays, incubation period and generation time

Distributions can be supplied in two ways. First, one can supply delay data to `estimate_delay()`, where a subsampled bootstrapped lognormal will be fit to account for uncertainty in the observed data without being biased by changes in incidence (see `?EpiNow2::estimate_delay()`).

Second, one can specify predetermined delays with uncertainty using the distribution functions such as `Gamma` or `Lognormal`. An arbitrary number of delay distributions are supported in `dist_spec()` with a common use case being an incubation period followed by a reporting delay. For more information on specifying distributions see (see `?EpiNow2::Distributions`).

For example if data on the delay between onset and infection was available we could fit a distribution to it, using `estimate_delay()`, with appropriate uncertainty as follows (note this is a synthetic example),

``` r
reporting_delay <- estimate_delay(
  rlnorm(1000, log(2), 1),
  max_value = 14, bootstraps = 1
)
```

If data was not available we could instead specify an informed estimate of the likely delay using the distribution functions `Gamma` or `LogNormal`. 
To demonstrate, we choose a lognormal distribution with mean 2, standard deviation 1 and a maximum of 10. *This is just an example and unlikely to apply in any particular use case*.


``` r
reporting_delay <- LogNormal(mean = 2, sd = 1, max = 10)
reporting_delay
#> - lognormal distribution (max: 10):
#>   meanlog:
#>     0.58
#>   sdlog:
#>     0.47
```

For the rest of this vignette, we will use inbuilt example literature estimates for the incubation period and generation time of Covid-19 (see [here](https://github.com/epiforecasts/EpiNow2/tree/main/data-raw) for the code that generates these estimates). *These distributions are unlikely to be applicable for your use case. We strongly recommend investigating what might be the best distributions to use in any given use case.*


``` r
example_generation_time
#> - gamma distribution (max: 14):
#>   shape:
#>     - normal distribution:
#>       mean:
#>         1.4
#>       sd:
#>         0.48
#>   rate:
#>     - normal distribution:
#>       mean:
#>         0.38
#>       sd:
#>         0.25
example_incubation_period
#> - lognormal distribution (max: 14):
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         1.6
#>       sd:
#>         0.064
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         0.42
#>       sd:
#>         0.069
```

Users can also pass a non-parametric delay distribution vector using the `NonParametric` option 
for both the generation interval and reporting delays. It is important to note that if doing so,
both delay distributions are 0-indexed, meaning the first element corresponds to the probability mass
at day 0 of an individual's infection. Because the discretised renewal equation doesn't support mass on day 0, the generation interval should be passed in as a 0-indexed vector with a mass of zero on day 0. 


``` r
example_non_parametric_gi <-  NonParametric(pmf = c(0, 0.3, 0.5, 0.2))

example_non_parametric_delay <- NonParametric(pmf = c(0.01, 0.1, 0.5, 0.3, 0.09))
```
These distributions are passed to downstream functions in the same way that the 
parametric distributions are. 

Now, to the functions.

### [epinow()](https://epiforecasts.io/EpiNow2/reference/epinow.html)

This function represents the core functionality of the package and includes results reporting, plotting, and optional saving. It requires a data frame of cases by date of report and the distributions defined above.

Load example case data from `{EpiNow2}`.


``` r
reported_cases <- example_confirmed[1:60]
head(reported_cases)
#>          date confirm
#>        <Date>   <num>
#> 1: 2020-02-22      14
#> 2: 2020-02-23      62
#> 3: 2020-02-24      53
#> 4: 2020-02-25      97
#> 5: 2020-02-26      93
#> 6: 2020-02-27      78
```

Estimate cases by date of infection, the time-varying reproduction number, the rate of growth, and forecast these estimates into the future by 7 days. Summarise the posterior and return a summary table and plots for reporting purposes. If a `target_folder` is supplied results can be internally saved (with the option to also turn off explicit returning of results). Here we use the default model parameterisation that prioritises real-time performance over run-time or other considerations. For other formulations see the documentation for `estimate_infections()`.


``` r
estimates <- epinow(
  data = reported_cases,
  generation_time = gt_opts(example_generation_time),
  delays = delay_opts(example_incubation_period + reporting_delay),
  rt = rt_opts(prior = list(mean = 2, sd = 0.2)),
  stan = stan_opts(cores = 4, control = list(adapt_delta = 0.99)),
  verbose = interactive()
)
names(estimates)
#> [1] "estimates"                "estimated_reported_cases"
#> [3] "summary"                  "plots"                   
#> [5] "timing"
```

Both summary measures and posterior samples are returned for all parameters in an easily explored format which can be accessed using `summary`. The default is to return a summary table of estimates for key parameters at the latest date partially supported by data. 


``` r
knitr::kable(summary(estimates))
```


|measure                          |estimate               |
|:--------------------------------|:----------------------|
|New infections per day           |2235 (1258 -- 4079)    |
|Expected change in daily reports |Likely decreasing      |
|Effective reproduction no.       |0.89 (0.69 -- 1.1)     |
|Rate of growth                   |-0.029 (-0.1 -- 0.049) |
|Doubling/halving time (days)     |-24 (14 -- -6.9)       |


Summarised parameter estimates can also easily be returned, either filtered for a single parameter or for all parameters.


``` r
head(summary(estimates, type = "parameters", params = "R"))
#>          date variable  strat     type   median     mean         sd lower_90
#>        <Date>   <char> <char>   <char>    <num>    <num>      <num>    <num>
#> 1: 2020-02-22        R   <NA> estimate 2.295630 2.300011 0.14226384 2.072498
#> 2: 2020-02-23        R   <NA> estimate 2.254414 2.260309 0.12775375 2.058350
#> 3: 2020-02-24        R   <NA> estimate 2.213859 2.218421 0.11582310 2.039173
#> 4: 2020-02-25        R   <NA> estimate 2.167609 2.174463 0.10645004 2.011138
#> 5: 2020-02-26        R   <NA> estimate 2.119388 2.128587 0.09939624 1.977938
#> 6: 2020-02-27        R   <NA> estimate 2.072939 2.080978 0.09424020 1.942162
#>    lower_50 lower_20 upper_20 upper_50 upper_90
#>       <num>    <num>    <num>    <num>    <num>
#> 1: 2.199171 2.257930 2.331979 2.393862 2.536167
#> 2: 2.170840 2.220883 2.287825 2.346032 2.475402
#> 3: 2.137291 2.182338 2.242451 2.297515 2.410503
#> 4: 2.099378 2.138842 2.196990 2.246167 2.354895
#> 5: 2.058774 2.094525 2.147410 2.192443 2.298576
#> 6: 2.014358 2.047728 2.098028 2.142222 2.244473
```

Reported cases are returned in a separate data frame in order to streamline the reporting of forecasts and for model evaluation.


``` r
head(summary(estimates, output = "estimated_reported_cases"))
#>          date   type median     mean       sd lower_90 lower_50 lower_20
#>        <Date> <char>  <num>    <num>    <num>    <num>    <num>    <num>
#> 1: 2020-02-22  gp_rt     75  77.7545 22.45421       46    62.00       70
#> 2: 2020-02-23  gp_rt     87  89.3255 24.15807       53    73.00       82
#> 3: 2020-02-24  gp_rt     86  88.5850 23.75963       54    72.00       81
#> 4: 2020-02-25  gp_rt     80  82.0745 22.36889       49    66.00       75
#> 5: 2020-02-26  gp_rt     81  82.8155 22.64243       50    67.75       76
#> 6: 2020-02-27  gp_rt    107 110.9325 29.73030       66    90.00      101
#>    upper_20 upper_50 upper_90
#>       <num>    <num>    <num>
#> 1:       81       91      118
#> 2:       93      104      132
#> 3:       92      103      132
#> 4:       86       96      120
#> 5:       86       96      124
#> 6:      115      129      163
```

A range of plots are returned (with the single summary plot shown below). These plots can also be generated using the following `plot` method.


``` r
plot(estimates)
```

![plot of chunk plot_estimates](EpiNow2-plot_estimates-1.png)


### [regional_epinow()](https://epiforecasts.io/EpiNow2/reference/regional_epinow.html)

The `regional_epinow()` function runs the `epinow()` function across multiple regions in
an efficient manner.

Define cases in multiple regions delineated by the region variable.


``` r
reported_cases <- data.table::rbindlist(list(
  data.table::copy(reported_cases)[, region := "testland"],
  reported_cases[, region := "realland"]
))
head(reported_cases)
#>          date confirm   region
#>        <Date>   <num>   <char>
#> 1: 2020-02-22      14 testland
#> 2: 2020-02-23      62 testland
#> 3: 2020-02-24      53 testland
#> 4: 2020-02-25      97 testland
#> 5: 2020-02-26      93 testland
#> 6: 2020-02-27      78 testland
```

Calling `regional_epinow()` runs the `epinow()` on each region in turn (or in parallel depending on the settings used). Here we switch to using a weekly random walk rather than the full Gaussian process model giving us piecewise constant estimates by week.


``` r
estimates <- regional_epinow(
  data = reported_cases,
  generation_time = gt_opts(example_generation_time),
  delays = delay_opts(example_incubation_period + reporting_delay),
  rt = rt_opts(prior = list(mean = 2, sd = 0.2), rw = 7),
  gp = NULL,
  stan = stan_opts(cores = 4, warmup = 250, samples = 1000)
)
#> INFO [2024-10-15 17:53:24] Producing following optional outputs: regions, summary, samples, plots, latest
#> INFO [2024-10-15 17:53:24] Reporting estimates using data up to: 2020-04-21
#> INFO [2024-10-15 17:53:24] No target directory specified so returning output
#> INFO [2024-10-15 17:53:24] Producing estimates for: testland, realland
#> INFO [2024-10-15 17:53:24] Regions excluded: none
#> INFO [2024-10-15 17:53:46] Completed estimates for: testland
#> INFO [2024-10-15 17:54:09] Completed estimates for: realland
#> INFO [2024-10-15 17:54:09] Completed regional estimates
#> INFO [2024-10-15 17:54:09] Regions with estimates: 2
#> INFO [2024-10-15 17:54:09] Regions with runtime errors: 0
#> INFO [2024-10-15 17:54:09] Producing summary
#> INFO [2024-10-15 17:54:09] No summary directory specified so returning summary output
#> INFO [2024-10-15 17:54:10] No target directory specified so returning timings
```

Results from each region are stored in a `regional` list with across region summary measures and plots stored in a `summary` list. All results can be set to be internally saved by setting the `target_folder` and `summary_dir` arguments. Each region can be estimated in parallel using the `{future}` package (when in most scenarios `cores` should be set to 1). For routine use each MCMC chain can also be run in parallel (with `future` = TRUE) with a time out (`max_execution_time`) allowing for partial results to be returned if a subset of chains is running longer than expected. See the documentation for the `{future}` package for details on nested futures.

Summary measures that are returned include a table formatted for reporting (along with raw results for further processing). Futures updated will extend the S3 methods used above to smooth access to this output.


``` r
knitr::kable(estimates$summary$summarised_results$table)
```


|Region   |New infections per day |Expected change in daily reports |Effective reproduction no. |Rate of growth          |Doubling/halving time (days) |
|:--------|:----------------------|:--------------------------------|:--------------------------|:-----------------------|:----------------------------|
|realland |2107 (1035 -- 4584)    |Likely decreasing                |0.87 (0.62 -- 1.2)         |-0.037 (-0.11 -- 0.049) |-19 (14 -- -6.2)             |
|testland |2084 (1062 -- 4392)    |Likely decreasing                |0.85 (0.61 -- 1.2)         |-0.04 (-0.11 -- 0.048)  |-18 (15 -- -6.4)             |


A range of plots are again returned (with the single summary plot shown below).


``` r
estimates$summary$summary_plot
```

![plot of chunk plot_regional_epinow_summary](EpiNow2-plot_regional_epinow_summary-1.png)