Modelling uncertainty in R₀

Epidemic final size calculations are sensitive to input data such as the R0 of the infection. Such values can often be uncertain in the early stages of an outbreak. This uncertainty can be included in final size calculations by running final_size() for values drawn from a distribution, and summarising the outcomes.

New to finalsize? It may help to read the “Get started”, “Modelling heterogeneous contacts”, or “Modelling heterogeneous susceptibility” vignettes first!

Use case

The infection parameter (R0) is uncertain. We want to know how much variation this could cause in the estimated final size of the epidemic.

What we have

  1. In addition to the R0, demography data, social contact data, and data on the distribution of susceptibility among age groups;
  2. A measure of the error in the estimated R0 of the infection.

What we assume

  1. An SIR epidemic, and the complete partitioning of individuals into different demographic and infection risk groups.
# load finalsize
library(finalsize)
library(socialmixr)
library(ggplot2)

Getting R0, contact and demography data, and susceptibility

This example uses social contact data from the POLYMOD project (Mossong et al. 2008) to estimate the final size of an epidemic in the U.K. These data are provided with the socialmixr package.

These data are handled just as in the “Get started” vignette. This example also considers an infection with an R0 of 1.5.

# get UK polymod data from socialmixr
polymod <- socialmixr::polymod
contact_data <- socialmixr::contact_matrix(
  polymod,
  countries = "United Kingdom",
  age.limits = c(0, 5, 18, 40, 65),
  symmetric = TRUE
)

# get the contact matrix and demography data
contact_matrix <- t(contact_data$matrix)
demography_vector <- contact_data$demography$population

# scale the contact matrix so the largest eigenvalue is 1.0
contact_matrix <- contact_matrix / max(Re(eigen(contact_matrix)$values))

# divide each row of the contact matrix by the corresponding demography
contact_matrix <- contact_matrix / demography_vector

n_demo_grps <- length(demography_vector)
# mean R0 is 1.5
r0_mean <- 1.5

For simplicity, this example considers a scenario in which susceptibility to infection does not vary.

# susceptibility is uniform
susc_uniform <- matrix(
  data = 1,
  nrow = n_demo_grps,
  ncol = 1L
)

# p_susceptibility is uniform
p_susc_uniform <- susc_uniform

Running final_size over R0 samples

The basic reproduction number R0 of an infection might be uncertain in the early stages of an epidemic. This uncertainty can be modelled by running final_size() multiple times for the same contact, demography, and susceptibility data, while sampling R0 values from a distribution.

This example assumes that the R0 estimate, and the uncertainty around that estimate, is provided as the mean and standard deviation of a normal distribution.

This example considers a normal distribution N(μ = 1.5, σ = 0.1), for an R0 of 1.5. We can draw 1,000 R0 samples from this distribution and run final_size() on the contact data and demography data for each sample.

This is quick, as finalsize is an Rcpp package with a C++ backend.

# create an R0 samples vector
r0_samples <- rnorm(n = 1000, mean = r0_mean, sd = 0.1)

Iterate final_size()

With base R

# run final size on each sample with the same data
final_size_data <- Map(
  r0_samples, seq_along(r0_samples),
  f = function(r0, i) {
    # the i-th final size estimate
    fs <- final_size(
      r0 = r0,
      contact_matrix = contact_matrix,
      demography_vector = demography_vector,
      susceptibility = susc_uniform,
      p_susceptibility = p_susc_uniform
    )

    fs$replicate <- i
    fs$r0_estimate <- r0
    fs
  }
)

# combine data
final_size_data <- Reduce(x = final_size_data, f = rbind)

# order age groups
final_size_data$demo_grp <- factor(
  final_size_data$demo_grp,
  levels = contact_data$demography$age.group
)

# examine some replicates in the data
head(final_size_data, 15)
#>    demo_grp   susc_grp susceptibility group_size p_infected n_infected
#> 1     [0,5) susc_grp_1              1    3453670  0.2828074   976723.4
#> 2    [5,18) susc_grp_1              1    9761554  0.5289652  5163522.0
#> 3   [18,40) susc_grp_1              1   18110368  0.3789748  6863372.4
#> 4   [40,65) susc_grp_1              1   19288101  0.3173556  6121186.7
#> 5       65+ susc_grp_1              1    9673058  0.1972188  1907708.7
#> 6     [0,5) susc_grp_1              1    3453670  0.4125902  1424950.4
#> 7    [5,18) susc_grp_1              1    9761554  0.6852156  6688769.0
#> 8   [18,40) susc_grp_1              1   18110368  0.5339755  9670493.7
#> 9   [40,65) susc_grp_1              1   19288101  0.4601724  8875852.6
#> 10      65+ susc_grp_1              1    9673058  0.3013400  2914879.0
#> 11    [0,5) susc_grp_1              1    3453670  0.3584494  1237965.8
#> 12   [5,18) susc_grp_1              1    9761554  0.6249082  6100075.4
#> 13  [18,40) susc_grp_1              1   18110368  0.4709703  8529445.0
#> 14  [40,65) susc_grp_1              1   19288101  0.4010051  7734626.6
#> 15      65+ susc_grp_1              1    9673058  0.2567629  2483682.1
#>    replicate r0_estimate
#> 1          1    1.305846
#> 2          1    1.305846
#> 3          1    1.305846
#> 4          1    1.305846
#> 5          1    1.305846
#> 6          2    1.494286
#> 7          2    1.494286
#> 8          2    1.494286
#> 9          2    1.494286
#> 10         2    1.494286
#> 11         3    1.410225
#> 12         3    1.410225
#> 13         3    1.410225
#> 14         3    1.410225
#> 15         3    1.410225

With {tidyverse}

library(tibble)
library(dplyr)
library(tidyr)
library(purrr)
library(forcats)

final_size_data <-
  # create a dataframe with values from a vector
  tibble(r0 = r0_samples) %>%
  rownames_to_column() %>%
  # map the function final_size() to all the r0 values
  # with the same set of arguments
  # with {purrr}
  mutate(
    temp = map(
      .x = r0,
      .f = final_size,
      contact_matrix = contact_matrix,
      demography_vector = demography_vector,
      susceptibility = susc_uniform,
      p_susceptibility = p_susc_uniform
    )
  ) %>%
  # unnest all the dataframe outputs in temp
  unnest(temp) %>%
  # relevel the factor variable
  mutate(
    demo_grp = fct_relevel(
      demo_grp,
      contact_data %>%
        pluck("demography") %>%
        pluck("age.group")
    )
  )

head(final_size_data, 15)
#> # A tibble: 15 × 8
#>    rowname    r0 demo_grp susc_grp   susceptibility group_size p_infected
#>    <chr>   <dbl> <fct>    <chr>               <dbl>      <dbl>      <dbl>
#>  1 1        1.31 [0,5)    susc_grp_1              1    3453670      0.283
#>  2 1        1.31 [5,18)   susc_grp_1              1    9761554      0.529
#>  3 1        1.31 [18,40)  susc_grp_1              1   18110368      0.379
#>  4 1        1.31 [40,65)  susc_grp_1              1   19288101      0.317
#>  5 1        1.31 65+      susc_grp_1              1    9673058      0.197
#>  6 2        1.49 [0,5)    susc_grp_1              1    3453670      0.413
#>  7 2        1.49 [5,18)   susc_grp_1              1    9761554      0.685
#>  8 2        1.49 [18,40)  susc_grp_1              1   18110368      0.534
#>  9 2        1.49 [40,65)  susc_grp_1              1   19288101      0.460
#> 10 2        1.49 65+      susc_grp_1              1    9673058      0.301
#> 11 3        1.41 [0,5)    susc_grp_1              1    3453670      0.358
#> 12 3        1.41 [5,18)   susc_grp_1              1    9761554      0.625
#> 13 3        1.41 [18,40)  susc_grp_1              1   18110368      0.471
#> 14 3        1.41 [40,65)  susc_grp_1              1   19288101      0.401
#> 15 3        1.41 65+      susc_grp_1              1    9673058      0.257
#> # ℹ 1 more variable: n_infected <dbl>

Visualise uncertainty in final size

ggplot(final_size_data) +
  stat_summary(
    aes(
      demo_grp, p_infected
    ),
    fun = mean,
    fun.min = function(x) {
      quantile(x, 0.05)
    },
    fun.max = function(x) {
      quantile(x, 0.95)
    }
  ) +
  scale_y_continuous(
    labels = scales::percent,
    limits = c(0.25, 1)
  ) +
  theme_classic() +
  theme(
    legend.position = "top",
    legend.key.height = unit(2, "mm"),
    legend.title = ggtext::element_markdown(
      vjust = 1
    )
  ) +
  coord_cartesian(
    expand = TRUE
  ) +
  labs(
    x = "Age group",
    y = "% Infected"
  )
Estimated ranges of the final size of a hypothetical SIR epidemic in age groups of the U.K. population, when the $R_0$ is estimated to be 1.5, with a standard deviation around this estimate of 0.1. In this example, relatively low uncertainty in $R_0$ estimates can also lead to uncertainty in the estimated final size of the epidemic. Points represent means, while ranges extend between the 5th and 95th percentiles.

Estimated ranges of the final size of a hypothetical SIR epidemic in age groups of the U.K. population, when the R0 is estimated to be 1.5, with a standard deviation around this estimate of 0.1. In this example, relatively low uncertainty in R0 estimates can also lead to uncertainty in the estimated final size of the epidemic. Points represent means, while ranges extend between the 5th and 95th percentiles.

Mossong, Joël, Niel Hens, Mark Jit, Philippe Beutels, Kari Auranen, Rafael Mikolajczyk, Marco Massari, et al. 2008. “Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases.” PLOS Medicine 5 (3): e74. https://doi.org/10.1371/journal.pmed.0050074.