--- title: "Visualising multi-state models using mstate" author: "Edouard F. Bonneville" date: "`r Sys.setenv(LANG = 'en_US.UTF-8'); format(Sys.Date(), '%d %B %Y')`" output: rmarkdown::html_vignette bibliography: Tutorial.bib vignette: > %\VignetteIndexEntry{Visualising multi-state models using mstate} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 6, fig.align = 'center' ) ``` # Preamble The purpose of the present vignette is to demonstrate the visualisation capacities of *mstate*, using both base R graphics and the *ggplot2* package [@ggplot]. To do so, we will use the dataset used to illustrate competing risks analyses in Section 3 of the Tutorial by @Tut . The dataset is available in *mstate* under the object name `aidssi`. We can begin by loading both the *mstate* and *ggplot2* libraries, and set a general theme for our plots: ```{r} # Load libraries library(mstate) library(ggplot2) # Set general ggplot2 theme theme_set(theme_bw(base_size = 14)) ``` We can then proceed to load the dataset, and prepare it for a competing risks analysis using `msprep()`. The steps are described in more detail in the [main vignette](https://CRAN.R-project.org/package=mstate/vignettes/Tutorial.pdf). ```{r load_dat} # Load data data("aidssi") head(aidssi) # Shorter name si <- aidssi # Prepare transition matrix tmat <- trans.comprisk(2, names = c("event-free", "AIDS", "SI")) tmat # Run msprep si$stat1 <- as.numeric(si$status == 1) si$stat2 <- as.numeric(si$status == 2) silong <- msprep( time = c(NA, "time", "time"), status = c(NA, "stat1", "stat2"), data = si, keep = "ccr5", trans = tmat ) # Run cox model silong <- expand.covs(silong, "ccr5") c1 <- coxph(Surv(time, status) ~ ccr5WM.1 + ccr5WM.2 + strata(trans), data = silong) ``` # Visualising cumulative baseline hazards using `plot.msfit()` Using `msfit()`, we can obtain patient-specific transition hazards. We look here at a patient with a CCR5 genotype "WW" (wild type allele on both chromosomes). ```{r msfit_prep} # Data to predict WW <- data.frame( ccr5WM.1 = c(0, 0), ccr5WM.2 = c(0, 0), trans = c(1, 2), strata = c(1, 2) ) # Make msfit object msf.WW <- msfit( object = c1, newdata = WW, trans = tmat ) ``` The cumulative baseline hazards can be plotted simply by using: ```{r msfitplot_base_1} plot(msf.WW) ``` If we specify the argument `use.ggplot = TRUE`, the `plot` method will return a ggplot object. ```{r msfitplot_ggplot2_1} plot(msf.WW, use.ggplot = TRUE) ``` When using the argument `type = "separate"`, the base R plot will return a separate plot for each transition: ```{r msfitplot_base_2} par(mfrow = c(1, 2)) plot(msf.WW, type = "separate") ``` The *ggplot2* version will return a facetted plot, where the axis scales can either be kept "fixed", or "free" using the `scale_type` argument. It is essentially the same argument as `scales` from the `facet_wrap()` function of *ggplot2*, see `?ggplot2::facet_wrap`. ```{r msfitplot_ggplot_2} # Fixed scales plot(msf.WW, type = "separate", use.ggplot = TRUE, scale_type = "fixed") # Free scales plot(msf.WW, type = "separate", use.ggplot = TRUE, scale_type = "free", xlim = c(0, 15)) ``` The remaining arguments are standard plotting adjustments, which will work for both the *ggplot2* and base R version of the plots. For details, see `?mstate::plot.msfit`. Any further adjustments that are not available through the function arguments (such as plot title) can simply be added using standard *ggplot2* syntax using `+`, or graphics functions such as `title()`. The following is a customised example: ```{r msfitplot_customs} par(mfrow = c(1, 1)) # A base R customised plot plot( msf.WW, type = "single", cols = c("blue", "black"), # or numeric e.g. c(1, 2) xlim = c(0, 15), legend.pos = "top", lty = c("dashed", "solid"), use.ggplot = FALSE ) title("Cumulative baseline hazards") # A ggplot2 customised plot plot( msf.WW, type = "single", cols = c("blue", "black"), # or numeric e.g. c(1, 2) xlim = c(0, 15), lty = c("dashed", "solid"), legend.pos = "bottom", use.ggplot = TRUE ) + # Add title and center ggtitle("Cumulative baseline hazards") + theme(plot.title = element_text(hjust = 0.5)) ``` Available using `use.ggplot = TRUE` are the confidence intervals around the cumulative hazards, which can be obtained by specifying `conf.type` of type "plain" or "log", for example in single plot: ```{r msfitplot_confint1} plot( msf.WW, type = "single", use.ggplot = TRUE, conf.type = "log", conf.int = 0.95 ) ``` Or else, in a facetted plot: ```{r msfitplot_confint2} plot( msf.WW, type = "separate", use.ggplot = TRUE, conf.type = "log", conf.int = 0.95, scale_type = "free_y" ) ``` # Visualising transition probabilities using `plot.probtrans()` The transition hazards obtained in the previous section can be used to obtain patient-specific transition probabilities using `probtrans()`. Here, we would like to predict from the beginning of follow-up (`predt = 0`). ```{r} # Run probtrans pt.WW <- probtrans(msf.WW, predt = 0) # Example predict at different times summary(pt.WW, times = c(1, 5, 10)) ``` The `plot` method for `probtrans` objects allows to visualise the transition probabilities in various ways, using both functionality from base R graphics and *ggplot2*. ## Filled/stacked plots The default is a so-called "filled" plot, where the transition probabilities are represented by coloured areas. The `from` argument allows the user to choose the state to predict from (default is 1, the first state). ```{r plot_pt_filled1} plot(pt.WW, from = 1) ``` Again, the `use.ggplot = TRUE` argument can be used to return a ggplot object instead: ```{r plot_pt_filled2} # from = 1 implied plot(pt.WW, use.ggplot = TRUE) ``` Note that the *ggplot2* version of the plot places the state labels in a legend rather than labels in the plot itself. If we prefer the latter, we can specify `label = annotate` instead - and change the size of the labels with cex. ```{r plot_pt_filled3} plot(pt.WW, use.ggplot = TRUE, label = "annotate", cex = 6) ``` More generally, we can also adjust the stacking order using the `ord` argument, which takes a numeric vector equal to the number of states, in the desired stacking order (bottom to top). ```{r plot_pt_filled4} # Check state order again from transition matrix tmat # Plot aids (state 2), then event-free (state 1), and SI on top (state 3) plot(pt.WW, use.ggplot = TRUE, ord = c(2, 1, 3)) ``` You can also choose to forgo the colour, and specify `type = "stacked"` instead: ```{r plot_pt_stacked} plot(pt.WW, use.ggplot = TRUE, type = "stacked") ``` ## Single and separate plots, confidence intervals Instead of visualising the probabilities using stacked areas, we can go the traditional route and use a single line per state. Confidence intervals can then be added. By specifying `type = "separate"`, the base R plot will return a separate plot for all three states. ```{r plot_pt_separate1} par(mfrow = c(1, 3)) plot(pt.WW, type = "separate") ``` The *ggplot2* version will return a facetted plot, with one state per facet. It also accommodates confidence intervals, which are either of type "log" (default) or "plain". ```{r plot_pt_separate2} plot( pt.WW, use.ggplot = TRUE, type = "separate", conf.int = 0.95, # 95% level conf.type = "log" ) ``` These confidence intervals can be omitted using `conf.type = "none"`. What if we wanted all of these in one plot? For that, we can use the `type = single` option: ```{r plot_pt_single1} plot( pt.WW, use.ggplot = TRUE, type = "single", conf.int = 0.95, # 95% level conf.type = "log" ) ``` As before, the confidence intervals can be omitted. ```{r plot_pt_single2} plot( pt.WW, use.ggplot = TRUE, type = "single", conf.type = "none", lty = c(1, 2, 3), # change the linetype lwd = 1.5, ) ``` If the multi-state model is large, we may be only interested in plotting the probabilities for a subset of states together. This subset will not sum up to 1, so the stacked/filled plots are not appropriate. There is no set function to do this, but it can be done by extracting the data from a `plot.probtrans()`-based ggplot object as follows: ```{r plot_pt_single3} # Run plot and extract data using $data dat_plot <- plot(x = pt.WW, use.ggplot = TRUE, type = "single")$data # Begin new plot - Exclude or select states to be plotted ggplot(data = dat_plot[state != "event-free", ], aes( x = time, y = prob, ymin = CI_low, ymax = CI_upp, group = state, linetype = state, col = state )) + # Add CI and lines; change fill = NA to remove CIs geom_ribbon(col = NA, fill = "gray", alpha = 0.5) + geom_line() + # Remaining details labs(x = "Time", y = "Cumulative Incidence") + coord_cartesian(ylim = c(0, 1), xlim = c(0, 14), expand = 0) ``` If interest lies in plotting the probability of a *single* state, the procedure above can be used, or the `vis.multiple.pt()` function presented further could be used directly. # Plotting non-parametric cumulative incidence functions The `Cuminc()` function in *mstate* produces (for competing risks data) the non-parametric Aalen-Johansen estimates of the cumulative incidence functions. This is the same as is obtained in the *cmrpsk* package with the function `cuminc()`. In *mstate*, an object of class "Cuminc" can also be plotted as follows: ```{r plot.Cuminc1} cum_incid <- Cuminc( time = "time", status = "status", data = si ) plot( x = cum_incid, use.ggplot = TRUE, conf.type = "log", lty = 1:2, conf.int = 0.95, ) ``` In the case where this a grouping variable: ```{r plot.cuminc_x} cum_incid_grp <- Cuminc( time = "time", status = "status", group = "ccr5", data = si ) plot( x = cum_incid_grp, use.ggplot = TRUE, conf.type = "none", lty = 1:4, facet = FALSE ) plot( x = cum_incid_grp, use.ggplot = TRUE, conf.type = "none", lty = 1:4, facet = TRUE ) ``` # Visualising multiple probtrans objects We may also be interested in comparing the predicted probabilities for *multiple* reference patients. First, we need to create as many msfit/probtrans objects as there are reference patients of interest: ```{r vis_multiple_prep} # 1. Prepare patient data - both CCR5 genotypes WW <- data.frame( ccr5WM.1 = c(0, 0), ccr5WM.2 = c(0, 0), trans = c(1, 2), strata = c(1, 2) ) WM <- data.frame( ccr5WM.1 = c(1, 0), ccr5WM.2 = c(0, 1), trans = c(1, 2), strata = c(1, 2) ) # 2. Make msfit objects msf.WW <- msfit(c1, WW, trans = tmat) msf.WM <- msfit(c1, WM, trans = tmat) # 3. Make probtrans objects pt.WW <- probtrans(msf.WW, predt = 0) pt.WM <- probtrans(msf.WM, predt = 0) ``` Afterwards, we can supply the probtrans objects in a list into the `vis.multiple.pt()` function. This will visualise the probability of being in state number `to` over time, given the reference patient was in state number `from` at the `predt` time supplied in `probtrans()`. The example below visualises the probability of being in state AIDS, given event-free at time 0. The two lines/associated confidence intervals correspond to the reference patients - both with a different CCR5 genotype ("WW" or "WM"). ```{r vis_multiple_plot} vis.multiple.pt( x = list(pt.WW, pt.WM), from = 1, to = 2, conf.type = "log", cols = c(1, 2), labels = c("Pat WW", "Pat WM"), legend.title = "Ref patients" ) ``` This function could just as well be used for a single probtrans object: ```{r vis_multiple_plot2} vis.multiple.pt( x = list(pt.WW), from = 1, to = 2, conf.type = "log", cols = c(1, 2), labels = c("Pat WW", "Pat WM"), legend.title = "Ref patients" ) ``` Note this function is only available in the ggplot format. ## A mirrored plot Multiple probtrans objects can also be compared by means of a mirrored plot. The idea is to compare the probabilities of being in (all) states between two references patients at a particular horizon. In addition to reference patients, different subsets of the data could equally be compared. ```{r mirror1} vis.mirror.pt( x = list(pt.WW, pt.WM), titles = c("WW", "WM"), horizon = 10 ) ``` Omitting the `horizon` argument will default to plotting both sides with their respective maximum follow-up time, so it may be asymmetric. We thus recommend to always use the `horizon` argument. If for example time is in years, and follow-up is extremely short, you may want to override the breaks on the x-axis. This can be done using the `breaks_x_left` and `breaks_x_right` arguments. ```{r mirror2} vis.mirror.pt( x = list(pt.WW, pt.WM), titles = c("WW", "WM"), size_titles = 8, horizon = 5, breaks_x_left = c(0, 1, 2, 3, 4, 5), breaks_x_right = c(0, 1, 2, 3, 4), ord = c(3, 2, 1) ) ``` # Saving outputs Any plots made with *mstate* using the `use.ggplot = TRUE` will return a ggplot object, which can be saved to a desired format by using `ggplot2::ggsave()`. Please refer to the [article](https://ggplot2.tidyverse.org/reference/ggsave.html#saving-images-without-ggsave-) written by the *ggplot2* team on using `ggplot2::ggsave()`. ```{r saving, eval=FALSE} # Saving a ggplot2 plot p <- plot(pt.WW, use.ggplot = TRUE) ggplot2::ggsave("my_ggplot2_plot.png") # Standard graphics plot png("my_standard_plot.png") plot(pt.WW, use.ggplot = FALSE) dev.off() ``` # Reproducibility
Reproducibility receipt ```{r session-info, include=TRUE, echo=TRUE, results='markup'} # Date/time Sys.time() # Environment sessionInfo() ```
# References