`epitab` provides functionality for building contingency tables with a variety of additional configurations. It was initially designed for use in epidemiology, as an extension to the `Epi::stat.table` function. However, by identifying core components of a descriptive table, it is flexible enough to be used in a variety of disciplines and situations. This vignette provides an overview of the types of tables that can be built using `epitab`.
For demonstration purposes, a simulated data set representing an observational study of a disease will be used. This fictitious disease primarily affects elderly people, and not every patient receives first-line treatment. The disease itself comes in two variants: A and B.
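library(epitab)  # provides contingency_table(), freq(), odds_ratio(), etc.
set.seed(17)
# Simulate 100 patients with age, sex, disease variant and treatment status
treat <- data.frame(age=abs(rnorm(100, 60, 20)),
                    sex=factor(sample(c("M", "F"), 100, replace=TRUE)),
                    variant=factor(sample(c("A", "B"), 100, replace=TRUE)),
                    treated=factor(sample(c("Yes", "No"), 100, replace=TRUE), levels=c("Yes", "No")))
# Group the continuous age into four bands
treat$agebin <- cut(treat$age, breaks=c(0, 40, 60, 80, 9999), labels=c("0-40", "41-60", "61-80", "80+"))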
age | sex | variant | treated | agebin |
---|---|---|---|---|
39.69983 | F | B | No | 0-40 |
58.40727 | M | B | No | 41-60 |
55.34026 | F | B | Yes | 41-60 |
43.65464 | F | B | Yes | 41-60 |
75.44182 | F | A | Yes | 61-80 |
56.68776 | M | A | Yes | 41-60 |
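Because `cut()` builds right-closed intervals by default, an age of exactly 40 lands in the "0-40" group and an age of exactly 80 in "61-80". The short check below illustrates this with a few hand-picked ages (chosen for illustration, not taken from the simulated data):

```r
# Hand-picked ages, for illustration only
ages <- c(40, 40.5, 80, 95)

# Same breaks and labels as used for treat$agebin
bins <- cut(ages, breaks = c(0, 40, 60, 80, 9999),
            labels = c("0-40", "41-60", "61-80", "80+"))

# Boundary values fall into the lower band because intervals are (a, b]
as.character(bins)
```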
Contingency tables are useful tools for exploratory analysis of a data set, and highlight relationships between one or more independent variables and (typically one) outcome of interest. The example code below shows how to build a basic contingency table to view how treatment varies by age group and sex.
Both the independent and outcome variables are passed in through lists, where the column names must be quoted strings (thereby allowing for these tables to be used in automated scripts). The list entry labels are used to provide the column and row labels. The `crosstab_funcs` argument specifies what summary measures should be calculated for each covariate / outcome combination. The `freq` function calculates the frequency of each of these cells, and (optionally) provides the proportion in parentheses.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## ---------------------------------------------------------------
## | | | | |
## |Total |100 |53 (100) |47 (100) |
## | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |
## |41-60 |34 (34) |18 (34) |16 (34) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |
## | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |
## |M |46 (46) |21 (39.6) |25 (53.2) |
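The proportions in the table above are column-wise: each count is divided by the total for its outcome level (for example, 8 of the 53 treated patients are aged 0-40, giving 15.1). As a rough base R sketch of the same calculation, using a small made-up data set and without claiming this is how `freq` is implemented internally:

```r
# Small made-up data set (not the vignette's `treat` data)
toy <- data.frame(
  agebin  = factor(c("0-40", "0-40", "41-60", "41-60", "41-60", "61+"),
                   levels = c("0-40", "41-60", "61+")),
  treated = factor(c("Yes", "No", "Yes", "Yes", "No", "No"),
                   levels = c("Yes", "No"))
)

# Cross-tabulated counts: rows are covariate levels, columns are outcome levels
counts <- table(toy$agebin, toy$treated)

# Column-wise percentages, i.e. each cell divided by its outcome-level total
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Combine into "count (percentage)" strings, mimicking the printed cells
cells <- matrix(paste0(counts, " (", col_pct, ")"),
                nrow = nrow(counts), dimnames = dimnames(counts))
cells
```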
Using this standard contingency table as a starting point, there are several ways to customise the table. The presence of the overall frequency column is controlled by the `marginal` argument. There are also options to `freq` that specify the formatting of the cross-tabulated frequencies (see `?freq` for more details).
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq(proportion = "none")),
marginal=FALSE,
data=treat)
## | |Treated | |
## | |Yes |No |
## -----------------------------------------
## | | | |
## |Total |53 |47 |
## | | | |
## Age |0-40 |8 |5 |
## |41-60 |18 |16 |
## |61-80 |19 |20 |
## |80+ |8 |6 |
## | | | |
## Sex |F |32 |22 |
## |M |21 |25 |
Note that multiple outcomes can be selected, although it still results in a 2-way contingency table between all the covariates and the outcomes independently. It is not currently possible to produce a 3-way contingency table.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated", "Variant"="variant"),
crosstab_funcs=list(freq()),
data=treat)
## | | |Treated | |Variant | |
## | |All |Yes |No |A |B |
## ---------------------------------------------------------------------------------------------
## | | | | | | |
## |Total |100 |53 (100) |47 (100) |48 (100) |52 (100) |
## | | | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |4 (8.3) |9 (17.3) |
## |41-60 |34 (34) |18 (34) |16 (34) |17 (35.4) |17 (32.7) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |25 (52.1) |14 (26.9) |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |2 (4.2) |12 (23.1) |
## | | | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |29 (60.4) |25 (48.1) |
## |M |46 (46) |21 (39.6) |25 (53.2) |19 (39.6) |27 (51.9) |
Additional statistics can be added to these contingency tables in two ways. Column-wise measures act on each outcome in turn without regard to the covariates, while row-wise measures are those that are calculated for every level of each independent variable.
It is often the case that in addition to the categorical variables included in the contingency table, there are continuous attributes that we are interested in. The `col_funcs` argument to `contingency_table` calculates summary measures for each outcome and can be used for this purpose.
The example below shows how to calculate mean age across treatment types, using the provided function `summary_mean`, to which the name of the continuous variable of interest is passed as a string.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age")),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## --------------------------------------------------------------------
## | | | | |
## |Total |100 |53 (100) |47 (100) |
## | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |
## |41-60 |34 (34) |18 (34) |16 (34) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |
## | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |
## |M |46 (46) |21 (39.6) |25 (53.2) |
## | | | | |
## Mean age | |60.05 |58.69 |61.58 |
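The "Mean age" row holds one value for the whole data set and one per outcome level. A base R sketch of an equivalent calculation is given below; it uses a tiny made-up data set and is only meant to show what the row represents, not how `summary_mean` is written:

```r
# Tiny made-up data set (not the vignette's `treat` data)
toy <- data.frame(
  age     = c(30, 50, 70, 40, 80),
  treated = factor(c("Yes", "Yes", "Yes", "No", "No"), levels = c("Yes", "No"))
)

# Value for the "All" column: mean over every row
overall <- round(mean(toy$age), 2)

# Values for the outcome columns: mean within each outcome level
by_outcome <- round(tapply(toy$age, toy$treated, mean), 2)

overall
by_outcome
```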
As with `crosstab_funcs`, multiple summary values can be passed to `col_funcs`. The example below shows the use of the other column-wise function provided with `epitab`: `summary_median`.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Median age"=summary_median("age")),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## ----------------------------------------------------------------------
## | | | | |
## |Total |100 |53 (100) |47 (100) |
## | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |
## |41-60 |34 (34) |18 (34) |16 (34) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |
## | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |
## |M |46 (46) |21 (39.6) |25 (53.2) |
## | | | | |
## Mean age | |60.05 |58.69 |61.58 |
## | | | | |
## Median age | |60.94 |60.39 |62.79 |
Another common addition is to display the coefficients of a regression model that relates the independent variables with an outcome (although not necessarily the same outcome displayed in the contingency table). For example, we may be interested to see how treatment varies by age group by looking at the odds ratios (ORs) of a univariate logistic regression. This functionality is provided by the `row_funcs` argument to `contingency_table`, which accepts a named list of functions that meet the correct requirements. The two functions provided with this package are `odds_ratio` and `hazard_ratio`, used to display coefficients resulting from logistic regression and Cox regression respectively.
The example below shows how to specify that the odds ratios should be calculated in addition to the cross-tabulated frequencies. The only required argument to `odds_ratio` is the name of the outcome variable.
contingency_table(list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
data=treat,
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated'))
)
## | | |Treated | | |
## | |All |Yes |No |OR |
## ---------------------------------------------------------------------------------------
## | | | | | |
## |Total |100 |53 (100) |47 (100) | |
## | | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |1 |
## |41-60 |34 (34) |18 (34) |16 (34) |1.42 (0.39 - 5.55) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |1.68 (0.48 - 6.44) |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |1.20 (0.26 - 5.79) |
## | | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |1 |
## |M |46 (46) |21 (39.6) |25 (53.2) |1.73 (0.79 - 3.87) |
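The point estimate in the Sex row can be reproduced by hand from the counts in the table: (25/21) / (22/32) is approximately 1.73. The sketch below fits the corresponding univariate logistic regression in base R from those four counts. It assumes that the displayed OR is the exponentiated `glm` coefficient; note that `glm` treats the first level of a factor response ("Yes" here) as the reference, so the ratio describes the odds of the second level ("No").

```r
# Cell counts copied from the Sex rows of the table above
sex_counts <- data.frame(
  sex     = factor(c("F", "F", "M", "M")),
  treated = factor(c("Yes", "No", "Yes", "No"), levels = c("Yes", "No")),
  n       = c(32, 22, 21, 25)
)

# Univariate logistic regression, using the counts as frequency weights
fit <- glm(treated ~ sex, family = binomial, data = sex_counts, weights = n)

# Exponentiate the coefficient for M (relative to the baseline F)
or_m <- unname(exp(coef(fit)["sexM"]))
round(or_m, 2)
```

Confidence intervals are left out of this sketch, since how the package computes them is not described here.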
Additional arguments to `odds_ratio` allow the model to adjust for every other covariate included in `independents`, specify the largest group as the baseline, and select whether to include confidence intervals. Note that multiple functions can be provided to `row_funcs`. While the table below may not fit on the page of this document, it fits neatly within the standard R terminal output. Strategies for neatly displaying tables are discussed later on.
contingency_table(list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
data=treat,
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Adj OR"=odds_ratio('treated', adjusted=TRUE,
relevel_baseline=TRUE))
)
## | | |Treated | | | |
## | |All |Yes |No |OR |Adj OR |
## ---------------------------------------------------------------------------------------------------------------
## | | | | | | |
## |Total |100 |53 (100) |47 (100) | | |
## | | | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |0.59 (0.16 - 2.10) |0.60 (0.15 - 2.14) |
## |41-60 |34 (34) |18 (34) |16 (34) |0.84 (0.33 - 2.12) |0.80 (0.31 - 2.03) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |1 |1 |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |0.71 (0.20 - 2.43) |0.64 (0.18 - 2.24) |
## | | | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |1 |1 |
## |M |46 (46) |21 (39.6) |25 (53.2) |1.73 (0.79 - 3.87) |1.77 (0.80 - 4.02) |
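In the table above the baseline for Age has moved to 61-80, the largest group, while F remains the baseline for Sex because it is already the largest. One way to pick the largest level as the reference in base R is sketched below; this is an assumption about the general approach rather than a description of what `relevel_baseline=TRUE` does internally.

```r
# Age-group sizes copied from the "All" column of the table above
agebin <- factor(rep(c("0-40", "41-60", "61-80", "80+"),
                     times = c(13, 34, 39, 14)))

# Name of the most frequent level
largest <- names(which.max(table(agebin)))

# Move that level to the front so that models use it as the reference
agebin_releveled <- relevel(agebin, ref = largest)
levels(agebin_releveled)
```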
Another use case in epidemiology is when survival is the outcome of interest. Such data is more appropriately modelled using Cox regression, which can be specified with the `hazard_ratio` function. This requires the outcome to be specified as a string detailing a `Surv` object, for example `hazard_ratio("Surv(time, status)")`. See the help page `?hazard_ratio` for further details.
There is no limit to the number of column-wise and row-wise functions that can be supplied, although too many can hinder readability and detract from the purpose of the table.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Median age"=summary_median("age")),
row_funcs=list("OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Adj OR"=odds_ratio('treated', adjusted=TRUE,
relevel_baseline=TRUE)),
data=treat)
## | | |Treated | | | |
## | |All |Yes |No |OR |Adj OR |
## ----------------------------------------------------------------------------------------------------------------------
## | | | | | | |
## |Total |100 |53 (100) |47 (100) | | |
## | | | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |0.59 (0.16 - 2.10) |0.60 (0.15 - 2.14) |
## |41-60 |34 (34) |18 (34) |16 (34) |0.84 (0.33 - 2.12) |0.80 (0.31 - 2.03) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |1 |1 |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |0.71 (0.20 - 2.43) |0.64 (0.18 - 2.24) |
## | | | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |1 |1 |
## |M |46 (46) |21 (39.6) |25 (53.2) |1.73 (0.79 - 3.87) |1.77 (0.80 - 4.02) |
## | | | | | | |
## Mean age | |60.05 |58.69 |61.58 | | |
## | | | | | | |
## Median age | |60.94 |60.39 |62.79 | | |
This flexibility of `epitab` allows for either simple summary tables that are used to highlight a trend within the data, or more complex reference tables that hold a large amount of summary statistics.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated", "Disease variant"="variant"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Median age"=summary_median("age")),
row_funcs=list("Treatment OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Disease variant OR"=odds_ratio('variant',
relevel_baseline=TRUE)),
data=treat)
## | | |Treated | |Disease variant | | | |
## | |All |Yes |No |A |B |Treatment OR |Disease variant OR |
## ------------------------------------------------------------------------------------------------------------------------------------------------------------
## | | | | | | | | |
## |Total |100 |53 (100) |47 (100) |48 (100) |52 (100) | | |
## | | | | | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |4 (8.3) |9 (17.3) |0.59 (0.16 - 2.10) |4.02 (1.10 - 17.13) |
## |41-60 |34 (34) |18 (34) |16 (34) |17 (35.4) |17 (32.7) |0.84 (0.33 - 2.12) |1.79 (0.70 - 4.63) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |25 (52.1) |14 (26.9) |1 |1 |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |2 (4.2) |12 (23.1) |0.71 (0.20 - 2.43) |10.71 (2.47 - 75.53) |
## | | | | | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |29 (60.4) |25 (48.1) |1 |1 |
## |M |46 (46) |21 (39.6) |25 (53.2) |19 (39.6) |27 (51.9) |1.73 (0.79 - 3.87) |1.65 (0.75 - 3.68) |
## | | | | | | | | |
## Mean age | |60.05 |58.69 |61.58 |59.23 |60.8 | | |
## | | | | | | | | |
## Median age | |60.94 |60.39 |62.79 |61.26 |60.01 | | |
`contingency_table` can even be used when there is no cross-tabulation, for example as a means of displaying regression coefficients.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
row_funcs=list("OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Adj OR"=odds_ratio('treated', adjusted=TRUE,
relevel_baseline=TRUE)),
data=treat)
## | |OR |Adj OR |
## --------------------------------------------------------------------
## | | | |
## Age |0-40 |0.59 (0.16 - 2.10) |0.60 (0.15 - 2.14) |
## |41-60 | | |
## |61-80 |1 |1 |
## |80+ |0.71 (0.20 - 2.43) |0.64 (0.18 - 2.24) |
## | | | |
## Sex |F |1 |1 |
## |M |1.73 (0.79 - 3.87) |1.77 (0.80 - 4.02) |
The default `print` method of these contingency tables is designed for a standard wide R console, where the entire table fits width-wise. However, for situations where a table is being produced for distribution or publication of any type, greater attention to detail and appearance is required. `epitab` provides several options for exporting clean-looking tables.
`neat_table` to HTML and PDF
The `neat_table` function provided in `epitab` builds a cleanly formatted table for output to HTML or LaTeX, using `knitr::kable` and the `kableExtra` package. The output of `neat_table` is a `kable` object and so can be passed to `kableExtra::kable_styling()`, allowing for the specification of various cosmetic settings. See the help files for both `neat_table` and `kableExtra::kable_styling` for further details.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated'),
"Adj OR"=odds_ratio('treated', adjusted=TRUE)),
col_funcs=list("Mean age"=summary_mean("age")),
data=treat) %>%
neat_table() %>%
kableExtra::kable_styling(bootstrap_options=c("striped", "hover"),
full_width=FALSE)
| | All | Yes | No | OR | Adj OR |
|---|---|---|---|---|---|
| Total | 100 | 53 (100) | 47 (100) | | |
| Age | | | | | |
| 0-40 | 13 (13) | 8 (15.1) | 5 (10.6) | 1 | 1 |
| 41-60 | 34 (34) | 18 (34) | 16 (34) | 1.42 (0.39 - 5.55) | 1.34 (0.36 - 5.29) |
| 61-80 | 39 (39) | 19 (35.8) | 20 (42.6) | 1.68 (0.48 - 6.44) | 1.68 (0.47 - 6.49) |
| 80+ | 14 (14) | 8 (15.1) | 6 (12.8) | 1.20 (0.26 - 5.79) | 1.08 (0.22 - 5.31) |
| Sex | | | | | |
| F | 54 (54) | 32 (60.4) | 22 (46.8) | 1 | 1 |
| M | 46 (46) | 21 (39.6) | 25 (53.2) | 1.73 (0.79 - 3.87) | 1.77 (0.80 - 4.02) |
| Mean age | 60.05 | 58.69 | 61.58 | | |
Due to the vignette markdown theme, these styling changes won’t appear in this HTML. The screenshot below shows how the table appears when using the above code in the default Rmarkdown template.
For outputting to PDF documents using LaTeX, the same `neat_table` function can be used, but the `latex` output format must be specified. It is also highly recommended to use the `booktabs` argument, which produces far cleaner-looking tables.
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated'),
"Adj OR"=odds_ratio('treated', adjusted=TRUE)),
col_funcs=list("Mean age"=summary_mean("age")),
data=treat) %>%
neat_table('latex', booktabs=TRUE) %>%
kableExtra::kable_styling(font_size=8)
The above call will display the table below using the default Rmarkdown template.
kable
If full control of the table appearance is required, the raw character matrix is available as the `mat` element of the output of `contingency_table` (accessed with `$mat`). It can be used in conjunction with `knitr::kable` and `kableExtra`. NB: the default value for `format` in `kable` is pandoc, which does not work well with `epitab`; try html or markdown instead.
contingency_table(list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
crosstab_funcs=list(freq()),
row_funcs=list("OR"=odds_ratio('treated'),
"Adj OR"=odds_ratio('treated', adjusted=TRUE)),
data=treat)$mat %>%
knitr::kable("html") %>%
kableExtra::kable_styling(bootstrap_options="striped")
| | | | | | | |
|---|---|---|---|---|---|---|
| | | | Treated | | | |
| | | All | Yes | No | OR | Adj OR |
| | Total | 100 | 53 (100) | 47 (100) | | |
| Age | 0-40 | 13 (13) | 8 (15.1) | 5 (10.6) | 1 | 1 |
| | 41-60 | 34 (34) | 18 (34) | 16 (34) | 1.42 (0.39 - 5.55) | 1.34 (0.36 - 5.29) |
| | 61-80 | 39 (39) | 19 (35.8) | 20 (42.6) | 1.68 (0.48 - 6.44) | 1.68 (0.47 - 6.49) |
| | 80+ | 14 (14) | 8 (15.1) | 6 (12.8) | 1.20 (0.26 - 5.79) | 1.08 (0.22 - 5.31) |
| Sex | F | 54 (54) | 32 (60.4) | 22 (46.8) | 1 | 1 |
| | M | 46 (46) | 21 (39.6) | 25 (53.2) | 1.73 (0.79 - 3.87) | 1.77 (0.80 - 4.02) |
Again note that the stylistic changes made above will not display in the vignette you are currently reading due to the vignette template, but they will appear in your Rmarkdown output as shown below.
Since Word is a proprietary format, it is challenging to directly embed tables into documents. The most convenient method to export a contingency table into Word involves the following steps:
tab <- contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Median age"=summary_median("age")),
row_funcs=list("OR"=odds_ratio('treated', relevel_baseline=TRUE),
"Adj OR"=odds_ratio('treated', adjusted=TRUE,
relevel_baseline=TRUE)),
data=treat)
write.table(tab$mat, "mytable.csv", row.names=FALSE, col.names=FALSE, sep=',')
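To see what such a file contains before opening it elsewhere, the character matrix can be written and read back in base R. The sketch below uses a small hand-made matrix as a stand-in for `tab$mat` and a temporary file rather than `mytable.csv`; both are illustrative assumptions.

```r
# Hand-made stand-in for tab$mat: a character matrix with header cells as data
mat <- matrix(c("",      "All", "Yes",
                "Total", "100", "53 (100)"),
              nrow = 2, byrow = TRUE)

# Write without row or column names, exactly as in the call above
path <- tempfile(fileext = ".csv")
write.table(mat, path, row.names = FALSE, col.names = FALSE, sep = ",")

# Read it back as text; header = FALSE because the first row is table content
back <- as.matrix(read.csv(path, header = FALSE, colClasses = "character"))
back
```

Reading every column as character keeps cells such as "53 (100)" and the empty corner cell intact.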
In the above examples, the summary functions used to build up the table in `crosstab_funcs`, `row_funcs`, and `col_funcs` have been provided by `epitab`. However, for greater flexibility, any correctly parametrised function can be supplied instead. This section details the appropriate form for each of these 3 arguments.
The functions passed in to `crosstab_funcs` are run for each combination of outcome and independent variable level.
Arguments:
- `data`: A subset of the full data, the strata of individuals with the current independent variable level.
- `outcome_level`: A string providing the current outcome level.
- `outcome_name`: A string providing the current outcome variable.
- `independent_level`: A string providing the current independent level.
- `independent_name`: A string providing the current independent variable.
The function must return a vector of length one, representing the statistic for this covariate-level / outcome-level pair.
The example function below calculates the proportion of each treatment type per covariate level (rather than also displaying the counts as `freq` does).
proportion <- function(data, outcome_level=NULL, outcome_name=NULL, independent_level=NULL, independent_name=NULL) {
if (!is.null(outcome_level) && !is.null(outcome_name))
data <- data[data[[outcome_name]] == outcome_level, ]
count <- if (!is.null(independent_level) && !is.null(independent_name)) {
sum(data[[independent_name]] == independent_level)
} else {
nrow(data)
}
proportion <- count / nrow(data)
sprintf("%0.1f%%", proportion*100)
}
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(proportion),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## --------------------------------------------------------
## | | | | |
## |Total |100 |100.0% |100.0% |
## | | | | |
## Age |0-40 |13.0% |15.1% |10.6% |
## |41-60 |34.0% |34.0% |34.0% |
## |61-80 |39.0% |35.8% |42.6% |
## |80+ |14.0% |15.1% |12.8% |
## | | | | |
## Sex |F |54.0% |60.4% |46.8% |
## |M |46.0% |39.6% |53.2% |
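Since a cross-tab function is an ordinary R function, it can be called directly on a small data set before being handed to `crosstab_funcs`. The sketch below defines a hypothetical `count_only` function with the same five arguments and follows the filtering pattern of the `proportion` example above; the function name and the toy data are illustrative, not part of `epitab`.

```r
# Hypothetical cross-tab function: returns only the cell count, as a string
count_only <- function(data, outcome_level = NULL, outcome_name = NULL,
                       independent_level = NULL, independent_name = NULL) {
  # Restrict to the current outcome level, if one was supplied
  if (!is.null(outcome_level) && !is.null(outcome_name))
    data <- data[data[[outcome_name]] == outcome_level, ]
  # Restrict to the current covariate level, if one was supplied
  if (!is.null(independent_level) && !is.null(independent_name))
    data <- data[data[[independent_name]] == independent_level, ]
  as.character(nrow(data))
}

# Toy data for a direct call (not the vignette's `treat` data)
toy <- data.frame(
  sex     = factor(c("F", "F", "M", "M", "M")),
  treated = factor(c("Yes", "No", "Yes", "Yes", "No"), levels = c("Yes", "No"))
)

n_m_yes <- count_only(toy, "Yes", "treated", "M", "sex")  # treated males
n_all   <- count_only(toy)                                # no filtering at all
```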
The column-wise functions provided with `epitab` are `summary_mean` and `summary_median`, and are used to investigate relationships between the outcome variables that aren’t necessarily associated with the categorical covariates.
Args:
- `data`: The full data set that was passed to `contingency_table`.
- `outcome_level`: The current level of the outcome, as a string.
- `outcome_name`: The current outcome, as a string.
Returns:
The function must return a single value, representing the statistic for this outcome level.
The example function below extends `summary_mean` by adding the standard deviation in parentheses. It is hard-coded to work for the continuous variable `age` in the dummy data set.
meanage_sd <- function(data, outcome_level=NULL, outcome_name=NULL) {
if (!is.null(outcome_level) && !is.null(outcome_name))
data <- data[data[[outcome_name]] == outcome_level, ]
mean <- round(mean(data[['age']]), 2)
sd <- round(sd(data[['age']]), 2)
paste0(mean, " (", sd, ")")
}
contingency_table(independents=list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"="treated"),
crosstab_funcs=list(freq()),
col_funcs=list("Mean age"=summary_mean("age"),
"Mean age (sd)"=meanage_sd),
data=treat)
## | | |Treated | |
## | |All |Yes |No |
## ---------------------------------------------------------------------------------------
## | | | | |
## |Total |100 |53 (100) |47 (100) |
## | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |
## |41-60 |34 (34) |18 (34) |16 (34) |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |
## | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |
## |M |46 (46) |21 (39.6) |25 (53.2) |
## | | | | |
## Mean age | |60.05 |58.69 |61.58 |
## | | | | |
## Mean age (sd) | |60.05 (20.13) |58.69 (21.36) |61.58 (18.76) |
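The hard-coded variable name can be avoided with a function factory: a function that takes the variable name and returns a function with the required `data`, `outcome_level`, `outcome_name` signature. The sketch below is one hypothetical way to do this (the name `mean_sd` is made up here, and it is not claimed that `summary_mean` is implemented this way).

```r
# Hypothetical factory: returns a column-wise function for the given variable
mean_sd <- function(var) {
  function(data, outcome_level = NULL, outcome_name = NULL) {
    # Restrict to the current outcome level, if one was supplied
    if (!is.null(outcome_level) && !is.null(outcome_name))
      data <- data[data[[outcome_name]] == outcome_level, ]
    m <- round(mean(data[[var]]), 2)
    s <- round(sd(data[[var]]), 2)
    paste0(m, " (", s, ")")
  }
}

# Toy data for a direct call (not the vignette's `treat` data)
toy <- data.frame(
  age     = c(10, 20, 30, 40, 60),
  treated = factor(c("Yes", "Yes", "Yes", "No", "No"), levels = c("Yes", "No"))
)

age_fun   <- mean_sd("age")
yes_value <- age_fun(toy, outcome_level = "Yes", outcome_name = "treated")
all_value <- age_fun(toy)
```

A function built this way could then be passed as, for example, `col_funcs=list("Mean age (sd)"=mean_sd("age"))`.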
Row-wise functions are used to estimate summary measures for the independent variables outside of the contingency table. This can be useful for providing a summary statistic that is not necessarily related to the outcome variables. The example here runs a linear regression on a continuous outcome, namely continuous age. This is not a particularly helpful analysis, since one of the covariates is age group, but it serves as an illustration. These functions are run for each independent variable and must be parametrised as follows:
Args:
- `data`: The data set originally passed into `contingency_table`.
- `var`: The current independent variable, as a string.
- `all_vars`: All independent variables as passed into `independents`. This is provided to allow regression models to adjust for other covariates.
The function must return a vector with length equal to the number of levels of `var`.
lr <- function(data, var=NULL, all_vars=NULL) {
if (is.null(var) || is.null(all_vars)) {
return("")
}
levs <- levels(data[[var]])
form <- as.formula(paste('age ~', var))
mod <- lm(form, data)
coefs <- c(coef(mod), 1) # Add baseline as 1
# coefficients are named with <variable><level>
labels <- paste0(var, levs)
# set baseline name in coefficients vector
names(coefs)[length(coefs)] <- labels[1]
round(coefs[labels], 3)
}
contingency_table(list("Age"="agebin",
"Sex"="sex"),
outcomes=list("Treated"='treated'),
data=treat,
crosstab_funcs=list(freq()),
row_funcs=list("Regression on age"=lr)
)
## | | |Treated | | |
## | |All |Yes |No |Regression on age |
## --------------------------------------------------------------------------------------
## | | | | | |
## |Total |100 |53 (100) |47 (100) | |
## | | | | | |
## Age |0-40 |13 (13) |8 (15.1) |5 (10.6) |1 |
## |41-60 |34 (34) |18 (34) |16 (34) |23.49 |
## |61-80 |39 (39) |19 (35.8) |20 (42.6) |42.379 |
## |80+ |14 (14) |8 (15.1) |6 (12.8) |65.6 |
## | | | | | |
## Sex |F |54 (54) |32 (60.4) |22 (46.8) |1 |
## |M |46 (46) |21 (39.6) |25 (53.2) |1.996 |
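The lookup inside `lr` relies on how R names the coefficients of a factor term: the variable name pasted onto the level name, with the first (reference) level absorbed into the intercept. A minimal illustration on made-up data:

```r
# Made-up data: a continuous response and a three-level factor
toy <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  g = factor(c("a", "a", "b", "b", "c", "c"))
)

mod <- lm(y ~ g, data = toy)

# Coefficient names: intercept plus <variable><level> for non-reference levels
coef_names <- names(coef(mod))

# Labels built the same way as in lr(); the first has no matching coefficient
labels <- paste0("g", levels(toy$g))

coef_names
labels
```

Here `labels[1]` ("ga") does not appear among the coefficient names, which is why `lr` appends an extra entry and names it after the baseline level.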