---
title: "EpiStats"
author: "Jean Pierre Decorps"
date: "`r Sys.Date()`"
output: rmarkdown::pdf_document
geometry: margin=0.5in
vignette: >
  %\VignetteIndexEntry{Using EpiStats}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---



Epiconcept is made up of a team of doctors, epidemiologists, data scientists and digital specialists.
For more than 20 years, Epiconcept has contributed to the improvement of public health by writing software, carrying out epidemiological studies, research, evaluation and training to better detect, monitor and prevent disease and to improve treatment.


--------------------------------------------------------------------------------------------


# Package Epistats

## Description

The EpiStats package is a set of functions aimed at epidemiologists. They include commands for measures of association and impact for case control studies and cohort studies. They may be particularly useful for outbreak investigations and include univariate and stratified analyses.
    
The generic function *crossTable* provides a contingency table with optional parameters *percent* and *statistic*

The functions for cohort studies include the *CS*, *CSTable* and *CSInter* commands. 

The functions for case control studies include the *CC*, *CCTable* and *CCInter* commands. 

All variables used need to be numeric binary variables and coded as 0 and 1 or as factors.
    
## Cohort study functions:

The cohort study functions relate to cohort studies that measure risks, rather than rates in person-
time.

The **CS** function provides a 2 by 2 table and measures the association between the outcome and one
exposure. It includes the risk ratio and its 95% confidence intervals, the attributable fraction among
the exposed and unexposed, and a chi square test and its p-value.

The **CSTable** function displays the measures of association between the outcome and a set of
exposures in a table (risk ratios, confidence intervals and p-values). This helps the researcher to
compare between exposures and provides a nice table for reports.

The **CSInter** function investigates the effect of a third variable on the association between an
exposure and the outcome. It presents two by two tables stratified by the levels of a third value. It
provides the Woolf test for homogeneity between stratum-specific risk ratios. It provides the crude
risk ratio between an exposure and an outcome and the risk ratio adjusted by the third variable.
CSInter helps the researcher understand whether a third variable may have an effect modifying or
confounding effect on the association between an exposure and the outcome.

## Case control study functions:

The **CC** function provides a 2 by 2 table and measures the association between the outcome and one
exposure. It includes the odds ratio and its 95% confidence intervals, the attributable fraction among
the exposed, and a chi square test and its p-value.

The **CCTable** function displays the measures of association between the outcome and a set of
exposures in a table (odds ratios, confidence intervals and p-values). This helps the researcher to
compare between exposures and provides a nice table for reports.

The **CCInter** function investigates the effect of a third variable on the association between an
exposure and the outcome. It presents two by two tables stratified by the levels of a third value. It
provides the Woolf test for homogeneity between stratum-specific odds ratios. It provides the crude
odds ratio between an exposure and an outcome and the odds ratio adjusted by the third variable.
CCInter helps the researcher understand whether a third variable may have an effect modifying or
confounding effect on the association between an exposure and the outcome.

## The "Tiramisu" dataset

The dataset used in this vignette is from an outbreak investigation carried out in Germany in 1998 by
Anja Hauri, Robert Koch Institute. It is used in case studies by organisations including EPIET, ECDC
and EpiConcept.

--------------------------------------------------

  The **CSTable**, **CSInter**, **CCTable** and **CCInter** functions are based on commands written
  in Stata by *Gilles Desve*, who we gratefully acknowledge.

--------------------------------------------------


# Working with Epistats and "Tiramisu" dataset

## Loading and recoding the dataset

```{r message=FALSE}
library(EpiStats)
library(dplyr)
library(knitr)

options(knitr.kable.NA = '')
#options(width=200)

data(Tiramisu)
DF <- Tiramisu

DF <- DF %>%
  # Recoding all variables as binary '0' or '1'
  mutate(salmon = ifelse(salmon == 9, NA, salmon)) %>% 
  mutate(horseradish = ifelse(horseradish == 9, NA, horseradish)) %>% 
  mutate(pork = ifelse(pork == 9, NA, pork)) %>% 
  mutate(sex01 = case_when(sex == "females" ~ 1, sex == "males" ~ 0)) %>% 
  mutate(agegroup = case_when(age < 30 ~ 0, age >= 30 ~ 1)) %>%
  mutate(tportion = case_when(tportion == 0 ~ 0, tportion == 1 ~ 1, tportion >= 2 ~ 2)) %>%
  mutate(tportion = as.factor(tportion)) %>%
  as.data.frame(stringsAsFactors=TRUE)

Colnames <- DF %>% 
  select(-ill, -age, -sex, -dateonset, -uniquekey, -tportion, -mportion) %>% 
  colnames()

```

\newpage
## crossTable
Creates a contingency table of variable of interest and exposure. Percentage are optionals by *row* or by *column*. It can provides an optional statistic (*fisher* or *chisquare*).

### Syntax

**crosTable**(data, var1, var2, percent="none", statistic="none")

### Examples
Recoding some data to have ordered factors
```{r}
DF2 <- DF
DF2$ill <- factor(DF2$ill, levels=c(1,0), ordered = TRUE)
DF2$beer <- factor(DF2$beer, levels=c(1,0), ordered = TRUE)
DF2$tira <- factor(DF2$tira, levels=c(1,0), ordered = TRUE)
DF2$sex <- factor(DF2$sex, levels = c("males", "females"), ordered = TRUE)

```

### Example 1: crossTable ill - tira

```{r}
ret <- crossTable(DF2, var1="ill", var2="tira")
ret
kable(ret, align="r")
```

### Example 2: crossTable ill - sex with column percentage and chi2 stat
```{r}
ret <- crossTable(DF2, "ill", "sex", "col", "chi2")
kable(ret, align="r", caption = "with columns %")
```

\newpage
### Example 3: CrossTable ill - sex with row percentage and Fisher stat
NB: All parameters are unquoted
```{r}
ret <- crossTable(DF2, ill, sex, row, fisher)
ret
kable(ret, align="r")

```

## CrossTable beer - sex with column and row percentages and Chi2 stat
NB: All parameters are unquoted
```{r}
ret <- crossTable(DF2, beer, sex, both, chi2)
ret
kable(ret, align="r", caption = "% rows and columns")

```

\newpage
## CS

  CS analyses cohort studies with equal follow-up time per subject. The risk (the proportion of individuals who become cases) is calculated overall and among the exposed and unexposed. Note that all variables need to be numeric and binary and coded as "0" and "1".
  
  Point estimates and confidence intervals for the risk ratio and risk difference are calculated,
  along with attributable or preventive fractions for the exposed and the total population. Additionally you can select if you want to display the Fisher's exact test, by specifying exact = TRUE. If you specify full = TRUE you can easily access useful statistics from the output tables.

### Syntax

**CS**(x, cases, exposure, exact, full=FALSE)

### Example 1: CS ill - mousse (unformatted)

```{r}
CS(DF, "ill", "mousse", exact = FALSE)

```

\newpage
### Example 2: CS ill - beer (formatted)

The following results tables are outputs in "markdown" using the *kable* function.

```{r }
result <- CS(DF, "ill", "beer", exact = TRUE, full = TRUE)
kable(result$df1, align = "r")
kable(result$df2, align = result$df2.align )
```

By storing the results in the object "result", you are able to use the result tables in Markdown as shown above. By specifying "full = TRUE" you can also easily use individual elements of the results. For example if you would like to view just the risk ratio, you can view it by typing:

```{r}
result$st$risk_ratio$point_estimate
```

\newpage
## CSTable - Summary table for cohort studies

  CSTable is used for univariate analysis of cohort studies with several exposures. The results are summarised in one table with one row per exposure making comparisons between exposures easier and providing a useful table for integrating into reports. Note that all variables need to be numeric and binary and coded as "0" and "1".
  
  
  The results of this function contain: The name of exposure variables, the total number
  of exposed, the number of exposed cases, the attack rate among the exposed, the total number of unexposed, the number of
  unexposed cases, the attack rate among the unexposed, risk ratios, 95% confidence intervals, 95% p-values.
 
  You can optionally choose to display the Fisher's exact p-value instead of the Chi squared p-value, with the option exact = TRUE.
  
  You can specify the sort order, with the option sort="rr" to order by risk ratios. The default sort order is by p-values. 
  
  The option "full = TRUE" provides you with useful formatting information, which can be handy if you're using "markdown".

### Syntax

**CSTable**(x, cases, exposure=c(), exact=FALSE, sort = "pvalue", full=FALSE)


### Example 1: CSTable results ordered by p-value (unformatted)

```{r}
CSTable(DF,
        "ill",
        exposure = c("sex01", "agegroup", "tira", "beer", "mousse", "wmousse", "dmousse",
                     "redjelly", "fruitsalad", "tomato", "mince", "salmon", "horseradish",
                     "chickenwin", "roastbeef", "pork"))
```

\newpage
### Example 2: CSTable results ordered by risk ratio (formatted)

The following results tables are outputs in "markdown" using the kable function.

```{r}
res = CSTable(DF, "ill", sort = "rr", exposure = Colnames, full = TRUE)

kable(res$df, digits=res$digits, align=res$align)
```

\newpage
### Example 3: CSTable results ordered by p-value from the Fisher's exact test (formatted)

The following results tables are outputs in "markdown" using the kable function.

```{r}
res = CSTable(DF, "ill", exact = TRUE, exposure = Colnames, full = TRUE)
kable(res$df, digits=res$digits, align=res$align)
```

By storing the results in the object "res", you are able to use the result table in Markdown as shown above. You can also use individual elements of the results. For example if you would like to view just the risk ratio, you can view it by typing (for example):

```{r}
res$df$RR[2]
```


\newpage
## CSInter - Stratified analysis for cohort studies

  CSInter is useful to determine the effects of a third variable on the association between an exposure and
  an outcome.  CSInter produces 2 by 2 tables with stratum specific risk ratios, attributable risk among
  exposed and population attributable risk. Note that the outcome and exposure variable need to be numeric
  and binary and coded as "0" and 1". The third variable needs to be numeric, but may have more categories,
  such as "0", "1" and "2".

  CSInter displays a summary with the crude RR, the Mantel Haenszel adjusted RR and the result of a 
  "Woolf" test for homogeneity of stratum-specific RR.
  
The option "full = TRUE" provides you with useful formatting information, which can be handy if you're using "markdown".


### Syntax

**CSInter**(x, cases, exposure, by, full=FALSE)

### Example 1 : CSInter ill - wmousse by tira (unformatted)


```{r}
CSInter(DF, cases="ill", exposure = "wmousse", by = "tira")
```

\newpage
### Example 2 : CSInter ill - beer by tira (formatted)

The following results tables are outputs in "markdown" using the kable function.

```{r}
res <- CSInter(DF, "ill", "beer", "tira", full = TRUE)
```

```{r echo=FALSE}
kable(res$df1, align="r")
kable(res$df2, align="r")
```


\newpage
### Example 3: CSInter ill - beer by tportion (formatted)

The following results tables are outputs in "markdown" using the kable function.

```{r}
res <- CSInter(DF, "ill", "beer", "tportion", full = TRUE)
kable(res$df1, align="r")
kable(res$df2, align="r")
```

By storing the results in the object "res", you are able to use the result table in Markdown as shown above. You can also use individual elements of the results. For example if you would like to view just the Mantel-Haenszel risk ratio for beer adjusted for tportion, you can view it by typing:

```{r}
 res$df2$Stats[3]

```

\newpage
## CC

CC is used for case control studies to determine the association between an exposure and an outcome. Variables need to be binary and coded as "0" and "1". Point estimates and confidence intervals for the odds ratio are calculated along with attributable or preventive fractions for the exposed and total population. Additionally you can select if you want to display the Fisher's exact test, by specifying exact = TRUE.  If you specify full = TRUE you can easily access useful statistics from the output tables.

### Syntax

**CC**(x, cases, exposure, exact, full=FALSE)

\newpage
### Example 1: CC ill - mousse (unformatted)

```{r}
cc(DF, "ill", "mousse", exact = TRUE)
```

\newpage
### Example 2: CC ill - beer (formatted)

The following results tables are outputs in "markdown" using the kable function.

```{r results='asis'}
result <- CC(DF, "ill", "beer", exact = TRUE, full = TRUE)
kable(result$df1, align="r")
kable(result$df2, align=result$df2.align)

```

By storing the results in the object "result", you are able to use the result tables in Markdown as shown above. By specifying "full = TRUE" you can also easily use individual elements of the results.For example if you would like to view just the odds ratio, you can view it by typing:

```{r}
result$st$odds_ratio$point_estimate
```

\newpage
## CCTable - Summary table for case control studies

  CCTable is used for univariate analysis of case control studies with several exposures. The results are summarised in one table with one row per exposure making comparisons between exposures easier and providing a useful table for integrating into reports. Note that all variables need to be numeric and binary and coded as "0" and "1".

  The results of this function contain: The name of exposure variables, the total 
  number of cases, the number of exposed cases, the percentage of exposed among cases, the number of controls,
  the number of exposed controls, the percentage of exposed among controls, odds ratios, 95%CI intervals, p-values. 
  
  You can optionally choose to display the Fisher's exact p-value instead of the Chi squared p-value, with the option exact = TRUE.
  
  You can specify the sort order, with the option sort="or" to order by odds ratios. The default sort order is by p-values.
  
The option "full = TRUE" provides you with useful formatting information, which can be handy if you're using "markdown".

### Syntax

**CCTable**(x, cases, exposure=c(), exact=FALSE, sort = "pvalue", full=FALSE)

### Example 1:  CCTable results ordered by p-value (unformatted)

```{r}
CCTable(DF, "ill",
        exposure = c("sex01", "agegroup", "tira", "beer", "mousse", "wmousse", "dmousse",
                     "redjelly", "fruitsalad", "tomato", "mince", "salmon", "horseradish",
                     "chickenwin", "roastbeef", "pork"))
```

\newpage
### Example 2: CCTable results ordered by odds ratio (formatted)

The following results tables are outputs in "markdown" using the kable function.

```{r}
res = CCTable(DF, "ill", sort = "or", exposure = Colnames)
kable(res$df)
```

\newpage
### Example 3: CCTable results ordered by p-value from the Fisher's exact test (formatted)

The following results tables are outputs in "markdown" using the kable function.


```{r}
res = CCTable(DF, "ill", exposure = Colnames, exact=TRUE)
kable(res$df)
```

By storing the results in the object "res", you are able to use the result table in Markdown as shown above. You can also use individual elements of the results. For example if you would like to view just the odds ratio, you can view it by typing (for example):

```{r}
res$df$OR[1]
```


\newpage
## CCInter - Stratified analysis for case control studies

  CCInter is useful to determine the effects of a third variable on the association between an exposure and an outcome.
  CCInter produces 2 by 2 tables with stratum specific odds ratios, attributable risk among
  exposed and population attributable risk. 
  
  Note that the outcome and exposure variable need to be numeric and binary
  and coded as "0" and 1". The third variable needs to be numeric, but may have more categories, such as "0", "1" and "2".

  CCInter displays a summary with the crude OR, the Mantel Haenszel adjusted OR and the result of a 
  Woolf test for homogeneity of stratum-specific OR.

The option "full = TRUE" provides you with useful formatting information, which can be handy if you're using "markdown".

### Syntax

**CCInter** (x, cases, exposure, by, full=FALSE)


### Example 1: CCInter ill - wmousse by tira (unformatted)


```{r message=FALSE, warning=FALSE}

CCInter(DF, cases="ill", exposure = "wmousse", by = "tira")

```

### Example 2: CCInter ill - beer by tira (formatted)

The following results tables are outputs in "markdown" using the kable function.

```{r message=FALSE, warning=FALSE}

res <- CCInter(DF, cases="ill", exposure = "beer", by = "tira", full = TRUE)
kable(res$df1, align=res$df1.align)
kable(res$df2)
```

\newpage
### Example 3: CCInter ill - beer by tportion (formatted)

The following results tables are outputs in "markdown" using the kable function.

```{r message=FALSE}

res <- CCInter(DF, cases="ill", exposure = "beer", by = "tportion", full = TRUE)
kable(res$df1, align=res$df1.align)
kable(res$df2, align=res$df2.align)

```

By storing the results in the object "res", you are able to use the result table in Markdown as shown above. You can also use individual elements of the results. For example if you would like to view just the Mantel-Haenszel odds ratio for beer adjusted for tportion, you can view it by typing:

```{r}
res$df2$Stats[3]
```