Title: | Elementary Epidemiological Functions for Epidemiology and Biostatistics |
---|---|
Description: | Contains elementary tools for analysis of common epidemiological problems, ranging from sample size estimation, through 2x2 contingency table analysis and basic measures of agreement (kappa, sensitivity/specificity). Appropriate print and summary statements are also written to facilitate interpretation wherever possible. Source code is commented throughout to facilitate modification. The target audience includes advanced undergraduate and graduate students in epidemiology or biostatistics courses, and clinical researchers. |
Authors: | Michael A Rotondi <[email protected]> |
Maintainer: | Michael A Rotondi <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.5 |
Built: | 2024-11-09 04:21:47 UTC |
Source: | https://github.com/cran/epibasix |
This function displays the simple correlation of two vectors of equal length, as well as providing confidence limits and hypothesis tests.
corXY(X, Y, alpha=0.05, rho0 = 0, HA="not.equal", digits=3)
X |
A Vector of the same length as Y |
Y |
A Vector of the same length as X. This function requires both inputs to be vectors. |
alpha |
The Type I error rate for Hypothesis Tests and Confidence Intervals |
rho0 |
The Null Hypothesis for Hypothesis Tests |
HA |
The alternative hypothesis can be one of "less.than", "greater.than", or "not.equal" |
digits |
The number of digits to round results |
This function provides the required information, such as the Pearson correlation, hypothesis tests and confidence intervals, while providing suitable detail in the summary and print statements for epidemiologists to understand the information at hand.
rho |
The Sample Pearson Correlation, as calculated in the cor function. |
n |
The sample size. |
Test |
The Test Statistic for the desired hypothesis test based on Fisher's Transformation. |
p.Value |
The p-value for the Hypothesis Test. |
CIL |
The lower bound of the constructed confidence interval for the population correlation, rho. |
CIU |
The upper bound of the constructed confidence interval for the population correlation, rho. |
alpha |
The desired Type I Error Rate |
rho0 |
The Null Hypothesis |
HA |
The supplied Alternative Hypothesis |
Michael Rotondi, [email protected]
Casella G and Berger RL. Statistical Inference (2nd Ed.) Duxbury: New York, 2002. Koepsell TD and Weiss NS. Epidemiologic Methods. Oxford University Press: New York, 2003.
## Suppose we want to test whether two randomly generated normal vectors are uncorrelated.
x <- rnorm(100); y <- rnorm(100); corXY(x,y);
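The test based on Fisher's transformation that corXY reports can be sketched in base R. This is a minimal illustration under stated assumptions rather than the package's own source: `fisher_cor` is a hypothetical helper, and only the two-sided ("not.equal") alternative is shown.

```r
# Hypothetical helper (not part of epibasix): Fisher z-transformation
# test and confidence interval for a Pearson correlation.
fisher_cor <- function(x, y, alpha = 0.05, rho0 = 0) {
  stopifnot(length(x) == length(y), length(x) > 3)
  n  <- length(x)
  r  <- cor(x, y)                         # sample Pearson correlation
  se <- 1 / sqrt(n - 3)                   # approximate SE on the z scale
  z  <- (atanh(r) - atanh(rho0)) / se     # test statistic for H0: rho = rho0
  p  <- 2 * pnorm(-abs(z))                # two-sided p-value
  ci <- tanh(atanh(r) + c(-1, 1) * qnorm(1 - alpha / 2) * se)  # back-transform
  list(rho = r, n = n, Test = z, p.Value = p, CIL = ci[1], CIU = ci[2])
}

set.seed(1)
x <- rnorm(100); y <- rnorm(100)
res <- fisher_cor(x, y)
print(unlist(res))
```

The interval can be checked against `cor.test(x, y)$conf.int`, which is built from the same transformation.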
Provides Minimum Detectable Difference in Means Between Two Populations for fixed values of sigma and n. Useful for experimental design for randomized trials.
diffDetect(N,sigma,alpha=0.05, power=0.8, two.tailed=TRUE)
N |
A Vector (or single value) of fixed sample sizes. |
sigma |
A Vector (or single value) of fixed standard deviations. |
alpha |
The desired Type I Error Rate |
power |
The desired level of power; recall that power = 1 - Type II Error. |
two.tailed |
Logical: if TRUE, calculations are based on a two-tailed Type I error; if FALSE, a one-sided calculation is performed. |
This function can be used as a tool for sensitivity analysis on the choice of population standard deviation. As is often the case, the sample size is fixed by practical considerations, such as cost or difficulty recruiting subjects. This simple tool may help determine whether it is worth performing an experiment that can only detect a given calculated difference between means.
delta |
A Matrix of minimum detectable differences for fixed values of n and sigma |
N |
A Vector (or single value) of specified sample sizes. |
sigma |
A Vector (or single value) of specified standard deviations. |
alpha |
The desired Type I Error Rate |
power |
The desired level of power; recall that power = 1 - Type II Error. |
two.tailed |
Logical: if TRUE, calculations are based on a two-tailed Type I error; if FALSE, a one-sided calculation is performed. |
Michael Rotondi, [email protected]
Matthews JNS. Introduction to Randomized Controlled Clinical Trials (2nd Ed.) Chapman & Hall: New York, 2006.
## Suppose, for financial considerations, we can only enroll 100 people in a blood pressure medication trial. What is the minimum difference we can detect between means if sigma = 1, 5 or 10 mmHg, at standard levels?
n <- 100; sigma <- c(1, 5, 10); diffDetect(n,sigma);
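The idea of a minimum detectable difference over a grid of sample sizes and standard deviations can be sketched with the usual normal-approximation formula. The helper `min_detectable_diff` is hypothetical, and the sketch assumes two equal groups with `n_per_group` subjects each; diffDetect's own convention for N (per group or total) may differ, so the numbers need not match its output.

```r
# Hypothetical helper (not part of epibasix).
# Assumption: two groups of equal size, n_per_group subjects in each.
min_detectable_diff <- function(n_per_group, sigma, alpha = 0.05,
                                power = 0.8, two.tailed = TRUE) {
  z_alpha <- if (two.tailed) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  z_beta  <- qnorm(power)
  # delta = (z_alpha + z_beta) * sigma * sqrt(2 / n), for every (n, sigma) pair
  delta <- (z_alpha + z_beta) * outer(sqrt(2 / n_per_group), sigma)
  dimnames(delta) <- list(n = as.character(n_per_group),
                          sigma = as.character(sigma))
  delta
}

d <- min_detectable_diff(100, c(1, 5, 10))
print(round(d, 3))
```

Because the detectable difference scales linearly in sigma, each column is a multiple of the first, which makes the sensitivity to the assumed standard deviation easy to read off.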
This function analyzes 2x2 tables assuming either a case-control or cohort study. Information such as Pearson's chi-squared test, the odds ratio, risk difference and relative risk are computed, as well as confidence intervals.
epi2x2(X,alpha=0.05, digits=3)
X |
A 2x2 matrix in standard epidemiological format, that is, column one represents outcome present, column two outcome absent, while row one represents risk present and row two represents risk absent. This is crucial for correct computation of the odds ratio and the other parameters. |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
digits |
Number of Digits to round calculations |
This function is similar to PROC FREQ in SAS, as it provides the comprehensive analysis of a 2x2 contingency table. Again, I must stress that the table must be entered in the appropriate format, or unsuitable estimates will result. In a case control study, cases should be entered as column one and controls as column two.
X |
The original input matrix. |
Sy |
Value for Pearson's Chi-squared statistic (with continuity correction). |
Sy.p.value |
P-value for the hypothesis test of no association. |
Fisher.p.value |
P-value for the hypothesis test of no association. (Using Fisher's Exact Test) |
OR |
Point Estimate of the odds ratio. |
OR.CIL |
Lower Confidence Limit for the odds ratio. |
OR.CIU |
Upper Confidence Limit for the odds ratio. |
p1Co |
Row One Risk (Cohort Study) |
p2Co |
Row Two Risk (Cohort Study) |
rdCo |
Risk difference (Cohort Study). That is p1Co - p2Co. |
rdCo.CIL |
Lower Confidence Limit for Risk Difference in a cohort study. |
rdCo.CIU |
Upper Confidence Limit for Risk Difference in a cohort study. |
RR |
Relative Risk (Cohort Study) |
RR.CIL |
Lower Confidence Limit for Relative Risk in a cohort study. |
RR.CIU |
Upper Confidence Limit for Relative Risk in a cohort study. |
p1CC |
Column One Risk (Case-Control Study) |
p2CC |
Column Two Risk (Case-Control Study) |
rdCC |
Risk difference (Case-Control Study). That is p1CC - p2CC. |
rdCC.CIL |
Lower Confidence Limit for Risk Difference in a case-control study. |
rdCC.CIU |
Upper Confidence Limit for Risk Difference in a case-control study. |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
digits |
Number of Digits to round calculations |
Michael Rotondi, [email protected]
Szklo M and Nieto FJ. Epidemiology: Beyond the Basics, Jones and Bartlett: Boston, 2007.
data <- cbind(c(100, 225), c(58, 45)); summary(epi2x2(data));
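To make the table layout concrete, the main 2x2 quantities can be computed by hand in base R. This is a sketch under stated assumptions, not the package's implementation: `two_by_two` is a hypothetical helper, and the Wald interval on the log scale is one common choice for the odds ratio that may differ from the interval epi2x2 reports.

```r
# Hypothetical helper (not part of epibasix). Rows: risk present/absent;
# columns: outcome present/absent, matching the layout described above.
two_by_two <- function(tab, alpha = 0.05) {
  stopifnot(is.matrix(tab), all(dim(tab) == c(2, 2)))
  n11 <- tab[1, 1]; n12 <- tab[1, 2]
  n21 <- tab[2, 1]; n22 <- tab[2, 2]
  z <- qnorm(1 - alpha / 2)

  # Odds ratio with a Wald interval built on the log scale (an assumption)
  or    <- (n11 * n22) / (n12 * n21)
  se_or <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
  or_ci <- exp(log(or) + c(-1, 1) * z * se_or)

  # Cohort-style row risks, risk difference and relative risk
  p1 <- n11 / (n11 + n12)
  p2 <- n21 / (n21 + n22)

  list(OR = or, OR.CIL = or_ci[1], OR.CIU = or_ci[2],
       p1 = p1, p2 = p2, RD = p1 - p2, RR = p1 / p2,
       chisq.p  = chisq.test(tab)$p.value,   # continuity-corrected for 2x2
       fisher.p = fisher.test(tab)$p.value)
}

tab <- cbind(c(100, 225), c(58, 45))
res <- two_by_two(tab)
print(round(unlist(res), 4))
```

Swapping the rows or columns of `tab` inverts the odds ratio, which is why the input format matters.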
Computes the Kappa Statistic for agreement between Two Raters, performs Hypothesis tests and calculates Confidence Intervals.
epiKappa(C, alpha=0.05, k0=0.4, digits=3)
C |
An nxn classification matrix or matrix of proportions. |
k0 |
The Null hypothesis, kappa0 = k0 |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
digits |
Number of Digits to round calculations |
The Kappa statistic is used to measure agreement between two raters. For simplicity, consider the case where each rater can classify an object as Type I or Type II. Then the diagonal elements of a 2x2 matrix are the agreeing elements, that is, where both raters classify an object as Type I or both classify it as Type II. The discordant observations are on the off-diagonal. Note that the alternative hypothesis is always "greater than", as we are interested in whether kappa exceeds a certain threshold, such as 0.4, for Fair agreement.
kappa |
The computed value of the kappa statistic. |
seh |
The standard error computed under H0 |
seC |
The standard error as computed for Confidence Intervals |
CIL |
Lower Confidence Limit for kappa. |
CIU |
Upper Confidence Limit for kappa. |
Z |
Hypothesis Test Statistic. |
p.value |
P-Value for hypothesis test |
Data |
Returns the original matrix of agreement. |
k0 |
The Null hypothesis, kappa = k0 |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
digits |
Number of Digits to round calculations |
Michael Rotondi, [email protected]
Szklo M and Nieto FJ. Epidemiology: Beyond the Basics, Jones and Bartlett: Boston, 2007.
Fleiss J. Statistical Methods for Rates and Proportions, 2nd ed. New York: John Wiley and Sons; 1981.
X <- cbind(c(28,5), c(4,61)); summary(epiKappa(X, alpha=0.05, k0 = 0.6));
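The point estimate of kappa follows directly from the observed and chance-expected agreement. The sketch below shows only that point estimate; `cohen_kappa` is a hypothetical helper, and the standard errors, test and interval that epiKappa reports are not reproduced here.

```r
# Hypothetical helper (not part of epibasix): Cohen's kappa point estimate.
cohen_kappa <- function(tab) {
  stopifnot(is.matrix(tab), nrow(tab) == ncol(tab))
  p  <- tab / sum(tab)                  # works for counts or proportions
  po <- sum(diag(p))                    # observed agreement
  pe <- sum(rowSums(p) * colSums(p))    # agreement expected by chance
  (po - pe) / (1 - pe)
}

X <- cbind(c(28, 5), c(4, 61))
k <- cohen_kappa(X)
print(round(k, 3))
```

For the example matrix the estimate is about 0.79, which is then compared with the null value k0.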
This function computes the standard two-sample T-Test, performing a hypothesis test of the equality of the two population means and computing a confidence interval for their difference.
epiTTest(X,Y, alpha=0.05, pooled=FALSE, digits=3)
X |
A vector of observed values of a continuous random variable. |
Y |
A vector of observed values of a continuous random variable. |
alpha |
The desired Type I Error Rate for Confidence Intervals |
pooled |
Logical: If TRUE, a pooled estimate of the variance is used. That is, the variance is assumed to be equal in both groups. If FALSE, the Satterthwaite estimate of the variance is used. |
digits |
Number of Digits to round calculations |
This function performs the simple two-sample T-Test, while providing detailed information regarding the analysis and summary information for both groups. Note that this function requires the input of two vectors, so if the data is stored in a matrix, it must be separated into two distinct vectors, X and Y.
nx |
The number of observations in X. |
ny |
The number of observations in Y. |
mean.x |
The sample mean of X. |
mean.y |
The sample mean of Y. |
s.x |
The standard deviation of X. |
s.y |
The standard deviation of Y. |
d |
The difference between sample means, that is, mean.x - mean.y. |
s2p |
The pooled variance, when applicable. |
df |
The degrees of freedom for the test. |
TStat |
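The test statistic for the null hypothesis that the two population means are equal. |
p.value |
The P-value for the test of the null hypothesis that the two population means are equal. |
CIL |
The lower bound of the constructed confidence interval for the difference between population means. |
CIU |
The upper bound of the constructed confidence interval for the difference between population means. |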
pooled |
Logical: as above for assuming variances are equal. |
alpha |
The desired Type I Error Rate for Confidence Intervals |
Michael Rotondi, [email protected]
Casella G and Berger RL. Statistical Inference (2nd Ed.) Duxbury: New York, 2002.
Szklo M and Nieto FJ. Epidemiology: Beyond the Basics, Jones and Bartlett: Boston, 2007.
X <- rnorm(100,10,1); Y <- rnorm(100); summary(epiTTest(X,Y, pooled = FALSE));
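The unpooled case (pooled = FALSE) relies on the Satterthwaite approximation to the degrees of freedom. A minimal base-R sketch of that calculation follows; `welch_test` is a hypothetical helper written for illustration, not the package's code.

```r
# Hypothetical helper (not part of epibasix): two-sample t-test with
# Satterthwaite (Welch) degrees of freedom.
welch_test <- function(x, y, alpha = 0.05) {
  nx <- length(x); ny <- length(y)
  vx <- var(x) / nx; vy <- var(y) / ny        # variance of each sample mean
  d  <- mean(x) - mean(y)                     # difference between sample means
  se <- sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
  tstat <- d / se
  ci <- d + c(-1, 1) * qt(1 - alpha / 2, df) * se
  list(d = d, df = df, TStat = tstat,
       p.value = 2 * pt(-abs(tstat), df), CIL = ci[1], CIU = ci[2])
}

set.seed(2)
X <- rnorm(100, 10, 1); Y <- rnorm(100)
res <- welch_test(X, Y)
print(unlist(res))
```

The result can be compared with `t.test(X, Y)`, whose default is also the unequal-variance test.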
This function performs elementary pair-matched analysis using McNemar's test and computing risk differences.
mcNemar(X, alpha= 0.05, force=FALSE, digits=3)
X |
A 2x2 matrix, with disease status (Yes/No) for the exposed individual in the columns and disease status (Yes/No) for the control individuals in the rows. Note that for a matched-pair analysis, each entry corresponds to a pair of subjects. |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
force |
Logical: McNemar's test is typically valid when the number of discordant pairs exceeds 30. The function may be forced to run regardless of this concern by setting force=TRUE. |
digits |
Number of Digits to round calculations |
McNemar's OR is computed as b/c, where b and c are the counts of discordant pairs, while standard errors are computed using a transformation. The risk difference is computed as (b - c)/n, where n is the total number of pairs. Note that this technique can be used for cohort studies as well as matched trials.
X |
The original input matrix. |
ORMc |
McNemar's Odds Ratio |
ORMC.CIL |
Lower Confidence Limit for McNemar's OR |
ORMC.CIU |
Upper Confidence Limit for McNemar's OR |
rd |
Point Estimate of the risk difference |
rd.CIL |
Lower Confidence Limit for the risk difference |
rd.CIU |
Upper Confidence Limit for the risk difference |
XMc |
Value for McNemar's Chi-squared statistic. |
XMc.p.Value |
P-value for the hypothesis test of no association. |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
digits |
Number of Digits to round calculations |
Michael Rotondi, [email protected]
Szklo M and Nieto FJ. Epidemiology: Beyond the Basics, Jones and Bartlett: Boston, 2007.
## Data for a matched-cohort study, comparing smokers to non-smokers for the presence of lung cancer.
X <- cbind(c(15,5), c(19,61)); summary(mcNemar(X, alpha=0.05, force=TRUE));
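The matched-pair quantities depend only on the two discordant cells, which a short base-R sketch makes visible. `matched_pairs` is a hypothetical helper; labelling X[1, 2] as b and X[2, 1] as c is an assumption, as is leaving the continuity correction off by default, so the package's reported statistic may differ.

```r
# Hypothetical helper (not part of epibasix).
# Assumption: b = X[1, 2] and c = X[2, 1] are the discordant pair counts.
matched_pairs <- function(X, correct = FALSE) {
  stopifnot(is.matrix(X), all(dim(X) == c(2, 2)))
  n_b <- X[1, 2]; n_c <- X[2, 1]
  n   <- sum(X)                       # total number of pairs
  cc  <- if (correct) 1 else 0        # optional continuity correction
  chisq <- (abs(n_b - n_c) - cc)^2 / (n_b + n_c)
  list(OR = n_b / n_c,
       rd = (n_b - n_c) / n,
       chisq = chisq,
       p.value = pchisq(chisq, df = 1, lower.tail = FALSE),
       discordant = n_b + n_c)
}

X <- cbind(c(15, 5), c(19, 61))
res <- matched_pairs(X)
print(unlist(res))
```

This table has only 24 discordant pairs, which is below the threshold of 30 mentioned for the force argument and explains why the example sets force=TRUE.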
This function provides detailed sample size estimation information to determine the number of subjects that must be enrolled in a randomized trial with a continuous outcome.
n4means(delta, sigma, alpha=0.05, power=0.8, AR=1, two.tailed=TRUE, digits=3)
delta |
The minimum detectable difference between population means. |
sigma |
The standard deviation of the outcome. |
AR |
The Allocation Ratio: One implies an equal number of subjects per treatment and control group (maximum efficiency); > 1 implies more subjects will be enrolled in the control group (e.g. in the case of a costly intervention); < 1 implies more in the treatment group (rarely used). |
alpha |
The desired Type I Error Rate |
power |
The desired level of power; recall that power = 1 - Type II Error. |
two.tailed |
Logical: if TRUE, calculations are based on a two-tailed Type I error; if FALSE, a one-sided calculation is performed. |
digits |
Number of Digits to round calculations |
This function provides detailed information, similar to PROC POWER in SAS, but with less functionality and more concise output. It is used for sample size estimation in a randomized trial where the outcome is continuous, such as blood pressure, or weight.
nE |
The minimum number of subjects required in the Experimental group. |
nC |
The minimum number of subjects required in the Control group. |
delta |
The minimum detectable difference between population means. |
sigma |
The standard deviation of the outcome. |
alpha |
The desired Type I Error Rate |
power |
The desired level of power; recall that power = 1 - Type II Error. |
AR |
The Allocation Ratio |
Michael Rotondi, [email protected]
Matthews JNS. Introduction to Randomized Controlled Clinical Trials (2nd Ed.) Chapman & Hall: New York, 2006.
## Suppose we wish to test whether a blood pressure medication reduces diastolic blood pressure by 10 mm Hg, at standard significance and power; assume the standard deviation is 10 mm Hg.
n4means(delta=10, sigma=10, alpha=0.05, power=0.80);
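The role of the allocation ratio can be seen in the standard normal-approximation sample size formula for comparing two means. The sketch below is illustrative only: `n_for_means` is a hypothetical helper, AR is taken to be nC/nE, and the package's exact formula and rounding may differ.

```r
# Hypothetical helper (not part of epibasix).
# Assumption: AR = nC / nE, with a common standard deviation sigma.
n_for_means <- function(delta, sigma, alpha = 0.05, power = 0.8,
                        AR = 1, two.tailed = TRUE) {
  z_alpha <- if (two.tailed) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  z_beta  <- qnorm(power)
  # Solve delta / (sigma * sqrt(1/nE + 1/nC)) = z_alpha + z_beta, nC = AR * nE
  nE <- (1 + 1 / AR) * (sigma * (z_alpha + z_beta) / delta)^2
  c(nE = ceiling(nE), nC = ceiling(AR * nE))
}

res <- n_for_means(delta = 10, sigma = 10)
print(res)
print(n_for_means(delta = 10, sigma = 10, AR = 2))
```

Under these assumptions the blood pressure example needs 16 subjects per group with equal allocation; moving to AR = 2 lowers nE but raises the total enrolment.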
This function provides detailed sample size estimation information to determine the number of subjects that must be enrolled in a randomized trial with a binary outcome.
n4props(pe, pc, alpha=0.05, power = 0.80, AR=1, two.tailed=TRUE, digits=3)
pe |
The anticipated proportion of individuals in the experimental group with the outcome. |
pc |
The anticipated proportion of individuals in the control group with the outcome. |
AR |
The Allocation Ratio: One implies an equal number of subjects per treatment and control group (maximum efficiency); > 1 implies more subjects will be enrolled in the control group (e.g. in the case of a costly intervention); < 1 implies more in the treatment group (rarely used). |
alpha |
The desired Type I Error Rate |
power |
The desired level of power; recall that power = 1 - Type II Error. |
two.tailed |
Logical: if TRUE, calculations are based on a two-tailed Type I error; if FALSE, a one-sided calculation is performed. |
digits |
Number of Digits to round calculations |
This function provides detailed information, similar to PROC POWER in SAS, but with less functionality and more concise output. It is used for sample size estimation in a randomized trial where the response is binary. A simple example may include whether an individual dies from a heart attack. In epidemiological terms, pe and pc can be thought of as the expected prevalence of the outcome in the experimental and control group.
nE |
The minimum number of subjects required in the Experimental group. |
nC |
The minimum number of subjects required in the Control group. |
pe |
The anticipated proportion of individuals in the experimental group with the outcome. |
pc |
The anticipated proportion of individuals in the control group with the outcome. |
alpha |
The desired Type I Error Rate |
power |
The desired level of power; recall that power = 1 - Type II Error. |
AR |
The Allocation Ratio |
Michael Rotondi, [email protected]
Matthews JNS. Introduction to Randomized Controlled Clinical Trials (2nd Ed.) Chapman & Hall: New York, 2006.
## Suppose a new drug is thought to reduce heart attack mortality from 0.10 to 0.03. Calculate the required number of subjects that must be enrolled in a study to detect this difference with alpha = 0.05 and power = 0.80.
n4props(0.03, 0.10, AR=1, alpha=0.05, power=0.80);
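Several sample size formulas exist for comparing two proportions. The sketch below uses the simple unpooled normal approximation with equal allocation; `n_for_props` is a hypothetical helper and this choice of formula is an assumption, so n4props (which may use a different approximation) can return a different number.

```r
# Hypothetical helper (not part of epibasix).
# Assumptions: equal allocation (AR = 1) and the unpooled-variance formula.
n_for_props <- function(pe, pc, alpha = 0.05, power = 0.8,
                        two.tailed = TRUE) {
  stopifnot(pe != pc)
  z_alpha <- if (two.tailed) qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  z_beta  <- qnorm(power)
  n <- (z_alpha + z_beta)^2 * (pe * (1 - pe) + pc * (1 - pc)) / (pe - pc)^2
  ceiling(n)   # subjects required in each group
}

n <- n_for_props(0.03, 0.10)
print(n)
```

The required size grows quickly as pe and pc approach each other, since the difference enters the denominator squared.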
This function provides detailed information regarding the comparison of two competing methods, for example self-report and gold-standard treatment, through a sensitivity/specificity analysis.
sensSpec(X, alpha=0.05, CL=TRUE, digits=3)
X |
A 2x2 matrix, with Gold Standard Class A and B in the columns and Comparison Method A and B in the rows. |
CL |
Logical: If TRUE, Confidence Intervals are calculated and displayed in summary method. |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
digits |
Number of Digits to round calculations |
This function is designed to calculate Sensitivity, Specificity, Youden's J and Percent Agreement. These tools are used to assess the validity of a new instrument or self-report against the current gold standard. In general, self-report is less expensive, but may be subject to information bias. Computational formulae can be found in the reference.
X |
The original input matrix. |
sens |
The point estimate of sensitivity |
spec |
The point estimate of specificity |
PA |
The point estimate of Percent Agreement |
YoudenJ |
The point estimate of Youden's J |
sens.s |
The standard deviation of sensitivity |
spec.s |
The standard deviation of specificity |
PA.s |
The standard deviation of Percent Agreement |
YoudenJ.s |
The standard deviation of Youden's J |
sens.CIL |
The lower bound of the constructed confidence interval for true sensitivity. |
sens.CIU |
The upper bound of the constructed confidence interval for true sensitivity |
spec.CIL |
The lower bound of the constructed confidence interval for true specificity. |
spec.CIU |
The upper bound of the constructed confidence interval for true specificity. |
PA.CIL |
The lower bound of the constructed confidence interval for Percent Agreement. |
PA.CIU |
The upper bound of the constructed confidence interval for Percent Agreement. |
YoudenJ.CIL |
The lower bound of the constructed confidence interval for Youden's J. |
YoudenJ.CIU |
The upper bound of the constructed confidence interval for Youden's J. |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
digits |
Number of Digits to round calculations |
All confidence limits rely on simple asymptotic theory; as such, confidence limits may lie outside of [0,1]. A more accurate method is available in the twoby2 function of the Epi package, which employs a logit transformation.
Michael Rotondi, [email protected]
Szklo M and Nieto FJ. Epidemiology: Beyond the Basics, Jones and Bartlett: Boston, 2007.
## From Szklo and Nieto, p. 315
dat <- cbind(c(18,1), c(19,11)); summary(sensSpec(dat));
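The validity measures and their simple asymptotic (Wald) limits can be sketched in base R. `validity` is a hypothetical helper; treating Class A as the "positive" class is an assumption, and the standard error used for Youden's J (independent sensitivity and specificity) is one plausible choice rather than a confirmed detail of sensSpec.

```r
# Hypothetical helper (not part of epibasix). Columns: gold standard A, B;
# rows: comparison method A, B. Assumption: class A is the positive class.
validity <- function(X, alpha = 0.05) {
  stopifnot(is.matrix(X), all(dim(X) == c(2, 2)))
  z    <- qnorm(1 - alpha / 2)
  nA   <- sum(X[, 1]); nB <- sum(X[, 2])
  sens <- X[1, 1] / nA
  spec <- X[2, 2] / nB
  pa   <- sum(diag(X)) / sum(X)              # percent agreement
  sens.s <- sqrt(sens * (1 - sens) / nA)
  spec.s <- sqrt(spec * (1 - spec) / nB)
  list(sens = sens, spec = spec, PA = pa,
       YoudenJ = sens + spec - 1,
       YoudenJ.s = sqrt(sens.s^2 + spec.s^2),
       sens.CIL = sens - z * sens.s, sens.CIU = sens + z * sens.s,
       spec.CIL = spec - z * spec.s, spec.CIU = spec + z * spec.s)
}

dat <- cbind(c(18, 1), c(19, 11))
res <- validity(dat)
print(round(unlist(res), 3))
```

For this table the Wald upper limit for sensitivity exceeds 1, which is the behaviour the note on asymptotic limits describes.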
This function provides detailed univariate analysis for a single variable. Values include the sample mean, median, standard deviation and range, as well as tools for hypothesis tests and confidence intervals.
univar(X, alpha=0.05, mu0 = 0, shapiro=FALSE, digits=3)
X |
A Vector of observed values from a continuous distribution |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
mu0 |
The null hypothesis value for the true population mean. |
shapiro |
Logical: TRUE returns the Shapiro-Wilk test for normality; this portion calls the shapiro.test function. |
digits |
Number of Digits to round calculations |
This function provides a thorough summary of information within a vector. It conveniently calculates useful statistics at the call of a single command. Furthermore, it provides methods to test the hypothesis/construct confidence intervals for the true population mean.
n |
Number of Observations Used |
mean |
The sample mean of the observations in X. |
median |
The sample median of the observations in X. |
min |
The sample minimum of the observations in X. |
max |
The sample maximum of the observations in X. |
s |
The sample standard deviation of the observations in X. |
var |
The sample variance of the observations in X. |
test |
The test statistic for the null hypothesis that the population mean equals mu0. |
p.value |
The p-value for the test of the null hypothesis that the population mean equals mu0. |
CIL |
The lower bound of the constructed confidence interval for the population mean. |
CIU |
The upper bound of the constructed confidence interval for the population mean. |
shapiro.statistic |
The value of the Shapiro-Wilk statistic for normality. |
shapiro.p.value |
The P-value of the Shapiro-Wilk test for normality. |
alpha |
The desired Type I Error Rate for Hypothesis Tests and Confidence Intervals |
mu0 |
The null hypothesis value for the true population mean. |
shapiro |
Logical: TRUE returns the Shapiro-Wilk test for normality. |
digits |
Number of Digits to round calculations |
Michael Rotondi, [email protected]
Casella G and Berger RL. Statistical Inference (2nd Ed.) Duxbury: New York, 2002.
x <- rexp(100); univar(x);
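The summary statistics and the test of the population mean can be reproduced with a few base-R calls. The sketch assumes a t-based test and interval, which may not be exactly what univar uses internally; `one_sample` is a hypothetical helper.

```r
# Hypothetical helper (not part of epibasix).
# Assumption: one-sample t-test and t-based confidence interval.
one_sample <- function(x, alpha = 0.05, mu0 = 0) {
  n  <- length(x)
  m  <- mean(x)
  s  <- sd(x)
  se <- s / sqrt(n)
  tstat <- (m - mu0) / se
  ci <- m + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se
  list(n = n, mean = m, median = median(x), min = min(x), max = max(x),
       s = s, var = var(x), test = tstat,
       p.value = 2 * pt(-abs(tstat), df = n - 1),
       CIL = ci[1], CIU = ci[2])
}

set.seed(42)
x <- rexp(100)
res <- one_sample(x, mu0 = 1)
print(round(unlist(res), 3))
```

Exponential data are skewed, so this is also a case where requesting the Shapiro-Wilk test (shapiro=TRUE) would be informative.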