This vignette covers the use of the functions incprops, inccounts, and
prevcounts. It is strongly recommended that the vignette “Introduction”
be read before any use of this package.
The two primary functions for HIV incidence estimation are incprops and
inccounts. These take as arguments a summary of arbitrarily complex
survey data sets capturing HIV prevalence and the prevalence of recent
HIV infection among HIV-positive subjects. They return estimates of
incidence and, if multiple cross-sectional surveys are specified,
incidence differences (point estimates, confidence intervals, p values,
and subsidiary output). The principal reference for the methodology
underlying this implementation is Kassanjee et al., Epidemiology,
2012 [1]. Further guidance is provided in Kassanjee, McWalter and Welte,
AIDS Research and Human Retroviruses, 2014 [2], and some hitherto
unpublished technical details are in the appendix of the vignette
“Introduction”.
A fundamental element in the design of inctools is that the primary
entry point into the critical methodology which it implements is the
function incprops, which takes, as a summary of the population state,
estimates of HIV prevalence and of the prevalence of recent infection
amongst HIV-positive subjects (including variance and covariance). These
estimates, in turn, would usually be best derived by (potentially
complex) preliminary analysis of the raw survey data documenting
individual subjects’ status ascertainment, cluster and strata
membership, weighting, etc.
The derivation of these prevalence estimates from raw data is in
principle facilitated by various algorithms which are implemented in
other packages, and which are essentially independent of any of the
innovation captured in this package. Use of inctools does not imply any
specific approach to the preliminary analysis, but the widely used
package survey (maintained entirely independently, with no link to
inctools) may be suitable for many typical data sets. Additionally, to
facilitate a ‘naive’ analysis that is self-contained within the package,
the ancillary function prevcounts is provided. This takes survey counts
and produces prevalence estimates (for both HIV and recent infection,
including variance) under the simplifying assumption of individual-level
random selection of subjects from a single population group.
incprops and inccounts
The functions incprops and inccounts provide a near-identical interface,
as further detailed in the help pages. Both functions take considerably
pre-processed data specifying a recent infection test and a survey in
which it is used:
* estimates of the false recency rate (FRR, β) and the mean duration of
  recent infection (MDRI, Ω_T), their respective relative standard
  errors, and the recency time cutoff (T)
* survey data: proportions (counts, if using function inccounts) of HIV
  positives (PrevH) and positives for recency (PrevR), and their
  relative standard errors.
A critical distinction is that with the use of incprops, variance of
prevalences, including covariance, is explicitly supplied, whereas with
the use of inccounts, variance emerges from counts and design effects,
and there is no covariance.
The output for a single survey is an estimate of incidence along with confidence intervals and RSE, the estimated annual risk of infection with its confidence intervals, and confidence intervals for the parameters MDRI and FRR, which are deduced from the input parameters.
The output for multiple surveys is the same output as for a single survey, along with pairwise comparisons of incidence rates, confidence intervals of differences, and tests of equality with p-values and RSE of differences.
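As a rough illustration of how the single-survey outputs relate to the inputs, the base R sketch below computes an incidence point estimate, the annual risk of infection, and normal-approximation intervals for MDRI and FRR by hand. It does not call inctools. The variable names are hypothetical, the input values are illustrative, and the conversion of days to years by dividing by 365.25 is an assumption of this sketch.

```r
# Illustrative inputs (hypothetical names, not the package's internals)
prev_h   <- 0.20   # prevalence of HIV
prev_r   <- 0.10   # prevalence of recency among HIV positives
mdri     <- 200    # mean duration of recent infection, in days
rse_mdri <- 0.05   # relative standard error of MDRI
frr      <- 0.01   # false recency rate
rse_frr  <- 0.20   # relative standard error of FRR
big_t    <- 730    # recency time cutoff T, in days

# Assumption of this sketch: days are converted to years via 365.25
days_per_year <- 365.25
omega   <- mdri / days_per_year
t_years <- big_t / days_per_year

# Incidence point estimate (per person-year), in the form of the
# estimator described by Kassanjee et al. (2012)
incidence <- prev_h * (prev_r - frr) /
  ((1 - prev_h) * (omega - frr * t_years))

# Annual risk of infection implied by a constant incidence rate
ari <- 1 - exp(-incidence)

# Normal-approximation 95% intervals for the test properties
z       <- qnorm(0.975)
mdri_ci <- mdri + c(-1, 1) * z * rse_mdri * mdri
frr_ci  <- frr  + c(-1, 1) * z * rse_frr  * frr

print(round(c(incidence = incidence, ari = ari), 6))
print(round(mdri_ci, 4))
print(round(frr_ci, 6))
```

With these inputs the sketch gives an incidence of about 0.0426 and an annual risk of infection of about 0.0418.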
Consider a single cross-sectional survey summarised by the parameter
values passed in the call below, to be processed with 10,000 bootstrap
iterations. Function incprops will calculate:
incprops(PrevH = 0.20, RSE_PrevH = 0.028, PrevR = 0.10, RSE_PrevR = 0.09,
BS_Count = 10000, Boot = TRUE, MDRI = 200, RSE_MDRI = 0.05,
FRR = 0.01, RSE_FRR = 0.2, BigT = 730)
## $Incidence.Statistics
## # A tibble: 1 × 6
## Incidence CI_LB CI_UB RSE Cov.PrevH.I Cor.PrevH.I
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0426472 0.0330613 0.0531862 0.120208 0.00000838974 0.291124
##
## $Annual.Risk.of.Infection
## # A tibble: 1 × 3
## ARI ARI.CI_LB ARI.CI_UB
## <dbl> <dbl> <dbl>
## 1 0.0417506 0.0325208 0.0517965
##
## $MDRI.CI
## CI_LB CI_UB
## 1 180.4004 219.5996
##
## $FRR.CI
## # A tibble: 1 × 2
## CI_LB CI_UB
## <dbl> <dbl>
## 1 0.00608007 0.0139199
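To give a feel for what a bootstrap over the inputs involves, here is a minimal parametric bootstrap sketch in base R. It is not the package's implementation: it assumes that each input is drawn independently from an untruncated normal distribution with standard deviation equal to RSE times the estimate, and that days are converted to years by dividing by 365.25. The function name incidence_est is hypothetical.

```r
set.seed(1)              # reproducibility of this sketch only
n_boot        <- 10000   # number of bootstrap draws
days_per_year <- 365.25  # assumption: conversion from days to years

# Estimator in the form of Kassanjee et al. (2012); mdri and big_t in days
incidence_est <- function(prev_h, prev_r, mdri, frr, big_t) {
  prev_h * (prev_r - frr) /
    ((1 - prev_h) * (mdri - frr * big_t) / days_per_year)
}

# Draw each input from a normal distribution centred on its estimate.
# Assumptions: no covariance between inputs, no truncation of draws.
draws <- incidence_est(
  prev_h = rnorm(n_boot, mean = 0.20, sd = 0.028 * 0.20),
  prev_r = rnorm(n_boot, mean = 0.10, sd = 0.09  * 0.10),
  mdri   = rnorm(n_boot, mean = 200,  sd = 0.05  * 200),
  frr    = rnorm(n_boot, mean = 0.01, sd = 0.2   * 0.01),
  big_t  = 730
)

# Percentile interval and relative standard error of the draws
boot_ci  <- quantile(draws, c(0.025, 0.975))
boot_rse <- sd(draws) / mean(draws)

print(boot_ci)
print(boot_rse)
```

Because the draws are random and the sampling scheme is simplified, the interval and RSE from this sketch should be close to, but not identical with, the values reported by incprops above.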
Multiple surveys can be processed in a single call to incprops by
supplying vectors of the parameters. Note that incprops also calculates
all pairwise incidence differences and p values.
incprops(PrevH = c(0.20,0.21,0.18), RSE_PrevH = c(0.028,0.03,0.022),
PrevR = c(0.10,0.13,0.12), RSE_PrevR = c(0.094,0.095,0.05),
BS_Count = 10000, Boot = FALSE, BMest = 'MDRI.FRR.indep',
MDRI = c(200,180,180), RSE_MDRI = c(0.05,0.07,0.06),
FRR = c(0.01,0.009,0.02), RSE_FRR = c(0.2,0.2,0.1), BigT = 730)
## $Incidence.Statistics
## # A tibble: 3 × 6
## survey Incidence CI_LB CI_UB RSE RSE_InfSS
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.0426472 0.0323959 0.0528986 0.122642 0.0539212
## 2 2 0.0677397 0.0503319 0.0851475 0.131115 0.0730176
## 3 3 0.0484745 0.0396085 0.0573405 0.0933180 0.0662453
##
## $Incidence.Difference.Statistics
## # A tibble: 6 × 8
## compare Diff CI_LB CI_UB RSE RSE_InfSS p_value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 vs 2 -0.0250925 -0.0452945 -0.00489049 0.410774 0.217381 0.0149152
## 2 1 vs 3 -0.00582725 -0.0193807 0.00772616 1.18669 0.677794 0.399407
## 3 2 vs 1 0.0250925 0.00489049 0.0452945 0.410774 0.217381 0.0149152
## 4 2 vs 3 0.0192652 -0.000270292 0.0388008 0.517372 0.306104 0.0532552
## 5 3 vs 1 0.00582725 -0.00772616 0.0193807 1.18669 0.677794 0.399407
## 6 3 vs 2 -0.0192652 -0.0388008 0.000270292 0.517372 0.306104 0.0532552
## # ℹ 1 more variable: p_value_InfSS <dbl>
##
## $MDRI.CI
## CI_LB CI_UB
## 1 180.4004 219.5996
## 2 155.3045 204.6955
## 3 158.8324 201.1676
##
## $FRR.CI
## # A tibble: 3 × 2
## CI_LB CI_UB
## <dbl> <dbl>
## 1 0.00608007 0.0139199
## 2 0.00547206 0.0125279
## 3 0.0160801 0.0239199
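The relative standard errors and the pairwise comparison in this output can be approximated by hand with a first-order (delta method) propagation of the input uncertainties. The base R sketch below does so under stated assumptions: all inputs are treated as uncorrelated, the two surveys being compared are treated as independent, and days are converted to years by dividing by 365.25. Reading RSE_InfSS as the RSE that remains when only MDRI and FRR uncertainty is kept is an interpretation inferred from the numbers, not something stated in this vignette. All variable names are hypothetical.

```r
days_per_year <- 365.25  # assumption: conversion from days to years

# Inputs copied from the three-survey call above
prev_h <- c(0.20, 0.21, 0.18);  rse_prev_h <- c(0.028, 0.03, 0.022)
prev_r <- c(0.10, 0.13, 0.12);  rse_prev_r <- c(0.094, 0.095, 0.05)
mdri   <- c(200, 180, 180);     rse_mdri   <- c(0.05, 0.07, 0.06)
frr    <- c(0.01, 0.009, 0.02); rse_frr    <- c(0.2, 0.2, 0.1)
big_t  <- 730

omega   <- mdri / days_per_year
t_years <- big_t / days_per_year
denom   <- omega - frr * t_years

incidence <- prev_h * (prev_r - frr) / ((1 - prev_h) * denom)

# Relative contribution of each input: (d log I / d x) * sd(x)
c_prev_h <- rse_prev_h / (1 - prev_h)
c_prev_r <- rse_prev_r * prev_r / (prev_r - frr)
c_mdri   <- rse_mdri * omega / denom
c_frr    <- rse_frr * frr * (t_years / denom - 1 / (prev_r - frr))

# Combine contributions, assuming uncorrelated inputs
rse       <- sqrt(c_prev_h^2 + c_prev_r^2 + c_mdri^2 + c_frr^2)
rse_infss <- sqrt(c_mdri^2 + c_frr^2)  # test-property uncertainty only
se        <- rse * incidence

# Survey 1 vs survey 2, treating the two estimates as independent
diff_12  <- incidence[1] - incidence[2]
se_diff  <- sqrt(se[1]^2 + se[2]^2)
rse_diff <- se_diff / abs(diff_12)
p_value  <- 2 * pnorm(-abs(diff_12 / se_diff))  # two-sided z test

print(round(data.frame(incidence, rse, rse_infss), 6))
print(round(c(diff = diff_12, rse = rse_diff, p_value = p_value), 6))
```

For the “1 vs 2” comparison this gives a difference of about -0.0251 and a p value of about 0.015.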
prevcounts
Function prevcounts, while not strictly necessary (and indeed not
recommended for final inference on incidence based on real survey data,
presumably obtained at great cost and with a considerably complex
sampling structure), turns survey counts into (point) estimates (and
variances) of the prevalence of HIV and the prevalence of recent
infection among the HIV positives. At heart, this is a relatively simple
multinomial distribution analysis (trinomial, in the case of complete
coverage of recency testing amongst HIV positives) and could be
accomplished without any significant innovation directly arising out of
the core methods of this package, but prevcounts at least provides a
consistent entry point into this analysis, using arguments named
consistently with the other functions, including appropriate design
effects. The most likely use of prevcounts is probably indirect, through
inccounts, but it is exposed to the user for its intuitive supportive
value and for reuse in user customisations beyond routine primary
incidence estimation. Note that the use of prevcounts implies an
interpretation of these counts which precludes non-zero covariance of
the prevalence of HIV and the prevalence of recency.
For a single survey:
## # A tibble: 1 × 4
## PrevH PrevR RSE_PrevH RSE_PrevR
## <dbl> <dbl> <dbl> <dbl>
## 1 0.2 0.07 0.0296648 0.141169
Note that:
* It is spelled out that in this instance all HIV-positive subjects were
  tested for recent infection.
* A design effect is provided for adjusting the variance of the
  prevalence of HIV.
* An independent design effect is provided for adjusting the variance of
  the prevalence of recent infection amongst HIV positives.
* No total sample size is given that would include individuals in the
  underlying survey for whom there is no HIV status information. A more
  sophisticated analysis might well use this larger sample size, and
  additionally account for the frequency, and risk-factor distribution,
  of missingness, but the design of prevcounts does not require this
  data. The use of design effects, though limited and ultimately
  problematic, is the appropriate way to insert distributional
  information beyond the implied two independent binomial distributions
  for each prevalence.
Input can be provided for two or more surveys in vector form, using
the concatenation function c():
prevcounts(N = c(5000,6000), N_H = c(1000,1100), N_testR = c(950,1060),
N_R = c(100,70), DE_H = c(1.1,1.2), DE_R = c(1.2,1.3))
## # A tibble: 2 × 4
## PrevH PrevR RSE_PrevH RSE_PrevR
## <dbl> <dbl> <dbl> <dbl>
## 1 0.2 0.105263 0.0296648 0.103619
## 2 0.183333 0.0660377 0.0298481 0.131700
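The two-survey output above is consistent with treating each prevalence as a binomial proportion whose variance is inflated by the corresponding design effect. The base R sketch below reproduces the table on that assumption, without calling the package. The lower-case variable names are hypothetical, and the formula is an interpretation that matches the printed numbers rather than a statement of the package's internal code.

```r
# Counts and design effects copied from the two-survey call above
n        <- c(5000, 6000)   # subjects with HIV status
n_h      <- c(1000, 1100)   # HIV positives
n_test_r <- c(950, 1060)    # HIV positives tested for recency
n_r      <- c(100, 70)      # positives classified as recent
de_h     <- c(1.1, 1.2)     # design effect for HIV prevalence
de_r     <- c(1.2, 1.3)     # design effect for prevalence of recency

# Point estimates of the two prevalences
prev_h <- n_h / n
prev_r <- n_r / n_test_r

# Binomial variance scaled by the design effect, expressed as an RSE
rse_prev_h <- sqrt(de_h * prev_h * (1 - prev_h) / n) / prev_h
rse_prev_r <- sqrt(de_r * prev_r * (1 - prev_r) / n_test_r) / prev_r

result <- data.frame(PrevH = prev_h, PrevR = prev_r,
                     RSE_PrevH = rse_prev_h, RSE_PrevR = rse_prev_r)
print(signif(result, 6))
```

Because the two prevalences are handled as separate binomial proportions, this calculation has no covariance term, in line with the remark above about prevcounts.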
[1] Kassanjee, R., McWalter, T.A., Baernighausen, T. and Welte, A. “A new general biomarker-based incidence estimator.” Epidemiology; 2012, 23(5): 721-728.
[2] Kassanjee, R., McWalter, T.A. and Welte, A. “Short Communication: Defining Optimality of a Test for Recent Infection for HIV Incidence Surveillance.” AIDS Research and Human Retroviruses; 2014, 30(1): 45-49.