NEWS
diyar 0.5.1.9000
New features
Changes
Bug fixes
diyar 0.5.1 (2023-11-12)
New features
Changes
Bug fixes
- links() - Incorrect results in some situations. Resolved.
- links_af_probabilistic() - Failed in some situations. Resolved.
diyar 0.5.0 (2023-11-05)
New features
- New option ("semi") for the batched argument in links(). All
  matches are compared against the record-set in the next iteration.
  Therefore, the number of record-pairs increases exponentially as new
  matches are found. This means fewer record-pairs (memory usage) but a
  longer run time compared to the "no" option. Conversely, it leads to
  more record-pairs (memory usage) but a shorter run time compared to
  the "yes" option.
- New argument (batched) in episodes().
- New argument (split) in episodes(). Split the analysis into N splits
  of strata. This leads to fewer record-pairs (and memory usage) but a
  longer run time.
- New argument (decode) in as.data.frame.pid(), as.data.frame.epid()
  and as.data.frame.pane().
- New function - episodes_af_shift(). A more vectorised approach to
  episodes() based on epidm::group_time().
- New function - links_wf_episodes(). Implementation of episodes()
  using links().
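The idea behind the split argument can be illustrated in base R. This is only a sketch: split_strata() is a hypothetical helper, and the round-robin assignment is an assumption, not necessarily how episodes() allocates strata to splits. It shows that every record in a stratum lands in the same split, so each split creates fewer record-pairs than the whole dataset would.

```r
# Hypothetical helper (not part of diyar): assign each stratum to one of
# `n` splits, so that all records in a stratum share a split.
split_strata <- function(strata, n) {
  ids <- match(strata, unique(strata))      # integer code per stratum
  ((ids - 1L) %% as.integer(n)) + 1L        # round-robin split number
}

strata <- c("a", "a", "b", "b", "b", "c", "d", "d")
split_no <- split_strata(strata, n = 2)

# Number of within-stratum record-pairs handled by each split
n_per_stratum <- table(strata)
pairs_per_stratum <- choose(as.vector(n_per_stratum), 2)
split_of_stratum <- split_no[match(names(n_per_stratum), strata)]
pairs_per_split <- tapply(pairs_per_stratum, split_of_stratum, sum)

split_no
pairs_per_split
```

Each split is processed on its own, which is where the trade-off between memory usage and run time comes from.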
Changes
- Optimised episodes() and links(). Each iteration now uses less
  time and memory.
- link_id slot in pid objects is now a list.
- links() - records with missing values in a sub_criteria are now
  skipped at the corresponding iteration.
- Updated argument in links() - recursive. This now takes any of
  three options [c("linked", "unlinked", "none")].
  [c("linked", "unlinked")] collectively were previously [TRUE],
  while ["none"] was previously [FALSE].
- as.epids() now calls make_episodes().
- The default value for the window argument in partitions() is now
  NULL.
- as.data.frame() and as.data.list() now only create elements/fields
  from non-empty fields.
- id and gid slots in number_line objects are now integer(0) by
  default.
- episode_group(), record_group() and range_match_legacy() have been
  removed.
- ["recursive"] episodes from episodes() are now presented as
  ["rolling"] episodes with reference_event = "all_records", i.e.
  Old syntax ~ episodes(..., episode_type = "recursive")
  New syntax ~ episodes(..., episode_type = "rolling", reference_event = "all_records")
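For code that still passes a logical value to recursive in links(), the mapping described above can be written as a small wrapper. This is a minimal sketch in base R; upgrade_recursive() is a hypothetical helper and is not exported by diyar.

```r
# Hypothetical migration helper (not part of diyar): translate the old
# logical value of `recursive` into the new character options.
upgrade_recursive <- function(recursive) {
  if (is.character(recursive)) return(recursive)  # already in the new form
  stopifnot(is.logical(recursive), length(recursive) == 1, !is.na(recursive))
  if (recursive) c("linked", "unlinked") else "none"
}

upgrade_recursive(TRUE)
upgrade_recursive(FALSE)
upgrade_recursive("linked")
```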
Bug fixes
- When recursive was TRUE, links() ended prematurely and therefore
  missed some matches. Resolved.
- recurrence_sub_criteria in episodes() was not implemented
  correctly and led to incorrect linkage results in some instances.
  Resolved.
- overlap_method() - logical tests recycled incorrectly. Resolved.
- check_links argument - Option "g" was implemented as option "l".
  Resolved.
- make_pairs_wf_source(). Created incorrect pairs. Resolved.
- case_sub_criteria and recurrence_sub_criteria in episodes() led
  to incorrect results. Resolved.
diyar 0.4.2 (2022-12-20)
New features
- New arguments in merge_ids() - shrink and expand.
- New S3 method for class ‘d_report’ - plot.
- New S3 method for class ‘sub_criteria’ - format.
- New function - true(). Predefined logical test for use with
  sub_criteria().
- New function - false(). Predefined logical test for use with
  sub_criteria().
- New argument in links() - batched. Specify if all record pairs are
  created or compared at once ("no") or in batches ("yes").
- New argument in links() - repeats_allowed. Specify if record-pairs
  with duplicate elements should be created.
- New argument in links() - permutations_allowed. Specify if
  permutations of the same record-pair should be created.
- New argument in links() - ignore_same_source. Specify if
  record-pairs from different datasets should be created.
- New argument in eval_sub_criteria() - depth. First order of
  recursion.
- New functions - sets() and make_sets(). Create permutations of
  record-sets.
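The terms used for repeats_allowed and permutations_allowed can be illustrated with base R alone. The sketch below is not diyar's implementation. It assumes that a "repeat" is a record paired with itself and that a "permutation" is the same pair in the opposite order.

```r
idx <- 1:3

# Every ordered pair: repeats and permutations are both present
all_pairs <- expand.grid(x = idx, y = idx)

# No repeats: drop pairs of a record with itself, e.g. (1, 1)
no_repeats <- all_pairs[all_pairs$x != all_pairs$y, ]

# No permutations: keep one ordering of each pair, e.g. (1, 2) but not (2, 1)
no_perms <- all_pairs[all_pairs$x <= all_pairs$y, ]

# Neither repeats nor permutations
neither <- all_pairs[all_pairs$x < all_pairs$y, ]

c(all = nrow(all_pairs), no_repeats = nrow(no_repeats),
  no_perms = nrow(no_perms), neither = nrow(neither))
```

Dropping both repeats and permutations leaves the fewest pairs to compare.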
Changes
- links() - When shrink is TRUE, records in a record-group must
  meet every listed match criteria and sub_criteria. For example, if
  pid_cri is 3, then the record must have matched another on the
  first three match criteria.
- links() - pid@iteration now tracks when a record was dealt with
  instead of when it was assigned to a record-group. For example, a
  record can be closed (matched or not matched) at iteration 1 but
  assigned to a record-group at iteration 5.
- make_pairs() - x.* and y.* values in the output are now swapped.
- sub_criteria can now export any data created by match_func. To do
  this, match_func must export a list, where the first element is a
  logical object. See an example below.
library(diyar)
val <- rep(month.abb[1:5], 2); val
#> [1] "Jan" "Feb" "Mar" "Apr" "May" "Jan" "Feb" "Mar" "Apr" "May"
match_and_export <- function(x, y){
  output <- list(x == y,
                 data.frame(x_val = x, y_val = y, is_match = x == y))
  return(output)
}
sub.cri.1 <- sub_criteria(
  val, match_funcs = list(match.export = match_and_export)
)
format(sub.cri.1, show_levels = TRUE)
#> logical_test-{
#> Lv.0.1-match.export(Jan,Feb,Mar ...)
#> }
eval_sub_criteria(sub.cri.1)
#> $logical_test
#> [1] 1 0 0 0 0 1 0 0 0 0
#>
#> $mf.0.1
#>    x_val y_val is_match
#> 1    Jan   Jan     TRUE
#> 2    Feb   Jan    FALSE
#> 3    Mar   Jan    FALSE
#> 4    Apr   Jan    FALSE
#> 5    May   Jan    FALSE
#> 6    Jan   Jan     TRUE
#> 7    Feb   Jan    FALSE
#> 8    Mar   Jan    FALSE
#> 9    Apr   Jan    FALSE
#> 10   May   Jan    FALSE
- links() can now export any data created within a sub_criteria. To do
  this, the sub_criteria must be created as described above. See an
  example below.
val <- 1:5
diff_one_and_export <- function(x, y){
  diff <- x - y
  is_match <- diff <= 1
  output <- list(is_match,
                 data.frame(x_val = x, y_val = y, diff = diff, is_match = is_match))
  return(output)
}
sub.cri.2 <- sub_criteria(
  val, match_funcs = list(diff.export = diff_one_and_export)
)
links(
  criteria = "place_holder",
  sub_criteria = list("cr1" = sub.cri.2))
#> $pid
#> [1] "P.1 (CRI 001)" "P.1 (CRI 001)" "P.3 (CRI 001)" "P.3 (CRI 001)"
#> [5] "P.5 (No hits)"
#>
#> $export
#> $export$cri.1
#> $export$cri.1$iteration.1
#> $export$cri.1$iteration.1$mf.0.1
#>   x_val y_val diff is_match
#> 1     1     1    0     TRUE
#> 2     2     1    1     TRUE
#> 3     3     1    2    FALSE
#> 4     4     1    3    FALSE
#> 5     5     1    4    FALSE
#>
#>
#> $export$cri.1$iteration.2
#> $export$cri.1$iteration.2$mf.0.1
#>   x_val y_val diff is_match
#> 1     3     3    0     TRUE
#> 2     4     3    1     TRUE
#> 3     5     3    2    FALSE
Bug fixes
- summary.epid() - Incorrect count for ‘by episode type’. Resolved.
- episodes() - Incorrect results in some instances with skip_order.
  Resolved.
- make_ids() - Did not capture all records that should be in a
  record-group when matches are recursive. Resolved.
- make_pairs() - Incorrect record-pairs in some instances. Resolved.
- eval_sub_criteria() - When the output of match_func was length one,
  it was not recycled. Resolved.
- reverse_number_line() - Incorrect results in some instances.
  Resolved.
- links() - Incorrect iteration (pids slot) for non-matches.
  Resolved.
- links() and episodes() - Timing for each iteration was incorrect.
  Resolved.
diyar 0.4.1 (2021-12-05)
New features
- New function - overlap_method_names(). Overlap methods for the
  corresponding overlap method codes.
- Memory usage added to *with_report options for display.
Changes
- "chain" overlap method split into "x_chain_y" and "y_chain_x".
  "chain" will continue to be supported as a keyword for
  "x_chain_y" OR "y_chain_x" methods.
- "across" overlap method split into "x_across_y" and
  "y_across_x". "across" will continue to be supported as a keyword
  for "x_across_y" OR "y_across_x" methods.
- "inbetween" overlap method split into "x_inbetween_y" and
  "y_inbetween_x". "inbetween" will continue to be supported as a
  keyword for "x_inbetween_y" OR "y_inbetween_x" methods.
- Optimised overlaps().
- Some overlap method codes have changed. Please review any previously
  specified codes with overlap_method_names().
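How a combined keyword relates to its two directional methods can be sketched in base R on plain start and end points. The semantics below are an assumption made for illustration ("x_chain_y" read as x ending exactly where y starts, and the reverse for "y_chain_x"). The helper functions are hypothetical and are not diyar's own.

```r
# Assumed semantics, for illustration only:
#   "x_chain_y" - x ends exactly where y starts
#   "y_chain_x" - y ends exactly where x starts
x_chain_y <- function(x_start, x_end, y_start, y_end) x_end == y_start
y_chain_x <- function(x_start, x_end, y_start, y_end) y_end == x_start

# "chain" as a keyword for either direction
chain <- function(...) x_chain_y(...) | y_chain_x(...)

x_start <- c(1, 5, 1); x_end <- c(3, 8, 3)
y_start <- c(3, 2, 6); y_end <- c(6, 5, 9)

x_chain_y(x_start, x_end, y_start, y_end)
y_chain_x(x_start, x_end, y_start, y_end)
chain(x_start, x_end, y_start, y_end)
```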
Bug fixes
- make_batch_pairs() (internal) created invalid record pairs.
  Resolved.
diyar 0.4.0 (2021-11-30)
New features
- New function - reframe(). Modify the attributes of a sub_criteria
  object.
- New function - link_records(). Record linkage by creating all record
  pairs as opposed to batches as with links().
- New function - make_pairs(). Create every combination of
  record-pairs for a given dataset.
- New function - make_pairs_wf_source(). Create record-pairs from
  different sources only.
- New function - make_ids(). Convert an edge list to a group
  identifier.
- New function - merge_ids(). Merge two group identifiers.
- New function - attrs(). Pass a set of attributes to one instance of
  match_funcs or equal_funcs.
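Converting an edge list to a group identifier, as make_ids() is described above, can be sketched in base R. This is only an illustration: edges_to_groups() is a hypothetical function, and its label-propagation approach and argument names are assumptions rather than the package's implementation.

```r
# Hypothetical sketch (not diyar's code): records joined by an edge,
# directly or through other records, end up with the same group label.
edges_to_groups <- function(x_pos, y_pos, id_length = max(x_pos, y_pos)) {
  group <- seq_len(id_length)          # each record starts in its own group
  repeat {
    # smaller of the two current labels at the ends of each edge
    low <- pmin(group[x_pos], group[y_pos])
    new_group <- group
    for (i in seq_along(low)) {
      new_group[x_pos[i]] <- min(new_group[x_pos[i]], low[i])
      new_group[y_pos[i]] <- min(new_group[y_pos[i]], low[i])
    }
    if (identical(new_group, group)) break   # no label changed: done
    group <- new_group
  }
  group
}

# Edges 1-2, 2-3 and 5-6; records 4 and 7 have no edges
groups <- edges_to_groups(x_pos = c(1, 2, 5), y_pos = c(2, 3, 6), id_length = 7)
groups
```

Records 1 to 3 share a label because 1 links to 2 and 2 links to 3, even though 1 and 3 are never paired directly.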
Changes
- Optimised episodes_wf_splits().
- Optimised episodes() and links(). Reduced processing times.
- Three new options for the display argument -
  "progress_with_report", "stats_with_report" and
  "none_with_report". Creates a d_report; a status of the analysis
  over its run time.
- eval_sub_criteria(). Record-pairs are no longer created in the
  function. Therefore, index_record and sn arguments have been
  replaced with x_pos and y_pos.
- link_records() and links_wf_probabilistic(). The cmp_threshold
  argument has been renamed to attr_threshold.
- show_labels argument in schema(). Two new options - "wind_nm"
  and "length" to replace "length_label".
Bug fixes
- Incorrect wind_id list in episodes(..., data_link = "XX").
  Resolved.
- Incorrect link_id in links(..., recursive = TRUE). Resolved.
- iteration not recorded in some situations with episodes().
  Resolved.
- skip_order ends an open episode. Resolved.
- NA in dist_wind_index and dist_epid_index when sn is supplied.
  Resolved.
- overlap_method_codes() - overlap method codes not recycled properly.
  Resolved.
diyar 0.3.1 (2021-08-09)
New features
- New function - delink(). Unlink identifiers.
- New function - episodes_wf_splits(). Wrapper function of
  episodes(). Better optimised for handling datasets with many
  duplicate records.
- New function - combi(). Numeric codes for unique combinations of
  vectors.
- New function - attr_eval(). Recursive evaluation of a function on
  each attribute of a sub_criteria.
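What "numeric codes for unique combinations of vectors" means can be shown with a base R sketch. combo_codes() is a hypothetical stand-in written for illustration. The numbering that combi() itself returns may differ.

```r
# Hypothetical stand-in (not diyar's combi()): one integer code per
# unique combination of values across the supplied vectors.
combo_codes <- function(...) {
  # Build one key per record; "\r" is assumed not to occur in the data
  key <- do.call(paste, c(list(...), sep = "\r"))
  match(key, unique(key))              # code = order of first appearance
}

codes <- combo_codes(c("a", "a", "b", "a"), c(1, 1, 1, 2))
codes
```

Records 1 and 2 share a code because both vectors agree for them, while record 4 differs on the second vector and so gets its own code.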
Changes
- Two new case_nm values - Case_CR and Recurrence_CR, which are
  Case and Recurrence without a sub-criteria match.
Bug fixes
- Corrected length arrows in schema.epid.
- Corrected outcome of eval_sub_criteria with 1 result.
diyar 0.3.0 (2021-04-25)
New features
- New function - links_wf_probabilistic(). Probabilistic record
  linkage.
- New function - partitions(). Split events into sections in time.
- New function - schema(). Plot schema diagrams for pid, epid,
  pane and number_line objects.
- New functions - encode() and decode(). Encode and decode slot
  values to minimise memory usage.
- New arguments in episodes() - case_sub_criteria and
  recurrence_sub_criteria. Additional matching conditions for temporal
  links.
- New arguments in episodes() - case_length_total and
  recurrence_length_total. Number of temporal links required for a
  window/episode.
- New argument in links() - recursive. Control if matches can spawn
  new matches.
- New argument in links() - check_duplicates. Control the checking
  of logical tests on duplicate values. If FALSE, results are recycled
  for the duplicates.
- as.data.frame and as.list S3 methods for the pid, number_line,
  epid and pane objects.
- New option for episode_type in episodes() - “recursive”. For
  recursive episodes where every linked event can be used as a
  subsequent index event.
- recurrence_from_last renamed to reference_event and given two new
  options.
Changes
- episodes() and links(). Speed improvements.
- Default time zone for an epid_interval or pane_interval with
  POSIXct objects is now “GMT”.
- number_line_sequence() - splits number_line objects. Also available
  as a seq method.
- epid_total, pid_total and pane_total slots are populated by
  default. No need to use group_stats to get these.
- to_df() - Removed. Use as.data.frame() instead.
- to_s4() - Now an internal function. It’s no longer exported.
- compress_number_line() - Now an internal function. It’s no longer
  exported. Use episodes() instead.
- sub_criteria() - produces a sub_criteria object. Nested “AND” and
  “OR” conditions are now possible.
- case_overlap_methods, recurrence_overlap_methods and
  overlap_methods now take integer codes for different combinations
  of overlap methods. See overlap_methods$options for the full list.
  character inputs are still supported.
Bug fixes
- "Single-record" was wrong in links summary output. Resolved.
diyar 0.2.0 (2020-09-17)
New features
- Better support for Inf in number_line objects.
- Can now use multiple case_length or recurrence_length for the same
  event.
- Can now use multiple overlap_methods for the corresponding
  case_length and recurrence_length.
- New function links() to replace record_group().
- New function sub_criteria(). The new way of supplying a
  sub_criteria in links().
- New functions exact_match(), range_match() and
  range_match_legacy(). Predefined logical tests for use with
  sub_criteria(). User-defined tests can also be used. See
  ?sub_criteria.
- New function custom_sort() for nested sorting.
- New function epid_lengths() to show the required case_length or
  recurrence_length for an analysis. Useful in confirming the required
  case_length or recurrence_length for episode tracking.
- New function epid_windows(). Shows the period a date will overlap
  with given a particular case_length or recurrence_length. Useful
  in confirming the required case_length or recurrence_length for
  episode tracking.
- New argument - strata in links(). Useful for stratified data
  linkage. As in stratified episode tracking, a record with a missing
  strata (NA_character_) is skipped from data linkage.
- New argument - data_links in links(). Unlink record groups that do
  not include records from certain data sources.
- New convenience functions
  - listr(). Format atomic vectors as a written list.
  - combns(). An extension of combn to generate permutations not
    ordinarily captured by combn.
- New iteration slot for pid and epid objects.
- New overlap_method - reverse().
Changes
- number_line() - l and r must have the same length or be 1.
- episodes() - case_nm differentiates between duplicates of "Case"
  ("Duplicate_C") and "Recurrent" events ("Duplicate_R").
- Strata and episode-level options for most arguments. This gives
  greater flexibility within the same instance of episodes().
  - Episode-level - The behaviour for each episode is determined by the
    corresponding option for its index event ("Case").
    - episode_type - simultaneously track both "fixed" and
      "rolling" episodes.
    - skip_if_b4_lengths - simultaneously track episodes where events
      before a cut-off range are both skipped and not skipped.
    - episode_unit - simultaneously track episodes by different units
      of time.
    - case_for_recurrence - simultaneously track "rolling" episodes
      with and without an additional case window for recurrent events.
    - recurrence_from_last - simultaneously track "rolling" episodes
      with reference windows calculated from the first and last event of
      the previous window.
  - Strata-level - The behaviour for each episode is determined by the
    corresponding option for its strata. Options must be the same in
    each strata.
    - from_last - simultaneously track episodes in both directions of
      time - past to present and present to past.
    - episodes_max - simultaneously track different number of episodes
      within the dataset.
- include_overlap_method - "overlap" and "none" will not be
  combined with other methods.
  - "overlap" - mutually inclusive with the other methods, so their
    inclusion is not necessary.
  - "none" - mutually exclusive and prioritised over the other methods
    (including "overlap"), so their inclusion is not necessary.
- Events can now have missing cut-off points (NA_real_) or periods
  (number_line(NA_real_, NA_real_)) for case_length and
  recurrence_length. This ensures that the event does not become an
  index case; however, it can still be part of a different episode. For
  reference, an event with a missing strata (NA_character_) ensures
  that the event does not become an index case nor part of any episode.
Bug fixes
- fixed_episodes, rolling_episodes and episode_group -
  include_index_period didn’t work in certain situations. Corrected.
- fixed_episodes, rolling_episodes and episode_group -
  dist_from_wind was wrong in certain situations. Corrected.
diyar 0.1.0 (2020-06-13)
New features
- New argument in record_group() - strata. Perform record linkage
  separately within subsets of a dataset.
- New arguments in overlap(), compress_number_line(),
  fixed_episodes(), rolling_episodes() and episode_group() -
  overlap_methods and methods. These replace overlap_method and
  method respectively. Use different sets of methods within the same
  dataset when grouping episodes or collapsing number_line objects.
  overlap_method and method only permit 1 method per dataset.
- New slot in epid objects - win_nm. Shows the type of window each
  event belongs to i.e. case or recurrence window.
- New slot in epid objects - win_id. Unique ID for each window. The
  ID is the sn of the reference event for each window.
- Format of epid objects updated to reflect this.
- New slot in epid objects - dist_from_wind. Shows the duration of
  each event from its window’s reference event.
- New slot in epid objects - dist_from_epid. Shows the duration of
  each event from its episode’s reference event.
- New argument in episode_group() and rolling_episodes() -
  recurrence_from_last. Determine if reference events should be the
  first or last event from the previous window.
- New argument in episode_group() and rolling_episodes() -
  case_for_recurrence. Determine if recurrent events should have their
  own case windows or not.
- New argument in episode_group(), fixed_episodes() and
  rolling_episodes() - data_links. Unlink episodes that do not
  include records from certain data_source(s).
- episode_group(), fixed_episodes() and rolling_episodes() -
  case_length and recurrence_length arguments. You can now use a
  range (number_line object).
- New argument in episode_group(), fixed_episodes() and
  rolling_episodes() - include_index_period. If TRUE, overlaps
  with the index event or period are grouped together even if they are
  outside the cut-off range (case_length or recurrence_length).
- New slot in pid objects - link_id. Shows the record (sn slot) to
  which every record in the dataset has matched.
- New function - invert_number_line(). Invert the left and/or
  right points to the opposite end of the number line.
- New accessor functions - left_point(x)<-, right_point(x)<-,
  start_point(x)<- and end_point(x)<-.
Changes
- overlap() renamed to overlaps(). overlap() is now a convenience
  overlap_method to capture ANY kind of overlap.
- "none" is another convenience overlap_method for NO kind of
  overlap.
- expand_number_line() - new options for point; "left" and
  "right".
- compress_number_line() - compressed number_line object inherits
  the direction of the widest number_line among overlapping group of
  number_line objects.
- overlap_methods - have been changed such that each pair of
  number_line objects can only overlap in one way. E.g.
  - "chain" and "aligns_end" used to be possible but this is now
    considered a "chain" overlap only.
  - "aligns_start" and "aligns_end" used to be possible but this is
    now considered an "exact" overlap.
- number_line_sequence() - Output is now a list.
- number_line_sequence() - now works across multiple number_line
  objects.
- to_df() - can now change number_line objects to data.frames.
  to_s4() can do the reverse.
- epid objects are the default outputs for fixed_episodes(),
  rolling_episodes() and episode_group().
- pid objects are the default outputs for record_group().
- In episode grouping, the case_nm for events that were skipped due to
  rolls_max or episodes_max is now "Skipped".
- In episode_group() and record_group(), sn can be negative
  numbers but must still be unique.
- Optimised episode_group() and record_group(). Runs just a little
  bit faster …
- Relaxed the requirement for x and y to have the same lengths in
  overlap functions.
- The behaviour of overlap functions will now be the same as that of
  standard R logical tests.
- episode_group - case_length and recurrence_length arguments. Now
  accepts negative numbers.
  - negative “lengths” will collapse two periods into one, if the second
    one is within some days before the end_point() of the first
    period.
  - if the “lengths” are larger than the number_line_width(), both
    will be collapsed if the second one is within some days (or any
    other episode_unit) before the start_point() of the first
    period.
- cheat sheet updated
Bug fixes
- Recurrence was not checked if the initial case event had no
  duplicates. Resolved.
- case_nm wasn’t right for rolling episodes. Resolved.
diyar 0.0.3 (2019-12-08)
Changes
- #7 episode_group(), fixed_episodes() and rolling_episodes() -
  optimized to take less time when working with large datasets.
- episode_group(), fixed_episodes() and rolling_episodes() -
  date argument now supports numeric values.
- compress_number_line() - the output (gid slot) is now a group
  identifier just like in epid objects (epid_interval).
diyar 0.0.2 (2019-11-11)
New feature
- pid S4 object class for results of record_group(). This will
  replace the current default (data.frame) in the next major release.
- epid S4 object class for results of episode_group(),
  fixed_episodes() and rolling_episodes(). This will replace the
  current default (data.frame) in the next release.
- to_s4() and to_s4 argument in record_group(), episode_group(),
  fixed_episodes() and rolling_episodes(). Changes their output from
  a data.frame (current default) to epid or pid objects.
- to_df() changes epid or pid objects to a data.frame.
- deduplicate argument from fixed_episodes() and
  rolling_episodes() added to episode_group().
Changes
- fixed_episodes() and rolling_episodes() are now wrapper functions
  of episode_group(). Functionality remains the same but now includes
  all arguments available to episode_group().
- Changed the output of fixed_episodes() and rolling_episodes() from
  number_line to data.frame, pending the change to epid objects.
- pid_cri column returned in record_group is now numeric. 0
  indicates no match.
- columns can now be used as criteria multiple times in
  record_group().
- #6 number_line objects can now be used as a criteria in
  record_group().
Bug fixes
- #3 - Resolved a bug with episode_unit in episode_group().
- #4 - Resolved a bug with bi_direction in episode_group().
diyar 0.0.1 (2019-10-06)
Features
- fixed_episodes() and rolling_episodes() - Group records into fixed
  or rolling episodes of events or period of events.
- episode_group() - A more comprehensive implementation of
  fixed_episodes() and rolling_episodes(), with additional features
  such as user defined case assignment.
- record_group() - Multistage deterministic linkage that addresses
  missing data.
- number_line S4 object.
  - Used to represent a range of numeric values to match using
    record_group().
  - Used to represent a period in time to be grouped using
    fixed_episodes(), rolling_episodes() and episode_group().
  - Used as the returned output of fixed_episodes() and
    rolling_episodes().