links() - Incorrect results in some situations. Resolved.
links_af_probabilistic() - Failed in some situations. Resolved.
New option ("semi") for the batched argument in links(). All matches are compared against the record-set in the next iteration. Therefore, the number of record-pairs increases exponentially as new matches are found. This means fewer record-pairs (less memory usage) but a longer run time compared to the "no" option. Conversely, it leads to more record-pairs (more memory usage) but a shorter run time compared to the "yes" option.
New argument (batched) in episodes().
New argument (split) in episodes(). Splits the analysis into N splits of strata. This leads to fewer record-pairs (and less memory usage) but a longer run time.
New argument (decode) in as.data.frame.pid(), as.data.frame.epid() and as.data.frame.pane().
episodes_af_shift(). A more vectorised approach to episodes() based on epidm::group_time().
links_wf_episodes(). Implementation of episodes() using links().
episodes() and links(). Each iteration now uses less time and memory.
link_id slot in pid objects is now a list.
links() - records with missing values in a sub_criteria are now skipped at the corresponding iteration.
links() - recursive. This now takes any of three options [c("linked", "unlinked", "none")]. [c("linked", "unlinked")] together replace the previous [TRUE], while ["none"] replaces the previous [FALSE].
as.epids() now calls make_episodes().
window argument in partitions() is now NULL.
as.data.frame() and as.data.list() now only create elements/fields from non-empty fields.
id and gid slots in number_line objects are now integer(0) by default.
episode_group(), record_group() and range_match_legacy() have been removed.
["recursive"] episodes from episodes() are now presented as ["rolling"] episodes with reference_event = "all_records", i.e.
Old syntax ~ episodes(..., episode_type = "recursive")
New syntax ~ episodes(..., episode_type = "rolling", reference_event = "all_records")
When recursive was TRUE, links() ended prematurely and therefore missed some matches. Resolved.
recurrence_sub_criteria in episodes() was not implemented correctly and led to incorrect linkage results in some instances. Resolved.
overlap_method() - logical tests recycled incorrectly. Resolved.
check_links argument - Option "g" was implemented as option "l". Resolved.
make_pairs_wf_source(). Created incorrect pairs. Resolved.
case_sub_criteria and recurrence_sub_criteria in episodes() led to incorrect results. Resolved.
merge_ids() - shrink and expand.
plot
format
true(). Predefined logical test for use with sub_criteria().
false(). Predefined logical test for use with sub_criteria().
links() - batched. Specify if all record pairs are created or compared at once ("no") or in batches ("yes").
links() - repeats_allowed. Specify if record-pairs with duplicate elements should be created.
links() - permutations_allowed. Specify if permutations of the same record-pair should be created.
links() - ignore_same_source. Specify if record-pairs from different datasets should be created.
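The effect of repeats_allowed and permutations_allowed on the set of record-pairs can be pictured with a small base R sketch. It does not use diyar and is not how links() builds pairs internally. pairs_sketch() and its arguments are hypothetical names chosen for illustration, and reading "duplicate elements" as a record paired with itself is an assumption.

```r
# Hypothetical helper, not part of diyar: enumerate index pairs for n records.
pairs_sketch <- function(n, repeats_allowed = TRUE, permutations_allowed = TRUE) {
  # Start from every ordered pair (x, y), including x == y
  pairs <- expand.grid(x = seq_len(n), y = seq_len(n))
  # Assumption: a "repeat" is a record paired with itself
  if (!repeats_allowed) pairs <- pairs[pairs$x != pairs$y, ]
  # Keep only one ordering of each pair, e.g. (1, 2) but not (2, 1)
  if (!permutations_allowed) pairs <- pairs[pairs$x <= pairs$y, ]
  rownames(pairs) <- NULL
  pairs
}

nrow(pairs_sketch(3))                                # all ordered pairs
nrow(pairs_sketch(3, repeats_allowed = FALSE))       # self-pairs dropped
nrow(pairs_sketch(3, permutations_allowed = FALSE))  # one ordering per pair
nrow(pairs_sketch(3, FALSE, FALSE))                  # both restrictions
```

With three records this gives 9, 6, 6 and 3 pairs respectively, which is why restricting repeats and permutations reduces memory usage.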
eval_sub_criteria() - depth. First order of recursion.
sets() and make_sets(). Create permutations of record-sets.
links() - When shrink is TRUE, records in a record-group must meet every listed match criteria and sub_criteria. For example, if pid_cri is 3, then the record must have matched another on the first three match criteria.
links() - pid@iteration now tracks when a record was dealt with instead of when it was assigned to a record-group. For example, a record can be closed (matched or not matched) at iteration 1 but assigned to a record-group at iteration 5.
make_pairs() - x.* and y.* values in the output are now swapped.
sub_criteria can now export any data created by match_func. To do this, match_func must export a list, where the first element is a logical object. See an
example below.
library(diyar)
val <- rep(month.abb[1:5], 2); val
#> [1] "Jan" "Feb" "Mar" "Apr" "May" "Jan" "Feb" "Mar" "Apr" "May"
match_and_export <- function(x, y){
  output <- list(x == y,
                 data.frame(x_val = x, y_val = y, is_match = x == y))
  return(output)
}
sub.cri.1 <- sub_criteria(
  val, match_funcs = list(match.export = match_and_export)
)
format(sub.cri.1, show_levels = TRUE)
#> logical_test-{
#> Lv.0.1-match.export(Jan,Feb,Mar ...)
#> }
eval_sub_criteria(sub.cri.1)
#> $logical_test
#> [1] 1 0 0 0 0 1 0 0 0 0
#>
#> $mf.0.1
#> x_val y_val is_match
#> 1 Jan Jan TRUE
#> 2 Feb Jan FALSE
#> 3 Mar Jan FALSE
#> 4 Apr Jan FALSE
#> 5 May Jan FALSE
#> 6 Jan Jan TRUE
#> 7 Feb Jan FALSE
#> 8 Mar Jan FALSE
#> 9 Apr Jan FALSE
#> 10 May Jan FALSE
links() can now export any data created within a sub_criteria. To do this, the sub_criteria must be created as described above. See an example below.
val <- 1:5
diff_one_and_export <- function(x, y){
  diff <- x - y
  is_match <- diff <= 1
  output <- list(is_match,
                 data.frame(x_val = x, y_val = y, diff = diff, is_match = is_match))
  return(output)
}
sub.cri.2 <- sub_criteria(
  val, match_funcs = list(diff.export = diff_one_and_export)
)
links(
  criteria = "place_holder",
  sub_criteria = list("cr1" = sub.cri.2))
#> $pid
#> [1] "P.1 (CRI 001)" "P.1 (CRI 001)" "P.3 (CRI 001)" "P.3 (CRI 001)"
#> [5] "P.5 (No hits)"
#>
#> $export
#> $export$cri.1
#> $export$cri.1$iteration.1
#> $export$cri.1$iteration.1$mf.0.1
#> x_val y_val diff is_match
#> 1 1 1 0 TRUE
#> 2 2 1 1 TRUE
#> 3 3 1 2 FALSE
#> 4 4 1 3 FALSE
#> 5 5 1 4 FALSE
#>
#>
#> $export$cri.1$iteration.2
#> $export$cri.1$iteration.2$mf.0.1
#> x_val y_val diff is_match
#> 1 3 3 0 TRUE
#> 2 4 3 1 TRUE
#> 3 5 3 2 FALSE
summary.epid() - Incorrect count for ‘by episode type’. Resolved.
episodes() - Incorrect results in some instances with skip_order. Resolved.
make_ids() - Did not capture all records that should be in a record-group when matches are recursive. Resolved.
make_pairs() - Incorrect record-pairs in some instances. Resolved.
eval_sub_criteria() - When the output of match_func is length one, it’s not recycled. Resolved.
reverse_number_line() - Incorrect results in some instances. Resolved.
links() - Incorrect iteration (pids slot) for non-matches. Resolved.
links() and episodes() - Timing for each iteration was incorrect. Resolved.
overlap_method_names(). Overlap methods for the corresponding overlap method codes.
*with_report options for display.
"chain" overlap method split into "x_chain_y" and "y_chain_x". "chain" will continue to be supported as a keyword for "x_chain_y" OR "y_chain_x" methods.
"across" overlap method split into "x_across_y" and "y_across_x". "across" will continue to be supported as a keyword for "x_across_y" OR "y_across_x" methods.
"inbetween" overlap method split into "x_inbetween_y" and "y_inbetween_x". "inbetween" will continue to be supported as a keyword for "x_inbetween_y" OR "y_inbetween_x" methods.
overlaps().
overlap_method_names().
make_batch_pairs() (internal) created invalid record pairs. Resolved.
reframe(). Modify the attributes of a sub_criteria object.
link_records(). Record linkage by creating all record pairs as opposed to batches as with links().
make_pairs(). Create every combination of record-pairs for a given dataset.
make_pairs_wf_source(). Create record-pairs from different sources only.
make_ids(). Convert an edge list to a group identifier.
merge_ids(). Merge two group identifiers.
attrs(). Pass a set of attributes to one instance of match_funcs or equal_funcs.
episodes_wf_splits()
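The idea of converting an edge list to a group identifier, as listed for make_ids() above, can be sketched in base R. This is a minimal illustration using label propagation, not diyar's implementation. edges_to_group_id() and its arguments are hypothetical, and labelling each group by its smallest record index is an assumption.

```r
# Hypothetical helper, not diyar's make_ids(): label connected records.
# x and y hold the two ends of each edge; n is the number of records.
edges_to_group_id <- function(x, y, n = max(x, y)) {
  id <- seq_len(n)                 # every record starts in its own group
  repeat {
    new_id <- id
    low <- pmin(id[x], id[y])      # smallest label currently on each edge
    for (i in seq_along(x)) {
      # push the smaller label to both ends of the edge
      new_id[x[i]] <- min(new_id[x[i]], low[i])
      new_id[y[i]] <- min(new_id[y[i]], low[i])
    }
    if (identical(new_id, id)) break   # stop once no label changes
    id <- new_id
  }
  id
}

# Records 1-2-3 are chained, 4 is unmatched, 5-6 are linked
edges_to_group_id(x = c(1, 2, 5), y = c(2, 3, 6), n = 6)
#> [1] 1 1 1 4 5 5
```

Record 3 never matches record 1 directly but ends up in the same group through record 2, which is the recursive case a group identifier has to capture.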
episodes() and links(). Reduced processing times.
display argument. "progress_with_report", "stats_with_report" and "none_with_report". These create a d_report, a status of the analysis over its run time.
eval_sub_criteria(). Record-pairs are no longer created in the function. Therefore, the index_record and sn arguments have been replaced with x_pos and y_pos.
link_records() and links_wf_probabilistic(). The cmp_threshold argument has been renamed to attr_threshold.
show_labels argument in schema(). Two new options - "wind_nm" and "length" to replace "length_label".
wind_id list in episodes(..., data_link = "XX"). Resolved.
link_id in links(..., recursive = TRUE). Resolved.
iteration not recorded in some situations with episodes(). Resolved.
skip_order ends an open episode. Resolved.
NA in dist_wind_index and dist_epid_index when sn is supplied. Resolved.
overlap_method_codes() - overlap method codes not recycled properly. Resolved.
delink(). Unlink identifiers.
episodes_wf_splits(). Wrapper function of episodes(). Better optimised for handling datasets with many duplicate records.
combi(). Numeric codes for unique combinations of vectors.
attr_eval(). Recursive evaluation of a function on each attribute of a sub_criteria.
case_nm values - Case_CR and Recurrence_CR which are Case and Recurrence without a sub-criteria match.
schema.epid.
eval_sub_criteria with 1 result.
links_wf_probabilistic(). Probabilistic record linkage.
partitions(). Split events into sections in time.
schema(). Plot schema diagrams for pid, epid, pane and number_line objects.
encode() and decode(). Encode and decode slot values to minimise memory usage.
episodes() - case_sub_criteria and recurrence_sub_criteria. Additional matching conditions for temporal links.
episodes() - case_length_total and recurrence_length_total. Number of temporal links required for a window/episode.
links() - recursive. Control if matches can spawn new matches.
links() - check_duplicates. Control the checking of logical tests on duplicate values. If FALSE, results are recycled for the duplicates.
as.data.frame and as.list S3 methods for the pid, number_line, epid, pane objects.
episode_type in episodes() - “recursive”. For recursive episodes where every linked event can be used as a subsequent index event.
recurrence_from_last renamed to reference_event and given two new options.
episodes() and links(). Speed improvements.
epid_interval or pane_interval with POSIXct objects is now “GMT”.
number_line_sequence() - splits number_line objects. Also available as a seq method.
epid_total, pid_total and pane_total slots are populated by default. No need to use group_stats to get these.
to_df() - Removed. Use as.data.frame() instead.
to_s4() - Now an internal function. It’s no longer exported.
compress_number_line() - Now an internal function. It’s no longer exported. Use episodes() instead.
sub_criteria() - produces a sub_criteria object. Nested “AND” and “OR” conditions are now possible.
case_overlap_methods, recurrence_overlap_methods and overlap_methods now take integer codes for different combinations of overlap methods. See overlap_methods$options for the full list. character inputs are still supported.
"Single-record" was wrong in links summary output. Resolved.
Inf in number_line objects.
case_length or recurrence_length for the same event.
overlap_methods for the corresponding case_length and recurrence_length.
links() to replace record_group().
sub_criteria(). The new way of supplying a sub_criteria in links().
exact_match(), range_match() and range_match_legacy(). Predefined logical tests for use with sub_criteria(). User-defined tests can also be used. See ?sub_criteria.
custom_sort() for nested sorting.
epid_lengths() to show the required case_length or recurrence_length for an analysis. Useful in confirming the required case_length or recurrence_length for episode tracking.
epid_windows(). Shows the period a date will overlap with given a particular case_length or recurrence_length. Useful in confirming the required case_length or recurrence_length for episode tracking.
strata in links(). Useful for stratified data linkage. As in stratified episode tracking, a record with a missing strata (NA_character_) is skipped from data linkage.
data_links in links(). Unlink record groups that do not include records from certain data sources.
listr(). Format atomic vectors as a written list.
combns(). An extension of combn to generate permutations not ordinarily captured by combn.
iteration slot for pid and epid objects.
overlap_method - reverse()
number_line() - l and r must have the same length or be 1.
episodes() - case_nm differentiates between duplicates of "Case" ("Duplicate_C") and "Recurrent" events ("Duplicate_R").
episodes().
"Case").
episode_type - simultaneously track both "fixed" and "rolling" episodes.
skip_if_b4_lengths - simultaneously track episodes where events before a cut-off range are both skipped and not skipped.
episode_unit - simultaneously track episodes by different units of time.
case_for_recurrence - simultaneously track "rolling" episodes with and without an additional case window for recurrent events.
recurrence_from_last - simultaneously track "rolling" episodes with reference windows calculated from the first and last event of the previous window.
strata. Options must be the same in each strata.
from_last - simultaneously track episodes in both directions of time - past to present and present to past.
episodes_max - simultaneously track different numbers of episodes within the dataset.
include_overlap_method - "overlap" and "none" will not be combined with other methods.
"overlap" - mutually inclusive with the other methods, so their inclusion is not necessary.
"none" - mutually exclusive and prioritised over the other methods (including "none"), so their inclusion is not necessary.
NA_real_) or periods (number_line(NA_real_, NA_real_)) case_length and recurrence_length. This ensures that the event does not become an index case; however, it can still be part of a different episode. For reference, an event with a missing strata (NA_character_) ensures that the event does not become an index case nor part of any episode.
fixed_episodes, rolling_episodes and episode_group - include_index_period didn’t work in certain situations. Corrected.
fixed_episodes, rolling_episodes and episode_group - dist_from_wind was wrong in certain situations. Corrected.
record_group() - strata. Perform record linkage separately within subsets of a dataset.
overlap(), compress_number_line(), fixed_episodes(), rolling_episodes() and episode_group() - overlap_methods and methods. Replaces overlap_method and method respectively. Use different sets of methods within the same dataset when grouping episodes or collapsing number_line objects.
overlap_method and method only permit 1 method per dataset.
epid objects - win_nm. Shows the type of window each event belongs to i.e. case or recurrence window.
epid objects - win_id. Unique ID for each window. The ID is the sn of the reference event for each window.
epid objects updated to reflect this.
epid objects - dist_from_wind. Shows the duration of each event from its window’s reference event.
epid objects - dist_from_epid. Shows the duration of each event from its episode’s reference event.
episode_group() and rolling_episodes() - recurrence_from_last. Determine if reference events should be the first or last event from the previous window.
episode_group() and rolling_episodes() - case_for_recurrence. Determine if recurrent events should have their own case windows or not.
episode_group(), fixed_episodes() and rolling_episodes() - data_links. Unlink episodes that do not include records from certain data_source(s).
episode_group(), fixed_episodes() and rolling_episodes() - case_length and recurrence_length arguments. You can now use a range (number_line object).
episode_group(), fixed_episodes() and rolling_episodes() - include_index_period. If TRUE, overlaps with the index event or period are grouped together even if they are outside the cut-off range (case_length or recurrence_length).
pid objects - link_id. Shows the record (sn slot) to which every record in the dataset has matched.
invert_number_line(). Invert the left and/or right points to the opposite end of the number line.
left_point(x)<-, right_point(x)<-, start_point(x)<- and end_point(x)<-
overlap() renamed to overlaps(). overlap() is now a convenience overlap_method to capture ANY kind of overlap.
"none" is another convenience overlap_method for NO kind of overlap.
expand_number_line() - new options for point; "left" and "right".
compress_number_line() - compressed number_line object inherits the direction of the widest number_line among an overlapping group of number_line objects.
overlap_methods - have been changed such that each pair of number_line objects can only overlap in one way. E.g. "chain" and "aligns_end" used to be possible but this is now considered a "chain" overlap only.
"aligns_start" and "aligns_end" used to be possible but this is now considered an "exact" overlap.
number_line_sequence() - Output is now a list.
number_line_sequence() - now works across multiple number_line objects.
to_df() - can now change number_line objects to data.frames. to_s4() can do the reverse.
epid objects are the default outputs for fixed_episodes(), rolling_episodes() and episode_group().
pid objects are the default outputs for record_group().
case_nm for events that were skipped due to rolls_max or episodes_max is now "Skipped".
episode_group() and record_group(), sn can be negative numbers but must still be unique.
episode_group() and record_group(). Runs just a little bit faster …
x and y to have the same lengths in overlap functions.
episode_group - case_length and recurrence_length arguments. Now accepts negative numbers.
end_point() of the first period.
number_line_width(), both will be collapsed if the second one is within some days (or any other episode_unit) before the start_point() of the first period.
case_nm wasn’t right for rolling episodes. Resolved.
episode_group(), fixed_episodes() and rolling_episodes() - optimized to take less time when working with large datasets.
episode_group(), fixed_episodes() and rolling_episodes() - date argument now supports numeric values.
compress_number_line() - the output (gid slot) is now a group identifier just like in epid objects (epid_interval).
pid S4 object class for results of record_group(). This will replace the current default (data.frame) in the next major release.
epid S4 object class for results of episode_group(), fixed_episodes() and rolling_episodes(). This will replace the current default (data.frame) in the next release.
to_s4() and to_s4 argument in record_group(), episode_group(), fixed_episodes() and rolling_episodes(). Changes their output from a data.frame (current default) to epid or pid objects.
to_df() changes epid or pid objects to a data.frame.
deduplicate argument from fixed_episodes() and rolling_episodes() added to episode_group().
fixed_episodes() and rolling_episodes() are now wrapper functions of episode_group(). Functionality remains the same but now includes all arguments available to episode_group().
fixed_episodes() and rolling_episodes() from number_line to data.frame, pending the change to epid objects.
pid_cri column returned in record_group is now numeric. 0 indicates no match.
criteria multiple times
record_group()
number_line objects can now be used as a criteria in record_group().
episode_unit in episode_group()
bi_direction in episode_group()
fixed_episodes() and rolling_episodes() - Group records into fixed or rolling episodes of events or period of events.
episode_group() - A more comprehensive implementation of fixed_episodes() and rolling_episodes(), with additional features such as user defined case assignment.
record_group() - Multistage deterministic linkage that addresses missing data.
number_line S4 object.
record_group()
fixed_episodes(), rolling_episodes() and episode_group()
fixed_episodes() and rolling_episodes()