links() - Incorrect results in some situations. Resolved.
links_af_probabilistic() - Failed in some situations. Resolved.
links() - batched. New option, "semi". All matches are compared against the record-set in the next iteration. Therefore, the number of record-pairs increases exponentially as new matches are found. Compared to the "no" option, this means fewer record-pairs (less memory usage) but a longer run time. Compared to the "yes" option, it leads to more record-pairs (more memory usage) but a shorter run time.
episodes() - batched.
episodes() - split. Split the analysis into N splits of strata. This leads to fewer record-pairs (and less memory usage) but a longer run time.
as.data.frame.pid(), as.data.frame.epid() and as.data.frame.pane() - decode.
episodes_af_shift(). A more vectorised approach to episodes() based on epidm::group_time().
links_wf_episodes(). Implementation of episodes() using links().
episodes() and links(). Each iteration now uses less time and memory.
The link_id slot in pid objects is now a list.
links() - records with missing values in a sub_criteria are now skipped at the corresponding iteration.
links() - recursive. This now takes any of three options: c("linked", "unlinked", "none"). Together, c("linked", "unlinked") correspond to the previous TRUE, while "none" corresponds to the previous FALSE.
as.epids() now calls make_episodes().
The window argument in partitions() now defaults to NULL.
as.data.frame() and as.data.list() now only create elements/fields from non-empty fields.
The id and gid slots in number_line objects are now integer(0) by default.
episode_group(), record_group() and range_match_legacy() have been removed.
"recursive" episodes from episodes() are now presented as "rolling" episodes with reference_event = "all_records", i.e.
Old syntax ~ episodes(..., episode_type = "recursive")
New syntax ~ episodes(..., episode_type = "rolling", reference_event = "all_records")
When recursive was TRUE, links() ended prematurely and therefore missed some matches. Resolved.
recurrence_sub_criteria in episodes() was not implemented correctly and led to incorrect linkage results in some instances. Resolved.
overlap_method() - logical tests recycled incorrectly. Resolved.
check_links argument - Option "g" implemented as option "l". Resolved.
make_pairs_wf_source() - Created incorrect pairs. Resolved.
case_sub_criteria and recurrence_sub_criteria in episodes() led to incorrect results. Resolved.
merge_ids() - shrink and expand.
plot.
format.
true(). Predefined logical test for use with sub_criteria().
false(). Predefined logical test for use with sub_criteria().
links() - batched. Specify if all record pairs are created or compared at once ("no") or in batches ("yes").
links() - repeats_allowed. Specify if record-pairs with duplicate elements should be created.
links() - permutations_allowed. Specify if permutations of the same record-pair should be created.
links() - ignore_same_source. Specify if record-pairs from different datasets should be created.
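This minimal base-R sketch shows what repeats_allowed and permutations_allowed control by listing record-pairs by position. make_pairs_sketch() is a hypothetical helper written for illustration, not a diyar function. It assumes that a "repeat" is a self-pair such as (1, 1) and a "permutation" is the reversed copy of a pair, such as (2, 1) for (1, 2).

```r
# Hypothetical helper for illustration only; it is not a diyar function.
# Lists record-pairs (x_pos, y_pos) for n records, assuming:
#   repeats_allowed      - keep self-pairs such as (1, 1)
#   permutations_allowed - keep both (1, 2) and (2, 1)
make_pairs_sketch <- function(n, repeats_allowed = FALSE,
                              permutations_allowed = FALSE) {
  # Every ordered combination of record positions
  pairs <- expand.grid(x_pos = seq_len(n), y_pos = seq_len(n))
  if (!repeats_allowed) {
    # Drop pairs where a record is compared with itself
    pairs <- pairs[pairs$x_pos != pairs$y_pos, ]
  }
  if (!permutations_allowed) {
    # Keep a single orientation of each pair
    pairs <- pairs[pairs$x_pos <= pairs$y_pos, ]
  }
  rownames(pairs) <- NULL
  pairs
}

make_pairs_sketch(3)
make_pairs_sketch(3, permutations_allowed = TRUE)
```

With the defaults, n records give choose(n, 2) pairs. Allowing permutations doubles that count, which is one reason why creating every pair at once can become memory-intensive on large datasets.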
eval_sub_criteria() - depth. First order of recursion.
sets() and make_sets(). Create permutations of record-sets.
links() - When shrink is TRUE, records in a record-group must meet every listed match criterion and sub_criteria. For example, if pid_cri is 3, then the record must have matched another on the first three match criteria.
links() - pid@iteration now tracks when a record was dealt with instead of when it was assigned to a record-group. For example, a record can be closed (matched or not matched) at iteration 1 but assigned to a record-group at iteration 5.
make_pairs() - x.* and y.* values in the output are now swapped.
sub_criteria can now export any data created by match_func. To do this, match_func must return a list whose first element is a logical object. See the example below.

library(diyar)
val <- rep(month.abb[1:5], 2); val
#> [1] "Jan" "Feb" "Mar" "Apr" "May" "Jan" "Feb" "Mar" "Apr" "May"
match_and_export <- function(x, y){
output <- list(x == y,
data.frame(x_val = x, y_val = y, is_match = x == y))
return(output)
}
sub.cri.1 <- sub_criteria(
val, match_funcs = list(match.export = match_and_export)
)
format(sub.cri.1, show_levels = TRUE)
#> logical_test-{
#> Lv.0.1-match.export(Jan,Feb,Mar ...)
#> }
eval_sub_criteria(sub.cri.1)
#> $logical_test
#> [1] 1 0 0 0 0 1 0 0 0 0
#>
#> $mf.0.1
#> x_val y_val is_match
#> 1 Jan Jan TRUE
#> 2 Feb Jan FALSE
#> 3 Mar Jan FALSE
#> 4 Apr Jan FALSE
#> 5 May Jan FALSE
#> 6 Jan Jan TRUE
#> 7 Feb Jan FALSE
#> 8 Mar Jan FALSE
#> 9 Apr Jan FALSE
#> 10 May Jan FALSE

links() can now export any data created within a sub_criteria. To do this, the sub_criteria must be created as described above. See the example below.

val <- 1:5
diff_one_and_export <- function(x, y){
diff <- x - y
is_match <- diff <= 1
output <- list(is_match,
data.frame(x_val = x, y_val = y, diff = diff, is_match = is_match))
return(output)
}
sub.cri.2 <- sub_criteria(
val, match_funcs = list(diff.export = diff_one_and_export)
)
links(
criteria = "place_holder",
sub_criteria = list("cr1" = sub.cri.2))
#> $pid
#> [1] "P.1 (CRI 001)" "P.1 (CRI 001)" "P.3 (CRI 001)" "P.3 (CRI 001)"
#> [5] "P.5 (No hits)"
#>
#> $export
#> $export$cri.1
#> $export$cri.1$iteration.1
#> $export$cri.1$iteration.1$mf.0.1
#> x_val y_val diff is_match
#> 1 1 1 0 TRUE
#> 2 2 1 1 TRUE
#> 3 3 1 2 FALSE
#> 4 4 1 3 FALSE
#> 5 5 1 4 FALSE
#>
#>
#> $export$cri.1$iteration.2
#> $export$cri.1$iteration.2$mf.0.1
#> x_val y_val diff is_match
#> 1 3 3 0 TRUE
#> 2 4 3 1 TRUE
#> 3 5 3 2 FALSE

summary.epid() - Incorrect count for ‘by episode type’. Resolved.
episodes() - Incorrect results in some instances with skip_order. Resolved.
make_ids() - Did not capture all records that should be in a record-group when matches are recursive. Resolved.
make_pairs() - Incorrect record-pairs in some instances. Resolved.
eval_sub_criteria() - When the output of match_func was length one, it was not recycled. Resolved.
reverse_number_line() - Incorrect results in some instances. Resolved.
links() - Incorrect iteration (pids slot) for non-matches. Resolved.
links() and episodes() - Timing for each iteration was incorrect. Resolved.
overlap_method_names(). Overlap method names for the corresponding overlap method codes.
*with_report options for display.
"chain" overlap method split into "x_chain_y" and "y_chain_x". "chain" will continue to be supported as a keyword for the "x_chain_y" OR "y_chain_x" methods.
"across" overlap method split into "x_across_y" and "y_across_x". "across" will continue to be supported as a keyword for the "x_across_y" OR "y_across_x" methods.
"inbetween" overlap method split into "x_inbetween_y" and "y_inbetween_x". "inbetween" will continue to be supported as a keyword for the "x_inbetween_y" OR "y_inbetween_x" methods.
overlaps().
overlap_method_names().
make_batch_pairs() (internal) created invalid record pairs. Resolved.
reframe(). Modify the attributes of a sub_criteria object.
link_records(). Record linkage by creating all record pairs at once, as opposed to in batches as with links().
make_pairs(). Create every combination of record-pairs for a given dataset.
make_pairs_wf_source(). Create record-pairs from different sources only.
make_ids(). Convert an edge list to a group identifier.
merge_ids(). Merge two group identifiers.
attrs(). Pass a set of attributes to one instance of match_funcs or equal_funcs.
episodes_wf_splits()
episodes() and links(). Reduced processing times.
display argument - "progress_with_report", "stats_with_report" and "none_with_report". These create a d_report, a status of the analysis over its run time.
eval_sub_criteria(). Record-pairs are no longer created in the function. Therefore, the index_record and sn arguments have been replaced with x_pos and y_pos.
link_records() and links_wf_probabilistic(). The cmp_threshold argument has been renamed to attr_threshold.
show_labels argument in schema(). Two new options, "wind_nm" and "length", replace "length_label".
wind_id list in episodes(..., data_link = "XX") in . Resolved.
link_id in links(..., recursive = TRUE). Resolved.
iteration not recorded in some situations with episodes(). Resolved.
skip_order ends an open episode. Resolved.
NA in dist_wind_index and dist_epid_index when sn is supplied. Resolved.
overlap_method_codes() - overlap method codes not recycled properly. Resolved.
delink(). Unlink identifiers.
episodes_wf_splits(). Wrapper function of episodes(). Better optimised for handling datasets with many duplicate records.
combi(). Numeric codes for unique combinations of vectors.
attr_eval(). Recursive evaluation of a function on each attribute of a sub_criteria.
case_nm values - Case_CR and Recurrence_CR, which are Case and Recurrence without a sub-criteria match.
schema.epid.
eval_sub_criteria with 1 result.
links_wf_probabilistic(). Probabilistic record linkage.
partitions(). Split events into sections in time.
schema(). Plot schema diagrams for pid, epid, pane and number_line objects.
encode() and decode(). Encode and decode slot values to minimise memory usage.
episodes() - case_sub_criteria and recurrence_sub_criteria. Additional matching conditions for temporal links.
episodes() - case_length_total and recurrence_length_total. Number of temporal links required for a window/episode.
links() - recursive. Control if matches can spawn new matches.
links() - check_duplicates. Control the checking of logical tests on duplicate values. If FALSE, results are recycled for the duplicates.
as.data.frame and as.list S3 methods for the pid, number_line, epid and pane objects.
episode_type in episodes() - "recursive". For recursive episodes where every linked event can be used as a subsequent index event.
recurrence_from_last renamed to reference_event and given two new options.
episodes() and links(). Speed improvements.
epid_interval or pane_interval with POSIXct objects is now "GMT".
number_line_sequence() - splits number_line objects. Also available as a seq method.
epid_total, pid_total and pane_total slots are populated by default. No need to use group_stats to get these.
to_df() - Removed. Use as.data.frame() instead.
to_s4() - Now an internal function. It’s no longer exported.
compress_number_line() - Now an internal function. It’s no longer exported. Use episodes() instead.
sub_criteria() - produces a sub_criteria object. Nested "AND" and "OR" conditions are now possible.
case_overlap_methods, recurrence_overlap_methods and overlap_methods now take integer codes for different combinations of overlap methods. See overlap_methods$options for the full list. Character inputs are still supported.
"Single-record" was wrong in links summary output. Resolved.
Inf in number_line objects.
case_length or recurrence_length for the same event.
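A concrete picture of a "fixed" episode makes the episode-tracking arguments easier to follow. The sketch below is a simplified base-R illustration, not diyar's algorithm, and fixed_episodes_sketch() is a hypothetical name. It makes four assumptions: dates are numeric, there is a single non-negative case_length, there are no recurrence windows, and each episode is labelled with the position (sn-like) of its index event.

```r
# Hypothetical sketch for illustration only; not diyar's episodes().
# The earliest unassigned event becomes an index event; every later event
# within `case_length` units of it (inclusive) joins the same episode.
fixed_episodes_sketch <- function(date, case_length) {
  ord <- order(date)
  epid <- integer(length(date))
  current_end <- -Inf
  current_id <- NA_integer_
  for (i in ord) {
    if (date[i] > current_end) {
      # Outside the current case window: start a new episode here
      current_id <- i
      current_end <- date[i] + case_length
    }
    epid[i] <- current_id
  }
  epid
}

fixed_episodes_sketch(c(1, 3, 10, 12, 30), case_length = 5)
#> [1] 1 1 3 3 5
```

An event exactly case_length units after its index event is included here. That inclusive boundary is a choice of this sketch, not a documented rule.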
overlap_methods for the corresponding case_length and recurrence_length.
links() to replace record_group().
sub_criteria(). The new way of supplying a sub_criteria in links().
exact_match(), range_match() and range_match_legacy(). Predefined logical tests for use with sub_criteria(). User-defined tests can also be used. See ?sub_criteria.
custom_sort() for nested sorting.
epid_lengths() to show the required case_length or recurrence_length for an analysis. Useful in confirming the required case_length or recurrence_length for episode tracking.
epid_windows(). Shows the period a date will overlap with, given a particular case_length or recurrence_length. Useful in confirming the required case_length or recurrence_length for episode tracking.
strata in links(). Useful for stratified data linkage. As in stratified episode tracking, a record with a missing strata (NA_character_) is skipped from data linkage.
data_links in links(). Unlink record groups that do not include records from certain data sources.
listr(). Format atomic vectors as a written list.
combns(). An extension of combn to generate permutations not ordinarily captured by combn.
iteration slot for pid and epid objects.
overlap_method - reverse()
number_line() - l and r must have the same length or be length 1.
episodes() - case_nm differentiates between duplicates of "Case" ("Duplicate_C") and "Recurrent" events ("Duplicate_R").
episodes().
"Case").
episode_type - simultaneously track both "fixed" and "rolling" episodes.
skip_if_b4_lengths - simultaneously track episodes where events before a cut-off range are both skipped and not skipped.
episode_unit - simultaneously track episodes by different units of time.
case_for_recurrence - simultaneously track "rolling" episodes with and without an additional case window for recurrent events.
recurrence_from_last - simultaneously track "rolling" episodes with reference windows calculated from the first and last event of the previous window.
strata. Options must be the same in each stratum.
from_last - simultaneously track episodes in both directions of time, past to present and present to past.
episodes_max - simultaneously track different numbers of episodes within the dataset.
include_overlap_method - "overlap" and "none" will not be combined with other methods.
"overlap" - mutually inclusive with the other methods, so their inclusion is not necessary.
"none" - mutually exclusive and prioritised over the other methods (including "overlap"), so their inclusion is not necessary.
Missing values (NA_real_) or periods (number_line(NA_real_, NA_real_)) in case_length and recurrence_length. This ensures that the event does not become an index case; however, it can still be part of a different episode. For reference, an event with a missing strata (NA_character_) does not become an index case nor part of any episode.
fixed_episodes, rolling_episodes and episode_group - include_index_period didn’t work in certain situations. Corrected.
fixed_episodes, rolling_episodes and episode_group - dist_from_wind was wrong in certain situations. Corrected.
record_group() - strata. Perform record linkage separately within subsets of a dataset.
overlap(), compress_number_line(), fixed_episodes(), rolling_episodes() and episode_group() - overlap_methods and methods. These replace overlap_method and method respectively, and allow different sets of methods within the same dataset when grouping episodes or collapsing number_line objects. overlap_method and method only permit one method per dataset.
epid objects - win_nm. Shows the type of window each event belongs to, i.e. a case or recurrence window.
epid objects - win_id. Unique ID for each window. The ID is the sn of the reference event for each window.
epid objects updated to reflect this.
epid objects - dist_from_wind. Shows the duration of each event from its window’s reference event.
epid objects - dist_from_epid. Shows the duration of each event from its episode’s reference event.
episode_group() and rolling_episodes() - recurrence_from_last. Determine if reference events should be the first or last event from the previous window.
episode_group() and rolling_episodes() - case_for_recurrence. Determine if recurrent events should have their own case windows or not.
episode_group(), fixed_episodes() and rolling_episodes() - data_links. Unlink episodes that do not include records from certain data_source(s).
episode_group(), fixed_episodes() and rolling_episodes() - case_length and recurrence_length arguments. You can now use a range (number_line object).
episode_group(), fixed_episodes() and rolling_episodes() - include_index_period. If TRUE, overlaps with the index event or period are grouped together even if they are outside the cut-off range (case_length or recurrence_length).
pid objects - link_id. Shows the record (sn slot) to which every record in the dataset has matched.
invert_number_line(). Invert the left and/or right points to the opposite end of the number line.
left_point(x)<-, right_point(x)<-, start_point(x)<- and end_point(x)<-
overlap() renamed to overlaps(). overlap() is now a convenience overlap_method to capture ANY kind of overlap.
"none" is another convenience overlap_method for NO kind of overlap.
expand_number_line() - new options for point: "left" and "right".
compress_number_line() - a compressed number_line object inherits the direction of the widest number_line among an overlapping group of number_line objects.
overlap_methods - have been changed such that each pair of number_line objects can only overlap in one way. E.g. "chain" and "aligns_end" used to be possible together, but this is now considered a "chain" overlap only.
"aligns_start" and "aligns_end" used to be possible together, but this is now considered an "exact" overlap.
number_line_sequence() - Output is now a list.
number_line_sequence() - now works across multiple number_line objects.
to_df() - can now change number_line objects to data.frames. to_s4() can do the reverse.
epid objects are the default outputs for fixed_episodes(), rolling_episodes() and episode_group().
pid objects are the default outputs for record_group().
case_nm for events that were skipped due to rolls_max or episodes_max is now "Skipped".
episode_group() and record_group() - sn can be negative numbers but must still be unique.
episode_group() and record_group() - Runs just a little bit faster …
x and y to have the same lengths in overlap functions.
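The overlap-method rules above can be sketched in base R: one method per pair, "chain" taking precedence over "aligns_end", and "aligns_start" plus "aligns_end" counting as "exact". overlap_sketch() is hypothetical and is not diyar's overlap_method(). The definition of each method and the priority order are assumptions inferred from these notes. Decreasing intervals are normalised with pmin()/pmax(), and inputs are recycled, so x and y need not have the same length.

```r
# Hypothetical sketch for illustration only; not diyar's overlap_method().
# Classifies how each interval x = [x_l, x_r] overlaps y = [y_l, y_r],
# assigning exactly one method per pair. Method definitions and their
# priority order are assumptions based on the notes above.
overlap_sketch <- function(x_l, x_r, y_l, y_r) {
  n <- max(length(x_l), length(x_r), length(y_l), length(y_r))
  # Normalise direction (start <= end) and recycle every input to length n
  x_s <- rep_len(pmin(x_l, x_r), n)
  x_e <- rep_len(pmax(x_l, x_r), n)
  y_s <- rep_len(pmin(y_l, y_r), n)
  y_e <- rep_len(pmax(y_l, y_r), n)

  exact <- x_s == y_s & x_e == y_e
  # One interval ends where the other starts (takes priority over aligns_*)
  chain <- !exact & (x_e == y_s | y_e == x_s)
  aligns_start <- !exact & !chain & x_s == y_s
  aligns_end <- !exact & !chain & x_e == y_e
  # One interval lies strictly inside the other
  inbetween <- (x_s > y_s & x_e < y_e) | (y_s > x_s & y_e < x_e)
  # Partial overlap with no shared end points
  across <- (x_s < y_s & x_e > y_s & x_e < y_e) |
    (y_s < x_s & y_e > x_s & y_e < x_e)

  out <- rep("none", n)
  out[across] <- "across"
  out[inbetween] <- "inbetween"
  out[aligns_end] <- "aligns_end"
  out[aligns_start] <- "aligns_start"
  out[chain] <- "chain"
  out[exact] <- "exact"
  out
}

# x = [1, 5] compared with eight different y intervals
overlap_sketch(1, 5, c(1, 5, 2, 3, 1, 0, 5, 7), c(5, 9, 4, 9, 9, 5, 5, 9))
#> [1] "exact"        "chain"        "inbetween"    "across"
#> [5] "aligns_start" "aligns_end"   "chain"        "none"
```

The seventh pair, [1, 5] against [5, 5], meets both the "chain" and "aligns_end" conditions. The priority order reports it as "chain" only.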
episode_group - case_length and recurrence_length arguments. Now accept negative numbers.
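Because case_length and recurrence_length now accept negative numbers, a window can reach back in time from its reference event. The following minimal base-R sketch illustrates that idea. window_bounds() and in_window() are hypothetical helpers, not diyar functions. It is also an assumption that a negative length spans date + length to date.

```r
# Hypothetical helpers for illustration only; not diyar functions.
# Assumption: a positive case_length extends the window forward from `date`,
# a negative one extends it backward, so start <= end either way.
window_bounds <- function(date, case_length) {
  data.frame(start = pmin(date, date + case_length),
             end = pmax(date, date + case_length))
}

# TRUE where `event` falls inside the window opened at `date`
in_window <- function(event, date, case_length) {
  w <- window_bounds(date, case_length)
  event >= w$start & event <= w$end
}

window_bounds(10, -3)
#>   start end
#> 1     7  10
in_window(c(8, 12), date = 10, case_length = -3)
#> [1]  TRUE FALSE
in_window(c(8, 12), date = 10, case_length = 3)
#> [1] FALSE  TRUE
```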
end_point() of the first period.
number_line_width(), both will be collapsed if the second one is within some days (or any other episode_unit) before the start_point() of the first period.
case_nm wasn’t right for rolling episodes. Resolved.
episode_group(), fixed_episodes() and rolling_episodes() - optimized to take less time when working with large datasets.
episode_group(), fixed_episodes() and rolling_episodes() - date argument now supports numeric values.
compress_number_line() - the output (gid slot) is now a group identifier just like in epid objects (epid_interval).
pid S4 object class for results of record_group(). This will replace the current default (data.frame) in the next major release.
epid S4 object class for results of episode_group(), fixed_episodes() and rolling_episodes(). This will replace the current default (data.frame) in the next release.
to_s4() and to_s4 argument in record_group(), episode_group(), fixed_episodes() and rolling_episodes(). Changes their output from a data.frame (current default) to epid or pid objects.
to_df() changes epid or pid objects to a data.frame.
deduplicate argument from fixed_episodes() and rolling_episodes() added to episode_group().
fixed_episodes() and rolling_episodes() are now wrapper functions of episode_group(). Functionality remains the same but now includes all arguments available to episode_group().
fixed_episodes() and rolling_episodes() from number_line to data.frame, pending the change to epid objects.
pid_cri column returned in record_group is now numeric. 0 indicates no match.
criteria multiple times in record_group()
number_line objects can now be used as a criteria in record_group().
episode_unit in episode_group()
bi_direction in episode_group()
fixed_episodes() and rolling_episodes() - Group records into fixed or rolling episodes of events or periods of events.
episode_group() - A more comprehensive implementation of fixed_episodes() and rolling_episodes(), with additional features such as user-defined case assignment.
record_group() - Multistage deterministic linkage that addresses missing data.
number_line S4 object.
record_group()
fixed_episodes(), rolling_episodes() and episode_group()
fixed_episodes() and rolling_episodes()