In this example we use the same dataset as in the "Blocking records for record linkage" vignette.
reclin2 package

The package provides the function pair_ann(), which is designed for
integration with the reclin2 package. It works as follows.
pair_ann(x = census[1:1000],
         y = cis[1:1000],
         on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
         deduplication = FALSE) |>
  head()

| .x | .y | block |
|---|---|---|
| 204 | 1 | 1 |
| 204 | 176 | 1 |
| 204 | 375 | 1 |
| 204 | 391 | 1 |
| 204 | 405 | 1 |
| 204 | 424 | 1 |
The result lists the candidate pairs together with the block each pair
belongs to, which also gives the total number of pairs. It can be passed
on to the rest of the reclin2 pipeline (note that we use a different ANN
algorithm this time).
pair_ann(x = census[1:1000],
         y = cis[1:1000],
         on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
         deduplication = FALSE,
         ann = "hnsw") |>
  compare_pairs(on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc"),
                comparators = list(cmp_jarowinkler())) |>
  score_simple("score",
               on = c("pername1", "pername2", "sex", "dob_day", "dob_mon", "dob_year", "enumcap", "enumpc")) |>
  select_threshold("threshold", score = "score", threshold = 6) |>
  link(selection = "threshold") |>
  head()

| .y | .x | person_id.x | pername1.x | pername2.x | sex.x | dob_day.x | dob_mon.x | dob_year.x | hse_num | enumcap.x | enumpc.x | str_nam | cap_add | census_id | x | person_id.y | pername1.y | pername2.y | sex.y | dob_day.y | dob_mon.y | dob_year.y | enumcap.y | enumpc.y | cis_id | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 945 | DE256NG039003 | HARRIET | THOMSON | F | 12 | 1 | 1995 | 39 | 39 SPRINGFIELD ROAD | DE256NG | Springfield Road | 39, Springfield Road | CENSDE256NG039003 | 945 | DE256NG039003 | HARRIET | THOMSON | F | 12 | 1 |  | 39 SPRINGFIELD ROAD | DE256NG | CISDE256NG039003 | 11 |
| 71 | 427 | DE159QA062001 | LEWIS | GREEN | M | 23 | 3 | 1973 | 62 | 62 CHURCH ROAD | DE159QA | Church Road | 62, Church Road | CENSDE159QA062001 | 427 | DE159QA062001 | LEWIS | GREEN | M | 23 | 3 |  | 62 CHURCH ROAD | DE159QA | CISDE159QA062001 | 71 |
| 83 | 720 | DE237GG025002 | IMOGEN | DARIS | F | 6 | 4 | 1968 | 25 | 25 WOODLANDS ROAD | DE237GG | Woodlands Road | 25, Woodlands Road | CENSDE237GG025002 | 720 | DE237GG025002 | IMOGEW | DAVIS | F | 6 | 4 |  | 25 WOODLANDS ROAD | DE237GG | CISDE237GG025002 | 83 |
| 99 | 136 | DE125LU022001 | DANIEC | MICCER | M | 21 | 4 | 1947 | 22 | 22 PARK LANE | DE125LU | Park Lane | 22, Park Lane | CENSDE125LU022001 | 136 | DE125LU022001 | DAMIEL | HILLER | M | 21 | 4 |  | 22 PARK LANE | DE125LU | CISDE125LU022001 | 99 |
| 154 | 949 | DE256NG040002 | CHLOE | WILSON | F | 5 | 7 | 1978 | 40 | 40 SPRINGFIELD ROAD | DE256NG | Springfield Road | 40, Springfield Road | CENSDE256NG040002 | 949 | DE256NG040002 | CHLOE | WILSOM | F | 5 | 7 |  | 40 SPRINGFIELD ROAD | DE256NG | CISDE256NG040002 | 154 |
| 156 | 549 | DE159QY035002 | AVA | KING | F | 7 | 7 | 1969 | 35 | 35 CHURCH ROAD | DE159QY | Church Road | 35, Church Road | CENSDE159QY035002 | 549 | DE159QY035002 | AVA | KING | F | 7 | 7 |  | 35 CHURCH ROAD | DE159QY | CISDE159QY035002 | 156 |
fastLink package

Use the block column as the blocking variable in the
fastLink::blockData() function. The result is a list
of blocked records for further processing.
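
A minimal sketch of this step, assuming the blocks are available as a pairs table with the columns .x, .y and block shown in the pair_ann() output above. The data frames df_a and df_b and their contents are made up for illustration. The fastLink::blockData() call runs only if fastLink is installed, and passing the block identifier through varnames is an assumption about how the column is meant to be used.

```r
# Toy records (hypothetical data, not the census/cis datasets).
df_a <- data.frame(name = c("ANNA", "BEN", "CARL"), stringsAsFactors = FALSE)
df_b <- data.frame(name = c("ANA", "BENN"), stringsAsFactors = FALSE)

# Candidate pairs in the same layout as the pair_ann() output:
# .x indexes rows of df_a, .y indexes rows of df_b.
pairs <- data.frame(.x = c(1, 2, 3), .y = c(1, 2, 2), block = c(1, 2, 2))

# Copy the block identifier onto both data frames.
# Records that appear in no pair would get NA.
df_a$block <- pairs$block[match(seq_len(nrow(df_a)), pairs$.x)]
df_b$block <- pairs$block[match(seq_len(nrow(df_b)), pairs$.y)]

if (requireNamespace("fastLink", quietly = TRUE)) {
  # Exact blocking on the block column; the result is a list with one
  # element per block, each holding row indices for both data frames.
  blocks <- fastLink::blockData(df_a, df_b, varnames = "block", n.cores = 1)
  str(blocks[[1]])
}
```

Each element of the returned list can then be used to subset df_a and df_b before running the linkage within that block.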
RecordLinkage package

Use the block column in the blockfld argument of the
compare.dedup() or compare.linkage() function. Please note that
for the RecordLinkage package the block column
should be stored as a character vector, not a
numeric/integer vector.
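
The conversion can be sketched as follows, assuming both data frames already carry a numeric block column. The toy data frames df_a and df_b are hypothetical, and compare.linkage() is called only if RecordLinkage is installed. The blocking column is given by its column index here.

```r
# Toy records with a numeric block identifier (hypothetical data).
df_a <- data.frame(name = c("ANNA", "BEN", "CARL"), block = c(1, 2, 2),
                   stringsAsFactors = FALSE)
df_b <- data.frame(name = c("ANA", "BENN"), block = c(1, 2),
                   stringsAsFactors = FALSE)

# Store the block identifier as character rather than numeric/integer.
df_a$block <- as.character(df_a$block)
df_b$block <- as.character(df_b$block)

if (requireNamespace("RecordLinkage", quietly = TRUE)) {
  # Only records that share the same block value are compared.
  block_col <- which(names(df_a) == "block")
  rl_pairs <- RecordLinkage::compare.linkage(df_a, df_b, blockfld = block_col)
  print(rl_pairs$pairs)
}
```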