fuzzylink: Probabilistic Record Linkage Using Pretrained Text Embeddings

Links datasets through fuzzy string matching using pretrained text embeddings. Produces more accurate record linkage when lexical string distance metrics are a poor guide to match quality (e.g., "Patricia" is more lexically similar to "Patrick" than it is to "Trish"). Capable of performing multilingual record linkage. Methods are described in Ornstein (2025) <https://joeornstein.github.io/publications/fuzzylink.pdf>.
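The point about lexical distance can be checked with a small sketch. The Python snippet below is an illustration only: it does not use fuzzylink or its API, and the `levenshtein` helper is a hypothetical name introduced here. It computes plain edit distance on lowercased names and shows that a purely lexical metric ranks "Patrick" as the closer match to "Patricia", even though "Trish" is the name that can refer to the same person.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


name = "patricia"
candidates = ["patrick", "trish"]

# Lexical distance from the query name to each candidate.
distances = {c: levenshtein(name, c) for c in candidates}
closest = min(distances, key=distances.get)

print(distances)  # {'patrick': 2, 'trish': 5}
print(closest)    # patrick
```

A metric based on pretrained text embeddings, as the description says the package uses, is intended to avoid this kind of ranking error; how the package does so is covered in the reference manual and the cited paper.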

Version: 0.2.1
Depends: R (≥ 4.1.0)
Imports: stats, utils, dplyr, Rfast, reshape2, stringdist, stringr, httr, jsonlite, httr2, ranger
Published: 2025-06-14
DOI: 10.32614/CRAN.package.fuzzylink
Author: Joe Ornstein [aut, cre, cph]
Maintainer: Joe Ornstein <jornstein at uga.edu>
BugReports: https://github.com/joeornstein/fuzzylink/issues
License: MIT + file LICENSE
URL: https://github.com/joeornstein/fuzzylink
NeedsCompilation: no
Materials: README NEWS
CRAN checks: fuzzylink results

Documentation:

Reference manual: fuzzylink.pdf

Downloads:

Package source: fuzzylink_0.2.1.tar.gz
Windows binaries: r-devel: fuzzylink_0.2.1.zip, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): fuzzylink_0.2.1.tgz, r-oldrel (arm64): fuzzylink_0.2.1.tgz, r-release (x86_64): not available, r-oldrel (x86_64): not available

Linking:

Please use the canonical form https://CRAN.R-project.org/package=fuzzylink to link to this page.
