metaphonebr

The goal of metaphonebr is to simplify Brazilian names phonetically using a custom metaphoneBR algorithm that preserves ending vowels. It was created to aid dataset pairing when unambiguous keys are absent.

Installation

The package is in the process of submission to CRAN. When it is accepted, the stable version can be installed with:

install.packages("metaphonebr")

You can install the development version of metaphonebr from GitHub with:

# install.packages("remotes")
remotes::install_github("ipeadata-lab/metaphonebr")

Example

This is a basic example which shows how to use the main function:

example_names <- c("João da Silva", "Maria", "Marya",
                    "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr::metaphonebr(example_names)
print(data.frame(original = example_names, metaphonebr = phonetic_codes))
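
Since the package targets dataset pairing without unambiguous keys, a minimal sketch of that use might look like the following. The two toy tables and their column names (`registry`, `survey`, `id_a`, `id_b`) are hypothetical, and whether a given pair of spellings actually receives the same code depends on the package's rules, so no output is shown.

```r
# Two hypothetical tables with no shared key; the names are spelled differently.
registry <- data.frame(name = c("Philippe", "Helena"), id_a = c(1, 2))
survey   <- data.frame(name = c("Filipe", "Elena"), id_b = c(10, 20))

# Run only when the package is installed, so the script is safe to source.
if (requireNamespace("metaphonebr", quietly = TRUE)) {
  # Encode each name column, then join on the phonetic code instead of the raw name.
  registry$code <- metaphonebr::metaphonebr(registry$name)
  survey$code   <- metaphonebr::metaphonebr(survey$name)

  # Rows sharing a code are candidate matches, to be reviewed before linking.
  candidates <- merge(registry, survey, by = "code", suffixes = c("_a", "_b"))
  print(candidates)
}
```

Joining on the code yields candidate pairs rather than confirmed matches, so in practice it would usually be combined with other fields such as birth date or municipality.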

The metaphoneBR phonetic encoding algorithm proceeds as follows:

  1. Initial Cleanup & Preparation
  2. Silent Letter Removal
  3. Digraph Simplification (Sound Grouping)
  4. Similar Consonant Simplification
  5. Terminal Nasal Sound Simplification
  6. Duplicate Vowel Removal
  7. Final Cleanup (Duplicate Letters & Spaces)
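
To make the shape of this pipeline concrete, here is a rough base-R sketch that follows the seven steps in order. It is not the package's implementation: the function name `sketch_metaphone_br` is hypothetical, and every substitution rule in it (which letters count as silent, which digraphs are merged, and so on) is an assumption chosen for illustration.

```r
# Illustrative sketch only: the rules below are assumptions, not the package's rule set.
sketch_metaphone_br <- function(names) {
  # 1. Initial cleanup: strip accents (result is platform dependent), upper-case,
  #    and keep only letters and spaces.
  x <- iconv(names, from = "UTF-8", to = "ASCII//TRANSLIT")
  x <- toupper(x)
  x <- gsub("[^A-Z ]", "", x)

  # 2. Silent letters: drop an H at the start of a word (assumed rule).
  x <- gsub("\\bH", "", x, perl = TRUE)

  # 3. Digraphs: group spellings that share a sound (assumed rules).
  x <- gsub("PH", "F", x, fixed = TRUE)
  x <- gsub("CH", "X", x, fixed = TRUE)

  # 4. Similar letters: merge letters that sound alike (assumed rules).
  x <- chartr("YWZ", "IVS", x)

  # 5. Terminal nasal: treat a word-final N as M (assumed rule).
  x <- gsub("N\\b", "M", x, perl = TRUE)

  # 6. Duplicate vowels: collapse runs of the same vowel.
  x <- gsub("([AEIOU])\\1+", "\\1", x, perl = TRUE)

  # 7. Final cleanup: collapse any remaining doubled letters and extra spaces.
  x <- gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)
  trimws(gsub(" +", " ", x))
}

# Under these toy rules, both spellings collapse to the same code.
sketch_metaphone_br(c("Philippe", "Filipe"))
```

Note that none of the steps touches the final vowel of a word, so ADRIANO and ADRIANA stay distinct in this sketch as well.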

The resulting code attempts to represent the phonetic signature of the name in a simplified, standardized form suited to Brazilian Portuguese. By construction, it preserves ending vowels, since these generally indicate gender in Brazilian names (e.g., ADRIANO and ADRIANA).

Note

metaphonebr is developed by a team of researchers at Instituto de Pesquisa Econômica Aplicada (Ipea).
