The goal of metaphonebr is to simplify Brazilian names phonetically using a custom metaphoneBR algorithm that preserves ending vowels. It was created to help pair datasets that lack unambiguous keys.
The package is being submitted to CRAN. Once it is accepted, the stable version can be installed with:
install.packages("metaphonebr")
You can install the development version of metaphonebr from GitHub with:
# install.packages("remotes")
remotes::install_github("ipeadata-lab/metaphonebr")
This is a basic example which shows how to use the main function:
example_names <- c("João da Silva", "Maria", "Marya",
                   "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr::metaphonebr(example_names)
print(data.frame(original = example_names, metaphonebr = phonetic_codes))
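Since the package exists to pair datasets that lack unambiguous keys, a natural next step is to join two tables on the phonetic codes. The sketch below does this with base R's `merge()`. The helper `pair_by_phonetic_code()` and its arguments are hypothetical and not part of metaphonebr. The encoder is passed in as a function, so `metaphonebr::metaphonebr` can be plugged in when the package is installed.

```r
# Hypothetical helper (not part of metaphonebr): join two data frames on the
# phonetic code of a name column. `encode` can be any function that maps a
# character vector to a character vector of codes.
pair_by_phonetic_code <- function(a, b, encode, by = "name") {
  a$code <- encode(a[[by]])
  b$code <- encode(b[[by]])
  # Inner join on the code; the suffixes keep both original spellings.
  merge(a, b, by = "code", suffixes = c("_a", "_b"))
}

# Only runs if the package is installed.
if (requireNamespace("metaphonebr", quietly = TRUE)) {
  people_a <- data.frame(name = c("Philippe", "Marya"), id_a = 1:2)
  people_b <- data.frame(name = c("Filipe", "Maria"), id_b = 3:4)
  print(pair_by_phonetic_code(people_a, people_b, metaphonebr::metaphonebr))
}
```

An exact join on a phonetic code can also match different people who share a similar name. In practice, the codes are probably best treated as a blocking key and combined with other fields such as birth date or municipality.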
The metaphoneBR phonetic encoding algorithm proceeds as follows:
- LH is replaced by 1 (representing a palatal lateral approximant, as in “Filha” -> “FI1A”).
- NH is replaced by 3 (representing a palatal nasal, as in “Manhã” -> “MA3A”).
- CH is replaced by X (representing the /ʃ/ sound, as in “Chico” -> “XICO”).
- SH is replaced by X (for foreign names with the /ʃ/ sound, as in “Shirley” -> “XIRLEY”).
- SCH is replaced by X (approximating /ʃ/ or /sk/, as in “Schmidt” -> “XMIT”).
- PH is replaced by F (as in “Philip” -> “FILIP”).
- SC followed by E or I becomes S (as in “SCENA” -> “SENA”).
- SC followed by A, O, or U becomes SK (as in “ESCOVA” -> “ESKOVA”).
- QU or QÜ followed by E or I becomes K (e.g., “QUEIJO” -> “KEIJO”).
- GU or GÜ followed by E or I becomes G (the U is silent, e.g., “GUERRA” -> “GERRA”).
- QU becomes K (e.g., “QUANTO” -> “KANTO”).
- Ç is replaced by S.
- C followed by E or I is replaced by S (as in “CELSO” -> “SELSO”).
- C (not part of an already transformed digraph such as CH or SC) is replaced by K (as in “CARLOS” -> “KARLOS”).
- G followed by E or I is replaced by J (as in “GELO” -> “JELO”; GUE/GUI are already handled).
- Q (not part of QU) is replaced by K.
- W is replaced by V (the common Brazilian Portuguese pronunciation, e.g., “WALTER” -> “VALTER”).
- Y is replaced by I (e.g., “YARA” -> “IARA”).
- Z is replaced by S (e.g., “ZEBRA” -> “SEBRA”).
- X followed by S is removed (e.g., “EXCELENTE” -> “ESELENTE”: once C before E has become S, dropping the X avoids a double /s/ representation from SKS).
- N is replaced by M (e.g., “JOAQUIN” -> “JOAQUIM”).
- AO is replaced by OM (e.g., “JOÃO” -> “JOOM”).
- ÃES is replaced by AES (e.g., “MÃES” -> “MAES”).
- Repeated consecutive letters (… 1 for LH or 3 for NH) are reduced to a single letter (e.g., “CARRO” might become “CARO”, and “LESSA” becomes “LESA”). This rule simplifies sounds like RR and SS to their single counterparts, which is a common Metaphone-style simplification.

The resulting code is an attempt to represent the phonetic signature of a name in a simplified, standardized way for a Brazilian Portuguese context. In particular, it preserves ending vowels by construction, since in Brazilian names they generally carry gender information (e.g., ADRIANO and ADRIANA).
metaphonebr is developed by a team of researchers at Instituto de Pesquisa Econômica Aplicada (Ipea).