High-performance Unicode and Punycode encoding/decoding for internationalized domain names (IDNs) in R.
The punycoder package addresses critical gaps in R’s URL
processing capabilities by providing reliable, fast conversion between
Unicode and ASCII representations of domain names. It follows RFC 3492
standards and is designed for robust handling of internationalized
domain names in web scraping, data analysis, and URL processing
workflows.
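To make the RFC 3492 conversion concrete, here is a minimal base R sketch of the Punycode encoding step. It is not punycoder's implementation, which this README describes as a C++ or libidn2 backend. The helper names `puny_digit`, `puny_adapt`, `sketch_encode_label`, and `sketch_encode_domain` are hypothetical, and the sketch skips the Unicode normalization, case mapping, and overflow checks that a full IDNA implementation needs.

```r
# Illustrative only: a base R sketch of RFC 3492 encoding (hypothetical helpers).

puny_digit <- function(d) {
  # Digit values 0..25 map to "a".."z", 26..35 map to "0".."9".
  substr("abcdefghijklmnopqrstuvwxyz0123456789", d + 1, d + 1)
}

puny_adapt <- function(delta, numpoints, first_time) {
  # Bias adaptation from RFC 3492, section 6.1.
  base <- 36; tmin <- 1; tmax <- 26; skew <- 38; damp <- 700
  delta <- if (first_time) delta %/% damp else delta %/% 2
  delta <- delta + delta %/% numpoints
  k <- 0
  while (delta > ((base - tmin) * tmax) %/% 2) {
    delta <- delta %/% (base - tmin)
    k <- k + base
  }
  k + ((base - tmin + 1) * delta) %/% (delta + skew)
}

sketch_encode_label <- function(label) {
  base <- 36; tmin <- 1; tmax <- 26
  cps <- utf8ToInt(enc2utf8(label))
  basic <- cps[cps < 128]
  # Basic (ASCII) code points are copied first, followed by a delimiter.
  out <- vapply(basic, intToUtf8, character(1))
  b <- length(basic)
  h <- b
  if (b > 0) out <- c(out, "-")
  n <- 128; delta <- 0; bias <- 72
  while (h < length(cps)) {
    m <- min(cps[cps >= n])          # next code point to insert
    delta <- delta + (m - n) * (h + 1)
    n <- m
    for (cp in cps) {
      if (cp < n) delta <- delta + 1
      if (cp == n) {
        # Emit delta as a generalized variable-length integer.
        q <- delta
        k <- base
        repeat {
          thr <- if (k <= bias) tmin else if (k >= bias + tmax) tmax else k - bias
          if (q < thr) break
          out <- c(out, puny_digit(thr + (q - thr) %% (base - thr)))
          q <- (q - thr) %/% (base - thr)
          k <- k + base
        }
        out <- c(out, puny_digit(q))
        bias <- puny_adapt(delta, h + 1, h == b)
        delta <- 0
        h <- h + 1
      }
    }
    delta <- delta + 1
    n <- n + 1
  }
  paste0(out, collapse = "")
}

sketch_encode_domain <- function(domain) {
  # Encode each dot-separated label; ASCII-only labels pass through unchanged.
  labels <- strsplit(domain, ".", fixed = TRUE)[[1]]
  encoded <- vapply(labels, function(lbl) {
    if (all(utf8ToInt(enc2utf8(lbl)) < 128)) lbl
    else paste0("xn--", sketch_encode_label(lbl))
  }, character(1), USE.NAMES = FALSE)
  paste(encoded, collapse = ".")
}

sketch_encode_domain("caf\u00e9.com")
#> [1] "xn--caf-dma.com"
```

The sketch works label by label because only labels containing non-ASCII characters receive the `xn--` prefix. For real workloads, the package's own `puny_encode()` is the intended entry point.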
punycoder has a small dependency footprint:
- R (>= 3.5.0), Rcpp
- libidn2 (detected at compile time)
- pkg-config (used by configure to detect libidn2)
- testthat, knitr, rmarkdown

You can install the development version of punycoder from GitHub with:

# install.packages("remotes")
remotes::install_github("bart-turczynski/punycoder")

punycoder works without extra system libraries. If
libidn2 is available at build time, the package enables a
native backend automatically; otherwise it uses the built-in C++
fallback backend.
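As a rough way to predict which backend a source build is likely to pick, the following base R sketch asks pkg-config whether libidn2 is visible. The helper `libidn2_visible()` is hypothetical and not part of punycoder. It assumes the package's configure script relies on the same pkg-config lookup, so treat the result as a hint rather than a guarantee.

```r
# Hypothetical helper (not part of punycoder): TRUE if pkg-config can find libidn2.
libidn2_visible <- function() {
  pkg_config <- unname(Sys.which("pkg-config"))
  if (!nzchar(pkg_config)) {
    return(FALSE)  # pkg-config itself is not on the PATH
  }
  # `pkg-config --exists <module>` exits with status 0 when the module is found.
  status <- suppressWarnings(
    system2(pkg_config, c("--exists", "libidn2"), stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

if (libidn2_visible()) {
  message("libidn2 found: a source build can likely enable the native backend.")
} else {
  message("libidn2 not found: a source build would likely use the built-in fallback.")
}
```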
To install the recommended optional dependency:
- macOS (Homebrew): brew install libidn2 pkg-config
- Debian/Ubuntu: sudo apt-get install libidn2-0-dev pkg-config
- Fedora: sudo dnf install libidn2-devel pkgconf-pkg-config
- Arch Linux: sudo pacman -S libidn2 pkgconf

Verify the library is visible before installing punycoder from source:

system("pkg-config --modversion libidn2")

Then install/reinstall punycoder:

remotes::install_github("bart-turczynski/punycoder")

library(punycoder)
# Basic encoding
puny_encode("café.com")
#> [1] "xn--caf-dma.com"
# Check if domain is punycode
is_punycode("xn--example")
#> [1] TRUE
# Validate domains
validate_domain("test.com")
#> Punycoder Domain Validation Results
#> ==================================
#>
#> Domain: test.com
#> Valid: TRUE

punycoder uses libidn2 when available, with a built-in fallback backend.

Process international websites with Unicode domain names:
international_urls <- c(
"https://café.paris.fr/menu",
"https://москва.рф/news",
"https://北京.中国/info"
)
# Convert for HTTP requests
ascii_urls <- url_encode(international_urls)

Clean and standardize URL datasets:
# Identify international domains
is_idn(c("café.com", "example.com", "москва.рф"))
# Validate domain names
validate_domain(c("valid.com", "invalid..domain"))

punycoder currently provides:
- puny_encode(), puny_decode()
- url_encode(), url_decode(), parse_url()
- is_punycode(), is_idn(), validate_domain()
- a backend chosen at build time (libidn2 when present, built-in fallback otherwise)

punycoder builds on Rcpp and, when available, libidn2. It is inspired by urltools and is designed to provide a robust fix for punycode encode/decode issues that may arise in urltools workflows.

We welcome contributions. See CONTRIBUTING.md for the current development workflow.
License: MIT