Package {punycoder}


Type: Package
Title: Unicode and Punycode Domain Name Processing
Version: 1.0.0
Description: High-performance Unicode and Punycode encoding/decoding for internationalized domain names. Provides RFC 3492 compliant conversion functions with a focus on URL processing and data analysis workflows. Addresses limitations in existing R packages for handling international domain names in web scraping and URL parsing applications.
Depends: R (≥ 3.5.0)
Imports: Rcpp (≥ 1.0.0)
LinkingTo: Rcpp
SystemRequirements: GNU libidn2 (optional, for native punycode backend)
License: MIT + file LICENSE
URL: https://github.com/bart-turczynski/punycoder
BugReports: https://github.com/bart-turczynski/punycoder/issues
Encoding: UTF-8
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown
VignetteBuilder: knitr
Config/roxygen2/version: 8.0.0
NeedsCompilation: yes
Packaged: 2026-06-05 18:13:57 UTC; bartturczynski
Author: Bart Turczynski [aut, cre]
Maintainer: Bart Turczynski <bartek+punycoder@turczynski.pl>
Repository: CRAN
Date/Publication: 2026-06-11 12:20:02 UTC

Unicode and Punycode Domain Name Processing

Description

Provides high-performance functions for encoding and decoding internationalized domain names according to RFC 3492 (Punycode) and IDNA standards.

Details

The punycoder package fills a critical gap in R's ecosystem for handling international domain names. It provides reliable, fast conversion between Unicode and ASCII representations of domain names.

Author(s)

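Maintainer: Bart Turczynski <bartek+punycoder@turczynski.pl>

See Also

Useful links:

https://github.com/bart-turczynski/punycoder

Report bugs at https://github.com/bart-turczynski/punycoder/issues
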

Test if domain contains internationalized characters

Description

Determines whether a domain name contains Unicode characters that would require punycode encoding for ASCII compatibility.

Usage

is_idn(x)

Arguments

x

Character vector of domain names to test

Value

A logical vector the same length as x, where TRUE indicates the element contains non-ASCII Unicode characters.

See Also

is_punycode for detecting punycode domains, puny_encode for encoding Unicode domains.

Examples


is_idn("caf\u00E9.com") # TRUE
is_idn("example.com")    # FALSE
is_idn(c(
  "caf\u00E9.com",
  "\u043C\u043E\u0441\u043A\u0432\u0430.\u0440\u0444",
  "test.com"
))  # c(TRUE, TRUE, FALSE)
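
As a rough illustration of what "contains non-ASCII Unicode characters" means, the following base R sketch flags any string with a code point above 127. The helper name has_non_ascii is hypothetical and is not part of punycoder; is_idn may apply further rules and may treat NA differently.

```r
# Hypothetical helper (not part of punycoder): TRUE when a string holds
# at least one code point outside the 7-bit ASCII range.
has_non_ascii <- function(x) {
  vapply(
    enc2utf8(as.character(x)),
    function(s) !is.na(s) && any(utf8ToInt(s) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

has_non_ascii(c("caf\u00E9.com", "example.com"))  # TRUE FALSE
```

In this sketch NA inputs yield FALSE, which is a choice made for the example only.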


Test if string is punycode encoded

Description

Determines whether a given string or domain name is already encoded in punycode format (starts with the xn-- prefix).

Usage

is_punycode(x)

Arguments

x

Character vector to test

Value

A logical vector the same length as x, where TRUE indicates the element contains a punycode-encoded label (xn-- prefix).

See Also

is_idn for detecting Unicode domains, puny_decode for decoding punycode domains.

Examples


is_punycode("xn--example") # TRUE
is_punycode("example.com") # FALSE
is_punycode(c("xn--caf-dma.com", "regular.com"))  # c(TRUE, FALSE)
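
The prefix test described above can be approximated in base R with a single regular expression. This is a sketch under the assumption that any dot-separated label starting with "xn--" counts; has_ace_label is a hypothetical name, and is_punycode may be stricter (for example, by checking that the label actually decodes).

```r
# Hypothetical helper (not part of punycoder): TRUE when any dot-separated
# label starts with the ACE prefix "xn--", compared case-insensitively.
has_ace_label <- function(x) {
  grepl("(^|\\.)xn--", as.character(x), ignore.case = TRUE)
}

has_ace_label(c("xn--caf-dma.com", "regular.com", "www.XN--80adxhks.xn--p1ai"))
# TRUE FALSE TRUE
```

Note that grepl returns FALSE for NA inputs, so missing values are not propagated by this sketch.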


Parse URLs with internationalized domain name handling

Description

Parses URLs and returns a structured list with proper handling of internationalized domain names. This function provides both Unicode and ASCII representations of domain components.

Usage

parse_url(url, encode_domains = FALSE)

Arguments

url

Character vector of URLs to parse

encode_domains

Logical; if TRUE, parsed host names are encoded to ASCII. Defaults to FALSE.

Value

An object of class "punycoder_parsed_url" (a named list) with components:

scheme

Character vector of URL schemes (e.g., "https").

domain

Character vector of domain names.

port

Integer vector of port numbers.

path

Character vector of URL paths.

query

Character vector of query strings.

fragment

Character vector of fragment identifiers.

Each component has one element per input URL. Invalid URLs yield NA components. For valid URLs without an explicit path, path is returned as "".

See Also

url_encode, url_decode for URL transformation with IDN handling.

Examples


# Parse URL with Unicode domain
parse_url(
  "https://caf\u00E9.example.com:8080/path?query=value#fragment"
)

# Parse multiple URLs
urls <- c(
  "https://caf\u00E9.com/menu",
  "https://\u043C\u043E\u0441\u043A\u0432\u0430.\u0440\u0444/info"
)
parse_url(urls)
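
To illustrate how a URL breaks down into the components listed under Value, here is a base R sketch built on the generic URI regular expression from RFC 3986, Appendix B. The helper split_url_sketch is hypothetical, handles a single non-NA URL, and does not describe how parse_url is implemented; in particular it performs no validation and no IDN handling.

```r
# Generic URI-splitting expression from RFC 3986, Appendix B
uri_pattern <- "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?"

# Hypothetical helper (not part of punycoder): split one URL string
split_url_sketch <- function(url) {
  # m[1] is the full match; m[3], m[5], m[6], m[8], m[10] are the
  # scheme, authority, path, query, and fragment groups
  m <- regmatches(url, regexec(uri_pattern, url))[[1]]
  hostport <- sub("^.*@", "", m[5])  # authority without any userinfo
  port_txt <- regmatches(hostport, regexpr(":[0-9]+$", hostport))
  list(
    scheme   = m[3],
    domain   = sub(":[0-9]*$", "", hostport),
    port     = if (length(port_txt)) as.integer(substring(port_txt, 2L)) else NA_integer_,
    path     = m[6],
    query    = m[8],
    fragment = m[10]
  )
}

parts <- split_url_sketch(
  "https://caf\u00E9.example.com:8080/path?query=value#fragment"
)
parts$domain  # "caf\u00E9.example.com"
parts$port    # 8080
```

Components that are absent from the URL come back as "" in this sketch, except the port, which is NA_integer_.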


Print method for punycoder parsed URL results

Description

Print method for punycoder parsed URL results

Usage

## S3 method for class 'punycoder_parsed_url'
print(x, ...)

Arguments

x

A punycoder_parsed_url object

...

Additional arguments (ignored)

Value

Invisibly returns x.


Print method for punycoder validation results

Description

Print method for punycoder validation results

Usage

## S3 method for class 'punycoder_validation'
print(x, ...)

Arguments

x

A punycoder_validation object

...

Additional arguments (ignored)

Value

Invisibly returns x.


Decode ASCII punycode to Unicode domains

Description

Converts ASCII punycode domain names back to their Unicode representation. This is the reverse operation of puny_encode and is useful for displaying human-readable domain names.

Usage

puny_decode(x, strict = getOption("punycoder.strict", TRUE))

Arguments

x

Character vector of ASCII punycode domains to decode

strict

Logical; whether to apply strict validation. Defaults to 'getOption("punycoder.strict", TRUE)'.

Value

A character vector the same length as x, with each element containing the Unicode-decoded domain name. Elements corresponding to NA inputs are NA_character_. In non-strict mode, domains that fail decoding are also returned as NA_character_.

See Also

puny_encode for the reverse operation, url_decode for full URL decoding.

Examples


# Basic decoding
puny_decode("xn--caf-dma.com")
puny_decode("xn--80adxhks.xn--p1ai")

# Vectorized decoding
ascii_domains <- c("xn--caf-dma.com", "xn--80adxhks.xn--p1ai")
puny_decode(ascii_domains)
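
For readers curious about what decoding involves, the sketch below implements the RFC 3492 decoding procedure for a single label in base R. It is an illustration only: the names punycode_decode_label and to_unicode_sketch are hypothetical, the code skips overflow checks and all IDNA validation, and punycoder's compiled implementation may differ in every detail.

```r
# Hypothetical helper (not part of punycoder): decode one label whose
# "xn--" prefix has already been removed (RFC 3492, section 6.2).
punycode_decode_label <- function(label) {
  base <- 36; tmin <- 1; tmax <- 26; skew <- 38; damp <- 700

  # Bias adaptation (RFC 3492, section 6.1)
  adapt <- function(delta, numpoints, firsttime) {
    delta <- if (firsttime) delta %/% damp else delta %/% 2
    delta <- delta + delta %/% numpoints
    k <- 0
    while (delta > ((base - tmin) * tmax) %/% 2) {
      delta <- delta %/% (base - tmin)
      k <- k + base
    }
    k + ((base - tmin + 1) * delta) %/% (delta + skew)
  }

  chars <- strsplit(label, "")[[1]]
  last_dash <- max(c(0L, which(chars == "-")))

  # Basic (ASCII) code points sit before the last "-" and are copied as-is
  out <- utf8ToInt(paste(chars[seq_len(max(last_dash - 1L, 0L))], collapse = ""))

  # Remaining characters are base-36 digits: a-z = 0..25, 0-9 = 26..35
  rest <- chars[seq_along(chars) > last_dash]
  digits <- match(tolower(rest), c(letters, 0:9)) - 1
  if (anyNA(digits)) stop("invalid punycode digit")

  n <- 128; i <- 0; bias <- 72; pos <- 1
  while (pos <= length(digits)) {
    oldi <- i; w <- 1; k <- base
    repeat {  # read one variable-length integer into i
      if (pos > length(digits)) stop("truncated punycode input")
      d <- digits[pos]
      pos <- pos + 1
      i <- i + d * w
      t <- min(max(k - bias, tmin), tmax)
      if (d < t) break
      w <- w * (base - t)
      k <- k + base
    }
    len <- length(out) + 1
    bias <- adapt(i - oldi, len, oldi == 0)
    n <- n + i %/% len
    i <- i %% len
    out <- append(out, n, after = i)  # insert code point n at position i
    i <- i + 1
  }
  intToUtf8(out)
}

# Hypothetical helper: decode every "xn--" label of one domain name
to_unicode_sketch <- function(domain) {
  labels <- strsplit(domain, ".", fixed = TRUE)[[1]]
  decoded <- vapply(labels, function(l) {
    if (startsWith(tolower(l), "xn--")) punycode_decode_label(substring(l, 5L)) else l
  }, character(1), USE.NAMES = FALSE)
  paste(decoded, collapse = ".")
}

to_unicode_sketch("xn--caf-dma.com")  # "caf\u00E9.com"
```

Labels without the prefix pass through unchanged, which is why "com" survives as-is.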


Encode Unicode domains to ASCII punycode

Description

Converts Unicode domain names to their ASCII punycode representation following RFC 3492 standards. This function is essential for processing internationalized domain names (IDNs) in web scraping and URL analysis.

Usage

puny_encode(x, strict = getOption("punycoder.strict", TRUE))

Arguments

x

Character vector of Unicode domain names to encode

strict

Logical; whether to apply strict validation. Defaults to 'getOption("punycoder.strict", TRUE)'.

Value

A character vector the same length as x, with each element containing the ASCII punycode-encoded domain name. Elements corresponding to NA inputs are NA_character_. In non-strict mode, domains that fail encoding are also returned as NA_character_.

See Also

puny_decode for the reverse operation, url_encode for full URL encoding.

Examples


# Basic encoding
puny_encode("caf\u00E9.com")
puny_encode("\u043C\u043E\u0441\u043A\u0432\u0430.\u0440\u0444")

# Vectorized encoding
domains <- c(
  "caf\u00E9.com",
  "\u043C\u043E\u0441\u043A\u0432\u0430.\u0440\u0444",
  "\u5317\u4EAC.\u4E2D\u56FD"
)
puny_encode(domains)
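
The RFC 3492 encoding that this function is described as following can be sketched in base R as shown below. This is not the package's implementation: punycode_encode_label and to_ascii_sketch are hypothetical names, and the sketch omits overflow checks as well as the mapping, normalization, and validation steps that full IDNA processing involves.

```r
# Hypothetical helper (not part of punycoder): encode one label with the
# RFC 3492 procedure (section 6.3); the "xn--" prefix is added separately.
punycode_encode_label <- function(label) {
  base <- 36; tmin <- 1; tmax <- 26; skew <- 38; damp <- 700
  digits <- c(letters, 0:9)  # digit values 0..25 = a-z, 26..35 = 0-9

  # Bias adaptation (RFC 3492, section 6.1)
  adapt <- function(delta, numpoints, firsttime) {
    delta <- if (firsttime) delta %/% damp else delta %/% 2
    delta <- delta + delta %/% numpoints
    k <- 0
    while (delta > ((base - tmin) * tmax) %/% 2) {
      delta <- delta %/% (base - tmin)
      k <- k + base
    }
    k + ((base - tmin + 1) * delta) %/% (delta + skew)
  }

  cp <- utf8ToInt(enc2utf8(label))
  basic <- cp[cp < 128]

  # Copy the ASCII code points first, followed by a delimiter if any exist
  out <- if (length(basic) > 0) c(intToUtf8(basic, multiple = TRUE), "-") else character(0)

  n <- 128; delta <- 0; bias <- 72
  b <- length(basic); h <- b
  while (h < length(cp)) {
    m <- min(cp[cp >= n])  # next code point to insert
    delta <- delta + (m - n) * (h + 1)
    n <- m
    for (ch in cp) {
      if (ch < n) delta <- delta + 1
      if (ch == n) {
        q <- delta
        k <- base
        repeat {  # emit delta as a variable-length integer
          t <- min(max(k - bias, tmin), tmax)
          if (q < t) break
          out <- c(out, digits[t + (q - t) %% (base - t) + 1])
          q <- (q - t) %/% (base - t)
          k <- k + base
        }
        out <- c(out, digits[q + 1])
        bias <- adapt(delta, h + 1, h == b)
        delta <- 0
        h <- h + 1
      }
    }
    delta <- delta + 1
    n <- n + 1
  }
  paste(out, collapse = "")
}

# Hypothetical helper: encode only the labels that contain non-ASCII text
to_ascii_sketch <- function(domain) {
  labels <- strsplit(enc2utf8(domain), ".", fixed = TRUE)[[1]]
  encoded <- vapply(labels, function(l) {
    if (all(utf8ToInt(l) < 128)) l else paste0("xn--", punycode_encode_label(l))
  }, character(1), USE.NAMES = FALSE)
  paste(encoded, collapse = ".")
}

to_ascii_sketch("caf\u00E9.com")  # "xn--caf-dma.com"
```

Encoding label by label is what keeps ASCII-only parts such as "com" untouched.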


Decode URLs with ASCII punycode domains to Unicode

Description

Converts URLs containing ASCII punycode domain names back to their Unicode representation for display purposes. This function makes internationalized URLs human-readable.

Usage

url_decode(url, strict = getOption("punycoder.strict", TRUE))

Arguments

url

Character vector of URLs with ASCII punycode domains

strict

Logical; whether to apply strict validation. Defaults to 'getOption("punycoder.strict", TRUE)'.

Value

A character vector the same length as url, with each element containing the URL with its host portion decoded to Unicode. Only the domain component is transformed; scheme, path, query, and fragment are preserved. Elements corresponding to NA inputs are NA_character_.

See Also

url_encode for the reverse operation, puny_decode for domain-only decoding, parse_url for URL component extraction.

Examples


# Basic URL decoding
url_decode("https://xn--caf-dma.example.com/path")
url_decode("https://xn--80adxhks.xn--p1ai/page")

# Vectorized URL decoding
ascii_urls <- c(
  "https://xn--caf-dma.com/menu",
  "https://xn--1qqw23a.xn--55qx5d/info"
)
url_decode(ascii_urls)


Encode URLs with Unicode domains to ASCII

Description

Converts URLs containing Unicode domain names to their ASCII representation while preserving the rest of the URL structure. This function is essential for preparing URLs for systems that require ASCII-only domain names.

Usage

url_encode(url, strict = getOption("punycoder.strict", TRUE))

Arguments

url

Character vector of URLs with potential Unicode domains

strict

Logical; whether to apply strict validation. Defaults to 'getOption("punycoder.strict", TRUE)'.

Value

A character vector the same length as url, with each element containing the URL with its host portion ASCII-encoded. Only the domain component is transformed; scheme, path, query, and fragment are preserved. Elements corresponding to NA inputs are NA_character_.

See Also

url_decode for the reverse operation, puny_encode for domain-only encoding, parse_url for URL component extraction.

Examples


# Basic URL encoding
url_encode("https://caf\u00E9.example.com/path?query=value")
url_encode(
  "https://\u043C\u043E\u0441\u043A\u0432\u0430.\u0440\u0444/page"
)

# Vectorized URL encoding
urls <- c(
  "https://caf\u00E9.com/menu",
  "https://\u5317\u4EAC.\u4E2D\u56FD/info"
)
url_encode(urls)


Comprehensive domain name validation

Description

Validates domain names according to RFC standards, checking for proper format, length restrictions, and character requirements. Supports both Unicode and ASCII domain names.

Usage

validate_domain(x, strict = getOption("punycoder.strict", TRUE))

Arguments

x

Character vector of domain names to validate

strict

Logical; whether to apply strict validation. Defaults to 'getOption("punycoder.strict", TRUE)'.

Value

An object of class "punycoder_validation" (a named list) with components:

domains

Character vector of the input domain names.

valid

Logical vector indicating whether each domain is valid.

errors

List of character vectors, each containing error messages for the corresponding domain (empty for valid domains).

See Also

puny_encode for encoding validated domains.

Examples


# Single ASCII and Unicode domains
validate_domain("example.com")
validate_domain("caf\u00E9.example.com")

# Vectorized input, including an empty label and a 250-character label
long_label <- strrep("x", 250)
validate_domain(c("valid.com", "invalid..com", long_label))
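
The exact rules applied by validate_domain are not listed on this page. Assuming the length restrictions follow the usual DNS limits (at most 63 octets per label and 253 octets for the full name in text form), a base R sketch of those two checks plus an empty-label check might look as follows. The helper check_domain_lengths is hypothetical; it only mirrors the shape of the documented return value and covers none of the character rules.

```r
# Hypothetical helper (not part of punycoder): length and empty-label checks.
# The limits apply to the ASCII form, so Unicode names should be encoded first.
check_domain_lengths <- function(x) {
  errors <- lapply(x, function(d) {
    labels <- strsplit(d, ".", fixed = TRUE)[[1]]
    sizes <- nchar(labels, type = "bytes")
    msgs <- character(0)
    if (nchar(d, type = "bytes") > 253L) {
      msgs <- c(msgs, "domain longer than 253 octets")
    }
    if (length(labels) == 0L || any(sizes == 0L)) {
      msgs <- c(msgs, "empty label")
    }
    if (any(sizes > 63L)) {
      msgs <- c(msgs, "label longer than 63 octets")
    }
    msgs
  })
  list(domains = x, valid = lengths(errors) == 0L, errors = errors)
}

res <- check_domain_lengths(c("valid.com", "invalid..com", strrep("x", 250)))
res$valid  # TRUE FALSE FALSE
```

Collecting messages per domain, rather than stopping at the first failure, matches the list-of-character-vectors layout described for the errors component.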
