The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
adaR is a wrapper for ada-url, a WHATWG-compliant and fast URL parser written in modern C++ .
It implements several auxilliary functions to work with urls:
utils::URLdecode
(~40x speedup)More general information on URL parsing can be found in the introductory vignette via vignette("adaR")
.
adaR
is part of a series of R packages to analyse webtracking data:
You can install the development version of adaR from GitHub with:
The version on CRAN can be installed with
This is a basic example which shows all the returned components of a URL.
library(adaR)
ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag")
#> href protocol username
#> 1 https://user_1:password_1@example.org:8080/api?q=1#frag https: user_1
#> password host hostname port pathname search hash
#> 1 password_1 example.org:8080 example.org 8080 /api ?q=1 #frag
/*
* https://user:pass@example.com:1234/foo/bar?baz#quux
* | | | | ^^^^| | |
* | | | | | | | `----- hash_start
* | | | | | | `--------- search_start
* | | | | | `----------------- pathname_start
* | | | | `--------------------- port
* | | | `----------------------- host_end
* | | `---------------------------------- host_start
* | `--------------------------------------- username_end
* `--------------------------------------------- protocol_end
*/
It solves some problems of urltools with more complex urls.
urltools::url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.
7z/data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
#> scheme domain port
#> 1 https 40.7519848,-74.0015045,14.\n 7z <NA>
#> path
#> 1 data=!4m5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#> parameter fragment
#> 1 <NA> <NA>
ada_url_parse("https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m
5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519")
#> href
#> 1 https://www.google.com/maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#> protocol username password host hostname port
#> 1 https: www.google.com www.google.com
#> pathname
#> 1 /maps/place/Pennsylvania+Station/@40.7519848,-74.0015045,14.7z/data=!4m 5!3m4!1s0x89c259ae15b2adcb:0x7955420634fd7eba!8m2!3d40.750568!4d-73.993519
#> search hash
#> 1
A “raw” url parse using ada is extremely fast (see ada-url.com) but for this to carry over to R is tricky. The performance is still compatible with urltools::url_parse
with the noted advantage in accuracy in some practical circumstances.
bench::mark(
ada = ada_url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag", decode = FALSE),
urltools = urltools::url_parse("https://user_1:password_1@example.org:8080/dir/../api?q=1#frag"),
iterations = 1, check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 ada 2.43ms 2.43ms 411. 2.49KB 0
#> 2 urltools 526.26µs 526.26µs 1900. 2.49KB 0
For further benchmark results, see benchmark.md
in data_raw
.
There are four more groups of functions available to work with url parsing:
ada_get_*()
get a specific componentada_has_*()
check if a specific component is presentada_set_*()
set a specific component from URLSada_clear_*()
remove a specific component from URLSpublic_suffix()
extracts their top level domain from the public suffix list, excluding private domains.
urls <- c(
"https://subsub.sub.domain.co.uk",
"https://domain.api.gov.uk",
"https://thisisnotpart.butthisispartoftheps.kawasaki.jp"
)
public_suffix(urls)
#> [1] "co.uk" "gov.uk"
#> [3] "butthisispartoftheps.kawasaki.jp"
If you are wondering about the last url. The list also contains wildcard suffixes such as *.kawasaki.jp
which need to be matched.
The logo is created from this portrait of Ada Lovelace, a very early pioneer in Computer Science.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.