In cheapr, ‘cheap’ means fast and memory-efficient, and that’s exactly the philosophy that cheapr aims to follow.
You can install cheapr like so:
install.packages("cheapr")
or you can install the development version of cheapr:
remotes::install_github("NicChr/cheapr")
Some common operations that cheapr can do much faster and more efficiently include:
Counting, finding, removing and replacing NA and scalar values
Creating factors
Creating multiple sequences in a vectorised way
Sub-setting vectors and data frames efficiently
Safe, flexible and fast greatest common divisor and lowest common multiple
Lags/leads
Lightweight integer64 support
In-memory Math (no copies, vectors updated by reference)
Summary statistics of data frame variables
Binning of continuous data
Let’s first load the required packages
library(cheapr)
library(bench)
NA
Because R works mostly with vectors and vectorised operations, it offers few scalar-optimised operations.
cheapr provides tools to efficiently count, find, replace and remove scalars.
# Setup data with NA values
set.seed(42)
x <- sample(1:5, 30, TRUE)
x <- na_insert(x, n = 7)
cheapr_table(x, order = TRUE) # Fast table()
#> 1 2 3 4 5 <NA>
#> 6 6 3 4 4 7
NA functions
na_count(x)
#> [1] 7
na_rm(x)
#> [1] 1 5 1 2 4 2 1 4 5 4 2 3 1 1 3 4 5 5 2 3 2 1 2
na_find(x)
#> [1] 4 8 11 15 22 24 26
na_replace(x, -99)
#> [1] 1 5 1 -99 2 4 2 -99 1 4 -99 5 4 2 -99 3 1 1 3
#> [20] 4 5 -99 5 -99 2 -99 3 2 1 2
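For orientation, the NA helpers above correspond roughly to the following base R idioms. This is only a sketch of the semantics on a small made-up vector `v` (not data from this README); it says nothing about how cheapr implements these functions.

```r
# Hypothetical example vector, chosen for illustration only
v <- c(1L, NA, 3L, NA, 5L)

n_na   <- sum(is.na(v))               # comparable to na_count(v)
no_na  <- v[!is.na(v)]                # comparable to na_rm(v)
na_pos <- which(is.na(v))             # comparable to na_find(v)
filled <- replace(v, is.na(v), -99L)  # comparable to na_replace(v, -99)
```

The base versions build an intermediate logical vector with `is.na()` in every step, which is the kind of allocation a dedicated scalar function can avoid.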
Scalar functions
val_count(x, 3)
#> [1] 3
val_rm(x, 3)
#> [1] 1 5 1 NA 2 4 2 NA 1 4 NA 5 4 2 NA 1 1 4 5 NA 5 NA 2 NA 2
#> [26] 1 2
val_find(x, 3)
#> [1] 16 19 27
val_replace(x, 3, 99)
#> [1] 1 5 1 NA 2 4 2 NA 1 4 NA 5 4 2 NA 99 1 1 99 4 5 NA 5 NA 2
#> [26] NA 99 2 1 2
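The outputs above show that NA values are left alone when a scalar is counted, found, removed or replaced. A base R sketch of the same behaviour, on a made-up vector `v`, has to handle NA explicitly; the names below are illustrative and not part of cheapr.

```r
# Hypothetical example vector, chosen for illustration only
v <- c(3, NA, 1, 3, NA, 2)

cnt     <- sum(v == 3, na.rm = TRUE)      # comparable to val_count(v, 3)
pos     <- which(v == 3)                  # comparable to val_find(v, 3); which() drops NA
kept    <- v[is.na(v) | v != 3]           # comparable to val_rm(v, 3); NA values are kept
swapped <- replace(v, which(v == 3), 99)  # comparable to val_replace(v, 3, 99)
```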
Scalar based case-match
val_match(
  x, 1 ~ "one",
  2 ~ "two",
  3 ~ "three",
  .default = ">3"
)
#> [1] "one" ">3" "one" ">3" "two" ">3" "two" ">3" "one"
#> [10] ">3" ">3" ">3" ">3" "two" ">3" "three" "one" "one"
#> [19] "three" ">3" ">3" ">3" ">3" ">3" "two" ">3" "three"
#> [28] "two" "one" "two"
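A scalar case-match of this kind can be pictured as a lookup followed by a default fill. The sketch below uses base R `match()` on made-up data; the variable names are hypothetical and this is not presented as cheapr's implementation. As in the output above, unmatched values, including NA, receive the default.

```r
# Hypothetical example data, chosen for illustration only
v      <- c(1, 5, NA, 2, 3)
keys   <- c(1, 2, 3)
labels <- c("one", "two", "three")

out <- labels[match(v, keys)]  # NA wherever no key matches
out[is.na(out)] <- ">3"        # plays the role of .default
```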
m <- matrix(na_insert(rnorm(10^6), prop = 1/4), ncol = 10^3)
# Number of NA values by row
mark(row_na_counts(m),
     rowSums(is.na(m)))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 row_na_counts(m) 1.44ms 1.52ms 634. 9.16KB 0
#> 2 rowSums(is.na(m)) 2.76ms 3.68ms 285. 3.85MB 26.4
# Number of NA values by col
mark(col_na_counts(m),
colSums(is.na(m)))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 col_na_counts(m) 1.51ms 1.57ms 612. 9.15KB 0
#> 2 colSums(is.na(m)) 1.32ms 2.21ms 494. 3.82MB 51.7
is_na is a multi-threaded alternative to is.na
x <- rnorm(10^6) |>
  na_insert(10^5)
options(cheapr.cores = 4)
mark(is.na(x), is_na(x))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 is.na(x) 559µs 773µs 1148. 3.81MB 217.
#> 2 is_na(x) 275µs 559µs 1944. 3.82MB 191.
options(cheapr.cores = 1)
### posixlt method is much faster
hours <- as.POSIXlt(seq.int(0, length.out = 10^6, by = 3600),
                    tz = "UTC") |>
  na_insert(10^5)
mark(is.na(hours), is_na(hours))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 is.na(hours) 1s 1s 1.00 61.05MB 2.00
#> 2 is_na(hours) 11.7ms 13.1ms 65.5 7.67MB 11.2
It differs in 2 regards:
List elements are regarded as NA when either that element is an NA value or it is a list containing only NA values.
For data frames, is_na returns a logical vector where TRUE defines an empty row of only NA values.
# List example
is.na(list(NA, list(NA, NA), 10))
#> [1] TRUE FALSE FALSE
is_na(list(NA, list(NA, NA), 10))
#> [1] TRUE TRUE FALSE
# Data frame example
df <- new_df(x = c(1, NA, 3),
             y = c(NA, NA, NA))
df
#> x y
#> 1 1 NA
#> 2 NA NA
#> 3 3 NA
is_na(df)
#> [1] FALSE TRUE FALSE
# The below identity should hold
identical(is_na(df), row_na_counts(df) == ncol(df))
#> [1] TRUE
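The two rules can be restated in base R. The helper `list_is_na()` below is hypothetical and only mirrors the behaviour described above; how it treats empty list elements is an assumption of this sketch and may differ from cheapr.

```r
# List rule: an element counts as NA if it is NA itself
# or if it is a list holding only NA values.
# Assumption: empty elements come out as TRUE here because all(logical(0)) is TRUE.
list_is_na <- function(l) {
  vapply(l, function(el) all(is.na(unlist(el))), logical(1))
}
list_res <- list_is_na(list(NA, list(NA, NA), 10))

# Data frame rule: TRUE marks rows in which every column is NA
d <- data.frame(x = c(1, NA, 3), y = c(NA, NA, NA))
row_res <- unname(rowSums(is.na(d)) == ncol(d))
```

Both results agree with the is_na() outputs shown above: TRUE TRUE FALSE for the list and FALSE TRUE FALSE for the data frame.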
is_na and all the NA handling functions fall back on calling is.na() if no suitable method is found. This means that custom objects like vctrs rcrds and more are supported.
overview
Inspired by the excellent skimr package, overview() is a cheaper alternative designed for larger data.
df <- new_df(
  x = sample.int(100, 10^6, TRUE),
  y = as_factor(sample(LETTERS, 10^6, TRUE)),
  z = rnorm(10^6)
)
overview(df)
#> obs: 1000000
#> cols: 3
#>
#> ----- Numeric -----
#> col class n_missng p_complt n_unique mean p0 p25 p50 p75
#> 1 x integr 0 1 100 50.52 1 25 51 76
#> 2 z numerc 0 1 1000000 -0.00038 -4.58 -0.67 -0.00062 0.68
#> p100 iqr sd hist
#> 1 100 51 28.88 ▇▇▇▇▇
#> 2 5.08 1.35 1 ▁▃▇▂▁
#>
#> ----- Categorical -----
#> col class n_missng p_complt n_unique n_levels min max
#> 1 y factor 0 1 26 26 A Z
mark(overview(df, hist = FALSE))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 overview(df, hist = FALSE) 113ms 118ms 8.54 2.09KB 0
sset
sset(iris, 1:5)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
sset(iris, 1:5, j = "Species")
#> Species
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> 5 setosa
# sset always returns a data frame when input is a data frame
sset(iris, 1, 1) # data frame
#> Sepal.Length
#> 1 5.1
iris[1, 1] # not a data frame
#> [1] 5.1
x <- sample.int(10^6, 10^4, TRUE)
y <- sample.int(10^6, 10^4, TRUE)
mark(sset(x, x %in_% y), sset(x, x %in% y), x[x %in% y])
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sset(x, x %in_% y) 106µs 151µs 6464. 83.5KB 4.17
#> 2 sset(x, x %in% y) 192µs 261µs 3950. 285.7KB 13.2
#> 3 x[x %in% y] 153µs 234µs 4463. 324.8KB 15.6
sset uses an internal range-based subset when i is an ALTREP integer sequence of the form m:n.
mark(sset(df, 0:10^5), df[0:10^5, , drop = FALSE])
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sset(df, 0:10^5) 150.1µs 386.55µs 3050. 1.53MB 65.8
#> 2 df[0:10^5, , drop = FALSE] 6.57ms 7.33ms 135. 4.83MB 8.71
It also accepts negative indexes
mark(sset(df, -10^4:0),
     df[-10^4:0, , drop = FALSE],
     check = FALSE) # The only difference is the row names
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sset(df, -10^4:0) 1.9ms 2.63ms 390. 15.1MB 164.
#> 2 df[-10^4:0, , drop = FALSE] 24.1ms 24.14ms 41.4 72.5MB 953.
The biggest difference between sset and [ is the way logical vectors are handled. The two main differences when i is a logical vector are:
NA values are ignored, only the locations of TRUE values are used.
i must be the same length as x and is not recycled.
# Examples with NAs
x <- c(1, 5, NA, NA, -5)
x[x > 0]
#> [1] 1 5 NA NA
sset(x, x > 0)
#> [1] 1 5
# Example with length(i) < length(x)
sset(x, TRUE)
#> Error in check_length(i, length(x)): i must have length 5
# This is equivalent
x[TRUE]
#> [1] 1 5 NA NA -5
# to..
sset(x)
#> [1] 1 5 NA NA -5
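The NA rule can be reproduced in base R by converting the logical index to positions first. The sketch below contrasts the two behaviours on the same data as above; `i`, `base_res` and `which_res` are names introduced here for illustration.

```r
v <- c(1, 5, NA, NA, -5)
i <- v > 0                # TRUE TRUE NA NA FALSE

base_res  <- v[i]         # `[` returns an NA for every NA in i
which_res <- v[which(i)]  # which() keeps only the TRUE positions
```

Here `base_res` has four elements (1, 5, NA, NA) while `which_res` has two (1, 5), matching the sset(x, x > 0) output above.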
lag_()
set.seed(37)
lag_(1:10, 3) # Lag(3)
#> [1] NA NA NA 1 2 3 4 5 6 7
lag_(1:10, -3) # Lead(3)
#> [1] 4 5 6 7 8 9 10 NA NA NA
# Using an example from data.table
library(data.table)
dt <- data.table(year=2010:2014, v1=runif(5), v2=1:5, v3=letters[1:5])
# Similar to data.table::shift()
lag_(dt, 1) # Lag
#> year v1 v2 v3
#> <int> <num> <int> <char>
#> 1: NA NA NA <NA>
#> 2: 2010 0.54964085 1 a
#> 3: 2011 0.07883715 2 b
#> 4: 2012 0.64879698 3 c
#> 5: 2013 0.49685336 4 d
lag_(dt, -1) # Lead
#> year v1 v2 v3
#> <int> <num> <int> <char>
#> 1: 2011 0.07883715 2 b
#> 2: 2012 0.64879698 3 c
#> 3: 2013 0.49685336 4 d
#> 4: 2014 0.71878731 5 e
#> 5: NA NA NA <NA>
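The sign convention shown above (positive values lag, negative values lead, gaps filled with NA) can be written out for plain vectors in a few lines of base R. `lag_sketch()` is a hypothetical helper for illustration only; unlike lag_(), it copies its input and does not handle data frames.

```r
# Hypothetical helper: positive n lags, negative n leads, gaps become NA
lag_sketch <- function(x, n = 1L) {
  len  <- length(x)
  k    <- min(abs(n), len)
  fill <- rep(x[NA_integer_], k)  # NA padding of the same type as x
  if (n >= 0) {
    c(fill, x[seq_len(len - k)])      # shift values towards the end
  } else {
    c(x[seq_len(len - k) + k], fill)  # shift values towards the start
  }
}

lagged <- lag_sketch(1:10, 3)
led    <- lag_sketch(1:10, -3)
```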
With lag_ we can update variables by reference, including entire data frames
# At the moment, shift() cannot do this
lag_(dt, set = TRUE)
#> year v1 v2 v3
#> <int> <num> <int> <char>
#> 1: NA NA NA <NA>
#> 2: 2010 0.54964085 1 a
#> 3: 2011 0.07883715 2 b
#> 4: 2012 0.64879698 3 c
#> 5: 2013 0.49685336 4 d
# Was updated by reference
dt
#> year v1 v2 v3
#> <int> <num> <int> <char>
#> 1: NA NA NA <NA>
#> 2: 2010 0.54964085 1 a
#> 3: 2011 0.07883715 2 b
#> 4: 2012 0.64879698 3 c
#> 5: 2013 0.49685336 4 d
lag2_ is a more generalised variant that supports vectors of lags, custom ordering and run lengths.
lag2_(dt, order = 5:1) # Reverse order lag (same as lead)
#> year v1 v2 v3
#> <int> <num> <int> <char>
#> 1: 2010 0.54964085 1 a
#> 2: 2011 0.07883715 2 b
#> 3: 2012 0.64879698 3 c
#> 4: 2013 0.49685336 4 d
#> 5: NA NA NA <NA>
lag2_(dt, -1) # Same as above
#> year v1 v2 v3
#> <int> <num> <int> <char>
#> 1: 2010 0.54964085 1 a
#> 2: 2011 0.07883715 2 b
#> 3: 2012 0.64879698 3 c
#> 4: 2013 0.49685336 4 d
#> 5: NA NA NA <NA>
lag2_(dt, c(1, -1)) # Alternating lead/lag
#> year v1 v2 v3
#> <int> <num> <int> <char>
#> 1: NA NA NA <NA>
#> 2: 2011 0.07883715 2 b
#> 3: 2010 0.54964085 1 a
#> 4: 2013 0.49685336 4 d
#> 5: 2012 0.64879698 3 c
lag2_(dt, c(-1, 0, 0, 0, 0)) # Lead e.g. only first row
#> year v1 v2 v3
#> <int> <num> <int> <char>
#> 1: 2010 0.54964085 1 a
#> 2: 2010 0.54964085 1 a
#> 3: 2011 0.07883715 2 b
#> 4: 2012 0.64879698 3 c
#> 5: 2013 0.49685336 4 d
gcd2(5, 25)
#> [1] 5
scm2(5, 6)
#> [1] 30
gcd(seq(5, 25, by = 5))
#> [1] 5
scm(seq(5, 25, by = 5))
#> [1] 300
x <- seq(1L, 1000000L, 1L)
mark(gcd(x))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 gcd(x) 800ns 900ns 933524. 0B 0
x <- seq(0, 10^6, 0.5)
mark(gcd(x))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 gcd(x) 36.5ms 38.3ms 26.4 0B 0
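For readers who want the underlying arithmetic, the results above can be checked with a plain Euclidean algorithm. The helpers `gcd2_sketch()` and `scm2_sketch()` are hypothetical and assume whole-number inputs without NA; they apply no tolerance for doubles and are not how cheapr computes these values.

```r
# Euclidean algorithm for two whole numbers (assumes no NA and no fractions)
gcd2_sketch <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  abs(a)
}

# Smallest common multiple via the identity scm(a, b) = |a * b| / gcd(a, b)
scm2_sketch <- function(a, b) {
  if (a == 0 || b == 0) return(0)
  abs(a * b) / gcd2_sketch(a, b)
}

# Fold the pairwise versions over a whole vector
v       <- seq(5, 25, by = 5)
gcd_all <- Reduce(gcd2_sketch, v)
scm_all <- Reduce(scm2_sketch, v)
```

Folding with `Reduce()` gives 5 and 300 for this vector, the same values as gcd() and scm() above.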
As an example, to create 3 sequences with different increments, the usual approach might be to use lapply to loop through the increment values together with seq()
# Base R
increments <- c(1, 0.5, 0.1)
start <- 1
end <- 5
unlist(lapply(increments, \(x) seq(start, end, x)))
#> [1] 1.0 2.0 3.0 4.0 5.0 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 1.0 1.1 1.2 1.3 1.4
#> [20] 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3
#> [39] 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0
In cheapr you can use seq_() which accepts vector arguments.
seq_(start, end, increments)
#> [1] 1.0 2.0 3.0 4.0 5.0 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 1.0 1.1 1.2 1.3 1.4
#> [20] 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3
#> [39] 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0
Use add_id = TRUE to label the individual sequences.
seq_(start, end, increments, add_id = TRUE)
#> 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3
#> 1.0 2.0 3.0 4.0 5.0 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 1.0 1.1 1.2 1.3 1.4 1.5
#> 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5
#> 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0
If you know the sizes of your sequences beforehand, use sequence_()
seq_sizes <- c(3, 5, 10)
sequence_(seq_sizes, from = 0, by = 1/3, add_id = TRUE)
#> 1 1 1 2 2 2 2 2
#> 0.0000000 0.3333333 0.6666667 0.0000000 0.3333333 0.6666667 1.0000000 1.3333333
#> 3 3 3 3 3 3 3 3
#> 0.0000000 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333
#> 3 3
#> 2.6666667 3.0000000
You can also calculate the sequence sizes using seq_size()
seq_size(start, end, increments)
#> [1] 5 9 41
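The sizes 5, 9 and 41 follow from the usual count formula floor((end - start) / by) + 1. The base R sketch below computes them and then builds the concatenated sequences without a loop; the small tolerance and all variable names are assumptions of this sketch rather than details taken from cheapr.

```r
start <- 1
end <- 5
increments <- c(1, 0.5, 0.1)

# Elements per sequence; the tolerance (an assumption of this sketch)
# guards against floating-point error in the division
sizes <- floor((end - start) / increments + 1e-10) + 1

# Sequence id of every output element, then its 0-based position within that sequence
ids <- rep(seq_along(sizes), sizes)
pos <- sequence(sizes) - 1

# Each value is start plus its position times the increment of its sequence
values <- start + pos * increments[ids]
```

`ids` plays the same role as the labels produced by add_id = TRUE.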
x <- rep(TRUE, 10^6)
mark(cheapr_which = which_(x),
     base_which = which(x))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_which 4.84ms 5.73ms 166. 3.81MB 8.72
#> 2 base_which 569.2µs 1.48ms 796. 7.63MB 90.6
x <- rep(FALSE, 10^6)
mark(cheapr_which = which_(x),
     base_which = which(x))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_which 1.42ms 1.68ms 589. 0B 0
#> 2 base_which 224.7µs 235.5µs 4032. 3.81MB 171.
x <- c(rep(TRUE, 5e05), rep(FALSE, 1e06))
mark(cheapr_which = which_(x),
     base_which = which(x))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_which 4.12ms 4.43ms 219. 1.91MB 4.14
#> 2 base_which 541.6µs 1.02ms 1047. 7.63MB 95.5
x <- c(rep(FALSE, 5e05), rep(TRUE, 1e06))
mark(cheapr_which = which_(x),
     base_which = which(x))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_which 4.82ms 5.32ms 186. 3.81MB 11.2
#> 2 base_which 719.8µs 1.68ms 684. 9.54MB 97.3
x <- sample(c(TRUE, FALSE), 10^6, TRUE)
x[sample.int(10^6, 10^4)] <- NA
mark(cheapr_which = which_(x),
     base_which = which(x))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_which 2.98ms 3.4ms 294. 1.89MB 8.58
#> 2 base_which 3.57ms 3.94ms 250. 5.7MB 23.2
x <- sample(seq(-10^3, 10^3, 0.01))
y <- do.call(paste0, expand.grid(letters, letters, letters, letters))
mark(cheapr_factor = factor_(x),
     base_factor = factor(x))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_factor 8.25ms 9.26ms 107. 4.59MB 7.11
#> 2 base_factor 281.42ms 284.67ms 3.51 27.84MB 0
mark(cheapr_factor = factor_(x, order = FALSE),
base_factor = factor(x, levels = unique(x)))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_factor 3.19ms 3.49ms 267. 1.53MB 4.42
#> 2 base_factor 453.47ms 453.47ms 2.21 22.79MB 2.21
mark(cheapr_factor = factor_(y),
base_factor = factor(y))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_factor 181.98ms 187.19ms 5.34 5.23MB 0
#> 2 base_factor 2.72s 2.72s 0.367 54.35MB 0.734
mark(cheapr_factor = factor_(y, order = FALSE),
base_factor = factor(y, levels = unique(y)))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_factor 4.52ms 5.18ms 185. 3.49MB 9.48
#> 2 base_factor 44.36ms 47.6ms 20.8 39.89MB 13.9
x <- sample.int(10^6, 10^5, TRUE)
y <- sample.int(10^6, 10^5, TRUE)
mark(cheapr_intersect = intersect_(x, y, dups = FALSE),
     base_intersect = intersect(x, y))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_intersect 2.11ms 2.25ms 419. 1.18MB 6.87
#> 2 base_intersect 3.24ms 3.95ms 252. 5.16MB 16.3
mark(cheapr_setdiff = setdiff_(x, y, dups = FALSE),
base_setdiff = setdiff(x, y))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_setdiff 2.08ms 2.39ms 400. 1.76MB 6.94
#> 2 base_setdiff 3.36ms 4.26ms 238. 5.71MB 16.0
%in_% and %!in_%
mark(cheapr = x %in_% y,
base = x %in% y)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr 1.31ms 1.41ms 675. 781.34KB 4.31
#> 2 base 2.09ms 2.43ms 409. 2.53MB 12.6
mark(cheapr = x %!in_% y,
base = !x %in% y)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr 1.3ms 1.42ms 679. 787.87KB 6.63
#> 2 base 2.12ms 2.54ms 395. 2.91MB 12.1
as_discrete
as_discrete is a cheaper alternative to cut
x <- rnorm(10^6)
b <- seq(0, max(x), 0.2)
mark(cheapr_cut = as_discrete(x, b, left = FALSE),
     base_cut = cut(x, b))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 cheapr_cut 19.4ms 20.2ms 48.9 3.87MB 4.65
#> 2 base_cut 46.6ms 48.4ms 20.4 26.76MB 13.6
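As a base R point of reference, the default `cut(x, b)` assigns values to right-closed intervals and returns NA for anything outside the breaks. The same integer codes can be obtained without building factor labels by using `findInterval()`. The sketch below uses made-up data and is not a description of how as_discrete works internally.

```r
# Hypothetical example data, chosen for illustration only
vals <- c(-1, 0, 0.1, 0.2, 0.25, 1, 1.5)
brks <- c(0, 0.2, 0.4, 1)

# Interval index with intervals open on the left and closed on the right
codes <- findInterval(vals, brks, left.open = TRUE)

# Index 0 is at or below the first break, index length(brks) is above the last one
codes[codes == 0L | codes == length(brks)] <- NA

# Same bins as the integer codes of cut()
cut_codes <- as.integer(cut(vals, brks))
```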
cheapr_if_else
A cheap alternative to ifelse
mark(
  cheapr_if_else(x >= 0, "pos", "neg"),
  ifelse(x >= 0, "pos", "neg"),
  data.table::fifelse(x >= 0, "pos", "neg")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl>
#> 1 "cheapr_if_else(x >= 0, \"pos\"… 9.9ms 11ms 72.8 11.4MB 13.8
#> 2 "ifelse(x >= 0, \"pos\", \"neg\… 134.16ms 135.3ms 5.55 53.4MB 5.55
#> 3 "data.table::fifelse(x >= 0, \"… 9.06ms 10.6ms 87.7 11.4MB 9.97
case
cheapr’s version of a case-when statement, with mostly the same arguments as dplyr::case_when but efficiency similar to data.table::fcase
mark(case(
  x >= 0 ~ "pos",
  x < 0 ~ "neg",
  .default = "Unknown"
),
data.table::fcase(
  x >= 0, "pos",
  x < 0, "neg",
  rep_len(TRUE, length(x)), "Unknown"
))
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl>
#> 1 "case(x >= 0 ~ \"pos\", x < 0 ~ \"… 29.9ms 31.2ms 32.1 28.7MB 21.4
#> 2 "data.table::fcase(x >= 0, \"pos\"… 15.7ms 17ms 56.6 26.7MB 37.8
val_match is an even cheaper special variant of case when all LHS expressions are length-1 vectors, i.e. scalars
x <- round(rnorm(10^6))

mark(
  val_match(x, 1 ~ Inf, 2 ~ -Inf, .default = NaN),
  case(x == 1 ~ Inf,
       x == 2 ~ -Inf,
       .default = NaN),
  data.table::fcase(x == 1, Inf,
                    x == 2, -Inf,
                    rep_len(TRUE, length(x)), NaN)
)
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:t> <bch:t> <dbl> <bch:byt> <dbl>
#> 1 val_match(x, 1 ~ Inf, 2 ~ -Inf, … 7.58ms 8.63ms 113. 8.79MB 17.6
#> 2 case(x == 1 ~ Inf, x == 2 ~ -Inf… 25.13ms 27.65ms 36.0 27.63MB 18.0
#> 3 data.table::fcase(x == 1, Inf, x… 11.99ms 13.56ms 72.7 30.52MB 59.9
get_breaks is a very fast function for generating pretty equal-width breaks. It is similar to base::pretty, though somewhat less flexible, with simpler arguments.
x <- with_local_seed(rnorm(10^5), 112)
# approximately 10 breaks
get_breaks(x, 10)
#> [1] -6 -4 -2 0 2 4 6
pretty(x, 10)
#> [1] -6 -5 -4 -3 -2 -1 0 1 2 3 4 5
mark(
  get_breaks(x, 20),
  pretty(x, 20),
  check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 get_breaks(x, 20) 62.9µs 67µs 14314. 0B 0
#> 2 pretty(x, 20) 406.3µs 635µs 1707. 1.91MB 34.0
# Not pretty but equal width breaks
get_breaks(x, 5, pretty = FALSE)
#> [1] -5.0135893 -3.2004889 -1.3873886 0.4257118 2.2388121 4.0519125
diff(get_breaks(x, 5, pretty = FALSE)) # Widths
#> [1] 1.8131 1.8131 1.8131 1.8131 1.8131
It accepts either the data itself or a length-two vector representing a range, so it can easily be used in ggplot2 and base R plots.
library(ggplot2)
gg <- airquality |>
  ggplot(aes(x = Ozone, y = Wind)) +
  geom_point() +
  geom_smooth(se = FALSE)
# Add our breaks
gg +
  scale_x_continuous(breaks = get_breaks)
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> Warning: Removed 37 rows containing non-finite outside the scale range
#> (`stat_smooth()`).
#> Warning: Removed 37 rows containing missing values or values outside the scale range
#> (`geom_point()`).
# More breaks
# get_breaks accepts a range too
gg +
  scale_x_continuous(breaks = \(x) get_breaks(range(x), 20))
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
#> Warning: Removed 37 rows containing non-finite outside the scale range
#> (`stat_smooth()`).
#> Removed 37 rows containing missing values or values outside the scale range
#> (`geom_point()`).