tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.
Install the released version from CRAN with:
install.packages("tidytable")
Or install the development version from GitHub with:
# install.packages("pak")
pak::pak("markfairbanks/tidytable")
tidytable replicates tidyverse syntax but uses data.table in the background. In general you can simply use library(tidytable) to replace your existing dplyr and tidyr code with data.table-backed equivalents.
A full list of implemented functions can be found here.
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#> x y z double_x x_plus_y
#> <int> <int> <chr> <dbl> <int>
#> 1 1 4 a 2 5
#> 2 2 5 a 4 7
#> 3 3 6 b 6 9
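If the data already lives in a base data.frame, the same verbs can be applied after a one-time conversion. The following is a minimal sketch, assuming as_tidytable() is used for the conversion; the data and the names base_df, tt, and result are made up for illustration.

```r
library(tidytable)

# A plain base-R data.frame (illustrative data, not from the README)
base_df <- data.frame(x = 1:3, y = 4:6, z = c("a", "a", "b"))

# Convert once, then use the same verbs as in the example above
tt <- as_tidytable(base_df)

result <- tt %>%
  filter(z == "a") %>%
  mutate(x_plus_y = x + y)

result
```

The result keeps the tidytable class, so further verbs can be chained onto it.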
You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% some_function() %>% ungroup().
A single column can be passed with .by = z
Multiple columns can be passed with .by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
All functions that can operate by group have a .by argument built in (mutate(), filter(), summarize(), etc.).
The above syntax is equivalent to:
df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Both options are available for users, so you can use the syntax that you prefer.
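Since .by is described as available on other grouped verbs too, a short sketch with mutate() and filter() may help. The data and the names with_mean and top_rows are illustrative assumptions, and tidytable() is used here as the constructor instead of data.table().

```r
library(tidytable)

df <- tidytable(x = c("a", "a", "b"), z = 1:3)

# Grouped mutate: every row is kept and gains the mean of its own group
with_mean <- df %>%
  mutate(group_mean = mean(z), .by = x)

# Grouped filter: keep the row holding the largest z within each group
top_rows <- df %>%
  filter(z == max(z), .by = x)
```

Neither call needs a trailing ungroup(), because .by applies the grouping to that single call only.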
tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.
Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.
df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#> a b1 b2
#> <int> <int> <int>
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
A full overview of selection options can be found here.
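A few of the other helpers named above can be sketched the same way. This is an illustrative example only: the column "missing_col" deliberately does not exist, and the result names are hypothetical.

```r
library(tidytable)

df <- tidytable(a = 1:3, b1 = 4:6, b2 = 7:9, c = c("a", "a", "b"))

# where() selects columns by a predicate on their contents
numeric_cols <- df %>% select(where(is.numeric))

# A leading minus drops the matched columns instead of keeping them
no_b <- df %>% select(-starts_with("b"))

# any_of() skips names that are absent ("missing_col" is not in df)
some_cols <- df %>% select(any_of(c("a", "missing_col")))
```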
tidyselect helpers also work when using .by:
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#> x y avg_z
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:
df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <int> <chr> <dbl>
#> 1 1 4 a 2
#> 2 1 5 a 2
#> 3 1 6 b 2
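The enquo() with !! alternative mentioned above could look roughly like this. It is a sketch under assumptions: the function name add_one_enquo is hypothetical, enquo() is called through rlang explicitly, and the unquoted column is wrapped in parentheses to keep operator precedence unambiguous.

```r
library(tidytable)

df <- tidytable(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one_enquo <- function(data, add_col) {
  # Capture the column expression the caller supplied
  add_col <- rlang::enquo(add_col)
  data %>%
    # Unquote the captured expression inside the tidytable verb
    mutate(new_col = (!!add_col) + 1)
}

result <- df %>%
  add_one_enquo(y)
```

Both styles should produce the same column; {{ }} is simply the shorter spelling of the capture-then-unquote pattern.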
The .data and .env pronouns also work within tidytable functions:
var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <int> <chr> <dbl>
#> 1 1 4 a 11
#> 2 1 5 a 11
#> 3 1 6 b 11
A full overview of tidy evaluation can be found here.
dt() helper
The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#> z avg_x
#> <chr> <dbl>
#> 1 a 1.5
#> 2 b 3
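The example above uses dt() for every step. To illustrate the mixing the text describes, here is a small sketch that alternates tidytable verbs with a single dt() call; the data and the result name are illustrative assumptions.

```r
library(tidytable)

df <- tidytable(x = 1:3, y = 4:6, z = c("a", "a", "b"))

result <- df %>%
  filter(x < 4) %>%                           # tidytable verb
  dt(, double_x := x * 2) %>%                 # data.table := syntax via dt()
  summarize(total = sum(double_x), .by = z)   # back to a tidytable verb
```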
For those interested in performance, speed comparisons can be found here.
tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.