tidier package provides ‘Apache Spark’ style window aggregation for R dataframes and remote dbplyr tbls via ‘mutate’ in ‘dplyr’ flavour.
Create a new column with average temp over last seven days in the same month.
    set.seed(101)
    airquality |>
      # create date column
      dplyr::mutate(date_col = lubridate::make_date(1973, Month, Day)) |>
      # create gaps by removing some days
      dplyr::slice_sample(prop = 0.8) |>
      # compute mean temperature over last seven days in the same month
      tidier::mutate(avg_temp_over_last_week = mean(Temp, na.rm = TRUE),
                     .order_by = Day,
                     .by = Month,
                     .frame = c(lubridate::days(7), # 7 days before current row
                                lubridate::days(-1) # do not include current row
                                ),
                     .index = date_col
                     )
    #> # A tibble: 122 × 8
    #>    Month Ozone Solar.R  Wind  Temp   Day date_col   avg_temp_over_last_week
    #>    <int> <int>   <int> <dbl> <int> <int> <date>                       <dbl>
    #>  1     6    NA     286   8.6    78     1 1973-06-01                   NaN
    #>  2     6    NA     242  16.1    67     3 1973-06-03                    78
    #>  3     6    NA     186   9.2    84     4 1973-06-04                    72.5
    #>  4     6    NA     264  14.3    79     6 1973-06-06                    76.3
    #>  5     6    29     127   9.7    82     7 1973-06-07                    77
    #>  6     6    NA     273   6.9    87     8 1973-06-08                    78
    #>  7     6    NA     259  10.9    93    11 1973-06-11                    83
    #>  8     6    NA     250   9.2    92    12 1973-06-12                    85.2
    #>  9     6    23     148   8      82    13 1973-06-13                    86.6
    #> 10     6    NA     332  13.8    80    14 1973-06-14                    87.2
    #> # ℹ 112 more rows
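The frame can presumably also be given as plain row counts instead of date intervals. The sketch below assumes, based on the package documentation rather than on anything shown here, that a numeric .frame is read as c(rows before, rows after) when .index is omitted. The toy data frame and its column names are hypothetical.

```r
# Toy data (hypothetical values): two groups, listed in day order.
toy <- data.frame(
  g   = c("a", "a", "a", "a", "b", "b"),
  day = c(1, 2, 3, 4, 1, 2),
  x   = c(10, 20, 30, 40, 1, 3)
)

res <- toy |>
  tidier::mutate(roll_mean = mean(x),
                 .by = g,
                 .order_by = day,
                 # assumption: numeric .frame = c(rows before, rows after),
                 # here the two preceding rows plus the current row
                 .frame = c(2, 0)
                 )

# The row order of the result is not relied upon; sort before inspecting.
res <- res[order(res$g, res$day), ]
res
```

If the assumption holds, roll_mean is a three-row rolling mean within each group, computed over shorter windows at the start of a group because .complete is left at its default.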
mutate supports
- .by (group by),
- .order_by (order by),
- .frame (endpoints of window frame),
- .index (identify index column like date column, in df version only),
- .complete (whether to compute over incomplete window, in df version only).

mutate automatically uses a future backend (via furrr, in df version only).

This implementation is inspired by Apache Spark’s windowSpec class with rangeBetween and rowsBetween.
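The difference between a range-based frame (rangeBetween in Spark terms) and a row-based frame (rowsBetween) shows up as soon as the index has gaps. The following base R sketch only illustrates the semantics; it is not how tidier computes windows, and the data values are a hypothetical toy sample.

```r
# Toy data: a single group with gaps in the dates (hypothetical values).
df <- data.frame(
  date = as.Date(c("1973-06-01", "1973-06-03", "1973-06-04", "1973-06-12")),
  temp = c(78, 67, 84, 92)
)
df <- df[order(df$date), ]

# Range-based frame: rows whose date lies in [date - 7 days, date - 1 day].
range_mean <- vapply(seq_len(nrow(df)), function(i) {
  in_frame <- df$date >= df$date[i] - 7 & df$date <= df$date[i] - 1
  mean(df$temp[in_frame])  # NaN when the frame is empty
}, numeric(1))

# Row-based frame: the two preceding rows, whatever their dates are.
rows_mean <- vapply(seq_len(nrow(df)), function(i) {
  idx <- seq_len(nrow(df))
  in_frame <- idx >= i - 2 & idx <= i - 1
  mean(df$temp[in_frame])  # NaN when the frame is empty
}, numeric(1))

cbind(df, range_mean, rows_mean)
```

For the last row, the range-based frame is empty because no observation falls in the preceding seven days, while the row-based frame still averages the two previous rows.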
dbplyr implements this via dbplyr::win_over, enabling sparklyr users to write window computations. Also see dbplyr::window_order / dbplyr::window_frame. tidier’s mutate wraps this functionality via uniform syntax across dataframes and remote tbls.

The tidypyspark python package implements a mutate style window computation API for pyspark.
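For comparison, here is a minimal sketch of the plain dbplyr route mentioned above, using dbplyr::window_order and dbplyr::window_frame on a simulated connection so that no database is needed. The table and its columns (month, day, temp) are hypothetical, and the exact SQL text depends on the dbplyr version and backend.

```r
# A lazy table on a simulated Postgres connection: dbplyr only translates
# the pipeline to SQL, nothing is executed.
lf <- dbplyr::lazy_frame(month = 6L, day = 1L, temp = 78,
                         con = dbplyr::simulate_postgres())

query <- lf |>
  dplyr::group_by(month) |>          # becomes PARTITION BY
  dbplyr::window_order(day) |>       # becomes ORDER BY
  dbplyr::window_frame(-3, 0) |>     # three preceding rows up to the current row
  dplyr::mutate(avg_temp = mean(temp, na.rm = TRUE))

# Render the generated SQL as a character string and print it.
sql_text <- as.character(dbplyr::sql_render(query))
cat(sql_text, "\n")
```

The rendered query should contain an OVER clause with a partition, an ordering, and a ROWS BETWEEN frame, which is the functionality tidier’s mutate exposes through its .by, .order_by and .frame arguments.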
- Development version (from GitHub): remotes::install_github("talegari/tidier")
- Released version (from CRAN): install.packages("tidier")
The tidier package is deeply indebted to three amazing packages and the people behind them.
Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A
Grammar of Data Manipulation_. R package version 1.1.0,
<https://CRAN.R-project.org/package=dplyr>.
Vaughan D (2021). _slider: Sliding Window Functions_. R package
version 0.2.2, <https://CRAN.R-project.org/package=slider>.
Wickham H, Girlich M, Ruiz E (2023). _dbplyr: A 'dplyr' Back End
for Databases_. R package version 2.3.2,
<https://CRAN.R-project.org/package=dbplyr>.