Improved support for handling large data from files and S3: ingestion with read_parquet_duckdb() and others, and materialization with as_duckdb_tibble(), compute.duckplyr_df() and compute_file(). See vignette("large") for details.
Control automatic materialization of duckplyr frames with the new prudence argument to as_duckdb_tibble(), duckdb_tibble(), compute.duckplyr_df() and compute_file(). See vignette("prudence") for details.
read_csv_duckdb() and others, deprecating duckplyr_df_from_csv() and df_from_csv() (#210, #396, #459).
read_sql_duckdb() (experimental) to run SQL queries against the default DuckDB connection and return the result as a duckplyr frame (duckdb/duckdb-r#32, #397).
db_exec() to execute configuration queries against the default DuckDB connection (#39, #165, #227, #404, #459).
duckdb_tibble() (#382, #457).
as_duckdb_tibble(), which replaces as_duckplyr_tibble() and as_duckplyr_df() (#383, #457) and supports dbplyr connections to a DuckDB database (#86, #211, #226).
compute_parquet() and compute_csv(), implement compute.duckplyr_df() (#409, #430).
fallback_config() to create a configuration file for the settings that do not affect behavior (#216, #426).
is_duckdb_tibble(), which deprecates is_duckplyr_df() (#391, #392).
last_rel() to retrieve the last relation object used in materialization (#209, #375).
Add "prudent_duckplyr_df" class that stops automatic materialization and requires collect() (#381, #390).
Partial support for across() in mutate() and summarise() (#296, #306, #318, @lionel-, @DavisVaughan).
Implement na.rm handling for sum(), min(), max(), any() and all(), with fallback for window functions (#205, #566).
Add support for sub() and gsub() (@toppyy, #420).
Handle dplyr::desc() (#550).
Avoid forwarding is.na() to is.nan() to support non-numeric data, and avoid checking the roundtrip for timestamp data (#482).
Correctly handle missing values in if_else().
Limit the number of items that can be handled with %in% (#319).
duckdb_tibble() checks whether columns can be represented in DuckDB (#537).
Fall back to dplyr when passing the multiple argument to join functions (#323).
Improve fallback error message by explicitly materializing (#432, #456).
Point to the native CSV reader when encountering data frames read with readr (#127, #469).
Improve the as_duckdb_tibble() error message for an invalid x argument (@maelle, #339).
Depend on dplyr instead of reexporting all generics (#405). Nothing changes for users in scripts. When using duckplyr in a package, you now also need to import dplyr.
Fallback logging is now on by default and can be disabled via configuration (#422).
The default DuckDB connection is now based on a file; its location defaults to a subdirectory of tempdir() and can be controlled with the DUCKPLYR_TEMP_DIR environment variable (#439, #448, #561).
collect() returns a tibble (#438, #447).
explain() returns the input, invisibly (#331).
Compute ptype only for join columns in a safe way without materialization, not for the entire data frame (#289).
Internal expr_scrub() (used for telemetry) can handle function definitions (@toppyy, #268, #271).
Harden telemetry code against invalid arguments (#321).
New articles: vignette("large"), vignette("prudence"), vignette("fallback"), vignette("limits"), vignette("developers"), vignette("telemetry") (#207, #504).
New flights_df(), used instead of palmerpenguins::penguins (#408).
Move to the tidyverse GitHub organization, new repository URL https://github.com/tidyverse/duckplyr/ (#225).
Avoid base pipe in examples for compatibility with R 4.0.0 (#463, #466).
Comparison expressions are translated in a way that allows them to be pushed down to Parquet (@toppyy, #270).
Printing a duckplyr frame no longer materializes (#255, #378).
Prefer vctrs::new_data_frame() over tibble() (#500).
df_from_file() and related functions support multiple files (#194, #195), show a clear error message for non-string path arguments (#182), and create a tibble by default (#177).
as_duckplyr_tibble() to convert a data frame to a duckplyr tibble (#177).
?df_from_file shows how to read multiple files (#181, #186) and how to specify CSV column types (#140, #189), and is shown correctly in the reference index (#173, #190).
as.integer(), NA and %in% (#83, #154, #148, #155, #159, #160).
library(duckplyr) calls methods_overwrite() (#164).
grepl().
intersect(), setdiff(), symdiff(), union(), and union_all() (#169).
NA and those used in an expression (#157).
head(-1) forwards to the default implementation (#131, #156).
left_join() and other join functions call auto_copy().
row_number() returns integer.
is.na(NaN) is TRUE.
summarise(count = n(), count = n()) creates only one column named count.
?df_from_file (@andreranza, #133, #134).
vec_ptype() does not materialize (#149).
expect_identical() to capture differences between doubles and integers.
df_to_parquet() to write to Parquet, new convenience functions df_from_csv(), duckdb_df_from_csv(), df_from_parquet() and duckdb_df_from_parquet() (#87, #89, #96, #128).
summarise() (#72, #106).
summarise() no longer restores subclass.
log10() and log().
fallback_sitrep() and related functionality for collecting telemetry data (#102, #107, #110, #111, #115). No data is collected by default; only a message is displayed, once per session and then every eight hours. Opt in or opt out by setting environment variables.
group_by() and other methods to collect fallback information (#94, #104, #105).
suppressWarnings() as the identity function.
cli::cli_abort() over stop() or rlang::abort() (#114).
.data$a and .env$a.
integer, numeric, logical, Date, POSIXct, and difftime for now.
If DUCKPLYR_METHODS_OVERWRITE is set to TRUE, loading duckplyr automatically calls methods_overwrite().
log() and log10().
methods_overwrite() and methods_restore() show a message.
grepl(x = NA) gives correct results.
auto_copy() for non-data-frame input.
distinct() now preserves order in corner cases (#77, #78).
log(0) and log(-1) (#75, #76).
mutate() that are actually representable in DuckDB (#73).
ifelse(), support if_else() (#79).
dplyr_reconstruct() method (#48).
meta_replay().
arrange() in case of ties.
slice_sample(), not sample_n() or sample_frac() (#74).
IS NOT DISTINCT FROM for faster execution (duckdb/duckdb-r#41, #68).
summarise() keeps "duckplyr_df" class (#63, #64).
Fix compatibility with duckdb >= 0.9.1.
Skip tests that give different output on dev tidyselect.
Import utils::globalVariables()
.
Small README improvements (@maelle, #34, #57).
Fix 301 in README.
Improve documentation.
Work around a problem with dplyr_reconstruct() in R 4.3.
Rename duckdb_from_file() to df_from_file().
Unexport private duckdb_rel_from_df(), rel_from_df(), wrap_df() and wrap_integer().
Reexport %>% and tibble().
R CMD check.
relexpr_window() for now.
Initial version, exporting:
- new_relational() to construct objects of class "relational"
- Generics rel_aggregate(), rel_distinct(), rel_filter(), rel_join(), rel_limit(), rel_names(), rel_order(), rel_project(), rel_set_diff(), rel_set_intersect(), rel_set_symdiff(), rel_to_df(), rel_union_all()
- new_relexpr() to construct objects of class "relational_relexpr"
- Expression builders relexpr_constant(), relexpr_function(), relexpr_reference(), relexpr_set_alias(), relexpr_window()