Package website: release | dev
Extends the mlr3 package with a DataBackend to transparently work with databases. Two additional backends are currently implemented:
DataBackendDplyr
: Relies internally on the abstraction of dplyr and dbplyr. This allows working on a broad range of DBMS, such as SQLite, MySQL, MariaDB, or PostgreSQL.

DataBackendDuckDB
: Connector to duckdb. This includes support for Parquet files (see example below).

To construct the backends, you have to establish a connection to the DBMS yourself with the DBI package. For the serverless SQLite and DuckDB, we provide the converters as_sqlite_backend() and as_duckdb_backend().
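As a minimal sketch of the manual route described above, the following builds a DataBackendDplyr on top of a DBI connection. It assumes the RSQLite, dplyr, and dbplyr packages are installed. The in-memory database, the table name "iris", and the helper column row_id are illustrative choices and not part of mlr3db.

```r
library("mlr3db")

# Illustrative data: iris plus an integer row id that serves as primary key.
data = iris
data$row_id = seq_len(nrow(data))

# Establish the connection yourself via DBI (here: in-memory SQLite, an assumption).
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "iris", data)

# Reference the database table lazily and wrap it in a backend.
tbl = dplyr::tbl(con, "iris")
backend = DataBackendDplyr$new(tbl, primary_key = "row_id")

# Queries are translated to SQL; only the requested data is fetched.
n = backend$nrow
first = backend$head(3)

# Close the connection once the backend is no longer needed.
DBI::dbDisconnect(con)
```

For a real DBMS such as PostgreSQL or MariaDB, only the DBI::dbConnect() call would change; the backend construction stays the same.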
You can install the released version of mlr3db from CRAN with:
install.packages("mlr3db")
And the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("mlr-org/mlr3db")
library("mlr3db")
#> Loading required package: mlr3
# Create a classification task:
task = tsk("spam")
# Convert the task backend from an in-memory backend (DataBackendDataTable)
# to an out-of-memory SQLite backend via DataBackendDplyr.
# A temporary file is used here to store the database.
task$backend = as_sqlite_backend(task$backend, path = tempfile())
# Resample a classification tree using a 3-fold CV.
# The requested data will be queried and fetched from the database in the background.
resample(task, lrn("classif.rpart"), rsmp("cv", folds = 3))
#> <ResampleResult> of 3 iterations
#> * Task: spam
#> * Learner: classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
library("mlr3db")
# Get an example parquet file from the package install directory:
# spam dataset (tsk("spam")) stored as parquet file
file = system.file(file.path("extdata", "spam.parquet"), package = "mlr3db")
# Create a backend on the file
backend = as_duckdb_backend(file)
# Construct classification task on the constructed backend
task = as_task_classif(backend, target = "type")
# Resample a classification tree using a 3-fold CV.
# The requested data will be queried and fetched from the database in the background.
resample(task, lrn("classif.rpart"), rsmp("cv", folds = 3))
#> <ResampleResult> of 3 iterations
#> * Task: backend
#> * Learner: classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations