mlr3db


Package website: release | dev

Extends the mlr3 package with a DataBackend to transparently work with databases. Three additional backends are currently implemented; two of them, DataBackendDplyr and DataBackendDuckDB, are demonstrated in the examples below.

To construct a backend, you have to establish a connection to the DBMS yourself with the DBI package. For the serverless databases SQLite and DuckDB, we additionally provide the converter functions as_sqlite_backend() and as_duckdb_backend().
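As a rough sketch of the manual route (assuming the RSQLite driver and a DataBackendDplyr constructor taking a primary_key argument, neither of which is shown in this README), a backend can be built on top of a DBI connection roughly like this:

library("mlr3db")
library("DBI")

# Sketch only: assumes the RSQLite package and DataBackendDplyr$new(data, primary_key).
# Establish a DBI connection to an in-memory SQLite database and copy a table into it;
# a unique integer row id is added to serve as the backend's primary key.
con = dbConnect(RSQLite::SQLite(), ":memory:")
data = iris
data$row_id = seq_len(nrow(data))
dbWriteTable(con, "iris", data)

# Wrap the lazy dplyr table in a DataBackendDplyr and build a classification task on it
tbl = dplyr::tbl(con, "iris")
backend = DataBackendDplyr$new(tbl, primary_key = "row_id")
task = as_task_classif(backend, target = "Species")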

Installation

You can install the released version of mlr3db from CRAN with:

install.packages("mlr3db")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mlr-org/mlr3db")

Example

DataBackendDplyr

library("mlr3db")
#> Loading required package: mlr3

# Create a classification task:
task = tsk("spam")

# Convert the task backend from an in-memory backend (DataBackendDataTable)
# to an out-of-memory SQLite backend via DataBackendDplyr.
# A temporary directory is used here to store the database files.
task$backend = as_sqlite_backend(task$backend, path = tempfile())

# Resample a classification tree using a 3-fold CV.
# The requested data will be queried and fetched from the database in the background.
resample(task, lrn("classif.rpart"), rsmp("cv", folds = 3))
#> Warning in warn_deprecated("DataBackend$data_formats"):
#> DataBackend$data_formats is deprecated and will be removed in the future.
#> 
#> ── <ResampleResult> with 3 resampling iterations ───────────────────────────────
#>  task_id    learner_id resampling_id iteration     prediction_test warnings
#>     spam classif.rpart            cv         1 <PredictionClassif>        0
#>     spam classif.rpart            cv         2 <PredictionClassif>        0
#>     spam classif.rpart            cv         3 <PredictionClassif>        0
#>  errors
#>       0
#>       0
#>       0

DataBackendDuckDB

library("mlr3db")

# Get an example parquet file from the package install directory:
# spam dataset (tsk("spam")) stored as parquet file
file = system.file(file.path("extdata", "spam.parquet"), package = "mlr3db")

# Create a backend on the file
backend = as_duckdb_backend(file)

# Construct classification task on the constructed backend
task = as_task_classif(backend, target = "type")

# Resample a classification tree using a 3-fold CV.
# The requested data will be queried and fetched from the database in the background.
resample(task, lrn("classif.rpart"), rsmp("cv", folds = 3))
#> 
#> ── <ResampleResult> with 3 resampling iterations ───────────────────────────────
#>  task_id    learner_id resampling_id iteration     prediction_test warnings
#>  backend classif.rpart            cv         1 <PredictionClassif>        0
#>  backend classif.rpart            cv         2 <PredictionClassif>        0
#>  backend classif.rpart            cv         3 <PredictionClassif>        0
#>  errors
#>       0
#>       0
#>       0
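Since as_duckdb_backend() is introduced above as the DuckDB counterpart of as_sqlite_backend(), an existing in-memory backend can presumably be converted to an on-disk DuckDB database in the same way. A minimal sketch, assuming the path argument behaves as in the SQLite example:

library("mlr3db")

# Sketch: assumes as_duckdb_backend() accepts a DataBackend plus a path,
# analogous to as_sqlite_backend() in the first example
task = tsk("spam")
task$backend = as_duckdb_backend(task$backend, path = tempfile())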
