tidylearn provides a unified, tidyverse-compatible interface to R's machine learning ecosystem. It wraps proven packages such as glmnet, randomForest, xgboost, e1071, cluster, and dbscan, so you get the reliability of established implementations with the convenience of a consistent, tidy API.
What tidylearn does:
- A single entry point (`tl_model()`) to 20+ ML algorithms

What tidylearn is NOT:

- A reimplementation of those algorithms: the raw fitted object is always available (`model$fit`)

The core of tidylearn is the `tl_model()` function, which dispatches to the appropriate underlying package based on the method you specify. The wrapped packages include stats, glmnet, randomForest, xgboost, gbm, e1071, nnet, rpart, cluster, and dbscan.
# Classification with logistic regression
model_logistic <- tl_model(iris, Species ~ ., method = "logistic")
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
print(model_logistic)
#> tidylearn Model
#> ===============
#> Paradigm: supervised
#> Method: logistic
#> Task: Classification
#> Formula: Species ~ .
#>
#> Training observations: 150

# Principal Component Analysis
model_pca <- tl_model(iris[, 1:4], method = "pca")
print(model_pca)
#> tidylearn Model
#> ===============
#> Paradigm: unsupervised
#> Method: pca
#> Technique: pca
#>
#> Training observations: 150

# Transform data
transformed <- predict(model_pca)
head(transformed)
#> # A tibble: 6 × 5
#> .obs_id PC1 PC2 PC3 PC4
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -2.26 -0.478 0.127 0.0241
#> 2 2 -2.07 0.672 0.234 0.103
#> 3 3 -2.36 0.341 -0.0441 0.0283
#> 4 4 -2.29 0.595 -0.0910 -0.0657
#> 5 5 -2.38 -0.645 -0.0157 -0.0358
#> 6 6 -2.07 -1.48 -0.0269 0.00659

# K-means clustering
model_kmeans <- tl_model(iris[, 1:4], method = "kmeans", k = 3)
print(model_kmeans)
#> tidylearn Model
#> ===============
#> Paradigm: unsupervised
#> Method: kmeans
#> Technique: kmeans
#>
#> Training observations: 150

tidylearn also provides data-splitting and preprocessing helpers:
# Simple random split
split <- tl_split(iris, prop = 0.7, seed = 123)
# Train model
model_train <- tl_model(split$train, Species ~ ., method = "logistic")
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Test predictions
predictions_test <- predict(model_train, new_data = split$test)
head(predictions_test)
#> # A tibble: 6 × 1
#> .pred
#> <dbl>
#> 1 2.22e-16
#> 2 2.22e-16
#> 3 2.22e-16
#> 4 2.22e-16
#> 5 2.22e-16
#> 6 2.22e-16

# Stratified split (maintains class proportions)
split_strat <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 123)
# Check proportions are maintained
prop.table(table(split_strat$train$Species))
#>
#> setosa versicolor virginica
#> 0.3333333 0.3333333 0.3333333
prop.table(table(split_strat$test$Species))
#>
#> setosa versicolor virginica
#> 0.3333333 0.3333333 0.3333333
prop.table(table(iris$Species))
#>
#> setosa versicolor virginica
#> 0.3333333 0.3333333 0.3333333

tidylearn provides a unified interface to these established R packages:
| Method | Underlying Package | Function Called |
|---|---|---|
| `"linear"` | stats | `lm()` |
| `"polynomial"` | stats | `lm()` with `poly()` |
| `"logistic"` | stats | `glm(..., family = binomial)` |
| `"ridge"`, `"lasso"`, `"elastic_net"` | glmnet | `glmnet()` |
| `"tree"` | rpart | `rpart()` |
| `"forest"` | randomForest | `randomForest()` |
| `"boost"` | gbm | `gbm()` |
| `"xgboost"` | xgboost | `xgb.train()` |
| `"svm"` | e1071 | `svm()` |
| `"nn"` | nnet | `nnet()` |
| `"deep"` | keras | `keras_model_sequential()` |
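Because every supervised method shares the same formula interface, switching engines only means changing `method`. As an illustrative sketch (not run here; the `class()` results are what the table above implies, not verified output):

```r
# Same data and formula, different underlying engines
model_tree   <- tl_model(iris, Species ~ ., method = "tree")    # dispatches to rpart::rpart()
model_forest <- tl_model(iris, Species ~ ., method = "forest")  # dispatches to randomForest::randomForest()

# The tidylearn interface is identical; only the backend changes,
# which you can confirm by inspecting the raw fitted objects
class(model_tree$fit)
class(model_forest$fit)
```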
| Method | Underlying Package | Function Called |
|---|---|---|
| `"pca"` | stats | `prcomp()` |
| `"mds"` | stats, MASS, smacof | `cmdscale()`, `isoMDS()`, etc. |
| `"kmeans"` | stats | `kmeans()` |
| `"pam"` | cluster | `pam()` |
| `"clara"` | cluster | `clara()` |
| `"hclust"` | stats | `hclust()` |
| `"dbscan"` | dbscan | `dbscan()` |
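The unsupervised methods follow the same pattern as the PCA and k-means examples above. A sketch for hierarchical clustering (assuming, per the table, that `method = "hclust"` stores a raw `stats::hclust` object in `$fit`):

```r
# Hierarchical clustering on the numeric iris columns
model_hclust <- tl_model(iris[, 1:4], method = "hclust")
print(model_hclust)

# The raw stats::hclust object supports the usual base-R tooling,
# e.g. plotting the dendrogram or cutting it into 3 groups
plot(model_hclust$fit)
cutree(model_hclust$fit, k = 3)
```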
You always have access to the raw model from the underlying package
via $fit:
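For instance (a sketch, assuming a `method = "forest"` model as above), `$fit` holds the untouched randomForest object, so package-specific tools apply directly:

```r
model_forest <- tl_model(iris, Species ~ ., method = "forest")

# $fit is the raw randomForest object, not a tidylearn wrapper
rf <- model_forest$fit

# Use randomForest-specific functionality directly on it
randomForest::importance(rf)   # variable importance scores
randomForest::varImpPlot(rf)   # importance plot
```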
Now that you understand the basics, explore:

- `tl_auto_ml()`

tidylearn is a wrapper package that provides:

- A unified interface (`tl_model()`) that dispatches to proven packages like glmnet, randomForest, xgboost, e1071, and others
- Direct access to each fitted object via `model$fit` for package-specific functionality

The underlying algorithms are unchanged: tidylearn simply makes them easier to use together.
# Quick example combining everything
data_split <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 42)
data_prep <- tl_prepare_data(data_split$train, Species ~ ., scale_method = "standardize")
#> Scaling numeric features using method: standardize
model_final <- tl_model(data_prep$data, Species ~ ., method = "forest")
test_preds <- predict(model_final, new_data = data_split$test)
print(model_final)
#> tidylearn Model
#> ===============
#> Paradigm: supervised
#> Method: forest
#> Task: Classification
#> Formula: Species ~ .
#>
#> Training observations: 105