tidy_gower() — eliminated two layers of redundant work in the pairwise distance loop:
Numeric column ranges (max - min) and ordinal rank vectors were previously recomputed on every (i, j) pair. They are now computed once in a pre-pass. This cuts the number of range and rank computations from O(n² × p) to O(p); the pairwise comparisons themselves remain O(n² × p).
Replaced data[i, k], which dispatches to the R-level [.data.frame method on every call, with pre-extracted plain-vector access col_vecs[[k]][i], which resolves at the C level. Benchmarks show 10–100× faster scalar access, and the gain compounds across the full n(n - 1)/2 × p iterations.
Column-type checks (is.numeric, is.ordered) are now resolved once into a col_type character vector, removing repeated S3 predicate calls from the inner loop.
Fixed tl_reduce_dimensions() returning the internal
.obs_id row identifier as a column of its $data result. Passing that data to a supervised model via a response ~ . formula fed .obs_id in as a high-cardinality predictor, which made tree-based fits effectively non-terminating. The identifier is now dropped from the returned data, consistent with how the pipeline and transfer-learning paths already handle it.
Fixed print() and summary() erroring on the model objects returned by tl_step_selection() and tl_tune_xgboost(). Both constructed their object without the spec$paradigm field or the tidylearn_supervised class, so the print method hit a zero-length if condition and summary() took the unsupervised branch. Both objects are now built consistently with tl_model().
Fixed tidy_gower() (and
tidy_dist(..., method = "gower")) erroring on single-row input. The pairwise loop used 1:(n - 1), which evaluates to c(1, 0) instead of an empty sequence when n is 1; it now uses seq_len(n - 1), so a single-row data frame returns an empty dist object, consistent with stats::dist().
Added tests for tidy_gower() / tidy_dist(..., method = "gower") covering: return type and metadata, symmetry and self-distance, identical rows, hand-verified numeric / categorical / ordered / mixed-type distances, NA skipping, custom weights, constant-column denominator behaviour, and single-row input.
Removed unused packages from Suggests (caret, mclust, onnx, parsnip, recipes, reticulate, workflows); none were referenced in package code, tests, or vignettes.
Data Reading (tl_read() Family)
tl_read() dispatcher function — auto-detects format
from file extension, URL pattern, or connection string and routes to the appropriate reader
tidylearn_data object, a tibble subclass carrying source, format, and timestamp metadata, displayed by print.tidylearn_data()
tl_read_csv() / tl_read_tsv() — via readr with base R fallback
tl_read_excel() — .xls, .xlsx, .xlsm files via readxl
tl_read_parquet() — via nanoparquet
tl_read_json() — tabular JSON via jsonlite
tl_read_rds() / tl_read_rdata() — native R formats via base R
tl_read_db() — query any live DBI connection
tl_read_sqlite() — auto-connect to SQLite files via RSQLite
tl_read_postgres() — connection string or named params via RPostgres
tl_read_mysql() — connection string or named params via RMariaDB
tl_read_bigquery() — Google BigQuery via bigrquery
tl_read_s3() — download and read from S3 URIs via paws.storage
tl_read_github() — download raw files from GitHub repositories
tl_read_kaggle() — download datasets via the Kaggle CLI
tl_read() accepts a character vector of paths — reads each and row-binds with a source_file column
tl_read_dir() — scan a directory for data files with optional format, pattern, and recursive filtering
tl_read_zip() — extract and read from zip archives, with optional file selection
tl_check_packages()
tl_read()
in the workflow
Fixed tl_transfer_learning() hanging indefinitely when used with PCA pre-training. The .obs_id row-identifier column from the PCA output was being included in the supervised formula, creating a massive dummy-variable matrix. The column is now stripped before both training and prediction.
Fixed tl_run_pipeline() failing with “attempt to select less than one element” when all cross-validation metrics were NA. Root cause: scale() returns a matrix, so scaled columns were stored as one-column matrices instead of plain vectors, causing downstream metric computation to produce NaN. Added an as.vector() wrapper and hardened the best-model selection to handle all-NA metric values gracefully.
Fixed tl_auto_ml() time budget enforcement. The budget now controls which models are attempted: budgets under 30 seconds skip slow C-level models (forest, SVM, XGBoost) entirely, and cross-validation is skipped when the remaining time is tight. The baseline model order changed to fast-first (tree, logistic/linear, then forest). See ?tl_auto_ml for full details on budget tiers.
Fixed tl_interaction_effects() crashing with “unused
argument (se.fit)” because tidylearn’s predict() method does not support se.fit. It now uses stats::predict() on the raw model object for confidence intervals. Also fixed an invalid formula in the internal slope calculation.
Fixed tl_plot_interaction() expecting fit/lwr/upr columns from predict() output. It now correctly handles tidylearn’s .pred tibble format.
Fixed tl_plot_intervals() calling the non-existent tl_prediction_intervals() function. It now computes confidence and prediction intervals directly via stats::predict(..., interval = "confidence") and stats::predict(..., interval = "prediction").
Fixed tl_plot_svm_boundary() erroring with “at least two predictor variables required” when using response ~ . formulas. The function now resolves predictors from the data column names instead of all.vars(), which does not expand the . shorthand. Also switched from geom_contour_filled (which failed on discrete class predictions) to geom_raster.
Fixed tl_plot_svm_tuning() passing NULL entries in the ranges list to e1071::tune(), which caused “NA/NaN/Inf in foreign function call” errors. Tuning ranges are now built conditionally based on the kernel type.
Fixed tl_plot_xgboost_shap_summary() failing with
“arguments imply differing number of rows” when n_samples differed from nrow(data). Sampling is now performed before SHAP computation so that feature values and SHAP values always have the same number of rows.
Fixed tl_check_assumptions() crashing with “list object cannot be coerced to logical” when some assumption checks returned NULL (e.g., when optional test packages were not installed).
Fixed the gamma calculation to use the predictor count only (1 / (ncol(data) - 1)) instead of including the response column.
Added @return tag to print.tidylearn_data().
Replaced the deprecated size parameter with linewidth in all geom_line() calls across visualization, classification, PCA, DBSCAN, and validation plotting functions.
Added tests for tl_default_param_grid, tl_tune_grid, tl_tune_random, tl_plot_tuning_results, and input validation.
Replaced 1:n patterns with seq_len() / seq_along().
Added a lintr configuration enforcing %>% pipe consistency
tl_table() dispatcher function — mirrors
plot() but produces formatted gt tables instead of ggplot2 visualisations
tl_table_metrics() — styled evaluation metrics table from tl_evaluate()
tl_table_coefficients() — model coefficients with p-values (lm/glm) or sorted by magnitude (glmnet), with conditional highlighting
tl_table_confusion() — confusion matrix with correct predictions highlighted on the diagonal
tl_table_importance() — ranked feature importance with colour gradient
tl_table_variance() — PCA variance explained with cumulative % coloured
tl_table_loadings() — PCA loadings with diverging red–blue colour scale
tl_table_clusters() — cluster sizes and mean feature values for kmeans, pam, clara, dbscan, and hclust models
tl_table_comparison() — side-by-side multi-model comparison table
Consistent gt theme via internal tl_gt_theme() helper
gt is a suggested dependency — functions error with an install message if gt is not available
Fixed tl_fit_dbscan() returning a non-existent core_points field instead of summary from the underlying tidy_dbscan() result
Fixed plot() failing on supervised models with “could not find function ‘tl_plot_model’” by implementing the missing tl_plot_model() and tl_plot_unsupervised() internal dispatchers (#1)
Fixed tl_plot_actual_predicted(), tl_plot_residuals(), and tl_plot_confusion() failing due to accessing a non-existent $prediction column on predict output (the correct column is $.pred)
Fixed $prediction column mismatch in the tl_dashboard() predictions table
tl_model() - Single function to fit 20+ machine learning models
Raw model objects accessible via $fit for package-specific functionality
tl_split() - Train/test splitting with stratification support
tl_prepare_data() - Data preprocessing (scaling, imputation, encoding)
tl_evaluate() - Model evaluation with multiple metrics
tl_auto_ml() - Automated machine learning
tl_tune() - Hyperparameter tuning with grid and random search
tidylearn wraps established R packages including: stats, glmnet, randomForest, xgboost, gbm, e1071, nnet, rpart, cluster, dbscan, MASS, and smacof.