Every machine learning workflow starts with data. tidylearn’s
tl_read() family provides a single consistent interface for
loading data from files, databases, cloud storage, and APIs into tidy
tibbles — ready for tl_prepare_data() and
tl_model().
All readers return a tidylearn_data object, a tibble
subclass that carries metadata about the source, format, and read
timestamp.
tl_read() Dispatcher

tl_read() auto-detects the data format and dispatches to
the appropriate backend — just like tl_model() routes to
the right algorithm:
# Format is auto-detected from the file extension
data <- tl_read("sales.csv")
data <- tl_read("results.xlsx", sheet = "Q1")
data <- tl_read("experiment.parquet")
data <- tl_read("config.json")
data <- tl_read("model_data.rds")
# Override format detection when the extension is ambiguous
data <- tl_read("export.txt", format = "tsv")

The result always prints with a metadata header:
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp, row.names = FALSE)
data <- tl_read(tmp, .quiet = TRUE)
data
#> -- tidylearn data ---------
#> Source: C:\Users\CESAIR~1\AppData\Local\Temp\Rtmp8GXPQ6\file6ed8165f47f7.csv
#> Format: csv
#> Read at: 2026-04-09 10:55:56
#>
#> # A tibble: 32 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ 22 more rows

Uses readr when available for fast, column-type-aware parsing. Falls back to base R automatically if readr is not installed.
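That fallback can be sketched in a few lines (an illustration of the pattern, not tidylearn's actual implementation):

```r
# Prefer readr when it is installed; otherwise use base R's read.csv().
# read_csv_flexible() is an illustrative helper, not a tidylearn function.
read_csv_flexible <- function(path) {
  if (requireNamespace("readr", quietly = TRUE)) {
    readr::read_csv(path, show_col_types = FALSE)
  } else {
    utils::read.csv(path, stringsAsFactors = FALSE)
  }
}

tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp, row.names = FALSE)
nrow(read_csv_flexible(tmp))
#> [1] 32
```

Either branch returns a rectangular data set, so downstream code does not need to know which parser ran.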
Reads .xls, .xlsx, and .xlsm
files via readxl. Select
sheets by name or position:
library(readxl)
path <- readxl_example("datasets.xlsx")
excel_data <- tl_read_excel(path, sheet = "mtcars")
head(excel_data, 3)
#> -- tidylearn data ---------
#> Source: C:/Dev-Infra/Packages/R/readxl/extdata/datasets.xlsx
#> Format: excel
#> Read at: 2026-04-09 10:55:56
#>
#> # A tibble: 3 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1

Lightweight, columnar storage for large datasets. Uses nanoparquet — a fast, dependency-free reader:
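A typical call (a usage sketch, not executed in this vignette; requires the nanoparquet package and an illustrative file path):

```r
# Read a Parquet file directly
pq_data <- tl_read_parquet("experiment.parquet")

# Equivalent via the dispatcher, which detects the .parquet extension
pq_data <- tl_read("experiment.parquet")
```

Parquet preserves column types on disk, so no re-parsing of numbers or dates is needed after reading.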
Reads tabular JSON (array of objects) via jsonlite. Nested structures are automatically flattened:
library(jsonlite)
tmp_json <- tempfile(fileext = ".json")
write_json(mtcars[1:5, ], tmp_json)
json_data <- tl_read_json(tmp_json)
json_data
#> -- tidylearn data ---------
#> Source: C:\Users\CESAIR~1\AppData\Local\Temp\Rtmp8GXPQ6\file6ed86c526a8a.json
#> Format: json
#> Read at: 2026-04-09 10:55:57
#>
#> # A tibble: 5 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> * <dbl> <int> <int> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2

Native R serialisation formats — no extra packages needed:
tmp_rds <- tempfile(fileext = ".rds")
saveRDS(iris, tmp_rds)
rds_data <- tl_read_rds(tmp_rds)
nrow(rds_data)
#> [1] 150

tmp_rdata <- tempfile(fileext = ".rdata")
my_data <- mtcars
save(my_data, file = tmp_rdata)
# Name is auto-detected when there is a single data frame
rdata_data <- tl_read_rdata(tmp_rdata)
nrow(rdata_data)
#> [1] 32

All database readers use DBI as the interface layer. Each reader manages its own connection lifecycle — connect, query, disconnect — so you only need to provide the path or credentials and a SQL query.
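That lifecycle can be sketched as follows (a hypothetical helper illustrating the pattern; not tidylearn's internals):

```r
library(DBI)
library(RSQLite)

# Open a connection, run the query, and guarantee the connection is closed
# even if the query errors. Illustrative only.
read_with_lifecycle <- function(drv, query, ...) {
  conn <- dbConnect(drv, ...)
  on.exit(dbDisconnect(conn), add = TRUE)
  dbGetQuery(conn, query)
}

read_with_lifecycle(SQLite(), "SELECT 1 AS x", dbname = ":memory:")
```

The on.exit() call is the key detail: it runs when the function returns, normally or with an error, so connections are never leaked.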
The simplest database backend — no server required:
library(DBI)
library(RSQLite)
# Create an example database
tmp_db <- tempfile(fileext = ".sqlite")
conn <- dbConnect(SQLite(), tmp_db)
dbWriteTable(conn, "iris_tbl", iris)
dbDisconnect(conn)
# Read with tl_read_sqlite
db_data <- tl_read_sqlite(
tmp_db,
"SELECT * FROM iris_tbl WHERE Species = 'setosa'"
)
nrow(db_data)
#> [1] 50

If you already have a DBI connection, use tl_read_db()
directly — it will not close your connection:
conn <- dbConnect(SQLite(), ":memory:")
dbWriteTable(conn, "mtcars_tbl", mtcars)
sql <- "SELECT mpg, wt, hp FROM mtcars_tbl WHERE mpg > 20"
db_result <- tl_read_db(conn, sql)
db_result
#> -- tidylearn data ---------
#> Source: SQLiteConnection: SELECT mpg, wt, hp FROM mtcars_tbl WHERE mpg > 20
#> Format: database
#> Read at: 2026-04-09 10:55:57
#>
#> # A tibble: 14 × 3
#> mpg wt hp
#> * <dbl> <dbl> <dbl>
#> 1 21 2.62 110
#> 2 21 2.88 110
#> 3 22.8 2.32 93
#> 4 21.4 3.22 110
#> 5 24.4 3.19 62
#> 6 22.8 3.15 95
#> 7 32.4 2.2 66
#> 8 30.4 1.62 52
#> 9 33.9 1.84 65
#> 10 21.5 2.46 97
#> 11 27.3 1.94 66
#> 12 26 2.14 91
#> 13 30.4 1.51 113
#> 14 21.4 2.78 109
dbDisconnect(conn)

These require a running database server or cloud service. The API is the same — provide connection details and a SQL query:
# PostgreSQL
pg_data <- tl_read_postgres(
dsn = "localhost",
query = "SELECT * FROM sales WHERE year = 2025",
dbname = "analytics",
user = "myuser",
password = "mypass"
)
# MySQL / MariaDB
mysql_data <- tl_read_mysql(
dsn = "mysql://user:pass@host:3306/mydb",
query = "SELECT * FROM customers LIMIT 1000"
)
# BigQuery
bq_data <- tl_read_bigquery(
project = "my-gcp-project",
query = "SELECT * FROM `dataset.table` LIMIT 1000"
)

Downloads a file from S3 and auto-detects the format from the key’s extension. Requires valid AWS credentials:
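For example (a usage sketch; the S3 URI is illustrative and the call is not executed here):

```r
# Requires paws.storage and AWS credentials in the environment;
# format is detected from the .csv extension of the key
s3_data <- tl_read_s3("s3://my-bucket/data/sales.csv")
```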
Downloads raw files directly from a repository. Accepts full URLs or
owner/repo shorthand:
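For example (illustrative paths; the file argument name is an assumption, so check ?tl_read_github for the actual signature):

```r
# owner/repo shorthand with a path inside the repository (illustrative)
gh_data <- tl_read_github("owner/repo", file = "data/results.csv")

# Or a full raw-file URL
gh_data <- tl_read_github(
  "https://raw.githubusercontent.com/owner/repo/main/data/results.csv"
)
```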
Downloads datasets via the Kaggle CLI. Install with
pip install kaggle and configure your API credentials:
data <- tl_read_kaggle("zillow/zecon", file = "Zip_time_series.csv")
data <- tl_read_kaggle("titanic", file = "train.csv", type = "competition")

Real-world data is often split across multiple files. tidylearn handles three common patterns.
Pass a character vector to tl_read() — each file is read
and row-bound, with a source_file column tracking
origin:
dir <- tempdir()
write.csv(iris[1:50, ], file.path(dir, "batch1.csv"), row.names = FALSE)
write.csv(iris[51:100, ], file.path(dir, "batch2.csv"), row.names = FALSE)
paths <- file.path(dir, c("batch1.csv", "batch2.csv"))
combined <- tl_read(paths, .quiet = TRUE)
table(combined$source_file)
#>
#> batch1.csv batch2.csv
#> 50 50

Point tl_read_dir() at a directory. Filter by format,
regex pattern, or scan recursively:
dir <- tempfile(pattern = "tl_vignette_")
dir.create(dir)
write.csv(iris[1:50, ], file.path(dir, "jan.csv"), row.names = FALSE)
write.csv(iris[51:100, ], file.path(dir, "feb.csv"), row.names = FALSE)
write.csv(iris[101:150, ], file.path(dir, "mar.csv"), row.names = FALSE)
# Read all CSVs from the directory
all_data <- tl_read_dir(dir, format = "csv", .quiet = TRUE)
nrow(all_data)
#> [1] 150
table(all_data$source_file)
#>
#> feb.csv jan.csv mar.csv
#> 50 50 50

# Filter with a regex pattern
subset <- tl_read_dir(dir, pattern = "^(jan|feb)", .quiet = TRUE)
nrow(subset)
#> [1] 100

Passing a directory path directly to tl_read() works
too:
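Reusing the directory created above (a sketch, not executed here):

```r
# tl_read() sees a directory path and dispatches to tl_read_dir()
all_data <- tl_read(dir, .quiet = TRUE)
```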
tl_read_zip() extracts the archive, auto-detects the
file format, and reads the contents. Select a specific file or let it
discover data files automatically:
# Create an example zip
dir <- tempfile(pattern = "tl_zip_src_")
dir.create(dir)
write.csv(iris, file.path(dir, "iris.csv"), row.names = FALSE)
zip_path <- tempfile(fileext = ".zip")
old_wd <- getwd()
setwd(dir)
utils::zip(zip_path, "iris.csv")
setwd(old_wd)
zip_data <- tl_read_zip(zip_path, .quiet = TRUE)
nrow(zip_data)
#> [1] 150
attr(zip_data, "tl_format")
#> [1] "zip+csv"

Zip files are also auto-detected by tl_read():
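Reusing zip_path from the example above (a sketch, not executed here):

```r
# The .zip extension routes to tl_read_zip(), which then detects
# the format of the file inside the archive
zip_data <- tl_read(zip_path, .quiet = TRUE)
```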
tidylearn_data Class

Every reader returns a tidylearn_data object — a tibble
subclass with three metadata attributes:
| Attribute | Description |
|---|---|
| tl_source | File path, URL, or description of the data source |
| tl_format | Detected or specified format (e.g., "csv", "zip+csv") |
| tl_timestamp | POSIXct timestamp of when the data was read |
Because tidylearn_data inherits from
tbl_df, all dplyr verbs, ggplot2, and tidylearn functions
work transparently:
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp, row.names = FALSE)
data <- tl_read(tmp, .quiet = TRUE)
# Check metadata
attr(data, "tl_format")
#> [1] "csv"
# Works with dplyr
data %>%
filter(mpg > 20) %>%
select(mpg, wt, hp) %>%
head(3)
#> -- tidylearn data ---------
#> Source: C:\Users\CESAIR~1\AppData\Local\Temp\Rtmp8GXPQ6\file6ed869c2410e.csv
#> Format: csv
#> Read at: 2026-04-09 10:55:58
#>
#> # A tibble: 3 × 3
#> mpg wt hp
#> <dbl> <dbl> <dbl>
#> 1 21 2.62 110
#> 2 21 2.88 110
#> 3 22.8 2.32 93

Combining tl_read() with the rest of tidylearn gives you
a complete workflow from raw data to published results:
# 1. Ingest
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)
data <- tl_read(tmp, .quiet = TRUE)
# CSV files lose factor information, so convert character columns as needed
data <- data %>% mutate(Species = as.factor(Species))
# 2. Split
split <- tl_split(data, prop = 0.7, stratify = "Species", seed = 42)
# 3. Model
model <- tl_model(split$train, Species ~ ., method = "forest")
# 4. Evaluate
eval_result <- tl_evaluate(model, new_data = split$test)
eval_result
#> # A tibble: 1 × 2
#> metric value
#> <chr> <dbl>
#> 1 accuracy 0.933

| Format | Function | Backend | Dependency |
|---|---|---|---|
| CSV | tl_read_csv() | readr / base R | Suggests (readr) |
| TSV | tl_read_tsv() | readr / base R | Suggests (readr) |
| Excel | tl_read_excel() | readxl | Suggests |
| Parquet | tl_read_parquet() | nanoparquet | Suggests |
| JSON | tl_read_json() | jsonlite | Suggests |
| RDS | tl_read_rds() | base R | None |
| RData | tl_read_rdata() | base R | None |
| SQLite | tl_read_sqlite() | DBI + RSQLite | Suggests |
| PostgreSQL | tl_read_postgres() | DBI + RPostgres | Suggests |
| MySQL | tl_read_mysql() | DBI + RMariaDB | Suggests |
| BigQuery | tl_read_bigquery() | bigrquery | Suggests |
| S3 | tl_read_s3() | paws.storage | Suggests |
| GitHub | tl_read_github() | base R | None |
| Kaggle | tl_read_kaggle() | Kaggle CLI | None (system) |
| Directory | tl_read_dir() | (dispatches) | — |
| Zip | tl_read_zip() | base R + (dispatches) | — |
| Multi-path | tl_read(c(...)) | (dispatches) | — |