Transfer REDCap data to a database in chunks and use it in R without exceeding available memory. Whether you have a large or small project, enjoy features such as data labeling, converting coded values, and hearing a “quack” sound on success. Compatible with any DBI-supported database, but optimized for DuckDB.
Is your project outgrowing your computer? Have you seen this error when using the REDCap API to retrieve data?
Error: vector memory limit of 16.0 GB reached, see mem.maxVSize()
What does it mean? Well, R objects are stored in your random access memory (RAM). When your data gets too big, you hit your memory limit. redquack’s solution to this error is to store the data out of memory in a local database for easy retrieval in R.
The solution is built on httr2, which handles the API requests and persistently retries failed ones to ensure your data is transferred successfully.
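As a minimal sketch of that retry idea (not redquack’s exact internals; the form fields follow the REDCap API’s token/content parameters):

library(httr2)

# Retry transient failures up to 5 times before giving up
resp <- request("https://bbmc.ouhsc.edu/redcap/api/") |>
  req_body_form(
    token = keyring::key_get("redcap", "test"),
    content = "version"
  ) |>
  req_retry(max_tries = 5) |>
  req_perform()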
From CRAN:
# install.packages("pak")
pak::pak("redquack")
From GitHub (development version):
pak::pak("dylanpieper/redquack")
These packages are also used in the examples:
pak::pak(c("keyring", "dplyr"))
Your API token allows R to interface with your REDCap instance, so it should be stored securely. I recommend using the keyring package to store it. For example:
keyring::key_set("redcap", "test")
Use this token to run the examples: 9A81268476645C4E5F03428B8AC3AA7B
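In later calls, retrieve the stored token with keyring::key_get(), as the examples below do, so it never appears in plain text in your scripts:

token <- keyring::key_get("redcap", "test")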
Data from REDCap is transferred to a database connection in chunks of record IDs:
library(redquack)
library(dplyr)
conn <- use_duckdb()
result <- redcap_to_db(
  conn,
  url = "https://bbmc.ouhsc.edu/redcap/api/",
  token = keyring::key_get("redcap", "test")
)
redcap_to_db() returns a list of metadata with class redcap_transfer_result:

- success: Logical if the transfer was completed with no failed processing
- error_chunks: Vector of chunk numbers that failed processing
- time_s: Numeric value for total seconds to process all data
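These fields make it easy to verify a transfer before querying the data. A minimal sketch, using result from the call above:

if (result$success) {
  message("Transfer finished in ", round(result$time_s, 1), " seconds")
} else {
  warning("Chunks that failed: ", paste(result$error_chunks, collapse = ", "))
}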
The database created by redcap_to_db() contains up to four tables:
- data: Contains the raw REDCap records
data <- tbl_redcap(conn) |> collect()
- metadata: Contains project metadata for labeling and coded value conversion
meta <- metadata(conn)
- redcap_log: Contains REDCap audit logs from the past week (default)
redcap_log <- redcap_log(conn)
- transfer_log: Contains operation logs of the transfer process
transfer_log <- transfer_log(conn)
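Because conn is a regular DBI connection, you can also confirm which of these tables exist (assuming the DBI package is available, as duckdb depends on it):

DBI::dbListTables(conn)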
Data is initially inserted into the database as VARCHAR/TEXT for consistent handling across chunks.
For DuckDB, data types are optimized after the transfer to improve query performance. You can query the database to inspect the data types:
inspect(conn)
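As an alternative to inspect(), a plain SQL query works as well; this sketch assumes the DuckDB backend, which supports the DESCRIBE statement:

DBI::dbGetQuery(conn, "DESCRIBE data")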
Retrieve and manipulate your REDCap data with familiar dplyr verbs and tbl-like syntax. The key difference is that you reference the database table first using tbl_redcap().
Retrieve your REDCap data as a single table:
data <- tbl_redcap(conn) |>
  collect()
Retrieve data organized by REDCap instruments as a list:
instruments <- tbl_redcap(conn) |>
  collect_list()
Assign instrument data frames to the global environment:
tbl_redcap(conn) |>
  collect_list() |>
  list_to_env()
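For example, if your project has an instrument named demographics (a hypothetical name; substitute the instrument names from your own project), it is now a data frame in your global environment:

head(demographics)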
Use dplyr verbs to filter and select data before collecting:
# Filter to specific participants
nutmouse_data <- tbl_redcap(conn) |>
  filter(name_last == "Nutmouse") |>
  collect_labeled_list()
# Select specific columns across all instruments
key_vars <- tbl_redcap(conn) |>
  select(record_id, email, sex, bmi) |>
  collect_list()
# Simple column selection
analysis_data <- tbl_redcap(conn) |>
  select(email, sex) |>
  collect_labeled()
Perform complex data manipulation with grouping and filtering:
# Group by sex and filter to below-average BMI
below_avg_bmi <- tbl_redcap(conn) |>
  select(record_id, sex, bmi) |>
  group_by(sex) |>
  filter(bmi < mean(bmi)) |>
  arrange(bmi) |>
  collect_list()
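These verbs are translated to SQL and run inside the database, so only the collected result is loaded into memory. Assuming tbl_redcap() returns a standard dbplyr table, you can preview the generated SQL before collecting:

tbl_redcap(conn) |>
  select(record_id, sex, bmi) |>
  show_query()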
Apply column and coded value labels and control value conversion (i.e., choice mappings). collect_labeled() and collect_labeled_list() give you control over how these labels are applied:
Full labeling and coded value conversion (default):
data <- tbl_redcap(conn) |> collect_labeled()

data$sex
#> [1] "Female" "Male" "Male" "Female" "Male"
#> attr(,"label")
#> [1] "Gender"
Keep raw coded values:
data <- tbl_redcap(conn) |> collect_labeled(convert = FALSE)

data$sex
#> <labelled<integer>[5]>: Gender
#> [1] 0 1 1 0 1
#>
#> Labels:
#> value label
#> 0 Female
#> 1 Male
Column labels only:
data <- tbl_redcap(conn) |> collect_labeled(vals = FALSE)

data$sex
#> [1] 0 1 1 0 1
#> attr(,"label")
#> [1] "Gender"
Value conversion only:
data <- tbl_redcap(conn) |> collect_labeled(cols = FALSE)

data$sex
#> [1] "Female" "Male" "Male" "Female" "Male"
attr(data$sex, "label")
#> NULL
You can also write a Parquet file directly from DuckDB. A Parquet file is much smaller than a DuckDB file and easy to share:
save_parquet(conn, "redcap.parquet")
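To load the Parquet file back into R later, one option is the arrow package (an assumption; any Parquet reader, including DuckDB itself, works):

data <- arrow::read_parquet("redcap.parquet")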
When you’re finished working with your data, clean up your connection:
close_duckdb(conn)
Or, if you will not need the data again, delete the database file:
remove_duckdb(conn)