The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

vald.extractor

CRAN status R-CMD-check

Robust Pipeline for VALD ForceDecks Data Extraction and Analysis

vald.extractor extends the valdr package by providing a production-ready, fault-tolerant pipeline for extracting, cleaning, and visualizing VALD ForceDecks data across multiple sports. Designed for CRAN submission with comprehensive documentation and enterprise-grade error handling.

The Problem

Organizations using VALD ForceDecks face three critical challenges:

  1. API Stability: Manual exports or large data pulls frequently timeout, causing incomplete datasets
  2. Data Cleaning: Team/sport names are inconsistent (“Football” vs “Soccer” vs “FSI”), requiring hours of manual categorization
  3. Code Duplication: Analyzing multiple test types (CMJ, DJ, ISO) requires duplicate code for each metric suffix

The Solution

vald.extractor solves these problems through:

Installation

# Install from CRAN (when available)
install.packages("vald.extractor")

# Or install development version from GitHub
# install.packages("devtools")
devtools::install_github("praveenmaths89/vald.extractor")

Quick Start

library(vald.extractor)

# 1. Set VALD credentials
valdr::set_credentials(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id",
  region        = "aue"
)

# 2. Fetch test and trial data in chunks (prevents timeout)
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100
)

# 3. Fetch and standardize athlete metadata
metadata <- fetch_vald_metadata(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id"
)

athlete_metadata <- standardize_vald_metadata(
  profiles = metadata$profiles,
  groups   = metadata$groups
)

# 4. Apply automated sports classification
athlete_metadata <- classify_sports(athlete_metadata)
table(athlete_metadata$sports_clean)

# 5. Transform to wide format and join with metadata
# ... (see vignette for complete pipeline)

# 6. Split by test type with suffix removal
test_datasets <- split_by_test(final_analysis_data)

cmj_data <- test_datasets$CMJ  # Column names: "PEAK_FORCE_Both", not "PEAK_FORCE_Both_CMJ"
dj_data  <- test_datasets$DJ   # Same column names enable generic analysis

# 7. Generate summary statistics
summary_vald_metrics(cmj_data, group_vars = c("sex", "sports"))

# 8. Visualize trends and comparisons
plot_vald_trends(cmj_data, metric_col = "PEAK_FORCE_Both", group_col = "profileId")
plot_vald_compare(cmj_data, metric_col = "JUMP_HEIGHT_Both", group_col = "sports", fill_col = "sex")

Key Features

1. Fault-Tolerant Batch Extraction

# Processes 5000 tests without timeout errors
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100,  # Adjust based on API performance
  verbose = TRUE
)

# If chunk 23 fails, chunks 1-22 and 24+ still succeed
# Error messages indicate which rows failed for debugging

Why it matters: Organizations with large historical datasets (5000+ tests) cannot extract data in a single API call. The chunked approach with tryCatch error handling ensures partial extraction succeeds even if some chunks fail.

2. Automated Sports Taxonomy

metadata <- classify_sports(metadata, group_col = "all_group_names")

# Before:
# "Team A - Football", "Soccer U18", "FSI Elite", "Basketball", "BBall"

# After:
# "Football", "Football", "Football", "Basketball", "Basketball"

table(metadata$sports_clean)
#> Football    Basketball    Cricket    Swimming    Track & Field
#>      523           198        145          87              234

The Value Add: Multi-sport organizations waste hours manually categorizing athletes. This regex-based system handles 15+ sports out-of-the-box and is easily extensible.

3. Generic Test-Type Analysis

# Write analysis code ONCE that works for ALL test types
analyze_bilateral_asymmetry <- function(test_data) {
  test_data %>%
    mutate(
      asymmetry = (PEAK_FORCE_Left - PEAK_FORCE_Right) /
                  ((PEAK_FORCE_Left + PEAK_FORCE_Right) / 2) * 100
    )
}

# Apply to CMJ, DJ, ISO without code changes
test_datasets <- split_by_test(final_data)
cmj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$CMJ)
dj_with_asymmetry  <- analyze_bilateral_asymmetry(test_datasets$DJ)
iso_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$ISO)

DRY Principle: Without suffix removal, you’d need separate code for PEAK_FORCE_Left_CMJ, PEAK_FORCE_Left_DJ, etc. This package enables true generic programming.

4. Metadata Patching

# Fix missing/incorrect demographics from external Excel file
cmj_data <- patch_metadata(
  data = cmj_data,
  patch_file = "corrections.xlsx",
  fields_to_patch = c("sex", "dateOfBirth")
)

# Unknown values are replaced with corrections
table(cmj_data$sex)
#> Before: Male: 450, Female: 380, Unknown: 45
#> After:  Male: 470, Female: 405, Unknown: 0

5. Publication-Ready Visualizations

# Longitudinal trends
plot_vald_trends(
  data = cmj_data,
  metric_col = "JUMP_HEIGHT_Both",
  group_col = "profileId",
  facet_col = "sports"
)

# Cross-sectional comparisons
plot_vald_compare(
  data = cmj_data,
  metric_col = "PEAK_FORCE_Both",
  group_col = "sports",
  fill_col = "sex"
)

Documentation

Production Use Cases

vald.extractor is designed for:

Comparison to Manual Workflow

Task Manual Workflow vald.extractor
Extract 5000 tests ❌ API timeout errors ✅ Chunked processing (15 min)
Classify 500 athletes into sports ❌ 2-3 hours manual work ✅ Automated (30 sec)
Analyze CMJ, DJ, ISO separately ❌ Duplicate code for each ✅ Generic functions
Handle missing demographics ❌ Manual data entry ✅ Excel patch import
Generate summary tables ❌ Custom scripts summary_vald_metrics()
Create visualizations ❌ ggplot2 from scratch ✅ Pre-built themes

Roadmap for R Journal Submission

The R Journal article will focus on:

  1. Technical Innovation: Chunked extraction architecture with fault tolerance
  2. Domain Contribution: Automated sports taxonomy as a time-saving tool for practitioners
  3. Software Engineering: Modular design, comprehensive testing, CRAN-compliant documentation
  4. Reproducible Research: Complete workflow from raw API to publication figures

Key Message: “Automating domain-specific data taxonomy for multi-organizational sports science”

Citation

If you use vald.extractor in published research, please cite:

Chougale PD, Anathakumar U (2026). vald.extractor: Robust Pipeline for VALD
ForceDecks Data Extraction and Analysis. R package version 0.1.0.
https://github.com/praveenmaths89/vald.extractor

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-sport-taxonomy)
  3. Add tests for new functionality
  4. Submit a pull request

Common contributions:

License

MIT License - see LICENSE file for details.

Acknowledgments

Support


Status: Ready for CRAN submission pending: - [ ] Final testing on multiple VALD tenants - [ ] CRAN comment responses - [ ] Logo design (hex sticker) - [ ] pkgdown website deployment

Maintainer: Praveen D Chougale (praveenmaths89@gmail.com)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.