The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

vald.extractor

Robust Pipeline for VALD ForceDecks Data Extraction and Analysis

vald.extractor extends the valdr package by providing a production-ready, fault-tolerant pipeline for extracting, cleaning, and visualizing VALD ForceDecks data across multiple sports. Designed for CRAN submission with comprehensive documentation and enterprise-grade error handling.

The Problem

Organizations using VALD ForceDecks face three critical challenges:

API Stability: Manual exports or large data pulls frequently timeout, causing incomplete datasets
Data Cleaning: Team/sport names are inconsistent (“Football” vs “Soccer” vs “FSI”), requiring hours of manual categorization
Code Duplication: Analyzing multiple test types (CMJ, DJ, ISO) requires duplicate code for each metric suffix

The Solution

vald.extractor solves these problems through:

Chunked Batch Processing: Extracts data in manageable chunks (100 tests at a time) with fault-tolerant error handling
Automated Sports Taxonomy: Regex-based pattern matching standardizes inconsistent naming conventions
Generic Programming: Strips test-type suffixes to enable writing analysis code once that works for all tests

Installation

# Install from CRAN (when available)
install.packages("vald.extractor")

# Or install development version from GitHub
# install.packages("devtools")
devtools::install_github("praveenmaths89/vald.extractor")

Quick Start

library(vald.extractor)

# 1. Set VALD credentials
valdr::set_credentials(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id",
  region        = "aue"
)

# 2. Fetch test and trial data in chunks (prevents timeout)
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100
)

# 3. Fetch and standardize athlete metadata
metadata <- fetch_vald_metadata(
  client_id     = "your_client_id",
  client_secret = "your_client_secret",
  tenant_id     = "your_tenant_id"
)

athlete_metadata <- standardize_vald_metadata(
  profiles = metadata$profiles,
  groups   = metadata$groups
)

# 4. Apply automated sports classification
athlete_metadata <- classify_sports(athlete_metadata)
table(athlete_metadata$sports_clean)

# 5. Transform to wide format and join with metadata
# ... (see vignette for complete pipeline)

# 6. Split by test type with suffix removal
test_datasets <- split_by_test(final_analysis_data)

cmj_data <- test_datasets$CMJ  # Column names: "PEAK_FORCE_Both", not "PEAK_FORCE_Both_CMJ"
dj_data  <- test_datasets$DJ   # Same column names enable generic analysis

# 7. Generate summary statistics
summary_vald_metrics(cmj_data, group_vars = c("sex", "sports"))

# 8. Visualize trends and comparisons
plot_vald_trends(cmj_data, metric_col = "PEAK_FORCE_Both", group_col = "profileId")
plot_vald_compare(cmj_data, metric_col = "JUMP_HEIGHT_Both", group_col = "sports", fill_col = "sex")

Key Features

1. Fault-Tolerant Batch Extraction

# Processes 5000 tests without timeout errors
vald_data <- fetch_vald_batch(
  start_date = "2020-01-01T00:00:00Z",
  chunk_size = 100,  # Adjust based on API performance
  verbose = TRUE
)

# If chunk 23 fails, chunks 1-22 and 24+ still succeed
# Error messages indicate which rows failed for debugging

Why it matters: Organizations with large historical datasets (5000+ tests) cannot extract data in a single API call. The chunked approach with tryCatch error handling ensures partial extraction succeeds even if some chunks fail.

2. Automated Sports Taxonomy

metadata <- classify_sports(metadata, group_col = "all_group_names")

# Before:
# "Team A - Football", "Soccer U18", "FSI Elite", "Basketball", "BBall"

# After:
# "Football", "Football", "Football", "Basketball", "Basketball"

table(metadata$sports_clean)
#> Football    Basketball    Cricket    Swimming    Track & Field
#>      523           198        145          87              234

The Value Add: Multi-sport organizations waste hours manually categorizing athletes. This regex-based system handles 15+ sports out-of-the-box and is easily extensible.

3. Generic Test-Type Analysis

# Write analysis code ONCE that works for ALL test types
analyze_bilateral_asymmetry <- function(test_data) {
  test_data %>%
    mutate(
      asymmetry = (PEAK_FORCE_Left - PEAK_FORCE_Right) /
                  ((PEAK_FORCE_Left + PEAK_FORCE_Right) / 2) * 100
    )
}

# Apply to CMJ, DJ, ISO without code changes
test_datasets <- split_by_test(final_data)
cmj_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$CMJ)
dj_with_asymmetry  <- analyze_bilateral_asymmetry(test_datasets$DJ)
iso_with_asymmetry <- analyze_bilateral_asymmetry(test_datasets$ISO)

DRY Principle: Without suffix removal, you’d need separate code for PEAK_FORCE_Left_CMJ, PEAK_FORCE_Left_DJ, etc. This package enables true generic programming.

4. Metadata Patching

# Fix missing/incorrect demographics from external Excel file
cmj_data <- patch_metadata(
  data = cmj_data,
  patch_file = "corrections.xlsx",
  fields_to_patch = c("sex", "dateOfBirth")
)

# Unknown values are replaced with corrections
table(cmj_data$sex)
#> Before: Male: 450, Female: 380, Unknown: 45
#> After:  Male: 470, Female: 405, Unknown: 0

5. Publication-Ready Visualizations

# Longitudinal trends
plot_vald_trends(
  data = cmj_data,
  metric_col = "JUMP_HEIGHT_Both",
  group_col = "profileId",
  facet_col = "sports"
)

# Cross-sectional comparisons
plot_vald_compare(
  data = cmj_data,
  metric_col = "PEAK_FORCE_Both",
  group_col = "sports",
  fill_col = "sex"
)

Documentation

Vignette: End-to-End Pipeline: From API to Multi-Sport Analysis
Function Reference: ?fetch_vald_batch, ?standardize_vald_metadata, ?split_by_test, etc.
GitHub Repository: https://github.com/praveenmaths89/vald.extractor

Production Use Cases

vald.extractor is designed for:

Multi-Sport Organizations: National sport institutes, university athletic departments, professional academies
Longitudinal Research: Track athlete development over months/years with automated weekly updates
Cross-Sectional Studies: Compare performance across sports, sexes, age groups
Clinical Settings: Monitor return-to-sport progression, ACL rehab, injury risk

Comparison to Manual Workflow

Task	Manual Workflow	vald.extractor
Extract 5000 tests	❌ API timeout errors	✅ Chunked processing (15 min)
Classify 500 athletes into sports	❌ 2-3 hours manual work	✅ Automated (30 sec)
Analyze CMJ, DJ, ISO separately	❌ Duplicate code for each	✅ Generic functions
Handle missing demographics	❌ Manual data entry	✅ Excel patch import
Generate summary tables	❌ Custom scripts	✅ `summary_vald_metrics()`
Create visualizations	❌ ggplot2 from scratch	✅ Pre-built themes

Roadmap for R Journal Submission

The R Journal article will focus on:

Technical Innovation: Chunked extraction architecture with fault tolerance
Domain Contribution: Automated sports taxonomy as a time-saving tool for practitioners
Software Engineering: Modular design, comprehensive testing, CRAN-compliant documentation
Reproducible Research: Complete workflow from raw API to publication figures

Key Message: “Automating domain-specific data taxonomy for multi-organizational sports science”

Citation

If you use vald.extractor in published research, please cite:

Chougale PD, Ananthakumar U (2026). vald.extractor: Robust Pipeline for VALD
ForceDecks Data Extraction and Analysis. R package version 0.1.0.
https://github.com/praveenmaths89/vald.extractor

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/new-sport-taxonomy)
Add tests for new functionality
Submit a pull request

Common contributions:

Add patterns for new sports in classify_sports()
Improve error messages
Add new visualization themes
Extend to other VALD devices (NordBord, DynaMo, etc.)

License

MIT License - see LICENSE file for details.

Acknowledgments

VALD Performance: For providing the ForceDecks API and valdr package
tidyverse team: For creating the tools that make this package possible
Sports scientists: Who provided real-world use cases and taxonomy requirements

Support

Issues: GitHub Issues
Email: praveenmaths89@gmail.com
Documentation: Run vignette("end-to-end-pipeline", package = "vald.extractor")

Status: Ready for CRAN submission pending: - [ ] Final testing on multiple VALD tenants - [ ] CRAN comment responses - [ ] Logo design (hex sticker) - [ ] pkgdown website deployment

Maintainer: Praveen D Chougale (praveenmaths89@gmail.com)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.