| Type: | Package |
| Title: | Longitudinal Sports Analytics Asset and Workload Feature Processing |
| Version: | 0.1.0 |
| Description: | A synthetic, longitudinal athletic dataset generated through a transparent, rule-based simulation engine. Captures individual activity sessions across multiple athletes, environmental conditions, and physiological responses. Specifically designed as an alternative to legacy teaching datasets by introducing realistic hierarchical repeated measures, complex two-way covariate interactions, and a deliberate Missing Not At Random (MNAR) tracking mechanism suitable for advanced imputation workflows. Methodologies implemented are based on van Buuren (2018) <doi:10.1201/9780429492259> and Bates et al. (2015) <doi:10.18637/jss.v067.i01>. |
| License: | MIT + file LICENSE |
| Depends: | R (≥ 4.1.0) |
| Imports: | tibble, mice, modelsummary, lme4 |
| Suggests: | tidyverse |
| Encoding: | UTF-8 |
| LazyData: | true |
| Config/roxygen2/version: | 8.0.0 |
| Config/Needs/editorial: | MNAR Buuren et al |
| NeedsCompilation: | no |
| Packaged: | 2026-06-24 19:39:55 UTC; abbasxma |
| Author: | Mohammad Abbas [aut, cre] |
| Maintainer: | Mohammad Abbas <ma.abbas3107@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-30 20:00:02 UTC |
sportsfeatures package documentation
Description
A synthetic, longitudinal athletic dataset generated through a transparent, rule-based simulation engine. Captures individual activity sessions across multiple athletes, environmental conditions, and physiological responses. Specifically designed as an alternative to legacy teaching datasets by introducing realistic hierarchical repeated measures, complex two-way covariate interactions, and a deliberate Missing Not At Random (MNAR) tracking mechanism suitable for advanced imputation workflows. Methodologies implemented are based on van Buuren (2018) doi:10.1201/9780429492259 and Bates et al. (2015) doi:10.18637/jss.v067.i01.
Author(s)
Maintainer: Mohammad Abbas ma.abbas3107@gmail.com
Authors:
Mohammad Abbas ma.abbas3107@gmail.com
Access Sports Feature Datasets
Description
A helper function that returns one of the package's bundled sports feature datasets so it can be assigned directly to a variable.
Usage
get_sportsdata(type = c("complete", "missing"))
Arguments
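type |
A character string specifying which dataset variant to load: "complete" for the clean, complete dataset, or "missing" for the variant containing systematic missingness.
|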
Value
A tibble/data.frame containing the requested sports feature dataset.
Examples
# Get the clean complete dataset
clean_data <- get_sportsdata(type = "complete")
# Get the dataset containing systematic missingness
missing_data <- get_sportsdata(type = "missing")
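The usage signature lists both choices as the default for type, which is the usual R idiom for arguments resolved with match.arg(). Whether get_sportsdata() actually uses match.arg() is not stated here, so the following is only a minimal sketch of that idiom; pick_variant() is a hypothetical stand-in, not a function from this package.

```r
# Hypothetical stand-in with the same argument signature as get_sportsdata().
# It only resolves the choice; it does not load any data.
pick_variant <- function(type = c("complete", "missing")) {
  # match.arg() picks the first choice when the argument is left at its
  # default, allows unambiguous partial matches, and errors otherwise.
  match.arg(type)
}

default_choice <- pick_variant()           # "complete"
explicit_choice <- pick_variant("missing") # "missing"
partial_choice <- pick_variant("miss")     # "missing" (partial match)

# An unknown value raises an error instead of silently falling back.
invalid_choice <- tryCatch(
  pick_variant("other"),
  error = function(e) "error"
)
```

Under this assumption, calling the helper with no arguments would return the complete dataset.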
Comprehensive Sports Features Dataset
Description
Comprehensive Sports Features Dataset
Usage
sports_features
Format
A tibble or data frame with 25 variables describing athlete sessions and performance metrics:
- session_id
Unique alphanumeric identifier for each training session.
- athlete_id
Unique alphanumeric identifier for each athlete.
- datetime
Timestamp of when the training session occurred.
- activity_type
Type of exercise performed (e.g., running, cycling, swimming).
- region
Geographical area where the session took place.
- distance_km
Total distance covered during the session in kilometers.
- weather_type
Weather condition during the session (e.g., sunny, rainy, cloudy).
- temperature_c
Ambient outdoor temperature in degrees Celsius.
- personal_status
Pre-activity physical or mental status reported by the athlete.
- is_group_activity
Logical indicator (TRUE/FALSE) if the session was done with a group.
- gender
Categorical gender of the athlete.
- age
Age of the athlete in years.
- base_fitness
Baseline fitness score of the athlete.
- base_speed
Baseline average speed capability of the athlete.
- base_stamina
Baseline stamina level of the athlete.
- base_weight
Baseline body weight of the athlete in kilograms.
- resting_heart_rate
Baseline resting heart rate in beats per minute (bpm).
- device_type
Type of tracking device used during the session.
- speed_kmh
Average speed maintained throughout the session in km/h.
- duration_min
Total duration of the training session in minutes.
- heart_rate_avg
Average heart rate monitored during the session in bpm.
- calories_burned
Estimated total energy expenditure in kilocalories (kcal).
- exhaustion_level
Subjective exhaustion level reported after the session.
- hydration_status
Hydration level (%) recorded during or after the session.
- fatigue_score
Calculated post-activity fatigue accumulation score.
Details
A rich, synthetic sports analytics dataset containing tracking metrics, environmental contexts, physiological markers, and performance data for athletes.
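Because each athlete contributes several sessions, the data have a repeated-measures structure: sessions nested within athletes. The sketch below illustrates that structure on a tiny made-up data frame that borrows four of the documented column names; the values are invented for illustration and are not taken from sports_features.

```r
# Toy rows mimicking a few documented columns; all values are made up.
toy <- data.frame(
  session_id   = c("S1", "S2", "S3", "S4", "S5"),
  athlete_id   = c("A1", "A1", "A2", "A2", "A2"),
  distance_km  = c(5, 10, 8, 12, 4),
  duration_min = c(30, 55, 40, 70, 25),
  stringsAsFactors = FALSE
)

# Number of sessions per athlete (the grouping used by a random intercept).
sessions_per_athlete <- table(toy$athlete_id)

# Total workload per athlete, summed over that athlete's sessions.
workload <- aggregate(
  cbind(distance_km, duration_min) ~ athlete_id,
  data = toy,
  FUN = sum
)
print(sessions_per_athlete)
print(workload)
```

The same two calls can be pointed at sports_features to see how many sessions each athlete contributes before fitting a mixed model.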
Source
Synthesized sports features analytics framework.
Examples
library(lme4)
# Load the package data
data("sports_features")
# Use the first 500 rows so the example runs quickly (< 2.5s)
demo_data <- head(sports_features, 500)
# ----------------------------------------------------
# DEMO 1: Linear Regression (Fixed Effects)
# Predicting fatigue score based on workload metrics
# ----------------------------------------------------
lm_model <- lm(fatigue_score ~ distance_km + duration_min + speed_kmh + temperature_c,
               data = demo_data)
summary(lm_model)
# ----------------------------------------------------
# DEMO 2: Linear Mixed-Effects Model (Hierarchical LMM)
# Random intercept per athlete (athlete_id) to account for
# repeated sessions from the same individual
# ----------------------------------------------------
lmm_model <- lmer(fatigue_score ~ distance_km + duration_min + speed_kmh + temperature_c +
                    (1 | athlete_id),
                  data = demo_data)
summary(lmm_model)
Comprehensive Sports Features Dataset (With Missing Values)
Description
Comprehensive Sports Features Dataset (With Missing Values)
Usage
sports_features_missing
Format
A tibble or data frame with 25 variables containing structured missing values.
Details
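A variant of the core sports analytics dataset containing structured missingness (NA values) across performance tracking columns, intended for demonstrating imputation workflows. The package description characterizes the missingness mechanism as Missing Not At Random (MNAR).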
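Under an MNAR mechanism, the chance that a value is missing depends on the unobserved value itself, so summaries of the observed values are biased. The exact rule used to create sports_features_missing is not documented in this section; the sketch below is a generic base R illustration in which the variable, the logistic rule, and all parameters are assumptions chosen for the demonstration.

```r
set.seed(42)
n <- 5000

# Simulated "true" average heart rates (bpm); values are synthetic.
heart_rate <- rnorm(n, mean = 150, sd = 15)

# Assumed MNAR rule: higher heart rates are more likely to go unrecorded.
p_missing <- plogis((heart_rate - 165) / 5)
is_missing <- runif(n) < p_missing

# What an analyst would actually see.
observed <- ifelse(is_missing, NA, heart_rate)

true_mean <- mean(heart_rate)
observed_mean <- mean(observed, na.rm = TRUE)
missing_share <- mean(is.na(observed))

# The observed mean understates the true mean because high values drop out.
cat("true mean:", round(true_mean, 1), "\n")
cat("observed mean:", round(observed_mean, 1), "\n")
cat("share missing:", round(missing_share, 3), "\n")
```

Comparing the same summaries between sports_features and sports_features_missing is one way to see how much bias a complete-case analysis would carry before trying an imputation model.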
Source
Synthesized sports features analytics framework.