Tools for cleaning high-frequency real-time location tracking data.
trackclean was developed to process data from playground
movement research, but applies to any study collecting high-frequency
positional data from people moving within a defined space — classrooms,
sports facilities, rehabilitation settings, and similar
environments.
# Install from CRAN
install.packages("trackclean")
# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("tomasbil/trackclean")

The package includes a small example dataset that can be used to trial the full pipeline without any real data. It simulates 10 children tracked during a school recess on a 40 m × 60 m playground using a UWB positioning system.
library(trackclean)
library(readr)
raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))
id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")

The example dataset includes:

- 10 participants with raw tag IDs 1–10, mapped to child IDs 5001–5010
- ~13.5 minutes of data (11:45:00–11:58:30), with observations both inside and outside the analysis window
- Sub-second timestamps causing multiple readings per second, handled by standardize_to_seconds()
- Randomly dropped seconds creating gaps, handled by interpolate_gaps()
- One tag replacement: participant 5003 starts on raw tag ID 3, which is swapped to raw tag ID 11 at 11:51:00, handled by fix_tag_replacement()
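To illustrate what collapsing sub-second readings to one row per second can look like, here is a minimal base-R sketch. It assumes that readings falling in the same second are averaged and counted, which matches the n_entries and standardized columns described later in this README. The helper name average_per_second is hypothetical and is not the package's standardize_to_seconds().

```r
# Hypothetical helper (not part of trackclean): one averaged row per tag and second.
average_per_second <- function(df) {
  # Truncate each timestamp to the whole second it falls in
  df$second <- as.POSIXct(trunc(df$At, units = "secs"))
  # Sum positions and count readings per tag and second in a single pass
  df$n_entries <- 1
  out <- aggregate(cbind(X, Y, n_entries) ~ ID + second, data = df, FUN = sum)
  # Mean position = sum of positions / number of readings
  out$X <- out$X / out$n_entries
  out$Y <- out$Y / out$n_entries
  # Flag seconds that were built from more than one raw reading
  out$standardized <- as.integer(out$n_entries > 1)
  out[order(out$ID, out$second), ]
}

# The three sample rows shown in the raw tracking data table
raw <- data.frame(
  ID = c(1, 1, 1),
  At = as.POSIXct(c("2025-03-18 11:45:00.00",
                    "2025-03-18 11:45:01.00",
                    "2025-03-18 11:45:01.47"), tz = "UTC"),
  X = c(5.000, 5.383, 5.341),
  Y = c(10.000, 10.239, 10.261)
)
per_second <- average_per_second(raw)
```

In this toy input the two readings at 11:45:01 are merged into a single row, so per_second has two rows instead of three.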
Analysis parameters for this dataset:
| Parameter | Value |
|---|---|
| analyze_start | "2025-03-18 11:47:00" |
| analyze_end | "2025-03-18 11:57:00" |
| bell_start | "2025-03-18 11:53:00" |
| bell_end | "2025-03-18 11:58:00" |
| Tag replacement | raw_id 3 → raw_id 11 at "2025-03-18 11:51:00" |
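
The windows in the table above can be pictured as 0/1 flags on each timestamp. The sketch below is only an illustration in base R: the helper flag_window is hypothetical, and treating both boundaries as inclusive is an assumption, since the boundary handling of mark_time_periods() is not described here.

```r
# Hypothetical helper: 1 if a timestamp lies in [start, end], 0 otherwise.
# Inclusive boundaries are an assumption, not documented package behaviour.
flag_window <- function(at, start, end, tz = "UTC") {
  start <- as.POSIXct(start, tz = tz)
  end <- as.POSIXct(end, tz = tz)
  as.integer(at >= start & at <= end)
}

at <- as.POSIXct(c("2025-03-18 11:46:59", "2025-03-18 11:47:00",
                   "2025-03-18 11:55:00", "2025-03-18 11:57:30"), tz = "UTC")

# Analysis window and bell window from the parameter table
analyze <- flag_window(at, "2025-03-18 11:47:00", "2025-03-18 11:57:00")
bell    <- flag_window(at, "2025-03-18 11:53:00", "2025-03-18 11:58:00")
```

Because the bell window ends after the analysis window, the last timestamp (11:57:30) falls inside the bell period but outside the analysis period.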
Raw tracking data (raw_tracking_data.csv):
| ID | At | X | Y |
|---|---|---|---|
| 1 | 2025-03-18 11:45:00.00 | 5.000 | 10.000 |
| 1 | 2025-03-18 11:45:01.00 | 5.383 | 10.239 |
| 1 | 2025-03-18 11:45:01.47 | 5.341 | 10.261 |
| … |
- ID: raw tag ID as assigned by the tracking system
- At: timestamp (POSIXct-readable, sub-second precision supported)
- X, Y: position in meters

ID mapping (id_mapping.csv):
| raw_id | child_id |
|---|---|
| 1 | 5001 |
| 3 | 5003 |
| 11 | 5003 |
| … |
- raw_id: tag ID as it appears in the raw data
- child_id: standardized participant ID to use in analysis

If a participant’s tag was replaced during data collection, run this before the main pipeline:
raw_data <- fix_tag_replacement(
data = raw_data,
original_id = 3,
replacement_id = 11,
replacement_time = "2025-03-18 11:51:00"
)

This will:

- Keep observations from tag 3 before 11:51
- Rename tag 11 observations from 11:51 onwards to tag 3
- Remove tag 3 observations from 11:51 onwards (duplicate/invalid)
- Remove tag 11 observations before 11:51 (not yet attached)
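
The four rules above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper swap_tag is hypothetical, and it assumes the replacement time itself counts as "from 11:51 onwards".

```r
# Hypothetical helper (not the package's fix_tag_replacement()).
swap_tag <- function(df, original_id, replacement_id, replacement_time, tz = "UTC") {
  cutoff <- as.POSIXct(replacement_time, tz = tz)
  after <- df$At >= cutoff  # assumption: the cutoff second belongs to the new tag
  # Drop the original tag from the cutoff onwards and the replacement tag before it
  drop <- (df$ID == original_id & after) | (df$ID == replacement_id & !after)
  df <- df[!drop, , drop = FALSE]
  # Any replacement-tag rows left are from the cutoff onwards: relabel them
  df$ID[df$ID == replacement_id] <- original_id
  df[order(df$ID, df$At), ]
}

toy <- data.frame(
  ID = c(3, 3, 11, 11, 1),
  At = as.POSIXct(c("2025-03-18 11:50:59", "2025-03-18 11:51:00",
                    "2025-03-18 11:50:59", "2025-03-18 11:51:00",
                    "2025-03-18 11:51:00"), tz = "UTC"),
  X = c(1, 2, 3, 4, 5),
  Y = c(1, 2, 3, 4, 5)
)
fixed <- swap_tag(toy, original_id = 3, replacement_id = 11,
                  replacement_time = "2025-03-18 11:51:00")
```

In the toy data, tag 3 keeps its 11:50:59 reading, the 11:51:00 reading from tag 11 is relabeled as tag 3, and the other two readings for tags 3 and 11 are dropped. Tag 1 is untouched.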
Create a CSV file with two columns mapping raw device IDs to your participant IDs:
raw_id,child_id
1,5001
2,5002
3,5003
Or use the bundled example file:
id_mapping <- system.file("extdata", "id_mapping.csv", package = "trackclean")

library(trackclean)
library(readr)
raw_data <- read_csv(system.file("extdata", "raw_tracking_data.csv", package = "trackclean"))
# Fix tag replacement first (if applicable)
raw_data <- fix_tag_replacement(
data = raw_data,
original_id = 3,
replacement_id = 11,
replacement_time = "2025-03-18 11:51:00"
)
cleaned_data <- clean_playground_data(
data = raw_data,
id_mapping = system.file("extdata", "id_mapping.csv", package = "trackclean"),
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
bell_start = "2025-03-18 11:53:00",
bell_end = "2025-03-18 11:58:00",
output_file = "cleaned_data.csv"
)

For more control, run each step separately:
# Step 1: Map IDs
data <- map_ids(raw_data, id_mapping)
# Step 2: Mark time periods
data <- mark_time_periods(
data,
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
bell_start = "2025-03-18 11:53:00",
bell_end = "2025-03-18 11:58:00"
)
# Step 3: Standardize to seconds
data <- standardize_to_seconds(data)
# Step 4: Interpolate gaps
data <- interpolate_gaps(
data,
max_gap_small = 10,
max_position_change = 0.3
)

The package uses a two-phase approach to handle missing data:

Phase 1: Interpolates small gaps (≤10 seconds by default)

- Uses linear interpolation between known points
- Appropriate for brief signal losses

Phase 2: Interpolates larger gaps conditionally

- Only when position change between endpoints is minimal (≤30 cm by default)
- Indicates the participant remained stationary during the gap
- Prevents false movement estimates for longer signal dropouts
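
A rough base-R sketch of the two-phase idea for a single participant is shown below. It is not the package's interpolate_gaps(): the helper fill_gaps is hypothetical, gap length is assumed to be the number of missing seconds, position change is assumed to be the straight-line distance between the gap's endpoints, and the default of 30 for max_gap_large is a placeholder chosen for this example.

```r
# Hypothetical helper for one participant (not the package's interpolate_gaps()).
# t: observed whole seconds (sorted numeric); x, y: positions in meters.
fill_gaps <- function(t, x, y, max_gap_small = 10, max_gap_large = 30,
                      max_position_change = 0.3) {
  rows <- list(data.frame(t = t, x = x, y = y, imputed = 0L, imputed_large = 0L))
  for (i in seq_len(max(length(t) - 1, 0))) {
    missing <- t[i + 1] - t[i] - 1  # seconds with no reading (assumed gap measure)
    if (missing < 1) next
    # Straight-line distance between the endpoints of the gap
    moved <- sqrt((x[i + 1] - x[i])^2 + (y[i + 1] - y[i])^2)
    small <- missing <= max_gap_small                       # phase 1
    large <- !small && missing <= max_gap_large &&
      moved <= max_position_change                          # phase 2
    if (!small && !large) next  # gap stays unfilled
    new_t <- seq(t[i] + 1, t[i + 1] - 1)
    rows[[length(rows) + 1]] <- data.frame(
      t = new_t,
      # Linear interpolation between the two known endpoints
      x = approx(t[i:(i + 1)], x[i:(i + 1)], xout = new_t)$y,
      y = approx(t[i:(i + 1)], y[i:(i + 1)], xout = new_t)$y,
      imputed = as.integer(small),
      imputed_large = as.integer(large)
    )
  }
  out <- do.call(rbind, rows)
  out[order(out$t), ]
}

# Gaps of 2 s (short), 16 s with 0.1 m movement, and 19 s with 6.9 m movement
track <- fill_gaps(t = c(0, 3, 20, 40), x = c(0, 3, 3.1, 10), y = c(0, 0, 0, 0))
```

In this toy track the 2-second gap is filled in phase 1, the 16-second gap is filled in phase 2 because the endpoints are only 0.1 m apart, and the 19-second gap is left empty because the participant moved 6.9 m across it.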
All functions provide:

- Progress messages and summaries
- Data integrity checks
- Row count validation
- Clear flagging of imputed vs. original data
| Function | Purpose |
|---|---|
| clean_playground_data() | Complete pipeline in one call |
| fix_tag_replacement() | Fix tag replacements (run before pipeline) |
| map_ids() | Map raw device IDs to participant IDs |
| mark_time_periods() | Create Analyze and Bell columns |
| standardize_to_seconds() | Aggregate to one-second intervals |
| interpolate_gaps() | Two-phase gap interpolation |
The cleaned dataset includes these flags:
- id_code: Standardized participant ID
- Analyze: 1 if within analysis period, 0 otherwise
- Bell: 1 if within bell period, 0 otherwise (if specified)
- n_entries: Original number of signals in that second
- standardized: 1 if multiple signals were averaged, 0 otherwise
- imputed: 1 if row added via phase 1 interpolation
- imputed_large: 1 if row added via phase 2 interpolation

cleaned_data <- clean_playground_data(
data = raw_data,
id_mapping = "id_mapping.csv",
analyze_start = "2025-03-18 11:47:00",
analyze_end = "2025-03-18 11:57:00",
max_gap_small = 5, # Phase 1: ≤5 seconds
max_gap_large = 30, # Phase 2: ≤30 seconds max
max_position_change = 0.5 # Phase 2: ≤50cm movement
)

Tomas Bilevicius
CC BY 4.0 — you are free to use, share, and adapt this package for any purpose, including commercially, as long as you give appropriate credit to the author.