| Title: | Classify Aquatic Animal Behaviours from Vertical Movement Data |
| Version: | 1.1.0 |
| Maintainer: | Calvin Beale <calvin.beale.8@gmail.com> |
| Description: | Quantitatively analyse depth time-series data from pop-up satellite archival tags (PSATs) through the application of continuous wavelet transformation (CWT) combined with Principal Component Analysis (PCA), and k-means clustering. Import, crop, and plot time-depth records (TDRs). Using CWT to detect important signals within the non-stationary data, we create daily wavelet statistics to summarise vertical movements on different wavelet periods and combine with daily and diel depth statistics. Classify depth time-series with unsupervised k-means clustering into 24-hour periods of vertical movement behaviour with distinct patterns of vertical movement. Plot example days from each behaviour cluster, and plot the TDR coloured by cluster. Based on principles of combining CWT with k-means first developed by Sakamoto (2009) <doi:10.1371/journal.pone.0005379> and redeveloped by Beale (2026) <doi:10.21203/rs.3.rs-6907076/v1>. |
| License: | GPL (≥ 3) |
| URL: | https://github.com/calvinsbeale/FishDiveR |
| BugReports: | https://github.com/calvinsbeale/FishDiveR/issues |
| Imports: | cluster, cowplot, data.table, dplyr, FactoMineR, geometry, ggplot2, gridExtra, lubridate, moments, patchwork, colorspace, rgl, Rfast, rlang, scales, suncalc, tidyr, WaveletComp |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| Depends: | R (≥ 3.5.0) |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-21 15:29:21 UTC; User |
| Author: | Calvin Beale |
| Repository: | CRAN |
| Date/Publication: | 2026-01-26 16:30:14 UTC |
FishDiveR: Classify Aquatic Animal Behaviours from Vertical Movement Data
Description
Quantitatively analyse depth time-series data from pop-up satellite archival tags (PSATs) through the application of continuous wavelet transformation (CWT) combined with Principal Component Analysis (PCA), and k-means clustering. Import, crop, and plot time-depth records (TDRs). Using CWT to detect important signals within the non-stationary data, we create daily wavelet statistics to summarise vertical movements on different wavelet periods and combine with daily and diel depth statistics. Classify depth time-series with unsupervised k-means clustering into 24-hour periods of vertical movement behaviour with distinct patterns of vertical movement. Plot example days from each behaviour cluster, and plot the TDR coloured by cluster. Based on principles of combining CWT with k-means first developed by Sakamoto (2009) doi:10.1371/journal.pone.0005379 and redeveloped by Beale (2026) doi:10.21203/rs.3.rs-6907076/v1.
Author(s)
Maintainer: Calvin Beale calvin.beale.8@gmail.com (ORCID) [copyright holder]
See Also
Useful links:
https://github.com/calvinsbeale/FishDiveR
Report bugs at https://github.com/calvinsbeale/FishDiveR/issues
Import depth statistics and combine with PC scores
Description
This function imports the depth statistics for each of the tags listed in tag_vector and combines them with the principal component scores. It outputs a single data frame, with the appropriate unique_tag_ID where necessary, ready for use in k-means clustering.
Usage
combine_data(
tag_vector = tag_list,
data_folder = NULL,
pc_scores = scores,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
tag_vector |
A character vector of tag IDs. E.g. c("123456", "456283", "AB98XJ"). |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
pc_scores |
Data frame of principal component scores extracted through PCA on wavelet statistics. Output of 'pca_scores()' function. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Value
A data frame containing the combined depth statistics and principal component scores from each of the tags listed in tag_vector
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load pc_results
pc_scores <- readRDS(file.path(filepath, "data/4_PCA/pc_scores.rds"))
# Run combine_data function
combined_stats <- combine_data(
tag_vector = "data",
data_folder = filepath,
pc_scores = pc_scores,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Create depth statistics
Description
create_depth_stats creates the various daily and diel depth statistics
for each day
Usage
create_depth_stats(
archive,
tag_ID,
diel = FALSE,
sunrise_time = NULL,
sunset_time = NULL,
GPS = FALSE,
sunset_type = "civil",
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
archive |
Data frame containing processed time series depth data |
tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456" |
diel |
Include diel statistics when TRUE |
sunrise_time |
Sunrise time (local time zone) in 24-hour clock. E.g. "05:45:00" |
sunset_time |
Sunset time (local time zone) in 24-hour clock. E.g. "18:30:00" |
GPS |
Either FALSE or the location of the GPS file containing columns 'date', 'lat' (latitude) and 'lon' (longitude) if one exists. The 'date' column must be in a format readable by lubridate::dmy() |
sunset_type |
Choose which type of sunset to include: 'NULL', 'civil', 'nautical', or 'astronomical' |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Value
A set of statistics calculated daily for the depth data. If diel is 'TRUE', additional diel statistics will be returned. An attribute 'diel' with value 'TRUE' is given when diel statistics are included.
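The idea of one row of depth statistics per day can be sketched in base R. This is only an illustration under assumptions: the data frame, the column names and the three statistics shown (mean, standard deviation, maximum) are hypothetical and are not the set that create_depth_stats() actually computes.

```r
# Hypothetical depth record: one reading every 6 hours over two days (UTC).
archive <- data.frame(
  date = as.POSIXct("2000-01-01 00:00:00", tz = "UTC") + (0:7) * 6 * 3600,
  depth = c(10, 50, 20, 5, 100, 150, 80, 30)
)
archive$date_only <- as.Date(archive$date, tz = "UTC")

# One row per day; the statistic names are illustrative only.
daily <- aggregate(
  depth ~ date_only,
  data = archive,
  FUN = function(x) c(mean = mean(x), sd = sd(x), max = max(x))
)

# aggregate() stores the three values in a matrix column; flatten it.
daily <- do.call(data.frame, daily)
print(daily)
```

Diel statistics would follow the same pattern with an additional day/night grouping variable derived from the sunrise and sunset times.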
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load archive_days
archive_days <- readRDS(file.path(filepath, "data/archive_days.rds"))
# Run create_depth_stats function
depthStats <- create_depth_stats(
archive = archive_days,
tag_ID = "data",
diel = TRUE,
sunrise_time = "06:00:00",
sunset_time = "18:00:00",
GPS = file.path(filepath, "data/GPS.csv"),
sunset_type = "civil",
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Create and plot the wavelet power spectrum
Description
create_wavelet creates a wavelet power spectrum using the WaveletComp package.
Optionally loads and plots an existing my.w object.
Usage
create_wavelet(
archive,
tag_ID,
wv_period_hours = 24,
sampling_frequency = NULL,
allow_irregular_sampling = FALSE,
load_existing_wavelet = FALSE,
suboctaves = 12,
lower_period_mins = 5,
upper_period_hours = 24,
pval = FALSE,
output = FALSE,
output_folder = NULL,
verbose = FALSE,
plot_wavelet = TRUE,
max_period_ticks = 10,
plot_width = 800,
plot_height = 400,
interactive_mode = TRUE
)
Arguments
archive |
Data frame containing processed time series depth data |
tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456" |
wv_period_hours |
Time resolution in hours to calculate wavelet. Currently only supports the default of 24 hours as this package is created to investigate daily diving behaviour. Defaults to 24. |
sampling_frequency |
Sampling frequency of depth data in seconds. Defaults to time between first and second depth record. Recommended to leave blank. |
allow_irregular_sampling |
Allows irregular sampling interval in the dataset. Not recommended. Defaults to FALSE. |
load_existing_wavelet |
Load an existing my.w wavelet object from the output_folder. Defaults to FALSE. |
suboctaves |
Number of suboctaves between each logarithmic period. E.g. between 24 and 12 hours. Highly recommended to use 12, for ease of interpretation of hours and signal present (daily, diel, tidal). |
lower_period_mins |
Lower period of the wavelet sampling in minutes. Cannot be less than sampling frequency. Defaults to 5 minutes. |
upper_period_hours |
Upper period of the wavelet sampling in hours. Defaults to 24 hours. |
pval |
Produce p-values or not. True or False. Default set to FALSE, see
|
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
plot_wavelet |
TRUE or FALSE. Plot the wavelet spectrum and mean power? |
max_period_ticks |
Number of ticks displayed on the period (y) axis in plots. |
plot_width |
Width of the wavelet spectrum plot output. Defaults to 800. |
plot_height |
Height of the wavelet spectrum plot output. Defaults to 400. |
interactive_mode |
Used for testing the package only. Defaults to TRUE. |
Details
Uses WaveletComp::analyze.wavelet() to create a univariate wavelet
power spectrum for the depth data imported, see
WaveletComp::analyze.wavelet() for more details. Plots mean wavelet power
using WaveletComp::wt.avg(). If you encounter errors allocating large vectors,
try loading library(bigmemory) and creating a big matrix with
big_mat <- big.matrix(nrow = 1e7, ncol = 10, type = "double"), then run
your code again. This allows a greater range between the lower and upper periods.
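To see how suboctaves, lower_period_mins and upper_period_hours interact, the logarithmic spacing of periods can be sketched in base R. This is a rough illustration, assuming each octave (doubling of the period) is split into `suboctaves` equal steps on a log2 scale; the exact grid that WaveletComp builds may differ at the edges.

```r
suboctaves <- 12
lower_period_hours <- 30 / 60   # corresponds to lower_period_mins = 30
upper_period_hours <- 24

# Number of octaves (doublings) between the lower and upper period.
n_octaves <- log2(upper_period_hours / lower_period_hours)

# Each step multiplies the period by 2^(1 / suboctaves).
steps <- seq(0, floor(n_octaves * suboctaves)) / suboctaves
periods_hours <- lower_period_hours * 2^steps

length(periods_hours)
range(periods_hours)
```

Widening the period range or increasing suboctaves adds rows to the wavelet output, which is where the memory pressure mentioned above tends to come from.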
Value
When output = TRUE, returns an object of class "analyze.wavelet" from package 'WaveletComp'. Additionally outputs a plot of the wavelet spectrum, and a plot of the mean power per period.
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load archive_days
archive_days <- readRDS(file.path(filepath, "data/archive_days.rds"))
# Run create_wavelet function
my.w <- create_wavelet(
archive = archive_days,
tag_ID = "data",
wv_period_hours = 24,
sampling_frequency = NULL,
allow_irregular_sampling = FALSE,
load_existing_wavelet = FALSE,
suboctaves = 12,
lower_period_mins = 30,
upper_period_hours = 24,
pval = FALSE,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE,
plot_wavelet = FALSE,
max_period_ticks = 10,
plot_width = 800,
plot_height = 400,
interactive_mode = FALSE
)
create_wavelet_stats
Description
create_wavelet_stats aggregates the wavelet variables over the specified
time periods
Usage
create_wavelet_stats(
wavelet,
tag_ID,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
wavelet |
An object of class "analyze.wavelet" from package 'WaveletComp' |
tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456" |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Value
A data frame containing the seven wavelet statistics for each period. One observation is available per period per day:
Amplitude_mean
Amplitude_variance
Mean_sq_power
Power_mean
Power_variance
Phase_mean
Phase_variance
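The daily aggregation behind statistics such as Power_mean and Power_variance can be sketched as follows. This is a minimal illustration on a made-up power matrix; the object layout (periods in rows, time steps in columns) and all names are assumptions, not the internals of create_wavelet_stats().

```r
set.seed(1)

# Hypothetical wavelet power matrix: 3 periods (rows) x 8 time steps (columns),
# with 4 time steps per day.
power <- matrix(runif(24), nrow = 3)
day <- rep(c("2000-01-01", "2000-01-02"), each = 4)

# For each period (row), summarise power within each day.
power_mean <- t(apply(power, 1, function(p) tapply(p, day, mean)))
power_variance <- t(apply(power, 1, function(p) tapply(p, day, var)))

dim(power_mean)  # one row per period, one column per day
```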
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load my.w wavelet object
my.w <- readRDS(file.path(filepath, "data/1_Wavelets/data_wavelet.rds"))
# Run create_wavelet_stats function on wavelet object
waveStats <- create_wavelet_stats(
wavelet = my.w,
tag_ID = "data",
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Load time-depth series data from csv file
Description
import_tag_data processes the time-series depth data of marine animal tags.
Data to import should be a csv file with a 'date_time' column and a depth
column. Data is cropped by deployment and release times.
Usage
import_tag_data(
tag_ID,
tag_deploy_UTC,
tag_release_UTC,
archive,
date_time_col = 1,
depth_col = 2,
temp_col = NA,
time_zone,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456" |
tag_deploy_UTC |
UTC deployment time in the allowed |
tag_release_UTC |
UTC release time in the allowed |
archive |
File path of the time-series depth archive. E.g. ("C:/Tag data/123456/123456-Archive.csv") |
date_time_col |
Column number of the date time series |
depth_col |
Column number of the depth series |
temp_col |
(Optional) Column number of temperature series |
time_zone |
Time zone of the data. E.g. "Asia/Tokyo" |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Details
Data are cropped to full days from midnight to midnight in local time based on
the time zone supplied. If output = TRUE, the cropped data are saved as
archive_days.rds within output_folder.
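Cropping to full local days can be sketched in base R. The example below is an assumption-laden illustration rather than the package's own code: it uses a hypothetical, regularly sampled hourly record and keeps only the local calendar dates that contain all 24 readings.

```r
# Hypothetical hourly record that starts and ends part-way through a local day.
utc_times <- seq(
  as.POSIXct("2000-01-01 06:00:00", tz = "UTC"),
  as.POSIXct("2000-01-04 05:00:00", tz = "UTC"),
  by = "1 hour"
)
time_zone <- "Asia/Tokyo"  # UTC+9

# Calendar date of each record in local time.
local_date <- format(utc_times, format = "%Y-%m-%d", tz = time_zone)

# Keep only local dates with a full 24 hourly records.
records_per_day <- table(local_date)
full_days <- names(records_per_day)[records_per_day == 24]
cropped <- utc_times[local_date %in% full_days]

full_days
length(cropped)
```

The partial first and last local days are dropped, so every remaining day runs from midnight to midnight in the supplied time zone.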
Value
A data frame of processed tag data. Columns kept are:
'date' a POSIXct date_time object in format "yyyy-mm-dd hh:mm:ss"
'depth' numerical depth data
'temp' numerical temperature data
'date_only' an as.Date version of the 'date' column
An attribute 'time_zone' is added to the data frame containing the time zone of the 'date'
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Run import_tag_data function on tag archive csv file
archive_days <- import_tag_data(
tag_ID = "data",
tag_deploy_UTC = "2000-01-01 00:00:00",
tag_release_UTC = "2000-01-11 23:59:00",
archive = file.path(filepath, "data/data-Archive.csv"),
date_time_col = 1,
depth_col = 2,
temp_col = NA,
time_zone = "Asia/Tokyo",
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Perform k-means
Description
k_clustering performs k-means clustering on the PC scores with the selected
value of k
Usage
k_clustering(
kmeans_data,
standardise = TRUE,
k,
nstart = 50,
polygon = FALSE,
output = TRUE,
output_folder = NULL,
verbose = FALSE
)
Arguments
kmeans_data |
Data frame containing the combined PC scores and depth statistics to perform k-means on. Output from the 'combine_data()' function. |
standardise |
TRUE or FALSE. Whether or not to standardise the data. Defaults to TRUE. |
k |
Numerical. Value of k to use for analysis. |
nstart |
Numerical. Value of nstart for k-means analysis. |
polygon |
TRUE or FALSE. Plot polygons for clusters with more than 3 data points. Defaults to FALSE. |
output |
TRUE or FALSE. Whether or not to output the results. Defaults to TRUE. |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Details
This function relies on random initialisation in k-means clustering.
For reproducible results, users may wish to set a random seed
prior to calling this function using set.seed().
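The effect of setting a seed can be shown with stats::kmeans() directly. This is a minimal sketch on simulated data, assuming nothing about the internals of k_clustering() beyond its use of randomly initialised k-means; the data and object names are hypothetical.

```r
# Hypothetical standardised data: 60 days x 3 variables in three loose groups.
set.seed(42)
x <- scale(rbind(
  matrix(rnorm(60, mean = 0), ncol = 3),
  matrix(rnorm(60, mean = 4), ncol = 3),
  matrix(rnorm(60, mean = 8), ncol = 3)
))

# Same seed, same random starts, same clustering.
set.seed(123)
fit_a <- kmeans(x, centers = 3, nstart = 50)
set.seed(123)
fit_b <- kmeans(x, centers = 3, nstart = 50)

identical(fit_a$cluster, fit_b$cluster)
```

Without the second set.seed() call the cluster labels may be permuted between runs, even when the grouping itself is the same.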
Value
An object of class 'kmeans' containing the k-means clustering data for the data frame. Additionally plots a 3D cluster plot of the top three Principal Components.
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load kmeans_data
kmeans_data <- readRDS(file.path(filepath, "data/5_k-means/combined_stats.rds"))
# Full example using the complete dataset.
# Set output to TRUE for real use!
kmeans_result <- k_clustering(
kmeans_data = kmeans_data,
standardise = TRUE,
k = 4,
nstart = 50,
polygon = FALSE,
output = FALSE,
output_folder = tempdir(),
verbose = TRUE
)
Prepare all data for Principal Component Analysis
Description
pca_data loads the wavelet statistics for each of the tags listed in
'tag_vector'. Performs various checks to ensure compatibility of wavelets,
and combines them into a data frame containing only the chosen statistics.
Usage
pca_data(
tag_vector,
data_folder = data_dir,
phase_mean = FALSE,
phase_variance = FALSE,
power_mean = TRUE,
power_variance = TRUE,
mean_sq_power = FALSE,
amplitude_mean = TRUE,
amplitude_variance = FALSE,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
tag_vector |
A character vector of tag IDs. E.g. c("123456", "456283", "AB98XJ"). |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
phase_mean |
TRUE or FALSE to include this wavelet statistic. Default FALSE |
phase_variance |
TRUE or FALSE to include this wavelet statistic. Default FALSE |
power_mean |
TRUE or FALSE to include this wavelet statistic. Default TRUE |
power_variance |
TRUE or FALSE to include this wavelet statistic. Default TRUE |
mean_sq_power |
TRUE or FALSE to include this wavelet statistic. Default FALSE |
amplitude_mean |
TRUE or FALSE to include this wavelet statistic. Default TRUE |
amplitude_variance |
TRUE or FALSE to include this wavelet statistic. Default FALSE |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Value
A data frame with the combined data for all tag IDs listed, containing the wavelet statistics to be used in Principal Component Analysis.
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Run pca_data function
pc_data <- pca_data(
tag_vector = c("data"),
data_folder = filepath,
phase_mean = FALSE,
phase_variance = FALSE,
power_mean = TRUE,
power_variance = TRUE,
mean_sq_power = FALSE,
amplitude_mean = TRUE,
amplitude_variance = FALSE,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Perform Principal Component Analysis
Description
pca_results performs Principal Component Analysis on the pc_data data frame
containing statistics from wavelet analysis
Usage
pca_results(
pc_data,
standardise = TRUE,
No_pcs = NULL,
PCV = NULL,
plot_eigenvalues = TRUE,
output = FALSE,
output_folder = NULL,
verbose = FALSE,
interactive_mode = TRUE
)
Arguments
pc_data |
Data frame containing the output of the pca_data() function. |
standardise |
TRUE or FALSE. Whether or not to standardise the data. Default TRUE. |
No_pcs |
Numerical. Number of principal components to retain. Null by default |
PCV |
Numerical. Percentage of cumulative variance to retain. Null by default |
plot_eigenvalues |
TRUE or FALSE. Plot PC eigenvalues and general loadings. Default TRUE. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
interactive_mode |
TRUE or FALSE. Used for testing the package. Default TRUE. |
Value
A PCA object from 'FactoMineR' package containing the output of the Principal Component Analysis.
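The difference between retaining a fixed number of components (No_pcs) and retaining a percentage of cumulative variance (PCV) can be sketched with base R. This illustration uses stats::prcomp() on simulated data as a stand-in; it is an assumption about what the two arguments mean, and it does not reproduce the FactoMineR object that pca_results() returns.

```r
# Hypothetical input: 50 observations of 6 numeric variables.
set.seed(1)
pc_data <- as.data.frame(matrix(rnorm(300), ncol = 6))

# Standardised PCA with base R.
pca <- prcomp(pc_data, center = TRUE, scale. = TRUE)

# Cumulative percentage of variance explained by successive components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2) * 100

# Smallest number of components reaching a target such as PCV = 75.
PCV <- 75
n_keep <- which(cum_var >= PCV)[1]
scores <- pca$x[, seq_len(n_keep), drop = FALSE]

round(cum_var, 1)
n_keep
```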
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load pc_data
pc_data <- readRDS(file.path(filepath, "data/4_PCA/pc_data.rds"))
# Run a minimal, fast pca_results example
pc_results <- pca_results(
pc_data = pc_data,
standardise = TRUE,
No_pcs = 1,
PCV = NULL,
plot_eigenvalues = FALSE,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE,
interactive_mode = FALSE
)
# Full example using the complete dataset
# Run pca_results function
pc_results <- pca_results(
pc_data = pc_data,
standardise = TRUE,
No_pcs = 3,
PCV = NULL,
plot_eigenvalues = TRUE,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE,
interactive_mode = FALSE
)
Calculate Principal Component Analysis Scores not including depth statistics
Description
This function extracts the PCA scores from the PCA results and plots the
loadings. This function is to be used on output from the pca_data() function
not including depth statistics.
Usage
pca_scores(
pc_results = results,
plot_loadings = TRUE,
every_nth = 12,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
pc_results |
PCA class object containing the output from the 'pca_results()' function. |
plot_loadings |
TRUE or FALSE. Plot PC loadings figures. Default TRUE. |
every_nth |
Numeric. Sequence of labels to show on mean power plot. Default is 12. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Value
A data frame of pc scores containing one column for each Principal Component kept. If processing just one tag, the attribute 'unique_tag_ID' is given to the data frame with the tag_ID. Plots the PC loadings for each row of pc_data
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load pc_results
pc_results <- readRDS(file.path(filepath, "data/4_PCA/pc_results.rds"))
# Run pca_scores function
pc_scores <- pca_scores(
pc_results = pc_results,
plot_loadings = FALSE,
every_nth = 12,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Plot the time-series depth dataset
Description
This function plots the time-series depth data from the imported tag.
Usage
plot_TDR(
rds_file,
data_folder = NULL,
every_nth = 20,
every_s = 0,
plot_size = c(12, 6),
X_lim = NULL,
Y_lim = c(0, 1500, 100),
date_breaks = "14 day",
dpi = 300,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
rds_file |
Character vector file path of rds file. E.g. ("E:/data/archive_days.rds") |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
every_nth |
Numerical. Optional down-sampling of data points to plot. Defaults to 20, plotting every 20th record. |
every_s |
Numerical. Alternative to every_nth. Optional down-sampling of data points to plot by number of seconds, as opposed to records. E.g. plots every 60th second, rather than 10th row of data. Must be a multiple of the sampling frequency. Overrides every_nth if != 0. |
plot_size |
'ggsave()' height and width for saving the output plot. Must be numeric, positive and 2 elements long. Defaults to 'c(12,6)' |
X_lim |
Optional. Vector with two dates delimiting the time-depth record to plot. E.g. c("2000-01-01", "2000-11-23") |
Y_lim |
Numeric vector with minimum depth, maximum depth, and sequence for ticks on Y-axis. Must be numeric, positive and 3 elements long. E.g. c(0,1500,100). |
date_breaks |
X-axis ggplot2 date breaks. E.g. "24 hour", "3 day", "2 week". |
dpi |
Numerical. DPI to use for 'ggsave()' output. E.g. 600 |
output |
Logical. If TRUE, a plot file is saved to |
output_folder |
Output folder path used when |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Value
A data frame of plot data
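The two down-sampling options, every_nth and every_s, can be compared with a short base R sketch. The sampling interval and the depth vector are hypothetical, and the row selection shown is one plausible reading of the argument descriptions rather than the package's own implementation.

```r
# Hypothetical record sampled every 5 seconds.
sampling_interval_s <- 5
depth <- seq_len(1000)

# every_nth: keep every 10th row.
every_nth <- 10
by_rows <- depth[seq(1, length(depth), by = every_nth)]

# every_s: keep one row per 60 seconds, i.e. every (60 / 5) = 12th row.
every_s <- 60
stopifnot(every_s %% sampling_interval_s == 0)  # must be a multiple
by_seconds <- depth[seq(1, length(depth), by = every_s / sampling_interval_s)]

length(by_rows)
length(by_seconds)
```

Specifying seconds keeps plots comparable across tags that were recorded at different sampling frequencies.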
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Run plot_TDR function
TDR_plot <- plot_TDR(
rds_file = "data/archive_days.rds",
data_folder = filepath,
every_nth = 10,
every_s = 0,
plot_size = c(12, 6),
X_lim = NULL,
Y_lim = c(0, 300, 50),
date_breaks = "24 hour",
dpi = 100,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Plot the time-series depth records of the selected tag. Colour days by cluster
Description
plot_cluster_TDR plots the time-series depth record of the selected
archival tag. Each day of data is coloured by the assigned cluster, this
helps to visualise changes in vertical movement behaviour over time.
Usage
plot_cluster_TDR(
tag_ID,
data_folder = NULL,
kmeans_result,
every_nth = 10,
every_s = 0,
X_lim = NULL,
Y_lim = c(0, 250, 50),
date_breaks = "14 day",
legend = TRUE,
plot_size = c(12, 6),
dpi = 300,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456". |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
kmeans_result |
An object of class 'kmeans' containing the k-means clustering data. Output of 'k_clustering()' function. |
every_nth |
Numerical. Optional down-sampling of data points to plot. Defaults to 10, plotting every 10th record. |
every_s |
Numerical. Alternative to every_nth. Optional down-sampling of data points to plot by number of seconds, as opposed to records. E.g. plots every 60th second, rather than 10th row of data. Must be a multiple of the sampling frequency. Overrides every_nth if != 0. |
X_lim |
Optional. Vector with two dates delimiting the time-depth record to plot. E.g. c("2000-01-01", "2000-11-23") |
Y_lim |
Numeric vector with minimum depth, maximum depth, and sequence for ticks on Y-axis. Must be numeric, positive and 3 elements long. E.g. c(0,1500,100). |
date_breaks |
X-axis ggplot2 date breaks. E.g. "24 hour", "3 day", "2 week". |
legend |
TRUE or FALSE. Whether or not to plot the figure legend. Defaults to TRUE. |
plot_size |
'ggsave()' height and width for saving the output plot. Must be numeric, positive and 2 elements long. Defaults to 'c(12,6)' |
dpi |
Numerical. DPI to use for 'ggsave()' output. E.g. 600 |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Value
Returns the cluster TDR plot. Additionally saves the TDR plot to file and outputs a facet plot of all tag_IDs.
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load kmeans_result
kmeans_result <- readRDS(file.path(filepath, "data/5_k-means/kmeans_result.rds"))
# Run plot_cluster_TDR function
plot_cluster_TDR(
tag_ID = "data",
data_folder = filepath,
kmeans_result = kmeans_result,
every_nth = 10,
every_s = 0,
X_lim = NULL,
Y_lim = c(0, 300, 50),
date_breaks = "1 day",
legend = TRUE,
plot_size = c(12, 6),
dpi = 100,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Plot the time-series depth records of the days closest to the centre of each cluster
Description
plot_clusters plots the time-depth records of the days closest to the
centre of each of the clusters. Each cluster is plotted both individually,
and faceted together, with both a fixed y-axis and a free y-axis (depth).
Usage
plot_clusters(
tag_vector = tag_list,
data_folder = NULL,
kmeans_result,
No_days = 1,
every_nth = 10,
every_s = 0,
Y_lim = c(0, 250, 50),
color = TRUE,
diel_shade = FALSE,
dpi = 300,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
tag_vector |
A character vector of tag IDs. E.g. c("123456", "456283", "AB98XJ"). |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
kmeans_result |
An object of class 'kmeans' containing the k-means clustering data. Output of 'k_clustering()' function. |
No_days |
Numerical. Number of days of each cluster to plot. Defaults to 1. |
every_nth |
Numerical. Optional down-sampling of data points to plot. Defaults to 10, plotting every 10th record. |
every_s |
Numerical. Alternative to every_nth. Optional down-sampling of data points to plot by number of seconds, as opposed to records. E.g. plots every 60th second, rather than 10th row of data. Must be a multiple of the sampling frequency. Overrides every_nth if != 0. |
Y_lim |
Numeric vector with minimum depth, maximum depth, and sequence for ticks on Y-axis. Must be numeric, positive and 3 elements long. E.g. c(0,1500,100). |
color |
TRUE or FALSE. Output clusters coloured by cluster assignment. Defaults to TRUE. |
diel_shade |
TRUE or FALSE. Output plot with night-time shading. Can be slow! Defaults to FALSE. |
dpi |
Numerical. DPI to use for 'ggsave()' output. E.g. 600 |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Value
A plot list of all plots created of each cluster in the data. When output == TRUE this prints to file one figure for each Cluster with a fixed y-axis. Additionally outputs a facet plot of all clusters, and a free y-axis version of all plots.
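Finding the days closest to the centre of each cluster can be sketched with stats::kmeans() on simulated data. This is an illustration under assumptions: Euclidean distance in the clustering space is used, and the day labels and object names are hypothetical. It may not match how plot_clusters() ranks days internally.

```r
set.seed(7)

# Hypothetical clustering input: 40 days x 2 variables in two groups.
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2)
)
rownames(x) <- paste0("day_", seq_len(nrow(x)))
fit <- kmeans(x, centers = 2, nstart = 10)

# Euclidean distance from every day to the centre of its own cluster.
dist_to_centre <- sqrt(rowSums((x - fit$centers[fit$cluster, ])^2))
names(dist_to_centre) <- rownames(x)

# For each cluster, the name of the closest day (the No_days = 1 case).
closest_day <- tapply(
  dist_to_centre, fit$cluster,
  function(d) names(d)[which.min(d)]
)
closest_day
```

For No_days greater than 1, the same distances could be sorted within each cluster and the first few days taken.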
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load kmeans_result
kmeans_result <- readRDS(file.path(filepath, "data/5_k-means/kmeans_result.rds"))
# Run plot_clusters function
plot_clusters(
tag_vector = "data",
data_folder = filepath,
kmeans_result = kmeans_result,
No_days = 1,
every_nth = 10,
every_s = 0,
Y_lim = c(0, 300, 50),
color = TRUE,
diel_shade = FALSE,
dpi = 100,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)
Perform k selection
Description
select_k creates the elbow plot and silhouette width plot for assistance
with selection of k
Usage
select_k(
kmeans_data,
standardise = TRUE,
Max.k = 15,
v_line = NULL,
calc_gap = FALSE,
plot_gap = FALSE,
output = FALSE,
output_folder = NULL,
verbose = FALSE
)
Arguments
kmeans_data |
Data frame containing the combined PC scores and depth statistics to perform k-means on. Output from the 'combine_data()' function. |
standardise |
TRUE or FALSE. Whether or not to standardise the data. Defaults to TRUE. |
Max.k |
Numerical. Maximum value of k to try. Defaults to 15. |
v_line |
Numerical. Option to add a vertical line to plot at a specific value of k. Defaults to NULL. |
calc_gap |
TRUE or FALSE. Whether or not to calculate the gap statistic. Defaults to FALSE |
plot_gap |
TRUE or FALSE. Whether or not to plot the gap statistic. Defaults to FALSE. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Details
This function relies on random initialisation in k-means clustering.
For reproducible results, users may wish to set a random seed
prior to calling this function using set.seed().
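The elbow part of the k selection can be sketched with stats::kmeans() alone. This is a minimal illustration on simulated data with hypothetical object names; it computes only the total within-cluster sum of squares and leaves out the silhouette width and gap statistic that select_k() can also report.

```r
set.seed(42)

# Hypothetical standardised data: 60 days x 3 variables in three groups.
x <- scale(rbind(
  matrix(rnorm(60, mean = 0), ncol = 3),
  matrix(rnorm(60, mean = 5), ncol = 3),
  matrix(rnorm(60, mean = 10), ncol = 3)
))
max_k <- 8

# Total within-cluster sum of squares for k = 1..max_k.
wss <- sapply(seq_len(max_k), function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

data.frame(k = seq_len(max_k), wss = round(wss, 2))
```

The elbow is the value of k after which the within-cluster sum of squares stops dropping sharply.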
Value
A 'ggplot' class object. Also creates a figure containing both the within-cluster sum of squares (elbow) plot and the average silhouette width plot for 1 to 'Max.k' clusters.
Examples
# Set file path
filepath <- system.file("extdata", package = "FishDiveR")
# Load kmeans_data
kmeans_data <- readRDS(file.path(filepath, "data/5_k-means/combined_stats.rds"))
# Run select_k function
selecting_k <- select_k(
kmeans_data = kmeans_data,
standardise = TRUE,
Max.k = 8,
v_line = 4,
calc_gap = FALSE,
plot_gap = FALSE,
output = TRUE,
output_folder = tempdir(),
verbose = TRUE
)