| Title: | Classify Aquatic Animal Behaviours from Vertical Movement Data |
|---|---|
| Description: | Quantitatively analyse depth time-series data from pop-up satellite archival tags (PSATs) through the application of continuous wavelet transformation (CWT) combined with Principal Component Analysis (PCA) and k-means clustering. Import, crop, and plot time-depth records (TDRs). Using CWT to detect important signals within the non-stationary data, we create daily wavelet statistics to summarise vertical movements on different wavelet periods and combine them with daily and diel depth statistics. Classify depth time-series with unsupervised k-means clustering into 24-hour periods with distinct patterns of vertical movement behaviour. Plot example days from each behaviour cluster, and plot the TDR coloured by cluster. Based on principles of combining CWT with k-means first developed by Sakamoto (2009) <doi:10.1371/journal.pone.0005379> and redeveloped by Beale (2026) <doi:10.21203/rs.3.rs-6907076/v1>. |
| Authors: | Calvin Beale [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-3505-8929>) |
| Maintainer: | Calvin Beale <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.0 |
| Built: | 2026-05-19 08:52:14 UTC |
| Source: | https://github.com/calvinsbeale/fishdiver |
This function imports the depth statistics for each tag listed in tag_vector and combines them with the principal component scores. It outputs a single data frame, with the appropriate unique_tag_ID if necessary, ready for use in k-means clustering.

    combine_data(
      tag_vector = tag_list, data_folder = NULL, pc_scores = scores,
      output = FALSE, output_folder = NULL, verbose = FALSE
    )

tag_vector |
A character vector of tag IDs. E.g. c("123456", "456283", "AB98XJ"). |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
pc_scores |
Data frame of principal component scores extracted through PCA on wavelet statistics. Output of 'pca_scores()' function. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
A data frame containing the combined depth statistics and principal component scores from each of the tags listed in tag_vector

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load pc_scores
    pc_scores <- readRDS(file.path(filepath, "data/4_PCA/pc_scores.rds"))

    # Run combine_data function
    combined_stats <- combine_data(
      tag_vector = "data", data_folder = filepath, pc_scores = pc_scores,
      output = TRUE, output_folder = tempdir(), verbose = TRUE
    )

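
The combination described for combine_data() amounts to joining one row of depth statistics per day with one row of principal component scores per day. A minimal base R sketch of that idea follows. The column names (`date_only`, `mean_depth`, `max_depth`, `PC1`, `PC2`), the `tag_ID` label, and all values are hypothetical; this is an illustration, not the package's implementation.

```r
# Toy daily depth statistics for one tag (hypothetical columns and values)
depth_stats <- data.frame(
  date_only  = as.Date("2000-01-01") + 0:2,
  mean_depth = c(35.2, 80.1, 41.7),
  max_depth  = c(120, 260, 150)
)

# Toy principal component scores for the same days (hypothetical values)
pc_scores <- data.frame(
  date_only = as.Date("2000-01-01") + 0:2,
  PC1 = c(-1.2, 2.3, -1.1),
  PC2 = c(0.4, -0.1, -0.3)
)

# Join on the shared day column so each row describes one 24-hour period
combined <- merge(pc_scores, depth_stats, by = "date_only")

# Record which tag the rows came from (assumed label, for illustration only)
combined$tag_ID <- "123456"
```

Keeping a tag label on every row is one way to keep days from several tags distinguishable once they are stacked into a single data frame.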
create_depth_stats creates the various daily and diel depth statistics
for each day

    create_depth_stats(
      archive, tag_ID, diel = FALSE, sunrise_time = NULL, sunset_time = NULL,
      GPS = FALSE, sunset_type = "civil", output = FALSE, output_folder = NULL,
      verbose = FALSE
    )

archive |
Data frame containing processed time series depth data |
tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456" |
diel |
Include diel statistics when TRUE |
sunrise_time |
Sunrise time (local time zone) in 24-hour clock. E.g. "05:45:00" |
sunset_time |
Sunset time (local time zone) in 24-hour clock. E.g. "18:30:00" |
GPS |
Either FALSE or the location of the GPS file containing columns 'date', 'lat' (latitude) and 'lon' (longitude) if one exists. The 'date' column must be in a format readable by lubridate::dmy(). |
sunset_type |
Choose which type of sunset to include: 'NULL', 'civil', 'nautical', or 'astronomical'. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
A set of statistics calculated daily for the depth data. If diel is 'TRUE', additional diel statistics will be returned. An attribute 'diel' with value 'TRUE' is given when diel statistics are included.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load archive_days
    archive_days <- readRDS(file.path(filepath, "data/archive_days.rds"))

    # Run create_depth_stats function
    depthStats <- create_depth_stats(
      archive = archive_days, tag_ID = "data", diel = TRUE,
      sunrise_time = "06:00:00", sunset_time = "18:00:00",
      GPS = file.path(filepath, "data/GPS.csv"), sunset_type = "civil",
      output = TRUE, output_folder = tempdir(), verbose = TRUE
    )

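
To make the distinction between daily and diel statistics concrete, here is a small base R sketch. It assumes simulated 10-minute depth records, fixed sunrise and sunset times of 06:00 and 18:00, and only three summary statistics. The statistics actually computed by create_depth_stats() are not listed in this manual, so those used here are stand-ins.

```r
# Simulate three days of depth records at 10-minute intervals (toy data)
set.seed(1)
times <- seq(
  from = as.POSIXct("2000-01-01 00:00:00", tz = "UTC"),
  by = "10 min",
  length.out = 3 * 144
)
archive <- data.frame(
  date  = times,
  depth = abs(rnorm(length(times), mean = 50, sd = 20))
)
archive$date_only <- as.Date(archive$date, tz = "UTC")

# Daily statistics: one row per day
daily <- aggregate(
  depth ~ date_only,
  data = archive,
  FUN = function(x) c(mean = mean(x), sd = sd(x), max = max(x))
)

# Diel split using fixed sunrise and sunset times (assumed 06:00 and 18:00)
clock <- format(archive$date, "%H:%M:%S", tz = "UTC")
archive$phase <- ifelse(clock >= "06:00:00" & clock < "18:00:00", "day", "night")

# Mean depth per day and diel phase: two rows per day
diel <- aggregate(depth ~ date_only + phase, data = archive, FUN = mean)
```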
create_wavelet creates a wavelet spectrum using the WaveletComp package.
Optionally loads and plots an existing my.w object.

    create_wavelet(
      archive, tag_ID, wv_period_hours = 24, sampling_frequency = NULL,
      load_existing_wavelet = FALSE, suboctaves = 12, lower_period_mins = 5,
      upper_period_hours = 24, pval = FALSE, output = FALSE, output_folder = NULL,
      verbose = FALSE, plot_wavelet = TRUE, max_period_ticks = 10,
      plot_width = 800, plot_height = 400, interactive_mode = TRUE
    )

archive |
Data frame containing processed time series depth data |
tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456" |
wv_period_hours |
Time resolution in hours to calculate wavelet. Currently only supports the default of 24 hours as this package is created to investigate daily diving behaviour. Defaults to 24. |
sampling_frequency |
Sampling frequency of depth data in seconds. Defaults to time between first and second depth record. Recommended to leave blank. |
load_existing_wavelet |
Load an existing my.w wavelet object from the output_folder. Defaults to FALSE. |
suboctaves |
Number of suboctaves between each logarithmic period. E.g. between 24 and 12 hours. Highly recommended to use 12, for ease of interpretation of hours and signals present (daily, diel, tidal). |
lower_period_mins |
Lower period of the wavelet sampling in minutes. Cannot be less than sampling frequency. Defaults to 5 minutes. |
upper_period_hours |
Upper period of the wavelet sampling in hours. Defaults to 24 hours. |
pval |
Produce p-values or not. True or False. Default set to FALSE, see
|
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
plot_wavelet |
TRUE or FALSE. Plot the wavelet spectrum and mean power? |
max_period_ticks |
Number of ticks displayed on the period (y) axis in plots. |
plot_width |
Width of the wavelet spectrum plot output. Defaults to 800. |
plot_height |
Height of the wavelet spectrum plot output. Defaults to 400. |
interactive_mode |
Used for testing the package only. Defaults to TRUE. |
Uses WaveletComp::analyze.wavelet() to create a univariate wavelet
power spectrum for the depth data imported, see
WaveletComp::analyze.wavelet() for more details. Plots mean wavelet power
using WaveletComp::wt.avg(). If you have errors allocating large vectors
try using library(bigmemory) and create a big matrix with
big_mat <- big.matrix(nrow = 1e7, ncol = 10, type = "double") then run
your code again. This allows a greater range between lower and upper periods.
When output = TRUE, returns an object of class "analyze.wavelet" from package 'WaveletComp'. Additionally outputs a plot of the wavelet spectrum, and a plot of the mean power per period.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load archive_days
    archive_days <- readRDS(file.path(filepath, "data/archive_days.rds"))

    # Run create_wavelet function
    my.w <- create_wavelet(
      archive = archive_days, tag_ID = "data", wv_period_hours = 24,
      sampling_frequency = NULL, load_existing_wavelet = FALSE, suboctaves = 12,
      lower_period_mins = 30, upper_period_hours = 24, pval = FALSE,
      output = TRUE, output_folder = tempdir(), verbose = TRUE,
      plot_wavelet = FALSE, max_period_ticks = 10, plot_width = 800,
      plot_height = 400, interactive_mode = FALSE
    )

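
The recommendation of 12 suboctaves is easier to see by writing out the period grid. The sketch below builds a log2-spaced grid from the default lower and upper periods. It is illustrative only: in WaveletComp::analyze.wavelet() the frequency resolution is set by `dj`, with 1/dj suboctaves per octave, and it is an assumption here that create_wavelet() passes `suboctaves` through in that way.

```r
# Period grid on a log2 scale, in hours (illustrative, not the package's code)
lower_period_mins  <- 5
upper_period_hours <- 24
suboctaves         <- 12

lower_h <- lower_period_mins / 60

# Step down from the upper period in steps of 1/suboctaves of an octave,
# stopping before the period falls below the lower limit
n_steps   <- floor(log2(upper_period_hours / lower_h) * suboctaves)
periods_h <- upper_period_hours / 2^((0:n_steps) / suboctaves)

# Every 12th entry is a whole octave: 24, 12, 6, 3, 1.5, ... hours
octaves_h <- periods_h[seq(1, length(periods_h), by = suboctaves)]
```

With 12 suboctaves the whole-octave periods fall on round fractions of a day, which is what makes daily (24 h) and diel (12 h) signals straightforward to locate on the period axis.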
create_wavelet_stats aggregates the wavelet variables over the specified
time periods

    create_wavelet_stats(
      wavelet, tag_ID, output = FALSE, output_folder = NULL, verbose = FALSE
    )

wavelet |
An object of class "analyze.wavelet" from package 'WaveletComp' |
tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456" |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
A data frame containing the seven wavelet statistics for each period. One observation is available per period per day:
Amplitude_mean
Amplitude_variance
Mean_sq_power
Power_mean
Power_variance
Phase_mean
Phase_variance

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load my.w wavelet object
    my.w <- readRDS(file.path(filepath, "data/1_Wavelets/data_wavelet.rds"))

    # Run create_wavelet_stats function on wavelet object
    waveStats <- create_wavelet_stats(
      wavelet = my.w, tag_ID = "data", output = TRUE,
      output_folder = tempdir(), verbose = TRUE
    )

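
"One observation per period per day" can be pictured as collapsing the time axis of a period-by-time matrix into days. The base R sketch below does this for a simulated power matrix. The matrix, the hourly sampling, and the choice of mean and variance are assumptions for illustration, and the code does not read a real "analyze.wavelet" object.

```r
# Toy wavelet power matrix: rows are periods, columns are time steps
set.seed(42)
n_periods     <- 4
steps_per_day <- 24   # assumed hourly sampling
n_days        <- 3
power <- matrix(
  rexp(n_periods * steps_per_day * n_days),
  nrow = n_periods
)

# Day index for every column of the matrix
day <- rep(seq_len(n_days), each = steps_per_day)

# Summarise each period within each day; results are periods x days
power_mean     <- t(apply(power, 1, function(p) tapply(p, day, mean)))
power_variance <- t(apply(power, 1, function(p) tapply(p, day, var)))
```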
import_tag_data processes the time-series depth data of marine animal tags.
Data to import should be a csv file with a 'date_time' column and a depth
column. Data is cropped by deployment and release times.

    import_tag_data(
      tag_ID, tag_deploy_UTC, tag_release_UTC, archive, date_time_col = 1,
      depth_col = 2, temp_col = NA, time_zone, output = FALSE,
      output_folder = NULL, verbose = FALSE
    )

tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456" |
tag_deploy_UTC |
UTC deployment time in the allowed |
tag_release_UTC |
UTC release time in the allowed |
archive |
File path of the time-series depth archive. E.g. ("C:/Tag data/123456/123456-Archive.csv") |
date_time_col |
Column number of the date time series |
depth_col |
Column number of the depth series |
temp_col |
(Optional) Column number of temperature series |
time_zone |
Time zone of the data. E.g. "Asia/Tokyo" |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Data are cropped to full days from midnight to midnight in local time based on
the time zone supplied. If output = TRUE, the cropped data are saved as
archive_days.rds within output_folder.
A data frame of processed tag data. Columns kept are:
'date' a POSIXct date_time object in format "yyyy-mm-dd hh:mm:ss"
'depth' numerical depth data
'temp' numerical temperature data
'date_only' an as.Date version of the 'date' column
An attribute 'time_zone' is added to the data frame containing the time zone of the 'date' column.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Run import_tag_data function on tag archive csv file
    archive_days <- import_tag_data(
      tag_ID = "data", tag_deploy_UTC = "2000-01-01 00:00:00",
      tag_release_UTC = "2000-01-11 23:59:00",
      archive = file.path(filepath, "data/data-Archive.csv"),
      date_time_col = 1, depth_col = 2, temp_col = NA,
      time_zone = "Asia/Tokyo", output = TRUE, output_folder = tempdir(),
      verbose = TRUE
    )

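
Cropping to full local days, as described for import_tag_data(), depends on converting UTC timestamps to the supplied time zone before assigning records to days. A minimal base R sketch of that step follows, using a simulated hourly record and a simple rule that keeps only days with a complete set of 24 records. Both the data and the rule are assumptions, and the deployment and release cropping is not shown.

```r
# Toy record: hourly depths in UTC, starting mid-day (hypothetical data)
times_utc <- seq(
  from = as.POSIXct("2000-01-01 10:00:00", tz = "UTC"),
  to   = as.POSIXct("2000-01-04 08:00:00", tz = "UTC"),
  by   = "1 hour"
)
archive <- data.frame(date = times_utc, depth = seq_along(times_utc))

# Express the timestamps in the local time zone of the tag
time_zone <- "Asia/Tokyo"
attr(archive$date, "tzone") <- time_zone
archive$date_only <- as.Date(archive$date, tz = time_zone)

# Keep only local days that contain all 24 hourly records
records_per_day <- table(archive$date_only)
full_days <- as.Date(names(records_per_day)[records_per_day == 24])
archive_days <- archive[archive$date_only %in% full_days, ]
```

In this toy record the first and last local days are partial, so only the two complete days in between are kept.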
k_clustering performs k-means clustering on the PC scores with the selected
value of k

    k_clustering(
      kmeans_data, standardise = TRUE, k, nstart = 50, polygon = FALSE,
      output = TRUE, output_folder = NULL, verbose = FALSE
    )

kmeans_data |
Data frame containing the combined PC scores and depth statistics to perform k-means on. Output from the 'combine_data()' function. |
standardise |
TRUE or FALSE. Whether or not to standardise the data. Defaults to TRUE. |
k |
Numerical. Value of k to use for analysis. |
nstart |
Numerical. Value of nstart for k-means analysis. |
polygon |
TRUE or FALSE. Plot polygons for clusters with more than 3 data points. Defaults to FALSE. |
output |
TRUE or FALSE. Whether or not to output the results. Defaults to TRUE. |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
This function relies on random initialisation in k-means clustering.
For reproducible results, users may wish to set a random seed
prior to calling this function using set.seed().
An object of class 'kmeans' containing the k-means clustering data for the data frame. Additionally plots a 3D cluster plot of the top three Principal Components.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load kmeans_data
    kmeans_data <- readRDS(file.path(filepath, "data/5_k-means/combined_stats.rds"))

    # Full example using the complete dataset.
    # Set output to TRUE for real use!
    kmeans_result <- k_clustering(
      kmeans_data = kmeans_data, standardise = TRUE, k = 4, nstart = 50,
      polygon = FALSE, output = FALSE, output_folder = tempdir(), verbose = TRUE
    )

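
The note on random initialisation can be demonstrated with stats::kmeans() directly. The sketch below uses simulated data with hypothetical column names and shows that setting the same seed before each call gives identical cluster assignments. It calls base R rather than k_clustering() itself.

```r
# Toy input: 40 "days" described by three hypothetical variables
set.seed(123)
kmeans_data <- data.frame(
  PC1 = c(rnorm(20, -2), rnorm(20, 2)),
  PC2 = c(rnorm(20, 0), rnorm(20, 3)),
  mean_depth = c(rnorm(20, 30, 5), rnorm(20, 120, 15))
)

# Standardise so that variables on different scales contribute equally
scaled <- scale(kmeans_data)

# Fix the seed immediately before clustering for repeatable assignments
set.seed(2024)
fit_a <- kmeans(scaled, centers = 2, nstart = 50)

set.seed(2024)
fit_b <- kmeans(scaled, centers = 2, nstart = 50)

# Same seed, same data, same call: the cluster labels match exactly
same_labels <- identical(fit_a$cluster, fit_b$cluster)
```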
pca_data loads the wavelet statistics for each of the tags listed in
'tag_vector'. Performs various checks to ensure compatibility of wavelets,
and combines them into a data frame containing only the chosen statistics.

    pca_data(
      tag_vector, data_folder = data_dir, phase_mean = FALSE,
      phase_variance = FALSE, power_mean = TRUE, power_variance = TRUE,
      mean_sq_power = FALSE, amplitude_mean = TRUE, amplitude_variance = FALSE,
      output = FALSE, output_folder = NULL, verbose = FALSE
    )

tag_vector |
A character vector of tag IDs. E.g. c("123456", "456283", "AB98XJ"). |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
phase_mean |
TRUE or FALSE to include this wavelet statistic. Default FALSE |
phase_variance |
TRUE or FALSE to include this wavelet statistic. Default FALSE |
power_mean |
TRUE or FALSE to include this wavelet statistic. Default TRUE |
power_variance |
TRUE or FALSE to include this wavelet statistic. Default TRUE |
mean_sq_power |
TRUE or FALSE to include this wavelet statistic. Default FALSE |
amplitude_mean |
TRUE or FALSE to include this wavelet statistic. Default TRUE |
amplitude_variance |
TRUE or FALSE to include this wavelet statistic. Default FALSE |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
A data frame with the combined data for all tag IDs listed, containing the wavelet statistics to be used in Principal Component Analysis.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Run pca_data function
    pc_data <- pca_data(
      tag_vector = c("data"), data_folder = filepath, phase_mean = FALSE,
      phase_variance = FALSE, power_mean = TRUE, power_variance = TRUE,
      mean_sq_power = FALSE, amplitude_mean = TRUE, amplitude_variance = FALSE,
      output = TRUE, output_folder = tempdir(), verbose = TRUE
    )

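
The seven TRUE/FALSE arguments of pca_data() act as switches selecting which wavelet statistics reach the PCA. One way to picture that selection in base R is sketched below. The column naming scheme (`<statistic>_<period>`) and the toy values are hypothetical, and the package's internal layout may differ.

```r
# Flags mirroring the documented defaults of pca_data()
include <- c(
  phase_mean = FALSE, phase_variance = FALSE,
  power_mean = TRUE, power_variance = TRUE,
  mean_sq_power = FALSE,
  amplitude_mean = TRUE, amplitude_variance = FALSE
)

# Toy wavelet statistics: one column per statistic and period (assumed names)
periods   <- c("24h", "12h")
col_names <- as.vector(outer(names(include), periods, paste, sep = "_"))
wave_stats <- as.data.frame(
  matrix(runif(3 * length(col_names)), nrow = 3,
         dimnames = list(NULL, col_names))
)

# Keep only columns whose statistic is switched on
chosen  <- names(include)[include]
keep    <- sub("_[^_]+$", "", names(wave_stats)) %in% chosen
pc_data <- wave_stats[, keep, drop = FALSE]
```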
pca_results performs Principal Component Analysis on the pc_data data frame
containing statistics from wavelet analysis

    pca_results(
      pc_data, standardise = TRUE, No_pcs = NULL, PCV = NULL,
      plot_eigenvalues = TRUE, output = FALSE, output_folder = NULL,
      verbose = FALSE, interactive_mode = TRUE
    )

pc_data |
Data frame containing the output of the pca_data() function. |
standardise |
TRUE or FALSE. Whether or not to standardise the data. Default TRUE. |
No_pcs |
Numerical. Number of principal components to retain. Null by default |
PCV |
Numerical. Percentage of cumulative variance to retain. Null by default |
plot_eigenvalues |
TRUE or FALSE. Plot PC eigenvalues and general loadings. Default TRUE. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
interactive_mode |
TRUE or FALSE. Used for testing the package. Default TRUE. |
A PCA object from 'FactoMineR' package containing the output of the Principal Component Analysis.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load pc_data
    pc_data <- readRDS(file.path(filepath, "data/4_PCA/pc_data.rds"))

    # Run a minimal, fast pca_results example
    pc_results <- pca_results(
      pc_data = pc_data, standardise = TRUE, No_pcs = 1, PCV = NULL,
      plot_eigenvalues = FALSE, output = TRUE, output_folder = tempdir(),
      verbose = TRUE, interactive_mode = FALSE
    )

    # Full example using the complete dataset
    # Run pca_results function
    pc_results <- pca_results(
      pc_data = pc_data, standardise = TRUE, No_pcs = 3, PCV = NULL,
      plot_eigenvalues = TRUE, output = TRUE, output_folder = tempdir(),
      verbose = TRUE, interactive_mode = FALSE
    )

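
The `No_pcs` and `PCV` arguments describe two ways of deciding how many components to keep: a fixed count, or a target percentage of cumulative variance. The sketch below illustrates the second rule on simulated data. It uses stats::prcomp() as a base R stand-in for the FactoMineR PCA that the package returns, and the 90 percent threshold is an arbitrary example value.

```r
# Toy data: 30 rows, 6 variables driven by two underlying signals
set.seed(7)
base <- matrix(rnorm(30 * 2), ncol = 2)
pc_data <- cbind(
  base[, 1] + rnorm(30, sd = 0.1),
  base[, 1] + rnorm(30, sd = 0.1),
  base[, 1] + rnorm(30, sd = 0.1),
  base[, 2] + rnorm(30, sd = 0.1),
  base[, 2] + rnorm(30, sd = 0.1),
  base[, 2] + rnorm(30, sd = 0.1)
)

# PCA on standardised variables (base R stand-in for FactoMineR::PCA)
pca <- prcomp(pc_data, center = TRUE, scale. = TRUE)

# Cumulative percentage of variance explained by successive components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2) * 100

# Smallest number of components reaching the target percentage (assumed 90)
PCV    <- 90
n_keep <- which(cum_var >= PCV)[1]
scores <- pca$x[, seq_len(n_keep), drop = FALSE]
```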
This function extracts the PCA scores from the PCA results and plots the
loadings. This function is to be used on output from the pca_data() function,
not including depth statistics.

    pca_scores(
      pc_results = results, plot_loadings = TRUE, every_nth = 12,
      output = FALSE, output_folder = NULL, verbose = FALSE
    )

pc_results |
PCA class object containing the output from the 'pca_results()' function. |
plot_loadings |
TRUE or FALSE. Plot PC loadings figures. Default TRUE. |
every_nth |
Numeric. Sequence of labels to show on mean power plot. Default is 12. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
A data frame of PC scores containing one column for each Principal Component kept. If processing just one tag, the attribute 'unique_tag_ID' is given to the data frame with the tag_ID. Plots the PC loadings for each row of pc_data.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load pc_results
    pc_results <- readRDS(file.path(filepath, "data/4_PCA/pc_results.rds"))

    # Run pca_scores function
    pc_scores <- pca_scores(
      pc_results = pc_results, plot_loadings = FALSE, every_nth = 12,
      output = TRUE, output_folder = tempdir(), verbose = TRUE
    )

plot_cluster_TDR plots the time-series depth record of the selected
archival tag. Each day of data is coloured by the assigned cluster, this
helps to visualise changes in vertical movement behaviour over time.

    plot_cluster_TDR(
      tag_ID, data_folder = NULL, kmeans_result, every_nth = 10, every_s = 0,
      X_lim = NULL, Y_lim = c(0, 250, 50), date_breaks = "14 day",
      legend = TRUE, plot_size = c(12, 6), dpi = 300, output = FALSE,
      output_folder = NULL, verbose = FALSE
    )

tag_ID |
Unique tag identification number in a vector of characters. E.g. "123456". |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
kmeans_result |
An object of class 'kmeans' containing the k-means clustering data. Output of 'k_clustering()' function. |
every_nth |
Numerical. Optional down-sampling of data points to plot. Defaults to 10, plotting every 10th record. |
every_s |
Numerical. Alternative to every_nth. Optional down-sampling of data points to plot by number of seconds, as opposed to records. E.g. plots every 60th second, rather than 10th row of data. Must be a multiple of the sampling frequency. Overrides every_nth if != 0. |
X_lim |
Optional. Vector with two dates delimiting the time-depth record to plot. E.g. c("2000-01-01", "2000-11-23") |
Y_lim |
Numeric vector with minimum depth, maximum depth, and sequence for ticks on Y-axis. Must be numeric, positive and 3 elements long. E.g. c(0,1500,100). |
date_breaks |
X-axis ggplot2 date breaks. E.g. "24 hour", "3 day", "2 week". |
legend |
TRUE or FALSE. Whether or not to plot the figure legend. Defaults to TRUE. |
plot_size |
'ggsave()' height and width for saving the output plot. Must be numeric, positive and 2 elements long. Defaults to 'c(12,6)'. |
dpi |
Numerical. DPI to use for 'ggsave()' output. E.g. 600. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
Returns the cluster TDR plot and prints it to file. Additionally outputs a facet plot of all tag_IDs.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load kmeans_result
    kmeans_result <- readRDS(file.path(filepath, "data/5_k-means/kmeans_result.rds"))

    # Run plot_cluster_TDR function
    plot_cluster_TDR(
      tag_ID = "data", data_folder = filepath, kmeans_result = kmeans_result,
      every_nth = 10, every_s = 0, X_lim = NULL, Y_lim = c(0, 300, 50),
      date_breaks = "1 day", legend = TRUE, plot_size = c(12, 6), dpi = 100,
      output = TRUE, output_folder = tempdir(), verbose = TRUE
    )

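
The relationship between `every_nth` and `every_s` comes down to converting a number of seconds into a number of records. The sketch below shows that conversion for a simulated record sampled every 10 seconds. The sampling interval and data are assumptions, and the code illustrates the documented behaviour rather than reproducing the package's code.

```r
# Toy record sampled every 10 seconds (hypothetical sampling interval)
sampling_interval <- 10
archive <- data.frame(
  date  = as.POSIXct("2000-01-01", tz = "UTC") +
    seq(0, 3590, by = sampling_interval),
  depth = seq_len(360)
)

every_nth <- 10
every_s   <- 60

# every_s takes priority when non-zero and must be a multiple of the interval
if (every_s != 0) {
  stopifnot(every_s %% sampling_interval == 0)
  step <- every_s / sampling_interval
} else {
  step <- every_nth
}

# Keep every `step`-th row for plotting
plot_data <- archive[seq(1, nrow(archive), by = step), ]
```

Here `every_s = 60` on 10-second data keeps every 6th record, so the plotted points are one minute apart regardless of the value of `every_nth`.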
plot_clusters plots the time-depth records of the days closest to the
centre of each of the clusters. Each cluster is plotted both individually,
and faceted together, with both a fixed y-axis and a free y-axis (depth).

    plot_clusters(
      tag_vector = tag_list, data_folder = NULL, kmeans_result, No_days = 1,
      every_nth = 10, every_s = 0, Y_lim = c(0, 250, 50), color = TRUE,
      diel_shade = FALSE, dpi = 300, output = FALSE, output_folder = NULL,
      verbose = FALSE
    )

tag_vector |
A character vector of tag IDs. E.g. c("123456", "456283", "AB98XJ"). |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
kmeans_result |
An object of class 'kmeans' containing the k-means clustering data. Output of 'k_clustering()' function. |
No_days |
Numerical. Number of days of each cluster to plot. Defaults to 1. |
every_nth |
Numerical. Optional down-sampling of data points to plot. Defaults to 10, plotting every 10th record. |
every_s |
Numerical. Alternative to every_nth. Optional down-sampling of data points to plot by number of seconds, as opposed to records. E.g. plots every 60th second, rather than 10th row of data. Must be a multiple of the sampling frequency. Overrides every_nth if != 0. |
Y_lim |
Numeric vector with minimum depth, maximum depth, and sequence for ticks on Y-axis. Must be numeric, positive and 3 elements long. E.g. c(0,1500,100). |
color |
TRUE or FALSE. Output clusters coloured by cluster assignment. Defaults to TRUE. |
diel_shade |
TRUE or FALSE. Output plot with night-time shading. Can be slow! Defaults to FALSE. |
dpi |
Numerical. DPI to use for 'ggsave()' output. E.g. 600. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
A plot list of all plots created of each cluster in the data. When output == TRUE this prints to file one figure for each Cluster with a fixed y-axis. Additionally outputs a facet plot of all clusters, and a free y-axis version of all plots.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load kmeans_result
    kmeans_result <- readRDS(file.path(filepath, "data/5_k-means/kmeans_result.rds"))

    # Run plot_clusters function
    plot_clusters(
      tag_vector = "data", data_folder = filepath, kmeans_result = kmeans_result,
      No_days = 1, every_nth = 10, every_s = 0, Y_lim = c(0, 300, 50),
      color = TRUE, diel_shade = FALSE, dpi = 100, output = TRUE,
      output_folder = tempdir(), verbose = TRUE
    )

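
Selecting "the days closest to the centre of each cluster" can be sketched with a 'kmeans' object and a distance calculation. The manual does not state how closeness is measured, so Euclidean distance in the clustered space is assumed here, and the data are simulated.

```r
# Toy clustered data: one row per day, named by date (hypothetical values)
set.seed(11)
day_data <- rbind(
  matrix(rnorm(20, mean = -2), ncol = 2),
  matrix(rnorm(20, mean = 2), ncol = 2)
)
rownames(day_data) <- as.character(as.Date("2000-01-01") + 0:19)

set.seed(11)
fit <- kmeans(day_data, centers = 2, nstart = 25)

# Euclidean distance from each day to the centre of its own cluster (assumed metric)
dist_to_centre <- sqrt(rowSums((day_data - fit$centers[fit$cluster, ])^2))

# For each cluster, the No_days days nearest the centre
No_days <- 1
closest <- lapply(split(dist_to_centre, fit$cluster), function(d) {
  names(sort(d))[seq_len(No_days)]
})
```

`closest` is a list with one entry per cluster, each holding the dates of the most representative days, which could then be looked up in the time-depth record for plotting.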
This function plots the time-series depth data from the imported tag.

    plot_TDR(
      rds_file, data_folder = NULL, every_nth = 20, every_s = 0,
      plot_size = c(12, 6), X_lim = NULL, Y_lim = c(0, 1500, 100),
      date_breaks = "14 day", dpi = 300, output = FALSE, output_folder = NULL,
      verbose = FALSE
    )

rds_file |
Character vector file path of rds file. E.g. ("E:/data/archive_days.rds") |
data_folder |
Parent folder path with separate folders for each tag data. E.g. "C:/Tag data". Defaults to 'data_dir' |
every_nth |
Numerical. Optional down-sampling of data points to plot. Defaults to 20, plotting every 20th record. |
every_s |
Numerical. Alternative to every_nth. Optional down-sampling of data points to plot by number of seconds, as opposed to records. E.g. plots every 60th second, rather than 10th row of data. Must be a multiple of the sampling frequency. Overrides every_nth if != 0. |
plot_size |
'ggsave()' height and width for saving the output plot. Must be numeric, positive and 2 elements long. Defaults to 'c(12,6)'. |
X_lim |
Optional. Vector with two dates delimiting the time-depth record to plot. E.g. c("2000-01-01", "2000-11-23") |
Y_lim |
Numeric vector with minimum depth, maximum depth, and sequence for ticks on Y-axis. Must be numeric, positive and 3 elements long. E.g. c(0,1500,100). |
date_breaks |
X-axis ggplot2 date breaks. E.g. "24 hour", "3 day", "2 week". |
dpi |
Numerical. DPI to use for 'ggsave()' output. E.g. 600. |
output |
Logical. If TRUE, a plot file is saved to |
output_folder |
Output folder path used when |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
A data frame of plot data

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Run plot_TDR function
    TDR_plot <- plot_TDR(
      rds_file = "data/archive_days.rds", data_folder = filepath,
      every_nth = 10, every_s = 0, plot_size = c(12, 6), X_lim = NULL,
      Y_lim = c(0, 300, 50), date_breaks = "24 hour", dpi = 100,
      output = TRUE, output_folder = tempdir(), verbose = TRUE
    )

select_k creates the elbow plot and silhouette width plot for assistance
with selection of k

    select_k(
      kmeans_data, standardise = TRUE, Max.k = 15, v_line = NULL,
      calc_gap = FALSE, plot_gap = FALSE, output = FALSE, output_folder = NULL,
      verbose = FALSE
    )

kmeans_data |
Data frame containing the combined PC scores and depth statistics to perform k-means on. Output from the 'combine_data()' function. |
standardise |
TRUE or FALSE. Whether or not to standardise the data. Defaults to TRUE. |
Max.k |
Numerical. Maximum value of k to try. Defaults to 15. |
v_line |
Numerical. Option to add a vertical line to plot at a specific value of k. Defaults to NULL. |
calc_gap |
TRUE or FALSE. Whether or not to calculate the gap statistic. Defaults to FALSE |
plot_gap |
TRUE or FALSE. Whether or not to plot the gap statistic. Defaults to FALSE. |
output |
Logical. If TRUE, output is saved to |
output_folder |
Output folder path. If |
verbose |
Logical. If TRUE, progress messages are shown. Defaults to FALSE. |
This function relies on random initialisation in k-means clustering.
For reproducible results, users may wish to set a random seed
prior to calling this function using set.seed().
Returns a 'ggplot' class object and creates a figure containing both the within-cluster sum of squares plot (elbow) and the average silhouette width plot for 1 to 'Max.k' clusters.

    # Set file path
    filepath <- system.file("extdata", package = "FishDiveR")

    # Load kmeans_data
    kmeans_data <- readRDS(file.path(filepath, "data/5_k-means/combined_stats.rds"))

    # Run select_k function
    selecting_k <- select_k(
      kmeans_data = kmeans_data, standardise = TRUE, Max.k = 8, v_line = 4,
      calc_gap = FALSE, plot_gap = FALSE, output = TRUE,
      output_folder = tempdir(), verbose = TRUE
    )

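
The elbow half of the figure described above is built from the total within-cluster sum of squares at each candidate k. A minimal base R sketch of that calculation on simulated data follows. It uses stats::kmeans() directly and leaves out the silhouette and gap statistics, which need additional packages. The three-group toy data and the `nstart` value are assumptions.

```r
# Toy data with three well-separated groups (hypothetical values)
set.seed(99)
kmeans_data <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 5, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 10, sd = 0.5), ncol = 2)
)
scaled <- scale(kmeans_data)

# Total within-cluster sum of squares for k = 1 .. Max.k
Max.k <- 8
set.seed(99)
wss <- vapply(
  seq_len(Max.k),
  function(k) kmeans(scaled, centers = k, nstart = 25)$tot.withinss,
  numeric(1)
)

# Reduction gained by each extra cluster; the elbow is where this flattens
wss_drop <- -diff(wss)
```

Plotting `wss` against `seq_len(Max.k)` gives the elbow curve. With three well-separated groups the large reductions end at k = 3.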