veesa is an R package for implementing the VEESA
pipeline for an explainable approach to training machine learning models
with functional data inputs. See a preprint manuscript describing the
approach on arXiv.
Install veesa from CRAN or install the development version from
GitHub using the commands below.
# CRAN
install.packages("veesa")
# Development version from GitHub
remotes::install_github("sandialabs/veesa")
Keep reading for an example using veesa to implement the
VEESA pipeline.
# Load R packages
library(cowplot)
library(dplyr)
library(ggplot2)
library(purrr)
library(randomForest)
library(tidyr)
library(veesa)
# Specify a color palette
color_pal = wesanderson::wes_palette("Zissou1", 5, type = "continuous")
# Specify colors for PC direction plots
col_plus1 = "#784D8C"
col_plus2 = "#A289AE"
col_minus1 = "#EA9B44"
col_minus2 = "#EBBC88"
col_pcdir_1sd = c(col_plus1, "black", col_minus1)
col_pcdir_2sd = c(col_plus2, col_plus1, "black", col_minus1, col_minus2)
Simulate data:
sim_data = simulate_functions(M = 100, N = 75, seed = 20211130)
Separate data into training/testing:
set.seed(20211130)
id = unique(sim_data$id)
M_test = length(id) * 0.25
id_test = sample(x = id, size = M_test, replace = FALSE)
sim_data = sim_data %>% mutate(data = ifelse(id %in% id_test, "test", "train"))
Simulated functions colored by covariates:
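The figure itself is not reproduced here. As a rough sketch of how such a plot might be drawn with ggplot2, the block below builds a small stand-in data frame and colors each function by a covariate. The data frame `toy_data` and its generating formula are hypothetical; only the column names `id`, `t`, `y`, and `x1` are taken from the code in this example.

```r
library(ggplot2)

# Hypothetical stand-in for sim_data: 10 functions on 75 time points,
# each with one covariate x1 that controls the peak height
set.seed(1)
toy_data <- do.call(rbind, lapply(1:10, function(i) {
  t <- seq(0, 1, length.out = 75)
  x1 <- runif(1)
  data.frame(id = i, t = t, x1 = x1, y = x1 * dnorm(t, mean = 0.5, sd = 0.15))
}))

# One line per function (group = id), colored by the covariate
p <- ggplot(toy_data, aes(x = t, y = y, group = id, color = x1)) +
  geom_line() +
  labs(x = "t", y = "y", color = "x1")
p
```

With the real data, replacing `toy_data` by `sim_data` and adding `scale_color_gradientn(colours = color_pal)` should apply the palette defined earlier.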

Prepare matrices from the data frames:
prep_matrix <- function(df, train_test) {
  df %>%
    filter(data == train_test) %>%
    select(id, t, y) %>%
    ungroup() %>%
    pivot_wider(id_cols = t,
                names_from = id,
                values_from = y) %>%
    select(-t) %>%
    as.matrix()
}
sim_train_matrix = prep_matrix(df = sim_data, train_test = "train")
sim_test_matrix = prep_matrix(df = sim_data, train_test = "test")
Create a vector of times:
times = sim_data$t %>% unique()
Prepare training data:
train_transformed_jfpca <-
  prep_training_data(
    f = sim_train_matrix,
    time = times,
    fpca_method = "jfpca",
    optim_method = "DPo"
  )
Prepare test data:
test_transformed_jfpca <-
  prep_testing_data(
    f = sim_test_matrix,
    time = times,
    train_prep = train_transformed_jfpca,
    optim_method = "DPo"
  )
Plot several PCs:

Compare jfPCA coefficients from train and test data:
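The comparison figure is not reproduced here. The idea behind it can be sketched with ordinary PCA from base R: principal components are estimated on the training curves only, the test curves are projected onto them, and the resulting scores are compared. This uses `prcomp()` rather than elastic jfPCA, and the simulated curves and the helper `make_curves()` are hypothetical, so it illustrates the concept rather than veesa's implementation.

```r
set.seed(1)
t_grid <- seq(0, 1, length.out = 50)

# Hypothetical curves: a bump with random height plus noise (one curve per row)
make_curves <- function(m) {
  t(sapply(rnorm(m, mean = 1, sd = 0.3), function(h) {
    h * dnorm(t_grid, mean = 0.5, sd = 0.1) + rnorm(length(t_grid), sd = 0.05)
  }))
}
train <- make_curves(75)
test <- make_curves(25)

# Estimate PCs on the training curves only
pca <- prcomp(train, center = TRUE)

# Training scores, and test scores obtained by projecting onto the training PCs
scores_train <- pca$x[, 1:3]
scores_test <- predict(pca, newdata = test)[, 1:3]

# Compare the distribution of PC1 scores between the two sets
boxplot(list(train = scores_train[, 1], test = scores_test[, 1]),
        ylab = "PC1 score")
```

If the test scores fall well outside the range of the training scores, a model trained on the training scores would be extrapolating on the test set.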

Create response variable:
x1_train <-
  sim_data %>% filter(data == "train") %>%
  select(id, x1) %>%
  distinct() %>%
  pull(x1)
Create data frame with PCs and response for random forest:
rf_jfpca_df <-
  train_transformed_jfpca$fpca_res$coef %>%
  data.frame() %>%
  rename_all(.funs = function(x) stringr::str_replace(x, "X", "pc")) %>%
  mutate(x1 = x1_train) %>%
  select(x1, everything())
Fit random forest:
set.seed(20211130)
rf_jfpca = randomForest(x1 ~ ., data = rf_jfpca_df)
Compute PFI:
set.seed(20211130)
pfi_jfpca <- compute_pfi(
  x = rf_jfpca_df %>% select(-x1),
  y = rf_jfpca_df$x1,
  f = rf_jfpca,
  K = 10,
  metric = "nmse"
)
PFI results (mean of reps):

PFI results (variability across reps):
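The PFI figures are not reproduced here. To make concrete what the mean and the variability across repetitions refer to, the following is a minimal base R sketch of permutation feature importance. It assumes that importance is the drop in a "larger is better" metric (here negative mean squared error) after one feature is permuted, repeated `K` times. The toy data, the linear model, and the helper `nmse()` are hypothetical, and this is not a reproduction of the internals of `compute_pfi()`.

```r
# Toy data: y depends strongly on x1, weakly on x2, and not at all on x3
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 3 * toy$x1 + 0.5 * toy$x2 + rnorm(n, sd = 0.1)

fit <- lm(y ~ ., data = toy)

# Negative mean squared error, so that larger values mean a better fit
nmse <- function(model, data) {
  -mean((data$y - predict(model, newdata = data))^2)
}

# For each feature, permute it K times and record the drop in the metric
K <- 10
features <- c("x1", "x2", "x3")
base_score <- nmse(fit, toy)
pfi_reps <- sapply(features, function(feat) {
  replicate(K, {
    permuted <- toy
    permuted[[feat]] <- sample(permuted[[feat]])
    base_score - nmse(fit, permuted)
  })
})

# pfi_reps is a K x 3 matrix with one column per feature
pfi_mean <- colMeans(pfi_reps)     # mean importance over repetitions
pfi_sd <- apply(pfi_reps, 2, sd)   # variability across repetitions
print(round(pfi_mean, 3))
```

A feature the model relies on heavily shows a large mean drop, while a feature unrelated to the response stays near zero.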

Identify the top PC (the PC with the largest PFI) for jfPCA:
top_pc_jfpca <-
  data.frame(pfi = pfi_jfpca$pfi) %>%
  mutate(pc = 1:n()) %>%
  arrange(desc(pfi)) %>%
  slice(1) %>%
  pull(pc)
Principal directions of the top PC for jfPCA:
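The principal direction figure is not reproduced here. As a rough illustration of what a principal direction plot shows, the sketch below uses ordinary PCA (`prcomp()` from base R, not elastic jfPCA) on hypothetical curves and draws the mean curve together with the curves obtained by moving one and two standard deviations along a chosen PC. All object names and the simulated curves are assumptions made for this illustration.

```r
set.seed(1)
t_grid <- seq(0, 1, length.out = 50)

# Hypothetical curves: 40 bumps with varying height (one curve per column)
heights <- rnorm(40, mean = 1, sd = 0.3)
curves <- sapply(heights, function(h) h * dnorm(t_grid, mean = 0.5, sd = 0.1))

# prcomp() expects observations in rows, so transpose to 40 x 50
pca <- prcomp(t(curves), center = TRUE, scale. = FALSE)

pc <- 1                            # PC to visualize
mean_curve <- pca$center           # mean function on the time grid
direction <- pca$rotation[, pc]    # PC loading function
sd_pc <- pca$sdev[pc]              # standard deviation of the PC scores

# Mean curve shifted by -2, -1, 0, +1, +2 standard deviations along the PC
steps <- c(-2, -1, 0, 1, 2)
pc_dirs <- sapply(steps, function(k) mean_curve + k * sd_pc * direction)
colnames(pc_dirs) <- paste0(steps, "SD")

matplot(t_grid, pc_dirs, type = "l", lty = 1,
        col = c("#EBBC88", "#EA9B44", "black", "#784D8C", "#A289AE"),
        xlab = "t", ylab = "y")
```

Reading such a plot amounts to asking how the curve changes as the PC score moves away from zero: here the first PC mainly changes the peak height.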
