veesa is an R package for implementing the VEESA pipeline, an explainable approach to training machine learning models with functional data inputs. A preprint manuscript describing the approach is available on arXiv.
veesa can be installed using either of the commands below.
# CRAN
install.packages("veesa")
# Development version from GitHub
remotes::install_github("sandialabs/veesa")
Keep reading for an example that uses veesa to implement the VEESA pipeline.
# Load R packages
library(cowplot)
library(dplyr)
library(ggplot2)
library(purrr)
library(randomForest)
library(tidyr)
library(veesa)
# Specify a color palette
color_pal = wesanderson::wes_palette("Zissou1", 5, type = "continuous")
# Specify colors for PC direction plots
col_plus1 = "#784D8C"
col_plus2 = "#A289AE"
col_minus1 = "#EA9B44"
col_minus2 = "#EBBC88"
col_pcdir_1sd = c(col_plus1, "black", col_minus1)
col_pcdir_2sd = c(col_plus2, col_plus1, "black", col_minus1, col_minus2)
Simulate data:
sim_data = simulate_functions(M = 100, N = 75, seed = 20211130)
Separate data into training/testing:
set.seed(20211130)
id = unique(sim_data$id)
M_test = length(id) * 0.25
id_test = sample(x = id, size = M_test, replace = FALSE)
sim_data = sim_data %>% mutate(data = ifelse(id %in% id_test, "test", "train"))
Simulated functions colored by covariates:
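The figure for this step is not reproduced here. As a rough stand-in, the sketch below draws hypothetical toy curves in base R and colors each curve by a covariate value. The toy data, the `toy_*` names, and the hand-picked gradient (chosen only to resemble the Zissou1 palette used above) are all assumptions.

```r
# Hypothetical toy data: 10 curves on a shared grid, each with one covariate value
toy_t <- seq(0, 1, length.out = 75)
toy_x1 <- seq(-1, 1, length.out = 10)
# Each column is one curve whose amplitude depends on its covariate
toy_curves <- sapply(toy_x1, function(x) x * sin(2 * pi * toy_t))

# Map covariate values onto a continuous color gradient (one color per curve)
toy_pal <- colorRampPalette(c("#3B9AB2", "#EBCC2A", "#F21A00"))(100)
toy_col <- toy_pal[cut(toy_x1, breaks = 100, labels = FALSE)]

matplot(toy_t, toy_curves, type = "l", lty = 1, col = toy_col,
        xlab = "t", ylab = "y", main = "Toy functions colored by x1")
```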
Prepare matrices from the data frames:
# Convert the long data frame (one row per id and time point) into a matrix
# with one row per time point and one column per function
prep_matrix <- function(df, train_test) {
  df %>%
    filter(data == train_test) %>%
    select(id, t, y) %>%
    ungroup() %>%
    pivot_wider(id_cols = t,
                names_from = id,
                values_from = y) %>%
    select(-t) %>%
    as.matrix()
}
sim_train_matrix = prep_matrix(df = sim_data, train_test = "train")
sim_test_matrix = prep_matrix(df = sim_data, train_test = "test")
Create a vector of times:
times = sim_data$t %>% unique()
Prepare train data:
train_transformed_jfpca <-
  prep_training_data(
    f = sim_train_matrix,
    time = times,
    fpca_method = "jfpca",
    optim_method = "DPo"
  )
Prepare test data:
test_transformed_jfpca <-
  prep_testing_data(
    f = sim_test_matrix,
    time = times,
    train_prep = train_transformed_jfpca,
    optim_method = "DPo"
  )
Plot several PCs:
Compare jfPCA coefficients from train and test data:
Create response variable:
x1_train <-
  sim_data %>%
  filter(data == "train") %>%
  select(id, x1) %>%
  distinct() %>%
  pull(x1)
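The model below pairs row i of the PC coefficients with element i of x1_train, so the ID order kept by distinct() has to match the column order produced by pivot_wider() in prep_matrix(). Both follow the order of first appearance in the filtered data, so they should agree. A base-R sketch of that check on hypothetical toy data (all `toy_*` names are illustrative):

```r
# Hypothetical long data: ids appear in a non-sorted order, one covariate value per id
toy_long <- data.frame(
  id = rep(c(7, 2, 5), each = 3),
  t  = rep(1:3, times = 3),
  y  = rnorm(9),
  x1 = rep(c(0.7, 0.2, 0.5), each = 3)
)

# Column order a pivot by id produces (order of first appearance)
toy_col_ids <- unique(toy_long$id)

# Response built like x1_train: one value per id, in order of first appearance
toy_first <- !duplicated(toy_long$id)
toy_resp_ids <- toy_long$id[toy_first]
toy_x1 <- toy_long$x1[toy_first]

# The response is aligned with the matrix columns only if the id orders match
stopifnot(identical(toy_col_ids, toy_resp_ids))
toy_x1
```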
Create data frame with PCs and response for random forest:
rf_jfpca_df <-
  train_transformed_jfpca$fpca_res$coef %>%
  data.frame() %>%
  rename_all(.funs = function(x) stringr::str_replace(x, "X", "pc")) %>%
  mutate(x1 = x1_train) %>%
  select(x1, everything())
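Calling data.frame() on a matrix without column names yields columns X1, X2, ..., which the rename_all() step turns into pc1, pc2, .... A base-R equivalent on a hypothetical 4 x 3 toy matrix, assuming (as the code above implies) that the coefficient matrix has one row per function and no column names:

```r
# Hypothetical coefficient matrix: 4 functions (rows) by 3 PCs (columns), no column names
toy_coef <- matrix(round(rnorm(12), 2), nrow = 4)
toy_x1 <- c(0.1, 0.4, 0.2, 0.9)

toy_df <- data.frame(toy_coef)                   # columns become X1, X2, X3
names(toy_df) <- sub("X", "pc", names(toy_df))   # pc1, pc2, pc3, like str_replace()
toy_df <- cbind(x1 = toy_x1, toy_df)             # response first, like select(x1, everything())

names(toy_df)
```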
Fit random forest:
set.seed(20211130)
rf_jfpca = randomForest(x1 ~ ., data = rf_jfpca_df)
Compute PFI:
set.seed(20211130)
pfi_jfpca <- compute_pfi(
  x = rf_jfpca_df %>% select(-x1),
  y = rf_jfpca_df$x1,
  f = rf_jfpca,
  K = 10,
  metric = "nmse"
)
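compute_pfi() is called with K = 10 repetitions and the "nmse" metric. As a rough illustration of permutation feature importance in general (not veesa's implementation), the sketch below permutes one predictor at a time, re-predicts, and records the increase in normalized MSE relative to the unpermuted data, averaged over K repetitions. The NMSE definition (MSE divided by the variance of y), the linear model standing in for the random forest, and the helpers `nmse_sketch()` and `pfi_sketch()` are all assumptions.

```r
# Assumed definition: normalized MSE = MSE / variance of the observed response
nmse_sketch <- function(y, yhat) mean((y - yhat)^2) / var(y)

# Hypothetical permutation feature importance: mean increase in NMSE over K permutations
pfi_sketch <- function(x, y, model, K = 10) {
  base <- nmse_sketch(y, predict(model, newdata = x))
  reps <- sapply(names(x), function(v) {
    replicate(K, {
      x_perm <- x
      x_perm[[v]] <- sample(x_perm[[v]])  # break the link between v and y
      nmse_sketch(y, predict(model, newdata = x_perm)) - base
    })
  })
  list(pfi = colMeans(reps), reps = reps)  # reps: K rows by one column per predictor
}

# Toy data where only pc1 carries signal
set.seed(20211130)
toy_x <- data.frame(pc1 = rnorm(200), pc2 = rnorm(200))
toy_y <- 3 * toy_x$pc1 + rnorm(200, sd = 0.5)
toy_fit <- lm(toy_y ~ pc1 + pc2, data = cbind(toy_y = toy_y, toy_x))

toy_pfi <- pfi_sketch(toy_x, toy_y, toy_fit, K = 10)
toy_pfi$pfi
```

Permuting a predictor the model relies on inflates the error, while permuting an irrelevant one leaves it nearly unchanged, which is why pc1 should dominate here.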
PFI results (mean of reps):
PFI results (variability across reps):
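Both PFI summaries can be read as column-wise statistics of a matrix of per-repetition importances (K rows, one column per PC). A sketch with a hypothetical 3 x 2 toy matrix; whether compute_pfi() exposes such a matrix, and under what name, is not shown here and is left as an assumption:

```r
# Hypothetical per-repetition PFI values: 3 repetitions (rows) by 2 PCs (columns)
toy_reps <- matrix(c(1.0, 1.2, 1.4,
                     0.0, 0.1, -0.1),
                   nrow = 3, dimnames = list(NULL, c("pc1", "pc2")))

toy_mean <- colMeans(toy_reps)    # "mean of reps" summary
toy_sd <- apply(toy_reps, 2, sd)  # "variability across reps" summary
rbind(mean = toy_mean, sd = toy_sd)
```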
Identify the top PC (the one with the largest mean PFI) for jfPCA:
top_pc_jfpca <-
  data.frame(pfi = pfi_jfpca$pfi) %>%
  mutate(pc = 1:n()) %>%
  arrange(desc(pfi)) %>%
  slice(1) %>%
  pull(pc)
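The dplyr chain above returns the index of the PC with the largest PFI; base R's which.max() gives the same index in a single call. A sketch with hypothetical toy values:

```r
# Hypothetical mean PFI values for 5 PCs
toy_pfi <- c(0.02, 0.85, 0.10, 0.40, -0.01)

# Index of the PC with the largest importance, like the arrange()/slice(1) chain
toy_top_pc <- which.max(toy_pfi)
toy_top_pc
```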
Principal directions of the top jfPCA PC:
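In standard (non-elastic) fPCA, the principal direction of PC k is commonly drawn as the mean function plus and minus multiples of that PC's standard deviation times its eigenfunction; the 1 SD and 2 SD color vectors defined earlier match that layout (+, mean, -). jfPCA builds its directions jointly from aligned functions and warping functions, so the sketch below, which applies plain prcomp() to hypothetical toy curves, only illustrates the general idea and is not veesa's computation:

```r
# Hypothetical toy functions: 50 curves (rows) on a grid of 30 time points (columns)
set.seed(20211130)
toy_t <- seq(0, 1, length.out = 30)
toy_f <- t(sapply(rnorm(50), function(a) a * sin(2 * pi * toy_t) + cos(2 * pi * toy_t)))

toy_pca <- prcomp(toy_f, center = TRUE)
k <- 1                                # PC to visualize
toy_mean <- colMeans(toy_f)
toy_sd <- toy_pca$sdev[k]
toy_phi <- toy_pca$rotation[, k]      # eigenfunction evaluated on the grid

# Directions at +2, +1, 0, -1, -2 SD (the same order as col_pcdir_2sd)
toy_dirs <- sapply(c(2, 1, 0, -1, -2), function(m) toy_mean + m * toy_sd * toy_phi)

# Same hex values as col_plus2, col_plus1, black, col_minus1, col_minus2
matplot(toy_t, toy_dirs, type = "l", lty = 1,
        col = c("#A289AE", "#784D8C", "black", "#EA9B44", "#EBBC88"),
        xlab = "t", ylab = "f(t)")
```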