Once a dimensionality-reduction model (PCA, PLS, CCA, …) is fitted, every new sample can be projected into the low-dimensional latent space. Running a classifier there – instead of on thousands of noisy raw variables – typically yields faster, more robust, and easier-to-interpret models.
The classifier() S3 family supplied by multivarious provides that glue: hand it any projector (or multiblock_biprojector, discriminant_projector, …) plus class labels, and it returns a ready-to-use predictor object.
discriminant_projector → k-NN

data(iris)
X <- as.matrix(iris[, 1:4])
grp <- iris$Species
# Fit classical Linear DA and wrap it
if (!requireNamespace("MASS", quietly = TRUE)) {
  stop("MASS package required for LDA example")
}
# 1. Define and fit the pre-processing step using the training data
preproc_fitted <- fit(center(), X)
# 2. Transform the data
Xp <- transform(preproc_fitted, X)
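Assuming the centring step behaves as intended, a quick base-R sanity check (not part of the original code) confirms that the transformed columns now have zero mean:

# Column means of the centred data should be numerically zero
round(colMeans(Xp), 10)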
# Assuming discriminant_projector, prep, center, scores are available
lda_fit <- MASS::lda(X, grouping = grp)
disc_proj <- multivarious::discriminant_projector(
  v       = lda_fit$scaling,          # loadings (p × d)
  s       = Xp %*% lda_fit$scaling,   # scores (n × d)
  sdev    = lda_fit$svd,              # singular values
  labels  = grp,
  preproc = preproc_fitted            # Pass the fitted pre-processor
)
print(disc_proj)
#> Projector object:
#> Input dimension: 4
#> Output dimension: 2
#> With pre-processing:
#> A finalized pre-processing pipeline:
#> Step 1 : center
#> Label counts:
#> setosa versicolor virginica
#> 50 50 50

library(tibble)   # as_tibble()
library(dplyr)    # mutate()
library(ggplot2)  # plotting

scores_df <- as_tibble(scores(disc_proj)[, 1:2],
                       .name_repair = ~ c("LD1", "LD2")) |>
  mutate(Species = iris$Species)
ggplot(scores_df, aes(LD1, LD2, colour = Species)) +
  geom_point(size = 2, alpha = .7) +
  stat_ellipse(level = .9, linewidth = .3) +
  theme_minimal() +
  ggtitle("Iris – first two LDA components")

set.seed(42)
train_id <- sample(seq_len(nrow(X)), size = 0.7*nrow(X))
test_id <- setdiff(seq_len(nrow(X)), train_id)
# Assuming classifier function is available
clf_knn <- multivarious::classifier(
  x        = disc_proj,
  labels   = grp[train_id],
  new_data = X[train_id, ],  # Use training data to get reference scores
  knn      = 3
)
print(clf_knn)
#> k-NN Classifier object:
#> k-NN Neighbors (k): 3
#> Number of Training Samples: 150
#> Number of Classes: 3
#> Underlying Projector Details:
#> Projector object:
#> Input dimension: 4
#> Output dimension: 2
#> With pre-processing:
#> A finalized pre-processing pipeline:
#> Step 1 : center
#> Label counts:
#> setosa versicolor virginica
#> 50 50 50

pred_knn <- predict(clf_knn, new_data = X[test_id, ],
                    metric = "cosine", prob_type = "knn_proportion")
head(pred_knn$prob, 3)
#> setosa versicolor virginica
#> [1,] 1 0 0
#> [2,] 1 0 0
#> [3,] 1 0 0
print(paste("Overall Accuracy:", mean(pred_knn$class == grp[test_id])))
#> [1] "Overall Accuracy: 0.888888888888889"
# Assuming rank_score and topk are available
rk <- rank_score(pred_knn$prob, grp[test_id])
tk2 <- topk(pred_knn$prob, grp[test_id], k = 2)
tibble(
  prank_mean = mean(rk$prank),
  top2_acc   = mean(tk2$topk)
)
#> # A tibble: 1 × 2
#> prank_mean top2_acc
#> <dbl> <dbl>
#> 1      0.278        1

cm <- table(
  Truth     = grp[test_id],
  Predicted = pred_knn$class
)
# Heat-map
cm_df <- as.data.frame(cm)
ggplot(cm_df, aes(Truth, Predicted, fill = Freq)) +
  geom_tile(colour = "grey80") +
  geom_text(aes(label = Freq), colour = "white", size = 4) +
  scale_fill_gradient(low = "#4575b4", high = "#d73027", name = "Count",
                      limits = c(0, max(cm_df$Freq))) +  # cover the largest cell count
  scale_y_discrete(limits = rev(levels(cm_df$Predicted))) +
  theme_minimal(base_size = 12) + coord_equal() +
  ggtitle("k-NN (k = 3) confusion matrix – test set") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

| Truth / Predicted | setosa | versicolor | virginica |
|---|---|---|---|
| setosa | 12 | 0 | 0 |
| versicolor | 0 | 10 | 5 |
| virginica | 0 | 0 | 18 |
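The same confusion matrix can be summarised numerically with base R alone; this sketch derives per-class recall and precision directly from cm (no extra packages assumed):

# Rows of cm are true classes, columns are predicted classes
recall    <- diag(cm) / rowSums(cm)  # fraction of each true class recovered
precision <- diag(cm) / colSums(cm)  # fraction of each predicted class that is correct
round(rbind(precision, recall), 3)

For the table above this gives, for example, a versicolor recall of 10/15 ≈ 0.67, reflecting the five misclassified samples.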
# Check if randomForest is installed
if (requireNamespace("randomForest", quietly = TRUE)) {
  # Assuming rf_classifier.projector method is available
  rf_clf <- rf_classifier(  # Using the generic here
    x      = disc_proj,
    labels = grp[train_id],
    # Pass scores directly if method requires it, or let it call scores(x)
    scores = scores(disc_proj)[train_id, ]
  )
  pred_rf <- predict(rf_clf, new_data = X[test_id, ])
  print(paste("RF Accuracy:", mean(pred_rf$class == grp[test_id])))
} else {
  cat("randomForest package not installed. Skipping RF example.\n")
}
#> randomForest package not installed. Skipping RF example.

The RF sees only two input variables (the LDA components) – that keeps the trees shallow and speeds up training.
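If randomForest is installed, an equivalent model can also be fitted by hand on the two LDA score columns; this sketch uses the standard randomForest::randomForest() interface rather than the rf_classifier wrapper (rf_manual is an illustrative name):

if (requireNamespace("randomForest", quietly = TRUE)) {
  # Train directly on the 2-column LDA scores – the same features the wrapper sees
  rf_manual <- randomForest::randomForest(
    x     = scores(disc_proj)[train_id, ],
    y     = grp[train_id],
    ntree = 200
  )
  rf_pred <- predict(rf_manual, newdata = scores(disc_proj)[test_id, ])
  mean(rf_pred == grp[test_id])  # test-set accuracy
}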
Assume that in deployment we measure only Sepal variables (cols 1–2). A partial projection keeps the classifier happy:
sepal_cols <- 1:2
# Create a classifier using reference scores from Sepal columns only
clf_knn_sepal <- multivarious::classifier(
  x        = disc_proj,
  labels   = grp[train_id],
  new_data = X[train_id, sepal_cols],  # Use training data subset
  colind   = sepal_cols,               # Indicate which columns were used
  knn      = 3
)
# Predict using the dedicated sepal classifier
pred_sepal <- predict(
  clf_knn_sepal,  # Use the sepal-specific classifier
  new_data = X[test_id, sepal_cols]
  # No need for colind here as clf_knn_sepal expects sepal data
)
print(paste("Accuracy (Sepal only):", mean(pred_sepal$class == grp[test_id])))
#> [1] "Accuracy (Sepal only): 0.333333333333333"Accuracy drops a bit – as expected when using fewer features.
feature_importance() can rank variable groups via a
simple “leave-block-out” score drop.
blocks <- list(
  Sepal = 1:2,
  Petal = 3:4
)
# Assuming feature_importance is available
fi <- feature_importance(
  clf_knn,
  new_data      = X[test_id, ],
  true_labels   = grp[test_id],      # Pass the correct test set labels
  blocks        = blocks,
  fun           = rank_score,        # Use rank_score as the performance metric
  fun_direction = "lower_is_better",
  approach      = "marginal"         # Calculate marginal drop when block is removed
)
print(fi)
#> block importance
#> 2   3,4  0.3611111
#> 1   1,2  0.1333333

Removing the Petal block (columns 3–4) degrades performance far more than removing the Sepal block (columns 1–2), confirming that the Petal measurements drive the classifier.
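As a rough cross-check of this ordering (not the exact computation feature_importance() performs), the mean percentile rank of the full classifier can be compared with the Sepal-only classifier, i.e. the model with the Petal block left out; the predict() arguments mirror the earlier call:

# Re-score the Sepal-only (Petal removed) classifier with the same settings as pred_knn
pred_no_petal <- predict(clf_knn_sepal, new_data = X[test_id, sepal_cols],
                         metric = "cosine", prob_type = "knn_proportion")
prank_full     <- mean(rank_score(pred_knn$prob,      grp[test_id])$prank)
prank_no_petal <- mean(rank_score(pred_no_petal$prob, grp[test_id])$prank)
c(full = prank_full, petal_removed = prank_no_petal,
  drop = prank_no_petal - prank_full)  # larger drop = more important block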