dicepro - Hyperparameter Search Space Visualization

dicepro Team

2026-06-24

Note: All code chunks have eval = FALSE and are shown for illustration only. To run them interactively:

library(dicepro)
# copy-paste the chunks below into your R session

1 Overview

This vignette explains the two hyperparameter search-space strategies available in dicepro and shows how to visualize the resulting \((\gamma, \lambda)\) distributions with create_gamma_lambda_plot().

The hspaceTechniqueChoose argument controls which strategy is used, both in run_experiment() and in the plot function.


2 The Two Strategies

2.1 "all" - Independent sampling

\(\lambda\) and \(\gamma\) are each drawn independently from their own log-uniform distribution:

Parameter    Distribution    Range
lambda_      Log-uniform     \([1,\; 10^8]\)
gamma        Log-uniform     \([1,\; 10^8]\)
p_prime      Log-uniform     \([10^{-6},\; 1]\)

No structural constraint links the two parameters. The resulting \((\gamma, \lambda)\) cloud fills the entire feasible rectangle uniformly on a log-log scale.
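The independent scheme can be imitated in base R without dicepro. The sketch below assumes that a log-uniform draw is obtained by exponentiating a uniform draw on the log scale, which is the usual construction but is not taken from the package source. The helper name rlogunif is hypothetical.

```r
# Hypothetical helper (not part of dicepro): n log-uniform draws on [lo, hi]
rlogunif <- function(n, lo, hi) {
  exp(runif(n, min = log(lo), max = log(hi)))
}

set.seed(1)
n <- 200L

# "all": every parameter is sampled on its own, with no link between them
sample_all <- data.frame(
  lambda_ = rlogunif(n, 1, 1e8),
  gamma   = rlogunif(n, 1, 1e8),
  p_prime = rlogunif(n, 1e-6, 1)
)

# On the log10 scale the draws are uniform, so every decade is equally likely
summary(log10(sample_all$gamma))
```

Sampling on the log scale matters here because the ranges span eight orders of magnitude; a plain uniform draw on \([1, 10^8]\) would place almost every point in the top decade.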

2.2 "restrictionEspace" - Linked sampling

\(\gamma\) is the base variable; \(\lambda\) is derived via:

\[\lambda = \gamma \times \lambda_\text{factor}, \quad \lambda_\text{factor} \sim \text{LogUniform}(2,\; 100)\]

Parameter        Distribution    Range
gamma            Log-uniform     \([1,\; 10^5]\)
lambda_factor    Log-uniform     \([2,\; 100]\)
p_prime          Log-uniform     \([0.1,\; 1]\)

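Because \(\lambda_\text{factor} \in [2, 100]\), this guarantees \(2\gamma \leq \lambda \leq 100\gamma\) for every sample. In the log-log plane, the feasible region is therefore the band between the two diagonal lines \(\lambda = 2\gamma\) and \(\lambda = 100\gamma\).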
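The linked scheme can be sketched the same way in base R. This is an illustration built only from the ranges in the table above, not the package's implementation; rlogunif is a hypothetical helper.

```r
# Hypothetical helper (not part of dicepro): n log-uniform draws on [lo, hi]
rlogunif <- function(n, lo, hi) {
  exp(runif(n, min = log(lo), max = log(hi)))
}

set.seed(1)
n <- 200L

# "restrictionEspace": gamma is sampled first, lambda_ is derived from it
gamma         <- rlogunif(n, 1, 1e5)
lambda_factor <- rlogunif(n, 2, 100)
lambda_       <- gamma * lambda_factor

sample_restr <- data.frame(
  lambda_       = lambda_,
  gamma         = gamma,
  lambda_factor = lambda_factor,
  p_prime       = rlogunif(n, 0.1, 1)
)

# The ratio lambda_ / gamma recovers the factor, so it stays within [2, 100]
range(sample_restr$lambda_ / sample_restr$gamma)

# Implied range of lambda_: from 2 * 1 up to 100 * 1e5 = 1e7
range(sample_restr$lambda_)
```

Note that \(\lambda\) never reaches the upper end of the "all" range: its largest possible value under this scheme is \(100 \times 10^5 = 10^7\).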


3 Visualizing the Search Space

create_gamma_lambda_plot() samples 200 configurations (by default) and renders them as a scatter plot on log-log axes.

3.1 "all" - Independent space

library(dicepro)

p_all <- create_gamma_lambda_plot(hspaceTechniqueChoose = "all")
p_all

The cloud fills the square \([1, 10^8]^2\) uniformly, with no structural relationship between \(\gamma\) and \(\lambda\).

3.2 "restrictionEspace" - Restricted space

p_restr <- create_gamma_lambda_plot(hspaceTechniqueChoose = "restrictionEspace")
p_restr

All points fall within the diagonal band delimited by the two dashed lines. On log-log axes, each linear relationship \(\lambda = c \cdot \gamma\) appears as a straight line of slope 1, so the two bounds are parallel.
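The parallel-lines observation follows from taking logarithms. The short derivation below assumes the two bounding lines correspond to the factor limits \(c = 2\) and \(c = 100\) from Section 2.2:

\[\lambda = c\,\gamma \;\Longrightarrow\; \log_{10}\lambda = \log_{10}\gamma + \log_{10} c\]

Each bound is a line of slope 1 whose intercept is \(\log_{10} c\), that is \(\log_{10} 2 \approx 0.30\) for the lower bound and \(\log_{10} 100 = 2\) for the upper bound. The vertical width of the band is therefore constant:

\[\log_{10} 100 - \log_{10} 2 = \log_{10} 50 \approx 1.70 \text{ decades}\]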


4 Simulated Data

Before running the optimization, we simulate a self-consistent data set using simulation(). The function returns a list with three elements, accessed below as $W, $p, and $B.

run_experiment() expects a dataset list with keys $W, $P, and $B, so we rename $p to $P after simulation.

library(dicepro)
set.seed(2101L)

sim <- simulation(
  loi        = "gauss",
  scenario   = "hierarchical",
  nSample    = 30L,
  nGenes     = 200L,
  nCellsType = 10L,
  sigma_bio  = 0.07,
  sigma_tech = 0.07,
  seed       = 2101L
)

my_dataset <- list(
  W = sim$W,
  P = sim$p,
  B = sim$B
)

cat("W :", nrow(my_dataset$W), "genes x", ncol(my_dataset$W), "cell types\n")
cat("P :", nrow(my_dataset$P), "samples x", ncol(my_dataset$P), "cell types\n")
cat("B :", nrow(my_dataset$B), "genes x", ncol(my_dataset$B), "samples\n")
cat("Row sums of P (range):", round(range(rowSums(my_dataset$P)), 4), "\n")
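Before passing the list to run_experiment(), it can help to confirm that the three matrices agree on their shared dimensions. The sketch below is a hypothetical helper, check_dataset_dims(), that is not part of dicepro; it assumes only the orientation printed above (W is genes x cell types, P is samples x cell types, B is genes x samples) and is demonstrated on a random toy list rather than on the simulation() output.

```r
# Hypothetical helper (not part of dicepro): check that W, P and B are conformable
check_dataset_dims <- function(dataset) {
  stopifnot(all(c("W", "P", "B") %in% names(dataset)))
  W <- dataset$W
  P <- dataset$P
  B <- dataset$B
  c(
    genes_match      = nrow(W) == nrow(B),  # genes: rows of W and rows of B
    cell_types_match = ncol(W) == ncol(P),  # cell types: columns of W and of P
    samples_match    = nrow(P) == ncol(B)   # samples: rows of P and columns of B
  )
}

# Toy matrices with the same shapes as the simulated example (values are random)
set.seed(1)
toy <- list(
  W = matrix(runif(200 * 10), nrow = 200, ncol = 10),
  P = matrix(runif(30 * 10),  nrow = 30,  ncol = 10),
  B = matrix(runif(200 * 30), nrow = 200, ncol = 30)
)

check_dataset_dims(toy)
```

With the real data, the call would be check_dataset_dims(my_dataset); any FALSE entry points to a transposed or mislabelled matrix.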

5 Running the Optimization

5.1 Strategy "all" - Independent sampling

results_all <- run_experiment(
  dataset               = my_dataset,
  W_prime               = 0,
  bulkName              = "SimBulk",
  refName               = "SimRef",
  hp_max_evals          = 150L,
  algo_select           = "random",
  output_base_dir       = tempdir(),
  hspaceTechniqueChoose = "all"
)

cat("Completed trials:", nrow(results_all$trials), "\n")
head(results_all$trials[, c("lambda_", "gamma", "p_prime", "loss", "constraint")])

5.2 Strategy "restrictionEspace" - Linked sampling

results_restr <- run_experiment(
  dataset               = my_dataset,
  W_prime               = 0,
  bulkName              = "SimBulk",
  refName               = "SimRef",
  hp_max_evals          = 150L,
  algo_select           = "random",
  output_base_dir       = tempdir(),
  hspaceTechniqueChoose = "restrictionEspace"
)

cat("Completed trials:", nrow(results_restr$trials), "\n")
head(results_restr$trials[, c("lambda_", "gamma", "p_prime", "loss", "constraint")])

6 Comparing the Two Strategies

Once both runs are complete, we can print the lowest-loss trial from each run and overlay the sampled \((\gamma, \lambda)\) configurations to compare coverage:

best_all   <- results_all$trials[which.min(results_all$trials$loss), ]
best_restr <- results_restr$trials[which.min(results_restr$trials$loss), ]

cat("--- all ---\n")
cat(sprintf("  lambda = %.3g  |  gamma = %.3g  |  loss = %.4f\n",
            best_all$lambda_, best_all$gamma, best_all$loss))

cat("--- restrictionEspace ---\n")
cat(sprintf("  lambda = %.3g  |  gamma = %.3g  |  loss = %.4f\n",
            best_restr$lambda_, best_restr$gamma, best_restr$loss))

plot(
  results_all$trials$gamma,
  results_all$trials$lambda_,
  log  = "xy",
  pch  = 19, cex = 0.5,
  col  = adjustcolor("steelblue", 0.4),
  xlab = expression(gamma), ylab = expression(lambda),
  main = "Sampled configurations: all (blue) vs restrictionEspace (orange)"
)
points(
  results_restr$trials$gamma,
  results_restr$trials$lambda_,
  pch = 19, cex = 0.5,
  col = adjustcolor("darkorange", 0.4)
)
legend("topleft",
       legend = c("all", "restrictionEspace"),
       col    = c("steelblue", "darkorange"),
       pch    = 19, pt.cex = 1.2)
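To put a number on the visual comparison, one option is to count how many "all" configurations happen to fall inside the restricted region (\(2 \leq \lambda/\gamma \leq 100\) with \(1 \leq \gamma \leq 10^5\)). The sketch below uses mock data frames that only mimic the lambda_ and gamma columns used above, since it cannot assume anything else about the structure of the trials tables; in_restricted_band() is a hypothetical helper.

```r
# Hypothetical helper (not part of dicepro): TRUE for rows inside the restricted region
in_restricted_band <- function(trials, factor_min = 2, factor_max = 100,
                               gamma_min = 1, gamma_max = 1e5) {
  ratio <- trials$lambda_ / trials$gamma
  ratio >= factor_min & ratio <= factor_max &
    trials$gamma >= gamma_min & trials$gamma <= gamma_max
}

set.seed(1)
n <- 150L

# Mock "all" trials: gamma and lambda_ drawn independently on [1, 1e8]
mock_all <- data.frame(
  gamma   = 10^runif(n, 0, 8),
  lambda_ = 10^runif(n, 0, 8)
)

# Mock "restrictionEspace" trials: lambda_ derived from gamma and a factor in [2, 100]
g <- 10^runif(n, 0, 5)
mock_restr <- data.frame(
  gamma   = g,
  lambda_ = g * 10^runif(n, log10(2), 2)
)

# Share of each mock sample that lies inside the restricted region
mean(in_restricted_band(mock_all))
mean(in_restricted_band(mock_restr))
```

With real results, the same helper could be applied to results_all$trials and results_restr$trials. A small share for "all" indicates that most of its evaluation budget is spent outside the band that "restrictionEspace" targets.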

7 Session Info

sessionInfo()