Note: All code chunks have eval = FALSE and are shown for illustration only. To run them interactively:
This vignette demonstrates complete dicepro deconvolution workflows on fully synthetic data, enabling end-to-end reproducibility without requiring access to proprietary datasets.
Two simulation strategies are available and covered here:
| Function | Reference matrix | Cell-type names | Best used for |
|---|---|---|---|
| simulation() | Synthetic (generated internally) | Generic (CellType_k) | Flexible benchmarking with custom parameters |
| simulation_bluecode() | BlueCode (bundled real reference) | Real biological names | Realistic benchmarking on human tissue cell types |
Both functions return the same standardized triplet –
$W, $p, $B – and are fully
interchangeable as input to dicepro().
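The shape conventions behind this triplet can be checked with a few lines of base R. The sketch below is illustrative only: check_triplet() is a hypothetical helper, not a dicepro function, and it assumes the layout indicated by the dimension comments later in this vignette ($W is genes x cell types, $p is samples x cell types, $B is genes x samples).

```r
# Hypothetical helper (not exported by dicepro): verify that a list with
# elements W, p and B follows the assumed triplet layout.
check_triplet <- function(x) {
  W <- as.matrix(x$W)
  p <- as.matrix(x$p)
  B <- as.matrix(x$B)
  c(
    genes_match      = nrow(W) == nrow(B),
    samples_match    = nrow(p) == ncol(B),
    cell_types_match = ncol(W) == ncol(p) &&
      identical(colnames(W), colnames(p)),
    rows_sum_to_one  = all(abs(rowSums(p) - 1) < 1e-6)
  )
}

# Toy triplet with made-up numbers: 3 genes, 2 cell types, 2 samples
ct <- c("CellType_1", "CellType_2")
toy <- list(
  W = matrix(1:6, nrow = 3, dimnames = list(NULL, ct)),
  p = matrix(c(0.2, 0.8, 0.5, 0.5), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, ct))
)
toy$B <- toy$W %*% t(toy$p)  # noise-free bulk

checks <- check_triplet(toy)
print(checks)
```

The same call, check_triplet(sim_data) or check_triplet(sim_bc), should apply to the objects created in the sections that follow, provided the returned lists use these element names.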
The pipeline proceeds in four stages:
dicepro() function with automated hyper-parameter optimization.

simulation() generates a self-consistent triplet from
scratch: a synthetic reference matrix $W, ground-truth
proportion matrix $p, and noisy bulk matrix
$B, guaranteeing column-name alignment across all three
outputs.
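To make the structure of such a triplet concrete, here is a minimal base-R sketch that builds one by hand. It is not the code used inside simulation(): the gamma-distributed reference, the normalised-gamma proportions, and the multiplicative Gaussian noise are all assumptions chosen for illustration. Only the CellType_k naming and the noise level of 0.07 are borrowed from this vignette.

```r
# Illustrative sketch only; NOT the dicepro implementation of simulation().
set.seed(1)
n_genes   <- 50
n_cells   <- 4
n_samples <- 6
cell_names <- paste0("CellType_", seq_len(n_cells))

# Reference W: genes x cell types, non-negative expression values
W <- matrix(
  rgamma(n_genes * n_cells, shape = 2, rate = 0.5),
  nrow = n_genes, ncol = n_cells,
  dimnames = list(paste0("Gene_", seq_len(n_genes)), cell_names)
)

# Proportions p: samples x cell types, each row normalised to sum to 1
raw <- matrix(
  rgamma(n_samples * n_cells, shape = 1),
  nrow = n_samples, ncol = n_cells,
  dimnames = list(paste0("Sample_", seq_len(n_samples)), cell_names)
)
p <- raw / rowSums(raw)

# Noise-free bulk: genes x samples
B_clean <- W %*% t(p)

# Assumed noise model: multiplicative Gaussian perturbation, truncated at 0
sigma <- 0.07
noise <- matrix(rnorm(length(B_clean), sd = sigma), nrow = n_genes)
B <- pmax(B_clean * (1 + noise), 0)

cat("Reference  :", dim(W), "\n")
cat("Proportions:", dim(p), "\n")
cat("Bulk       :", dim(B), "\n")
```

Because W and p share the same column names by construction, the matrix product lines up cell types without any reordering, which is the alignment property the package function guarantees for its own output.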
library(dicepro)
set.seed(2101L)
sim_data <- simulation(
loi = "gauss",
scenario = "hierarchical",
nSample = 30,
nGenes = 200,
nCellsType = 10,
sigma_bio = 0.07,
sigma_tech = 0.07,
seed = 2101L
)
cat("Reference :", dim(sim_data$W), "\n")
cat("Proportions:", dim(sim_data$p), "\n")
cat("Bulk :", dim(sim_data$B), "\n")
cat("Row sums :", round(range(rowSums(sim_data$p)), 4), "\n")

bulk_clean <- as.matrix(sim_data$W) %*% t(as.matrix(sim_data$p))
plot(
bulk_clean[seq_len(500)],
as.matrix(sim_data$B)[seq_len(500)],
xlab = "Clean bulk (first 500 entries)",
ylab = "Noisy bulk",
pch = 19, cex = 0.4, col = "#2c7bb6",
main = "Noise model: clean vs noisy bulk"
)
abline(0, 1, col = "firebrick", lwd = 1.5)

out <- dicepro(
# Columns 1, 5 and 10 are removed, so three simulated cell types are absent from the reference
reference = as.matrix(sim_data$W)[, -c(1, 5, 10)],
bulk = as.matrix(sim_data$B),
methodDeconv = "FARDEEP",
W_prime = 0,
bulkName = "SimBulk",
refName = "SimRef",
hp_max_evals = 500,
algo_select = "random",
output_path = tempdir(),
hspaceTechniqueChoose = "gamma_dominant",
normalize = FALSE
)

true_prop <- as.matrix(sim_data$p)
pred_prop <- as.matrix(out$H)
common_ct <- intersect(colnames(pred_prop), colnames(true_prop))
true_common <- true_prop[, common_ct, drop = FALSE]
pred_common <- pred_prop[, common_ct, drop = FALSE]
r_overall <- cor(as.vector(true_common), as.vector(pred_common))
cat(sprintf("Overall Pearson r: %.3f\n", r_overall))
plot(
as.vector(true_common),
as.vector(pred_common),
xlab = "True proportions",
ylab = "Predicted proportions",
pch = 19, cex = 0.5, col = "#2c7bb6aa",
main = sprintf("True vs Predicted (r = %.3f)", r_overall)
)
abline(0, 1, col = "firebrick", lwd = 1.5)

ct_cors <- vapply(common_ct, function(ct) {
cor(true_common[, ct], pred_common[, ct])
}, numeric(1L))
par(mar = c(5, 10, 3, 1))
barplot(
sort(ct_cors),
horiz = TRUE, las = 1,
col = ifelse(sort(ct_cors) > 0.7, "#2c7bb6", "#d7191c"),
xlab = "Pearson r",
main = "Per-cell-type correlation (synthetic)",
xlim = c(-0.2, 1)
)
abline(v = 0.7, lty = 2, col = "gray40")

simulation_bluecode() uses the BlueCode
reference matrix bundled with the package – a curated signature matrix
covering 34 human cell types across five tissue compartments – as the
fixed reference $W. Proportions are simulated via a
two-level Dirichlet hierarchy that reflects the biological organisation
of human tissues.
This strategy is recommended when benchmarking should be grounded in a biologically realistic reference, while keeping full control over the ground-truth proportions.
The hierarchy encodes five major tissue compartments with their constituent cell types:
| Compartment | Cell types (n) | Default α |
|---|---|---|
| Immune | B naive/memory, T CD4/CD8, NK, Monocyte, Macrophage, Neutrophil | 4.0 |
| Stromal | Fibroblasts (×5), MSC, Chondrocyte, Osteoblast | 2.5 |
| Endothelial | Large vessel, microvascular (×2) | 1.8 |
| Epithelial | Mammary, renal, respiratory, keratinocyte, melanocyte | 1.8 |
| Muscle | Smooth muscle (×7), cardiac myocyte, myometrial | 1.5 |
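The two-level idea can be sketched in base R as follows. This is a simplified illustration rather than the code inside simulation_bluecode(): the compartment concentrations are the default α values from the table above, the compartment sizes are those of the compartment mapping used later in this vignette, and the flat within-compartment Dirichlet (all concentrations equal to 1) and the generic cell-type labels are assumptions.

```r
# Dirichlet draw via normalised gamma variates (base R only)
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  setNames(g / sum(g), names(alpha))
}

set.seed(2101)

# Level 1: compartment concentrations (default alpha values from the table)
alpha_comp <- c(Immune = 4.0, Stromal = 2.5, Endothelial = 1.8,
                Epithelial = 1.8, Muscle = 1.5)

# Number of cell types per compartment (sums to 34)
n_types <- c(Immune = 9, Stromal = 8, Endothelial = 3,
             Epithelial = 5, Muscle = 9)

simulate_sample <- function() {
  comp_w <- rdirichlet1(alpha_comp)  # compartment shares, sum to 1
  unlist(lapply(names(alpha_comp), function(cmp) {
    # Level 2: split the compartment share among its cell types
    # (assumed flat Dirichlet; the package may use other concentrations)
    within_w <- rdirichlet1(rep(1, n_types[[cmp]]))
    setNames(comp_w[[cmp]] * within_w,
             paste0(cmp, "_", seq_len(n_types[[cmp]])))
  }))
}

# 30 samples x 34 cell types; every row sums to 1
p_sim <- t(replicate(30, simulate_sample()))
cat("Dimensions:", dim(p_sim), "\n")
cat("Row sums  :", round(range(rowSums(p_sim)), 4), "\n")
```

Because each compartment share is split among its own cell types, the rows sum to one by construction, and a larger α at the first level shifts more of the total mass toward that compartment on average.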
library(dicepro)
sim_bc <- simulation_bluecode(
nSample = 30,
sigma_bio = 0.15,
sigma_tech = 0.02,
seed = 2101L
)
cat("Reference :", dim(sim_bc$W), "\n") # nGenes x 34
cat("Proportions:", dim(sim_bc$p), "\n") # 30 x 34
cat("Bulk :", dim(sim_bc$B), "\n") # nGenes x 30
cat("Row sums :", round(range(rowSums(sim_bc$p)), 4), "\n")
# Real cell-type names from BlueCode
head(colnames(sim_bc$p))

The hierarchical model induces realistic between-compartment variation. Visualising the compartment-level totals confirms the expected dominance of the Immune compartment.
# Cell-type to compartment mapping (mirrors .bluecode_cell_groups)
compartment_map <- c(
rep("Immune", 9),
rep("Stromal", 8),
rep("Endothelial", 3),
rep("Epithelial", 5),
rep("Muscle", 9)
)
names(compartment_map) <- colnames(sim_bc$p)
# Aggregate proportions by compartment for each sample
comp_props <- t(apply(sim_bc$p, 1, function(row) {
tapply(row, compartment_map[names(row)], sum)
}))
boxplot(
comp_props,
col = c("#4393c3", "#74c476", "#fd8d3c", "#9ecae1", "#fb6a4a"),
ylab = "Compartment proportion",
main = "Simulated compartment proportions (BlueCode)",
las = 2
)

bulk_clean_bc <- as.matrix(sim_bc$W) %*% t(as.matrix(sim_bc$p))
plot(
bulk_clean_bc[seq_len(500)],
as.matrix(sim_bc$B)[seq_len(500)],
xlab = "Clean bulk (first 500 entries)",
ylab = "Noisy bulk",
pch = 19, cex = 0.4, col = "#74c476",
main = "Noise model: clean vs noisy bulk (BlueCode)"
)
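abline(0, 1, col = "firebrick", lwd = 1.5)

# out_bc is assumed to hold the result of a dicepro() call on sim_bc$W and
# sim_bc$B, analogous to the call that produced out earlier; that call is not shown here.
true_prop_bc <- as.matrix(sim_bc$p)
pred_prop_bc <- as.matrix(out_bc$H)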
common_ct_bc <- intersect(colnames(pred_prop_bc), colnames(true_prop_bc))
true_common_bc <- true_prop_bc[, common_ct_bc, drop = FALSE]
pred_common_bc <- pred_prop_bc[, common_ct_bc, drop = FALSE]
r_overall_bc <- cor(as.vector(true_common_bc), as.vector(pred_common_bc))
cat(sprintf("Overall Pearson r: %.3f\n", r_overall_bc))
plot(
as.vector(true_common_bc),
as.vector(pred_common_bc),
xlab = "True proportions",
ylab = "Predicted proportions",
pch = 19, cex = 0.5, col = "#74c47699",
main = sprintf("True vs Predicted -- BlueCode (r = %.3f)", r_overall_bc)
)
abline(0, 1, col = "firebrick", lwd = 1.5)

ct_cors_bc <- vapply(common_ct_bc, function(ct) {
cor(true_common_bc[, ct], pred_common_bc[, ct])
}, numeric(1L))
par(mar = c(5, 14, 3, 1))
barplot(
sort(ct_cors_bc),
horiz = TRUE, las = 1,
col = ifelse(sort(ct_cors_bc) > 0.7, "#2c7bb6", "#d7191c"),
xlab = "Pearson r",
main = "Per-cell-type correlation (BlueCode)",
xlim = c(-0.2, 1)
)
abline(v = 0.7, lty = 2, col = "gray40")

When both strategies are run under the same conditions, their overall correlation scores can be compared directly.
These two strategies are complementary: simulation()
offers maximum flexibility for testing deconvolution under stress
conditions, with varying numbers of genes, cell types, or noise levels;
simulation_bluecode() anchors the benchmark to a
biologically grounded reference, making the results easier to interpret
in the context of real-world data from human tissues.
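Since the evaluation steps are identical for both strategies, they can be wrapped in one small function. The sketch below is a hypothetical helper, not part of dicepro: score_deconv() simply repeats the column intersection and Pearson correlations used earlier in this vignette, and it assumes that the rows (samples) of both matrices are in the same order.

```r
# Hypothetical helper (not part of dicepro): agreement between true and
# predicted proportion matrices, both laid out as samples x cell types.
score_deconv <- function(true_prop, pred_prop) {
  common <- intersect(colnames(pred_prop), colnames(true_prop))
  stopifnot(length(common) > 0L)
  tr <- as.matrix(true_prop)[, common, drop = FALSE]
  pr <- as.matrix(pred_prop)[, common, drop = FALSE]
  per_ct <- vapply(common, function(ct) cor(tr[, ct], pr[, ct]), numeric(1L))
  list(
    overall_r       = cor(as.vector(tr), as.vector(pr)),
    per_cell_type_r = per_ct,
    n_common        = length(common)
  )
}

# Toy check with made-up matrices: a perfect prediction that only covers
# three of the four cell types should give correlations of 1.
set.seed(1)
truth <- matrix(runif(20), nrow = 5,
                dimnames = list(NULL, paste0("CellType_", 1:4)))
truth <- truth / rowSums(truth)
res <- score_deconv(truth, truth[, c(2, 4, 1)])
cat(sprintf("Overall Pearson r: %.3f (%d shared cell types)\n",
            res$overall_r, res$n_common))
```

With the objects from this vignette, score_deconv(sim_data$p, out$H) and score_deconv(sim_bc$p, out_bc$H) would return the two sets of scores side by side, assuming both dicepro() runs used the same settings.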