INCVCommunityDetection implements Inductive
Node-Splitting Cross-Validation (INCV) for selecting the number
of communities in Stochastic Block Models (SBM). The package also
provides competing methods — CROISSANT, Edge
Cross-Validation (ECV), and Node Cross-Validation
(NCV) — for comprehensive model selection in network
analysis.
We start by generating a network from a planted-partition SBM with 3 communities, 150 nodes, within-community connection probability 0.5, and between-community probability 0.05.
library(INCVCommunityDetection)
set.seed(42)
net <- community.sim(k = 3, n = 150, n1 = 50, p = 0.5, q = 0.05)
table(net$membership)
#>
#> 1 2 3
#> 50 50 50
The adjacency matrix is a 150 × 150 binary symmetric matrix:
dim(net$adjacency)
#> [1] 150 150
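For intuition, the sampling step of a planted-partition SBM can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code behind community.sim(): the helper name `sample_planted_sbm` is hypothetical, and it assumes equal-sized communities and no self-loops.

```r
# Base-R sketch of sampling a planted-partition SBM (hypothetical helper,
# not the package's implementation).
sample_planted_sbm <- function(n, k, p, q) {
  # Balanced labels: the first n/k nodes form community 1, and so on
  # (assumes k divides n).
  membership <- rep(seq_len(k), each = n / k)
  # Pairwise edge probability: p within a community, q between communities.
  prob <- ifelse(outer(membership, membership, "=="), p, q)
  # Draw the upper triangle independently, then mirror it so the
  # matrix is symmetric with a zero diagonal.
  A <- matrix(0L, n, n)
  upper <- upper.tri(A)
  A[upper] <- rbinom(sum(upper), size = 1, prob = prob[upper])
  A <- A + t(A)
  list(adjacency = A, membership = membership)
}

set.seed(1)
toy <- sample_planted_sbm(n = 150, k = 3, p = 0.5, q = 0.05)
dim(toy$adjacency)
isSymmetric(toy$adjacency)
```

Back in the package example, reordering the nodes by community makes the block structure visible: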
ord <- order(net$membership)
image(net$adjacency[ord, ord],
      main = "Adjacency matrix (3-community SBM, reordered)",
      xlab = "Node", ylab = "Node")
The main function nscv.f.fold() partitions nodes into
f folds and uses spectral clustering on the training
subgraph. Held-out nodes are assigned to communities based on their
connections to training nodes, and the held-out negative log-likelihood
and MSE are computed.
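The steps just described can be sketched for a single fold in base R. This is a simplified illustration, not the package's implementation: the helper `incv_one_fold` is hypothetical, held-out nodes are assigned by maximising the Bernoulli log-likelihood of their edges to the training nodes (one plausible reading of "based on their connections"), and the loss is evaluated on pairs of held-out nodes (also an assumption).

```r
# One fold of node-splitting CV in base R (hypothetical helper, illustrative only).
incv_one_fold <- function(A, train, K, eps = 1e-6) {
  test <- setdiff(seq_len(nrow(A)), train)
  A11 <- A[train, train]

  # Spectral clustering: k-means on the K leading eigenvectors (by |eigenvalue|).
  eig <- eigen(A11, symmetric = TRUE)
  lead <- order(abs(eig$values), decreasing = TRUE)[seq_len(K)]
  z_train <- kmeans(eig$vectors[, lead, drop = FALSE], centers = K,
                    nstart = 10, iter.max = 50)$cluster

  # Block probabilities from training pairs, clipped away from 0 and 1.
  Z <- diag(K)[z_train, , drop = FALSE]        # one-hot memberships
  sizes <- colSums(Z)
  pairs <- tcrossprod(sizes) - diag(sizes, K)  # ordered pairs per block, no self-pairs
  B <- (t(Z) %*% A11 %*% Z) / pmax(pairs, 1)
  B <- pmin(pmax(B, eps), 1 - eps)

  # Assign each held-out node to the community with the highest
  # log-likelihood for its edges to the training nodes.
  A21 <- A[test, train, drop = FALSE]
  loglik <- A21 %*% Z %*% t(log(B)) + (1 - A21) %*% Z %*% t(log(1 - B))
  z_test <- max.col(loglik, ties.method = "first")

  # Held-out loss over pairs of held-out nodes.
  P <- B[z_test, z_test]
  up <- upper.tri(P)
  A22 <- A[test, test]
  c(nll = -sum(A22[up] * log(P[up]) + (1 - A22[up]) * log(1 - P[up])),
    mse = mean((A22[up] - P[up])^2))
}

# Toy network: 3 communities of 50 nodes, p = 0.5 within, q = 0.05 between.
set.seed(1)
n <- 150
z <- rep(1:3, each = 50)
P0 <- ifelse(outer(z, z, "=="), 0.5, 0.05)
A <- matrix(0, n, n)
A[upper.tri(A)] <- rbinom(n * (n - 1) / 2, 1, P0[upper.tri(P0)])
A <- A + t(A)

# Average the held-out negative log-likelihood over 5 folds for each K.
folds <- sample(rep(1:5, length.out = n))
cv_nll <- sapply(2:6, function(K) {
  mean(sapply(1:5, function(f) incv_one_fold(A, which(folds != f), K)["nll"]))
})
k_hat <- (2:6)[which.min(cv_nll)]
```

The package function runs the whole procedure across all folds in one call: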
result <- nscv.f.fold(net$adjacency, k.vec = 2:6, f = 5)
result$k.loss # K selected by neg-log-likelihood
#> [1] 3
result$k.mse # K selected by MSE
#> [1] 3
We can inspect the full CV loss curve:
plot(2:6, result$cv.loss, type = "b", pch = 19,
     xlab = "Number of communities (K)",
     ylab = "CV Negative Log-Likelihood",
     main = "INCV f-fold: CV loss by K")
abline(v = result$k.loss, lty = 2, col = "red")
An alternative is to use repeated random node splits instead of fixed folds:
result2 <- nscv.random.split(net$adjacency, k.vec = 2:6,
                             split = 0.66, ite = 20)
result2$k.chosen
#> [1] 3
plot(2:6, result2$cv.loss, type = "b", pch = 19,
     xlab = "Number of communities (K)",
     ylab = "CV Negative Log-Likelihood",
     main = "INCV random-split: CV loss by K")
abline(v = result2$k.chosen, lty = 2, col = "red")
ECV holds out random edges and evaluates the predictive fit of a blockmodel reconstruction. It selects the number of communities and the model type jointly, choosing between the SBM and the degree-corrected block model (DCBM).
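The edge-holdout idea can be sketched in base R as follows. This is a deliberately simplified illustration, not the code behind ECV.for.blockmodel(): the helper `ecv_one_split` and its `holdout` argument are hypothetical, held-out pairs are simply zero-filled before spectral clustering rather than imputed, and only the SBM is fitted, so the SBM versus DCBM comparison is left out.

```r
# One random edge-holdout split, SBM only (hypothetical helper, illustrative).
ecv_one_split <- function(A, K, holdout = 0.1, eps = 1e-6) {
  n <- nrow(A)
  # Hold out a random fraction of node pairs from the upper triangle
  # and mirror the mask so it is symmetric.
  up <- which(upper.tri(A))
  held <- sample(up, size = round(holdout * length(up)))
  mask <- matrix(FALSE, n, n)
  mask[held] <- TRUE
  mask <- mask | t(mask)

  # Zero-fill the held-out pairs and cluster the K leading eigenvectors.
  A_train <- A
  A_train[mask] <- 0
  eig <- eigen(A_train, symmetric = TRUE)
  lead <- order(abs(eig$values), decreasing = TRUE)[seq_len(K)]
  z <- kmeans(eig$vectors[, lead, drop = FALSE], centers = K,
              nstart = 10, iter.max = 50)$cluster

  # Block probabilities from observed (not held-out, off-diagonal) pairs.
  Z <- diag(K)[z, , drop = FALSE]
  obs <- 1 - mask
  diag(obs) <- 0
  B <- (t(Z) %*% (A * obs) %*% Z) / pmax(t(Z) %*% obs %*% Z, 1)
  B <- pmin(pmax(B, eps), 1 - eps)

  # Predictive fit: squared error on the held-out pairs only.
  P <- B[z, z]
  mean((A[held] - P[held])^2)
}

# Toy network: 3 communities of 50 nodes, p = 0.5 within, q = 0.05 between.
set.seed(1)
n <- 150
z0 <- rep(1:3, each = 50)
P0 <- ifelse(outer(z0, z0, "=="), 0.5, 0.05)
A <- matrix(0, n, n)
A[upper.tri(A)] <- rbinom(n * (n - 1) / 2, 1, P0[upper.tri(P0)])
A <- A + t(A)

# Average held-out MSE over 5 splits; reseeding gives every K the same splits.
ecv_mse <- sapply(2:6, function(K) {
  mean(sapply(101:105, function(s) { set.seed(s); ecv_one_split(A, K) }))
})
k_ecv <- (2:6)[which.min(ecv_mse)]
```

Reusing the same held-out pairs for every candidate K keeps the comparison paired, so differences in loss reflect the model rather than the split.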
| Method | Function | Splits | Selects K | Selects model type |
|---|---|---|---|---|
| INCV f-fold | nscv.f.fold() | Nodes into f folds | Yes | No (SBM only) |
| INCV random | nscv.random.split() | Random node split | Yes | No (SBM only) |
| ECV | ECV.for.blockmodel() | Random edge holdout | Yes | Yes (SBM vs DCBM) |
| NCV | NCV.for.blockmodel() | Node folds | Yes | Yes (SBM vs DCBM) |
| CROISSANT | croissant.blockmodel() | Overlapping subsamples | Yes | Yes (SBM vs DCBM) |
The building blocks are also available directly:
For more realistic simulations, community.sim.sbm()
generates networks where block probabilities decay with community
distance:
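One way to picture such a structure is the base-R sketch below. It does not call community.sim.sbm() and makes no claim about that function's arguments or its decay rule: the helper `decaying_block_matrix`, the geometric decay rate, and the reading of "community distance" as the gap between community indices are all illustrative assumptions.

```r
# Hypothetical illustration: block probabilities that shrink geometrically
# with the gap |a - b| between community indices.
decaying_block_matrix <- function(k, p_within = 0.5, decay = 0.4) {
  gap <- abs(outer(seq_len(k), seq_len(k), "-"))
  p_within * decay^gap
}

B <- decaying_block_matrix(k = 4)
round(B, 3)

# Sample a 100-node network with 4 equal communities from B.
set.seed(1)
z <- rep(seq_len(4), each = 25)
P <- B[z, z]
A <- matrix(0L, 100, 100)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), size = 1, prob = P[up])
A <- A + t(A)
```

With these settings, neighbouring communities connect with probability 0.2 and the two most distant ones with probability 0.032, so the communities are no longer equally well separated as they are in the planted-partition case.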
sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sonoma 14.6.1
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US/en_US/C/en_US/en_US
#>
#> time zone: America/Los_Angeles
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] INCVCommunityDetection_0.1.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.7-4 mvnfast_0.2.8 gtable_0.3.6
#> [4] jsonlite_2.0.0 compiler_4.5.2 Rcpp_1.1.1
#> [7] slam_0.1-55 parallel_4.5.2 cluster_2.1.8.2
#> [10] jquerylib_0.1.4 scales_1.4.0 yaml_2.3.12
#> [13] fastmap_1.2.0 lattice_0.22-7 ggplot2_4.0.2
#> [16] R6_2.6.1 knitr_1.51 zigg_0.0.2
#> [19] bslib_0.10.0 RColorBrewer_1.1-3 rlang_1.1.7
#> [22] cachem_1.1.0 ClusterR_1.3.6 xfun_0.56
#> [25] sass_0.4.10 S7_0.2.1 RcppParallel_5.1.11-2
#> [28] otel_0.2.0 viridisLite_0.4.3 cli_3.6.5
#> [31] digest_0.6.39 grid_4.5.2 irlba_2.3.7
#> [34] gmp_0.7-5.1 mclust_6.1.2 lifecycle_1.0.5
#> [37] vctrs_0.7.1 Rfast_2.1.5.2 data.table_1.18.2.1
#> [40] IMIFA_2.2.0 RSpectra_0.16-2 evaluate_1.0.5
#> [43] glue_1.8.0 farver_2.1.2 rmarkdown_2.30
#> [46] matrixStats_1.5.0 tools_4.5.2 htmltools_0.5.9