Introduction to INCVCommunityDetection

Overview

INCVCommunityDetection implements Inductive Node-Splitting Cross-Validation (INCV) for selecting the number of communities in Stochastic Block Models (SBM). For comparison, the package also provides three competing model-selection methods: CROISSANT, Edge Cross-Validation (ECV), and Node Cross-Validation (NCV).

Simulating a network

We start by generating a network from a planted-partition SBM with 3 communities, 150 nodes, within-community connection probability 0.5, and between-community probability 0.05.

library(INCVCommunityDetection)

set.seed(42)
net <- community.sim(k = 3, n = 150, n1 = 50, p = 0.5, q = 0.05)
table(net$membership)
#> 
#>  1  2  3 
#> 50 50 50
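
To make the generative model concrete, here is a base-R sketch that draws a planted-partition SBM by hand. It assumes equal-sized communities and is not the code behind community.sim(); the helper name sim_planted_sbm is hypothetical.

```r
# Hypothetical helper (not part of the package): planted-partition SBM
# with k equal-sized communities of n1 nodes each.
sim_planted_sbm <- function(k, n1, p, q) {
  n <- k * n1
  membership <- rep(seq_len(k), each = n1)
  # Edge probability per node pair: p within a community, q between.
  P <- ifelse(outer(membership, membership, "=="), p, q)
  A <- matrix(0L, n, n)
  upper <- upper.tri(A)
  # Draw each unordered pair once, then mirror so A is symmetric
  # with a zero diagonal (no self-loops).
  A[upper] <- rbinom(sum(upper), size = 1, prob = P[upper])
  A <- A + t(A)
  list(adjacency = A, membership = membership)
}

set.seed(1)
toy <- sim_planted_sbm(k = 3, n1 = 50, p = 0.5, q = 0.05)
table(toy$membership)
```

Drawing only the upper triangle and mirroring it is what guarantees an undirected, loop-free graph.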

The adjacency matrix is a 150 × 150 binary symmetric matrix:

dim(net$adjacency)
#> [1] 150 150
ord <- order(net$membership)
image(net$adjacency[ord, ord],
      main = "Adjacency matrix (3-community SBM, reordered)",
      xlab = "Node", ylab = "Node")

Selecting K with INCV (f-fold)

The main function nscv.f.fold() partitions the nodes into f folds. For each fold, it runs spectral clustering on the subgraph induced by the training nodes, assigns each held-out node to a community based on its connections to the training nodes, and computes the held-out negative log-likelihood and MSE. Here k.vec gives the candidate values of K and f the number of folds.

result <- nscv.f.fold(net$adjacency, k.vec = 2:6, f = 5)
result$k.loss   # K selected by neg-log-likelihood
#> [1] 3
result$k.mse    # K selected by MSE
#> [1] 3
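
The steps described above can be sketched in base R. This is an illustration of the general idea only, not the package's implementation: the helpers spectral_labels and heldout_nll are hypothetical, and scoring the loss on pairs of held-out nodes is an assumption of this sketch.

```r
# Illustrative sketch only; none of these helpers belong to the package.
set.seed(2)
z <- rep(1:3, each = 40)                       # true labels, 120 nodes
n <- length(z)
P <- ifelse(outer(z, z, "=="), 0.5, 0.05)
A <- matrix(0, n, n)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), 1, P[up])
A <- A + t(A)

# Spectral clustering: k-means on the K leading eigenvectors.
spectral_labels <- function(A, K) {
  ev <- eigen(A, symmetric = TRUE)
  top <- order(abs(ev$values), decreasing = TRUE)[seq_len(K)]
  kmeans(ev$vectors[, top, drop = FALSE], centers = K, nstart = 20)$cluster
}

heldout_nll <- function(A, K, test, eps = 1e-6) {
  train <- setdiff(seq_len(nrow(A)), test)
  z_tr <- spectral_labels(A[train, train], K)
  # Block probabilities estimated from the training subgraph only.
  B <- matrix(0, K, K)
  for (a in seq_len(K)) for (b in seq_len(K)) {
    blk <- A[train[z_tr == a], train[z_tr == b], drop = FALSE]
    m <- nrow(blk)
    B[a, b] <- if (a != b) mean(blk) else if (m > 1) sum(blk) / (m * (m - 1)) else 0
  }
  B <- pmin(pmax(B, eps), 1 - eps)             # keep the logs finite
  # Assign each held-out node to the community that maximises the
  # likelihood of its edges to the training nodes.
  z_te <- vapply(test, function(i) {
    x <- A[i, train]
    ll <- vapply(seq_len(K), function(a) {
      pr <- B[a, z_tr]
      sum(x * log(pr) + (1 - x) * log(1 - pr))
    }, numeric(1))
    which.max(ll)
  }, integer(1))
  # Negative log-likelihood over pairs of held-out nodes (assumption).
  pr <- B[z_te, z_te]
  x <- A[test, test]
  ut <- upper.tri(x)
  -sum(x[ut] * log(pr[ut]) + (1 - x[ut]) * log(1 - pr[ut]))
}

folds <- split(sample(n), rep(1:5, length.out = n))
k_vec <- 2:5
cv_loss <- vapply(k_vec, function(K) {
  sum(vapply(folds, function(test) heldout_nll(A, K, test), numeric(1)))
}, numeric(1))
k_vec[which.min(cv_loss)]                      # K with the smallest loss
```

Because the held-out nodes play no part in clustering or in estimating the block probabilities, the loss is a genuine out-of-sample score.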

We can inspect the full CV loss curve:

plot(2:6, result$cv.loss, type = "b", pch = 19,
     xlab = "Number of communities (K)",
     ylab = "CV Negative Log-Likelihood",
     main = "INCV f-fold: CV loss by K")
abline(v = result$k.loss, lty = 2, col = "red")

Selecting K with INCV (random split)

An alternative is to use repeated random node splits instead of fixed folds:

result2 <- nscv.random.split(net$adjacency, k.vec = 2:6,
                             split = 0.66, ite = 20)
result2$k.chosen
#> [1] 3

plot(2:6, result2$cv.loss, type = "b", pch = 19,
     xlab = "Number of communities (K)",
     ylab = "CV Negative Log-Likelihood",
     main = "INCV random-split: CV loss by K")
abline(v = result2$k.chosen, lty = 2, col = "red")
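
The difference between the two splitting schemes can be seen in a few lines of base R. The sketch assumes that split is the fraction of nodes used for training and ite the number of repetitions; both readings are assumptions, not documented behaviour.

```r
set.seed(5)
n <- 150

# f-fold: the folds are disjoint and every node is held out exactly once.
f <- 5
folds <- split(sample(n), rep(seq_len(f), length.out = n))

# Random split: each repetition draws a fresh training set, so a node
# may be held out many times or never.
train_frac <- 0.66   # assumed meaning of `split`
ite <- 20            # assumed meaning of `ite`
splits <- replicate(ite, sample(n, size = round(train_frac * n)),
                    simplify = FALSE)

# How often each node ends up in a held-out set across repetitions.
held_out <- lapply(splits, function(tr) setdiff(seq_len(n), tr))
held_out_counts <- tabulate(unlist(held_out), nbins = n)
range(held_out_counts)
```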

Comparing with ECV and NCV

Edge Cross-Validation

ECV holds out a random set of edges (node pairs) and evaluates how well a blockmodel reconstruction predicts them. It selects the model type, SBM or degree-corrected block model (DCBM), jointly with K.

ecv <- ECV.for.blockmodel(net$adjacency, max.K = 6, B = 3)
ecv$dev.model   # best by deviance
#> [1] "SBM-3"
ecv$l2.model    # best by L2
#> [1] "SBM-3"
ecv$auc.model   # best by AUC
#> [1] "SBM-6"
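
The edge-holdout idea can be illustrated with a simplified base-R sketch. It scores a plain rank-K SVD reconstruction instead of a fitted blockmodel, so it is a stand-in for the technique, not the procedure used by ECV.for.blockmodel(); the helper ecv_l2 and the 10% holdout rate are assumptions.

```r
set.seed(3)
z <- rep(1:3, each = 40)
n <- length(z)
P <- ifelse(outer(z, z, "=="), 0.5, 0.05)
A <- matrix(0, n, n)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), 1, P[up])
A <- A + t(A)

# Hypothetical helper: L2 error of a rank-K reconstruction on held-out pairs.
ecv_l2 <- function(A, K, test, holdout) {
  A_tr <- A
  A_tr[test] <- 0                      # hide the held-out pairs (upper triangle)
  low <- lower.tri(A_tr)
  A_tr[low] <- t(A_tr)[low]            # mirror so the masked matrix stays symmetric
  # Rescale to offset the masked entries, then truncate to rank K.
  s <- svd(A_tr / (1 - holdout), nu = K, nv = K)
  A_hat <- s$u %*% diag(s$d[seq_len(K)], K) %*% t(s$v)
  mean((A[test] - A_hat[test])^2)
}

holdout <- 0.1
pairs <- which(upper.tri(A))           # linear indices of unordered pairs
k_vec <- 1:6
# Average the held-out error over three random holdout sets.
l2 <- rowMeans(replicate(3, {
  test <- sample(pairs, round(holdout * length(pairs)))
  vapply(k_vec, function(K) ecv_l2(A, K, test, holdout), numeric(1))
}))
k_vec[which.min(l2)]
```

Reusing the same held-out set for every K within a repetition keeps the comparison across ranks fair.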

Node Cross-Validation

NCV holds out random nodes and evaluates predictions on the held-out sub-network:

ncv <- NCV.for.blockmodel(net$adjacency, max.K = 6, cv = 3)
ncv$dev.model
#> [1] "SBM-3"
ncv$l2.model
#> [1] "SBM-3"

Summary of methods

Method	Function	Splits	Selects K	Selects model type
INCV f-fold	`nscv.f.fold()`	Nodes into f folds	Yes	No (SBM only)
INCV random	`nscv.random.split()`	Random node split	Yes	No (SBM only)
ECV	`ECV.for.blockmodel()`	Random edge holdout	Yes	Yes (SBM vs DCBM)
NCV	`NCV.for.blockmodel()`	Node folds	Yes	Yes (SBM vs DCBM)
CROISSANT	`croissant.blockmodel()`	Overlapping subsamples	Yes	Yes (SBM vs DCBM)

Spectral clustering and probability estimation

The building blocks are also available directly:

cl <- SBM.spectral.clustering(net$adjacency, k = 3)
table(cl$cluster)
#> 
#>  1  2  3 
#> 50 50 50

prob <- SBM.prob(cl$cluster, k = 3, A = net$adjacency, restricted = TRUE)
round(prob$p.matrix, 3)
#>       [,1]  [,2]  [,3]
#> [1,] 0.502 0.050 0.050
#> [2,] 0.050 0.502 0.050
#> [3,] 0.050 0.050 0.502
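
For intuition, both building blocks can be approximated in base R. The sketch below is not the package's code: block_probs is a hypothetical helper, and reading restricted = TRUE as "one shared within-community probability and one shared between-community probability" is an assumption suggested by the output above.

```r
set.seed(4)
z <- rep(1:3, each = 50)
P <- ifelse(outer(z, z, "=="), 0.5, 0.05)
A <- matrix(0, 150, 150)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), 1, P[up])
A <- A + t(A)

# Spectral clustering: k-means on the three leading eigenvectors.
ev <- eigen(A, symmetric = TRUE)
top <- order(abs(ev$values), decreasing = TRUE)[1:3]
cl <- kmeans(ev$vectors[, top], centers = 3, nstart = 20)$cluster

# Hypothetical helper: edge-frequency estimates of the block matrix.
block_probs <- function(A, cl, K, restricted = FALSE) {
  up <- upper.tri(A)
  B <- matrix(0, K, K)
  if (restricted) {
    # Pool all within-community pairs and all between-community pairs.
    same <- outer(cl, cl, "==")
    B[] <- mean(A[up & !same])
    diag(B) <- mean(A[up & same])
    return(B)
  }
  for (a in seq_len(K)) for (b in a:K) {
    # Unordered pairs with one endpoint in block a and the other in block b.
    sel <- up & (outer(cl == a, cl == b, "&") | outer(cl == b, cl == a, "&"))
    B[a, b] <- B[b, a] <- mean(A[sel])
  }
  B
}

B_free <- block_probs(A, cl, 3)
B_res  <- block_probs(A, cl, 3, restricted = TRUE)
round(B_res, 3)
```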

Distance-decaying SBM simulation

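For more realistic simulations, community.sim.sbm() generates networks whose block probabilities decay with the distance between communities. In the output below, the within-community probability equals rho, and each step away from the diagonal multiplies the probability by eta: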

net2 <- community.sim.sbm(n = 120, n1 = 40, eta = 0.3, rho = 0.2, K = 4)
round(net2$conn, 4)
#>        [,1]  [,2]  [,3]   [,4]
#> [1,] 0.2000 0.060 0.018 0.0054
#> [2,] 0.0600 0.200 0.060 0.0180
#> [3,] 0.0180 0.060 0.200 0.0600
#> [4,] 0.0054 0.018 0.060 0.2000
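
The printed matrix is consistent with the rule conn[i, j] = rho * eta^|i - j|. This formula is inferred from the output and is not taken from the package source. A one-line base-R check reproduces the same values:

```r
K <- 4
rho <- 0.2
eta <- 0.3
# Geometric decay in the distance between community indices.
conn <- rho * eta^abs(outer(seq_len(K), seq_len(K), "-"))
round(conn, 4)
```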

Session info

sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS Sonoma 14.6.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US/en_US/C/en_US/en_US
#> 
#> time zone: America/Los_Angeles
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] INCVCommunityDetection_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Matrix_1.7-4          mvnfast_0.2.8         gtable_0.3.6         
#>  [4] jsonlite_2.0.0        compiler_4.5.2        Rcpp_1.1.1           
#>  [7] slam_0.1-55           parallel_4.5.2        cluster_2.1.8.2      
#> [10] jquerylib_0.1.4       scales_1.4.0          yaml_2.3.12          
#> [13] fastmap_1.2.0         lattice_0.22-7        ggplot2_4.0.2        
#> [16] R6_2.6.1              knitr_1.51            zigg_0.0.2           
#> [19] bslib_0.10.0          RColorBrewer_1.1-3    rlang_1.1.7          
#> [22] cachem_1.1.0          ClusterR_1.3.6        xfun_0.56            
#> [25] sass_0.4.10           S7_0.2.1              RcppParallel_5.1.11-2
#> [28] otel_0.2.0            viridisLite_0.4.3     cli_3.6.5            
#> [31] digest_0.6.39         grid_4.5.2            irlba_2.3.7          
#> [34] gmp_0.7-5.1           mclust_6.1.2          lifecycle_1.0.5      
#> [37] vctrs_0.7.1           Rfast_2.1.5.2         data.table_1.18.2.1  
#> [40] IMIFA_2.2.0           RSpectra_0.16-2       evaluate_1.0.5       
#> [43] glue_1.8.0            farver_2.1.2          rmarkdown_2.30       
#> [46] matrixStats_1.5.0     tools_4.5.2           htmltools_0.5.9
