Community detection

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Community detection

Spectral clustering

Let us sample a network from a DCSBM:

n = 1500
Ktru = 4
lambda = 15 # expected average degree
oir = 0.1
pri = 1:Ktru

set.seed(1234)
theta <- EnvStats::rpareto(n, 2/3, 3)
B = pp_conn(n, oir, lambda, pri=pri, theta)$B
z = sample(Ktru, n, replace=T, prob=pri) # randomly smaple "true community labels" 
A = sample_dcsbm(z, B, theta) # sample the adjacency matrix

We can apply the (Laplacian-based) regularized spectral clustering for community detection:

zh = spec_clust(A, K=4)

We can evaluate the performance by computing the normalized mutual information (NMI) to a true label vector:

compute_mutual_info(z, zh)
#> [1] 0.8459515

NMI is in \([0,1]\) and the closer to 1 it is the closer the mathc between the two labels.

Performance as a function of the expected degree

Let us now consider the effect of the expected average degree \(\lambda\) on the performance of spectral clustering.

Simple planted partition model

We first generate from a simple planted partition model, with connectivity matrix, \[B_1 \propto (1-\beta)I_{K}+ \beta\mathbf{1}\mathbf{1}^{T}\] where \(\beta\) is the out-in-ratio.

set.seed(1234)
nrep = 20
nlam = 12
lamvec = 10^seq(log10(1), log10(50), length.out = nlam)  # the vector of logarithmically spaced lambda
runs = expand.grid(rep = 1:nrep, lambda = lamvec)

res = do.call(rbind, lapply(1:nrow(runs), function(j) {
 lambda = runs[j,"lambda"]
 B = pp_conn(n, oir, lambda, pri=pri, theta)$B
 A = sample_dcsbm(z, B, theta)
 zh = spec_clust(A, K = Ktru) # defaults to tau = 0.25 for the  regularization parameter
 zh_noreg = spec_clust(A, K = Ktru, tau = 0)
 data.frame(rep = runs[j,"rep"], lambda = lambda, 
            nmi = compute_mutual_info(z, zh), nmi_noreg = compute_mutual_info(z, zh_noreg))
}))

agg_nmi = aggregate(res, by = list(res$lambda), FUN = mean)

The resulting plot looks like this:

plot(agg_nmi$lambda, agg_nmi$nmi, log="x",
     type = "b", col = "blue", ylab = "NMI", xlab = "lambda", pch=19,
     main="Specral clustering performance")
lines(agg_nmi$lambda, agg_nmi$nmi_noreg, col="red", lty=2, pch=18, type = "b")
legend(1, 1, legend = c("0.25 regularization","No regularization"), 
       col = c("blue","red"), lty=1:2, box.lty=0)

This shows that increasing \(\lambda\) makes the community detection problem easier.

Randomly permuted connectivity matrix

Let us now generate the connectivity matrix randomly as follows \[B_2 \propto \gamma R + (1-\gamma) Q \] where

\(\gamma \in (0, 1)\),
\(R\) is a random symmetric permutation matrix (see function rsymperm()), and
\(Q\) a symmetric matrix with i.i.d. Unif\((0,1)\) entries on and above diagonal.

The function gen_rand_conn() generates such connectivity matrices.

set.seed(1234)
nrep = 20
nlam = 12
lamvec = 10^seq(log10(1), log10(200), length.out = nlam)  # the vector of logarithmically spaced lambda
runs = expand.grid(rep = 1:nrep, lambda = lamvec)

res = do.call(rbind, lapply(1:nrow(runs), function(j) {
 lambda = runs[j,"lambda"]
 B = gen_rand_conn(n, Ktru, lambda = lambda, gamma = 0.1, pri=pri)
 A = sample_dcsbm(z, B, theta)
 zh = spec_clust(A, K = Ktru) # defaults to tau = 0.25 for the  regularization parameter
 zh_noreg = spec_clust(A, K = Ktru, tau = 0)
 data.frame(rep = runs[j,"rep"], lambda = lambda, 
            nmi = compute_mutual_info(z, zh), nmi_noreg = compute_mutual_info(z, zh_noreg))
}))

agg_nmi = aggregate(res, by = list(res$lambda), FUN = mean)

The resulting plot looks like the following:

plot(agg_nmi$lambda, agg_nmi$nmi, log="x",
     type = "b", col = "blue", ylab = "NMI", xlab = "lambda", pch=19,
     main="Specral clustering performance")
lines(agg_nmi$lambda, agg_nmi$nmi_noreg, col="red", lty=2, pch=18, type = "b")
legend(1, max(agg_nmi$nmi), legend = c("0.25 regularization","No regularization"), 
       col = c("blue","red"), lty=1:2, box.lty=0)

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.