This vignette provides an introduction to the R package SC.MEB, where the function SC.MEB implements the model SC-MEB, spatial clustering with hidden Markov random field using empirical Bayes.
The package can be installed with the command:
The package can be loaded with the command:
We first set the basic parameters:
library(mvtnorm)
library(GiRaF)
library(SingleCellExperiment)
set.seed(100)
G <- 4
Bet <- 1
KK <- 5
p <- 15
mu <- matrix(c( c(-6, rep(-1.5, 14)),
rep(0, 15),
c(6, rep(1.5, 14)),
c(rep(-1.5, 7), rep(1.5, 7), 6),
c(rep(1.5, 7), rep(-1.5, 7), -6)), ncol = KK)
height <- 50
width <- 50
n <- height * width # number of spots (cells) per individual
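As a quick check of how mu is laid out: matrix() fills column by column, so with ncol = KK each column should hold the 15-dimensional mean of one cluster. The following is a minimal base-R sketch that re-creates mu so it can be run on its own.

```r
# Re-create the mean matrix from the parameter block above.
KK <- 5   # number of clusters
p  <- 15  # number of features (principal components)
mu <- matrix(c(c(-6, rep(-1.5, 14)),
               rep(0, 15),
               c(6, rep(1.5, 14)),
               c(rep(-1.5, 7), rep(1.5, 7), 6),
               c(rep(1.5, 7), rep(-1.5, 7), -6)), ncol = KK)

# matrix() fills column-wise: column k is the mean vector of cluster k.
dim(mu)
mu[1, ]   # first coordinate of each cluster mean
mu[15, ]  # last coordinate of each cluster mean
```

Reading mu by column in this way is what the simulation relies on when it later selects mu[, x[i]] for spot i.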
Then, we generate the true cluster labels, the 15-dimensional principal components (PCs) and the position of each spot.
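X <- sampler.mrf(iter = n, sampler = "Gibbs", h = height, w = width, ncolors = KK,
                 nei = G, param = Bet, initialise = FALSE, view = TRUE)
x <- c(X) + 1  # shift the labels so that they index the columns of mu
y <- matrix(0, nrow = n, ncol = p)
for (i in seq_len(n)) {  # loop over spots
  mu_i <- mu[, x[i]]
  # isotropic covariance whose scale depends on the cluster label
  Sigma_i <- ((x[i]==1)*2 + (x[i]==2)*2.5 + (x[i]==3)*3 +
              (x[i]==4)*3.5 + (x[i]==5)*4) * diag(1, p) * 4
  y[i, ] <- rmvnorm(1, mu_i, Sigma_i)
}
# (row, column) position of each spot; identical to the original for a square grid
pos <- cbind(rep(1:height, width), rep(1:width, each = height))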
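Because every Sigma_i above is a scalar multiple of the identity matrix, the same kind of draw can be written with rnorm() alone and without a loop. The sketch below illustrates this under two assumptions that are not part of the vignette: the labels x are drawn independently with sample() instead of from the Markov random field, and n is reduced to 200 for speed. It does not reproduce the random numbers of the loop above.

```r
set.seed(1)
KK <- 5; p <- 15
n  <- 200  # assumption: smaller than the vignette's 2500, for speed
mu <- matrix(c(c(-6, rep(-1.5, 14)), rep(0, 15), c(6, rep(1.5, 14)),
               c(rep(-1.5, 7), rep(1.5, 7), 6),
               c(rep(1.5, 7), rep(-1.5, 7), -6)), ncol = KK)

# Assumption: i.i.d. labels stand in for the MRF sample from sampler.mrf().
x <- sample(seq_len(KK), n, replace = TRUE)

# Per-cluster variance, (2, 2.5, 3, 3.5, 4) * 4, as in the loop above.
s2 <- c(2, 2.5, 3, 3.5, 4) * 4

# y[i, ] = mu[, x[i]] + sqrt(s2[x[i]]) * standard normal noise.
# sqrt(s2[x]) has length n, so it is recycled down the rows of the n x p matrix.
noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- t(mu[, x]) + sqrt(s2[x]) * noise

dim(y)
```

The vectorised form only works because the covariance is isotropic; a general Sigma_i would still need rmvnorm() or a Cholesky factor.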
Next, we construct the SingleCellExperiment object from the simulated PCs and spot positions.
# -------------------------------------------------
# build the metadata in the format used by BayesSpace
counts <- t(y)
rownames(counts) <- paste0("gene_", seq_len(p))
colnames(counts) <- paste0("spot_", seq_len(n))
## Make array coordinates - filled rectangle
cdata <- list()
n_row <- height; n_col <- width  # avoid masking base::nrow() and base::ncol()
cdata$row <- rep(seq_len(n_row), each = n_col)
cdata$col <- rep(seq_len(n_col), n_row)
cdata <- as.data.frame(do.call(cbind, cdata))
## Scale and jitter image coordinates
#scale.factor <- rnorm(1, 8); n_spots <- n
#cdata$imagerow <- scale.factor * cdata$row + rnorm(n_spots)
#cdata$imagecol <- scale.factor * cdata$col + rnorm(n_spots)
cdata$imagerow <- cdata$row
cdata$imagecol <- cdata$col
## Make SCE
## note: scater::runPCA throws a warning on our small simulated data, so the
## simulated features y are stored directly as the "PCA" reduced dimension
sce <- SingleCellExperiment(assays=list(counts=counts), colData=cdata)
reducedDim(sce, "PCA") <- y
# sce$spatial.cluster <- floor(runif(ncol(sce), 1, 3))
metadata(sce)$BayesSpace.data <- list()
metadata(sce)$BayesSpace.data$platform <- "ST"
metadata(sce)$BayesSpace.data$is.enhanced <- FALSE
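The array coordinates are meant to form a filled rectangle, with one row of cdata per spot. The following base-R sketch checks this on a small grid and counts the neighbours that share an edge with each spot, which is how we read the square lattice of the 'ST' platform. The grid size and the names n_row, n_col and n_nei are hypothetical, and the neighbour definition (Manhattan distance 1) is an assumption rather than the package's documented rule.

```r
n_row <- 4; n_col <- 3  # small hypothetical grid
cdata <- data.frame(row = rep(seq_len(n_row), each = n_col),
                    col = rep(seq_len(n_col), n_row))

# One row per spot, and no (row, col) pair repeated.
stopifnot(nrow(cdata) == n_row * n_col,
          !any(duplicated(cdata)))

# Assumption: spots that share an edge are at Manhattan distance 1.
d <- as.matrix(dist(cdata, method = "manhattan"))
n_nei <- rowSums(d == 1)

# Corner spots have 2 neighbours, edge spots 3, interior spots 4.
table(n_nei)
```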
Here, we set the basic parameters for our function SC.MEB:
We briefly explain these parameters.
‘singlece’ is a SingleCellExperiment object containing the PCA and position information.
‘d’ is an integer specifying the number of principal components to use. The default is 15.
‘K’ is an integer vector specifying the numbers of mixture components (clusters) for which the BIC is to be calculated. The default is K = 2:9.
‘platform’ is the name of the spatial transcriptomics platform. Specify ‘Visium’ for hex lattice geometry or ‘ST’ for square lattice geometry. Specifying this parameter is optional, as this information is included in the object's metadata.
‘bet’ is a numeric vector specifying the candidate values of the smoothness parameter of the Markov random field. The default is seq(0,5,0.2).
‘maxIter_ICM’ is the maximum number of iterations of the ICM algorithm. The default is 10.
‘maxIter’ is the maximum number of iterations of the EM algorithm. The default is 50.
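Since the BIC is calculated for each number of clusters in ‘K’ and each smoothness value in ‘bet’, the size of the search grid is the product of their lengths. The sketch below enumerates that grid for the defaults stated above. The object name grid and its column name beta are hypothetical, and the assumption that every pair is visited is inferred from the K by beta layout of the BIC matrix in the output.

```r
# Defaults as stated in the parameter descriptions.
K   <- 2:9
bet <- seq(0, 5, 0.2)

# Every (K, beta) pair for which a BIC value would be computed.
grid <- expand.grid(K = K, beta = bet)
nrow(grid)   # equals length(K) * length(bet)
head(grid, 3)
```

Restricting ‘K’ or coarsening ‘bet’ shrinks this grid proportionally, which is the simplest way to shorten a run.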
Finally, we run our model SC-MEB with the function SC.MEB.
out = SC.MEB(sce = singlece, d = d, K=K, bet=bet, platform = platform,
maxIter_ICM = maxIter_ICM, maxIter = maxIter)
#> Neighbors were identified for 2500 out of 2500 spots.
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> diff Energy = 2.964219
#> diff Energy = 7.248896
#> diff Energy = 1.459156
#> ICM Converged at Iteration = 5
#> diff Energy = 2.387050
#> ICM Converged at Iteration = 3
#> diff Energy = 1.500000
#> diff Energy = 5.500000
#> diff Energy = 28.389259
#> diff Energy = 24.379114
#> diff Energy = 26.922362
#> diff Energy = 12.987930
#> diff Energy = 16.038074
#> diff Energy = 18.050068
#> diff Energy = 20.026207
#> diff Energy = 19.036880
#> diff Energy = 63.693378
#> diff Energy = 67.233236
#> diff Energy = 29.418232
#> diff Energy = 44.952632
#> diff Energy = 21.305055
#> diff Energy = 53.354939
#> diff Energy = 49.944247
#> diff Energy = 62.416177
#> diff Energy = 69.183096
#> diff Energy = 77.268544
#> diff Energy = 77.911172
#> diff Energy = 73.760339
#> diff Energy = 98.626957
#> diff Energy = 118.154632
#> diff Energy = 81.667565
#> diff Energy = 41.437990
#> diff Energy = 49.620391
#> diff Energy = 49.388027
#> diff Energy = 37.440839
#> diff Energy = 28.246270
#> diff Energy = 27.679502
#> diff Energy = 18.885296
#> diff Energy = 27.592693
#> diff Energy = 41.402540
#> diff Energy = 44.703886
#> diff Energy = 32.473432
#> diff Energy = 83.150778
#> diff Energy = 35.793333
#> diff Energy = 39.447522
#> diff Energy = 27.375558
#> diff Energy = 29.128322
#> diff Energy = 38.513857
#> diff Energy = 2.135325
#> diff Energy = 4.354338
#> diff Energy = 23.993038
#> diff Energy = 4.470369
#> diff Energy = 27.171539
#> diff Energy = 32.134548
#> diff Energy = 20.125869
#> diff Energy = 27.719353
#> diff Energy = 30.235583
#> diff Energy = 30.182309
#> diff Energy = 30.135182
#> diff Energy = 30.098038
#> diff Energy = 30.069767
#> diff Energy = 30.048637
#> diff Energy = 30.033026
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> diff Energy = 3.184378
#> diff Energy = 20.034641
#> diff Energy = 4.904329
#> diff Energy = 5.154268
#> diff Energy = 3.878258
#> diff Energy = 0.624879
#> diff Energy = 1.208386
#> diff Energy = 1.764820
#> diff Energy = 5.844396
#> diff Energy = 2.804734
#> diff Energy = 31.845635
#> diff Energy = 13.173246
#> diff Energy = 4.959802
#> diff Energy = 8.722220
#> diff Energy = 14.693575
#> diff Energy = 9.888527
#> diff Energy = 55.940863
#> diff Energy = 39.883318
#> diff Energy = 37.679781
#> diff Energy = 13.747678
#> diff Energy = 14.179673
#> diff Energy = 31.627496
#> diff Energy = 30.412374
#> diff Energy = 30.249404
#> diff Energy = 30.155450
#> diff Energy = 28.554253
#> diff Energy = 34.583086
#> diff Energy = 37.588020
#> diff Energy = 37.588387
#> diff Energy = 0.558545
#> diff Energy = 82.532668
#> diff Energy = 60.096081
#> diff Energy = 8.263505
#> diff Energy = 24.594629
#> diff Energy = 15.931189
#> diff Energy = 22.945647
#> diff Energy = 41.900328
#> diff Energy = 55.608997
#> diff Energy = 50.200922
#> diff Energy = 60.808402
#> diff Energy = 59.342456
#> diff Energy = 51.259283
#> diff Energy = 14.828672
#> diff Energy = 106.018957
#> diff Energy = 28.427274
#> diff Energy = 28.344293
#> diff Energy = 10.872051
#> diff Energy = 10.009909
#> diff Energy = 16.045412
#> diff Energy = 12.535341
#> diff Energy = 16.495630
#> diff Energy = 20.293814
#> diff Energy = 22.985089
#> diff Energy = 16.311522
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> ICM Converged at Iteration = 2
#> diff Energy = 11.441283
#> diff Energy = 11.668752
#> diff Energy = 9.184994
#> diff Energy = 13.514333
#> diff Energy = 9.529188
#> diff Energy = 0.158076
#> ICM Converged at Iteration = 3
#> diff Energy = 0.633251
#> diff Energy = 4.560284
#> diff Energy = 8.178759
#> diff Energy = 4.654479
#> diff Energy = 10.096229
#> diff Energy = 11.972263
#> diff Energy = 20.333914
#> diff Energy = 30.238293
#> diff Energy = 1.345019
#> diff Energy = 2.801115
#> diff Energy = 1.295549
#> diff Energy = 2.520804
#> diff Energy = 5.199557
#> diff Energy = 5.760384
#> diff Energy = 5.393622
#> diff Energy = 64.348873
#> diff Energy = 54.697363
#> diff Energy = 26.919711
#> diff Energy = 16.501381
#> diff Energy = 20.227417
#> diff Energy = 19.426186
#> diff Energy = 32.612922
#> diff Energy = 22.563291
#> diff Energy = 28.669585
#> diff Energy = 104.324066
#> diff Energy = 27.066394
#> diff Energy = 26.678217
#> diff Energy = 32.357453
#> diff Energy = 17.194809
#> diff Energy = 16.907446
#> diff Energy = 9.763298
#> diff Energy = 83.250552
#> diff Energy = 0.548617
#> diff Energy = 10.060693
#> diff Energy = 83.705504
#> diff Energy = 2.483507
#> diff Energy = 26.006222
#> diff Energy = 14.077007
#> diff Energy = 3.390681
#> diff Energy = 67.503274
#> diff Energy = 22.559006
#> diff Energy = 51.791649
#> diff Energy = 32.361058
#> diff Energy = 59.183905
#> diff Energy = 4.239996
#> diff Energy = 86.095106
#> diff Energy = 3.341574
#> diff Energy = 90.413300
#> diff Energy = 10.203158
#> diff Energy = 80.536149
#> diff Energy = 8.885630
#> diff Energy = 91.651611
#> diff Energy = 3.897779
str(out)
#> List of 16
#> $ best_K : int 4
#> $ best_beta : num 2
#> $ best_cluster_label: num [1:2500] 4 3 3 3 4 4 3 1 2 2 ...
#> $ best_BIC : num -202250
#> $ best_ell : num 99013
#> $ best_mu : num [1:15, 1:4] -1.25 -1.68 -1.23 -1.2 -1.28 ...
#> $ best_sigma : num [1:15, 1:15, 1:4] 13.10203 0.54826 1.60726 -0.00717 0.29732 ...
#> $ best_gam : num [1:2500, 1:4] 0.00163 0.01986 0.01624 0.00717 0.00291 ...
#> $ cluster_label : num [1:2500, 1:6, 1:3] 4 4 3 3 4 2 3 2 2 2 ...
#> $ BIC : num [1:3, 1:6] -203819 -204602 -205392 -202664 -203388 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:3] "K = 4" "K = 5" "K = 6"
#> .. ..$ : chr [1:6] "beta = 0" "beta = 1" "beta = 2" "beta = 3" ...
#> $ ell : num [1:3, 1:6] 99797 99660 99527 99219 99053 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:3] "K = 4" "K = 5" "K = 6"
#> .. ..$ : chr [1:6] "beta = 0" "beta = 1" "beta = 2" "beta = 3" ...
#> $ mu :List of 3
#> ..$ : num [1:15, 1:4, 1:6] -1.1 -1.59 -1.33 -1.15 -1.22 ...
#> ..$ : num [1:15, 1:5, 1:6] -1.46 -2.03 -1.66 -1.42 -1.26 ...
#> ..$ : num [1:15, 1:6, 1:6] -1.42 -2.22 -1.81 -1.66 -1.3 ...
#> $ sigma :List of 3
#> ..$ : num [1:15, 1:15, 1:4, 1:6] 13.709 0.935 1.475 0.426 0.572 ...
#> ..$ : num [1:15, 1:15, 1:5, 1:6] 13.854 0.348 0.889 -0.146 0.9 ...
#> ..$ : num [1:15, 1:15, 1:6, 1:6] 13.1556 0.0639 1.0285 -0.2397 1.108 ...
#> $ gam :List of 3
#> ..$ : num [1:2500, 1:4, 1:6] 0.0042 0.0773 0.2281 0.0963 0.012 ...
#> ..$ : num [1:2500, 1:5, 1:6] 0.000422 0.007368 0.072444 0.043942 0.001958 ...
#> ..$ : num [1:2500, 1:6, 1:6] 8.19e-05 7.88e-03 5.97e-02 2.38e-02 5.53e-04 ...
#> $ pxgn :List of 3
#> ..$ : num [1:2500, 1:4, 1:6] 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
#> ..$ : num [1:2500, 1:5, 1:6] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 ...
#> ..$ : num [1:2500, 1:6, 1:6] 0.167 0.167 0.167 0.167 0.167 ...
#> $ pygx :List of 3
#> ..$ : num [1:2500, 1:4, 1:6] 40.4 39.2 38.1 48.4 42 ...
#> ..$ : num [1:2500, 1:5, 1:6] 42.7 39.9 39.4 48.6 43.7 ...
#> ..$ : num [1:2500, 1:6, 1:6] 44 40.3 39.3 49.2 45 ...
Here, we briefly explain the output of SC.MEB.
The item ‘best_K’ is the optimal K chosen according to the BIC rule.
The item ‘best_beta’ is the optimal beta chosen according to the BIC rule.
The item ‘best_cluster_label’ is the clustering result corresponding to the optimal K and optimal beta.
The item ‘best_BIC’ is the BIC value corresponding to the optimal K and optimal beta.
The item ‘best_ell’ is the negative log-likelihood corresponding to the optimal K and optimal beta.
The item ‘best_mu’ is the mean of each component corresponding to the optimal K and optimal beta.
The item ‘best_sigma’ is the covariance matrix of each component corresponding to the optimal K and optimal beta.
The item ‘best_gam’ is the posterior probability matrix corresponding to the optimal K and optimal beta.
The item ‘cluster_label’ is a three-dimensional \(n \times b \times q\) array storing the clustering results for every K and beta, where n is the number of cells, b is the length of the vector ‘bet’ and q is the length of the vector ‘K’.
The item ‘BIC’ contains the BIC values for every K and beta.
The item ‘ell’ is the negative log-likelihood for every K and beta.
The item ‘mu’ is the mean of each component for every K and beta.
The item ‘sigma’ is the covariance matrix of each component for every K and beta.
The item ‘gam’ is the posterior probability for every K and beta.
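To illustrate how the ‘best_*’ items relate to the full grids, the sketch below selects a (K, beta) pair from a BIC matrix and pulls the matching labels out of a cluster_label array. All values are hypothetical toy data, not results from SC.MEB. In the output printed above, best_BIC is larger than each visible entry of BIC, which is consistent with selecting the largest value. We treat that as an inference from the printed output rather than as documented behaviour.

```r
set.seed(1)

# Hypothetical toy values; the dimnames follow the pattern seen in str(out).
K   <- 4:6
bet <- 0:2
BIC <- matrix(c(-210, -215, -220,
                -205, -212, -218,
                -207, -214, -219),
              nrow = length(K),
              dimnames = list(paste("K =", K), paste("beta =", bet)))

# Assumption: the pair with the largest BIC entry is selected.
idx <- which(BIC == max(BIC), arr.ind = TRUE)[1, ]
best_K    <- K[idx["row"]]
best_beta <- bet[idx["col"]]

# cluster_label is n x b x q (spots x beta values x K values),
# so the labels for the selected pair sit at [, beta index, K index].
n <- 10
cluster_label <- array(sample(1:4, n * length(bet) * length(K), replace = TRUE),
                       dim = c(n, length(bet), length(K)))
best_label <- cluster_label[, idx["col"], idx["row"]]

c(best_K = best_K, best_beta = best_beta)
```

The same indexing pattern applies to the list items ‘mu’, ‘sigma’ and ‘gam’, where, judging from the printed structure, the list position corresponds to K and the last array dimension to beta.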