Introduction to iClusterVB

iClusterVB

iClusterVB performs fast integrative clustering and feature selection for high-dimensional data.

Using a variational Bayes approach, iClusterVB offers three key features: clustering of mixed-type data, automatic determination of the number of clusters, and feature selection in high-dimensional settings. Together these address limitations of traditional clustering methods. The variational approach is also an alternative to MCMC algorithms that is potentially faster, which makes iClusterVB a valuable tool for contemporary data analysis.
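The elbo values printed while the model runs are evidence lower bounds. The identity below is standard variational Bayes background and is not taken from the iClusterVB documentation. For observed data x, latent variables z and a variational distribution q:

```latex
\log p(\mathbf{x})
  = \underbrace{\mathbb{E}_{q}\left[\log p(\mathbf{x}, \mathbf{z})\right]
      - \mathbb{E}_{q}\left[\log q(\mathbf{z})\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\big(q(\mathbf{z}) \,\|\, p(\mathbf{z} \mid \mathbf{x})\big)
```

The KL divergence is non-negative, so the ELBO is a lower bound on log p(x). Coordinate ascent variational inference (CAVI) maximizes this bound by updating one factor of a mean-field q at a time, and no individual update can decrease the ELBO.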

The package includes a simulated dataset, sim_data, stored as a list. We use it to illustrate iClusterVB.

Data pre-processing

library(iClusterVB)

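# sim_data comes with the iClusterVB package.
# Keep 80 individuals (rows 1-20, 61-80, 121-140 and 181-200) and the
# first 75 variables of each data view.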
dat1 <- list(
  gauss_1 = sim_data$continuous1_data[c(1:20, 61:80, 121:140, 181:200), 1:75],
  gauss_2 = sim_data$continuous2_data[c(1:20, 61:80, 121:140, 181:200), 1:75],
  poisson_1 = sim_data$count_data[c(1:20, 61:80, 121:140, 181:200), 1:75])
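The row index c(1:20, 61:80, 121:140, 181:200) takes the first 20 rows from each of four blocks spaced 60 rows apart. The base-R sketch below builds that index programmatically. The 60-row spacing is read off the indices themselves, not from the sim_data documentation, and block_size and n_per_block are hypothetical names introduced here.

```r
# Build the row index used above: the first n_per_block rows of each of
# four consecutive blocks. The block size of 60 is an assumption inferred
# from the indices 1:20, 61:80, 121:140, 181:200.
block_size  <- 60L
n_per_block <- 20L
idx <- unlist(lapply(0:3, function(k) k * block_size + seq_len(n_per_block)))

# Confirm it matches the hand-written index
stopifnot(identical(idx, c(1:20, 61:80, 121:140, 181:200)))
length(idx)  # number of individuals kept
```

In the session above, idx could replace the hand-written index, for example sim_data$continuous1_data[idx, 1:75].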


# One distribution per data view, in the same order as the views in dat1
dist <- c(
  "gaussian", "gaussian",
  "poisson"
)
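Each entry of dist describes the view in the same position of mydata, so a quick pre-flight check can catch mismatches before fitting. check_views() is a hypothetical helper and not part of iClusterVB. The rules it enforces (equal row counts across views, non-negative integer values in Poisson views) are assumptions about what the model needs. It runs on simulated stand-in matrices, so it does not require sim_data.

```r
# Hypothetical helper (not part of iClusterVB): check that a list of data
# views and a vector of distributions line up before fitting.
check_views <- function(mydata, dist) {
  stopifnot(is.list(mydata), length(mydata) == length(dist))
  n_rows <- vapply(mydata, nrow, integer(1))
  # Every view should describe the same individuals (same number of rows)
  stopifnot(all(n_rows == n_rows[1]))
  # Assumed rule: views declared "poisson" hold non-negative integer counts
  for (v in which(dist == "poisson")) {
    x <- mydata[[v]]
    stopifnot(all(x >= 0), all(x == round(x)))
  }
  data.frame(view = names(mydata), dist = dist, rows = n_rows,
             cols = vapply(mydata, ncol, integer(1)), row.names = NULL)
}

# Simulated stand-ins with the same shapes as dat1 (80 x 75 per view)
set.seed(1)
mock <- list(
  gauss_1   = matrix(rnorm(80 * 75), nrow = 80),
  gauss_2   = matrix(rnorm(80 * 75), nrow = 80),
  poisson_1 = matrix(rpois(80 * 75, lambda = 3), nrow = 80)
)
check_views(mock, c("gaussian", "gaussian", "poisson"))
```

With the package loaded, check_views(dat1, dist) should likewise report three views of 80 rows and 75 columns each.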

Running the model

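fit_iClusterVB <- iClusterVB(
  mydata = dat1,                # list of data views (rows are individuals)
  dist = dist,                  # distribution of each view
  K = 4,                        # maximum number of clusters
  initial_method = "VarSelLCM", # method used to initialize the model
  VS_method = 1,                # variable selection setting; see ?iClusterVB
  max_iter = 50                 # maximum number of CAVI iterations
)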
#> ------------------------------------------------------------
#> Pre-processing and initializing the model
#> ------------------------------------------------------------
#> 
#> ------------------------------------------------------------
#> Running the CAVI algorithm
#> ------------------------------------------------------------
#> iteration = 10 elbo = -1741136.592330  
#> iteration = 20 elbo = -1685743.635713  
#> iteration = 30 elbo = -1633079.122060  
#> iteration = 40 elbo = -1602203.068229  
#> iteration = 50 elbo = -1587608.710391
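CAVI never decreases the ELBO, so the printed trace works as a quick sanity check. The sketch below copies the five values printed above and computes the relative improvement over each 10-iteration window. The names iter, elbo and rel_change are introduced here for illustration.

```r
# ELBO values copied from the trace printed by the model run
iter <- c(10, 20, 30, 40, 50)
elbo <- c(-1741136.592330, -1685743.635713, -1633079.122060,
          -1602203.068229, -1587608.710391)

# CAVI should never decrease the ELBO
stopifnot(all(diff(elbo) > 0))

# Relative improvement over each 10-iteration window
rel_change <- diff(elbo) / abs(head(elbo, -1))
data.frame(from = head(iter, -1), to = iter[-1],
           rel_change = round(rel_change, 4))
```

The relative gain shrinks from about 3.2% to about 0.9% per window but is still clearly positive at iteration 50. This suggests the run stopped at the max_iter limit rather than at convergence, so a larger max_iter may be worth trying for a real analysis.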

Summary of the Model

# We can obtain a summary using summary()
summary(fit_iClusterVB)
#> Total number of individuals:
#> [1] 80
#> 
#> User-inputted maximum number of clusters: 4
#> Number of clusters determined by algorithm: 4
#> 
#> Cluster Membership:
#>  1  2  3  4 
#> 20 20 20 20 
#> 
#> # of variables above the posterior inclusion probability of 0.5 for View 1 - gaussian
#> [1] "50 out of a total of 75"
#> 
#> # of variables above the posterior inclusion probability of 0.5 for View 2 - gaussian
#> [1] "51 out of a total of 75"
#> 
#> # of variables above the posterior inclusion probability of 0.5 for View 3 - poisson
#> [1] "52 out of a total of 75"
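For each view, the summary reports how many variables have a posterior inclusion probability (PIP) above 0.5. The sketch below shows that counting rule. count_selected() and the PIP values are made up for illustration, and the strict ">" comparison for a PIP of exactly 0.5 is an assumption.

```r
# Hypothetical helper mirroring the summary line format: count variables
# whose posterior inclusion probability (PIP) is strictly above a threshold.
count_selected <- function(pip, threshold = 0.5) {
  sprintf("%d out of a total of %d", sum(pip > threshold), length(pip))
}

# Made-up PIPs for five variables, for illustration only
pip <- c(0.98, 0.12, 0.50, 0.71, 0.03)
count_selected(pip)        # "2 out of a total of 5"
count_selected(pip, 0.9)   # "1 out of a total of 5"
```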

Generic Plots

plot(fit_iClusterVB)

Probability of Inclusion Plots

# The `piplot` function can be used to visualize the posterior inclusion probabilities

piplot(fit_iClusterVB)

Heat maps to visualize the clusters

# The `chmap` function can be used to display heat maps for each data view

chmap(fit_iClusterVB, rho = 0,
      cols = c("green", "blue",
               "purple", "red"),
      scale = "none")
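A cluster heat map shows a view's data matrix with individuals (rows) grouped by cluster membership. The sketch below reproduces that idea with base R's stats::heatmap() on simulated data. It is not how chmap() is implemented, and the cluster labels cl are hypothetical. The four colours from the chmap() call are reused here only to mark clusters in the side bar.

```r
# Illustrative base-R sketch (not chmap's implementation): one simulated
# view with rows grouped by hypothetical cluster labels.
set.seed(42)
cl <- sample(rep(1:4, each = 20))              # hypothetical cluster labels
x  <- matrix(rnorm(80 * 10), nrow = 80) + cl   # row i shifted by cl[i]
cols <- c("green", "blue", "purple", "red")    # one colour per cluster

ord <- order(cl)                               # group rows by cluster
stats::heatmap(x[ord, ], Rowv = NA, Colv = NA, scale = "none",
               RowSideColors = cols[cl[ord]])
```

Setting Rowv = NA and Colv = NA turns off dendrogram reordering, so the rows stay in cluster order. scale = "none" plots the raw values.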
