An R package implementing a Projection Pursuit algorithm that uses finite Gaussian mixture models for density estimation and Genetic Algorithms to maximise an approximated negentropy index (PPGMMGA). The ppgmmga algorithm provides a method for visualising high-dimensional data in a lower-dimensional space, with particular emphasis on revealing clustering structures.
library(ppgmmga)
library(mclust)
## Package 'mclust' version 5.4.3
## Type 'citation("mclust")' for citing this R package in publications.
data("banknote")
X <- banknote[,-1]
Class <- banknote$Status
table(Class)
## Class
## counterfeit     genuine
##         100         100
clPairs(X, classification = Class)
pp1D <- ppgmmga(data = X, d = 1, approx = "UT", seed = 1)
pp1D
## Call:
## ppgmmga(data = X, d = 1, approx = "UT", seed = 1)
##
## 'ppgmmga' object containing:
## [1] "data" "d" "approx" "GMM" "GA"
## [6] "Negentropy" "basis" "Z"
summary(pp1D)
## ── ppgmmga ─────────────────────────────
##
## Data dimensions = 200 x 6
## Data transformation = center & scale
## Projection subspace dimension = 1
## GMM density estimate = (VEE,4)
## Negentropy approximation = UT
## GA optimal negentropy = 0.6345935
## GA encoded basis solution:
## x1 x2 x3 x4 x5
## [1,] 3.268902 2.373044 1.051365 0.3131285 0.531718
##
## Estimated projection basis:
##                  PP1
## Length   -0.01196531
## Left     -0.09347750
## Right     0.16021052
## Bottom    0.57406981
## Top       0.34503463
## Diagonal -0.71892026
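The encoded solution and the estimated basis printed above appear to be related through spherical coordinates on the unit sphere. The following is a minimal base R sketch under that assumption. The helper `angles_to_unit_vector()` is hypothetical (it is not a ppgmmga function), and the ordering of the angles is inferred from the printed numbers rather than taken from the package source. It covers only the d = 1 case.

```r
# Hypothetical helper, not a ppgmmga function: maps p - 1 angles to a
# unit-length vector in p dimensions. The ordering of the angles is an
# assumption inferred from the summary(pp1D) output.
angles_to_unit_vector <- function(theta) {
  p <- length(theta) + 1
  b <- numeric(p)
  # From the last coordinate down to the third: cosine of the current angle
  # times the product of sines of the angles already used
  sin_prod <- 1
  for (j in seq_len(p - 2)) {
    b[p - j + 1] <- sin_prod * cos(theta[j + 1])
    sin_prod <- sin_prod * sin(theta[j + 1])
  }
  # The first angle (range 0 to 2 * pi) splits the remainder between the
  # first two coordinates
  b[2] <- sin_prod * cos(theta[1])
  b[1] <- sin_prod * sin(theta[1])
  b
}

# Values copied from the summary(pp1D) output
theta   <- c(3.268902, 2.373044, 1.051365, 0.3131285, 0.531718)
printed <- c(Length = -0.01196531, Left = -0.09347750, Right = 0.16021052,
             Bottom = 0.57406981, Top = 0.34503463, Diagonal = -0.71892026)

b <- angles_to_unit_vector(theta)
max(abs(b - printed))  # small if the assumed ordering is right
sum(b^2)               # unit length by construction
```

Encoding a direction by angles keeps every candidate solution on the unit sphere, so the optimiser only needs simple box constraints on the angles.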
pp2D <- ppgmmga(data = X, d = 2, approx = "UT", seed = 1)
summary(pp2D, check = TRUE)
## ── ppgmmga ─────────────────────────────
##
## Data dimensions = 200 x 6
## Data transformation = center & scale
## Projection subspace dimension = 2
## GMM density estimate = (VEE,4)
## Negentropy approximation = UT
## GA optimal negentropy = 1.13624
## GA encoded basis solution:
## x1 x2 x3 x4 x5 x6 x7
## [1,] 2.268667 2.929821 1.061407 1.084929 0.3044298 3.85462 0.9832903
## x8 x9 x10
## [1,] 1.11377 0.1671738 1.668403
##
## Estimated projection basis:
##                  PP1         PP2
## Length   -0.03726866 -0.07183191
## Left      0.03125553 -0.11981164
## Right    -0.15480788  0.06300918
## Bottom   -0.08569311  0.86390485
## Top      -0.10249897  0.46037272
## Diagonal  0.97766012  0.13505761
##
## Monte Carlo Negentropy approximation check:
## UT
## Approx Negentropy 1.136240194
## MC Negentropy 1.137260367
## MC se 0.003527379
## Relative accuracy 0.999102956
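To make the Monte Carlo check above more concrete, here is a toy base R sketch of a Monte Carlo negentropy estimate for a univariate two-component Gaussian mixture. It is not the package's internal code: the mixture parameters and all object names are hypothetical, chosen only for illustration. Negentropy is taken here as the entropy of a Gaussian with the same variance minus the entropy of the mixture. The last line recomputes the ratio of the two negentropy values printed above, which is consistent with the reported relative accuracy.

```r
# Toy illustration only: all names and parameter values are hypothetical
set.seed(1)
prop  <- c(0.5, 0.5)  # mixing proportions
mu    <- c(-2, 2)     # component means
sigma <- c(1, 1)      # component standard deviations

# Mixture density evaluated at a vector of points
dmix <- function(x) {
  prop[1] * dnorm(x, mu[1], sigma[1]) + prop[2] * dnorm(x, mu[2], sigma[2])
}

# Draw from the mixture: pick a component, then sample from it
n <- 1e5
k <- sample(1:2, n, replace = TRUE, prob = prop)
x <- rnorm(n, mean = mu[k], sd = sigma[k])

# Monte Carlo estimate of the mixture entropy and its standard error
logf       <- log(dmix(x))
entropy_mc <- -mean(logf)
entropy_se <- sd(logf) / sqrt(n)

# Entropy of a Gaussian with the same variance as the mixture
mix_mean      <- sum(prop * mu)
mix_var       <- sum(prop * (sigma^2 + mu^2)) - mix_mean^2
entropy_gauss <- 0.5 * log(2 * pi * exp(1) * mix_var)

# Negentropy: non-negative, and zero only for a Gaussian density
negentropy_mc <- entropy_gauss - entropy_mc

# Ratio of the approximated and Monte Carlo values printed above
rel_acc <- 1.136240194 / 1.137260367
```

A closed-form approximation such as the one selected with `approx = "UT"` avoids repeating this sampling step at every fitness evaluation, which is why the Monte Carlo value is only used as an after-the-fact check.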
summary(pp2D$GMM)
## -------------------------------------------------------
## Density estimation via Gaussian finite mixture modeling
## -------------------------------------------------------
##
## Mclust VEE (ellipsoidal, equal shape and orientation) model with 4
## components:
##
##  log-likelihood   n df       BIC       ICL
##       -1191.595 200 51 -2653.405 -2666.898
##
## Clustering table:
##  1  2  3  4
## 16 99 47 38
gmm <- densityMclust(data = scale(X, center = TRUE, scale = FALSE), G = 2)
pp3D <- ppgmmga(data = X, d = 3,
                center = TRUE, scale = FALSE, gmm = gmm,
                gatype = "gaisl",
                options = ppgmmga.options(numIslands = 2),
                seed = 1)
summary(pp3D$GA)
## ── Islands Genetic Algorithm ───────────
##
## GA settings:
## Type = real-valued
## Number of islands = 2
## Islands pop. size = 50
## Migration rate = 0.1
## Migration interval = 10
## Elitism = 1
## Crossover probability = 0.8
## Mutation probability = 0.1
## Search domain =
## x1 x2 x3 x4 x5 x6 x7
## lower 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## upper 6.283185 3.141593 3.141593 3.141593 3.141593 6.283185 3.141593
## x8 x9 x10 ... x14 x15
## lower 0.000000 0.000000 0.000000 0.000000 0.000000
## upper 3.141593 3.141593 3.141593 3.141593 3.141593
##
## GA results:
## Iterations = 170
## Epochs = 17
## Fitness function values = 0.8572447 0.8572447
## Solutions =
## x1 x2 x3 x4 x5 x6 x7
## [1,] 0.9884973 1.570908 1.110967 1.281758 0.8394515 6.213755 1.144124
## [2,] 0.9884973 1.570908 1.110967 1.281758 0.8394515 6.213755 1.144124
## x8 x9 x10 ... x14 x15
## [1,] 2.17272 2.425498 2.515146 2.423362 0.6028533
## [2,] 2.17272 2.425498 2.515146 2.423362 0.6028533
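The search domain printed above looks like a set of box constraints on angles. The sketch below rebuilds bounds with the same visible pattern in base R. The repeating layout (one angle up to 2π followed by p - 2 angles up to π for each basis vector) is an assumption, since the output elides columns x11 to x13. The object names are illustrative, not package internals.

```r
p <- 6  # number of variables in X
d <- 3  # dimension of the projection subspace

# Assumed pattern: per basis vector, one angle in [0, 2*pi] and
# p - 2 angles in [0, pi], giving d * (p - 1) parameters in total
lower <- rep(0, d * (p - 1))
upper <- rep(c(2 * pi, rep(pi, p - 2)), times = d)

# Columns that are visible in the printed search domain
round(upper[c(1:10, 14, 15)], 6)

# The printed results are consistent with one epoch per migration interval
iterations         <- 170
migration_interval <- 10
epochs             <- iterations / migration_interval
```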
Scrucca L, Serafini A (2019). “Projection pursuit based on Gaussian mixtures and evolutionary algorithms.” Journal of Computational and Graphical Statistics. doi: 10.1080/10618600.2019.1598871 (URL: https://doi.org/10.1080/10618600.2019.1598871).
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] mclust_5.4.3 ppgmmga_1.1 knitr_1.22
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.1 GA_3.2 compiler_3.6.0 pillar_1.4.0
## [5] plyr_1.8.4 iterators_1.0.10 tools_3.6.0 digest_0.6.18
## [9] evaluate_0.13 tibble_2.1.1 gtable_0.3.0 pkgconfig_2.0.2
## [13] rlang_0.3.4 foreach_1.4.4 cli_1.1.0 yaml_2.2.0
## [17] xfun_0.7 stringr_1.4.0 dplyr_0.8.1 grid_3.6.0
## [21] tidyselect_0.2.5 glue_1.3.1 R6_2.4.0 rmarkdown_1.12
## [25] ggplot2_3.1.1 purrr_0.3.2 magrittr_1.5 scales_1.0.0
## [29] codetools_0.2-16 htmltools_0.3.6 ggthemes_4.2.0 assertthat_0.2.1
## [33] colorspace_1.4-1 labeling_0.3 stringi_1.4.3 lazyeval_0.2.2
## [37] munsell_0.5.0 crayon_1.3.4