A quick tour of ppgmmga

Alessio Serafini, Luca Scrucca

31 May 2019

Introduction

The ppgmmga R package implements a projection pursuit algorithm in which the data density is estimated by a finite Gaussian mixture model (GMM) and a genetic algorithm (GA) is used to maximise an approximated negentropy index, as described in Scrucca and Serafini (2019). The algorithm provides a way to visualise high-dimensional data in a lower-dimensional space, with particular emphasis on revealing clustering structure.
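
To make the index concrete: the negentropy of a density is the entropy of a Gaussian with the same mean and variance minus the entropy of that density, so it is zero for a Gaussian and positive otherwise. The base-R sketch below estimates it by plain Monte Carlo for a univariate two-component mixture. The mixture parameters and the helper names (`rmix`, `dmix`, `negentropy_mc`) are illustrative assumptions and not part of ppgmmga, which relies on closed-form approximations such as the "UT" option used in the examples that follow.

```r
# Illustrative two-component univariate Gaussian mixture (made-up parameters)
w  <- c(0.5, 0.5)   # mixing weights
mu <- c(-2, 2)      # component means
s  <- c(1, 1)       # component standard deviations

# Draw n values from the mixture
rmix <- function(n) {
  k <- sample(seq_along(w), n, replace = TRUE, prob = w)
  rnorm(n, mean = mu[k], sd = s[k])
}

# Mixture density evaluated at x
dmix <- function(x) {
  dens <- 0
  for (k in seq_along(w)) dens <- dens + w[k] * dnorm(x, mu[k], s[k])
  dens
}

# Monte Carlo negentropy: entropy of a Gaussian with the same variance
# minus the Monte Carlo estimate of the mixture entropy, -E[log f(Z)]
negentropy_mc <- function(n = 1e5) {
  z <- rmix(n)
  mix_var <- sum(w * (s^2 + mu^2)) - sum(w * mu)^2   # exact mixture variance
  h_gauss <- 0.5 * log(2 * pi * exp(1) * mix_var)
  h_mix   <- -mean(log(dmix(z)))
  h_gauss - h_mix
}

set.seed(1)
J <- negentropy_mc()
J
```

A clearly bimodal mixture like this one gives a value well above zero; moving the two means together drives the estimate towards zero.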

library(ppgmmga)
##    ___  ___  ___ ___ _  __ _  ___ ____ _
##   / _ \/ _ \/ _ `/  ' \/  ' \/ _ `/ _ `/
##  / .__/ .__/\_, /_/_/_/_/_/_/\_, /\_,_/ 
## /_/  /_/   /___/            /___/       version 1.1

Banknote data

library(mclust)
## Package 'mclust' version 5.4.3
## Type 'citation("mclust")' for citing this R package in publications.
data("banknote")
X <- banknote[,-1]         # the six measurements (drop the Status column)
Class <- banknote$Status   # known labels; not passed to ppgmmga for fitting
table(Class)
## Class
## counterfeit     genuine 
##         100         100
clPairs(X, classification = Class)

1-dimensional ppgmmga

pp1D <- ppgmmga(data = X, d = 1, approx = "UT", seed = 1)
pp1D
## Call:
## ppgmmga(data = X, d = 1, approx = "UT", seed = 1)
## 
## 'ppgmmga' object containing: 
## [1] "data"       "d"          "approx"     "GMM"        "GA"        
## [6] "Negentropy" "basis"      "Z"
summary(pp1D)
## ── ppgmmga ───────────────────────────── 
## 
## Data dimensions               = 200 x 6 
## Data transformation           = center & scale 
## Projection subspace dimension = 1 
## GMM density estimate          = (VEE,4)
## Negentropy approximation      = UT 
## GA optimal negentropy         = 0.6345935 
## GA encoded basis solution: 
##            x1       x2       x3        x4       x5
## [1,] 3.268902 2.373044 1.051365 0.3131285 0.531718
## 
## Estimated projection basis: 
##                  PP1
## Length   -0.01196531
## Left     -0.09347750
## Right     0.16021052
## Bottom    0.57406981
## Top       0.34503463
## Diagonal -0.71892026
plot(pp1D)

plot(pp1D, class = Class)
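
The "GA encoded basis solution" and the "Estimated projection basis" in the summary above describe the same direction in two parameterisations. The printed numbers are consistent with the five encoded values being spherical-coordinate angles of a unit vector in six dimensions. The sketch below checks this in base R. The decoding convention in `decode_angles` (a hypothetical helper) was inferred from the printed output rather than taken from the package source, so treat it as an assumption.

```r
# Hypothetical helper: map p - 1 angles to a unit vector of length p (p >= 3).
# Convention inferred from the printed summary, not from the ppgmmga source:
# theta[1] is the azimuth, theta[2], ..., theta[p - 1] fill the coordinates
# from the last one downwards.
decode_angles <- function(theta) {
  p <- length(theta) + 1
  b <- numeric(p)
  s <- 1                          # running product of sines
  for (k in 2:(p - 1)) {
    b[p - k + 2] <- s * cos(theta[k])
    s <- s * sin(theta[k])
  }
  b[2] <- s * cos(theta[1])
  b[1] <- s * sin(theta[1])
  b
}

# Values copied from the summary(pp1D) output
theta <- c(3.268902, 2.373044, 1.051365, 0.3131285, 0.531718)
printed <- c(Length = -0.01196531, Left = -0.09347750, Right = 0.16021052,
             Bottom = 0.57406981, Top = 0.34503463, Diagonal = -0.71892026)

b <- decode_angles(theta)
max(abs(b - printed))   # largest discrepancy from the printed basis
sum(b^2)                # squared length of the decoded vector
```

Because a vector decoded this way has unit length by construction, a search over a box of angles needs no explicit normalisation constraint.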

2-dimensional ppgmmga

pp2D <- ppgmmga(data = X, d = 2, approx = "UT", seed = 1)
# check = TRUE adds the Monte Carlo negentropy check printed at the end
summary(pp2D, check = TRUE)
## ── ppgmmga ───────────────────────────── 
## 
## Data dimensions               = 200 x 6 
## Data transformation           = center & scale 
## Projection subspace dimension = 2 
## GMM density estimate          = (VEE,4)
## Negentropy approximation      = UT 
## GA optimal negentropy         = 1.13624 
## GA encoded basis solution: 
##            x1       x2       x3       x4        x5      x6        x7
## [1,] 2.268667 2.929821 1.061407 1.084929 0.3044298 3.85462 0.9832903
##           x8        x9      x10
## [1,] 1.11377 0.1671738 1.668403
## 
## Estimated projection basis: 
##                  PP1         PP2
## Length   -0.03726866 -0.07183191
## Left      0.03125553 -0.11981164
## Right    -0.15480788  0.06300918
## Bottom   -0.08569311  0.86390485
## Top      -0.10249897  0.46037272
## Diagonal  0.97766012  0.13505761
## 
## Monte Carlo Negentropy approximation check: 
##                            UT
## Approx Negentropy 1.136240194
## MC Negentropy     1.137260367
## MC se             0.003527379
## Relative accuracy 0.999102956
summary(pp2D$GMM)
## ------------------------------------------------------- 
## Density estimation via Gaussian finite mixture modeling 
## ------------------------------------------------------- 
## 
## Mclust VEE (ellipsoidal, equal shape and orientation) model with 4
## components: 
## 
##  log-likelihood   n df       BIC       ICL
##       -1191.595 200 51 -2653.405 -2666.898
## 
## Clustering table:
##  1  2  3  4 
## 16 99 47 38
plot(pp2D$GA)

plot(pp2D)

plot(pp2D, class = Class, drawAxis = FALSE)
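
Two properties of the 2-dimensional summary can be checked directly from the printed numbers: the columns of the estimated basis should be orthonormal, and the reported relative accuracy appears to be the ratio of the approximated negentropy to the Monte Carlo negentropy. The sketch below hard-codes the values printed above. Reading "Relative accuracy" as this ratio is an inference from the output, not a documented definition.

```r
# Estimated projection basis copied from the summary(pp2D) output
B <- cbind(
  PP1 = c(-0.03726866,  0.03125553, -0.15480788,
          -0.08569311, -0.10249897,  0.97766012),
  PP2 = c(-0.07183191, -0.11981164,  0.06300918,
           0.86390485,  0.46037272,  0.13505761)
)
rownames(B) <- c("Length", "Left", "Right", "Bottom", "Top", "Diagonal")

# For an orthonormal basis, t(B) %*% B is the 2 x 2 identity matrix
gram <- crossprod(B)
round(gram, 6)

# Values copied from the Monte Carlo check in the same output
approx_negentropy <- 1.136240194
mc_negentropy     <- 1.137260367
rel_acc <- approx_negentropy / mc_negentropy   # assumed definition
rel_acc
```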

3-dimensional ppgmmga

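# Fit a 2-component GMM to the centred, unscaled data and supply it via `gmm`;
# center/scale in ppgmmga() are set to match that preprocessing.
gmm <- densityMclust(data = scale(X, center = TRUE, scale = FALSE), G = 2)
pp3D <- ppgmmga(data = X, d = 3, 
                center = TRUE, scale = FALSE, gmm = gmm, 
                gatype = "gaisl",   # islands genetic algorithm
                options = ppgmmga.options(numIslands = 2),
                seed = 1)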
summary(pp3D$GA)
## ── Islands Genetic Algorithm ─────────── 
## 
## GA settings: 
## Type                  =  real-valued 
## Number of islands     =  2 
## Islands pop. size     =  50 
## Migration rate        =  0.1 
## Migration interval    =  10 
## Elitism               =  1 
## Crossover probability =  0.8 
## Mutation probability  =  0.1 
## Search domain = 
##             x1       x2       x3       x4       x5       x6       x7
## lower 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## upper 6.283185 3.141593 3.141593 3.141593 3.141593 6.283185 3.141593
##             x8       x9      x10  ...       x14      x15
## lower 0.000000 0.000000 0.000000       0.000000 0.000000
## upper 3.141593 3.141593 3.141593       3.141593 3.141593
## 
## GA results: 
## Iterations              = 170 
## Epochs                  = 17 
## Fitness function values = 0.8572447 0.8572447 
## Solutions = 
##             x1       x2       x3       x4        x5       x6       x7
## [1,] 0.9884973 1.570908 1.110967 1.281758 0.8394515 6.213755 1.144124
## [2,] 0.9884973 1.570908 1.110967 1.281758 0.8394515 6.213755 1.144124
##           x8       x9      x10  ...       x14       x15
## [1,] 2.17272 2.425498 2.515146       2.423362 0.6028533
## [2,] 2.17272 2.425498 2.515146       2.423362 0.6028533
plot(pp3D$GA)

plot(pp3D)

plot(pp3D, class = Class)

plot(pp3D, dim = c(1,2))

plot(pp3D, dim = c(1,3), class = Class)
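
The search domain printed in the GA summary has 15 parameters, which matches d × (p − 1) = 3 × 5 angles for a 3-dimensional projection of 6 variables. In the columns that are shown, the first angle of each block of five ranges over [0, 2π] and the others over [0, π]. The sketch below rebuilds bounds of this shape in base R. `angle_bounds` is a hypothetical helper, and the assumption that the elided columns follow the same block pattern is not confirmed by the output.

```r
# Hypothetical helper: box constraints for d blocks of p - 1 angles each.
# Assumed pattern: first angle of every block in [0, 2*pi], the rest in [0, pi].
angle_bounds <- function(p, d) {
  upper <- rep(c(2 * pi, rep(pi, p - 2)), times = d)
  lower <- rep(0, length(upper))
  bounds <- rbind(lower = lower, upper = upper)
  colnames(bounds) <- paste0("x", seq_along(upper))
  bounds
}

bounds <- angle_bounds(p = 6, d = 3)
ncol(bounds)                      # number of GA parameters

# Compare only the columns that the summary actually prints
shown <- c(1:10, 14, 15)
round(bounds[, shown], 6)

# The printed run is also consistent with one epoch per migration interval
# (a relationship assumed here from the numbers, not from the GA documentation)
iterations <- 170
migration_interval <- 10
iterations / migration_interval
```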


References

Scrucca L, Serafini A (2019). “Projection pursuit based on Gaussian mixtures and evolutionary algorithms.” Journal of Computational and Graphical Statistics. doi: 10.1080/10618600.2019.1598871 (URL: https://doi.org/10.1080/10618600.2019.1598871).


sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] mclust_5.4.3 ppgmmga_1.1  knitr_1.22  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1       GA_3.2           compiler_3.6.0   pillar_1.4.0    
##  [5] plyr_1.8.4       iterators_1.0.10 tools_3.6.0      digest_0.6.18   
##  [9] evaluate_0.13    tibble_2.1.1     gtable_0.3.0     pkgconfig_2.0.2 
## [13] rlang_0.3.4      foreach_1.4.4    cli_1.1.0        yaml_2.2.0      
## [17] xfun_0.7         stringr_1.4.0    dplyr_0.8.1      grid_3.6.0      
## [21] tidyselect_0.2.5 glue_1.3.1       R6_2.4.0         rmarkdown_1.12  
## [25] ggplot2_3.1.1    purrr_0.3.2      magrittr_1.5     scales_1.0.0    
## [29] codetools_0.2-16 htmltools_0.3.6  ggthemes_4.2.0   assertthat_0.2.1
## [33] colorspace_1.4-1 labeling_0.3     stringi_1.4.3    lazyeval_0.2.2  
## [37] munsell_0.5.0    crayon_1.3.4