Clustering Package

Luis Alfonso Pérez Martos

2020-04-11

Clustering can be seen as a concise model of the data: given a data set, we partition it into groups whose members are as similar as possible. A review of the clustering algorithms implemented in R shows a large number of packages that implement or improve an algorithm or a piece of functionality.

The Clustering package contains multiple implementations of algorithms such as: gmm, kmeans-arma, kmeans-rcpp, fuzzy_cm, fuzzy_gg, fuzzy_gk, hclust, apclusterk, aggExcluster, clara, daisy, diana, fanny, gama, mona, pam, pvclust, pvpick.

It can also use different similarity measures to calculate the distance between points, such as: Euclidean, Manhattan, Jaccard, Gower, Mahalanobis, Correlation and Minkowski.
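
As a small illustration of how three of these measures differ, the sketch below uses base R's dist() function rather than the Clustering package itself. The two points are made up for the example.

```r
# Two illustrative points (made-up data, not taken from the package).
pts <- rbind(a = c(0, 0), b = c(3, 4))

# Base R distance function; `method` selects the measure.
d_euclidean <- as.numeric(dist(pts, method = "euclidean"))
d_manhattan <- as.numeric(dist(pts, method = "manhattan"))
# Minkowski with p = 3; p = 2 reduces to Euclidean and p = 1 to Manhattan.
d_minkowski <- as.numeric(dist(pts, method = "minkowski", p = 3))

print(c(euclidean = d_euclidean,
        manhattan = d_manhattan,
        minkowski = d_minkowski))
```

The Euclidean distance between the two points is 5 and the Manhattan distance is 7, so the choice of measure can change which points count as close.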

Furthermore, the package offers functions to:

Clustering

It is the main method of the package. The clustering method runs a set of clustering algorithms. To get information about the parameters of the method, use ?function or help(function). Datasets can be loaded in two different ways:

Once the method has been executed, we obtain the results divided into four parts:


df <- Clustering::clustering(df = Clustering::basketball,  
                             packages = c("clusterr"), min = 4, max = 6)

Here we have a data frame with the result of the execution. It shows all the algorithms, the similarity measures used, the variables ranked in order of importance, the execution time of the algorithms and the evaluation metrics.

Algorithm Distance Clusters Dataset Ranking timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index connectivity dunn silhouette timeInternal
gmm gmm_euclidean 4 dataframe 1 0.0181 0.3161 4.7617 0.1822 0.451 0.2595 0.2867 34.0929 0.1646 0.23 0.0065
gmm gmm_euclidean 4 dataframe 2 0.0187 0.3085 4.7409 0.1113 0.4005 0.1742 0.2111 34.0929 0.1646 0.23 0.0068
gmm gmm_euclidean 4 dataframe 3 0.1438 0.0064 4.72 0 0 0 0 34.0929 0.1646 0.23 0.0082
gmm gmm_euclidean 4 dataframe 4 0.1804 0.0032 4.1425 0 0 0 0 34.0929 0.1646 0.23 0.0084
gmm gmm_euclidean 4 dataframe 5 0.3559 0 3.6709 0 0 0 0 34.0929 0.1646 0.23 0.0098
gmm gmm_euclidean 5 dataframe 1 0.0174 0.4175 4.3626 0.1637 0.2865 0.2084 0.2165 42.0794 0.1619 0.25 0.0065
gmm gmm_euclidean 5 dataframe 2 0.0244 0.3857 4.3463 0.1109 0.2823 0.1592 0.1769 42.0794 0.1619 0.25 0.0066
gmm gmm_euclidean 5 dataframe 3 0.1424 0.0064 4.3418 0 0 0 0 42.0794 0.1619 0.25 0.0067
gmm gmm_euclidean 5 dataframe 4 0.1434 0.0032 4.321 0 0 0 0 42.0794 0.1619 0.25 0.0081
gmm gmm_euclidean 5 dataframe 5 0.1467 0 4.0224 0 0 0 0 42.0794 0.1619 0.25 0.0128
gmm gmm_euclidean 6 dataframe 1 0.019 0.433 4.4385 0.1744 0.2791 0.2147 0.2206 51.4599 0.1619 0.23 0.0063
gmm gmm_euclidean 6 dataframe 2 0.0233 0.4209 4.1795 0.1062 0.2473 0.1486 0.1621 51.4599 0.1619 0.23 0.0063
gmm gmm_euclidean 6 dataframe 3 0.142 0.0064 4.1586 0 0 0 0 51.4599 0.1619 0.23 0.0064
gmm gmm_euclidean 6 dataframe 4 0.149 0.0032 4.1378 0 0 0 0 51.4599 0.1619 0.23 0.0065
gmm gmm_euclidean 6 dataframe 5 0.1552 0 3.954 0 0 0 0 51.4599 0.1619 0.23 0.0065
gmm gmm_manhattan 4 dataframe 1 0.0135 0.3161 4.7617 0.1822 0.451 0.2595 0.2867 35.5869 0.1348 0.23 0.0064
gmm gmm_manhattan 4 dataframe 2 0.0169 0.3085 4.7409 0.1113 0.4005 0.1742 0.2111 35.5869 0.1348 0.23 0.0064
gmm gmm_manhattan 4 dataframe 3 0.1378 0.0064 4.72 0 0 0 0 35.5869 0.1348 0.23 0.0065
gmm gmm_manhattan 4 dataframe 4 0.1432 0.0032 4.1425 0 0 0 0 35.5869 0.1348 0.23 0.0065
gmm gmm_manhattan 4 dataframe 5 0.1434 0 3.6709 0 0 0 0 35.5869 0.1348 0.23 0.0065
gmm gmm_manhattan 5 dataframe 1 0.0162 0.4258 4.3496 0.167 0.2828 0.21 0.2173 46.8306 0.1322 0.26 0.0064
gmm gmm_manhattan 5 dataframe 2 0.0195 0.3892 4.3379 0.1114 0.2742 0.1584 0.1747 46.8306 0.1322 0.26 0.0064
gmm gmm_manhattan 5 dataframe 3 0.141 0.0064 4.3171 0 0 0 0 46.8306 0.1322 0.26 0.0064
gmm gmm_manhattan 5 dataframe 4 0.1456 0.0032 4.2962 0 0 0 0 46.8306 0.1322 0.26 0.0064
gmm gmm_manhattan 5 dataframe 5 0.1505 0 4.0593 0 0 0 0 46.8306 0.1322 0.26 0.0065
gmm gmm_manhattan 6 dataframe 1 0.0178 0.4555 4.2975 0.1669 0.2608 0.2035 0.2085 54.8667 0.1467 0.25 0.0064
gmm gmm_manhattan 6 dataframe 2 0.0211 0.4052 4.1608 0.1148 0.2606 0.1594 0.173 54.8667 0.1467 0.25 0.0065
gmm gmm_manhattan 6 dataframe 3 0.1423 0.0064 4.14 0 0 0 0 54.8667 0.1467 0.25 0.0068
gmm gmm_manhattan 6 dataframe 4 0.1465 0.0032 4.1191 0 0 0 0 54.8667 0.1467 0.25 0.0068
gmm gmm_manhattan 6 dataframe 5 0.1474 0 4.102 0 0 0 0 54.8667 0.1467 0.25 0.007
kmeans_arma kmeans_arma 4 dataframe 1 4e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0063
kmeans_arma kmeans_arma 4 dataframe 2 4e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0066
kmeans_arma kmeans_arma 4 dataframe 3 5e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0066
kmeans_arma kmeans_arma 4 dataframe 4 6e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.007
kmeans_arma kmeans_arma 4 dataframe 5 9e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0074
kmeans_arma kmeans_arma 5 dataframe 1 3e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.0066
kmeans_arma kmeans_arma 5 dataframe 2 4e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.0066
kmeans_arma kmeans_arma 5 dataframe 3 4e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.0067
kmeans_arma kmeans_arma 5 dataframe 4 4e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.007
kmeans_arma kmeans_arma 5 dataframe 5 4e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.0071
kmeans_arma kmeans_arma 6 dataframe 1 4e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.007
kmeans_arma kmeans_arma 6 dataframe 2 4e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.007
kmeans_arma kmeans_arma 6 dataframe 3 4e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.0071
kmeans_arma kmeans_arma 6 dataframe 4 5e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.0071
kmeans_arma kmeans_arma 6 dataframe 5 5e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.009
kmeans_rcpp kmeans_rcpp 4 dataframe 1 0.013 0.3728 4.6267 0.1697 0.5 0.23 0.2461 51.0405 0.1741 0.23 0.0062
kmeans_rcpp kmeans_rcpp 4 dataframe 2 0.0176 0.3494 4.6058 0.1003 0.3567 0.1511 0.1753 51.0405 0.1741 0.23 0.0064
kmeans_rcpp kmeans_rcpp 4 dataframe 3 0.1355 0.0032 4.6058 9e-04 0.3065 0.0018 0.021 51.0405 0.1741 0.23 0.0064
kmeans_rcpp kmeans_rcpp 4 dataframe 4 0.1454 0.0032 4.5308 0 0 0 0 51.0405 0.1741 0.23 0.0065
kmeans_rcpp kmeans_rcpp 4 dataframe 5 0.1501 0 3.8037 0 0 0 0 51.0405 0.1741 0.23 0.0066
kmeans_rcpp kmeans_rcpp 5 dataframe 1 0.0152 0.4269 4.5505 0.1663 0.5 0.2104 0.2183 66.8492 0.152 0.19 0.0062
kmeans_rcpp kmeans_rcpp 5 dataframe 2 0.0197 0.4135 4.3288 0.1019 0.2865 0.1457 0.1613 66.8492 0.152 0.19 0.0063
kmeans_rcpp kmeans_rcpp 5 dataframe 3 0.1393 0.0032 4.308 0.0011 0.2554 0.0022 0.0232 66.8492 0.152 0.19 0.0063
kmeans_rcpp kmeans_rcpp 5 dataframe 4 0.1412 0.0032 4.308 0 0 0 0 66.8492 0.152 0.19 0.0064
kmeans_rcpp kmeans_rcpp 5 dataframe 5 0.146 0 4.0788 0 0 0 0 66.8492 0.152 0.19 0.0067
kmeans_rcpp kmeans_rcpp 6 dataframe 1 0.0161 0.4545 4.3312 0.1703 0.2458 0.2012 0.2046 74.7754 0.1522 0.19 0.0062
kmeans_rcpp kmeans_rcpp 6 dataframe 2 0.0204 0.4169 4.1035 0.1152 0.2419 0.1561 0.167 74.7754 0.1522 0.19 0.0062
kmeans_rcpp kmeans_rcpp 6 dataframe 3 0.1408 0.0064 4.0827 0 0 0 0 74.7754 0.1522 0.19 0.0062
kmeans_rcpp kmeans_rcpp 6 dataframe 4 0.149 0.0032 4.0619 0 0 0 0 74.7754 0.1522 0.19 0.0065
kmeans_rcpp kmeans_rcpp 6 dataframe 5 0.1496 0 4.0375 0 0 0 0 74.7754 0.1522 0.19 0.0067
mini_kmeans mini_kmeans 4 dataframe 1 5e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0061
mini_kmeans mini_kmeans 4 dataframe 2 5e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0063
mini_kmeans mini_kmeans 4 dataframe 3 5e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0066
mini_kmeans mini_kmeans 4 dataframe 4 5e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0066
mini_kmeans mini_kmeans 4 dataframe 5 9e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0067
mini_kmeans mini_kmeans 5 dataframe 1 5e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0065
mini_kmeans mini_kmeans 5 dataframe 2 5e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0065
mini_kmeans mini_kmeans 5 dataframe 3 5e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0065
mini_kmeans mini_kmeans 5 dataframe 4 5e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0066
mini_kmeans mini_kmeans 5 dataframe 5 5e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0067
mini_kmeans mini_kmeans 6 dataframe 1 6e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.007
mini_kmeans mini_kmeans 6 dataframe 2 6e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0071
mini_kmeans mini_kmeans 6 dataframe 3 6e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0072
mini_kmeans mini_kmeans 6 dataframe 4 6e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0075
mini_kmeans mini_kmeans 6 dataframe 5 7e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0081
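
To give an intuition for the precision, recall, f_measure and fowlkes_mallows_index columns, here is a minimal base R sketch of one common pair-counting definition of these metrics. It is an illustration only: the helper pair_metrics and the labels are hypothetical, and the formulas the package uses internally may differ.

```r
# Pair-counting external metrics for a clustering against reference labels.
# `pair_metrics` is a hypothetical helper, not part of the Clustering package.
pair_metrics <- function(clusters, classes) {
  stopifnot(length(clusters) == length(classes))
  idx <- utils::combn(length(clusters), 2)   # all unordered pairs of points
  same_cluster <- clusters[idx[1, ]] == clusters[idx[2, ]]
  same_class   <- classes[idx[1, ]] == classes[idx[2, ]]
  tp <- sum(same_cluster & same_class)       # pairs grouped together correctly
  fp <- sum(same_cluster & !same_class)      # pairs grouped together wrongly
  fn <- sum(!same_cluster & same_class)      # pairs separated wrongly
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(precision = precision,
    recall = recall,
    f_measure = 2 * precision * recall / (precision + recall),
    fowlkes_mallows_index = sqrt(precision * recall))
}

# Made-up labels for six points.
clusters <- c(1, 1, 1, 2, 2, 2)
classes  <- c("a", "a", "b", "b", "b", "b")
m <- pair_metrics(clusters, classes)
print(round(m, 4))
```

With these labels, 4 of the 6 same-cluster pairs also share a class, so precision is 4/6, and 4 of the 7 same-class pairs share a cluster, so recall is 4/7.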

This property tells us whether an internal evaluation of the groups has been performed.

#> [1] TRUE

This property tells us whether an external evaluation of the groups has been performed.

#> [1] TRUE

Algorithms executed

#> [1] "gmm"         "kmeans_arma" "kmeans_rcpp" "mini_kmeans"

Similarity Metrics

#> [1] "gmm_euclidean" "gmm_manhattan" "kmeans_arma"   "kmeans_rcpp"  
#> [5] "mini_kmeans"

To obtain the ranked variables instead of the metric values, set the variables parameter to TRUE:


df_variable <- Clustering::clustering(df = Clustering::basketball,  
                             packages = c("clusterr"), min = 4, max = 6, variables = TRUE)
Algorithm Distance Clusters Dataset Ranking timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index connectivity dunn silhouette timeInternal
gmm gmm_euclidean 4 dataframe 1 5 2 3 2 2 2 2 1 1 1 3
gmm gmm_euclidean 4 dataframe 2 1 4 5 4 4 4 4 2 2 2 5
gmm gmm_euclidean 4 dataframe 3 4 3 1 1 1 1 1 3 3 3 4
gmm gmm_euclidean 4 dataframe 4 2 5 4 3 3 3 3 4 4 4 1
gmm gmm_euclidean 4 dataframe 5 3 1 2 5 5 5 5 5 5 5 2
gmm gmm_euclidean 5 dataframe 1 5 2 3 2 2 2 2 1 1 1 3
gmm gmm_euclidean 5 dataframe 2 1 4 4 4 4 4 4 2 2 2 5
gmm gmm_euclidean 5 dataframe 3 4 3 5 1 1 1 1 3 3 3 2
gmm gmm_euclidean 5 dataframe 4 2 5 1 3 3 3 3 4 4 4 4
gmm gmm_euclidean 5 dataframe 5 3 1 2 5 5 5 5 5 5 5 1
gmm gmm_euclidean 6 dataframe 1 3 2 4 2 2 2 2 1 1 1 1
gmm gmm_euclidean 6 dataframe 2 1 4 3 4 4 4 4 2 2 2 3
gmm gmm_euclidean 6 dataframe 3 5 3 5 1 1 1 1 3 3 3 2
gmm gmm_euclidean 6 dataframe 4 2 5 1 3 3 3 3 4 4 4 4
gmm gmm_euclidean 6 dataframe 5 4 1 2 5 5 5 5 5 5 5 5
gmm gmm_manhattan 4 dataframe 1 3 2 3 2 2 2 2 1 1 1 5
gmm gmm_manhattan 4 dataframe 2 1 4 5 4 4 4 4 2 2 2 4
gmm gmm_manhattan 4 dataframe 3 4 3 1 1 1 1 1 3 3 3 2
gmm gmm_manhattan 4 dataframe 4 2 5 4 3 3 3 3 4 4 4 3
gmm gmm_manhattan 4 dataframe 5 5 1 2 5 5 5 5 5 5 5 1
gmm gmm_manhattan 5 dataframe 1 3 2 4 2 2 2 2 1 1 1 1
gmm gmm_manhattan 5 dataframe 2 1 4 3 4 4 4 4 2 2 2 4
gmm gmm_manhattan 5 dataframe 3 5 3 5 1 1 1 1 3 3 3 5
gmm gmm_manhattan 5 dataframe 4 2 5 1 3 3 3 3 4 4 4 3
gmm gmm_manhattan 5 dataframe 5 4 1 2 5 5 5 5 5 5 5 2
gmm gmm_manhattan 6 dataframe 1 5 2 4 2 4 2 2 1 1 1 5
gmm gmm_manhattan 6 dataframe 2 2 4 3 4 2 4 4 2 2 2 3
gmm gmm_manhattan 6 dataframe 3 3 3 5 1 1 1 1 3 3 3 2
gmm gmm_manhattan 6 dataframe 4 1 5 1 3 3 3 3 4 4 4 4
gmm gmm_manhattan 6 dataframe 5 4 1 2 5 5 5 5 5 5 5 1
kmeans_arma kmeans_arma 4 dataframe 1 2 1 1 1 1 1 1 1 1 1 1
kmeans_arma kmeans_arma 4 dataframe 2 4 2 2 2 2 2 2 2 2 2 2
kmeans_arma kmeans_arma 4 dataframe 3 1 3 3 3 3 3 3 3 3 3 3
kmeans_arma kmeans_arma 4 dataframe 4 5 4 4 4 4 4 4 4 4 4 5
kmeans_arma kmeans_arma 4 dataframe 5 3 5 5 5 5 5 5 5 5 5 4
kmeans_arma kmeans_arma 5 dataframe 1 1 1 1 1 1 1 1 1 1 1 1
kmeans_arma kmeans_arma 5 dataframe 2 2 2 2 2 2 2 2 2 2 2 5
kmeans_arma kmeans_arma 5 dataframe 3 5 3 3 3 3 3 3 3 3 3 4
kmeans_arma kmeans_arma 5 dataframe 4 3 4 4 4 4 4 4 4 4 4 2
kmeans_arma kmeans_arma 5 dataframe 5 4 5 5 5 5 5 5 5 5 5 3
kmeans_arma kmeans_arma 6 dataframe 1 3 1 1 1 1 1 1 1 1 1 4
kmeans_arma kmeans_arma 6 dataframe 2 2 2 2 2 2 2 2 2 2 2 5
kmeans_arma kmeans_arma 6 dataframe 3 5 3 3 3 3 3 3 3 3 3 2
kmeans_arma kmeans_arma 6 dataframe 4 1 4 4 4 4 4 4 4 4 4 1
kmeans_arma kmeans_arma 6 dataframe 5 4 5 5 5 5 5 5 5 5 5 3
kmeans_rcpp kmeans_rcpp 4 dataframe 1 5 4 5 2 3 2 2 1 1 1 5
kmeans_rcpp kmeans_rcpp 4 dataframe 2 1 2 1 4 2 4 4 2 2 2 4
kmeans_rcpp kmeans_rcpp 4 dataframe 3 4 3 3 3 4 3 3 3 3 3 2
kmeans_rcpp kmeans_rcpp 4 dataframe 4 2 5 4 1 1 1 1 4 4 4 1
kmeans_rcpp kmeans_rcpp 4 dataframe 5 3 1 2 5 5 5 5 5 5 5 3
kmeans_rcpp kmeans_rcpp 5 dataframe 1 5 2 4 2 3 2 2 1 1 1 5
kmeans_rcpp kmeans_rcpp 5 dataframe 2 1 4 5 4 2 4 4 2 2 2 4
kmeans_rcpp kmeans_rcpp 5 dataframe 3 4 3 1 3 4 3 3 3 3 3 2
kmeans_rcpp kmeans_rcpp 5 dataframe 4 2 5 3 1 1 1 1 4 4 4 1
kmeans_rcpp kmeans_rcpp 5 dataframe 5 3 1 2 5 5 5 5 5 5 5 3
kmeans_rcpp kmeans_rcpp 6 dataframe 1 5 2 4 2 2 2 2 1 1 1 2
kmeans_rcpp kmeans_rcpp 6 dataframe 2 1 4 3 4 4 4 4 2 2 2 5
kmeans_rcpp kmeans_rcpp 6 dataframe 3 3 3 5 1 1 1 1 3 3 3 3
kmeans_rcpp kmeans_rcpp 6 dataframe 4 2 5 1 3 3 3 3 4 4 4 1
kmeans_rcpp kmeans_rcpp 6 dataframe 5 4 1 2 5 5 5 5 5 5 5 4
mini_kmeans mini_kmeans 4 dataframe 1 3 1 1 1 1 1 1 1 1 1 1
mini_kmeans mini_kmeans 4 dataframe 2 1 2 2 2 2 2 2 2 2 2 5
mini_kmeans mini_kmeans 4 dataframe 3 2 3 3 3 3 3 3 3 3 3 2
mini_kmeans mini_kmeans 4 dataframe 4 4 4 4 4 4 4 4 4 4 4 4
mini_kmeans mini_kmeans 4 dataframe 5 5 5 5 5 5 5 5 5 5 5 3
mini_kmeans mini_kmeans 5 dataframe 1 2 1 1 1 1 1 1 1 1 1 2
mini_kmeans mini_kmeans 5 dataframe 2 5 2 2 2 2 2 2 2 2 2 3
mini_kmeans mini_kmeans 5 dataframe 3 1 3 3 3 3 3 3 3 3 3 5
mini_kmeans mini_kmeans 5 dataframe 4 4 4 4 4 4 4 4 4 4 4 1
mini_kmeans mini_kmeans 5 dataframe 5 3 5 5 5 5 5 5 5 5 5 4
mini_kmeans mini_kmeans 6 dataframe 1 5 1 1 1 1 1 1 1 1 1 4
mini_kmeans mini_kmeans 6 dataframe 2 2 2 2 2 2 2 2 2 2 2 5
mini_kmeans mini_kmeans 6 dataframe 3 1 3 3 3 3 3 3 3 3 3 1
mini_kmeans mini_kmeans 6 dataframe 4 3 4 4 4 4 4 4 4 4 4 2
mini_kmeans mini_kmeans 6 dataframe 5 4 5 5 5 5 5 5 5 5 5 3

If we only want the best-ranked variables or values for the external metrics, we call the following method:


df_best_ranked_external <- Clustering::best_ranked_external_metrics(df$result)
Algorithm Distance Clusters Dataset Ranking timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index
gmm gmm_euclidean 4 dataframe 1 0.0181 0.3161 4.7617 0.1822 0.451 0.2595 0.2867
gmm gmm_euclidean 5 dataframe 1 0.0174 0.4175 4.3626 0.1637 0.2865 0.2084 0.2165
gmm gmm_euclidean 6 dataframe 1 0.019 0.433 4.4385 0.1744 0.2791 0.2147 0.2206
gmm gmm_manhattan 4 dataframe 1 0.0135 0.3161 4.7617 0.1822 0.451 0.2595 0.2867
gmm gmm_manhattan 5 dataframe 1 0.0162 0.4258 4.3496 0.167 0.2828 0.21 0.2173
gmm gmm_manhattan 6 dataframe 1 0.0178 0.4555 4.2975 0.1669 0.2608 0.2035 0.2085
kmeans_arma kmeans_arma 4 dataframe 1 4e-04 0 0 0 0 0 0
kmeans_arma kmeans_arma 5 dataframe 1 3e-04 0 0 0 0 0 0
kmeans_arma kmeans_arma 6 dataframe 1 4e-04 0 0 0 0 0 0
kmeans_rcpp kmeans_rcpp 4 dataframe 1 0.013 0.3728 4.6267 0.1697 0.5 0.23 0.2461
kmeans_rcpp kmeans_rcpp 5 dataframe 1 0.0152 0.4269 4.5505 0.1663 0.5 0.2104 0.2183
kmeans_rcpp kmeans_rcpp 6 dataframe 1 0.0161 0.4545 4.3312 0.1703 0.2458 0.2012 0.2046
mini_kmeans mini_kmeans 4 dataframe 1 5e-04 0 0 0 0 0 0
mini_kmeans mini_kmeans 5 dataframe 1 5e-04 0 0 0 0 0 0
mini_kmeans mini_kmeans 6 dataframe 1 6e-04 0 0 0 0 0 0
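
Judging from the printed output, the rows kept are those whose Ranking is 1. This is an inference from the table, not documented behavior of the package. The same selection can be sketched in base R on a few rows copied from the full result table:

```r
# A few rows copied from the full result table (subset of columns).
res <- data.frame(
  Algorithm = c("gmm", "gmm", "gmm", "kmeans_rcpp", "kmeans_rcpp"),
  Distance  = c("gmm_euclidean", "gmm_euclidean", "gmm_euclidean",
                "kmeans_rcpp", "kmeans_rcpp"),
  Clusters  = c(4, 4, 4, 4, 4),
  Ranking   = c(1, 2, 3, 1, 2),
  precision = c(0.1822, 0.1113, 0, 0.1697, 0.1003),
  recall    = c(0.451, 0.4005, 0, 0.5, 0.3567)
)

# Keep only the top-ranked row of each algorithm/distance/cluster combination.
best <- subset(res, Ranking == 1)
print(best)
```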

We can likewise obtain the best-ranked values for the internal evaluation metrics:


df_best_ranked_internal <- Clustering::best_ranked_internal_metrics(df$result)
Algorithm Distance Clusters Dataset Ranking timeInternal connectivity dunn silhouette
gmm gmm_euclidean 4 dataframe 1 0.0065 34.0929 0.1646 0.23
gmm gmm_euclidean 5 dataframe 1 0.0065 42.0794 0.1619 0.25
gmm gmm_euclidean 6 dataframe 1 0.0063 51.4599 0.1619 0.23
gmm gmm_manhattan 4 dataframe 1 0.0064 35.5869 0.1348 0.23
gmm gmm_manhattan 5 dataframe 1 0.0064 46.8306 0.1322 0.26
gmm gmm_manhattan 6 dataframe 1 0.0064 54.8667 0.1467 0.25
kmeans_arma kmeans_arma 4 dataframe 1 0.0063 44.2103 0.1495 0.23
kmeans_arma kmeans_arma 5 dataframe 1 0.0066 49.2159 0.1538 0.26
kmeans_arma kmeans_arma 6 dataframe 1 0.007 57.6278 0.1619 0.24
kmeans_rcpp kmeans_rcpp 4 dataframe 1 0.0062 51.0405 0.1741 0.23
kmeans_rcpp kmeans_rcpp 5 dataframe 1 0.0062 66.8492 0.152 0.19
kmeans_rcpp kmeans_rcpp 6 dataframe 1 0.0062 74.7754 0.1522 0.19
mini_kmeans mini_kmeans 4 dataframe 1 0.0061 50.3528 0.1571 0.21
mini_kmeans mini_kmeans 5 dataframe 1 0.0065 76.3976 0.1216 0.17
mini_kmeans mini_kmeans 6 dataframe 1 0.007 76.5341 0.15 0.17
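
The dunn column reports the Dunn index. For readers unfamiliar with it, here is a minimal base R sketch of its usual definition: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The helper dunn_index and the data are hypothetical, and the package's own computation may differ in detail.

```r
# Dunn index: min between-cluster distance / max within-cluster diameter.
# `dunn_index` is a hypothetical helper, not part of the Clustering package.
dunn_index <- function(x, clusters) {
  d <- as.matrix(dist(x))                   # pairwise Euclidean distances
  same <- outer(clusters, clusters, "==")   # TRUE where two points share a cluster
  off_diag <- row(d) != col(d)              # exclude each point's distance to itself
  min_between <- min(d[!same])              # closest pair in different clusters
  max_within  <- max(d[same & off_diag])    # widest cluster diameter
  min_between / max_within
}

# Made-up data: two well-separated clusters of two points each.
x <- rbind(c(0, 0), c(0, 1), c(5, 0), c(5, 2))
clusters <- c(1, 1, 2, 2)
dunn <- dunn_index(x, clusters)
print(dunn)
```

Higher values indicate compact, well-separated clusters. Here the closest pair across clusters is 5 apart and the widest cluster has diameter 2, giving 2.5.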

To obtain the best evaluation for each algorithm:


df_best_validation_external <- Clustering::evaluate_best_validation_external_by_metrics(df$result)
Algorithm Distance timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index
gmm gmm_euclidean 0.019 0.433 4.7617 0.1822 0.451 0.2595 0.2867
gmm gmm_manhattan 0.0178 0.4555 4.7617 0.1822 0.451 0.2595 0.2867
kmeans_arma kmeans_arma 4e-04 0 0 0 0 0 0
kmeans_rcpp kmeans_rcpp 0.0161 0.4545 4.6267 0.1703 0.5 0.23 0.2461
mini_kmeans mini_kmeans 6e-04 0 0 0 0 0 0
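
The values in this output are consistent with taking, for each algorithm and distance, the column-wise maximum over the best-ranked rows. That is an observation about the printed numbers, not a statement of how the package is implemented. A base R sketch of that aggregation, using best-ranked rows copied from the tables above:

```r
# Best-ranked rows (Ranking == 1) for two algorithms, copied from the tables above.
top <- data.frame(
  Algorithm = rep(c("gmm", "kmeans_rcpp"), each = 3),
  Distance  = rep(c("gmm_euclidean", "kmeans_rcpp"), each = 3),
  Clusters  = rep(c(4, 5, 6), times = 2),
  precision = c(0.1822, 0.1637, 0.1744, 0.1697, 0.1663, 0.1703),
  recall    = c(0.451, 0.2865, 0.2791, 0.5, 0.5, 0.2458),
  f_measure = c(0.2595, 0.2084, 0.2147, 0.23, 0.2104, 0.2012)
)

# Column-wise maximum of each metric per algorithm and distance.
best_by_alg <- aggregate(cbind(precision, recall, f_measure) ~ Algorithm + Distance,
                         data = top, FUN = max)
print(best_by_alg)
```

Because each metric is maximized independently, one output row can combine values that come from different numbers of clusters.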

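Based on these results, the gmm algorithm performs best on most external metrics (precision, f_measure and fowlkes_mallows_index), although kmeans_rcpp achieves the highest recall.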

From the algorithm with the best rating we can select the most appropriate number of clusters.


df_result_external <- Clustering::result_external_algorithm_by_metric(df$result,"gmm")
Algorithm Clusters timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index
gmm 4 0.0181 0.3161 4.7617 0.1822 0.451 0.2595 0.2867
gmm 5 0.0174 0.4258 4.3626 0.167 0.2865 0.21 0.2173
gmm 6 0.019 0.4555 4.4385 0.1744 0.2791 0.2147 0.2206
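
One way to pick the number of clusters from this table is to choose the row that maximizes a metric of interest. The sketch below does this in base R on values copied from the output above. Using f_measure as the selection criterion is an assumption made for illustration; another metric may suit a given application better.

```r
# Per-cluster results for gmm, copied from the output above (subset of columns).
gmm_by_k <- data.frame(
  Clusters  = c(4, 5, 6),
  precision = c(0.1822, 0.167, 0.1744),
  recall    = c(0.451, 0.2865, 0.2791),
  f_measure = c(0.2595, 0.21, 0.2147)
)

metric <- "f_measure"  # assumption: the metric we choose to prioritize
best_k <- gmm_by_k$Clusters[which.max(gmm_by_k[[metric]])]
print(best_k)
```

With these values, 4 clusters gives the highest f_measure, and also the highest precision and recall.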

The same checks performed for the external evaluation metrics can also be performed for the internal evaluation metrics.


df_best_validation_internal <-   
  Clustering::evaluate_best_validation_internal_by_metrics(df$result)
Algorithm Distance timeInternal connectivity dunn silhouette
gmm gmm_euclidean 0.0065 51.4599 0.1646 0.25
gmm gmm_manhattan 0.0064 54.8667 0.1467 0.26
kmeans_arma kmeans_arma 0.007 57.6278 0.1619 0.26
kmeans_rcpp kmeans_rcpp 0.0062 74.7754 0.1741 0.23
mini_kmeans mini_kmeans 0.007 76.5341 0.1571 0.21

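In this case, the algorithm chosen depends on the evaluation metric you want to prioritize: for example, kmeans_rcpp has the highest dunn value, while gmm with the Manhattan distance and kmeans_arma have the highest silhouette.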