Clustering Package

Luis Alfonso Pérez Martos

2020-04-17

Clustering is a concise data model in which a set of data is partitioned into groups whose members are as similar as possible. A review of the clustering algorithms implemented in R shows a large number of packages that implement or improve an algorithm or a piece of functionality.

The Clustering package contains multiple implementations of algorithms such as: gmm, kmeans-arma, kmeans-rcpp, fuzzy_cm, fuzzy_gg, fuzzy_gk, hclust, apclusterk, aggExcluster, clara, daisy, diana, fanny, gama, mona, pam, pvclust, pvpick.

It can also use different similarity measures to calculate the distance between points, such as: Euclidean, Manhattan, Jaccard, Gower, Mahalanobis, Correlation and Minkowski.
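As a rough illustration of what some of these measures compute, the sketch below uses `dist()` from base R on two made-up points. It does not show how the Clustering package calculates distances internally; the data and variable names are assumptions chosen only for the example.

```r
# Two toy observations with three numeric features (made-up values).
x <- rbind(a = c(1, 2, 3),
           b = c(4, 6, 3))

# stats::dist() ships with base R; `method` selects the measure.
d_euclidean <- dist(x, method = "euclidean")         # sqrt(3^2 + 4^2 + 0^2)
d_manhattan <- dist(x, method = "manhattan")         # 3 + 4 + 0
d_minkowski <- dist(x, method = "minkowski", p = 3)  # (3^3 + 4^3 + 0^3)^(1/3)

print(c(euclidean = as.numeric(d_euclidean),
        manhattan = as.numeric(d_manhattan),
        minkowski = as.numeric(d_minkowski)))
```

The same pair of points gives a different distance under each measure, which is why the choice of measure can change the groups an algorithm finds.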

Furthermore, the package offers functions to:

Clustering

It is the main method of the package. The clustering method runs a set of clustering algorithms. To get information about its parameters, use ?function or help(function). Datasets can be loaded in two different ways:

Once the method has been executed, we obtain the results divided into four parts:


df <- Clustering::clustering(df = Clustering::basketball,  
                             packages = c("clusterr"), min = 4, max = 6)

Here we have a dataframe with the result of the execution. In it you can see all the algorithms, the similarity measures used, the variables classified in order of importance, the execution time of the algorithms and the evaluation metrics.

Algorithm Distance Clusters Dataset Ranking timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index connectivity dunn silhouette timeInternal
gmm gmm_euclidean 4 dataframe 1 0.0186 0.3161 4.7617 0.1822 0.451 0.2595 0.2867 34.0929 0.1646 0.23 0.0063
gmm gmm_euclidean 4 dataframe 2 0.0202 0.3085 4.7409 0.1113 0.4005 0.1742 0.2111 34.0929 0.1646 0.23 0.0068
gmm gmm_euclidean 4 dataframe 3 0.1455 0.0064 4.72 0 0 0 0 34.0929 0.1646 0.23 0.0073
gmm gmm_euclidean 4 dataframe 4 0.1875 0.0032 4.1425 0 0 0 0 34.0929 0.1646 0.23 0.0084
gmm gmm_euclidean 4 dataframe 5 0.359 0 3.6709 0 0 0 0 34.0929 0.1646 0.23 0.01
gmm gmm_euclidean 5 dataframe 1 0.0176 0.4175 4.3626 0.1637 0.2865 0.2084 0.2165 42.0794 0.1619 0.25 0.0064
gmm gmm_euclidean 5 dataframe 2 0.0218 0.3857 4.3463 0.1109 0.2823 0.1592 0.1769 42.0794 0.1619 0.25 0.0066
gmm gmm_euclidean 5 dataframe 3 0.1413 0.0064 4.3418 0 0 0 0 42.0794 0.1619 0.25 0.007
gmm gmm_euclidean 5 dataframe 4 0.1462 0.0032 4.321 0 0 0 0 42.0794 0.1619 0.25 0.0071
gmm gmm_euclidean 5 dataframe 5 0.1498 0 4.0224 0 0 0 0 42.0794 0.1619 0.25 0.0138
gmm gmm_euclidean 6 dataframe 1 0.0193 0.433 4.4385 0.1744 0.2791 0.2147 0.2206 51.4599 0.1619 0.23 0.0065
gmm gmm_euclidean 6 dataframe 2 0.0237 0.4209 4.1795 0.1062 0.2473 0.1486 0.1621 51.4599 0.1619 0.23 0.0066
gmm gmm_euclidean 6 dataframe 3 0.1446 0.0064 4.1586 0 0 0 0 51.4599 0.1619 0.23 0.0066
gmm gmm_euclidean 6 dataframe 4 0.1479 0.0032 4.1378 0 0 0 0 51.4599 0.1619 0.23 0.0066
gmm gmm_euclidean 6 dataframe 5 0.1501 0 3.954 0 0 0 0 51.4599 0.1619 0.23 0.0073
gmm gmm_manhattan 4 dataframe 1 0.014 0.3161 4.7617 0.1822 0.451 0.2595 0.2867 35.5869 0.1348 0.23 0.0065
gmm gmm_manhattan 4 dataframe 2 0.0174 0.3085 4.7409 0.1113 0.4005 0.1742 0.2111 35.5869 0.1348 0.23 0.0065
gmm gmm_manhattan 4 dataframe 3 0.1432 0.0064 4.72 0 0 0 0 35.5869 0.1348 0.23 0.0065
gmm gmm_manhattan 4 dataframe 4 0.1435 0.0032 4.1425 0 0 0 0 35.5869 0.1348 0.23 0.0066
gmm gmm_manhattan 4 dataframe 5 0.1572 0 3.6709 0 0 0 0 35.5869 0.1348 0.23 0.0069
gmm gmm_manhattan 5 dataframe 1 0.0165 0.4258 4.3496 0.167 0.2828 0.21 0.2173 46.8306 0.1322 0.26 0.0066
gmm gmm_manhattan 5 dataframe 2 0.0198 0.3892 4.3379 0.1114 0.2742 0.1584 0.1747 46.8306 0.1322 0.26 0.0067
gmm gmm_manhattan 5 dataframe 3 0.1421 0.0064 4.3171 0 0 0 0 46.8306 0.1322 0.26 0.0069
gmm gmm_manhattan 5 dataframe 4 0.1437 0.0032 4.2962 0 0 0 0 46.8306 0.1322 0.26 0.0069
gmm gmm_manhattan 5 dataframe 5 0.1474 0 4.0593 0 0 0 0 46.8306 0.1322 0.26 0.0074
gmm gmm_manhattan 6 dataframe 1 0.0187 0.4555 4.2975 0.1669 0.2608 0.2035 0.2085 54.8667 0.1467 0.25 0.0065
gmm gmm_manhattan 6 dataframe 2 0.0245 0.4052 4.1608 0.1148 0.2606 0.1594 0.173 54.8667 0.1467 0.25 0.0068
gmm gmm_manhattan 6 dataframe 3 0.144 0.0064 4.14 0 0 0 0 54.8667 0.1467 0.25 0.0068
gmm gmm_manhattan 6 dataframe 4 0.1472 0.0032 4.1191 0 0 0 0 54.8667 0.1467 0.25 0.0069
gmm gmm_manhattan 6 dataframe 5 0.1543 0 4.102 0 0 0 0 54.8667 0.1467 0.25 0.0087
kmeans_arma kmeans_arma 4 dataframe 1 4e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0066
kmeans_arma kmeans_arma 4 dataframe 2 4e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0067
kmeans_arma kmeans_arma 4 dataframe 3 4e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0067
kmeans_arma kmeans_arma 4 dataframe 4 4e-04 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0068
kmeans_arma kmeans_arma 4 dataframe 5 0.001 0 0 0 0 0 0 44.2103 0.1495 0.23 0.0069
kmeans_arma kmeans_arma 5 dataframe 1 4e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.0066
kmeans_arma kmeans_arma 5 dataframe 2 4e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.007
kmeans_arma kmeans_arma 5 dataframe 3 4e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.0072
kmeans_arma kmeans_arma 5 dataframe 4 4e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.0073
kmeans_arma kmeans_arma 5 dataframe 5 5e-04 0 0 0 0 0 0 49.2159 0.1538 0.26 0.0075
kmeans_arma kmeans_arma 6 dataframe 1 4e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.0072
kmeans_arma kmeans_arma 6 dataframe 2 4e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.0073
kmeans_arma kmeans_arma 6 dataframe 3 4e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.0073
kmeans_arma kmeans_arma 6 dataframe 4 4e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.0075
kmeans_arma kmeans_arma 6 dataframe 5 4e-04 0 0 0 0 0 0 57.6278 0.1619 0.24 0.0076
kmeans_rcpp kmeans_rcpp 4 dataframe 1 0.0132 0.3728 4.6267 0.1697 0.5 0.23 0.2461 51.0405 0.1741 0.23 0.0064
kmeans_rcpp kmeans_rcpp 4 dataframe 2 0.0184 0.3494 4.6058 0.1003 0.3567 0.1511 0.1753 51.0405 0.1741 0.23 0.0065
kmeans_rcpp kmeans_rcpp 4 dataframe 3 0.138 0.0032 4.6058 9e-04 0.3065 0.0018 0.021 51.0405 0.1741 0.23 0.0066
kmeans_rcpp kmeans_rcpp 4 dataframe 4 0.1444 0.0032 4.5308 0 0 0 0 51.0405 0.1741 0.23 0.007
kmeans_rcpp kmeans_rcpp 4 dataframe 5 0.1519 0 3.8037 0 0 0 0 51.0405 0.1741 0.23 0.0072
kmeans_rcpp kmeans_rcpp 5 dataframe 1 0.0154 0.4269 4.5505 0.1663 0.5 0.2104 0.2183 66.8492 0.152 0.19 0.0064
kmeans_rcpp kmeans_rcpp 5 dataframe 2 0.0199 0.4135 4.3288 0.1019 0.2865 0.1457 0.1613 66.8492 0.152 0.19 0.0064
kmeans_rcpp kmeans_rcpp 5 dataframe 3 0.1411 0.0032 4.308 0.0011 0.2554 0.0022 0.0232 66.8492 0.152 0.19 0.0066
kmeans_rcpp kmeans_rcpp 5 dataframe 4 0.1436 0.0032 4.308 0 0 0 0 66.8492 0.152 0.19 0.0066
kmeans_rcpp kmeans_rcpp 5 dataframe 5 0.1448 0 4.0788 0 0 0 0 66.8492 0.152 0.19 0.0066
kmeans_rcpp kmeans_rcpp 6 dataframe 1 0.0167 0.4545 4.3312 0.1703 0.2458 0.2012 0.2046 74.7754 0.1522 0.19 0.0065
kmeans_rcpp kmeans_rcpp 6 dataframe 2 0.0209 0.4169 4.1035 0.1152 0.2419 0.1561 0.167 74.7754 0.1522 0.19 0.0065
kmeans_rcpp kmeans_rcpp 6 dataframe 3 0.1412 0.0064 4.0827 0 0 0 0 74.7754 0.1522 0.19 0.0065
kmeans_rcpp kmeans_rcpp 6 dataframe 4 0.1453 0.0032 4.0619 0 0 0 0 74.7754 0.1522 0.19 0.0066
kmeans_rcpp kmeans_rcpp 6 dataframe 5 0.1495 0 4.0375 0 0 0 0 74.7754 0.1522 0.19 0.0068
mini_kmeans mini_kmeans 4 dataframe 1 5e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0066
mini_kmeans mini_kmeans 4 dataframe 2 5e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0066
mini_kmeans mini_kmeans 4 dataframe 3 5e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0067
mini_kmeans mini_kmeans 4 dataframe 4 5e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0068
mini_kmeans mini_kmeans 4 dataframe 5 9e-04 0 0 0 0 0 0 50.3528 0.1571 0.21 0.0068
mini_kmeans mini_kmeans 5 dataframe 1 5e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0065
mini_kmeans mini_kmeans 5 dataframe 2 5e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0066
mini_kmeans mini_kmeans 5 dataframe 3 6e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0068
mini_kmeans mini_kmeans 5 dataframe 4 6e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0068
mini_kmeans mini_kmeans 5 dataframe 5 7e-04 0 0 0 0 0 0 76.3976 0.1216 0.17 0.0069
mini_kmeans mini_kmeans 6 dataframe 1 5e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0067
mini_kmeans mini_kmeans 6 dataframe 2 5e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0068
mini_kmeans mini_kmeans 6 dataframe 3 5e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0069
mini_kmeans mini_kmeans 6 dataframe 4 5e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0069
mini_kmeans mini_kmeans 6 dataframe 5 5e-04 0 0 0 0 0 0 76.5341 0.15 0.17 0.0071
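Because the result is an ordinary data frame, it can be sorted and filtered with base R. The sketch below is only an illustration: `res` is a hand-built stand-in holding a few Ranking 1 rows and columns copied from the table above, not the object returned by the package.

```r
# Ranking-1 rows copied from the table above (subset of columns only).
res <- data.frame(
  Algorithm  = rep(c("gmm", "kmeans_rcpp"), each = 3),
  Distance   = rep(c("gmm_euclidean", "kmeans_rcpp"), each = 3),
  Clusters   = rep(4:6, times = 2),
  f_measure  = c(0.2595, 0.2084, 0.2147, 0.23, 0.2104, 0.2012),
  silhouette = c(0.23, 0.25, 0.23, 0.23, 0.19, 0.19),
  stringsAsFactors = FALSE
)

# Sort by an external metric, highest value first.
by_f <- res[order(res$f_measure, decreasing = TRUE), ]
print(by_f)

# Keep only the rows whose silhouette reaches a chosen threshold.
good_sil <- subset(res, silhouette >= 0.23)
print(good_sil)
```

The threshold 0.23 is arbitrary and serves only to show the filtering step.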

This property tells us whether an internal evaluation of the groups has been made:

#> [1] TRUE

This property tells us whether an external evaluation of the groups has been made:

#> [1] TRUE

Algorithms executed

#> [1] "gmm"         "kmeans_arma" "kmeans_rcpp" "mini_kmeans"

Similarity Metrics

#> [1] "gmm_euclidean" "gmm_manhattan" "kmeans_arma"   "kmeans_rcpp"  
#> [5] "mini_kmeans"

To obtain the ranked variables instead of the metric values, set the variables parameter to TRUE:


df_variable <- Clustering::clustering(df = Clustering::basketball,  
                             packages = c("clusterr"), min = 4, max = 6, variables = TRUE)
Algorithm Distance Clusters Dataset Ranking timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index connectivity dunn silhouette timeInternal
gmm gmm_euclidean 4 dataframe 1 5 2 3 2 2 2 2 1 1 1 4
gmm gmm_euclidean 4 dataframe 2 1 4 5 4 4 4 4 2 2 2 5
gmm gmm_euclidean 4 dataframe 3 4 3 1 1 1 1 1 3 3 3 1
gmm gmm_euclidean 4 dataframe 4 2 5 4 3 3 3 3 4 4 4 2
gmm gmm_euclidean 4 dataframe 5 3 1 2 5 5 5 5 5 5 5 3
gmm gmm_euclidean 5 dataframe 1 4 2 3 2 2 2 2 1 1 1 1
gmm gmm_euclidean 5 dataframe 2 1 4 4 4 4 4 4 2 2 2 5
gmm gmm_euclidean 5 dataframe 3 3 3 5 1 1 1 1 3 3 3 2
gmm gmm_euclidean 5 dataframe 4 2 5 1 3 3 3 3 4 4 4 4
gmm gmm_euclidean 5 dataframe 5 5 1 2 5 5 5 5 5 5 5 3
gmm gmm_euclidean 6 dataframe 1 3 2 4 2 2 2 2 1 1 1 3
gmm gmm_euclidean 6 dataframe 2 1 4 3 4 4 4 4 2 2 2 4
gmm gmm_euclidean 6 dataframe 3 5 3 5 1 1 1 1 3 3 3 2
gmm gmm_euclidean 6 dataframe 4 2 5 1 3 3 3 3 4 4 4 5
gmm gmm_euclidean 6 dataframe 5 4 1 2 5 5 5 5 5 5 5 1
gmm gmm_manhattan 4 dataframe 1 3 2 3 2 2 2 2 1 1 1 3
gmm gmm_manhattan 4 dataframe 2 1 4 5 4 4 4 4 2 2 2 5
gmm gmm_manhattan 4 dataframe 3 4 3 1 1 1 1 1 3 3 3 4
gmm gmm_manhattan 4 dataframe 4 2 5 4 3 3 3 3 4 4 4 2
gmm gmm_manhattan 4 dataframe 5 5 1 2 5 5 5 5 5 5 5 1
gmm gmm_manhattan 5 dataframe 1 3 2 4 2 2 2 2 1 1 1 3
gmm gmm_manhattan 5 dataframe 2 1 4 3 4 4 4 4 2 2 2 4
gmm gmm_manhattan 5 dataframe 3 4 3 5 1 1 1 1 3 3 3 2
gmm gmm_manhattan 5 dataframe 4 2 5 1 3 3 3 3 4 4 4 1
gmm gmm_manhattan 5 dataframe 5 5 1 2 5 5 5 5 5 5 5 5
gmm gmm_manhattan 6 dataframe 1 4 2 4 2 4 2 2 1 1 1 1
gmm gmm_manhattan 6 dataframe 2 2 4 3 4 2 4 4 2 2 2 2
gmm gmm_manhattan 6 dataframe 3 3 3 5 1 1 1 1 3 3 3 4
gmm gmm_manhattan 6 dataframe 4 1 5 1 3 3 3 3 4 4 4 5
gmm gmm_manhattan 6 dataframe 5 5 1 2 5 5 5 5 5 5 5 3
kmeans_arma kmeans_arma 4 dataframe 1 1 1 1 1 1 1 1 1 1 1 2
kmeans_arma kmeans_arma 4 dataframe 2 4 2 2 2 2 2 2 2 2 2 3
kmeans_arma kmeans_arma 4 dataframe 3 5 3 3 3 3 3 3 3 3 3 1
kmeans_arma kmeans_arma 4 dataframe 4 3 4 4 4 4 4 4 4 4 4 5
kmeans_arma kmeans_arma 4 dataframe 5 2 5 5 5 5 5 5 5 5 5 4
kmeans_arma kmeans_arma 5 dataframe 1 4 1 1 1 1 1 1 1 1 1 1
kmeans_arma kmeans_arma 5 dataframe 2 3 2 2 2 2 2 2 2 2 2 2
kmeans_arma kmeans_arma 5 dataframe 3 1 3 3 3 3 3 3 3 3 3 3
kmeans_arma kmeans_arma 5 dataframe 4 2 4 4 4 4 4 4 4 4 4 4
kmeans_arma kmeans_arma 5 dataframe 5 5 5 5 5 5 5 5 5 5 5 5
kmeans_arma kmeans_arma 6 dataframe 1 2 1 1 1 1 1 1 1 1 1 2
kmeans_arma kmeans_arma 6 dataframe 2 1 2 2 2 2 2 2 2 2 2 3
kmeans_arma kmeans_arma 6 dataframe 3 3 3 3 3 3 3 3 3 3 3 4
kmeans_arma kmeans_arma 6 dataframe 4 5 4 4 4 4 4 4 4 4 4 1
kmeans_arma kmeans_arma 6 dataframe 5 4 5 5 5 5 5 5 5 5 5 5
kmeans_rcpp kmeans_rcpp 4 dataframe 1 5 4 5 2 3 2 2 1 1 1 1
kmeans_rcpp kmeans_rcpp 4 dataframe 2 1 2 1 4 2 4 4 2 2 2 4
kmeans_rcpp kmeans_rcpp 4 dataframe 3 4 3 3 3 4 3 3 3 3 3 2
kmeans_rcpp kmeans_rcpp 4 dataframe 4 2 5 4 1 1 1 1 4 4 4 3
kmeans_rcpp kmeans_rcpp 4 dataframe 5 3 1 2 5 5 5 5 5 5 5 5
kmeans_rcpp kmeans_rcpp 5 dataframe 1 5 2 4 2 3 2 2 1 1 1 1
kmeans_rcpp kmeans_rcpp 5 dataframe 2 1 4 5 4 2 4 4 2 2 2 4
kmeans_rcpp kmeans_rcpp 5 dataframe 3 4 3 1 3 4 3 3 3 3 3 3
kmeans_rcpp kmeans_rcpp 5 dataframe 4 2 5 3 1 1 1 1 4 4 4 2
kmeans_rcpp kmeans_rcpp 5 dataframe 5 3 1 2 5 5 5 5 5 5 5 5
kmeans_rcpp kmeans_rcpp 6 dataframe 1 3 2 4 2 2 2 2 1 1 1 3
kmeans_rcpp kmeans_rcpp 6 dataframe 2 1 4 3 4 4 4 4 2 2 2 4
kmeans_rcpp kmeans_rcpp 6 dataframe 3 4 3 5 1 1 1 1 3 3 3 5
kmeans_rcpp kmeans_rcpp 6 dataframe 4 2 5 1 3 3 3 3 4 4 4 1
kmeans_rcpp kmeans_rcpp 6 dataframe 5 5 1 2 5 5 5 5 5 5 5 2
mini_kmeans mini_kmeans 4 dataframe 1 2 1 1 1 1 1 1 1 1 1 1
mini_kmeans mini_kmeans 4 dataframe 2 3 2 2 2 2 2 2 2 2 2 2
mini_kmeans mini_kmeans 4 dataframe 3 4 3 3 3 3 3 3 3 3 3 3
mini_kmeans mini_kmeans 4 dataframe 4 1 4 4 4 4 4 4 4 4 4 4
mini_kmeans mini_kmeans 4 dataframe 5 5 5 5 5 5 5 5 5 5 5 5
mini_kmeans mini_kmeans 5 dataframe 1 1 1 1 1 1 1 1 1 1 1 3
mini_kmeans mini_kmeans 5 dataframe 2 2 2 2 2 2 2 2 2 2 2 2
mini_kmeans mini_kmeans 5 dataframe 3 4 3 3 3 3 3 3 3 3 3 1
mini_kmeans mini_kmeans 5 dataframe 4 3 4 4 4 4 4 4 4 4 4 4
mini_kmeans mini_kmeans 5 dataframe 5 5 5 5 5 5 5 5 5 5 5 5
mini_kmeans mini_kmeans 6 dataframe 1 2 1 1 1 1 1 1 1 1 1 1
mini_kmeans mini_kmeans 6 dataframe 2 3 2 2 2 2 2 2 2 2 2 3
mini_kmeans mini_kmeans 6 dataframe 3 1 3 3 3 3 3 3 3 3 3 2
mini_kmeans mini_kmeans 6 dataframe 4 5 4 4 4 4 4 4 4 4 4 5
mini_kmeans mini_kmeans 6 dataframe 5 4 5 5 5 5 5 5 5 5 5 4

To obtain only the best-ranked variables or values for the external metrics, run the following method:


df_best_ranked_external <- Clustering::best_ranked_external_metrics(df$result)
Algorithm Distance Clusters Dataset Ranking timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index
gmm gmm_euclidean 4 dataframe 1 0.0186 0.3161 4.7617 0.1822 0.451 0.2595 0.2867
gmm gmm_euclidean 5 dataframe 1 0.0176 0.4175 4.3626 0.1637 0.2865 0.2084 0.2165
gmm gmm_euclidean 6 dataframe 1 0.0193 0.433 4.4385 0.1744 0.2791 0.2147 0.2206
gmm gmm_manhattan 4 dataframe 1 0.014 0.3161 4.7617 0.1822 0.451 0.2595 0.2867
gmm gmm_manhattan 5 dataframe 1 0.0165 0.4258 4.3496 0.167 0.2828 0.21 0.2173
gmm gmm_manhattan 6 dataframe 1 0.0187 0.4555 4.2975 0.1669 0.2608 0.2035 0.2085
kmeans_arma kmeans_arma 4 dataframe 1 4e-04 0 0 0 0 0 0
kmeans_arma kmeans_arma 5 dataframe 1 4e-04 0 0 0 0 0 0
kmeans_arma kmeans_arma 6 dataframe 1 4e-04 0 0 0 0 0 0
kmeans_rcpp kmeans_rcpp 4 dataframe 1 0.0132 0.3728 4.6267 0.1697 0.5 0.23 0.2461
kmeans_rcpp kmeans_rcpp 5 dataframe 1 0.0154 0.4269 4.5505 0.1663 0.5 0.2104 0.2183
kmeans_rcpp kmeans_rcpp 6 dataframe 1 0.0167 0.4545 4.3312 0.1703 0.2458 0.2012 0.2046
mini_kmeans mini_kmeans 4 dataframe 1 5e-04 0 0 0 0 0 0
mini_kmeans mini_kmeans 5 dataframe 1 5e-04 0 0 0 0 0 0
mini_kmeans mini_kmeans 6 dataframe 1 5e-04 0 0 0 0 0 0

We can also obtain the best-ranked values for the internal evaluation:


df_best_ranked_internal <- Clustering::best_ranked_internal_metrics(df$result)
Algorithm Distance Clusters Dataset Ranking timeInternal connectivity dunn silhouette
gmm gmm_euclidean 4 dataframe 1 0.0063 34.0929 0.1646 0.23
gmm gmm_euclidean 5 dataframe 1 0.0064 42.0794 0.1619 0.25
gmm gmm_euclidean 6 dataframe 1 0.0065 51.4599 0.1619 0.23
gmm gmm_manhattan 4 dataframe 1 0.0065 35.5869 0.1348 0.23
gmm gmm_manhattan 5 dataframe 1 0.0066 46.8306 0.1322 0.26
gmm gmm_manhattan 6 dataframe 1 0.0065 54.8667 0.1467 0.25
kmeans_arma kmeans_arma 4 dataframe 1 0.0066 44.2103 0.1495 0.23
kmeans_arma kmeans_arma 5 dataframe 1 0.0066 49.2159 0.1538 0.26
kmeans_arma kmeans_arma 6 dataframe 1 0.0072 57.6278 0.1619 0.24
kmeans_rcpp kmeans_rcpp 4 dataframe 1 0.0064 51.0405 0.1741 0.23
kmeans_rcpp kmeans_rcpp 5 dataframe 1 0.0064 66.8492 0.152 0.19
kmeans_rcpp kmeans_rcpp 6 dataframe 1 0.0065 74.7754 0.1522 0.19
mini_kmeans mini_kmeans 4 dataframe 1 0.0066 50.3528 0.1571 0.21
mini_kmeans mini_kmeans 5 dataframe 1 0.0065 76.3976 0.1216 0.17
mini_kmeans mini_kmeans 6 dataframe 1 0.0067 76.5341 0.15 0.17

To obtain the best evaluation by algorithm:


df_best_validation_external <- Clustering::evaluate_best_validation_external_by_metrics(df$result)
Algorithm Distance timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index
gmm gmm_euclidean 0.0193 0.433 4.7617 0.1822 0.451 0.2595 0.2867
gmm gmm_manhattan 0.0187 0.4555 4.7617 0.1822 0.451 0.2595 0.2867
kmeans_arma kmeans_arma 4e-04 0 0 0 0 0 0
kmeans_rcpp kmeans_rcpp 0.0167 0.4545 4.6267 0.1703 0.5 0.23 0.2461
mini_kmeans mini_kmeans 5e-04 0 0 0 0 0 0

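Based on the results obtained we can see that the gmm algorithm behaves better: it reaches the highest precision, f_measure and fowlkes_mallows_index, while kmeans_rcpp only obtains the highest recall.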
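The values in the summary table above are consistent with taking, for each algorithm and distance, the column-wise maximum over the Ranking 1 rows. The sketch below reproduces that idea with base R on a few rows copied from the tables. It is an assumption about the aggregation, not the package's actual implementation, and `top`, `best` and `winners` are names introduced only for this example.

```r
# Ranking-1 rows for two configurations, copied from the tables above.
top <- data.frame(
  Algorithm = rep(c("gmm", "kmeans_rcpp"), each = 3),
  Distance  = rep(c("gmm_euclidean", "kmeans_rcpp"), each = 3),
  Clusters  = rep(4:6, times = 2),
  precision = c(0.1822, 0.1637, 0.1744, 0.1697, 0.1663, 0.1703),
  recall    = c(0.451, 0.2865, 0.2791, 0.5, 0.5, 0.2458),
  f_measure = c(0.2595, 0.2084, 0.2147, 0.23, 0.2104, 0.2012),
  stringsAsFactors = FALSE
)

# Column-wise maximum per Algorithm/Distance pair.
best <- aggregate(cbind(precision, recall, f_measure) ~ Algorithm + Distance,
                  data = top, FUN = max)
print(best)

# For each metric, name the algorithm holding the highest value.
metrics <- c("precision", "recall", "f_measure")
winners <- sapply(metrics, function(m) {
  as.character(best$Algorithm)[which.max(best[[m]])]
})
print(winners)
```

On these rows gmm holds the highest precision and f_measure, and kmeans_rcpp the highest recall.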

From the algorithm with the best rating we can select the most appropriate number of clusters.


df_result_external <- Clustering::result_external_algorithm_by_metric(df$result,"gmm")
Algorithm Clusters timeExternal entropy variation_information precision recall f_measure fowlkes_mallows_index
gmm 4 0.0186 0.3161 4.7617 0.1822 0.451 0.2595 0.2867
gmm 5 0.0176 0.4258 4.3626 0.167 0.2865 0.21 0.2173
gmm 6 0.0193 0.4555 4.4385 0.1744 0.2791 0.2147 0.2206
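Reading the number of clusters off this table can also be done programmatically. The following is a minimal sketch using values copied from the gmm table above. `best_k` is a hypothetical helper, not a function of the package, and it assumes that a larger value of the chosen metric is preferable, which does not hold for every metric.

```r
# Values copied from the gmm table above (subset of columns only).
gmm <- data.frame(
  Clusters  = c(4, 5, 6),
  entropy   = c(0.3161, 0.4258, 0.4555),
  precision = c(0.1822, 0.167, 0.1744),
  f_measure = c(0.2595, 0.21, 0.2147)
)

# Hypothetical helper: the cluster count at which `metric` is largest.
best_k <- function(tbl, metric) tbl$Clusters[which.max(tbl[[metric]])]

k_f_measure <- best_k(gmm, "f_measure")
k_entropy   <- best_k(gmm, "entropy")
print(c(f_measure = k_f_measure, entropy = k_entropy))
```

Different metrics can point to different cluster counts, so the choice depends on which metric matters for the analysis.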

The same checks performed for the external evaluation metrics can also be performed for the internal ones.


df_best_validation_internal <-   
  Clustering::evaluate_best_validation_internal_by_metrics(df$result)
Algorithm Distance timeInternal connectivity dunn silhouette
gmm gmm_euclidean 0.0065 51.4599 0.1646 0.25
gmm gmm_manhattan 0.0066 54.8667 0.1467 0.26
kmeans_arma kmeans_arma 0.0072 57.6278 0.1619 0.26
kmeans_rcpp kmeans_rcpp 0.0065 74.7754 0.1741 0.23
mini_kmeans mini_kmeans 0.0067 76.5341 0.1571 0.21

In this case we can see that the algorithm chosen depends on the metric we want to evaluate.

To plot any metric as a function of the number of clusters and the algorithm, we can proceed as follows, depending on whether the evaluation metric is internal or external:


Clustering::plot_external_validation(df,"variation_information")
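For readers who want to see what such a chart contains, the sketch below draws a comparable plot with base R graphics from a few values copied from the results table. It is not the package's plotting code, and the output of `plot_external_validation` may look different. `vi` and `m` are names introduced only for this example.

```r
# variation_information of the Ranking-1 rows, copied from the results table.
vi <- data.frame(
  Algorithm = rep(c("gmm_euclidean", "kmeans_rcpp"), each = 3),
  Clusters  = rep(4:6, times = 2),
  variation_information = c(4.7617, 4.3626, 4.4385, 4.6267, 4.5505, 4.3312)
)

# Reshape to a Clusters x Algorithm matrix (one value per cell).
m <- tapply(vi$variation_information,
            list(Clusters = vi$Clusters, Algorithm = vi$Algorithm),
            mean)

# One line per algorithm, number of clusters on the x axis.
matplot(as.numeric(rownames(m)), m, type = "b", pch = 1:2, lty = 1, col = 1:2,
        xlab = "Clusters", ylab = "variation_information")
legend("topright", legend = colnames(m), pch = 1:2, lty = 1, col = 1:2)
```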