Clustering is a concise data model in which a set of data is partitioned into groups whose members are as similar as possible. A review of the clustering algorithms implemented in R shows a great number of packages that implement or improve an algorithm or a piece of functionality.
The Clustering package contains multiple implementations of algorithms such as: gmm, kmeans-arma, kmeans-rcpp, fuzzy_cm, fuzzy_gg, fuzzy_gk, hclust, apclusterk, aggExcluster, clara, daisy, diana, fanny, gama, mona, pam, pvclust, pvpick.
It can also use different similarity measures to calculate the distance between points, such as: Euclidean, Manhattan, Jaccard, Gower, Mahalanobis, Correlation and Minkowski.
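To make the idea of a distance measure concrete, here is a minimal sketch that computes three of the listed distances between two points. It uses `dist()` from base R's stats package rather than the Clustering package itself, and the points `pts` are made-up values chosen for illustration.

```r
# Two illustrative points in the plane (made-up values, one point per row).
pts <- rbind(a = c(0, 0), b = c(3, 4))

# stats::dist() returns the pairwise distances between the rows of a matrix.
d_euclidean <- dist(pts, method = "euclidean")         # sqrt(3^2 + 4^2) = 5
d_manhattan <- dist(pts, method = "manhattan")         # |3| + |4| = 7
d_minkowski <- dist(pts, method = "minkowski", p = 3)  # (3^3 + 4^3)^(1/3)

print(c(euclidean = as.numeric(d_euclidean),
        manhattan = as.numeric(d_manhattan),
        minkowski = as.numeric(d_minkowski)))
```

The same pair of points is 5 units apart under the Euclidean measure and 7 under Manhattan, which is why the choice of measure can change which points end up grouped together.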
Furthermore, the package offers functions to:
This is the main method of the package. The clustering method runs a set of clustering algorithms. To get information about the parameters of the method, use ?function or help(function). Datasets can be loaded in two different ways:
Once the method has been executed, we obtain the results divided into four parts:
df <- Clustering::clustering(df = Clustering::basketball,
                             packages = c("clusterr"), min = 4, max = 6)
Here we have a data frame with the results of the execution. It lists all the algorithms, the similarity measures used, the variables ranked in order of importance, the execution time of the algorithms and the evaluation metrics.
Algorithm | Distance | Clusters | Dataset | Ranking | timeExternal | entropy | variation_information | precision | recall | f_measure | fowlkes_mallows_index | connectivity | dunn | silhouette | timeInternal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gmm | gmm_euclidean | 4 | dataframe | 1 | 0.0181 | 0.3161 | 4.7617 | 0.1822 | 0.451 | 0.2595 | 0.2867 | 34.0929 | 0.1646 | 0.23 | 0.0065 |
gmm | gmm_euclidean | 4 | dataframe | 2 | 0.0187 | 0.3085 | 4.7409 | 0.1113 | 0.4005 | 0.1742 | 0.2111 | 34.0929 | 0.1646 | 0.23 | 0.0068 |
gmm | gmm_euclidean | 4 | dataframe | 3 | 0.1438 | 0.0064 | 4.72 | 0 | 0 | 0 | 0 | 34.0929 | 0.1646 | 0.23 | 0.0082 |
gmm | gmm_euclidean | 4 | dataframe | 4 | 0.1804 | 0.0032 | 4.1425 | 0 | 0 | 0 | 0 | 34.0929 | 0.1646 | 0.23 | 0.0084 |
gmm | gmm_euclidean | 4 | dataframe | 5 | 0.3559 | 0 | 3.6709 | 0 | 0 | 0 | 0 | 34.0929 | 0.1646 | 0.23 | 0.0098 |
gmm | gmm_euclidean | 5 | dataframe | 1 | 0.0174 | 0.4175 | 4.3626 | 0.1637 | 0.2865 | 0.2084 | 0.2165 | 42.0794 | 0.1619 | 0.25 | 0.0065 |
gmm | gmm_euclidean | 5 | dataframe | 2 | 0.0244 | 0.3857 | 4.3463 | 0.1109 | 0.2823 | 0.1592 | 0.1769 | 42.0794 | 0.1619 | 0.25 | 0.0066 |
gmm | gmm_euclidean | 5 | dataframe | 3 | 0.1424 | 0.0064 | 4.3418 | 0 | 0 | 0 | 0 | 42.0794 | 0.1619 | 0.25 | 0.0067 |
gmm | gmm_euclidean | 5 | dataframe | 4 | 0.1434 | 0.0032 | 4.321 | 0 | 0 | 0 | 0 | 42.0794 | 0.1619 | 0.25 | 0.0081 |
gmm | gmm_euclidean | 5 | dataframe | 5 | 0.1467 | 0 | 4.0224 | 0 | 0 | 0 | 0 | 42.0794 | 0.1619 | 0.25 | 0.0128 |
gmm | gmm_euclidean | 6 | dataframe | 1 | 0.019 | 0.433 | 4.4385 | 0.1744 | 0.2791 | 0.2147 | 0.2206 | 51.4599 | 0.1619 | 0.23 | 0.0063 |
gmm | gmm_euclidean | 6 | dataframe | 2 | 0.0233 | 0.4209 | 4.1795 | 0.1062 | 0.2473 | 0.1486 | 0.1621 | 51.4599 | 0.1619 | 0.23 | 0.0063 |
gmm | gmm_euclidean | 6 | dataframe | 3 | 0.142 | 0.0064 | 4.1586 | 0 | 0 | 0 | 0 | 51.4599 | 0.1619 | 0.23 | 0.0064 |
gmm | gmm_euclidean | 6 | dataframe | 4 | 0.149 | 0.0032 | 4.1378 | 0 | 0 | 0 | 0 | 51.4599 | 0.1619 | 0.23 | 0.0065 |
gmm | gmm_euclidean | 6 | dataframe | 5 | 0.1552 | 0 | 3.954 | 0 | 0 | 0 | 0 | 51.4599 | 0.1619 | 0.23 | 0.0065 |
gmm | gmm_manhattan | 4 | dataframe | 1 | 0.0135 | 0.3161 | 4.7617 | 0.1822 | 0.451 | 0.2595 | 0.2867 | 35.5869 | 0.1348 | 0.23 | 0.0064 |
gmm | gmm_manhattan | 4 | dataframe | 2 | 0.0169 | 0.3085 | 4.7409 | 0.1113 | 0.4005 | 0.1742 | 0.2111 | 35.5869 | 0.1348 | 0.23 | 0.0064 |
gmm | gmm_manhattan | 4 | dataframe | 3 | 0.1378 | 0.0064 | 4.72 | 0 | 0 | 0 | 0 | 35.5869 | 0.1348 | 0.23 | 0.0065 |
gmm | gmm_manhattan | 4 | dataframe | 4 | 0.1432 | 0.0032 | 4.1425 | 0 | 0 | 0 | 0 | 35.5869 | 0.1348 | 0.23 | 0.0065 |
gmm | gmm_manhattan | 4 | dataframe | 5 | 0.1434 | 0 | 3.6709 | 0 | 0 | 0 | 0 | 35.5869 | 0.1348 | 0.23 | 0.0065 |
gmm | gmm_manhattan | 5 | dataframe | 1 | 0.0162 | 0.4258 | 4.3496 | 0.167 | 0.2828 | 0.21 | 0.2173 | 46.8306 | 0.1322 | 0.26 | 0.0064 |
gmm | gmm_manhattan | 5 | dataframe | 2 | 0.0195 | 0.3892 | 4.3379 | 0.1114 | 0.2742 | 0.1584 | 0.1747 | 46.8306 | 0.1322 | 0.26 | 0.0064 |
gmm | gmm_manhattan | 5 | dataframe | 3 | 0.141 | 0.0064 | 4.3171 | 0 | 0 | 0 | 0 | 46.8306 | 0.1322 | 0.26 | 0.0064 |
gmm | gmm_manhattan | 5 | dataframe | 4 | 0.1456 | 0.0032 | 4.2962 | 0 | 0 | 0 | 0 | 46.8306 | 0.1322 | 0.26 | 0.0064 |
gmm | gmm_manhattan | 5 | dataframe | 5 | 0.1505 | 0 | 4.0593 | 0 | 0 | 0 | 0 | 46.8306 | 0.1322 | 0.26 | 0.0065 |
gmm | gmm_manhattan | 6 | dataframe | 1 | 0.0178 | 0.4555 | 4.2975 | 0.1669 | 0.2608 | 0.2035 | 0.2085 | 54.8667 | 0.1467 | 0.25 | 0.0064 |
gmm | gmm_manhattan | 6 | dataframe | 2 | 0.0211 | 0.4052 | 4.1608 | 0.1148 | 0.2606 | 0.1594 | 0.173 | 54.8667 | 0.1467 | 0.25 | 0.0065 |
gmm | gmm_manhattan | 6 | dataframe | 3 | 0.1423 | 0.0064 | 4.14 | 0 | 0 | 0 | 0 | 54.8667 | 0.1467 | 0.25 | 0.0068 |
gmm | gmm_manhattan | 6 | dataframe | 4 | 0.1465 | 0.0032 | 4.1191 | 0 | 0 | 0 | 0 | 54.8667 | 0.1467 | 0.25 | 0.0068 |
gmm | gmm_manhattan | 6 | dataframe | 5 | 0.1474 | 0 | 4.102 | 0 | 0 | 0 | 0 | 54.8667 | 0.1467 | 0.25 | 0.007 |
kmeans_arma | kmeans_arma | 4 | dataframe | 1 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 44.2103 | 0.1495 | 0.23 | 0.0063 |
kmeans_arma | kmeans_arma | 4 | dataframe | 2 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 44.2103 | 0.1495 | 0.23 | 0.0066 |
kmeans_arma | kmeans_arma | 4 | dataframe | 3 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 44.2103 | 0.1495 | 0.23 | 0.0066 |
kmeans_arma | kmeans_arma | 4 | dataframe | 4 | 6e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 44.2103 | 0.1495 | 0.23 | 0.007 |
kmeans_arma | kmeans_arma | 4 | dataframe | 5 | 9e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 44.2103 | 0.1495 | 0.23 | 0.0074 |
kmeans_arma | kmeans_arma | 5 | dataframe | 1 | 3e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 49.2159 | 0.1538 | 0.26 | 0.0066 |
kmeans_arma | kmeans_arma | 5 | dataframe | 2 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 49.2159 | 0.1538 | 0.26 | 0.0066 |
kmeans_arma | kmeans_arma | 5 | dataframe | 3 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 49.2159 | 0.1538 | 0.26 | 0.0067 |
kmeans_arma | kmeans_arma | 5 | dataframe | 4 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 49.2159 | 0.1538 | 0.26 | 0.007 |
kmeans_arma | kmeans_arma | 5 | dataframe | 5 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 49.2159 | 0.1538 | 0.26 | 0.0071 |
kmeans_arma | kmeans_arma | 6 | dataframe | 1 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 57.6278 | 0.1619 | 0.24 | 0.007 |
kmeans_arma | kmeans_arma | 6 | dataframe | 2 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 57.6278 | 0.1619 | 0.24 | 0.007 |
kmeans_arma | kmeans_arma | 6 | dataframe | 3 | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 57.6278 | 0.1619 | 0.24 | 0.0071 |
kmeans_arma | kmeans_arma | 6 | dataframe | 4 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 57.6278 | 0.1619 | 0.24 | 0.0071 |
kmeans_arma | kmeans_arma | 6 | dataframe | 5 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 57.6278 | 0.1619 | 0.24 | 0.009 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 1 | 0.013 | 0.3728 | 4.6267 | 0.1697 | 0.5 | 0.23 | 0.2461 | 51.0405 | 0.1741 | 0.23 | 0.0062 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 2 | 0.0176 | 0.3494 | 4.6058 | 0.1003 | 0.3567 | 0.1511 | 0.1753 | 51.0405 | 0.1741 | 0.23 | 0.0064 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 3 | 0.1355 | 0.0032 | 4.6058 | 9e-04 | 0.3065 | 0.0018 | 0.021 | 51.0405 | 0.1741 | 0.23 | 0.0064 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 4 | 0.1454 | 0.0032 | 4.5308 | 0 | 0 | 0 | 0 | 51.0405 | 0.1741 | 0.23 | 0.0065 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 5 | 0.1501 | 0 | 3.8037 | 0 | 0 | 0 | 0 | 51.0405 | 0.1741 | 0.23 | 0.0066 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 1 | 0.0152 | 0.4269 | 4.5505 | 0.1663 | 0.5 | 0.2104 | 0.2183 | 66.8492 | 0.152 | 0.19 | 0.0062 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 2 | 0.0197 | 0.4135 | 4.3288 | 0.1019 | 0.2865 | 0.1457 | 0.1613 | 66.8492 | 0.152 | 0.19 | 0.0063 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 3 | 0.1393 | 0.0032 | 4.308 | 0.0011 | 0.2554 | 0.0022 | 0.0232 | 66.8492 | 0.152 | 0.19 | 0.0063 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 4 | 0.1412 | 0.0032 | 4.308 | 0 | 0 | 0 | 0 | 66.8492 | 0.152 | 0.19 | 0.0064 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 5 | 0.146 | 0 | 4.0788 | 0 | 0 | 0 | 0 | 66.8492 | 0.152 | 0.19 | 0.0067 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 1 | 0.0161 | 0.4545 | 4.3312 | 0.1703 | 0.2458 | 0.2012 | 0.2046 | 74.7754 | 0.1522 | 0.19 | 0.0062 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 2 | 0.0204 | 0.4169 | 4.1035 | 0.1152 | 0.2419 | 0.1561 | 0.167 | 74.7754 | 0.1522 | 0.19 | 0.0062 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 3 | 0.1408 | 0.0064 | 4.0827 | 0 | 0 | 0 | 0 | 74.7754 | 0.1522 | 0.19 | 0.0062 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 4 | 0.149 | 0.0032 | 4.0619 | 0 | 0 | 0 | 0 | 74.7754 | 0.1522 | 0.19 | 0.0065 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 5 | 0.1496 | 0 | 4.0375 | 0 | 0 | 0 | 0 | 74.7754 | 0.1522 | 0.19 | 0.0067 |
mini_kmeans | mini_kmeans | 4 | dataframe | 1 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 50.3528 | 0.1571 | 0.21 | 0.0061 |
mini_kmeans | mini_kmeans | 4 | dataframe | 2 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 50.3528 | 0.1571 | 0.21 | 0.0063 |
mini_kmeans | mini_kmeans | 4 | dataframe | 3 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 50.3528 | 0.1571 | 0.21 | 0.0066 |
mini_kmeans | mini_kmeans | 4 | dataframe | 4 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 50.3528 | 0.1571 | 0.21 | 0.0066 |
mini_kmeans | mini_kmeans | 4 | dataframe | 5 | 9e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 50.3528 | 0.1571 | 0.21 | 0.0067 |
mini_kmeans | mini_kmeans | 5 | dataframe | 1 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.3976 | 0.1216 | 0.17 | 0.0065 |
mini_kmeans | mini_kmeans | 5 | dataframe | 2 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.3976 | 0.1216 | 0.17 | 0.0065 |
mini_kmeans | mini_kmeans | 5 | dataframe | 3 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.3976 | 0.1216 | 0.17 | 0.0065 |
mini_kmeans | mini_kmeans | 5 | dataframe | 4 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.3976 | 0.1216 | 0.17 | 0.0066 |
mini_kmeans | mini_kmeans | 5 | dataframe | 5 | 5e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.3976 | 0.1216 | 0.17 | 0.0067 |
mini_kmeans | mini_kmeans | 6 | dataframe | 1 | 6e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.5341 | 0.15 | 0.17 | 0.007 |
mini_kmeans | mini_kmeans | 6 | dataframe | 2 | 6e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.5341 | 0.15 | 0.17 | 0.0071 |
mini_kmeans | mini_kmeans | 6 | dataframe | 3 | 6e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.5341 | 0.15 | 0.17 | 0.0072 |
mini_kmeans | mini_kmeans | 6 | dataframe | 4 | 6e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.5341 | 0.15 | 0.17 | 0.0075 |
mini_kmeans | mini_kmeans | 6 | dataframe | 5 | 7e-04 | 0 | 0 | 0 | 0 | 0 | 0 | 76.5341 | 0.15 | 0.17 | 0.0081 |
This property tells us whether an internal evaluation of the groups has been performed:
#> [1] TRUE
This property tells us whether an external evaluation of the groups has been performed:
#> [1] TRUE
Algorithms executed
#> [1] "gmm" "kmeans_arma" "kmeans_rcpp" "mini_kmeans"
Similarity Metrics
#> [1] "gmm_euclidean" "gmm_manhattan" "kmeans_arma" "kmeans_rcpp"
#> [5] "mini_kmeans"
To obtain the ranked variables instead of the metric values, we must set the variables parameter to TRUE:
df_variable <- Clustering::clustering(df = Clustering::basketball,
                                      packages = c("clusterr"), min = 4, max = 6, variables = TRUE)
Algorithm | Distance | Clusters | Dataset | Ranking | timeExternal | entropy | variation_information | precision | recall | f_measure | fowlkes_mallows_index | connectivity | dunn | silhouette | timeInternal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gmm | gmm_euclidean | 4 | dataframe | 1 | 5 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 3 |
gmm | gmm_euclidean | 4 | dataframe | 2 | 1 | 4 | 5 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 5 |
gmm | gmm_euclidean | 4 | dataframe | 3 | 4 | 3 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 4 |
gmm | gmm_euclidean | 4 | dataframe | 4 | 2 | 5 | 4 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 1 |
gmm | gmm_euclidean | 4 | dataframe | 5 | 3 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 2 |
gmm | gmm_euclidean | 5 | dataframe | 1 | 5 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 3 |
gmm | gmm_euclidean | 5 | dataframe | 2 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 5 |
gmm | gmm_euclidean | 5 | dataframe | 3 | 4 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 2 |
gmm | gmm_euclidean | 5 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 |
gmm | gmm_euclidean | 5 | dataframe | 5 | 3 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 1 |
gmm | gmm_euclidean | 6 | dataframe | 1 | 3 | 2 | 4 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
gmm | gmm_euclidean | 6 | dataframe | 2 | 1 | 4 | 3 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 3 |
gmm | gmm_euclidean | 6 | dataframe | 3 | 5 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 2 |
gmm | gmm_euclidean | 6 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 |
gmm | gmm_euclidean | 6 | dataframe | 5 | 4 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
gmm | gmm_manhattan | 4 | dataframe | 1 | 3 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 5 |
gmm | gmm_manhattan | 4 | dataframe | 2 | 1 | 4 | 5 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 4 |
gmm | gmm_manhattan | 4 | dataframe | 3 | 4 | 3 | 1 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 2 |
gmm | gmm_manhattan | 4 | dataframe | 4 | 2 | 5 | 4 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 3 |
gmm | gmm_manhattan | 4 | dataframe | 5 | 5 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 1 |
gmm | gmm_manhattan | 5 | dataframe | 1 | 3 | 2 | 4 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
gmm | gmm_manhattan | 5 | dataframe | 2 | 1 | 4 | 3 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 4 |
gmm | gmm_manhattan | 5 | dataframe | 3 | 5 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 5 |
gmm | gmm_manhattan | 5 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 3 |
gmm | gmm_manhattan | 5 | dataframe | 5 | 4 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 2 |
gmm | gmm_manhattan | 6 | dataframe | 1 | 5 | 2 | 4 | 2 | 4 | 2 | 2 | 1 | 1 | 1 | 5 |
gmm | gmm_manhattan | 6 | dataframe | 2 | 2 | 4 | 3 | 4 | 2 | 4 | 4 | 2 | 2 | 2 | 3 |
gmm | gmm_manhattan | 6 | dataframe | 3 | 3 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 2 |
gmm | gmm_manhattan | 6 | dataframe | 4 | 1 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 |
gmm | gmm_manhattan | 6 | dataframe | 5 | 4 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 1 |
kmeans_arma | kmeans_arma | 4 | dataframe | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
kmeans_arma | kmeans_arma | 4 | dataframe | 2 | 4 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
kmeans_arma | kmeans_arma | 4 | dataframe | 3 | 1 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
kmeans_arma | kmeans_arma | 4 | dataframe | 4 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 |
kmeans_arma | kmeans_arma | 4 | dataframe | 5 | 3 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
kmeans_arma | kmeans_arma | 5 | dataframe | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
kmeans_arma | kmeans_arma | 5 | dataframe | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 5 |
kmeans_arma | kmeans_arma | 5 | dataframe | 3 | 5 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
kmeans_arma | kmeans_arma | 5 | dataframe | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 2 |
kmeans_arma | kmeans_arma | 5 | dataframe | 5 | 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
kmeans_arma | kmeans_arma | 6 | dataframe | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
kmeans_arma | kmeans_arma | 6 | dataframe | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 5 |
kmeans_arma | kmeans_arma | 6 | dataframe | 3 | 5 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
kmeans_arma | kmeans_arma | 6 | dataframe | 4 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 1 |
kmeans_arma | kmeans_arma | 6 | dataframe | 5 | 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 1 | 5 | 4 | 5 | 2 | 3 | 2 | 2 | 1 | 1 | 1 | 5 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 2 | 1 | 2 | 1 | 4 | 2 | 4 | 4 | 2 | 2 | 2 | 4 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 3 | 4 | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 3 | 2 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 4 | 2 | 5 | 4 | 1 | 1 | 1 | 1 | 4 | 4 | 4 | 1 |
kmeans_rcpp | kmeans_rcpp | 4 | dataframe | 5 | 3 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 1 | 5 | 2 | 4 | 2 | 3 | 2 | 2 | 1 | 1 | 1 | 5 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 2 | 1 | 4 | 5 | 4 | 2 | 4 | 4 | 2 | 2 | 2 | 4 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 3 | 4 | 3 | 1 | 3 | 4 | 3 | 3 | 3 | 3 | 3 | 2 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 4 | 2 | 5 | 3 | 1 | 1 | 1 | 1 | 4 | 4 | 4 | 1 |
kmeans_rcpp | kmeans_rcpp | 5 | dataframe | 5 | 3 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 1 | 5 | 2 | 4 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 2 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 2 | 1 | 4 | 3 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 5 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 3 | 3 | 3 | 5 | 1 | 1 | 1 | 1 | 3 | 3 | 3 | 3 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 4 | 2 | 5 | 1 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 1 |
kmeans_rcpp | kmeans_rcpp | 6 | dataframe | 5 | 4 | 1 | 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
mini_kmeans | mini_kmeans | 4 | dataframe | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
mini_kmeans | mini_kmeans | 4 | dataframe | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 5 |
mini_kmeans | mini_kmeans | 4 | dataframe | 3 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
mini_kmeans | mini_kmeans | 4 | dataframe | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
mini_kmeans | mini_kmeans | 4 | dataframe | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
mini_kmeans | mini_kmeans | 5 | dataframe | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 |
mini_kmeans | mini_kmeans | 5 | dataframe | 2 | 5 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 |
mini_kmeans | mini_kmeans | 5 | dataframe | 3 | 1 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 5 |
mini_kmeans | mini_kmeans | 5 | dataframe | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 1 |
mini_kmeans | mini_kmeans | 5 | dataframe | 5 | 3 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
mini_kmeans | mini_kmeans | 6 | dataframe | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4 |
mini_kmeans | mini_kmeans | 6 | dataframe | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 5 |
mini_kmeans | mini_kmeans | 6 | dataframe | 3 | 1 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 1 |
mini_kmeans | mini_kmeans | 6 | dataframe | 4 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 2 |
mini_kmeans | mini_kmeans | 6 | dataframe | 5 | 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 3 |
To obtain only the best-ranked variables or values for the external evaluation, we execute the following method:
We can also obtain the best-ranked values for the internal evaluation.
To obtain the best evaluation by algorithm:
Algorithm | Distance | timeExternal | entropy | variation_information | precision | recall | f_measure | fowlkes_mallows_index |
---|---|---|---|---|---|---|---|---|
gmm | gmm_euclidean | 0.019 | 0.433 | 4.7617 | 0.1822 | 0.451 | 0.2595 | 0.2867 |
gmm | gmm_manhattan | 0.0178 | 0.4555 | 4.7617 | 0.1822 | 0.451 | 0.2595 | 0.2867 |
kmeans_arma | kmeans_arma | 4e-04 | 0 | 0 | 0 | 0 | 0 | 0 |
kmeans_rcpp | kmeans_rcpp | 0.0161 | 0.4545 | 4.6267 | 0.1703 | 0.5 | 0.23 | 0.2461 |
mini_kmeans | mini_kmeans | 6e-04 | 0 | 0 | 0 | 0 | 0 | 0 |
Based on these results, the gmm algorithm behaves best overall, with the highest precision, f_measure and fowlkes_mallows_index.
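The per-algorithm summary above appears to keep, for each metric, the highest value found among the top-ranked rows of the first results table. A minimal base R sketch of that kind of summary follows, using only the Ranking 1 rows of gmm_euclidean and kmeans_rcpp copied from that table. It relies on `aggregate()` from the stats package and is an illustration, not necessarily how the package computes its output; `top_rows` and `best` are hypothetical names.

```r
# Ranking-1 rows copied from the first results table
# (gmm_euclidean and kmeans_rcpp only, three metrics).
top_rows <- data.frame(
  Algorithm = rep(c("gmm", "kmeans_rcpp"), each = 3),
  Distance  = rep(c("gmm_euclidean", "kmeans_rcpp"), each = 3),
  Clusters  = rep(c(4, 5, 6), times = 2),
  precision = c(0.1822, 0.1637, 0.1744, 0.1697, 0.1663, 0.1703),
  recall    = c(0.4510, 0.2865, 0.2791, 0.5000, 0.5000, 0.2458),
  f_measure = c(0.2595, 0.2084, 0.2147, 0.2300, 0.2104, 0.2012),
  stringsAsFactors = FALSE
)

# For every Algorithm/Distance pair, keep the column-wise maximum of each metric.
best <- aggregate(cbind(precision, recall, f_measure) ~ Algorithm + Distance,
                  data = top_rows, FUN = max)
print(best)
```

With these rows, the summary gives 0.1822, 0.451 and 0.2595 for gmm and 0.1703, 0.5 and 0.23 for kmeans_rcpp, which agrees with the corresponding rows of the table above.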
For the algorithm with the best rating, we can then select the most appropriate number of clusters:
Algorithm | Clusters | timeExternal | entropy | variation_information | precision | recall | f_measure | fowlkes_mallows_index |
---|---|---|---|---|---|---|---|---|
gmm | 4 | 0.0181 | 0.3161 | 4.7617 | 0.1822 | 0.451 | 0.2595 | 0.2867 |
gmm | 5 | 0.0174 | 0.4258 | 4.3626 | 0.167 | 0.2865 | 0.21 | 0.2173 |
gmm | 6 | 0.019 | 0.4555 | 4.4385 | 0.1744 | 0.2791 | 0.2147 | 0.2206 |
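As a rough sketch of how a number of clusters could be picked from the gmm table above, the snippet below copies three of its columns into a data frame and selects the row with the highest value of a chosen metric. Which metric to prioritise is an assumption made here for illustration, and `gmm_by_k`, `best_k_f` and `best_k_fm` are hypothetical names.

```r
# Rows copied from the gmm table above (three of its columns).
gmm_by_k <- data.frame(
  Clusters              = c(4, 5, 6),
  f_measure             = c(0.2595, 0.2100, 0.2147),
  fowlkes_mallows_index = c(0.2867, 0.2173, 0.2206)
)

# Number of clusters with the highest f_measure.
best_k_f <- gmm_by_k$Clusters[which.max(gmm_by_k$f_measure)]

# Number of clusters with the highest Fowlkes-Mallows index.
best_k_fm <- gmm_by_k$Clusters[which.max(gmm_by_k$fowlkes_mallows_index)]

print(c(f_measure = best_k_f, fowlkes_mallows_index = best_k_fm))
```

Both metrics point to 4 clusters for gmm on this dataset.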
The same checks performed for the external evaluation metrics can also be performed for the internal evaluation.
Algorithm | Distance | timeInternal | connectivity | dunn | silhouette |
---|---|---|---|---|---|
gmm | gmm_euclidean | 0.0065 | 51.4599 | 0.1646 | 0.25 |
gmm | gmm_manhattan | 0.0064 | 54.8667 | 0.1467 | 0.26 |
kmeans_arma | kmeans_arma | 0.007 | 57.6278 | 0.1619 | 0.26 |
kmeans_rcpp | kmeans_rcpp | 0.0062 | 74.7754 | 0.1741 | 0.23 |
mini_kmeans | mini_kmeans | 0.007 | 76.5341 | 0.1571 | 0.21 |
In this case, we can see that the algorithm chosen depends on the evaluation metric we want to prioritise.
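The dependence on the chosen metric can be checked directly against the internal evaluation table above. The sketch below copies its dunn and silhouette columns, for which higher values indicate better-separated clusters, and reports the winner under each one. The names `internal`, `best_dunn` and `best_silhouette` are hypothetical.

```r
# dunn and silhouette columns copied from the internal evaluation table above.
internal <- data.frame(
  Distance   = c("gmm_euclidean", "gmm_manhattan", "kmeans_arma",
                 "kmeans_rcpp", "mini_kmeans"),
  dunn       = c(0.1646, 0.1467, 0.1619, 0.1741, 0.1571),
  silhouette = c(0.25, 0.26, 0.26, 0.23, 0.21),
  stringsAsFactors = FALSE
)

# Winner by Dunn index (single maximum).
best_dunn <- internal$Distance[which.max(internal$dunn)]

# Winners by silhouette; ties are kept, so this may return several entries.
best_silhouette <- internal$Distance[internal$silhouette == max(internal$silhouette)]

print(best_dunn)
print(best_silhouette)
```

The Dunn index favours kmeans_rcpp, while the silhouette is tied between gmm_manhattan and kmeans_arma, so the two metrics lead to different choices.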