The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Same results as k-means. A fraction of the distance computations, memory, and energy.
Geometric-k-means (Gk-means) accelerates the classic k-means algorithm by treating data as a first-class citizen. Instead of recomputing every point against every centroid each iteration, it uses simple geometry — scalar projection — to focus computation only on the points that can actually change cluster membership (the high-expressive, HE, points), and skips the rest (low-expressive, LE).
The payoff, established in the paper and reproducible from this repository:
This repository provides both a header-only C++
library and an R package
(geokmeans) that wraps it, so you get C++ speed from a
one-line R call.
A point only needs a distance computation if it is oriented toward a neighboring centroid (HE). Points oriented back toward their own centroid (LE/LHE) keep their membership and are skipped.
Only C and D are high-expressive (HE) — the angle from
the midpoint toward the neighbor centroid is < 90°. A and B point
back toward their own centroid and are treated as LE. Figure from
Sharma et al. (2026), Machine Learning (open access).
geokmeansSeven fast k-means variants behind one uniform interface —
all the heavy lifting runs in C++ via Rcpp and RcppEigen.
# From CRAN (once published)
install.packages("geokmeans")
# Development version from this repository (Option A: package lives in a subdirectory)
# install.packages("remotes")
remotes::install_github("parichit/Geometric-k-means", subdir = "geokmeans")A C++17 compiler is required (the standard R toolchains on Windows, macOS, and Linux qualify).
library(geokmeans)
set.seed(1)
X <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
matrix(rnorm(200, mean = 6), ncol = 2))
# Geometric-k-means
fit <- geo_kmeans(X, centers = 2)
fit # iterations, distance computations, centroids
fit$cluster # per-point cluster labels
fit$centroids # final cluster centres
# Pick any algorithm by name
kmeans_dc(X, centers = 2, method = "elkan")
# Supply your own starting centroids (like stats::kmeans)
geo_kmeans(X, centers = X[c(1, 101), ])Every function shares the same signature:
geo_kmeans(data, centers,
iter_max = 100, threshold = 1e-3,
init = c("random", "sequential"),
seed = NULL, with_labels = TRUE,
drop_empty = TRUE, verbose = FALSE)centers is either the number of clusters k or a
matrix of initial centroids. Random initialization uses R’s RNG, so
results are reproducible with set.seed() or the
seed argument.
| Function | Algorithm | Type |
|---|---|---|
geo_kmeans() |
Geometric-k-means | bound-free |
ball_kmeans() |
Ball k-means++ | bound-free |
lloyd_kmeans() |
Lloyd’s k-means | baseline |
elkan_kmeans() |
Elkan | bounded |
hamerly_kmeans() |
Hamerly | bounded |
annulus_kmeans() |
Annulus | bounded |
exponion_kmeans() |
Exponion | bounded |
kmeans_dc() |
dispatcher — choose via method = |
— |
Two small datasets ship with the package:
bc <- as.matrix(read.csv(
system.file("extdata", "Breastcancer.csv", package = "geokmeans"),
header = FALSE))
geo_kmeans(bc, centers = 2)The algorithms are header-only and can be included directly in your
own C++ project (see the src/ directory). For example:
#include "geokmeans.h"
// output_data result = geokmeans(dataset, num_clusters, threshold,
// num_iterations, num_cols, "random", seed);Ball k-means additionally uses Eigen; the R package obtains
Eigen through RcppEigen.
Geometric-k-means/
├── src/ # header-only C++ implementations
├── data/ # datasets used in the paper
├── images/ # figures
└── geokmeans/ # the R package (CRAN)
If you use Geometric-k-means (the algorithm, the C++ code, or the R package) in your work, please cite the paper. It helps us keep contributing to the open-source community.
@article{Sharma2026Geometrickmeans,
title = {Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means},
author = {Sharma, Parichit and Malec, Marcin and Kurban, Hasan and Kulekci, Oguzhan and Dalkilic, Mehmet},
journal = {Machine Learning},
year = {2026},
volume = {115},
number = {2},
pages = {30},
doi = {10.1007/s10994-025-06891-1},
url = {https://doi.org/10.1007/s10994-025-06891-1}
}Sharma, P., Malec, M., Kurban, H., Kulekci, O., & Dalkilic, M. (2026). Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means. Machine Learning, 115(2), article 30. https://doi.org/10.1007/s10994-025-06891-1
Read the paper: Machine Learning (Springer, open access) ·
Preprint: arXiv:2508.06353
Released under the GPL-3 license.
Built at the Luddy School of Informatics, Computing & Engineering, Indiana University.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.