The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Geometric-k-means

A bound-free approach to fast and eco-friendly k-means


DOI License: GPL v3 Language: C++17 R

Same results as k-means. A fraction of the distance computations, memory, and energy.


Overview

Geometric-k-means (Gk-means) accelerates the classic k-means algorithm by treating data as a first-class citizen. Instead of recomputing every point against every centroid each iteration, it uses simple geometry — scalar projection — to focus computation only on the points that can actually change cluster membership (the high-expressive, HE, points), and skips the rest (low-expressive, LE).

The payoff, established in the paper and reproducible from this repository:

This repository provides both a header-only C++ library and an R package (geokmeans) that wraps it, so you get C++ speed from a one-line R call.

The idea in one picture

A point only needs a distance computation if it is oriented toward a neighboring centroid (HE). Points oriented back toward their own centroid (LE/LHE) keep their membership and are skipped.

LHE vs HE illustration
Only C and D are high-expressive (HE) — the angle from the midpoint toward the neighbor centroid is < 90°. A and B point back toward their own centroid and are treated as LE. Figure from Sharma et al. (2026), Machine Learning (open access).


The R package: geokmeans

Seven fast k-means variants behind one uniform interface — all the heavy lifting runs in C++ via Rcpp and RcppEigen.

Installation

# From CRAN (once published)
install.packages("geokmeans")

# Development version from this repository (Option A: package lives in a subdirectory)
# install.packages("remotes")
remotes::install_github("parichit/Geometric-k-means", subdir = "geokmeans")

A C++17 compiler is required (the standard R toolchains on Windows, macOS, and Linux qualify).

Quick start

library(geokmeans)

set.seed(1)
X <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 6), ncol = 2))

# Geometric-k-means
fit <- geo_kmeans(X, centers = 2)

fit                 # iterations, distance computations, centroids
fit$cluster         # per-point cluster labels
fit$centroids       # final cluster centres

# Pick any algorithm by name
kmeans_dc(X, centers = 2, method = "elkan")

# Supply your own starting centroids (like stats::kmeans)
geo_kmeans(X, centers = X[c(1, 101), ])

Every function shares the same signature:

geo_kmeans(data, centers,
           iter_max = 100, threshold = 1e-3,
           init = c("random", "sequential"),
           seed = NULL, with_labels = TRUE,
           drop_empty = TRUE, verbose = FALSE)

centers is either the number of clusters k or a matrix of initial centroids. Random initialization uses R’s RNG, so results are reproducible with set.seed() or the seed argument.

Supported algorithms

Function Algorithm Type
geo_kmeans() Geometric-k-means bound-free
ball_kmeans() Ball k-means++ bound-free
lloyd_kmeans() Lloyd’s k-means baseline
elkan_kmeans() Elkan bounded
hamerly_kmeans() Hamerly bounded
annulus_kmeans() Annulus bounded
exponion_kmeans() Exponion bounded
kmeans_dc() dispatcher — choose via method =

Example data

Two small datasets ship with the package:

bc <- as.matrix(read.csv(
  system.file("extdata", "Breastcancer.csv", package = "geokmeans"),
  header = FALSE))

geo_kmeans(bc, centers = 2)

The C++ library

The algorithms are header-only and can be included directly in your own C++ project (see the src/ directory). For example:

#include "geokmeans.h"
// output_data result = geokmeans(dataset, num_clusters, threshold,
//                                num_iterations, num_cols, "random", seed);

Ball k-means additionally uses Eigen; the R package obtains Eigen through RcppEigen.

Repository layout

Geometric-k-means/
├── src/            # header-only C++ implementations
├── data/           # datasets used in the paper
├── images/         # figures
└── geokmeans/      # the R package (CRAN)

How to cite

If you use Geometric-k-means (the algorithm, the C++ code, or the R package) in your work, please cite the paper. It helps us keep contributing to the open-source community.

@article{Sharma2026Geometrickmeans,
  title   = {Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means},
  author  = {Sharma, Parichit and Malec, Marcin and Kurban, Hasan and Kulekci, Oguzhan and Dalkilic, Mehmet},
  journal = {Machine Learning},
  year    = {2026},
  volume  = {115},
  number  = {2},
  pages   = {30},
  doi     = {10.1007/s10994-025-06891-1},
  url     = {https://doi.org/10.1007/s10994-025-06891-1}
}

Sharma, P., Malec, M., Kurban, H., Kulekci, O., & Dalkilic, M. (2026). Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means. Machine Learning, 115(2), article 30. https://doi.org/10.1007/s10994-025-06891-1

Read the paper: Machine Learning (Springer, open access) ·

Preprint: arXiv:2508.06353

License

Released under the GPL-3 license.

Built at the Luddy School of Informatics, Computing & Engineering, Indiana University.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.