
Package {geokmeans}


Type: Package
Title: A Collection of Fast, Exact and Eco-Friendly k-Means Clustering Algorithms
Version: 0.1.0
Description: A collection of fast k-means clustering algorithms under a single, uniform interface. The core method is Geometric-k-means, a bound-free algorithm of Sharma et al. (2026) <doi:10.1007/s10994-025-06891-1> that uses geometry to restrict computation to the data points able to change clusters, substantially reducing distance computations and runtime while returning the same result as standard k-means. Also included are Lloyd's algorithm, Elkan, Hamerly, Annulus, Exponion, and Ball k-means. All algorithms are implemented in 'C++' via 'Rcpp' and 'RcppEigen' and return the final centroids, optional per-point cluster assignments, and computational statistics.
License: GPL-3
Encoding: UTF-8
Imports: Rcpp
LinkingTo: Rcpp, RcppEigen
SystemRequirements: C++17
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://github.com/parichit/Geometric-k-means
BugReports: https://github.com/parichit/Geometric-k-means/issues
NeedsCompilation: yes
Packaged: 2026-06-17 16:23:38 UTC; parichit
Author: Parichit Sharma [aut, cre, cph], Hasan Kurban [aut]
Maintainer: Parichit Sharma <parishar@iu.edu>
Config/roxygen2/version: 8.0.0
Repository: CRAN
Date/Publication: 2026-06-22 16:10:02 UTC

geokmeans: Fast and Eco-Friendly k-Means Clustering Algorithms

Description

Fast C++ implementations of several k-means clustering algorithms exposed to R through a uniform interface: Lloyd's algorithm, Elkan, Hamerly, Annulus, Exponion, Ball k-means, and the bound-free Geometric-k-means method.

Details

The main entry points are geo_kmeans(), lloyd_kmeans(), elkan_kmeans(), hamerly_kmeans(), annulus_kmeans(), exponion_kmeans(), ball_kmeans(), and the dispatcher kmeans_dc().

Author(s)

Maintainer: Parichit Sharma <parishar@iu.edu> [copyright holder]

Authors:

Hasan Kurban

References

Sharma, P., Stanislaw, M., Kurban, H., Kulekci, O., and Dalkilic, M. (2026). Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means. doi:10.1007/s10994-025-06891-1

See Also

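Useful links:

https://github.com/parichit/Geometric-k-means

Report bugs at https://github.com/parichit/Geometric-k-means/issues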


k-Means clustering algorithms

Description

Run one of the bundled k-means variants on a numeric data matrix. All functions share the same interface and return value; they differ only in the acceleration strategy used internally. geo_kmeans() runs the bound-free Geometric-k-means method.
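For orientation, the baseline that the accelerated variants are compared against can be sketched in plain R. This is an illustrative reimplementation of Lloyd's algorithm, not the package's C++ code: the helper name lloyd_sketch() is hypothetical, and stopping when the largest Euclidean centroid shift falls below threshold is an assumption about how "centroid movement" might be measured.

```r
# Illustrative Lloyd's algorithm in base R.
# Hypothetical helper for explanation only; it is not part of geokmeans.
lloyd_sketch <- function(data, centers, iter_max = 100L, threshold = 0.001) {
  data <- as.matrix(data)
  centers <- as.matrix(centers)
  n <- nrow(data)
  k <- nrow(centers)
  n_dist <- 0
  for (iter in seq_len(iter_max)) {
    # Assignment step: squared distance from every point to every centroid.
    d2 <- matrix(
      vapply(seq_len(k),
             function(j) rowSums(sweep(data, 2, centers[j, ])^2),
             numeric(n)),
      nrow = n
    )
    n_dist <- n_dist + n * k
    cluster <- max.col(-d2, ties.method = "first")

    # Update step: each centroid moves to the mean of its assigned points.
    new_centers <- centers
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) {
        new_centers[j, ] <- colMeans(data[members, , drop = FALSE])
      }
    }

    # Assumed stopping rule: largest centroid shift below the threshold.
    shift <- max(sqrt(rowSums((new_centers - centers)^2)))
    centers <- new_centers
    if (shift < threshold) break
  }
  list(centroids = centers, cluster = cluster, iterations = iter,
       distance_calculations = n_dist)
}

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
fit <- lloyd_sketch(X, centers = X[c(1, 51), ])
fit$centroids
```

Every iteration of this sketch computes all n * k distances, which is the cost the accelerated methods aim to reduce by skipping points that cannot change cluster.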

Usage

geo_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

lloyd_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

elkan_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

hamerly_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

annulus_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

exponion_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

ball_kmeans(
  data,
  centers,
  iter_max = 100L,
  threshold = 0.001,
  init = c("random", "sequential"),
  seed = NULL,
  with_labels = TRUE,
  verbose = FALSE,
  drop_empty = TRUE
)

Arguments

data

A numeric matrix or data frame with observations in rows and features in columns. Missing values are not allowed.

centers

Either a single positive integer giving the number of clusters k, or a numeric matrix of initial cluster centres (one centroid per row, with ncol(centers) == ncol(data)).

iter_max

Maximum number of iterations.

threshold

Convergence threshold on centroid movement.

init

Initialisation strategy when centers is a number: "random" (random observations) or "sequential" (the first k observations). Ignored when centers is a matrix.

seed

Optional integer seed for the random initialisation, or NULL (the default). Initialisation uses R's random number generator: supplying a seed sets it via set.seed() so the result is reproducible, while NULL leaves the RNG untouched, so the ambient stream (e.g. a preceding set.seed() in your session) is honoured.

with_labels

Logical; if TRUE (default) the result includes a per-observation cluster assignment computed from the final centroids.

verbose

Logical; if TRUE, print the algorithm's convergence message.

drop_empty

Logical; if TRUE (default), clusters that end up with no assigned observations are removed from the result and the remaining cluster labels are renumbered, with a message. Requesting more clusters than the number of distinct rows in data is always an error.
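The two ways of controlling the random initialisation described under seed can be tried side by side. The following is a small sketch based only on the argument descriptions above; the data and seed values are arbitrary, and both comparisons are expected, but not guaranteed here, to return TRUE.

```r
library(geokmeans)

set.seed(42)
X <- rbind(matrix(rnorm(200, 0), ncol = 2),
           matrix(rnorm(200, 6), ncol = 2))

# Explicit seed: both calls should start from the same random observations.
a <- geo_kmeans(X, centers = 2, init = "random", seed = 123L)
b <- geo_kmeans(X, centers = 2, init = "random", seed = 123L)
identical(a$centroids, b$centroids)

# seed = NULL: the session's own RNG state drives the initialisation,
# so resetting it with set.seed() before each call should also reproduce the fit.
set.seed(7)
c1 <- geo_kmeans(X, centers = 2)
set.seed(7)
c2 <- geo_kmeans(X, centers = 2)
identical(c1$centroids, c2$centroids)
```

Passing seed is convenient when a single call must be reproducible on its own; leaving it NULL suits scripts that already manage the RNG globally.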

Value

An object of class "geokmeans": a list with components

centroids

A k x ncol(data) matrix of final cluster centres.

cluster

Integer vector of cluster ids (1-based), if with_labels = TRUE.

iterations

Number of iterations performed.

distance_calculations

Total number of point-to-centroid distance computations.

method

The algorithm used.

k

The number of clusters.
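As a quick illustration of the components listed above, the sketch below fits the same data with two methods from identical starting centroids and inspects the returned list. The data and the choice of starting rows are arbitrary; no particular counts are claimed for this data set.

```r
library(geokmeans)

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
start <- X[c(1, 51), ]  # explicit starting centroids, so both runs are comparable

fit_geo   <- geo_kmeans(X, centers = start)
fit_lloyd <- lloyd_kmeans(X, centers = start)

class(fit_geo)            # the "geokmeans" class described above
dim(fit_geo$centroids)    # k rows, ncol(X) columns
table(fit_geo$cluster)    # 1-based cluster ids
fit_geo$iterations
fit_geo$method

# Compare the work done by each method on the same problem.
c(geo   = fit_geo$distance_calculations,
  lloyd = fit_lloyd$distance_calculations)
```

The distance_calculations component is the most direct way to see how much computation an accelerated method saves relative to Lloyd's algorithm on a given data set.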

References

Sharma, P., Stanislaw, M., Kurban, H., Kulekci, O., and Dalkilic, M. (2026). Geometric-k-means: A Bound Free Approach to Fast and Eco-Friendly k-means. doi:10.1007/s10994-025-06891-1

Examples

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
fit <- geo_kmeans(X, centers = 2)
fit$centroids
table(fit$cluster)

# Supplying explicit starting centroids:
geo_kmeans(X, centers = X[c(1, 51), ])


Run a k-means variant by name

Description

A thin dispatcher over the individual algorithm functions.

Usage

kmeans_dc(
  data,
  centers,
  method = c("geokmeans", "lloyd", "elkan", "hamerly", "annulus", "exponion", "ball"),
  ...
)

Arguments

data

A numeric matrix or data frame with observations in rows and features in columns. Missing values are not allowed.

centers

Either a single positive integer giving the number of clusters k, or a numeric matrix of initial cluster centres (one centroid per row, with ncol(centers) == ncol(data)).

method

The algorithm to use. One of "geokmeans", "lloyd", "elkan", "hamerly", "annulus", "exponion", "ball".

...

Further arguments passed to the chosen algorithm.
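Because the dispatcher forwards its extra arguments, one loop can run every method with the same settings. This sketch uses only the method names and arguments documented above; the iter_max and threshold values are arbitrary, and the agreement check reflects the package's stated aim of matching standard k-means rather than a verified result.

```r
library(geokmeans)

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
start <- X[c(1, 51), ]

methods <- c("geokmeans", "lloyd", "elkan", "hamerly",
             "annulus", "exponion", "ball")

# iter_max and threshold travel through `...` to the chosen algorithm.
fits <- lapply(methods, function(m) {
  kmeans_dc(X, centers = start, method = m, iter_max = 50L, threshold = 1e-4)
})
names(fits) <- methods

# Distance computations per method.
sapply(fits, function(f) f$distance_calculations)

# Do the final centroids agree with Lloyd's, up to numerical tolerance?
sapply(fits, function(f) isTRUE(all.equal(f$centroids, fits$lloyd$centroids)))
```

Using explicit starting centroids removes initialisation as a source of difference, so any remaining disagreement would come from the algorithms themselves.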

Value

An object of class "geokmeans"; see geo_kmeans().

Examples

set.seed(1)
X <- rbind(matrix(rnorm(100, 0), ncol = 2),
           matrix(rnorm(100, 5), ncol = 2))
kmeans_dc(X, centers = 2, method = "elkan")
