The bigANNOY package provides approximate
nearest-neighbour search specialised for
bigmemory::big.matrix objects through persisted Annoy
indexes. It keeps the reference data in bigmemory storage
during build and query workflows, supports repeated-query sessions
through explicit open/load helpers, and can stream neighbour indices and
distances directly into destination big.matrix objects.
Current features include:

- support for big.matrix objects, descriptors, descriptor paths, and external pointers,
- streamed output into big.matrix destinations,
- index lifecycle and validation helpers: annoy_open_index(), annoy_load_bigmatrix(), annoy_is_loaded(), annoy_close_index(), and annoy_validate_index(), and
- benchmarking against a bigKNN baseline when bigKNN is available.

These workflows make bigANNOY useful both as a
standalone approximate search package and as the ANN side of an
exact-versus-approximate evaluation pipeline built around
bigKNN.
The package is currently easiest to install from GitHub:
# install.packages("remotes")
remotes::install_github("fbertran/bigANNOY")

If you prefer a local source install, clone the repository and run:
R CMD build bigANNOY
R CMD INSTALL bigANNOY_0.3.0.tar.gz

The package defines a small set of runtime options:
| Option | Default value | Description |
|---|---|---|
| bigANNOY.block_size | 1024L | Default number of rows processed per build/search block. |
| bigANNOY.progress | FALSE | Emit simple progress messages during long-running builds, searches, and benchmarks. |
| bigANNOY.backend | "cpp" | Backend request. "cpp" uses the native compiled backend, "auto" falls back when compiled symbols are not loaded, and "r" forces the debug-only R backend. |
All options can be changed with options() at runtime.
For example, options(bigANNOY.block_size = 2048L) increases
the default block size used by the build and search helpers.
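
As a minimal sketch of how a block size partitions work, the base-R snippet below sets the option, reads it back with getOption(), and derives row ranges for a hypothetical 5000-row matrix. The partitioning loop is an illustration only and makes no claim about how bigANNOY splits rows internally; n_rows, starts, and ends are names introduced here.

```r
# Set the option and keep the previous value so it can be restored later.
old <- options(bigANNOY.block_size = 2048L)

# Read the option back; 1024L is the documented default.
block_size <- getOption("bigANNOY.block_size", 1024L)

# Hypothetical row count, used only to illustrate block boundaries.
n_rows <- 5000L
starts <- seq.int(1L, n_rows, by = block_size)
ends <- pmin(starts + block_size - 1L, n_rows)

# One row per block: 1-2048, 2049-4096, 4097-5000.
blocks <- data.frame(start = starts, end = ends)
print(blocks)

# Restore the previous setting.
options(old)
```

Saving the return value of options() and passing it back afterwards keeps the change local to the experiment.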
The examples below use a small Euclidean reference matrix so the returned neighbours are easy to inspect.
library(bigmemory)
library(bigANNOY)
reference <- as.big.matrix(matrix(
c(0, 0,
1, 0,
0, 1,
1, 1,
2, 2),
ncol = 2,
byrow = TRUE
))
query <- matrix(
c(0.1, 0.1,
1.8, 1.9),
ncol = 2,
byrow = TRUE
)
index <- annoy_build_bigmatrix(
reference,
path = tempfile(fileext = ".ann"),
metric = "euclidean",
n_trees = 20L,
seed = 123L,
load_mode = "eager"
)
result <- annoy_search_bigmatrix(
index,
query = query,
k = 2L,
search_k = 100L
)
result$index
round(result$distance, 3)

# Reopen the persisted index with lazy loading, then validate it
reopened <- annoy_open_index(index$path, load_mode = "lazy")
annoy_is_loaded(reopened)
report <- annoy_validate_index(
reopened,
strict = TRUE,
load = TRUE
)
report$valid
annoy_is_loaded(reopened)

# Stream neighbour indices and distances into destination big.matrix objects
index_store <- big.matrix(nrow(query), 2L, type = "integer")
distance_store <- big.matrix(nrow(query), 2L, type = "double")
annoy_search_bigmatrix(
index,
query = query,
k = 2L,
xpIndex = index_store,
xpDistance = distance_store
)
bigmemory::as.matrix(index_store)
round(bigmemory::as.matrix(distance_store), 3)

# Run the benchmark helper with explicit sizes and tuning parameters
benchmark_annoy_bigmatrix(
n_ref = 2000L,
n_query = 200L,
n_dim = 20L,
k = 10L,
n_trees = 50L,
search_k = 1000L,
metric = "euclidean",
exact = TRUE
)

If bigKNN is installed, the Euclidean benchmark helpers
also report exact search timing and recall against the exact
baseline.
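
To make "recall against the exact baseline" concrete, here is a minimal base-R sketch. The helpers exact_knn() and recall_at_k() are hypothetical names introduced for illustration; they are not part of bigANNOY or bigKNN, and the approximate result is made up rather than produced by an Annoy index. The first query point is shifted slightly from the earlier example to avoid a distance tie.

```r
# Hypothetical helpers for illustration; not part of bigANNOY or bigKNN.

# Brute-force exact k-NN under Euclidean distance for ordinary R matrices.
exact_knn <- function(reference, query, k) {
  idx <- vapply(seq_len(nrow(query)), function(i) {
    # Squared distance from query row i to every reference row
    d2 <- rowSums(sweep(reference, 2L, query[i, ])^2)
    order(d2)[seq_len(k)]
  }, integer(k))
  # Reshape so that each row holds the neighbours of one query
  matrix(idx, nrow = nrow(query), ncol = k, byrow = TRUE)
}

# Mean fraction of exact neighbours recovered by the approximate search.
recall_at_k <- function(approx_index, exact_index) {
  hits <- vapply(seq_len(nrow(exact_index)), function(i) {
    sum(approx_index[i, ] %in% exact_index[i, ])
  }, numeric(1))
  mean(hits / ncol(exact_index))
}

reference <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 2, 2), ncol = 2, byrow = TRUE)
query <- matrix(c(0.2, 0.1, 1.8, 1.9), ncol = 2, byrow = TRUE)

exact_index <- exact_knn(reference, query, k = 2L)

# Made-up approximate result: the second query misses one true neighbour.
approx_index <- rbind(c(1L, 2L), c(5L, 3L))

recall <- recall_at_k(approx_index, exact_index)
print(recall)  # 0.75
```

A recall of 1 means every exact neighbour was found. Lower values show how much accuracy the approximate search gave up.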
An installed command-line benchmark script is also available at:
system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY")

Example single-run command:
Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
--mode=single \
--n_ref=5000 \
--n_query=500 \
--n_dim=50 \
--k=20 \
--n_trees=100 \
--search_k=5000 \
--load_mode=eager

The package now ships with focused vignettes for the main workflows:

- getting-started-bigannoy
- persistent-indexes-and-lifecycle
- file-backed-bigmemory-workflows
- benchmarking-recall-and-latency
- metrics-and-tuning
- validation-and-sharing-indexes
- bigannoy-vs-bigknn

Together they cover the basic ANN workflow, loaded-index lifecycle,
file-backed bigmemory usage, benchmarking and recall evaluation, tuning,
validation and sharing of persisted indexes, and the relationship
between approximate bigANNOY search and exact
bigKNN search.
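
As a rough illustration of the file-backed bigmemory usage mentioned above, the sketch below creates file-backed destination matrices with bigmemory and reattaches one of them from its descriptor file. The file names and sizes are hypothetical. Passing these objects as xpIndex and xpDistance to annoy_search_bigmatrix() is an assumption based on the in-memory example earlier, not something verified here.

```r
library(bigmemory)

# Hypothetical location and sizes, chosen only for illustration.
backing_dir <- tempfile("bigannoy-results-")
dir.create(backing_dir)
n_query <- 2L
k <- 2L

# File-backed destination for neighbour indices.
index_store <- filebacked.big.matrix(
  nrow = n_query, ncol = k, type = "integer",
  backingfile = "neighbour_index.bin",
  backingpath = backing_dir,
  descriptorfile = "neighbour_index.desc"
)

# File-backed destination for neighbour distances.
distance_store <- filebacked.big.matrix(
  nrow = n_query, ncol = k, type = "double",
  backingfile = "neighbour_distance.bin",
  backingpath = backing_dir,
  descriptorfile = "neighbour_distance.desc"
)

# Assumption: these could be supplied as xpIndex / xpDistance in a search call.
# Here a value is written by hand to show that the backing file is shared.
index_store[1, ] <- c(1L, 2L)

# Reattach the same matrix through its descriptor file.
desc_path <- file.path(backing_dir, "neighbour_index.desc")
reattached <- attach.big.matrix(desc_path)
print(reattached[1, ])
```

Because the results live in backing files, they can be reattached from their descriptor files without rerunning the search.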