This article summarizes the benchmark recorded in the bench/ directory for bigKNN 0.3.0. The goal is not to claim a universal winner. The goal is to show:

- how bigKNN compares with several exact dense k-nearest-neighbour implementations on the same Euclidean workloads
- how bigKNN behaves when the reference data is stored as a file-backed bigmemory::big.matrix

The benchmark driver is bench/exact_knn_benchmark.R, and the full recorded outputs are:

- bench/exact_knn_benchmark_report.md
- bench/exact_knn_benchmark_results.csv
- bench/exact_knn_benchmark_validation.csv

The dense exact comparator set in the recorded run was:

- Rfast::dista(..., index = TRUE)
- FNN::get.knnx(..., algorithm = "brute")
- dbscan::kNN(..., approx = 0)
- RANN::nn2(..., eps = 0)
- nabor::knn(..., eps = 0)
- BiocNeighbors::queryKNN(..., BNPARAM = KmknnParam(distance = "Euclidean"))

bigKNN itself was measured in three dense modes:

- knn_bigmatrix()
- knn_search_prepared()
- knn_search_stream_prepared()

For larger scale runs, the benchmark used file-backed big.matrix references and measured:

- knn_prepare_bigmatrix()
- knn_search_prepared()
- knn_search_stream_prepared()

The recorded benchmark was generated on:

- bigKNN 0.3.0
- bigmemory 4.6.4
- Rfast 2.1.5.2
- FNN 1.1.4.1
- dbscan 1.2.4
- RANN 2.6.2
- nabor 0.5.0
- BiocNeighbors 2.2.0

The benchmark covers three dense comparator cases and two larger file-backed scale cases.

| Case | n_ref | n_query | p | k | Reference size | Full dense pairwise matrix |
|---|---|---|---|---|---|---|
| dense_small | 5,000 | 250 | 16 | 10 | 0.61 MB | 0.01 GB |
| dense_medium | 20,000 | 500 | 16 | 10 | 2.44 MB | 0.07 GB |
| dense_large | 50,000 | 1,000 | 16 | 10 | 6.10 MB | 0.37 GB |
| filebacked_xlarge | 100,000 | 1,000 | 32 | 10 | 24.41 MB | 0.75 GB |
| filebacked_2xlarge | 200,000 | 1,000 | 32 | 10 | 48.83 MB | 1.49 GB |

The Full dense pairwise matrix column (the size of an n_query x n_ref matrix of double-precision distances) is intentionally included because it highlights what bigKNN does not materialize: a full dense query-by-reference distance matrix.
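Both size columns can be reproduced with simple arithmetic. The sketch below assumes 8-byte double-precision storage and binary units (1 MB = 1024² bytes, 1 GB = 1024³ bytes); those assumptions are ours rather than stated by the benchmark, but they reproduce every row of the table. It is plain Python, not bigKNN code.

```python
# Reproduce the "Reference size" and "Full dense pairwise matrix" columns.
# Assumptions (not stated by the benchmark): values are 8-byte doubles and
# MB/GB are binary units (1024**2 and 1024**3 bytes).
BYTES_PER_DOUBLE = 8

CASES = {
    # case name: (n_ref, n_query, p)
    "dense_small": (5_000, 250, 16),
    "dense_medium": (20_000, 500, 16),
    "dense_large": (50_000, 1_000, 16),
    "filebacked_xlarge": (100_000, 1_000, 32),
    "filebacked_2xlarge": (200_000, 1_000, 32),
}


def reference_mb(n_ref: int, p: int) -> float:
    """Size of the n_ref x p reference matrix in MB."""
    return n_ref * p * BYTES_PER_DOUBLE / 1024**2


def full_pairwise_gb(n_ref: int, n_query: int) -> float:
    """Size of a dense n_query x n_ref distance matrix in GB."""
    return n_query * n_ref * BYTES_PER_DOUBLE / 1024**3


if __name__ == "__main__":
    for name, (n_ref, n_query, p) in CASES.items():
        print(f"{name:20s} {reference_mb(n_ref, p):7.2f} MB "
              f"{full_pairwise_gb(n_ref, n_query):5.2f} GB")
```

The contrast is the point: the reference data grows with n_ref x p, while a materialized distance matrix grows with n_query x n_ref, which is why the last column outpaces the reference size as the cases get larger.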
All dense exact comparators matched bigKNN on neighbour
indices for the recorded cases. All distance-capable dense comparators
matched on distances as well.
That matters because the benchmark is intended to compare exact
search implementations, not approximate recall. The validation table in
bench/exact_knn_benchmark_validation.csv is therefore as
important as the timing table.
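The kind of check the validation table records can be sketched as follows. This is not the benchmark's own validation code (that lives in bench/exact_knn_benchmark.R and is not reproduced here); it is a plain-Python illustration that assumes each exact method returns per-query index and distance lists, compares indices for exact equality, and compares distances within a small tolerance. All function names are hypothetical.

```python
import heapq
import math
import random


def knn_full_sort(ref, query, k):
    """Exact k-NN: compute every distance, fully sort, keep the first k.

    Ties in distance are broken by reference index (tuple ordering).
    """
    idx, dist = [], []
    for q in query:
        ranked = sorted((math.dist(q, r), j) for j, r in enumerate(ref))
        idx.append([j for _, j in ranked[:k]])
        dist.append([d for d, _ in ranked[:k]])
    return idx, dist


def knn_heap(ref, query, k):
    """Exact k-NN via heapq.nsmallest instead of a full sort."""
    idx, dist = [], []
    for q in query:
        best = heapq.nsmallest(k, ((math.dist(q, r), j) for j, r in enumerate(ref)))
        idx.append([j for _, j in best])
        dist.append([d for d, _ in best])
    return idx, dist


def validate(result_a, result_b, tol=1e-9):
    """Return (indices_match, distances_match) for two (idx, dist) results."""
    (idx_a, dist_a), (idx_b, dist_b) = result_a, result_b
    indices_match = idx_a == idx_b
    distances_match = len(dist_a) == len(dist_b) and all(
        len(row_a) == len(row_b)
        and all(math.isclose(x, y, rel_tol=tol, abs_tol=tol) for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(dist_a, dist_b)
    )
    return indices_match, distances_match


if __name__ == "__main__":
    rng = random.Random(42)
    ref = [[rng.random() for _ in range(4)] for _ in range(200)]
    query = [[rng.random() for _ in range(4)] for _ in range(10)]
    print(validate(knn_full_sort(ref, query, 5), knn_heap(ref, query, 5)))  # (True, True)
```

Exact index equality assumes both methods break tied distances the same way; two correct exact implementations with different tie-breaking rules can disagree on indices while still agreeing on distances.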
The table below compares the recorded bigKNN_prepared
median with the fastest external exact comparator in each dense
case.
| Case | bigKNN_prepared median | Fastest external exact comparator |
|---|---|---|
| dense_small | 0.110 s | 0.042 s (FNN_brute and dbscan_kdtree tie) |
| dense_medium | 0.822 s | 0.328 s (RANN_kd, Rfast_index_only, and nabor_kd tie) |
| dense_large | 4.642 s | 1.289 s (nabor_kd) |
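The table reports median timings. A minimal plain-Python sketch of how a median wall-clock timing and a speed ratio can be computed is shown below; the repetition count is a placeholder, since the benchmark script's own settings are not reproduced in this article, and the helper names are hypothetical. Applied to the recorded medians, the fastest external comparator was roughly 2.5x to 3.6x faster than bigKNN_prepared.

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], reps: int = 5) -> float:
    """Call fn() `reps` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Recorded medians from the dense comparison table, in seconds:
# (bigKNN_prepared, fastest external exact comparator).
RECORDED = {
    "dense_small": (0.110, 0.042),
    "dense_medium": (0.822, 0.328),
    "dense_large": (4.642, 1.289),
}


def speedup(case: str) -> float:
    """How many times faster the fastest external comparator was."""
    bigknn, fastest = RECORDED[case]
    return bigknn / fastest


if __name__ == "__main__":
    print(f"toy timing: {median_seconds(lambda: sum(range(100_000))):.6f} s")
    for case in RECORDED:
        print(f"{case}: {speedup(case):.1f}x")
```

Medians are used rather than means because a single slow run (garbage collection, disk cache warm-up) shifts a mean much more than a median.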
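In the recorded dense medians, the fastest external exact comparator was faster than bigKNN_prepared in every case. On dense_large, the fastest external comparator took about 1.29 s, while bigKNN_prepared remained around 4.64 s.

This is a sensible result. The dense comparator packages are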
optimized for ordinary in-memory matrix search, while
bigKNN is designed around
bigmemory::big.matrix, reusable prepared references,
streaming, and larger workflows that do not assume everything should
round-trip through ordinary R matrices.
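One common way to make a reference reusable for exact Euclidean search is to compute the squared norm of every reference row once and reuse it for every query batch, using ||q - r||² = ||q||² + ||r||² - 2 q·r. Whether knn_prepare_bigmatrix() does exactly this is an assumption on our part; the plain-Python sketch below only illustrates the general technique, and the PreparedReference class is hypothetical.

```python
import heapq
import math


class PreparedReference:
    """Hypothetical prepared reference for exact Euclidean k-NN.

    The squared norm of each reference row is computed once, at prepare
    time, and reused by every later search call.
    """

    def __init__(self, ref):
        self.ref = ref
        self.sq_norms = [sum(x * x for x in row) for row in ref]

    def search(self, queries, k):
        """Return (indices, distances) of the k nearest rows for each query."""
        all_idx, all_dist = [], []
        for q in queries:
            q_sq = sum(x * x for x in q)
            candidates = []
            for j, (r, r_sq) in enumerate(zip(self.ref, self.sq_norms)):
                dot = sum(a * b for a, b in zip(q, r))
                # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r; clamp rounding noise at 0.
                candidates.append((max(q_sq + r_sq - 2 * dot, 0.0), j))
            best = heapq.nsmallest(k, candidates)
            all_idx.append([j for _, j in best])
            all_dist.append([math.sqrt(d2) for d2, _ in best])
        return all_idx, all_dist


if __name__ == "__main__":
    prepared = PreparedReference([[0, 0], [3, 4], [1, 1], [10, 10]])
    # The same prepared object serves several query batches.
    print(prepared.search([[0, 0]], k=2))  # ([[0, 2]], [[0.0, 1.4142135623730951]])
    print(prepared.search([[9, 9], [3, 3]], k=1))
```

The expansion pays off when many queries share one reference, but it can lose precision through cancellation when points are very close together, so exact implementations may recompute the final k distances directly.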
The scale portion of the benchmark focuses on the feature set that is
more specific to bigKNN: exact search on file-backed
big.matrix references and streamed output into destination
big.matrix objects.
| Case | bigKNN_prepare | bigKNN_prepared_search | bigKNN_stream |
|---|---|---|---|
| filebacked_xlarge (100,000 x 1,000 x 32) | 0.039 s | 9.714 s | 9.290 s |
| filebacked_2xlarge (200,000 x 1,000 x 32) | 0.078 s | 18.594 s | 18.533 s |
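Two points stand out:

- streaming results into a destination big.matrix costs about the same as prepared search (9.290 s versus 9.714 s, and 18.533 s versus 18.594 s)
- doubling the reference size from 100,000 to 200,000 rows roughly doubled every timing: preparation (0.039 s to 0.078 s), prepared search (9.714 s to 18.594 s), and streaming (9.290 s to 18.533 s)

That second point is important in practice. It means the file-backed exact path behaves predictably as the reference grows, which is exactly the kind of workflow bigKNN is meant to support.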
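The streamed path writes results into a destination big.matrix instead of returning one large in-memory result. The general pattern can be sketched in plain Python: queries are processed in fixed-size chunks and each chunk's results are written into preallocated output rows. The function names, the chunk_size default, and the start_row parameter are hypothetical and are not bigKNN's API; nested lists stand in for the file-backed destination matrices.

```python
import heapq
import math


def knn_chunk(ref, queries, k):
    """Exact brute-force k-NN for one chunk of queries."""
    return [
        heapq.nsmallest(k, ((math.dist(q, r), j) for j, r in enumerate(ref)))
        for q in queries
    ]


def stream_knn(ref, queries, k, idx_out, dist_out, chunk_size=64, start_row=0):
    """Fill preallocated n_query x k outputs one chunk of queries at a time.

    idx_out and dist_out stand in for destination matrices; only one chunk
    of intermediate results is held at once. start_row lets an interrupted
    job resume from the first unfinished row.
    """
    for start in range(start_row, len(queries), chunk_size):
        chunk = queries[start:start + chunk_size]
        for offset, best in enumerate(knn_chunk(ref, chunk, k)):
            row = start + offset
            idx_out[row][:] = [j for _, j in best]
            dist_out[row][:] = [d for d, _ in best]
    return len(queries)  # next row to process: the job is complete


if __name__ == "__main__":
    ref = [[float(i), float(i % 3)] for i in range(20)]
    queries = [[2.5, 1.0], [10.0, 0.0], [17.2, 2.0]]
    k = 3
    idx_out = [[-1] * k for _ in queries]
    dist_out = [[math.nan] * k for _ in queries]
    stream_knn(ref, queries, k, idx_out, dist_out, chunk_size=2)
    print(idx_out)
```

In this brute-force sketch each chunk costs work proportional to chunk_size x n_ref x p, so for a fixed query set the total time grows roughly linearly with n_ref, which is consistent with the doubling seen in the file-backed table.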
These benchmark numbers should be interpreted in context:

- for plain dense in-memory search where you only need the k neighbours back in ordinary R matrices, the specialist dense comparator packages can be faster
- for workflows that need bigmemory::big.matrix, reusable prepared references, streamed output, resumable jobs, or larger file-backed data, bigKNN provides functionality that goes beyond the dense in-memory comparator set

In other words, the benchmark is useful both for performance comparison and for positioning the package.
From the package root, rerun the benchmark by running the driver script bench/exact_knn_benchmark.R.
The script prefers benchmarking the source tree through
pkgload::load_all() when available, and otherwise falls
back to the installed bigKNN package.
Benchmark results are machine-specific. The recorded values in this article are best treated as a reproducible snapshot rather than a universal ranking.
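The benchmark is also deliberately split into two stories:

- an exact dense comparison against established in-memory k-nearest-neighbour packages
- a scale study of the file-backed, big.matrix-oriented workflow of bigKNN

That split keeps the benchmark honest about what is truly comparable and what is specific to the package’s design goals.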