- `embed_llamar()` — high-level embedding provider compatible with
  `ragnar_store_create(embed = ...)`. Supports partial application (lazy
  model loading), direct calls returning a matrix, and data.frame input.
  L2 normalization is on by default.
- `llama_embed_batch()` — embed multiple texts in one call.
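A brief usage sketch of the two embedding entry points above. The function names come from this changelog; the package name, the model path, and argument names such as `model` are assumptions, not the definitive API:

```r
library(llamar)  # assumed package name

# Load a GGUF embedding model (the path is a placeholder).
model <- llama_load_model("minilm-embed.Q8_0.gguf")

# Embed several texts in one call; expected to return a numeric matrix
# with one row per input text.
emb <- llama_embed_batch(model, c("first document", "second document"))

# Partial application: calling embed_llamar() without texts should yield
# a function suitable for ragnar_store_create(embed = ...).
embedder <- embed_llamar(model = "minilm-embed.Q8_0.gguf")
store <- ragnar::ragnar_store_create("store.duckdb", embed = embedder)
```

Since L2 normalization is on by default, cosine similarity between embedding rows reduces to a plain dot product.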
  For embedding models, `llama_embed_batch()` uses true pooled batch decode
  (`llama_get_embeddings_seq`), with automatic fallback to sequential
  last-token decode for generative models.
- `llama_get_embeddings_ith()` — get the embedding vector for the i-th token
  (supports negative indexing).
- `llama_get_embeddings_seq()` — get the pooled embedding for a sequence ID.
- `llama_new_context()` gains an `embedding` parameter. When `TRUE`, it sets
  `cparams.embeddings = true` and disables causal attention at creation time.
  `llama_embed_batch()` uses this flag to choose the optimal code path.
- `llama_load_model()` gains a `devices` parameter for explicit backend
  selection. It accepts device names from `llama_backend_devices()`, type
  keywords (`"cpu"`, `"gpu"`), or numeric indices; multiple devices enable a
  multi-GPU split.
- `llama_backend_devices()` — list all available compute devices (CPU, GPU,
  iGPU, accelerator) as a data.frame.
- `llama_numa_init()` — NUMA optimization with the strategies `disabled`,
  `distribute`, `isolate`, `numactl`, and `mirror`.
- `llama_time_us()` — current time in microseconds.
- `llama_token_to_piece()` — convert a single token ID to its text piece.
- `llama_encode()` — run the encoder pass for encoder-decoder models
  (e.g. T5, BART).
- `llama_batch_init()` / `llama_batch_free()` — low-level batch allocation
  and release with an automatic GC finalizer.
- `extern "C"` block wrapping
  `#include <R.h>` in `r_llama_compat.h` (C++ templates cannot appear inside
  `extern "C"` linkage).
- Conflict between the `Rinternals.h` macro `#define length(x)` and
  `std::codecvt::length()` in `r_llama_interface.cpp`: C++ standard headers
  are now included before R headers, followed by `#undef length`.
- `llama_token_to_piece`, `llama_batch_init`, `llama_batch_free`, and
  `llama_encode`, including GPU context variants.
- `llama_hf_list()` — list GGUF files in a Hugging Face
  repository.
- `llama_hf_download()` — download a GGUF model with local caching. Supports
  exact filename, glob pattern, or Ollama-style tag selection.
- `llama_load_model_hf()` — download and load a model in one step.
- `llama_hf_cache_dir()` — get the cache directory path.
- `llama_hf_cache_info()` — inspect cached models.
- `llama_hf_cache_clear()` — clear the model cache.
- Added `jsonlite` and `utils` to `Imports`.
- Added `configure.win` and
  `Makevars.win.in`.
- … `ggmlR` is built with GPU support.
- Added `exit()` / `_Exit()` overrides to `r_llama_compat.h` to prevent
  process termination (redirects to `Rf_error()`).
- Requires `ggmlR >= 0.5.4`.
- … `ggmlR`).
- … `ggmlR`.
- Added `\value` tags to all exported functions
  describing the return class, structure, and meaning.
- Replaced `\dontrun{}` with `\donttest{}` in all examples.
- … (`cph`) for bundled ‘llama.cpp’ code.
- `NEWS.md` is now included in the package tarball (removed from
  `.Rbuildignore`).
- … `cran-comments.md`.
- … `.Rbuildignore`.
- Full LLM inference cycle is now available from R:
  - `llama_load_model()` / `llama_free_model()` — load and free GGUF models
  - `llama_new_context()` / `llama_free_context()` — context management
  - `llama_tokenize()` / `llama_detokenize()` — tokenization and
    detokenization
  - `llama_generate()` — text generation with temperature, top_k, top_p, and
    greedy support
  - `llama_embeddings()` — embedding extraction
  - `llama_model_info()` — model metadata
- Model and context are wrapped as `ExternalPtr` with automatic GC finalizers.
  The context holds a reference to the model `ExternalPtr`, preventing
  premature collection.
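A minimal end-to-end sketch of that cycle. The model path is a placeholder, and argument names not listed above (e.g. `max_tokens`) are assumptions:

```r
library(llamar)  # assumed package name

model <- llama_load_model("model.Q4_K_M.gguf")  # placeholder path
ctx   <- llama_new_context(model)

# Tokenization round-trip.
tokens <- llama_tokenize(ctx, "The capital of France is")
text   <- llama_detokenize(ctx, tokens)

# Greedy generation and model metadata.
out  <- llama_generate(ctx, "The capital of France is", max_tokens = 16)
info <- llama_model_info(model)

# Both handles are ExternalPtrs with GC finalizers, so explicit freeing is
# optional; the context keeps the model alive either way.
llama_free_context(ctx)
llama_free_model(model)
```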
- `llama_generate()` runs the full pipeline in a single C++ call: prompt
  tokenization → encode → autoregressive decode loop with a sampler chain →
  detokenization of the generated tokens.
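From R, that whole pipeline is one call; the sampling stage is steered by the parameters listed earlier (`ctx` comes from a prior `llama_new_context()`, and the exact argument names are assumptions):

```r
# One C++ call performs: tokenize(prompt) -> encode -> decode loop with a
# sampler chain -> detokenize(generated tokens).
out <- llama_generate(
  ctx, "Once upon a time",
  max_tokens  = 32,
  temperature = 0.7,   # flatten/sharpen the next-token distribution
  top_k       = 40,    # keep only the 40 most likely tokens
  top_p       = 0.95   # nucleus sampling over cumulative probability
)
```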
- 19 assertions across 7 test blocks, all passing.
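A flavor of what one such test block might look like, using testthat conventions (illustrative only — not the actual test file; `model_path` is a hypothetical setup variable):

```r
library(testthat)

test_that("tokenize/detokenize round-trips ASCII text", {
  skip_if_not(file.exists(model_path))  # model_path set by test setup
  model <- llama_load_model(model_path)
  ctx   <- llama_new_context(model)

  tokens <- llama_tokenize(ctx, "hello world")
  expect_true(is.numeric(tokens))
  expect_equal(trimws(llama_detokenize(ctx, tokens)), "hello world")
})
```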
- Links against `libggml.a` from the ggmlR package.
- `ggml_build_forward_select` replaced with simplified branch selection.