Problem: You see the error “Backend library is not loaded. Please run install_localLLM() first.”
Solution: Run the installation function after loading the package:
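A minimal sketch of that step. The lib_is_installed() guard is optional; it is taken from the quick-reference list at the end of this page and is assumed here to return TRUE or FALSE.

```r
library(localLLM)

# One-time setup: download the backend only if it is not already installed.
if (!lib_is_installed()) {
  install_localLLM()
}
```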
This downloads the platform-specific backend library. You only need to do this once.
Problem: install_localLLM() fails to download or install.
Solution: Check your platform is supported:
- Windows (x86-64)
- macOS (Apple Silicon / ARM64)
- macOS (Intel / x86-64)
- Linux (x86-64)
If you’re on an unsupported platform, you may need to compile llama.cpp manually.
Problem: A previous download was interrupted and left a lock file.
Solution: Clear the cache directory:
cache_root <- tools::R_user_dir("localLLM", which = "cache")
models_dir <- file.path(cache_root, "models")
unlink(models_dir, recursive = TRUE, force = TRUE)
Then try downloading again.
Problem: Large model downloads fail partway through.
Solution:
1. Check your internet connection
2. Try a smaller model first
3. Download manually and load from local path:
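One possible way to do the manual download from within R is sketched below, using only base R. The helper name download_model() and its arguments are hypothetical and not part of localLLM. It raises R's download timeout, which defaults to 60 seconds and is usually too short for multi-gigabyte files.

```r
# Hypothetical helper (not part of localLLM): download a file with a longer
# timeout than R's 60-second default and return the local path.
download_model <- function(url, dest, timeout_sec = 3600) {
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)

  # Raise the timeout only for the duration of this call.
  old <- options(timeout = timeout_sec)
  on.exit(options(old), add = TRUE)

  # mode = "wb" keeps the binary file intact on Windows.
  status <- download.file(url, destfile = dest, mode = "wb")
  if (status != 0) stop("Download failed with status ", status)
  dest
}
```

Pass the returned path to model_load() once the download has finished. Downloading with a browser or a command-line tool and then calling model_load() on the saved file works just as well.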
Problem: You’re trying to load a model by name but it’s not found.
Solution: Check what’s actually cached:
Use the exact filename or a unique substring that matches only one model.
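As a sketch, the check could look like this. The structure of the value returned by list_cached_models() is not described here, so the example only prints it. The substring in the commented line is hypothetical.

```r
library(localLLM)

# Show every model currently in the local cache.
cached <- list_cached_models()
print(cached)

# Hypothetical substring: replace it with part of a filename printed above,
# then uncomment the line to load that model.
# model <- model_load("Llama-3.2-1B")
```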
Problem: Downloading a gated/private model fails with authentication error.
Solution: Set your Hugging Face token:
# Get token from https://huggingface.co/settings/tokens
set_hf_token("hf_your_token_here")
# Now download should work
model <- model_load("https://huggingface.co/private/model.gguf")
Problem: R crashes or freezes when calling model_load().
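Solution: The model is probably too large for your available RAM. Try a smaller or more heavily quantized model (for example Q4 instead of Q8), or reduce the context size.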
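A quick pre-check can help confirm the diagnosis before you switch models. The sketch below uses base R only; model_size_gb() is a hypothetical helper, not a localLLM function. As a rough rule, the weights need about as much memory as the GGUF file occupies on disk, and the context adds more on top.

```r
# Hypothetical helper (not part of localLLM): size of a model file in GiB.
model_size_gb <- function(path) {
  stopifnot(file.exists(path))
  file.size(path) / 1024^3
}
```

Compare the result for your model file with the memory reported by hardware_profile().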
Problem: localLLM warns about insufficient memory.
Solution: The safety check detected potential issues. Options:
1. Use a smaller model
2. Reduce the context size
3. If you’re sure you have enough memory, proceed when prompted
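Reducing the context size might look like the sketch below. The argument name n_ctx is an assumption based on llama.cpp conventions, and 512 is an arbitrary small value; check ?context_create for the actual argument.

```r
library(localLLM)

model <- model_load("model.gguf")

# n_ctx is assumed to be the context-size argument of context_create();
# confirm the name with ?context_create before relying on it.
ctx <- context_create(model, n_ctx = 512)
```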
Problem: Generation is slow even with n_gpu_layers = 999.
Solution: Check if GPU is detected:
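A minimal sketch of that check. The exact fields of the returned profile are not documented here, so the example simply prints the whole object.

```r
library(localLLM)

# Print what localLLM detects about this machine, including any GPU.
profile <- hardware_profile()
print(profile)
```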
If no GPU is listed, the backend may not support your GPU. Currently supported:
| Platform | GPU backend | Supported hardware |
|---|---|---|
| macOS (Apple Silicon) | Metal | All Apple Silicon (M1 and later) |
| macOS (Intel) | Metal | Intel Macs running macOS 12+ |
| Windows (x86-64) | Vulkan | NVIDIA GeForce 10xx+, AMD RX 400+, Intel Arc |
| Linux (x86-64) | Vulkan | NVIDIA GeForce 10xx+, AMD RX 400+, Intel Arc |
If your GPU is not listed, install with force_cpu = TRUE to use the CPU build:
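In its simplest form, and assuming force_cpu is passed directly to install_localLLM() as the sentence above suggests:

```r
library(localLLM)

# Install the CPU-only backend instead of a GPU build.
install_localLLM(force_cpu = TRUE)
```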
Problem: model_load() or context_create() print hardware information, model metadata, or other log lines that clutter the console or appear in knitted documents and R CMD check output.
Solution: Reduce the verbosity level:
# Default (verbosity = 1): warnings only — hardware limits, context size notes
model <- model_load("model.gguf")
# Fully silent loading
model <- model_load("model.gguf", verbosity = 0)
ctx <- context_create(model, verbosity = 0)
Verbosity levels: 0 = silent, 1 = warnings only (default for model_load() and context_create()), 2 = informational messages, 3 = full debug output. generate() and generate_parallel() already default to verbosity = 0.
Note: backend_init() always prints one line (localLLM backend library loaded successfully.) regardless of verbosity; this cannot be suppressed.
Problem: The model produces meaningless text.
Solution: Ensure you’re using a chat template:
messages <- list(
list(role = "user", content = "Your question")
)
prompt <- apply_chat_template(model, messages)
result <- generate(ctx, prompt)
Problem: Output includes control tokens such as <|eot_id|>.
Solution: Use the clean = TRUE parameter:
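A short sketch, assuming clean is an argument of generate() and reusing the model.gguf placeholder from the examples above:

```r
library(localLLM)

model <- model_load("model.gguf", verbosity = 0)
ctx <- context_create(model, verbosity = 0)

messages <- list(list(role = "user", content = "Your question"))
prompt <- apply_chat_template(model, messages)

# clean = TRUE asks generate() to strip control tokens such as <|eot_id|>.
result <- generate(ctx, prompt, clean = TRUE)
cat(result, "\n")
```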
Problem: Output is cut off before completion.
Solution: Increase max_tokens:
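For instance, the following sketch raises the limit to 500 tokens. The value is arbitrary, and max_tokens is assumed to be an argument of generate(); choose a limit that fits the length of answer you expect.

```r
library(localLLM)

model <- model_load("model.gguf", verbosity = 0)
ctx <- context_create(model, verbosity = 0)

messages <- list(list(role = "user", content = "Your question"))
prompt <- apply_chat_template(model, messages)

# Allow up to 500 generated tokens before generation stops.
result <- generate(ctx, prompt, max_tokens = 500)
cat(result, "\n")
```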
Problem: Text generation takes much longer than expected.
Solutions:
1. Use GPU acceleration (n_gpu_layers)
2. Use a smaller model: Q4 quantization is faster than Q8
3. Reduce the context size
4. Use parallel processing for multiple prompts (generate_parallel())
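The options above can be combined, as in the sketch below. Several details are assumptions rather than documented facts: that n_gpu_layers belongs to model_load(), that the context-size argument is called n_ctx, and that generate_parallel() takes a context followed by a character vector of prompts. Check the help pages before relying on them.

```r
library(localLLM)

# Request GPU offload for every layer; n_gpu_layers is assumed to be an
# argument of model_load().
model <- model_load("model.gguf", n_gpu_layers = 999)

# A smaller context needs less memory; n_ctx is an assumed argument name.
ctx <- context_create(model, n_ctx = 1024)

# Process several prompts in one call instead of looping over generate().
prompts <- c(
  "Name one prime number.",
  "Name one primary colour.",
  "Name one planet."
)
results <- generate_parallel(ctx, prompts)
print(results)
```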
Problem: Trying to load a non-GGUF model.
Solution: localLLM only supports GGUF format. Convert your model or find a GGUF version on Hugging Face (search for “model-name gguf”).
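If you are unsure whether a file really is GGUF, you can inspect its first four bytes: GGUF files begin with the ASCII characters "GGUF". The helper below is a hypothetical base R sketch, not a localLLM function.

```r
# Hypothetical helper (not part of localLLM): TRUE if `path` starts with the
# 4-byte GGUF magic number, i.e. the ASCII characters "GGUF".
is_gguf <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con), add = TRUE)
  magic <- readBin(con, what = "raw", n = 4L)
  identical(magic, charToRaw("GGUF"))
}
```

A FALSE result for a file with a .gguf extension usually points to a truncated or mislabelled download.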
| Error | Cause | Solution |
|---|---|---|
| “Backend library is not loaded” | Backend not installed | Run install_localLLM() |
| “Invalid model handle” | Model was freed/invalid | Reload the model |
| “Invalid context handle” | Context was freed/invalid | Recreate the context |
| “Failed to open library” | Backend installation issue | Reinstall with install_localLLM(force_reinstall = TRUE) |
| “Download timeout” | Network issue or lock file | Clear cache and retry |
If you encounter issues not covered here:
- Check the function documentation: ?function_name
- Gather the output of sessionInfo() and hardware_profile() before asking for help
# Check installation status
lib_is_installed()
# Check hardware
hardware_profile()
# List cached models
list_cached_models()
# List Ollama models
list_ollama_models()
# Clear model cache
cache_dir <- file.path(tools::R_user_dir("localLLM", "cache"), "models")
unlink(cache_dir, recursive = TRUE)
# Force reinstall backend (re-runs GPU detection)
install_localLLM(force_reinstall = TRUE)