
Frequently Asked Questions

Installation Issues

“Backend library is not loaded” error

Problem: You see the error “Backend library is not loaded. Please run install_localLLM() first.”

Solution: Run the installation function after loading the package:

library(localLLM)
install_localLLM()

This downloads the platform-specific backend library. You only need to do this once.

Installation fails on my platform

Problem: install_localLLM() fails to download or install.

Solution: Check that your platform is supported:

  • Windows (x86-64)
  • macOS (ARM64 / Apple Silicon)
  • Linux (x86-64)

If you’re on an unsupported platform, you may need to compile llama.cpp manually.

“Library already installed” but functions don’t work

Problem: install_localLLM() says the library is installed, but generation fails.

Solution: Try reinstalling:

# Force reinstall
install_localLLM(force = TRUE)

# Verify installation
lib_is_installed()

Model Download Issues

“Download lock” or “Another download in progress” error

Problem: A previous download was interrupted and left a lock file.

Solution: Clear the cache directory:

cache_root <- tools::R_user_dir("localLLM", which = "cache")
models_dir <- file.path(cache_root, "models")
unlink(models_dir, recursive = TRUE, force = TRUE)

Then try downloading again.
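
If you would rather not delete the cached models themselves, you can first look for leftover lock files and remove only those. The ".lock" suffix below is an assumption about how the lock files are named; check the directory listing and adjust the pattern to whatever you actually see.

cache_root <- tools::R_user_dir("localLLM", which = "cache")
models_dir <- file.path(cache_root, "models")

# Inspect the cache, then remove only files that look like download locks
list.files(models_dir)
locks <- list.files(models_dir, pattern = "\\.lock$", full.names = TRUE)
file.remove(locks)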

Download times out or fails

Problem: Large model downloads fail partway through.

Solution:

  1. Check your internet connection
  2. Try a smaller model first
  3. Download manually and load from a local path:

# Download with browser or wget, then:
model <- model_load("/path/to/downloaded/model.gguf")

“Model not found” when using cached model

Problem: You’re trying to load a model by name but it’s not found.

Solution: Check what’s actually cached:

cached <- list_cached_models()
print(cached)

Use the exact filename or a unique substring that matches only one model.
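
For example, if the listing shows a file named Llama-3.2-1B-Instruct-Q4_K_M.gguf (a hypothetical name), either of these should resolve to it:

model <- model_load("Llama-3.2-1B-Instruct-Q4_K_M.gguf")  # exact filename
model <- model_load("Llama-3.2-1B")                       # unique substring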

Private Hugging Face model fails

Problem: Downloading a gated/private model fails with authentication error.

Solution: Set your Hugging Face token:

# Get token from https://huggingface.co/settings/tokens
set_hf_token("hf_your_token_here")

# Now download should work
model <- model_load("https://huggingface.co/private/model.gguf")

Memory Issues

R crashes when loading a model

Problem: R crashes or freezes when calling model_load().

Solution: The model is too large for your available RAM. Try:

  1. Use a smaller quantized model (Q4 instead of Q8)
  2. Free up memory by closing other applications
  3. Check model requirements:

    hw <- hardware_profile()
    cat("Available RAM:", hw$ram_gb, "GB\n")

“Memory check failed” warning

Problem: localLLM warns about insufficient memory.

Solution: The safety check detected potential issues. Options:

  1. Use a smaller model

  2. Reduce context size:

    ctx <- context_create(model, n_ctx = 512)  # Smaller context
  3. If you’re sure you have enough memory, proceed when prompted

Context creation fails with large n_ctx

Problem: Creating a context with large n_ctx fails.

Solution: Reduce the context size or use a smaller model:

# Instead of n_ctx = 32768, try:
ctx <- context_create(model, n_ctx = 4096)

GPU Issues

GPU not being used

Problem: Generation is slow even with n_gpu_layers = 999.

Solution: Check if GPU is detected:

hw <- hardware_profile()
print(hw$gpu)

If no GPU is listed, the backend may not support your GPU. Currently supported:

  • NVIDIA GPUs (via CUDA)
  • Apple Silicon (Metal)
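
If a GPU is listed, a quick way to confirm it is actually being used is to compare timings with and without offloading; the model path and prompt below are placeholders.

# CPU only
m_cpu <- model_load("model.gguf", n_gpu_layers = 0)
ctx_cpu <- context_create(m_cpu, n_ctx = 2048)
system.time(generate(ctx_cpu, "Write one sentence about R."))

# Full GPU offload
m_gpu <- model_load("model.gguf", n_gpu_layers = 999)
ctx_gpu <- context_create(m_gpu, n_ctx = 2048)
system.time(generate(ctx_gpu, "Write one sentence about R."))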

“CUDA out of memory” error

Problem: GPU runs out of memory during generation.

Solution: Reduce GPU layer count:

# Offload fewer layers to GPU
model <- model_load("model.gguf", n_gpu_layers = 20)

Generation Issues

Output is garbled or nonsensical

Problem: The model produces meaningless text.

Solution:

  1. Ensure you’re using a chat template:

    messages <- list(
      list(role = "user", content = "Your question")
    )
    prompt <- apply_chat_template(model, messages)
    result <- generate(ctx, prompt)
  2. The model file may be corrupted; redownload it

Output contains strange tokens like <|eot_id|>

Problem: Output includes control tokens.

Solution: Use the clean = TRUE parameter:

result <- generate(ctx, prompt, clean = TRUE)
# or
result <- quick_llama("prompt", clean = TRUE)

Generation stops too early

Problem: Output is cut off before completion.

Solution: Increase max_tokens:

result <- quick_llama("prompt", max_tokens = 500)

Same prompt gives different results

Problem: Running the same prompt twice gives different outputs.

Solution: Set a seed for reproducibility:

result <- quick_llama("prompt", seed = 42)

With temperature = 0 (default), outputs should be deterministic.
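
A minimal check, with the prompt below standing in for your own:

r1 <- quick_llama("Summarise the GGUF format in one sentence.", seed = 42)
r2 <- quick_llama("Summarise the GGUF format in one sentence.", seed = 42)
identical(r1, r2)  # should be TRUE with a fixed seed and temperature = 0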


Performance Issues

Generation is very slow

Problem: Text generation takes much longer than expected.

Solutions:

  1. Use GPU acceleration:

    model <- model_load("model.gguf", n_gpu_layers = 999)
  2. Use a smaller model: Q4 quantization is faster than Q8

  3. Reduce context size:

    ctx <- context_create(model, n_ctx = 512)
  4. Use parallel processing for multiple prompts:

    results <- quick_llama(c("prompt1", "prompt2", "prompt3"))

Parallel processing isn’t faster

Problem: generate_parallel() is no faster than sequential generation.

Solution: Ensure n_seq_max is set appropriately:

ctx <- context_create(
  model,
  n_ctx = 2048,
  n_seq_max = 10  # Allow 10 parallel sequences
)
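
With the context configured, pass all prompts in a single call rather than looping. The exact argument names of generate_parallel() below are an assumption; check ?generate_parallel if they differ.

prompts <- c("prompt1", "prompt2", "prompt3")
results <- generate_parallel(ctx, prompts)  # argument order assumed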

Compatibility Issues

“GGUF format required” error

Problem: Trying to load a non-GGUF model.

Solution: localLLM only supports GGUF format. Convert your model or find a GGUF version on Hugging Face (search for “model-name gguf”).
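
If you are unsure whether a local file is already GGUF, you can inspect its first four bytes, which spell "GGUF" in a valid file; the path below is a placeholder.

con <- file("/path/to/model.bin", "rb")    # placeholder path
magic <- readChar(con, nchars = 4, useBytes = TRUE)
close(con)
magic == "GGUF"                            # TRUE for a GGUF file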

Model works in Ollama but not localLLM

Problem: An Ollama model doesn’t work when loaded directly.

Solution: Use the Ollama integration:

# List available Ollama models
list_ollama_models()

# Load via Ollama reference
model <- model_load("ollama:model-name")

Common Error Messages

  • “Backend library is not loaded”: backend not installed. Run install_localLLM().
  • “Invalid model handle”: the model handle was freed or is invalid. Reload the model.
  • “Invalid context handle”: the context handle was freed or is invalid. Recreate the context.
  • “Failed to open library”: backend installation problem. Reinstall with install_localLLM(force = TRUE).
  • “Download timeout”: network issue or a stale lock file. Clear the cache and retry.

Getting Help

If you encounter issues not covered here:

  1. Check the documentation: ?function_name
  2. Report bugs: email the maintainer and include:
    • Your code
    • The error message
    • Output of sessionInfo()
    • Output of hardware_profile()
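
Running these three calls collects most of the information to paste into the report:

sessionInfo()        # R version and loaded packages
hardware_profile()   # CPU, RAM, and GPU details
lib_is_installed()   # whether the backend library is present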

Quick Reference

# Check installation status
lib_is_installed()

# Check hardware
hardware_profile()

# List cached models
list_cached_models()

# List Ollama models
list_ollama_models()

# Clear model cache
cache_dir <- file.path(tools::R_user_dir("localLLM", "cache"), "models")
unlink(cache_dir, recursive = TRUE)

# Force reinstall backend
install_localLLM(force = TRUE)
