edgemodelr 0.2.0
SIMD Optimizations for Faster CPU Inference
New Features
- Flash attention support: Enabled by default in edge_load_model() via flash_attn = TRUE. Reduces memory usage and improves attention computation speed on CPU.
- Full hardware thread utilization: Removed the 4-thread cap for small contexts. edge_load_model() now uses all available CPU threads by default, with n_threads_batch set to the maximum for prompt processing.
- User-configurable threading: New n_threads parameter in edge_load_model() allows explicit control over the CPU thread count. Pass NULL (the default) for auto-detection, or an integer to limit the number of cores; see the sketch after this list.
- Apple Accelerate framework (macOS): Automatically links the Accelerate framework on macOS builds, enabling hardware-accelerated vDSP vector operations for faster matrix math.
- Compiler auto-vectorization: Added -ftree-vectorize to the GGML compilation flags on all platforms, allowing GCC/Clang to generate SIMD instructions for eligible loops beyond the hand-tuned GGML kernels.
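A minimal sketch of the new loading options, assuming edge_completion(ctx, prompt) as the basic completion call; the model path and the thread count of 8 are placeholders:

```r
library(edgemodelr)

# flash_attn = TRUE is now the default; shown explicitly here for clarity.
# n_threads = NULL (the default) auto-detects; an integer caps the threads.
ctx <- edge_load_model("path/to/model.gguf",
                       flash_attn = TRUE,
                       n_threads  = 8)   # placeholder: limit to 8 cores

cat(edge_completion(ctx, "Hello, world"))
edge_free_model(ctx)   # free the model when done
```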
Existing Features
- SIMD-optimized build system: Replaced the generic scalar fallback with architecture-aware SIMD detection in both Makevars (Unix) and Makevars.win (Windows)
  - x86_64: Enables the SSE4.2 baseline by default (universal since Intel Nehalem, 2008)
  - aarch64/arm64: NEON support is built into the ABI (no extra flags needed)
  - Other architectures: Automatic generic fallback
- User-configurable SIMD levels: Set the EDGEMODELR_SIMD environment variable before installation to select the optimization level (see the sketch after this list):
  - GENERIC: Scalar fallback (maximum compatibility)
  - SSE42: SSE4.2 baseline (default on x86_64)
  - AVX: AVX + F16C (Intel Sandy Bridge 2011+)
  - AVX2: AVX2 + FMA + F16C (Intel Haswell 2013+, recommended)
  - AVX512: AVX-512 (Intel Skylake-X 2017+)
  - NATIVE: Uses -march=native for maximum performance on the build machine
- edge_simd_info(): New function to query compile-time SIMD status, including the architecture, compiler features, and GGML optimization flags
- x86 architecture-specific quantization: Enabled optimized x86 quantization kernels (arch/x86/quants.c, arch/x86/repack.cpp) with SIMD-accelerated dot products and matrix operations
  - 15-40% faster inference on x86_64 with the SSE4.2 baseline vs. the generic scalar build
  - Up to 2-3x faster quantized model operations with AVX2
  - SSSE3-accelerated integer multiply-accumulate for quantized dot products
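A short sketch of picking a SIMD level at install time and verifying it afterwards; this assumes an install from source in the same R session, so the environment variable is visible to the build:

```r
# Choose the AVX2 level (Haswell 2013+), then build the package from source.
Sys.setenv(EDGEMODELR_SIMD = "AVX2")
install.packages("edgemodelr", type = "source")

# After installation, inspect the compile-time SIMD status.
library(edgemodelr)
edge_simd_info()   # architecture, compiler features, GGML optimization flags
```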
edgemodelr 0.1.5
CRAN Policy Fixes
Bug Fixes
- Fixed donttest examples: Changed resource-intensive examples from \donttest{} to \dontrun{} to prevent downloading multi-GB models during CRAN checks
- Fixed M1 Mac compiler warnings: Added explicit static_cast<> for:
  - double to float conversions for the temperature/top_p parameters
  - size_type to int32_t conversions for buffer size parameters
- Fixed connection handling: Replaced on.exit() with tryCatch()/finally for proper connection cleanup in loops (thanks @eddelbuettel); see the sketch below
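A minimal sketch of the cleanup pattern described above, using a throwaway text connection: tryCatch(finally = ...) closes each iteration's connection immediately, whereas on.exit() would only fire once the enclosing function returns.

```r
for (txt in c("alpha", "beta")) {
  con <- textConnection(txt)
  tryCatch({
    print(readLines(con))      # work that might error mid-loop
  }, finally = {
    close(con)                 # runs every iteration, even on error
  })
}
```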
edgemodelr 0.1.4
New Features
- Small Model Configuration Helper: New edge_small_model_config() function provides optimized settings for small models (1B-3B parameters); see the sketch after this list
  - Device-specific presets: mobile, laptop, desktop, and server
  - Adaptive configuration based on model size and available RAM
  - Built-in performance tips and recommendations
  - Automatic parameter tuning for optimal inference speed
- Adaptive Batch Processing: Intelligent batch size optimization based on context length
  - Small contexts (≤512): Uses up to the full context for batching
  - Medium contexts (512-2048): Uses 1/2 of the context for optimal throughput
  - Large contexts (2048-4096): Uses 1/4 of the context to balance speed and memory
  - Very large contexts (>4096): Caps at 2048 tokens for stability
- Smart Thread Allocation: Context-aware CPU thread management
  - Small models automatically limit threads to avoid overhead
  - Reduces CPU contention on resource-constrained devices
  - Improves inference speed for models with contexts ≤2048 tokens
- Automatic Context Optimization: Model size-based context tuning
  - Small models (<1GB): Optimized to 1024 tokens for faster inference
  - Medium models (1-2GB): Set to 1536 tokens for balanced performance
  - Large models (>2GB): Maintain 2048+ tokens for quality
  - User override available via the n_ctx parameter
- Faster Small Model Inference: 15-30% speed improvement for small models through optimized batch and thread settings
- Reduced Memory Footprint: Better memory efficiency for resource-constrained environments
- Lower Latency: Optimized thread allocation reduces context-switching overhead
- Better Scalability: Adaptive configurations scale from mobile devices to servers
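A hedged sketch of the configuration helper; the argument names shown (model_size, device) and the fields on the returned object are assumptions for illustration, not the package's confirmed signature:

```r
library(edgemodelr)

# Hypothetical call: settings tuned for a ~1B-parameter model on a laptop.
cfg <- edge_small_model_config(model_size = "1B", device = "laptop")
str(cfg)   # inspect the suggested batch, thread, and context settings

# The adaptive context default can still be overridden by hand via n_ctx
# when loading the model, as noted in the list above.
```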
Examples and Documentation
- Small Model Optimization Example: Comprehensive example demonstrating all optimization features
  - Configuration comparison across device types
  - Performance benchmarking workflow
  - Best practices for different model sizes
  - Manual tuning guidelines
- Enhanced Testing: New test suite for small model configuration
  - Tests for all device target configurations
  - Validation of adaptive parameter adjustments
  - Safety checks for edge cases
Technical Details
- Improved C++ bindings with adaptive batch size calculations
- Enhanced R API with intelligent parameter defaults
- Better integration between model size detection and configuration
- Comprehensive documentation for optimization features
edgemodelr 0.1.2
Major New Features
Ollama Integration
- Native Ollama Support: Complete integration with Ollama models through automatic model discovery and SHA-256 hash-based loading; see the sketch after this list
  - edge_find_ollama_models() - Discover all locally available Ollama models across platforms (Windows, macOS, Linux)
  - edge_load_ollama_model() - Load Ollama models using convenient SHA-256 hash prefixes instead of full file paths
  - test_ollama_model_compatibility() - Built-in compatibility testing for Ollama models
- Cross-platform Model Detection: Robust model discovery supporting standard installations, snap packages (Linux), and various Windows configurations
- Windows OneDrive Compatibility: Enhanced path detection that properly handles Windows OneDrive document folder redirections
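A minimal sketch of the Ollama workflow; the hash prefix is a made-up placeholder and the return shapes are assumptions:

```r
library(edgemodelr)

models <- edge_find_ollama_models()    # locate locally installed Ollama models
print(models)

# Load a model by a SHA-256 hash prefix (placeholder value shown).
ctx <- edge_load_ollama_model("8934d96d3f08")
cat(edge_completion(ctx, "Hello from an Ollama model"))
edge_free_model(ctx)
```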
Comprehensive Examples Suite
- Structured Learning Path: Complete examples directory with progressive difficulty levels (Beginner → Intermediate → Advanced)
- 01_basic_usage.R: Fundamental operations including model loading, text generation, parameter tuning, and error handling
- 02_ollama_integration.R: Complete Ollama workflow with model discovery, hash-based loading, and compatibility testing
- 03_streaming_generation.R: Real-time streaming text generation with interactive chat interfaces and callback processing
- 04_performance_optimization.R: Advanced performance tuning including GPU acceleration, benchmarking, memory management, and batch processing
- examples/README.md: Comprehensive documentation with learning paths, a troubleshooting guide, and customization instructions
Package Structure Improvements
- Organized File Structure: Consolidated all examples into a structured examples/ directory with consistent formatting
- Enhanced Documentation: Improved inline documentation and example comments throughout
edgemodelr 0.1.1
Bug Fixes and Improvements
Compilation Fixes
- macOS Boolean Conflicts: Completely resolved Boolean enum conflicts by avoiding problematic system headers and using direct function declarations
- Filesystem Compatibility: Added a comprehensive fallback implementation for disabled std::filesystem on macOS builds
- Header Protection: Implemented a robust cross-platform header inclusion strategy that works with R, Rcpp, and system headers
- System Header Workarounds: Replaced the <mach-o/dyld.h> inclusion with direct function declarations to avoid enum conflicts
- Format Attribute Warnings: Suppressed unsupported printf format attribute warnings from Apple Clang on macOS
- CRAN Compliance: Removed non-portable optimization flags (-march=native, -mtune=native, etc.) from Makevars for CRAN compatibility
- Cross-platform Build: Enhanced the Makevars configuration for better macOS compatibility with R package requirements
Demo and Documentation Updates
- Modern UI: Updated the streaming chat demo with a modern bslib interface for an enhanced user experience
- Documentation: Improved documentation for the edge_clean_cache() function
- Examples: Enhanced the streaming chat example with better UI components
Technical Improvements
- Build System: Updated the Makevars files for improved compilation on Windows and Unix systems
- Core Bindings: Enhanced the C++ bindings for better performance and stability
edgemodelr 0.1.0
Initial CRAN Release
New Features
- Local LLM Inference: Complete R interface for running large language models locally using llama.cpp and GGUF model files
- Model Management: Built-in functions for downloading and managing popular models from Hugging Face
- Text Generation: Support for both blocking and streaming text completion
- Interactive Chat: Real-time streaming chat interface with conversation history
- Privacy-First: All processing happens locally, without external API calls
Core Functions
- edge_load_model() - Load GGUF model files for inference
- edge_completion() - Generate text completions
- edge_stream_completion() - Stream text generation with real-time callbacks; see the sketch after this list
- edge_chat_stream() - Interactive chat session with streaming responses
- edge_free_model() - Memory management and cleanup
- is_valid_model() - Model context validation
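A hedged sketch of the core workflow; the model path is a placeholder, and the streaming callback is assumed to receive each generated chunk as text:

```r
library(edgemodelr)

ctx <- edge_load_model("path/to/model.gguf")

# Blocking completion
cat(edge_completion(ctx, "R is a language for"))

# Streaming completion with a real-time callback
edge_stream_completion(ctx, "Tell me a short joke.",
                       callback = function(chunk) cat(chunk))

if (is_valid_model(ctx)) edge_free_model(ctx)   # clean up when done
```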
Model Management
- edge_list_models() - List pre-configured popular models
- edge_download_model() - Download models from Hugging Face Hub
- edge_quick_setup() - One-line model download and setup; see the sketch below
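A short sketch of the quick-setup path; the model name and the field on the returned object are assumptions for illustration:

```r
library(edgemodelr)

edge_list_models()                          # browse pre-configured models

setup <- edge_quick_setup("TinyLlama-1.1B") # hypothetical model name
ctx <- setup$context                        # assumed field holding the context
cat(edge_completion(ctx, "Hello!"))
edge_free_model(ctx)
```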
System Support
- Self-contained: Includes a complete llama.cpp implementation
- Cross-platform: Works on Windows, macOS, and Linux
- CPU optimized: Runs efficiently on standard hardware
- Memory efficient: Support for quantized models
Documentation
- Comprehensive getting started vignette
- Complete API documentation with examples
- README with extensive usage examples
- Test coverage for all major functionality
Technical Implementation
- C++17 integration via Rcpp
- Real-time token streaming with callback support
- Automatic memory management with RAII
- Robust error handling and validation
- Thread-safe model operations
This release provides a complete, production-ready local large language model inference engine for R, enabling private, offline text generation workflows.