
sd2R 0.2.1

Diagnostic logging: sd_ctx() now logs all behavioural parameters from R at startup, the resolved weight-residency mode (resident, or streamed when offload_params_to_cpu/params_backend place the parameters on a different backend), and a running summary of the weight data re-uploaded to the GPU on each pass. This makes otherwise silent performance regimes visible.
FLUX.2 (Klein 4B) support via model_type = "flux2", with auto-detection from tensors/filename.
New llm_path argument in sd_ctx() for LLM text encoders (Qwen3 for FLUX.2 Klein, Mistral-Small for full FLUX.2).
Inpainting: new mask argument in sd_img2img() regenerates only the masked region. Accepts a PNG path, a numeric matrix, or an SD image (white = generate). Works on plain SD/SDXL/FLUX 1/2 weights via the denoise mask. New helper sd_load_mask().
Shiny GUI now shares sd_generate()’s auto-routing (CFG, VAE tiling, highres-fix), fixing FLUX.2 VAE-decode crashes.
New meta_backend argument in sd_ctx(): runs the diffusion model through the ggml meta backend for multi-GPU tensor split (a single model sharded across all GPUs). Requires ggmlR >= 0.7.8; falls back to the normal single-backend path otherwise. The Shiny GUI enables it automatically for FLUX.2.
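
To illustrate the mask convention from the inpainting entry above (white = generate), here is a minimal Python sketch of a per-pixel mask blend. It is not sd2R code: the function name blend_with_mask and the list-of-rows image representation are assumptions made for illustration. The entry says the package works via the denoise mask, so the real mechanism presumably acts during sampling rather than as one final blend.

```python
def blend_with_mask(original, generated, mask):
    """Per-pixel blend: mask 1.0 (white) takes the generated value,
    mask 0.0 (black) keeps the original value.

    All three inputs are equally sized lists of rows of floats
    (a hypothetical representation chosen for this sketch)."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    out = []
    for o_row, g_row, m_row in zip(original, generated, mask):
        if not (len(o_row) == len(g_row) == len(m_row)):
            raise ValueError("rows must have the same length")
        # Linear interpolation between original and generated pixels.
        out.append([m * g + (1.0 - m) * o
                    for o, g, m in zip(o_row, g_row, m_row)])
    return out
```

Intermediate mask values between 0 and 1 mix the two images proportionally in this sketch; whether sd2R treats grey mask values the same way is not stated in the entry.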

sd2R 0.2.0

Performance: VAE Decode

vae_conv_direct now defaults to TRUE — VAE decoder uses GGML_OP_CONV_2D (direct convolution via conv2d_mm.comp) instead of the legacy IM2COL + MUL_MAT path.
On RX 9070 (RDNA4) with Vulkan coopmat (KHR): VAE decode 12.6s → 0.5s for 768×768.
All convolutions now run through the coopmat cm1 path (~16-17 TFLOPS) when coopmat_support is available; a scalar FMA fallback is used otherwise.

sd2R 0.1.9

Shiny GUI

New sd_app() launches an interactive Shiny application for image generation.
- Auto-detection of model architecture (Flux, SD3, SDXL, SD1/2) from filenames in the models folder — no manual configuration needed.
- Non-blocking async generation via C++ std::thread: the UI remains responsive during image generation, with a live progress bar and ETA display.
- Automatic role assignment for multi-file models (diffusion, VAE, CLIP-L, T5-XXL).
- Prevents loading incompatible model combinations (e.g. SD1.5 + Flux).

Async C++ Generation API

New internal functions for non-blocking generation from R:
- sd_generate_async() — launches generation in a background C++ thread.
- sd_generate_poll() — checks completion status (atomic flags).
- sd_generate_result() — retrieves results after completion.
Progress callback writes JSON to a temp file (step, steps, pct, elapsed, eta_sec), read by Shiny via later::later() polling.
R API calls (Rprintf, R_CheckUserInterrupt) are suppressed in the worker thread to prevent stack corruption.
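
The launch / poll / result pattern described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's C++ code: start_job, poll and run_demo are hypothetical names, a threading.Event stands in for the atomic completion flag, and time.sleep stands in for a sampling step. Only the JSON field names (step, steps, pct, elapsed, eta_sec) come from the entry above.

```python
import json
import os
import tempfile
import threading
import time


def start_job(steps, progress_path):
    """Launch a fake generation job in a background thread."""
    done = threading.Event()  # plays the role of the atomic completion flag
    result = {}

    def worker():
        t0 = time.monotonic()
        for step in range(1, steps + 1):
            time.sleep(0.01)  # stand-in for one sampling step
            elapsed = time.monotonic() - t0
            record = {
                "step": step,
                "steps": steps,
                "pct": round(100.0 * step / steps, 1),
                "elapsed": elapsed,
                "eta_sec": elapsed / step * (steps - step),
            }
            # Write to a sibling file, then rename, so a reader
            # never sees half-written JSON.
            tmp = progress_path + ".tmp"
            with open(tmp, "w") as fh:
                json.dump(record, fh)
            os.replace(tmp, progress_path)
        result["image"] = "done"  # placeholder for the generated image
        done.set()                # publish completion last

    threading.Thread(target=worker, daemon=True).start()
    return done, result


def poll(done, progress_path):
    """Non-blocking check: (finished?, latest progress record or None)."""
    finished = done.is_set()  # read the flag first, then the file
    try:
        with open(progress_path) as fh:
            progress = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        progress = None
    return finished, progress


def run_demo(steps=5):
    """Start a job, poll until the flag is set, return the last record."""
    path = os.path.join(tempfile.mkdtemp(), "progress.json")
    done, result = start_job(steps, path)
    finished, progress = poll(done, path)
    while not finished:
        time.sleep(0.005)  # a GUI would reschedule a timer instead
        finished, progress = poll(done, path)
    return progress, result
```

The worker never touches the caller's state directly; it only writes the progress file and sets the flag, which mirrors the entry's point that the worker thread must not call back into the host runtime.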

Build System

tools/patch_sd_sources.sh rewritten: all sed calls replaced with perl -pi -e for cross-platform compatibility (macOS BSD sed + Linux GNU sed).

sd2R 0.1.8

Bug Fixes

Fixed the “undefined symbol: ggml_backend_vk_get_device_count” load error on CRAN Fedora (clang and gcc). Root cause: ggmlR’s shared library (ggmlR.so) was built with Vulkan, but the static library (libggml.a) shipped without Vulkan objects. The old configure relied on ggml_vulkan_status(), which queries ggmlR.so; it reported “AVAILABLE”, causing sd2R to compile with -DSD_USE_VULKAN against a libggml.a that lacked the symbols. configure now runs nm on libggml.a and checks directly for a defined (T) symbol, ignoring the runtime ggmlR check entirely.
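
The symbol check described above can be sketched in Python. This is an illustration only (the real check lives in the package's configure script, which is not shown here), and has_defined_text_symbol is a hypothetical helper. It relies on the standard nm output layout, where a defined text-section symbol prints as "address T name" and an undefined reference prints as "U name" with no address.

```python
def has_defined_text_symbol(nm_output, symbol):
    """Return True if nm output lists `symbol` with type 'T'
    (defined in the text section). 'U' lines are undefined
    references and do not count."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields: <address> <type> <name>.
        # Undefined ones have two, archive member headers have one.
        if len(fields) == 3 and fields[1] == "T" and fields[2] == symbol:
            return True
    return False
```

In practice the text passed in would be the captured output of running nm on the static library; checking for 'T' rather than mere presence of the name is what distinguishes a library that provides the function from one that only references it.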

sd2R 0.1.7

Multi-GPU Model Parallelism

New device_layout parameter in sd_ctx(): distribute sub-models across multiple Vulkan GPUs without separate processes.
- "mono" — all on one GPU (default, backward-compatible).
- "split_encoders" — CLIP/T5 on GPU 1, diffusion + VAE on GPU 0.
- "split_vae" — CLIP/T5 + VAE on GPU 1, diffusion on GPU 0.
- "encoders_cpu" — text encoders on CPU, diffusion + VAE on GPU.
Low-level diffusion_gpu, clip_gpu, vae_gpu integer arguments for manual device assignment (override presets).
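
The presets and manual overrides above amount to a small lookup, sketched here in Python. Assumptions made for illustration: resolve_layout is a hypothetical function, "mono" and "encoders_cpu" are taken to use GPU 0, CPU placement is encoded as -1, and None means "no override". The "clip" key stands for the CLIP/T5 text encoders.

```python
CPU = -1  # hypothetical marker for "run on the CPU"

# Device index per sub-model for each preset named in the changelog.
PRESETS = {
    "mono":           {"diffusion": 0, "clip": 0,   "vae": 0},
    "split_encoders": {"diffusion": 0, "clip": 1,   "vae": 0},
    "split_vae":      {"diffusion": 0, "clip": 1,   "vae": 1},
    "encoders_cpu":   {"diffusion": 0, "clip": CPU, "vae": 0},
}


def resolve_layout(device_layout="mono",
                   diffusion_gpu=None, clip_gpu=None, vae_gpu=None):
    """Start from a preset, then apply any explicit per-model overrides."""
    if device_layout not in PRESETS:
        raise ValueError("unknown device_layout: %r" % (device_layout,))
    layout = dict(PRESETS[device_layout])  # copy so presets stay intact
    overrides = {"diffusion": diffusion_gpu, "clip": clip_gpu, "vae": vae_gpu}
    for name, device in overrides.items():
        if device is not None:
            layout[name] = device
    return layout
```

Applying overrides after the preset lookup matches the entry's statement that the low-level integer arguments override presets.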

Profiling

New profiling API for per-stage timing of image generation:
- sd_profile_start() / sd_profile_stop() — control event capture.
- sd_profile_get() — raw event data frame.
- sd_profile_summary() — formatted summary with durations and percentages.
Stages tracked: text_encode (with text_encode_clip and text_encode_t5 sub-stages), sampling, vae_decode, vae_encode, model loading.
Pretty-printed output via print.sd_profile().
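
A summary with durations and percentages, as described above, can be derived from raw events along these lines. This is a hedged Python sketch: the (stage, start, end) event layout and the summarize name are assumptions, not the structure returned by sd_profile_get(). Only top-level stages should be passed in, since including sub-stages such as text_encode_clip alongside text_encode would count that time twice.

```python
def summarize(events):
    """Aggregate (stage, start_sec, end_sec) events into per-stage
    totals and each stage's share of the summed time."""
    totals = {}
    for stage, start, end in events:
        if end < start:
            raise ValueError("event ends before it starts: %r" % (stage,))
        totals[stage] = totals.get(stage, 0.0) + (end - start)
    grand_total = sum(totals.values())
    rows = []
    for stage, duration in totals.items():  # insertion order is preserved
        pct = 100.0 * duration / grand_total if grand_total > 0 else 0.0
        rows.append({"stage": stage, "duration": duration, "pct": pct})
    return rows
```

Repeated events for the same stage (for example one sampling event per tile) are summed into a single row.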

sd2R 0.1.6

Pipeline Graph API

New sd_pipeline() / sd_node() — sequential graph-based pipeline. Node types: "txt2img", "img2img", "upscale", "save".
sd_run_pipeline(pipeline, ctx) — execute pipeline with a single context.
sd_save_pipeline() / sd_load_pipeline() — JSON serialization.
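
As a rough picture of what a sequential pipeline with JSON serialization involves, the following Python sketch builds a list of typed nodes and round-trips it through JSON. The node types are the four named above; everything else (make_node, the "nodes"/"type"/"params" keys, and any parameter names a caller passes) is hypothetical and does not describe the file format sd_save_pipeline() actually writes.

```python
import json

NODE_TYPES = {"txt2img", "img2img", "upscale", "save"}


def make_node(node_type, **params):
    """Create one pipeline step; params are free-form in this sketch."""
    if node_type not in NODE_TYPES:
        raise ValueError("unknown node type: %r" % (node_type,))
    return {"type": node_type, "params": params}


def pipeline_to_json(nodes):
    """Serialize an ordered list of nodes."""
    return json.dumps({"nodes": nodes}, indent=2)


def pipeline_from_json(text):
    """Parse and validate a serialized pipeline, preserving node order."""
    nodes = json.loads(text)["nodes"]
    for node in nodes:
        if node["type"] not in NODE_TYPES:
            raise ValueError("unknown node type: %r" % (node["type"],))
    return nodes
```

Because the pipeline is sequential, an ordered list is enough; a general graph would also need explicit edges between nodes.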

sd2R 0.1.5

Flux Support

Flux model family (flux1-dev, etc.) fully supported: text-to-image, image-to-image, highres fix, tiled sampling, multi-GPU.
Separate model paths: diffusion_model_path, vae_path, clip_l_path, t5xxl_path in sd_ctx().
cfg_scale auto-defaults to 1.0 for Flux (guidance-distilled models).

img2img Improvements

sd_generate() now defaults width/height to init image dimensions when not specified explicitly.

sd2R 0.1.4

Build System

configure.win rewritten to use a template approach (Makevars.win.in → Makevars.win), matching the ggmlR pattern.

sd2R 0.1.3

Unified sd_generate() Entry Point

New sd_generate() — single function for all generation modes. Automatically selects the optimal strategy (direct, tiled sampling, or highres fix) based on output resolution and available VRAM.
vram_gb parameter in sd_ctx(): set once, auto-routing handles the rest.

Multi-GPU

New sd_generate_multi_gpu() — parallel generation across multiple Vulkan GPUs via callr, one process per GPU, with progress reporting.

Performance

Batch compute optimization for tiled sampling: pre-allocated compute context buffer eliminates ~110 MB malloc/free per UNet call.

sd2R 0.1.2

Highres Fix

New sd_highres_fix() — classic two-pass highres pipeline: txt2img at native resolution → upscale → tiled img2img refinement.
hr_strength parameter (default 0.4) controls refinement intensity.

Tiled img2img

New sd_img2img_tiled() — img2img with MultiDiffusion tiled sampling for large images.

sd2R 0.1.1

VAE Tiling

New vae_mode parameter: "normal", "tiled", "auto" (default). Auto-tiles when image area exceeds threshold.
vae_tile_rel_x / vae_tile_rel_y for adaptive tile sizing.

High-Resolution Pipeline

New sd_txt2img_highres() — patch-based generation for 2K, 4K+ images.
model_type parameter in sd_ctx(): "sd1", "sd2", "sdxl", "flux", "sd3".

Tiled Sampling (MultiDiffusion)

New sd_txt2img_tiled() — tiled diffusion sampling at any resolution. VRAM bounded by tile size, not output resolution.
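
Why memory is bounded by tile size can be seen in a one-dimensional Python sketch of the tiling idea: the model only ever sees one tile at a time, and overlapping predictions are averaged. This is a simplified illustration, not the package's implementation. tile_starts, merge_tiles, and the predict callback are hypothetical, and a real image would apply the same logic along both axes.

```python
def tile_starts(size, tile, overlap):
    """Start offsets of tiles covering [0, size), with the last tile
    placed flush against the far edge."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= size:
        return [0]
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)
    return starts


def merge_tiles(size, tile, overlap, predict):
    """Run predict(start, length) on each tile and average the
    results wherever tiles overlap."""
    length = min(tile, size)
    total = [0.0] * size
    count = [0] * size
    for start in tile_starts(size, tile, overlap):
        patch = predict(start, length)  # only one tile is processed at a time
        for i, value in enumerate(patch):
            total[start + i] += value
            count[start + i] += 1
    # Every position is covered by at least one tile, so count > 0.
    return [t / c for t, c in zip(total, count)]
```

The accumulators scale with the output size, but they are plain arrays; the expensive model call works on a fixed-size tile regardless of how large the final image is.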

sd2R 0.1.0

Core

Text-to-image generation via stable-diffusion.cpp (C++ backend).
Support for SD 1.x, SD 2.x, SDXL model versions.
SafeTensors and GGUF model format loading.
Vulkan GPU backend via ggmlR.
Samplers: Euler, Euler A, Heun, DPM2, DPM++ (2M), LCM, DDIM, TCD.
Schedulers: Discrete, Karras, Exponential, Simple, SGM Uniform, AYS, LCM.

R API

sd_ctx() — create model context.
sd_generate() — unified entry point.
sd_txt2img(), sd_img2img() — low-level generation.
sd_save_image(), sd_system_info().
