The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
serve(): a single-process, OpenAI-compatible HTTP
STT server (POST /v1/audio/transcriptions and
/translations, GET /health) built on base R
sockets, with no new dependencies. It loads the model once and keeps it
resident, so it drops in for the OpenAI API or a Whisper container;
point stt.api at it with set_stt_base().
Returns text, json, or
verbose_json (segment timestamps, plus per-word timestamps
when the request includes timestamp_granularities[]=word).
An example systemd unit ships in
system.file("whisper.service", package = "whisper").jit_compile’d TorchScript call instead of dozens of
dispatched R->torch calls, several times faster end-to-end and
token-for-token equivalent to the eager path. Covers both greedy and
word-timestamp decoding. On by default via the new jit
argument to transcribe()/whisper_pipeline();
pass jit = FALSE for the eager decoder. No effect on CPU or
beam search.openai-whisper: decoding suppresses non-speech tokens
(brackets, music notes, speaker tags) and control tokens at every step,
so output no longer contains
[BLANK_AUDIO]/[MUSIC PLAYING]-style
annotations; the seek loop decodes only the real audio
(content_frames), not the fixed 30s of mel padding, so a 7s
clip no longer trails off into hallucinated text up to 30s; and a
no-speech-probability gate skips windows that read as silence. The
special-token table gains sot_lm and
sot_prev.sample_len) rather than the full context, and the default
temperatures enable the existing compression-ratio
fallback, which re-decodes too-repetitive output at a higher
temperature.tokenizer_encode() crashing for models whose
vocab.json omits the <|endoftext|> key
(large-v3): the end-of-text id now comes from the special-token table
(as in the Python reference, which keeps special tokens out of the BPE
vocab), and the lookup can no longer return a list. A regression test
covers a vocab without the key, and encode_special()
resolves the core special tokens from the table too.whisper_dtype() now falls back to float32 on the GTX
16-series (TU116/TU117: GTX 1630/1650/1660 and Ti/Super variants), which
compute fp16 incorrectly and return NaN (seen as repeated “!” tokens).
Detection is by GPU name, CUDA-gated and tryCatch-guarded (dormant on
non-CUDA/CRAN machines); pass dtype = "float16" to
override.whisper_tune_gc(): opt-in helper that tunes torch’s
CUDA allocator GC rates for inference. No-op off CUDA, and only sets
options that are unset.torch::torch_scaled_dot_product_attention() instead of
reaching into torch’s namespace; the torch dependency is floored at
0.17.0, where it is exported.transcribe() now defaults to
language = NULL, which detects the spoken language from the
audio before decoding. New exported function
detect_language() for standalone language identification.
Breaking: previous default was
language = "en". Code relying on the default now
auto-detects instead of assuming English. Pass
language = "en" explicitly to restore old behavior.whisper_pipeline() for cached model reuse across
multiple transcriptionsadded_tokens.json download)transcribe_chunk()These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.