stt.api is a minimal, backend-agnostic
R client for OpenAI-compatible speech-to-text (STT)
APIs, with optional local fallbacks.
It lets you transcribe audio in R without caring which backend actually performs the transcription.
- A unified interface for speech-to-text in R
- A way to switch easily between:
  - {whisper} (native R torch, local GPU/CPU, in-process)
  - whisper::serve() endpoint over HTTP (source = "api")
  - /v1/audio/transcriptions (cloud or local servers)
- Designed for scripting, Shiny apps, containers, and reproducible pipelines

{whisper}

install.packages("stt.api")

Required dependencies are minimal:
- curl
- jsonlite

Optional backends:
- {whisper} (recommended, on CRAN)

Development version:
remotes::install_github("cornball-ai/stt.api")

install.packages(c("whisper", "stt.api"))
library(stt.api)
res <- stt("speech.wav")
res$text

That’s it. With {whisper} installed, stt() transcribes locally on GPU or CPU with no configuration needed.
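
Building on the quick start, here is a minimal sketch of transcribing several files into one data frame. The helper `transcribe_all()` and its `transcribe` argument are hypothetical names introduced for illustration, not part of stt.api. The only package behavior relied on is that `stt()` returns a list with a `text` element.

```r
# Hypothetical helper (not part of stt.api): transcribe several files and
# collect the text in a data frame. `transcribe` defaults to stt.api::stt()
# but can be swapped for a stub, which makes the helper easy to test offline.
transcribe_all <- function(paths, transcribe = function(p) stt.api::stt(p)) {
  texts <- vapply(paths, function(p) transcribe(p)[["text"]], character(1))
  data.frame(
    file = basename(paths),
    text = unname(texts),
    stringsAsFactors = FALSE
  )
}

# Usage (requires stt.api and an available backend):
# transcripts <- transcribe_all(c("interview-1.wav", "interview-2.wav"))
```

Passing the transcriber as an argument keeps the pipeline code independent of which backend ends up doing the work.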
stt.api also supports OpenAI-compatible APIs for cloud or container-based transcription:
set_stt_base("http://localhost:4123")
# Optional, for hosted services like OpenAI
set_stt_key(Sys.getenv("OPENAI_API_KEY"))
res <- stt("speech.wav", backend = "openai")

This works with OpenAI, Whisper containers, LM Studio, OpenWebUI, AnythingLLM, or any server implementing /v1/audio/transcriptions.
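
For orientation, a request to such a server can be sketched by hand with curl and jsonlite. This is an illustration of the endpoint, not necessarily how stt.api builds its request internally. The helper names `transcriptions_url()` and `post_transcription()` are hypothetical, and the model name `"whisper-1"` is an assumption that suits OpenAI's hosted service; other servers may expect a different value.

```r
# Build the endpoint URL from a base such as "http://localhost:4123",
# tolerating a trailing slash on the base.
transcriptions_url <- function(base) {
  paste0(sub("/+$", "", base), "/v1/audio/transcriptions")
}

# Hypothetical helper (not part of stt.api): send one audio file as a
# multipart form and parse the JSON reply.
post_transcription <- function(path, base, key = NULL, model = "whisper-1") {
  h <- curl::new_handle()
  curl::handle_setform(h, file = curl::form_file(path), model = model)
  if (!is.null(key) && nzchar(key)) {
    # Hosted services usually expect a bearer token.
    curl::handle_setheaders(h, Authorization = paste("Bearer", key))
  }
  resp <- curl::curl_fetch_memory(transcriptions_url(base), handle = h)
  if (resp$status_code >= 400) {
    stop("Transcription request failed with HTTP ", resp$status_code,
         call. = FALSE)
  }
  jsonlite::fromJSON(rawToChar(resp$content))
}

# Usage (needs a running server):
# post_transcription("speech.wav", "http://localhost:4123")$text
```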
source axis

source selects where a backend runs, separately from backend (which engine): "auto" (default), "api" (HTTP), or "package" (in-process). "auto" keeps the previous behavior (whisper in-process, openai via the API), so existing calls are unchanged. To reach a self-hosted whisper::serve() endpoint instead of running whisper in-process:
set_stt_base("http://troy-g5:7809") # the whisper::serve() endpoint
res <- stt("speech.wav", backend = "whisper", source = "api")

When you call stt() without specifying a backend, it picks the first available:
1. {whisper} (native R torch, if installed)
2. OpenAI-compatible API (if stt.api_base is set)

Regardless of backend, stt() always returns the same structure:
list(
text = "Transcribed text",
segments = NULL | data.frame(...),
words = data.frame(word, start, end), # only with word-level timing
language = "en",
backend = "api" | "whisper", # legacy execution route
raw = <raw backend response>
)

words is present only when the API returns word granularity (verbose_json); otherwise it’s absent.

backend reports where the engine ran (the legacy execution route), not the engine itself: the resolved backend/source pair lives in the "call_record" attribute.
This makes it easy to switch backends without changing downstream code.
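
Downstream code still has to cope with the optional parts of that structure. The sketch below uses a hand-built mock result with made-up values, so it runs without stt.api. The helper `word_timings()` is a hypothetical name, and the contents of the "call_record" attribute are not shown because the description above does not specify them.

```r
# Mock result shaped like the documented return value (values are made up).
# It has no `words` element, as when no word-level timing was returned.
res <- list(
  text = "hello world",
  segments = NULL,
  language = "en",
  backend = "api",
  raw = list()
)

# Hypothetical helper: always return a data frame of word timings, falling
# back to an empty one when `words` is absent.
word_timings <- function(res) {
  words <- res[["words"]]
  if (is.null(words)) {
    return(data.frame(
      word = character(0),
      start = numeric(0),
      end = numeric(0),
      stringsAsFactors = FALSE
    ))
  }
  words
}

nrow(word_timings(res))   # zero rows for this mock

# On a real result, the resolved backend/source pair is read with attr().
# The mock carries no such attribute, so this is NULL here.
attr(res, "call_record")
```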
stt_health()

Returns:
list(
ok = TRUE,
backend = "api",
message = "OK"
)

Useful for Shiny apps and deployment checks.
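
A deployment check can be as small as failing fast when the health result is not ok. The following is a sketch that assumes only the `ok` and `message` fields shown above. `require_stt()` is a hypothetical helper, not a function exported by stt.api.

```r
# Hypothetical helper: stop with a readable message unless the health check
# reports ok. `health` defaults to a live stt.api::stt_health() call but can
# be replaced by any list with `ok` and `message` fields.
require_stt <- function(health = stt.api::stt_health()) {
  if (!isTRUE(health[["ok"]])) {
    stop("STT backend not ready: ", health[["message"]], call. = FALSE)
  }
  invisible(health)
}

# Usage at app or container startup (requires stt.api):
# require_stt()
```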
Explicit backend choice:
stt("speech.wav", backend = "openai")
stt("speech.wav", backend = "whisper")

Automatic selection (default):
stt("speech.wav")

stt.api targets the OpenAI-compatible STT spec:
POST /v1/audio/transcriptions
This is intentionally chosen because it is:
options(
stt.api_base = NULL,
stt.api_key = NULL,
stt.timeout = 60
)

Setters:
set_stt_base()
set_stt_key()

Example:
Error in stt():
No transcription backend available.
Install whisper or set stt.api_base.
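
The selection rules and the error above can be mirrored in a few lines of base R. This is a sketch of the documented behavior, not stt.api's actual implementation. `pick_backend()` and `resolve_source()` are hypothetical names, and the checks assume that "available" means "the whisper package is installed" or "the stt.api_base option is set".

```r
# Hypothetical: choose an engine the way the documentation describes.
# whisper wins if installed, otherwise an API base must be configured.
pick_backend <- function(
    whisper_installed = requireNamespace("whisper", quietly = TRUE),
    api_base = getOption("stt.api_base")) {
  if (isTRUE(whisper_installed)) {
    return("whisper")
  }
  if (!is.null(api_base) && nzchar(api_base)) {
    return("openai")
  }
  stop("No transcription backend available.\n",
       "Install whisper or set stt.api_base.", call. = FALSE)
}

# Hypothetical: resolve where the engine runs. "auto" keeps whisper
# in-process ("package") and sends openai through HTTP ("api").
resolve_source <- function(backend, source = c("auto", "api", "package")) {
  source <- match.arg(source)
  if (source != "auto") {
    return(source)
  }
  if (backend == "whisper") "package" else "api"
}

# Usage:
# backend <- pick_backend()
# resolve_source(backend)
```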
stt.api is designed to pair cleanly with
tts.api:
| Task | Package |
|---|---|
| Speech → Text | stt.api |
| Text → Speech | tts.api |
Both share:
Installing and maintaining local Whisper backends can be difficult:
stt.api lets you decouple your R code from those
concerns.
Your transcription code stays the same whether the backend is:
License: MIT