stt.api

stt.api is a minimal, backend-agnostic R client for OpenAI-compatible speech-to-text (STT) APIs, with optional local fallbacks.

It lets you transcribe audio in R without caring which backend actually performs the transcription.


What stt.api is (and is not)

✅ What it is

❌ What it is not


Installation

install.packages("stt.api")

Required dependencies are minimal:

Optional backends:

Development version:

remotes::install_github("cornball-ai/stt.api")

Quick start

install.packages(c("whisper", "stt.api"))

library(stt.api)

res <- stt("speech.wav")
res$text

That’s it. With {whisper} installed, stt() transcribes locally on GPU or CPU with no configuration needed.


Other backends

stt.api also supports OpenAI-compatible APIs for cloud or container-based transcription:

set_stt_base("http://localhost:4123")
# Optional, for hosted services like OpenAI
set_stt_key(Sys.getenv("OPENAI_API_KEY"))

res <- stt("speech.wav", backend = "openai")

This works with OpenAI, Whisper containers, LM Studio, OpenWebUI, AnythingLLM, or any server implementing /v1/audio/transcriptions.

Where the engine runs: the source axis

The source argument selects where a backend runs, independently of backend (which engine runs). It accepts "auto" (the default), "api" (over HTTP), or "package" (in-process). "auto" keeps the previous behavior (whisper runs in-process, openai goes through the API), so existing calls are unchanged. To reach a self-hosted whisper::serve() endpoint instead of running whisper in-process:

set_stt_base("http://troy-g5:7809")          # the whisper::serve() endpoint
res <- stt("speech.wav", backend = "whisper", source = "api")
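To make the two axes concrete, here is a minimal sketch of how "auto" could resolve, based only on the defaults described above. resolve_source() is a hypothetical helper, not part of the stt.api API, and it does not reject backend/source pairs the package may not support.

```r
# Hypothetical helper (not stt.api's code): resolve the source axis for a backend.
# "auto" follows the documented defaults: whisper in-process, openai via the API.
resolve_source <- function(backend = c("whisper", "openai"),
                           source = c("auto", "api", "package")) {
  backend <- match.arg(backend)
  source <- match.arg(source)
  if (source != "auto") {
    return(source)  # an explicit choice passes through unchanged
  }
  if (backend == "whisper") "package" else "api"
}

resolve_source("whisper")                 # returns "package"
resolve_source("openai")                  # returns "api"
resolve_source("whisper", source = "api") # returns "api"
```

Keeping backend and source as separate arguments means one engine can be reached either in-process or over HTTP without changing the code that consumes the result.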

Automatic backend selection

When you call stt() without specifying a backend, it picks the first available:

  1. {whisper} (native R torch, if installed)
  2. OpenAI-compatible API (if stt.api_base is set)
  3. Error with guidance
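The selection order above can be sketched in a few lines of base R. pick_backend() is a hypothetical illustration of the documented order, not the package's internal implementation. It assumes that "installed" means requireNamespace("whisper") succeeds and that "set" means the stt.api_base option is non-NULL.

```r
# Hypothetical sketch of the documented fallback order; not stt.api's own code.
pick_backend <- function(whisper_installed = requireNamespace("whisper", quietly = TRUE),
                         api_base = getOption("stt.api_base")) {
  if (whisper_installed) return("whisper")   # 1. local {whisper}
  if (!is.null(api_base)) return("openai")   # 2. OpenAI-compatible API
  stop("No transcription backend available.\n",
       "Install whisper or set stt.api_base.",  # 3. error with guidance
       call. = FALSE)
}

pick_backend(whisper_installed = FALSE, api_base = "http://localhost:4123")  # returns "openai"
```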

Normalized output

Regardless of backend, stt() always returns the same structure:

list(
  text     = "Transcribed text",
  segments = NULL | data.frame(...),
  words    = data.frame(word, start, end),  # only with word-level timing
  language = "en",
  backend  = "api" | "whisper",             # legacy execution route
  raw      = <raw backend response>
)

The words element is present only when the API returns word-level timestamps (verbose_json); otherwise it is absent. The backend field reports where the engine ran (the legacy execution route), not which engine ran. The resolved backend/source pair is stored in the "call_record" attribute.

This makes it easy to switch backends without changing downstream code.
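Because words may be absent, downstream code should not assume it exists. The sketch below uses a hypothetical helper, word_timings(), and a hand-built mock result shaped like the structure above; in real use you would pass the value returned by stt().

```r
# Hypothetical helper: return word timings when present, otherwise an
# empty data.frame with the same columns, so downstream code needs no branching.
word_timings <- function(res) {
  if (is.null(res$words)) {
    return(data.frame(word = character(), start = numeric(), end = numeric()))
  }
  res$words
}

# Mock result shaped like the documented structure, without a words element
res <- list(text = "Hello world", segments = NULL, language = "en",
            backend = "api", raw = list())
nrow(word_timings(res))  # returns 0
```

The resolved backend/source pair can be inspected with attr(res, "call_record") on a real result. Its exact fields are not described here, so the sketch does not rely on them.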


Health checks

stt_health()

Returns:

list(
  ok = TRUE,
  backend = "api",
  message = "OK"
)

Useful for Shiny apps and deployment checks.
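A deployment check can fail fast on the documented ok and message fields. assert_stt_ready() is a hypothetical helper, and the health list below is a mock. In an app you would call assert_stt_ready(stt_health()) at startup.

```r
# Hypothetical deployment gate built on the fields stt_health() is documented
# to return (ok, backend, message).
assert_stt_ready <- function(health) {
  if (!isTRUE(health$ok)) {
    stop("STT backend not ready: ", health$message, call. = FALSE)
  }
  invisible(health$backend)  # return the backend so callers can log it
}

# Mock result standing in for stt_health()
healthy <- list(ok = TRUE, backend = "api", message = "OK")
assert_stt_ready(healthy)
```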


Backend selection

Explicit backend choice:

stt("speech.wav", backend = "openai")
stt("speech.wav", backend = "whisper")

Automatic selection (default):

stt("speech.wav")

Supported endpoints

stt.api targets the OpenAI-compatible STT spec:

POST /v1/audio/transcriptions

This spec was chosen deliberately because it is:


Configuration options

options(
  stt.api_base = NULL,
  stt.api_key  = NULL,
  stt.timeout  = 60
)

Setters:

set_stt_base()
set_stt_key()
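The options can also be set directly with base R, as in the options() call above. The sketch below shows one plausible way the base URL combines with the documented endpoint path. transcriptions_url() is hypothetical, and the assumption that the path is appended to stt.api_base is suggested by the examples in this README but not stated by it.

```r
# Set the documented options directly
options(stt.api_base = "http://localhost:4123", stt.timeout = 120)

# Hypothetical helper: join stt.api_base and the OpenAI-compatible endpoint path,
# dropping any trailing slash on the base so the URL has no double slash.
transcriptions_url <- function(base = getOption("stt.api_base")) {
  if (is.null(base)) stop("stt.api_base is not set", call. = FALSE)
  paste0(sub("/+$", "", base), "/v1/audio/transcriptions")
}

transcriptions_url()     # returns "http://localhost:4123/v1/audio/transcriptions"
getOption("stt.timeout") # returns 120
```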

Error handling philosophy

Example:

Error in stt():
No transcription backend available.
Install whisper or set stt.api_base.
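Because the error message carries the guidance, callers such as a Shiny app can catch it and show it to the user. try_transcribe() is a hypothetical wrapper. The stub no_backend() only imitates the message shown above so the example runs without any backend installed.

```r
# Hypothetical wrapper: run a transcription function and turn failures into
# NULL plus a message, e.g. for display as a notification in a Shiny app.
try_transcribe <- function(path, transcribe = stt.api::stt) {
  tryCatch(
    transcribe(path),
    error = function(e) {
      message("Transcription failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stub imitating the no-backend error shown above
no_backend <- function(path) {
  stop("No transcription backend available.\nInstall whisper or set stt.api_base.",
       call. = FALSE)
}

res <- try_transcribe("speech.wav", no_backend)
is.null(res)  # returns TRUE
```

The default transcribe argument is evaluated lazily, so stt.api is only needed when the wrapper is called without an explicit function.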

Relationship to tts.api

stt.api is designed to pair cleanly with tts.api:

Task             Package
Speech → Text    stt.api
Text → Speech    tts.api

Both share:


Why this package exists

Installing and maintaining local Whisper backends can be difficult:

stt.api lets you decouple your R code from those concerns.

Your transcription code stays the same whether the backend is:


License

MIT
