
llm.api 0.1.4

CRAN release consolidating the 0.1.3.1–0.1.3.5 development cycle that followed the on-CRAN 0.1.3. The per-cycle detail follows.

llm.api 0.1.3.5

Refreshed default models

When no model is given, each provider now defaults to a recent, cost-appropriate model that the bundled price snapshot can price, replacing the previous, dated defaults.

This affects chat(), agent(), and the chat_*() / chat_session_*() wrappers. Pass model = explicitly to use any other model.

llm.api 0.1.3.4

Cache-aware cost estimates

usage$cost (from chat() and agent()) now accounts for prompt caching instead of billing every input token at the full rate. Anthropic cache writes/reads are priced from Anthropic’s published multipliers (5-minute write 1.25x, 1-hour write 2x, read 0.1x of the base input rate), and OpenAI / Moonshot cache hits are priced from each model’s cached-input rate in the bundled snapshot.
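The multiplier arithmetic described above can be sketched in base R. This is an illustration only, not the package's implementation: estimate_input_cost() and its argument names are hypothetical, the base rate is a made-up figure, and it assumes input_tokens counts only the uncached portion of the prompt.

```r
# Hypothetical helper, not part of llm.api: estimate the input-side cost of one
# Anthropic request from its token counts and a base input rate (USD per
# million tokens). Assumes `input_tokens` excludes cached tokens.
estimate_input_cost <- function(input_tokens,
                                cache_write_5m_tokens = 0,
                                cache_write_1h_tokens = 0,
                                cache_read_tokens = 0,
                                base_rate_per_mtok) {
  # Multipliers quoted in the changelog entry, relative to the base input rate.
  weighted_tokens <- input_tokens +
    1.25 * cache_write_5m_tokens +
    2.00 * cache_write_1h_tokens +
    0.10 * cache_read_tokens
  weighted_tokens * base_rate_per_mtok / 1e6
}

# Illustrative numbers only: 1,000 uncached tokens plus 10,000 tokens read from
# cache, at an assumed base rate of 3 USD per million input tokens.
cached <- estimate_input_cost(1000, cache_read_tokens = 10000,
                              base_rate_per_mtok = 3)

# The same 11,000 tokens billed with no caching at all.
uncached <- estimate_input_cost(11000, base_rate_per_mtok = 3)
```

With these assumed numbers the cached request is estimated at 0.006 USD against 0.033 USD uncached, which is the gap that billing every input token at the full rate would hide.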

New exported helpers:

agent()$usage now also carries cumulative cache_read_input_tokens and cache_creation_input_tokens so callers can inspect cache activity after a multi-turn run.

The bundled price snapshot was refreshed (2026-05-24) to carry per-model cached-input rates; base input/output rates for existing models are unchanged. Cost estimates remain offline and approximate; prices_snapshot_date() docs now spell that out, with source URLs.

llm.api 0.1.3.3

Fix: cache / thinking_budget_tokens silently disabled under the default provider

The Anthropic-only guards in chat() ran before provider auto-detection, comparing against the literal "auto" default. So chat(prompt, model = "claude-...", cache = "5m") tripped a spurious “Anthropic-only” warning, downgraded the opt-in, and fell through to the default provider. Detection now runs first, so the guards see the resolved provider. .validate_thinking_budget() still runs up front as provider-independent input validation. Network-free regression coverage added.
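The ordering issue can be shown with a small base R sketch. Both functions are hypothetical stand-ins for the package internals, the model-prefix rule is a simplifying assumption, and "claude-example" / "gpt-example" are placeholder model names rather than real ones.

```r
# Hypothetical stand-in for provider auto-detection. The prefix rule is an
# assumption made for illustration, not the package's actual logic.
resolve_provider <- function(provider, model) {
  if (!identical(provider, "auto")) return(provider)
  if (startsWith(model, "claude-")) "anthropic" else "openai"
}

# Hypothetical guard: returns the cache setting that would actually be used.
effective_cache <- function(cache, provider, model) {
  # Resolve first, so the guard never compares against the literal "auto".
  provider <- resolve_provider(provider, model)
  if (cache != "none" && provider != "anthropic") {
    warning("cache is Anthropic-only; ignoring it for provider '", provider, "'")
    return("none")
  }
  cache
}

# With detection running first, the opt-in survives for a Claude model.
kept <- effective_cache("5m", provider = "auto", model = "claude-example")

# A non-Anthropic model still warns and degrades to "none".
dropped <- suppressWarnings(
  effective_cache("5m", provider = "auto", model = "gpt-example")
)
```

Had the guard compared against the unresolved "auto" value, the first call would also have been downgraded, which is the behaviour this release fixes.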

llm.api 0.1.3.2

Three additions, all backward-compatible (new parameters default to no-op behaviour) and zero new dependencies.

Anthropic prompt caching (cache parameter)

chat(cache = c("none", "5m", "1h")) and agent(cache = c("none", "5m", "1h")). The default "none" preserves current behaviour; opting in wraps the system message in an ephemeral cache_control block. "5m" uses Anthropic’s default TTL; "1h" requests the longer cache window. Worth turning on when the system prompt is reused across many calls: cache reads cost ~10% of the normal input rate, but cache writes cost ~25% more for the 5-minute TTL (2x the base rate for the 1-hour TTL), so caching stays opt-in. Anthropic-only; warns and degrades to a no-op for other providers.
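As a rough picture of what "wrapping the system message in an ephemeral cache_control block" could look like, here is a base R sketch that builds the request fragment as a nested list. build_system_blocks() is a hypothetical name, and the ttl field reflects my understanding of the Anthropic Messages API rather than anything this changelog states.

```r
# Hypothetical helper: build the `system` field of an Anthropic Messages API
# request as a nested list, ready to hand to a JSON encoder.
build_system_blocks <- function(system_prompt, cache = c("none", "5m", "1h")) {
  cache <- match.arg(cache)
  block <- list(type = "text", text = system_prompt)
  if (cache != "none") {
    block$cache_control <- list(type = "ephemeral")
    # Assumption: only the longer window needs an explicit ttl, since "5m" is
    # described as the API default.
    if (cache == "1h") block$cache_control$ttl <- "1h"
  }
  list(block)
}

blocks <- build_system_blocks("You are a careful reviewer.", cache = "1h")
str(blocks)
```

With cache = "none" the block carries no cache_control entry at all, which matches the stated no-op default.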

Anthropic extended thinking budget (thinking_budget_tokens)

chat(thinking_budget_tokens = N) and agent(thinking_budget_tokens = N). When set, sends thinking = {type: "enabled", budget_tokens: N} to the Anthropic Messages API. Validates inputs early: must be a single integer >= 1024, and (when max_tokens is set) must be strictly less than it since the budget is counted against max_tokens. Anthropic-only; warns and degrades for other providers.
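The validation rules above are simple enough to sketch in base R. check_thinking_budget() is a hypothetical function written for illustration; the package's own .validate_thinking_budget() may differ in signature and error messages.

```r
# Hypothetical validator mirroring the rules in the changelog entry:
# a single integer >= 1024 and, when max_tokens is set, strictly below it.
check_thinking_budget <- function(thinking_budget_tokens, max_tokens = NULL) {
  # NULL means the feature is not requested, so there is nothing to check.
  if (is.null(thinking_budget_tokens)) return(invisible(NULL))
  b <- thinking_budget_tokens
  ok <- is.numeric(b) && length(b) == 1L && !is.na(b) &&
    b == round(b) && b >= 1024
  if (!ok) stop("thinking_budget_tokens must be a single integer >= 1024")
  # The budget is counted against max_tokens, so it has to leave room for output.
  if (!is.null(max_tokens) && b >= max_tokens) {
    stop("thinking_budget_tokens must be strictly less than max_tokens")
  }
  invisible(as.integer(b))
}

budget <- check_thinking_budget(2048, max_tokens = 4096)
```

Checks of this kind need no network access and no knowledge of the provider, which fits the changelog's description of the validation as provider-independent and run up front.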

OpenAI max_tokens → max_completion_tokens mapping

OpenAI deprecated max_tokens in favour of max_completion_tokens, and o-series reasoning models reject max_tokens entirely. chat() and agent() now rename for OpenAI requests only; Moonshot and Ollama (which share the OpenAI-compatible code path) continue to receive max_tokens since their endpoints still expect it. The rename is gated on the caller not already passing max_completion_tokens, so explicit-set values win.
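The gated rename can be sketched as a small base R function over a named list of request parameters. map_max_tokens() and the provider strings are hypothetical; the changelog does not say what happens to max_tokens when both parameters are supplied, so this sketch simply leaves the list untouched in that case.

```r
# Hypothetical helper: rename max_tokens to max_completion_tokens for OpenAI
# requests only, and only when the caller has not set the new name already.
map_max_tokens <- function(params, provider) {
  rename <- identical(provider, "openai") &&
    !is.null(params[["max_tokens"]]) &&
    is.null(params[["max_completion_tokens"]])
  if (rename) {
    params[["max_completion_tokens"]] <- params[["max_tokens"]]
    params[["max_tokens"]] <- NULL
  }
  params
}

openai_params   <- map_max_tokens(list(max_tokens = 256), "openai")
moonshot_params <- map_max_tokens(list(max_tokens = 256), "moonshot")
explicit_params <- map_max_tokens(
  list(max_tokens = 256, max_completion_tokens = 512), "openai"
)
```

Only the first call is rewritten: the Moonshot request keeps max_tokens, and the explicit max_completion_tokens value of 512 is left as the caller set it.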

llm.api 0.1.3.1

llm.api 0.1.3

llm.api 0.1.1
