
Tidy pipelines and structured output

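# Chunks run only when LLMR_RUN_VIGNETTES is set to "true" (case-insensitive);
# otherwise they are displayed but not evaluated, so no API calls are made.
knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  eval = identical(tolower(Sys.getenv("LLMR_RUN_VIGNETTES", "false")), "true")
)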

We will show both unstructured and structured pipelines, using open models:

- deepseek-chat (DeepSeek)
- llama-3.1-8b-instant (Groq)
- openai/gpt-oss-20b (Groq)

You will need environment variables DEEPSEEK_API_KEY and GROQ_API_KEY.
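
Before running any chunk, it can help to confirm that both keys are visible to the R session. The following is a minimal base-R sketch; the names `required_keys`, `key_status`, and `missing_keys` are illustrative and not part of LLMR.

```r
# Names of the environment variables this vignette relies on.
required_keys <- c("DEEPSEEK_API_KEY", "GROQ_API_KEY")

# TRUE where a variable is set to a non-empty string, FALSE otherwise.
key_values <- Sys.getenv(required_keys, unset = "")
key_status <- stats::setNames(nzchar(unname(key_values)), required_keys)

# Report anything that is missing instead of failing later inside an API call.
missing_keys <- required_keys[!key_status]
if (length(missing_keys) > 0) {
  message("Missing environment variables: ",
          paste(missing_keys, collapse = ", "))
}
```

Keys can be set for the current session with `Sys.setenv()` or stored in `~/.Renviron` so that they are available in every session.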

library(LLMR)
library(dplyr)

cfg_ds    <- llm_config("deepseek", "deepseek-chat")
cfg_groq1 <- llm_config("groq",     "llama-3.1-8b-instant")
cfg_groq  <- llm_config("groq",     "openai/gpt-oss-20b")

llm_fn: unstructured (DeepSeek)

words <- c("excellent", "awful", "fine")
out <- llm_fn(
  words,
  prompt  = "Classify '{x}' as Positive, Negative, or Neutral.",
  .config = cfg_ds,
  .return = "columns"
)
out
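
The `{x}` placeholder in the prompt stands for each element of the input vector, so three words produce three prompts. The sketch below imitates that substitution in base R to show what text each call would carry. `fill_template()` is a hypothetical helper, and LLMR's own templating is assumed to behave similarly rather than known to be implemented this way.

```r
words <- c("excellent", "awful", "fine")
template <- "Classify '{x}' as Positive, Negative, or Neutral."

# Hypothetical helper: replace the literal "{x}" with each input element.
fill_template <- function(template, x) {
  vapply(
    x,
    function(value) gsub("{x}", value, template, fixed = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
}

prompts <- fill_template(template, words)
prompts[1]
#> [1] "Classify 'excellent' as Positive, Negative, or Neutral."
```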

llm_fn: unstructured (Groq)

out_groq <- llm_fn(
  words,
  prompt  = "Classify '{x}' as Positive, Negative, or Neutral.",
  .config = cfg_groq1,
  .return = "columns"
)
out_groq

llm_fn_structured: schema-first (DeepSeek)

schema <- list(
  type = "object",
  properties = list(
    label = list(type = "string", description = "Sentiment label"),
    score = list(type = "number", description = "Confidence 0..1")
  ),
  required = list("label", "score"),
  additionalProperties = FALSE
)

out_s <- llm_fn_structured(
  x = words,
  prompt  = "Classify '{x}' as Positive, Negative, or Neutral with confidence.",
  .config = cfg_ds,
  .schema = schema,
  .fields = c("label", "score")
)
out_s
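
A schema's `required` entry is what lets you detect an incomplete reply. As a rough illustration of that check (not LLMR's validation code), the base-R sketch below compares the field names of an already parsed reply against the required list. `demo_schema`, `missing_fields()`, and both replies are hypothetical.

```r
# A schema shaped like the one above, under a different name
# so that it does not overwrite `schema`.
demo_schema <- list(
  type = "object",
  properties = list(
    label = list(type = "string"),
    score = list(type = "number")
  ),
  required = list("label", "score")
)

# Hypothetical helper: which required fields are absent from a parsed reply?
missing_fields <- function(parsed, schema) {
  setdiff(unlist(schema$required), names(parsed))
}

reply_complete   <- list(label = "Positive", score = 0.93)  # made-up reply
reply_incomplete <- list(label = "Positive")                # made-up reply

missing_fields(reply_complete, demo_schema)    # nothing missing
missing_fields(reply_incomplete, demo_schema)  # "score" is missing
```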

llm_mutate: unstructured (Groq)

df <- tibble::tibble(
  id   = 1:3,
  text = c("Cats are great pets", "The weather is bad", "I like tea")
)

df_u <- df |>
  llm_mutate(
    answer  = "Give a short category for: {text}",
    .config = cfg_groq,
    .return = "columns"
  )

df_u

llm_mutate: shorthand syntax

The shorthand lets you combine output column and prompt in one argument:

df |>
  llm_mutate(
    category = "Give a short category for: {text}",
    .config = cfg_groq
  )
# Equivalent to: llm_mutate(category, prompt = "Give...", .config = cfg_groq)

Or pass a named vector of role-tagged messages (here a system message and a user message):

df |>
  llm_mutate(
    classified = c(
      system = "You are a text classifier. One word only.",
      user = "Category for: {text}"
    ),
    .config = cfg_ds
  )

llm_mutate with .structured flag

Enable structured output directly in llm_mutate() using .structured = TRUE:

schema <- list(
  type = "object",
  properties = list(
    category = list(type = "string"),
    confidence = list(type = "number")
  ),
  required = list("category", "confidence")
)

# Using .structured = TRUE (equivalent to calling llm_mutate_structured)
df |>
  llm_mutate(
    structured_result = "{text}",
    .config = cfg_ds,
    .structured = TRUE,
    .schema = schema
  )

This is equivalent to calling llm_mutate_structured() and supports all the same shorthand syntax.

Soft structured output with tags

When a strict JSON schema is unnecessary, request simple XML-like tags and let LLMR parse them into columns. In the ordinary one-row-per-call mode below, tags should be flat (not nested); the row-batching mode further down deliberately introduces one level of nesting and is documented there.

cities <- tibble::tibble(city = c("Cairo", "Lima", "Seoul"))

cities |>
  llm_mutate(
    geo = "Where is {city}? Give country and continent in their own tags.",
    .config = cfg_groq1,
    .system_prompt = paste(
      "Use XML tags to specify different parts of the answer, but do not nest tags.",
      "Return <country>...</country> and <continent>...</continent>."
    ),
    .tags = c("country", "continent")
  )

The result includes tags_ok, tags_data, and one column per requested tag. Use llm_parse_tags_col() to parse an existing response column.
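
To make the idea of flat tags concrete, here is a small base-R sketch that pulls tagged values out of a reply and lays them out as columns. It is an illustration only and not how `llm_parse_tags_col()` is implemented; `parse_flat_tags()` and the sample reply are hypothetical.

```r
# Hypothetical helper: extract the first <tag>...</tag> value for each tag.
parse_flat_tags <- function(response, tags) {
  values <- vapply(tags, function(tag) {
    # (?s) lets "." match newlines; ".*?" stops at the first closing tag.
    pattern <- sprintf("(?s)<%s>(.*?)</%s>", tag, tag)
    m <- regmatches(response, regexec(pattern, response, perl = TRUE))[[1]]
    if (length(m) == 2) trimws(m[2]) else NA_character_
  }, character(1))
  # One row, one column per requested tag.
  as.data.frame(as.list(values), stringsAsFactors = FALSE)
}

reply <- "<country>Egypt</country> <continent>Africa</continent>"  # made-up reply
parsed <- parse_flat_tags(reply, c("country", "continent"))
parsed$country
#> [1] "Egypt"
```

A tag that never appears in the reply comes back as `NA` in this sketch, which is one simple way to notice that the model ignored the requested format.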

Row batching: many rows per call

By default LLMR sends one request per row. With .rows_per_prompt > 1, several rows are packed into a single request: each row’s prompt is wrapped in a numbered tag (<row_1>...</row_1>, <row_2>...</row_2>, …), the block is appended to the message, and the model is asked to answer each item inside a matching numbered tag. LLMR splits the reply back into the original rows. .rows_per_prompt = Inf sends the whole frame in one call.

cities |>
  llm_mutate(
    geo = "Where is {city}? Give country and continent in their own tags.",
    .config = cfg_groq1,
    .tags = c("country", "continent"),
    .rows_per_prompt = 3
  )
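
The pack-and-split round trip described above can be sketched in base R. This is an approximation of the mechanism under the stated tag format, not LLMR's internal code; `pack_rows()`, `split_rows()`, and the sample reply are hypothetical.

```r
# Hypothetical helper: wrap each row's prompt in a numbered tag.
pack_rows <- function(prompts) {
  idx <- seq_along(prompts)
  paste(sprintf("<row_%d>%s</row_%d>", idx, prompts, idx), collapse = "\n")
}

# Hypothetical helper: recover the answer for rows 1..n from a tagged reply.
split_rows <- function(reply, n) {
  vapply(seq_len(n), function(i) {
    pattern <- sprintf("(?s)<row_%d>(.*?)</row_%d>", i, i)
    m <- regmatches(reply, regexec(pattern, reply, perl = TRUE))[[1]]
    if (length(m) == 2) trimws(m[2]) else NA_character_
  }, character(1))
}

packed <- pack_rows(c("Where is Cairo?", "Where is Lima?"))
cat(packed, "\n")

# Made-up model reply that follows the numbered-tag convention.
reply <- "<row_1>Egypt</row_1>\n<row_2>Peru</row_2>"
split_rows(reply, 2)
#> [1] "Egypt" "Peru"
```

In this sketch a row whose numbered tag is missing from the reply yields `NA`, so a partially answered batch does not shift answers onto the wrong rows.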

A few points worth keeping in mind:

Preview before you spend, summarize after

llm_preview() renders exactly what llm_fn() / llm_mutate() would send, without any API call and without reading or encoding files. It flags problems up front: missing files, a "file" role combined with .rows_per_prompt > 1, an embedding config with row batching, and so on. The batch plan columns show how rows would be grouped into calls.

df <- data.frame(text = c("good", "bad", "fine"), stringsAsFactors = FALSE)
LLMR::llm_preview(df, prompt = "Sentiment of: {text}", .rows_per_prompt = 2)
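
For intuition about the batch plan, the grouping of rows into calls can be approximated with integer arithmetic. The sketch below is an assumption about how such a plan might be computed, not a description of `llm_preview()` output; `plan_batches()` is a hypothetical helper.

```r
# Hypothetical helper: assign each row the index of the call that would carry it.
plan_batches <- function(n_rows, rows_per_prompt) {
  # Inf means "everything in one call".
  if (is.infinite(rows_per_prompt)) {
    return(rep(1L, n_rows))
  }
  as.integer(ceiling(seq_len(n_rows) / rows_per_prompt))
}

plan_batches(3, 2)    # rows 1 and 2 share call 1, row 3 goes to call 2
#> [1] 1 1 2
plan_batches(3, Inf)  # the whole frame in one call
#> [1] 1 1 1
```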

After a run, llm_usage() summarizes outcomes and token totals, and llm_failures() lists the rows that failed or were truncated. Both read the diagnostic columns that llm_mutate() and call_llm_par() already produce. llm_usage() reports tokens, not dollars: multiply by your provider’s current per-token prices yourself.

out <- df |>
  llm_mutate(sentiment = "One-word sentiment for: {text}", .config = cfg_groq)

llm_usage(out)       # counts + sent/received/total/reasoning tokens
llm_failures(out)    # which rows failed or were truncated, and why
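
Because llm_usage() reports tokens rather than dollars, converting to a cost estimate is a short calculation. Every number below is a placeholder: the token totals are made up and the prices are hypothetical, so substitute the totals from your own run and your provider's current price list.

```r
# Placeholder token totals; read the real ones from your own llm_usage() result.
tokens_sent     <- 1200
tokens_received <- 300

# Hypothetical prices in USD per one million tokens.
price_sent_per_million     <- 0.50
price_received_per_million <- 1.50

# Input and output tokens are usually priced separately.
cost_usd <- tokens_sent     / 1e6 * price_sent_per_million +
            tokens_received / 1e6 * price_received_per_million

cost_usd
#> [1] 0.00105
```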

For a call_llm_par() result you can re-run only the failures with llm_par_resume().

llm_mutate_structured: structured with shorthand (Groq)

schema2 <- list(
  type = "object",
  properties = list(
    category  = list(type = "string"),
    rationale = list(type = "string")
  ),
  required = list("category", "rationale"),
  additionalProperties = FALSE
)

# Traditional call
df_s <- df |>
  llm_mutate_structured(
    annot,
    prompt  = "Extract category and a one-sentence rationale for: {text}",
    .config = cfg_groq,
    .schema = schema2
    # Because a schema is present, fields auto-hoist; you can also pass:
    # .fields = c("category", "rationale")
  )

df_s

# Or use shorthand
df |>
  llm_mutate_structured(
    annot = "Extract category and rationale for: {text}",
    .config = cfg_groq,
    .schema = schema2
  )
