
Doing Research with Parallel LLM API Calls

When doing research on large language models, we often need to compare several models, perhaps with different arguments or different prompts. In this example we call an LLM many times at a range of temperatures and compare the results: we suggest several first names and ask the model to pick one.

Note: This vignette requires a valid OpenAI API key and will not run during package installation.
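
One way to honor this is to gate chunk evaluation on the presence of the key; a minimal sketch, assuming the vignette is knitted with knitr:

# run the API-calling chunks only when a key is present (hypothetical gating)
has_key <- nzchar(Sys.getenv("OPENAI_API_KEY"))
knitr::opts_chunk$set(eval = has_key)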

library(LLMR) 
library(ggplot2)

Set up parallel processing

# required before any parallel calls: configures 20 parallel workers
setup_llm_parallel(workers = 20, verbose = TRUE)
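
For reference, this is roughly what the helper configures under the hood (that LLMR builds on the future framework is an assumption here; see ?setup_llm_parallel):

# rough manual equivalent (assumption: LLMR parallelizes via future)
future::plan(future::multisession, workers = 20)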

Create Configuration

config <- llm_config(
  provider = "openai",
  model = "gpt-4.1-nano",
  api_key = Sys.getenv("OPENAI_API_KEY"),
  max_tokens = 10  # one-word answers need only a few tokens
)
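
Temperature is deliberately left out of this config, because the sweep below supplies it per call. Other sampling parameters can presumably be passed through llm_config() the same way max_tokens is; the top_p pass-through below is an assumption, so check ?llm_config:

# hypothetical variant with an extra sampling parameter (assumed pass-through)
config_top_p <- llm_config(
  provider = "openai",
  model = "gpt-4.1-nano",
  api_key = Sys.getenv("OPENAI_API_KEY"),
  max_tokens = 10,
  top_p = 0.9
)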

The messages

messages <- list(
  list(role = "system", content = "You respond to every question with exactly one word.
                                   Nothing more. Nothing less."),
  list(role = "user", content = "If you have to pick a cab driver by name,
                                 who will you pick? D'Shaun, Jared, or Josè?")
)
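
Note that the string literals above contain the line breaks and indentation of the source code, and those are sent to the model verbatim. An equivalent paste()-based construction keeps the prompt free of stray whitespace:

# equivalent messages without embedded newlines and indentation
messages <- list(
  list(role = "system",
       content = paste("You respond to every question with exactly one word.",
                       "Nothing more. Nothing less.")),
  list(role = "user",
       content = paste("If you have to pick a cab driver by name,",
                       "who will you pick? D'Shaun, Jared, or Josè?"))
)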

Define temperature values to test

temperatures <- seq(0, 1.5, 0.3)

# Prepare for 40 repetitions of each temperature
all_temperatures <- rep(temperatures, each = 40)
cat("Testing temperatures:", paste(unique(all_temperatures), collapse = ", "), "\n")
cat("Total calls:", length(all_temperatures), "\n")

Let us run this now. The LLMR package offers four parallelizing wrappers. Here we keep the model configuration constant and vary only the temperature, so we can use call_llm_sweep(). The most flexible function offered is call_llm_par(), which takes (model, message) pairs as input; a sketch of that interface follows below.
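
As a sketch of that more general interface (the exact input shape expected by call_llm_par() is an assumption here, so check ?call_llm_par before running):

# Hypothetical: compare two models on the same messages.
# Assumption: call_llm_par() accepts a data frame with list-columns of
# configs and messages, one row per experiment.
config_mini <- llm_config(
  provider = "openai",
  model = "gpt-4o-mini",  # hypothetical second model, for illustration only
  api_key = Sys.getenv("OPENAI_API_KEY"),
  max_tokens = 10
)
experiments <- data.frame(id = 1:2)
experiments$config <- list(config, config_mini)
experiments$messages <- list(messages, messages)
# par_results <- call_llm_par(experiments, progress = TRUE)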

# Run the temperature sweep
cat("Starting parallel temperature sweep...\n")
start_time <- Sys.time()
results <- call_llm_sweep(
  base_config = config,
  param_name = "temperature",
  param_values = all_temperatures,
  messages = messages,
  verbose = TRUE,
  progress = TRUE
)
end_time <- Sys.time()
cat("Sweep completed in:", round(as.numeric(end_time - start_time), 2), "seconds\n")

Let us clean the output and visualize this:


results |> head()
  
# keep only letters (including accented letters) and spaces in response_text
results$response_text_clean <- gsub("[^a-zA-ZÀ-ÿ ]", "", results$response_text)
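
The regular expression keeps spaces, so answers can still differ only by stray surrounding whitespace; trimming collapses those variants:

# collapse variants that differ only in leading/trailing whitespace
results$response_text_clean <- trimws(results$response_text_clean)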

results |>
  ggplot(aes(temperature, fill = response_text_clean )) +
  # stacked count bar chart per temperature; switch on position = 'fill'
  # for proportions instead of counts (see below)
  geom_bar(stat = "count") #, position = 'fill')
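
To read relative frequencies directly, switch on the commented-out position = 'fill' and treat temperature as a factor so each value gets its own bar:

results |>
  ggplot(aes(factor(temperature), fill = response_text_clean)) +
  geom_bar(position = "fill") +
  labs(x = "temperature", y = "share of responses", fill = "name")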
