
vitals 0.2.0

New features

Images, audio, and video in user messages and tool call results will now be logged compatibly with the log viewer (#138, #171).
Solvers and scorers can now return arbitrary R objects in metadata; they will be summarized in a lossy format when logged to .json and available as-is via $get_samples().
generate() now accepts a zero-argument chat factory for solver_chat, enabling a fresh chat per call instead of cloning an existing chat (#190).
$eval() now routes arguments to solvers and scorers based on their function signatures, allowing users to pass arguments specific to each without requiring ellipses in both functions (#152). $eval() now errors when supplied unnamed arguments.
Scorers that don’t return scorer_chats can now return an explanation slot that describes the reasoning behind the scoring output. The built-in detect-based scorers now return an explanation slot (#189).
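The signature-based routing described for $eval() can be illustrated with a minimal base R sketch. This is not the package's implementation: route_args(), toy_solver(), and toy_scorer() are hypothetical names invented for this example, and the sketch only assumes that routing means "pass each named argument to the functions that declare it".

```r
# Hypothetical helper: keep only the named arguments that `fn` declares.
route_args <- function(fn, args) {
  args[names(args) %in% names(formals(fn))]
}

# Toy stand-ins for a solver and a scorer; these are not part of vitals.
toy_solver <- function(inputs, temperature = 1) {
  paste0(inputs, " (temperature = ", temperature, ")")
}
toy_scorer <- function(outputs, strict = FALSE) {
  if (strict) outputs == "a" else tolower(outputs) == "a"
}

# Named arguments supplied once, as a user might supply them to one call.
args <- list(temperature = 0.2, strict = TRUE)

# Each function receives only the arguments found in its own signature.
solver_args <- route_args(toy_solver, args)
scorer_args <- route_args(toy_scorer, args)

solver_out <- do.call(toy_solver, c(list(inputs = "2 + 2"), solver_args))
scorer_out <- do.call(toy_scorer, c(list(outputs = c("a", "A")), scorer_args))
```

Because the matching in this sketch is done by name, unnamed arguments cannot be routed, which is consistent with $eval() now rejecting them.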

Viewing logs

Updated the vendored Inspect Log Viewer to Inspect version 0.3.122, bringing all sorts of new features and bug fixes (#138).
Assistant turns now have precise durations in generated logs. Previously, their timings were averaged across the course of the evaluation (#115).
The log viewer previously reported the solver’s response as the answer provided to the scorer. However, these two texts can differ when post-processing of the solver’s response is performed. This is now fixed in the log viewer (#166, #169 by @mattwarkentin).
The log viewer previously reported the scorer’s response as both the solver’s and the scorer’s response; this is now fixed (#141, #142 by @mattwarkentin).
Tool uses from scorers will now be visible in the log viewer (#186).

Minor improvements and bug fixes

vitals_view() will now pick a random available port rather than its previous default port, 7576.
The default accuracy() metric will now report a score of 0 rather than NaN when all scores are 0.
Fixed a bug where non-default grading systems in model-graded evals would result in scores being wiped during logging (#139).
The full suite of package tests can now be run without active API keys via the vcr package (#163).
$eval() and $log() will now write log files to the same default directory: the one specified when initializing the Task object. Previously, $eval() wrote to that directory, while $log() wrote to vitals_log_dir() (#158 by @SokolovAnatoliy).
Manifest files for deployed logs are now named listing.json rather than logs.json for compatibility with newer Inspect versions.
Removed dependency on the rstudioapi package (#146).
The package will now set the environment variable IN_VITALS_EVAL to "true" during solving and scoring.
Numeric task targets will no longer introduce errors in the log viewer.
detect_match() now lists the correct location options in its default value (#140, #142 by @mattwarkentin).
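As a small illustration of the IN_VITALS_EVAL entry above, code called from a solver or scorer could check the variable with base R. The helper name in_vitals_eval() is hypothetical and not part of the package; the sketch assumes only what the entry states, namely that the variable holds the string "true" during solving and scoring.

```r
# Hypothetical helper: TRUE only when IN_VITALS_EVAL is exactly "true".
# Sys.getenv() returns "" for an unset variable, so this is FALSE then.
in_vitals_eval <- function() {
  identical(Sys.getenv("IN_VITALS_EVAL"), "true")
}

# Example use: skip an interactive prompt or extra logging during an eval.
if (in_vitals_eval()) {
  message("Running inside a vitals evaluation.")
}
```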

vitals 0.1.0

Initial CRAN submission.
