evals/. Run them locally with
veryfront eval <eval-id> and store JSON or JUnit reports in CI.
Prerequisites
- A Veryfront project with an agents/ directory.
- An agent target such as agent:researcher.
- A dataset with stable example IDs.
Quick start
Create an eval file:
Datasets
Use inline data for smoke coverage:
Metrics
Use deterministic metrics for stable requirements:
Checks
Use check for assertions that depend on the full record:
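As a rough illustration of the dataset, metric, and check ideas above, the sketch below uses plain TypeScript with no Veryfront imports. Every type and function name in it (Example, EvalRecord, containsExpected, answeredQuickly) is hypothetical and chosen for this example only; the real eval definition API may look quite different.

```typescript
// Hypothetical shapes for illustration only; these are NOT Veryfront's types.
interface Example {
  id: string; // stable ID so results can be compared across runs
  input: string;
  expected: string;
}

interface EvalRecord {
  example: Example;
  output: string;
  latencyMs: number;
}

// Inline dataset: a handful of examples, enough for smoke coverage.
const franceExample: Example = {
  id: "capital-fr",
  input: "What is the capital of France?",
  expected: "Paris",
};
const japanExample: Example = {
  id: "capital-jp",
  input: "What is the capital of Japan?",
  expected: "Tokyo",
};
const smokeDataset: Example[] = [franceExample, japanExample];

// Deterministic metric: the same record always yields the same score.
function containsExpected(record: EvalRecord): number {
  const output = record.output.toLowerCase();
  const expected = record.example.expected.toLowerCase();
  return output.includes(expected) ? 1 : 0;
}

// Check over the full record: it needs both the output and the latency,
// so a single per-field metric would not be enough.
function answeredQuickly(record: EvalRecord, maxLatencyMs: number): boolean {
  return containsExpected(record) === 1 && record.latencyMs <= maxLatencyMs;
}

// A made-up record, as if the agent had answered the first example.
const record: EvalRecord = {
  example: franceExample,
  output: "The capital of France is Paris.",
  latencyMs: 420,
};
```

Keeping the metric free of randomness and external calls is what makes it suitable for stable requirements, while the check is free to combine several fields of the record.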
Discovery
Eval files are discovered from evals/:
Set ai.evals.discovery.paths in project config to use a different directory.
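The dotted key above suggests a nested config object. The fragment below is only a sketch of that nesting: the key path ai.evals.discovery.paths comes from the text, while the array value, the directory name quality/evals, and the variable name projectConfig are assumptions made for illustration. Check the project config reference for the actual file name and value type.

```typescript
// Hypothetical project config fragment. Only the key path
// ai.evals.discovery.paths is taken from the documentation; the
// array-of-strings value and the directory name are assumptions.
const projectConfig = {
  ai: {
    evals: {
      discovery: {
        paths: ["quality/evals"],
      },
    },
  },
};
```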
Studio editing
Studio can list eval definitions, show source location, and expose form fields for stable parts of the definition: name, target, dataset source, repetitions, tags, metadata, and metrics. If code is dynamic, Studio should fall back to source editing for the same file.
Use createEvalSourceDocument(discoveredEval) to normalize a discovered eval
for Studio panels. The document exposes editableFields, dynamicFields,
source.filePath, source.exportName, dataset metadata, metric metadata, and
the eval capabilities required by the panel.
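One way a panel might use such a document is to choose between a form field and source editing for each part of the definition. The sketch below assumes that editableFields and dynamicFields are arrays of field names, which the text does not confirm. EvalSourceDocumentSketch, editorModeFor, and the sample file path are hypothetical, and the sketch does not call createEvalSourceDocument itself.

```typescript
// Hypothetical shape: the property names follow the text above, but the
// types are assumptions, not the real result of createEvalSourceDocument.
interface EvalSourceDocumentSketch {
  editableFields: string[];
  dynamicFields: string[];
  source: { filePath: string; exportName: string };
}

type EditorMode = "form" | "source";

// Offer a form field only for statically known, editable parts of the
// definition; anything dynamic or unknown falls back to source editing.
function editorModeFor(doc: EvalSourceDocumentSketch, field: string): EditorMode {
  if (doc.dynamicFields.includes(field)) return "source";
  return doc.editableFields.includes(field) ? "form" : "source";
}

// Made-up document for an eval whose metrics are built dynamically in code.
const doc: EvalSourceDocumentSketch = {
  editableFields: ["name", "target", "repetitions", "tags"],
  dynamicFields: ["metrics"],
  source: { filePath: "evals/researcher.eval.ts", exportName: "default" },
};

const nameMode = editorModeFor(doc, "name");
const metricsMode = editorModeFor(doc, "metrics");
```

Both modes point at the same file, doc.source.filePath, so switching to source editing keeps the user in the definition they were already viewing.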
Use project.evals.read for listing reports and definitions. Use
project.evals.write for editing eval source definitions. Triggering an eval run
also records a canonical run with kind eval when the durable run API is used.
Verify it worked
List discovered evals:
An eval run exits with status 0 when all gate and budget checks pass. It exits
with status 1 when any gate or budget check fails.
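The exit-status rule can be sketched as a small pure function. CheckResult and exitStatus are hypothetical names used only here, and the check names in the sample data are invented; the sketch mirrors the rule as stated, not the CLI's actual implementation.

```typescript
// Hypothetical result shape for a single gate or budget check.
interface CheckResult {
  name: string;
  kind: "gate" | "budget";
  passed: boolean;
}

// Status 0 only when every gate and budget check passed; otherwise 1.
function exitStatus(results: CheckResult[]): 0 | 1 {
  return results.every((result) => result.passed) ? 0 : 1;
}

const allPassing: CheckResult[] = [
  { name: "accuracy-threshold", kind: "gate", passed: true },
  { name: "latency-budget", kind: "budget", passed: true },
];

const oneFailing: CheckResult[] = [
  { name: "accuracy-threshold", kind: "gate", passed: true },
  { name: "latency-budget", kind: "budget", passed: false },
];
```

Because the status is a plain 0 or 1, a CI job can fail the build on the process exit code alone, without parsing the JSON or JUnit report.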