Overview

A playground for your agent evals: view experiments, browse eval fixtures, and compare runs.

Experiments: 0
Total Runs: 0
Eval Fixtures: 0
Latest Pass Rate: n/a

Recent Experiments

No experiments yet. Run agent-eval to get started.

Eval Fixtures

No evals found. Create evals in your evals/ directory.

Compare

Compare two experiment runs side-by-side to see pass rate deltas, duration changes, and per-eval breakdowns.
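As a rough illustration of what such a comparison computes, here is a minimal Python sketch. It assumes each run is just a list of per-eval results with a name, a pass/fail flag, and a duration in seconds. The names `EvalResult`, `pass_rate`, and `compare_runs` are hypothetical and chosen for this example; they are not agent-eval's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One eval's outcome within a run (hypothetical shape, for illustration)."""
    name: str
    passed: bool
    duration_s: float


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of evals that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def compare_runs(baseline: list[EvalResult], candidate: list[EvalResult]) -> dict:
    """Compare two runs: overall deltas plus a breakdown for evals present in both."""
    base_by_name = {r.name: r for r in baseline}
    cand_by_name = {r.name: r for r in candidate}

    # Only evals that appear in both runs can be compared one-to-one.
    shared = sorted(base_by_name.keys() & cand_by_name.keys())

    per_eval = [
        {
            "name": name,
            "passed_before": base_by_name[name].passed,
            "passed_after": cand_by_name[name].passed,
            "duration_delta_s": cand_by_name[name].duration_s
            - base_by_name[name].duration_s,
        }
        for name in shared
    ]

    return {
        # Positive means the candidate run passes more often than the baseline.
        "pass_rate_delta": pass_rate(candidate) - pass_rate(baseline),
        # Negative means the candidate run was faster in total.
        "duration_delta_s": sum(r.duration_s for r in candidate)
        - sum(r.duration_s for r in baseline),
        "per_eval": per_eval,
    }


if __name__ == "__main__":
    baseline = [EvalResult("a", True, 1.0), EvalResult("b", False, 2.0)]
    candidate = [EvalResult("a", True, 1.5), EvalResult("b", True, 1.0)]
    print(compare_runs(baseline, candidate))
```

Matching evals by name keeps the per-eval breakdown meaningful when the two runs cover different sets of fixtures. The overall deltas still use every result in each run.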
