docs: prefer rubrics over llm-grader with inline prompt#1097

Merged
christso merged 2 commits into main from docs/prefer-rubrics-over-inline-llm-grader
Apr 14, 2026
Conversation

@christso
Collaborator

Summary

Replaces all instances of the `llm-grader`-with-inline-`prompt:` antipattern across examples, skill docs, and web docs with `rubrics` assertions.

Why: `rubrics` assertions give structured, per-criterion scores that are easier to debug when a test fails. An inline `prompt:` on an `llm-grader` is an opaque free-form blob that produces a comparable pass/fail verdict but with no criterion breakdown.

Preferred hierarchy (now reflected in examples):

  1. Deterministic assertions (contains, regex, equals, is-json) — no LLM cost, fully reproducible
  2. rubrics / string shorthand — structured LLM judgment with per-criterion scores
  3. llm-grader (no prompt) — implicit eval against criteria, fine for simple cases
  4. llm-grader with prompt: ./file.md — custom prompts; the legitimate use case for a file-backed grader
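To make the contrast concrete, here is a hypothetical before/after sketch. The exact agentv eval schema may differ; the `evaluators`, `type`, `prompt`, and `criteria` field names are illustrative, not authoritative:

```yaml
# Antipattern (hypothetical schema): inline prompt on llm-grader.
# One opaque score; a failure gives no per-criterion breakdown.
evaluators:
  - type: llm-grader
    prompt: "Judge whether the answer is polite and factually accurate."
---
# Preferred (hypothetical schema): rubrics with explicit criteria.
# Each criterion is scored separately, so a failing test shows
# which expectation was missed.
evaluators:
  - type: rubrics
    criteria:
      - "The answer is polite in tone"
      - "The answer is factually accurate"
```

When a rubrics test fails, the report points at the specific criterion that scored low, rather than a single undifferentiated grade.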

Files changed:

  • examples/features/preprocessors/evals/dataset.eval.yaml
  • examples/features/threshold-evaluator/evals/dataset.eval.yaml
  • examples/features/experiments/evals/coding-ability.eval.yaml
  • examples/features/default-evaluators/evals/dataset.eval.yaml
  • plugins/agentv-dev/skills/agentv-bench/SKILL.md
  • apps/web/src/content/docs/docs/evaluation/examples.mdx
  • apps/web/src/content/docs/docs/guides/agent-eval-layers.mdx

Deleted baselines (need regeneration with real LLM calls):

  • examples/features/threshold-evaluator/evals/dataset.eval.baseline.jsonl
  • examples/features/default-evaluators/evals/dataset.eval.baseline.jsonl

To regenerate: `bun scripts/check-eval-baselines.ts --update --eval-file <file>`

Not changed: apps/cli/test/commands/eval/pipeline/fixtures/input-test.eval.yaml — this test fixture explicitly tests llm-grader pipeline behavior (prompt content writing, llm_grader_results/ directory) and is not a teaching example.

🤖 Generated with Claude Code

Replace all instances of the llm-grader-with-inline-prompt antipattern
in examples, skill docs, and web docs with rubrics assertions. Rubrics
give per-criterion score breakdowns and make failure debugging easier;
inline llm-grader prompts are an opaque free-form alternative.

Also update agent-eval-layers.mdx tables to reference `rubrics` instead
of `llm-grader` for LLM-based quality checks, and delete two stale
baseline files (need regeneration after grader type change).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Apr 14, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 2f63c97
Status: ✅  Deploy successful!
Preview URL: https://42ae0559.agentv.pages.dev
Branch Preview URL: https://docs-prefer-rubrics-over-inl.agentv.pages.dev

Replace full `type: rubrics` + `criteria:` blocks with inline string
shorthand where there are no weights, required gates, or composite
naming concerns. Composite children in threshold-evaluator keep named
rubrics graders to preserve threshold aggregation semantics.
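The simplification described in this commit might look roughly like the following fragment. This is a hypothetical sketch of the two forms; the `type`, `criteria`, and `weight` field names are assumptions, not the confirmed agentv schema:

```yaml
# Full form (hypothetical schema): keep this when weights, required
# gates, or named composite children matter.
evaluators:
  - type: rubrics
    criteria:
      - text: "Cites the source document"
        weight: 2
      - text: "Answers in under three sentences"
        weight: 1
---
# Inline string shorthand (hypothetical schema): equivalent for a
# simple unweighted criterion, with less boilerplate.
evaluators:
  - "Cites the source document"
```

The trade-off mirrors the commit message: shorthand where only the criterion text matters, full blocks where aggregation semantics (such as threshold composites) depend on named, weighted graders.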

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@christso christso merged commit 209db97 into main Apr 14, 2026
4 checks passed
@christso christso deleted the docs/prefer-rubrics-over-inline-llm-grader branch April 14, 2026 22:25