Quality Systems 18 min
Evals for Prompts
Create lightweight tests that show whether a prompt still works.
Learning Objectives
- Define pass/fail checks for prompt quality.
- Test prompts against real examples.
- Track failure modes such as hallucination, wrong format, and missing evidence.
Prompt quality should be testable
An eval is a repeatable check that tells you whether a prompt still works. Without evals, prompt editing becomes vibes and hope: you change the wording, try one input, and cannot tell whether you broke another.
Start with five real examples. For each example, define what a passing answer must include and what it must avoid. The goal is not academic perfection. The goal is to catch failures before a prompt becomes part of a workflow.
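A pass/fail check of this kind can be very small. The sketch below is one possible shape, not a prescribed implementation: the `EvalCase` class, the `check_output` function, and the sample case are all hypothetical names and data made up for illustration, and the check uses simple case-insensitive substring matching.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One example plus its pass criteria (all names here are illustrative)."""
    name: str
    input_text: str
    must_include: list[str] = field(default_factory=list)
    must_avoid: list[str] = field(default_factory=list)


def check_output(case: EvalCase, output: str) -> list[str]:
    """Return a list of failure reasons; an empty list means the case passes."""
    text = output.lower()
    failures = []
    for phrase in case.must_include:
        if phrase.lower() not in text:
            failures.append(f"missing required content: {phrase!r}")
    for phrase in case.must_avoid:
        if phrase.lower() in text:
            failures.append(f"contains prohibited content: {phrase!r}")
    return failures


# Made-up stand-in for a real example; replace with your own cases.
case = EvalCase(
    name="refund-request",
    input_text="Customer asks for a refund on order 1042 after a late delivery.",
    must_include=["order 1042", "next step"],
    must_avoid=["guaranteed"],
)

good = "Summary: late delivery on order 1042. Next step: confirm refund eligibility."
bad = "Your refund is guaranteed."

print(check_output(case, good))  # []
print(check_output(case, bad))   # three failure reasons
```

Substring matching is crude, but it keeps the criteria observable: anyone can read the case and see why an output passed or failed.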
Useful failure categories
- Missing evidence
- Unsupported claim
- Wrong format
- Weak or generic next action
- Too long for the workflow
- Incorrect audience or tone
Track failures by category. When a prompt fails the same way twice, treat it as a pattern: update the prompt if the output is wrong, or update the rubric if the check itself is off.
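One way to spot a repeated failure is to log each failure with its category and count them. This is a minimal sketch: the category identifiers mirror the list above, while the `repeated_failures` function and the log entries are hypothetical.

```python
from collections import Counter

# Category names follow the list above; the log entries below are made-up examples.
CATEGORIES = {
    "missing_evidence",
    "unsupported_claim",
    "wrong_format",
    "weak_next_action",
    "too_long",
    "wrong_audience_or_tone",
}


def repeated_failures(failure_log, threshold=2):
    """Return (category, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter()
    for _case_name, category in failure_log:
        if category not in CATEGORIES:
            raise ValueError(f"unknown failure category: {category!r}")
        counts[category] += 1
    return [(cat, n) for cat, n in counts.most_common() if n >= threshold]


failure_log = [
    ("case-1", "wrong_format"),
    ("case-3", "missing_evidence"),
    ("case-4", "wrong_format"),
]

print(repeated_failures(failure_log))  # [('wrong_format', 2)]
```

Rejecting unknown category names keeps the log consistent, so counts are not split across near-duplicate labels.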
Examples
Five-case eval set
Use five real customer notes and define the expected fields, prohibited claims, and required evidence for each output.
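The three criteria named here can each become an observable check. The sketch below assumes the prompt returns JSON with `summary`, `evidence`, and `next_action` fields, where `evidence` is a list of verbatim quotes from the note. That schema, the `evaluate` function, and the sample note are all assumptions for illustration, not part of the lesson.

```python
import json

REQUIRED_FIELDS = ("summary", "evidence", "next_action")  # hypothetical schema


def evaluate(note: str, raw_output: str, prohibited: list[str]) -> dict[str, bool]:
    """Run three observable checks on one model output and return a result per check."""
    lowered = raw_output.lower()
    results = {
        "no_prohibited_claims": not any(p.lower() in lowered for p in prohibited),
        "expected_fields": False,
        "evidence_in_note": False,
    }
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return results  # not valid JSON, so the structural checks fail
    if not isinstance(data, dict):
        return results
    results["expected_fields"] = all(f in data for f in REQUIRED_FIELDS)
    quotes = data.get("evidence", [])
    # Evidence passes only if there is at least one quote and every quote
    # appears verbatim in the source note.
    results["evidence_in_note"] = (
        isinstance(quotes, list)
        and bool(quotes)
        and all(isinstance(q, str) and q in note for q in quotes)
    )
    return results


# Made-up note standing in for a real customer note.
note = "The export button has been failing since Tuesday. We need it fixed before the audit on Friday."
output = json.dumps({
    "summary": "Export is broken and blocks an audit.",
    "evidence": ["failing since Tuesday", "before the audit on Friday"],
    "next_action": "Escalate the export bug to engineering today.",
})
prohibited = ["will be fixed by", "refund"]

print(evaluate(note, output, prohibited))
```

Running the same function over all five notes gives a small grid of pass/fail results, which is enough to see which criterion a prompt change affected.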
Practice Exercise
Write a prompt eval
Create five test inputs for one prompt and define what a passing answer must include.
- There are at least five realistic inputs.
- The pass criteria are observable.
- The failure categories are named.
Mini Prompt Templates
Prompt Evaluator
Prompt: [PROMPT]
Test cases: [INPUTS_AND_OUTPUTS]
Task: Create a pass/fail evaluation checklist and score each output.
Format: Table plus top three prompt fixes.
