Quality Systems 18 min

Evals for Prompts

Create lightweight tests that show whether a prompt still works.

Learning Objectives

Define pass/fail checks for prompt quality.
Test prompts against real examples.
Track failure modes such as hallucination, weak format, and missing evidence.

Prompt quality should be testable

An eval is a repeatable check that tells you whether a prompt still produces acceptable output after a change. Without evals, prompt editing becomes vibes and hope.

Start with five real examples. For each example, define what a passing answer must include and what it must avoid. The goal is not academic perfection. The goal is to catch failures before a prompt becomes part of a workflow.
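As a rough illustration of "must include" and "must avoid", one eval case can be written down as data and checked with plain substring matching. This is a minimal sketch, not a prescribed implementation: the names EvalCase and check_output, the sample input, and the sample outputs are all hypothetical, and substring matching is an assumption that only suits criteria you can state as literal phrases.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One example plus the observable criteria for a passing answer."""
    name: str
    prompt_input: str
    must_include: list[str] = field(default_factory=list)
    must_avoid: list[str] = field(default_factory=list)


def check_output(case: EvalCase, output: str) -> tuple[bool, list[str]]:
    """Return (passed, reasons) using case-insensitive substring checks."""
    text = output.lower()
    reasons = []
    for phrase in case.must_include:
        if phrase.lower() not in text:
            reasons.append(f"missing required: {phrase}")
    for phrase in case.must_avoid:
        if phrase.lower() in text:
            reasons.append(f"contains prohibited: {phrase}")
    return (not reasons, reasons)


# Hypothetical case and outputs, for illustration only.
case = EvalCase(
    name="refund-request",
    prompt_input="Customer asks for a refund on order 1042.",
    must_include=["order 1042", "next step"],
    must_avoid=["guaranteed"],
)

good = "Order 1042 is eligible for review. Next step: confirm the purchase date."
bad = "Your refund is guaranteed."

print(check_output(case, good))  # (True, [])
print(check_output(case, bad)[0])  # False
```

Returning the reasons alongside the pass/fail flag keeps each failure explainable, which makes it easier to sort failures into categories later.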

Useful failure categories

Missing evidence
Unsupported claim
Wrong format
Weak or generic next action
Too long for the workflow
Incorrect audience or tone

Track failures by category. When a prompt fails the same way twice, update the prompt or, if the check itself is at fault, the rubric (your pass/fail criteria).
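The "same way twice" rule can be sketched as a simple count per category. The snippet below is one possible approach under stated assumptions: the category labels are shorthand for the list above, and the function name, log format, and sample log are hypothetical.

```python
from collections import Counter

# Shorthand labels for the failure categories listed above.
CATEGORIES = {
    "missing_evidence",
    "unsupported_claim",
    "wrong_format",
    "weak_next_action",
    "too_long",
    "wrong_audience_or_tone",
}


def repeated_failures(failure_log, threshold=2):
    """Return categories seen at least `threshold` times, sorted by name."""
    for _, category in failure_log:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
    counts = Counter(category for _, category in failure_log)
    return sorted(c for c, n in counts.items() if n >= threshold)


# Hypothetical log: (case name, failure category) pairs from one eval run.
log = [
    ("note-1", "missing_evidence"),
    ("note-2", "wrong_format"),
    ("note-4", "missing_evidence"),
]

print(repeated_failures(log))  # ['missing_evidence']
```

Rejecting unknown labels keeps the category list closed, so a typo does not quietly create a new category that never reaches the threshold.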

Examples

Five-case eval set

Use five real customer notes and define the expected fields, prohibited claims, and required evidence for each output.
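A sketch of how one case in such a set might be checked, assuming the prompt asks for JSON output. Everything here is illustrative: the note text is a placeholder rather than a real customer note, the field names and prohibited phrases are invented, and treating "required evidence" as a verbatim quote from the note is one possible rule among several. A real set would hold five such cases built from real notes.

```python
import json

# Hypothetical case: the note text and criteria are placeholders, not real data.
CASE = {
    "note": "Customer reports login errors since Tuesday and asks for a callback.",
    "expected_fields": ["summary", "evidence", "next_action"],
    "prohibited_claims": ["data loss", "refund approved"],
}


def evaluate(case, raw_output):
    """Return a list of failure categories for one output (empty list = pass)."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["wrong_format"]
    if not isinstance(data, dict):
        return ["wrong_format"]

    failures = []
    # Expected fields: every named field must be present.
    if any(name not in data for name in case["expected_fields"]):
        failures.append("wrong_format")
    # Prohibited claims: none may appear anywhere in the output.
    text = json.dumps(data).lower()
    if any(claim in text for claim in case["prohibited_claims"]):
        failures.append("unsupported_claim")
    # Required evidence: the quoted evidence must appear verbatim in the note.
    evidence = str(data.get("evidence", ""))
    if not evidence or evidence not in case["note"]:
        failures.append("missing_evidence")
    return failures


output = json.dumps({
    "summary": "Login errors since Tuesday.",
    "evidence": "login errors since Tuesday",
    "next_action": "Schedule a callback.",
})
print(evaluate(CASE, output))      # []
print(evaluate(CASE, "not json"))  # ['wrong_format']
```

Because each check maps to a named failure category, the results can be tallied directly when tracking failures across the set.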

Practice Exercise

Write a prompt eval

Create at least five realistic test inputs for one prompt. For each, define what a passing answer must include and must avoid, and name the failure categories you will track.

There are at least five realistic inputs.
The pass criteria are observable.
The failure categories are named.

Mini Prompt Templates

Prompt Evaluator

Prompt: [PROMPT]
Test cases: [INPUTS_AND_OUTPUTS]
Task: Create a pass/fail evaluation checklist and score each output.
Format: Table plus top three prompt fixes.
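If you run this template often, filling the placeholders can be scripted. The sketch below only builds the text; it does not call any model. The function name, the numbering layout for test cases, and the sample values are assumptions made for illustration.

```python
TEMPLATE = """Prompt: [PROMPT]
Test cases: [INPUTS_AND_OUTPUTS]
Task: Create a pass/fail evaluation checklist and score each output.
Format: Table plus top three prompt fixes."""


def build_evaluator_prompt(prompt, cases):
    """Fill the template; `cases` is a list of (input, output) string pairs."""
    lines = [
        f"{i}. Input: {inp}\n   Output: {out}"
        for i, (inp, out) in enumerate(cases, start=1)
    ]
    filled = TEMPLATE.replace("[PROMPT]", prompt)
    return filled.replace("[INPUTS_AND_OUTPUTS]", "\n" + "\n".join(lines))


# Hypothetical values, for illustration only.
text = build_evaluator_prompt(
    "Summarize the customer note in three fields.",
    [("Note about login errors.", "Summary: login errors reported.")],
)
print(text)
```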