Back to lab
Note

Write the eval before the prompt

We started writing evals before prompts. The prompts got worse, then much better.

On the System tier, we now write a tiny eval set before touching the prompt. Twenty examples, two columns: input and expected behaviour.

First the prompt gets worse, because we can finally see what it is doing wrong. Then it gets much better, because we can iterate against a number instead of a vibe.

Take-away: the eval is the spec. If you cannot write the eval, you do not yet know what you are building.

Want this kind of work for your own product? Start a project.

Start a project