Iteration beats perfection

Craft → Execution Sense

Defining

"Evals" refers to evaluations used to test and measure AI model or product performance in development.

The goal is not to do evals perfectly, it's to actionably improve your product.

Hamel Husain & Shreya Shankar · Why AI evals are the hottest new skill for product builders

Supporting

Persistence is extremely valuable. Successful companies right now building in any new area, they are going through the pain of learning this, implementing this and understanding what works and what doesn't work. Pain is the new moat.

Aishwarya Naresh Reganti + Kiriti Badam · Why most AI products fail: Lessons from 50+ AI deployments at OpenAI, Google & Amazon

Watch at 00:01:16

Supporting

I would bias less toward trying in one go to tell the model, 'Hey, here's exactly what I want you to do.' Instead, what I would do is I would chop things up into bits.

Michael Truell · The rise of Cursor: The $300M ARR AI tool that engineers can't stop using

Watch at 00:47:06

With caveats

"This step" refers to analyzing and categorizing actual AI system failures before building tests, and "evals" means automated evaluations or tests for AI systems.

You don't want to skip this step. The reason I'm kind of spending so much time on this is this is where people get lost. They go straight into evals like, 'Let me just write some tests,' and that is where things go off the rails.

Hamel Husain & Shreya Shankar · Why AI evals are the hottest new skill for product builders

Watch at 00:47:24
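One way to picture the step described above, as a minimal sketch rather than the speakers' own method: review real traces by hand, note a failure category for each, and count the categories before writing any automated test. The trace records, field names, and category labels below are all hypothetical.

```python
from collections import Counter

# Hypothetical hand-written review notes: one entry per real trace,
# with a failure category assigned by a human reviewer (None = no failure).
reviewed_traces = [
    {"trace_id": "t1", "failure": "ignored user constraint"},
    {"trace_id": "t2", "failure": "hallucinated detail"},
    {"trace_id": "t3", "failure": "ignored user constraint"},
    {"trace_id": "t4", "failure": None},
]


def failure_counts(traces):
    """Count how often each failure category appears, skipping clean traces."""
    return Counter(t["failure"] for t in traces if t["failure"] is not None)


counts = failure_counts(reviewed_traces)

# List categories from most to least frequent; the most frequent ones
# are the natural candidates for the first automated evals.
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

The tally is deliberately crude: its only job is to show which failures are common enough to deserve a test.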

With caveats

LLM judges are AI models used to automatically evaluate other AI outputs, and "evals" refers to these automated evaluation systems.

Before you release your LLM as a judge, you want to make sure it's aligned to the human. A lot of people stop there and they say, 'Okay, I have my judge prompt. We're done.' Don't do that, because that's the fastest way that you can have evals that don't match what's going on, and when people lose trust in your evals, they lose trust in you.

Hamel Husain & Shreya Shankar · Why AI evals are the hottest new skill for product builders

Watch at 00:56:28
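As an illustration of what "aligned to the human" could mean in practice, here is a minimal sketch that compares a judge's pass/fail verdicts with human labels on the same examples. The metric choice, function name, and sample labels are assumptions for illustration, not something the quote prescribes.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    accuracy: float            # share of examples where judge and human agree
    true_positive_rate: float  # of human-labeled passes, share the judge also passed
    true_negative_rate: float  # of human-labeled fails, share the judge also failed


def judge_agreement(human: list[bool], judge: list[bool]) -> Agreement:
    """Compare judge verdicts with human labels (True = pass, False = fail)."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    pos = sum(human)
    neg = len(human) - pos
    return Agreement(
        accuracy=(tp + tn) / len(human),
        true_positive_rate=tp / pos if pos else float("nan"),
        true_negative_rate=tn / neg if neg else float("nan"),
    )


# Hypothetical labels for six examples reviewed by both a human and the judge.
human_labels = [True, True, False, False, True, False]
judge_labels = [True, False, False, True, True, False]

report = judge_agreement(human_labels, judge_labels)
print(report)
```

Reporting the pass and fail rates separately matters when failures are rare: a judge that passes everything can still score high on raw accuracy.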

User testing reveals patterns with shocking consistency · Product design must match your model's accuracy · AI dramatically shifts productivity baselines

AI requires starting with problems, not capabilities · AI changes the speed equation entirely · AI changes the game: taste and judgment matter more than execution

Also in Execution Sense:

Do the work no one else will · Ship fast, learn faster · Resourcefulness beats resources


