How I stopped doing Product Management and became obsessed with AI evals

That title isn't clickbait: it's what actually happened. Over the last three months, I've spent 90% of my time obsessing over whether our AI search returns the right results. Not writing PRDs. Not in stakeholder meetings. Just testing, scoring, and re-testing AI outputs.
Only recently did I learn that there is a trending term for what I am doing: AI evals. I thought that was "quality assurance" or "testing". But AI Evals is a sexier name for selling courses.
Jokes aside. The main reason the industry began separating it from traditional QA frameworks is the non-deterministic nature of the output. Different results from the same input were considered as the definition of madness.
Now it is our new reality.






