February 8, 2026
Speed Has It Backwards
Everyone wants faster agents. Speed is the thing. And look, I know what fast actually looks like. Groq, Cerebras. Tokens flying at you before you've finished reading the last batch. That's a different tier entirely. The incremental speed improvements everyone else is shipping right now? They're not that. Not even close. And unless they reach that tier, the marginal gains in latency don't actually change how I work. For where I am right now, slower is better.
Here's why. When a model is truly fast, Groq fast, it changes the interaction model completely. That's its own thing, worth exploring. But the in-between? A model that's 20% faster than last month? That doesn't unlock a new workflow. It just means you get responses back slightly before you've finished thinking about the last one. You glance at the output, fire off the next thing. Tight loop. Feels productive. Except you're not actually processing any of it. You're reacting. And reacting isn't engineering. Reacting is email.
Slow models fix this. Counterintuitive, I know. But when Opus takes its time on a response, I'm not sitting idle. I'm spinning up another agent on a different problem. Or reading the code the first agent just touched. Or thinking, actually thinking, about whether the direction even makes sense. The latency becomes space. Space to be parallel. Space to be deliberate.
That's the nuance with the Codex family too. They're not the quickest. Nobody's picking them for speed benchmarks. But that pace matches the kind of work where I need them most: longer steps, deeper reasoning, problems where rushing produces confident garbage. When I'm zeroing in on something and every decision compounds on the last, I don't want a model that outruns my judgment. I want one that keeps pace with it.
So the speed conversation has it backwards, I think. At least for serious work. Fast models optimize for throughput on a single thread. Slow models, paired with a human who knows how to fan out, optimize for total output across many threads. Parallelism is the multiplier. The model doesn't need to be fast. You need to be fast at deciding what to throw at it next.
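To make the throughput math concrete, here's a minimal sketch of the fan-out pattern. The agent names, latencies, and the `run_agent` helper are all invented for illustration; the point is just that when you run several slow calls concurrently, total wall time is bounded by the slowest one, not the sum.

```python
import asyncio
import time

# Hypothetical stand-in for a slow agent call. The task names and
# latencies below are made up for illustration.
async def run_agent(task: str, latency_s: float) -> str:
    await asyncio.sleep(latency_s)  # simulates model inference time
    return f"{task}: done"

async def fan_out() -> list[str]:
    # Three slow "agents" launched concurrently. While each one works,
    # the human is free to read code, review output, or queue the next task.
    return await asyncio.gather(
        run_agent("refactor module A", 0.3),
        run_agent("write tests for B", 0.2),
        run_agent("review diff for C", 0.25),
    )

start = time.perf_counter()
results = asyncio.run(fan_out())
elapsed = time.perf_counter() - start

print(results)
# Wall time is roughly max(0.3, 0.2, 0.25), not the 0.75s serial total.
print(f"wall time ~ {elapsed:.2f}s")
```

The serial version of the same three calls would take the sum of the latencies; the fanned-out version takes the max. That gap is the multiplier, and it grows with every extra thread you can keep meaningfully busy.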