Models didn't get smarter. Tools did.

At the start of the year, I was bullish on fully agentic workflows being the big story of 2025. They mostly haven’t been. Production use cases for full AI automation are still rare, and the same two problems keep blocking them: unpredictability and hallucination.

The real progress this year has been in human-in-the-loop coding tools like Claude Code and OpenAI Codex. They’re not perfect, but they’re now writing genuinely good code in the hands of experienced developers.

The interesting part is why they got better. It mostly isn’t because the underlying models got dramatically smarter; GPT-5 wasn’t the leap people expected. What improved is how well the models use tools to manage their own context. Protocols like MCP, and just plain old shell utilities, ended up doing more than another round of pretraining could.

Claude Code is the clearest example. It understands a codebase by calling find and grep. That sounds dated until you watch it work: the model is supplementing what it learned in training with live, relevant information pulled into its context whenever it needs it, rather than relying on whatever it memorized. Same base model, much more capable in practice.
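To make the find-and-grep idea concrete, here is a minimal Python sketch of the pattern. It is not how Claude Code is actually implemented; the function names (find_files, grep, build_context) and the max_hits cap are hypothetical, chosen only for illustration. It assumes the tool's job is to locate files by name, search them for a pattern, and hand the matching lines back as text the model can read.

```python
import fnmatch
import os
import re


def find_files(root, pattern="*.py"):
    """Rough stand-in for `find root -name pattern`: return sorted matching paths."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def grep(paths, regex):
    """Rough stand-in for `grep -n regex paths...`: return (path, line_no, line) hits."""
    compiled = re.compile(regex)
    hits = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if compiled.search(line):
                    hits.append((path, line_no, line.rstrip("\n")))
    return hits


def build_context(root, regex, pattern="*.py", max_hits=20):
    """Format search hits as plain text that could be appended to a model's prompt.

    max_hits is a hypothetical cap so one search cannot flood the context.
    """
    hits = grep(find_files(root, pattern), regex)[:max_hits]
    return "\n".join(f"{path}:{line_no}: {line}" for path, line_no, line in hits)
```

Calling build_context("src", r"def load_config") would return lines in the familiar path:line: text shape. The point of the sketch is that nothing here requires a smarter model: the search runs at question time, so the answer reflects the code as it is now rather than whatever the model happened to memorize.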

Where this goes from here is harder to call. But six months in, the trend is unmistakable: the meaningful steps forward in coding tools have come from giving the model better ways to find what it needs, not from making the model bigger.