Go Big or Go Small

Picture a young engineer who spent a summer shadowing one of the best people in the building. In interviews they are flawless. Every question has a clean answer — the right approach, the textbook trade-off, the elegant line their mentor would have written. They watched a master at close range and learned to give the master's answers. Then comes something real: an open-ended, week-long problem with no clean answer and a dozen ways to go wrong. They freeze. They know exactly what brilliant work looks like — they memorized its shape — but they never absorbed the thing that produced it. They can reproduce the answers; they can't reproduce the understanding.

That, more or less, is how you build a small AI model today. The technique is called distillation: you take a big, capable "teacher" model and train a smaller "student" to imitate it. There are two ways to do it. If you own the teacher, you can read its internal confidences directly — not just the answer it picks but how strongly it leans toward every other answer it rejected, which is where most of the real knowledge hides. If you don't own it, you do the cruder version: you have only the public interface, so you ask it millions of questions, keep the answers, and train your student on the transcript. Either way the student learns to sound like the teacher without ever paying what the teacher paid to become one.
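To make the gap between the two routes concrete, here is a minimal sketch in plain Python. It assumes a toy three-answer question; the function names, the example numbers, and the temperature of 2.0 are all illustrative assumptions, not anything a particular lab uses. The soft-label loss compares the student's full spread of confidences with the teacher's (a KL divergence over temperature-softened probabilities, a common choice in distillation), while the hard-label loss sees only the answer the teacher picked, which is all a transcript gives you.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_logits, student_logits, temperature=2.0):
    """Route 1 (you own the teacher): KL(teacher || student) over all answers."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hard_label_loss(teacher_answer, student_logits):
    """Route 2 (transcript only): cross-entropy against the teacher's chosen answer."""
    q = softmax(student_logits)
    return -math.log(q[teacher_answer])

# Hypothetical teacher: picks answer 0, thinks answer 1 is a close second,
# and considers answer 2 clearly wrong.
teacher_logits = [4.0, 3.5, -2.0]
teacher_answer = teacher_logits.index(max(teacher_logits))

# Two hypothetical students that both pick answer 0 with the same confidence.
student_a = [4.0, 3.5, -2.0]  # agrees with the teacher about the runner-up
student_b = [4.0, -2.0, 3.5]  # believes the clearly wrong answer is the runner-up

for name, student in [("student_a", student_a), ("student_b", student_b)]:
    soft = soft_label_loss(teacher_logits, student)
    hard = hard_label_loss(teacher_answer, student)
    print(f"{name}: soft-label loss = {soft:.4f}, hard-label loss = {hard:.4f}")
```

Both students get the same hard-label loss, because they agree with the teacher on the final answer. Only the soft-label loss notices that the second student has the rejected answers backwards: it is zero for the first student and positive for the second. That difference is the knowledge that never makes it into a transcript.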

The second kind is where it gets uncomfortable. Anthropic recently accused a competitor of doing exactly this at scale: tens of thousands of fake accounts and tens of millions of queries, all to harvest one model's answers and feed them to another. Every provider's terms of service forbid it, and yet the front door is the leak: each answer you sell is a perfectly labeled training example. The asymmetry is brutal. A frontier lab spends billions discovering a capability; a follower can copy the behavior for the price of some API calls. Capability is expensive to create and cheap to imitate, and that single fact is rearranging the whole industry.

So you get a real choice between the small model and the big one. Small, open-weight models are the distilled students — Meta's Llama, DeepSeek, Alibaba's Qwen, Google's Gemma, Microsoft's Phi, Mistral. They run on your own hardware — a server you control, sometimes a laptop or a phone. Your data never leaves the building, the cost per query rounds to nothing, and you can fine-tune and truly own the thing. The catch is the interview problem: they ace the benchmarks and then stumble on the elaborate task, the messy open-ended work that the cheap reproduction of an answer can't fake.

The big models are the teachers, and you rent them — OpenAI's GPT, Anthropic's Claude, Google's Gemini. They hold the frontier — the genuinely hard, ambiguous work where understanding actually matters. But you send your data to someone else's servers to get it, you pay real money per call, and the price is ultimately set by whoever controls the scarce compute underneath. You trade ownership and privacy for capability you can't yet build yourself.

Which brings us back to the engineer. The mistake is imagining one teacher and one student. There's a whole faculty — many teachers, many students, a few models that play both roles — and users sort themselves accordingly: the cheap fast one for the easy thing, the frontier one for the hard thing, the private one for the sensitive thing. None of which settles the question the interview should have raised in the first place. If a student can ace every test and still freeze on the real work, then the tests are measuring the wrong thing. The whole game turns on telling the difference between looking smart and being useful — and how we actually evaluate these models is, perhaps, a topic for another post.