The thesis
Capability was never the binding constraint. Durability was not either. The labs made a model right once; a few more are making it stay right for a month. Both are real, and both leave the same thing untouched: the moment you put one layer above another to hold it steady, you have built a second thing that can drift. Hold the model with a layer and the layer drifts. Hold the layer with a process and the process drifts. The regress is not a flaw in any one approach. It is the shape of the problem. We build the part nobody else is building: the order above the layer, where the regress is allowed to stop. Not the layer. The direction the layers agree is up.
The open problem
For three years the field competed on how clever a model could be in a single reply. That race is more or less won. For the last of those years a smaller field competed on how long the cleverness lasts. That race is being run now. The question that decides whether any of it can be trusted has barely been asked: who holds the thing that holds it.
It is not a question the current approach can answer, because the current approach is a holder, and a holder is only as fixed as the thing holding it. You can add another. Most do. We think the answer is not another holder.
The order above the layer
Where the regress is allowed to stop.
Durability instilled in a layer is a snapshot of a strategy, and a snapshot drifts under a long load just as surely as the model's did. Adding a third layer moves the problem; it does not close it. What closes it is not a higher layer but a fixed point.
Treat the month as a repeated game and the model as one player. Alignment fixes the first round. Durability fixes the rounds. Neither fixes the limit the rounds approach. We supply the focal point the sequence settles on — a commitment the model can find in the dark, that no later round is able to renegotiate. Because it is a point and not a layer, there is nothing above it to drift toward. Call it what it is; we would rather not.
Where it matters
Work where the second mistake is the one that costs you — the one made by the thing you trusted to catch the first.
- A month-long investigation whose oversight layer was correct in week one and, by week four, agrees with whatever the model now wants.
- A reviewer of reviewers, left running for weeks, that still answers on the thirtieth day to the mandate it was given on the first — and not to the convenient one it has since found.
- A control that was added to catch drift, and has quietly begun to drift in the same direction, for the same reasons, one layer up.
- A guarantee that has to hold past the point where everything able to check it has itself gone stale.
Fixed point > higher layer
Stack a holder on a holder and across a long run the stack performs no better than its least fixed member. Durability holds judgment across the work. The order above it holds judgment across every durability that could be chosen. Over weeks where the costly mistake is made by the safeguard, that difference is the whole of the performance. Performance that does not need a higher layer is second-order alignment.
Not the model. Not the layer above it. The direction the layers agree is up.
Request access