Did Karl actually learn?

Every overnight cycle, Karl benchmarks a fresh fine-tune against the base model on held-out fill-in-the-middle holes from your own repo, and promotes the new model only if it wins. Here is the evidence artefact from the most recent cycle.
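To help read that artefact, the promotion rule described above can be sketched in a few lines. This is a minimal illustration, not Karl's actual implementation: the names `HoleResult`, `exact_match_rate`, and `should_promote` are hypothetical, and it assumes that "wins" means a strictly higher exact-match rate on the same held-out holes, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoleResult:
    """Outcome of one held-out fill-in-the-middle hole (hypothetical shape)."""
    hole_id: str
    base_correct: bool
    candidate_correct: bool


def exact_match_rate(flags: list[bool]) -> float:
    """Fraction of holes filled correctly; 0.0 for an empty benchmark."""
    return sum(flags) / len(flags) if flags else 0.0


def should_promote(results: list[HoleResult]) -> bool:
    """Promote only if the candidate strictly beats the base model.

    Assumption: "wins" means a strictly higher exact-match rate on the
    same held-out holes. Ties and empty benchmarks keep the base model.
    """
    if not results:
        return False
    base = exact_match_rate([r.base_correct for r in results])
    cand = exact_match_rate([r.candidate_correct for r in results])
    return cand > base


# Illustrative data only, not results from a real cycle.
results = [
    HoleResult("hole-1", base_correct=True, candidate_correct=True),
    HoleResult("hole-2", base_correct=False, candidate_correct=True),
    HoleResult("hole-3", base_correct=False, candidate_correct=False),
]
print(should_promote(results))  # True: candidate 2/3 vs base 1/3
```

Scoring both models on the same holes keeps the comparison paired, and treating a tie as a loss means the base model stays in place unless the fine-tune shows a measurable gain.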