How to train an LLM on your own code
A self-trained LLM learns the patterns your team actually writes — your naming, your helpers, your house style — instead of the average of the public internet. This is the practical loop for training an AI on your own code locally: build a corpus from your repo, run a LoRA fine-tune, and prove the new model improved before you ship it. No cloud round-trip, no telemetry.
Why train a model on your own codebase at all?
General coding assistants are trained on public code. They are good at generic patterns and bad at yours — the internal framework, the deprecated helper nobody should call anymore, the way your team names things. A model trained on your own code closes that gap. It completes the line the way your repo would, not the way Stack Overflow would.
Predicts your patterns
A small base model fine-tuned on your repo can out-complete a much larger generic model on the code your team writes every day.
Source never leaves the building
Training locally means no source upload, no cloud round-trip, and no telemetry. This matters for IP-sensitive and regulated teams.
A model you keep
No per-seat meter. The weights live on your hardware; the cost is one-time, not monthly per developer.
The loop, in four steps
1. Build a corpus from your repository
The training signal for code completion is fill-in-the-middle (FIM): take a chunk of real code, hide a span, and ask the model to reconstruct it given the prefix and suffix. You generate these pairs from your commits and from the completions your team accepts and rejects as they work. Accepted completions are positive signal; reverted patches and rejected ghost text are negative signal.
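As a minimal sketch of how one such pair could be cut from a file, the snippet below hides a run of whole lines and lays the result out in prefix-suffix-middle order. The function names are hypothetical, and the sentinel strings are an assumption; check the tokenizer of your base model for the ones it actually uses.

```python
import random

# Assumed sentinel strings; your base model's tokenizer defines the real ones.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def make_fim_pair(source, rng, min_span=1, max_span=3):
    """Hide a run of whole lines and return the prefix, middle, and suffix."""
    lines = source.splitlines(keepends=True)
    if len(lines) < min_span:
        raise ValueError("source has too few lines to hide a span")
    span = rng.randint(min_span, min(max_span, len(lines)))
    start = rng.randint(0, len(lines) - span)
    return {
        "prefix": "".join(lines[:start]),
        "middle": "".join(lines[start:start + span]),
        "suffix": "".join(lines[start + span:]),
    }


def to_training_text(pair):
    """Prefix, then suffix, then the hidden middle the model must reconstruct."""
    return (FIM_PREFIX + pair["prefix"] + FIM_SUFFIX + pair["suffix"]
            + FIM_MIDDLE + pair["middle"])


snippet = (
    "def area(w, h):\n"
    "    if w < 0 or h < 0:\n"
    "        raise ValueError('negative')\n"
    "    return w * h\n"
)
pair = make_fim_pair(snippet, random.Random(0))  # seeded, so the hole is reproducible
print(to_training_text(pair))
```

Cutting on line boundaries keeps the holes close to what an editor completion looks like. Seeding the generator means the same file always yields the same hole, which matters later when two models are compared.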
2. Fine-tune a small base model with LoRA
You do not retrain a model from scratch. You take a small base coding model (1.5B to 7B parameters is plenty) and apply a LoRA fine-tune: a parameter-efficient method that freezes the original weights and trains a small set of low-rank adapter weights instead of the whole network. It runs on CPU overnight, or far faster on a single GPU.
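To see why the adapter is small, here is a rough sketch of the arithmetic for one weight matrix. LoRA leaves the original matrix W frozen and learns two thin matrices, A (rank x d_in) and B (d_out x rank), whose scaled product is added to W. The layer size and rank below are hypothetical values chosen for illustration, not figures from any particular model.

```python
def full_params(d_in, d_out):
    """Weights in one dense projection matrix."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Weights in the two adapter matrices: A is rank x d_in, B is d_out x rank."""
    return rank * d_in + d_out * rank


d_in = d_out = 2048  # hypothetical projection size
rank = 8             # hypothetical LoRA rank

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)

print(full)          # 4194304
print(lora)          # 32768
print(full // lora)  # 128
```

For this one matrix the adapter trains 128 times fewer weights than a full update would, which is what makes an overnight run on modest hardware plausible.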
3. Prove the candidate beats the base model
This is the step everyone skips, and it is the one that matters. Hold out a slice of your code, score both the base model and the freshly trained candidate on the same holes with the same seed, and compare. Promote the candidate only if it wins by a margin you fixed in advance. Otherwise you have shipped a regression you cannot see.
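The comparison can be sketched in a few lines. Everything here is illustrative: the function names are hypothetical, exact match stands in for whatever scoring you use, and the two "models" are stubs so the example runs on its own. The one-point default margin is an assumption you would tune.

```python
import random
from statistics import mean


def exact_match(prediction, truth):
    """1.0 if the completion matches the hidden span, ignoring outer whitespace."""
    return 1.0 if prediction.strip() == truth.strip() else 0.0


def paired_eval(holes, base_fn, cand_fn, score_fn, seed=0, sample_size=None):
    """Score both models on the identical, seeded sample of held-out holes."""
    rng = random.Random(seed)
    sample = list(holes)
    if sample_size is not None:
        sample = rng.sample(sample, sample_size)
    base = mean(score_fn(base_fn(h), h["middle"]) for h in sample)
    cand = mean(score_fn(cand_fn(h), h["middle"]) for h in sample)
    return base, cand


def should_promote(base, cand, margin_points=1.0):
    """Promote only if the candidate wins by margin_points on a 0-100 scale."""
    return (cand - base) * 100 >= margin_points


# Toy held-out set; real holes would come from code the model never trained on.
holes = [
    {"prefix": "a = ", "middle": "1", "suffix": "\n"},
    {"prefix": "b = ", "middle": "2", "suffix": "\n"},
    {"prefix": "c = ", "middle": "3", "suffix": "\n"},
    {"prefix": "d = ", "middle": "1", "suffix": "\n"},
]

# Stub models: the base always answers "1"; the candidate gets three of four right.
candidate_answers = {"a = ": "1", "b = ": "2", "c = ": "0", "d = ": "1"}
base_score, cand_score = paired_eval(
    holes,
    base_fn=lambda h: "1",
    cand_fn=lambda h: candidate_answers[h["prefix"]],
    score_fn=exact_match,
)
print(base_score, cand_score, should_promote(base_score, cand_score))
# 0.5 0.75 True
```

Because both models see exactly the same holes, the difference in scores reflects the models rather than the luck of the sample.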
4. Promote, and keep the loser for rollback
When the candidate wins, swap it in as the live model. Keep the previous version so you can roll back instantly if the live error rate spikes. Repeat on a schedule. Over weeks the model converges on your codebase.
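One way to picture the swap and the rollback is a slot that remembers the model it replaced. This is a sketch with hypothetical names, and the error-rate tolerance is an assumed value; in practice the slot would point at weight files on disk rather than at strings.

```python
class ModelSlot:
    """Tracks the live model and the one it replaced (hypothetical helper)."""

    def __init__(self, live):
        self.live = live
        self.previous = None

    def promote(self, candidate):
        """Make the candidate live and keep the old model for rollback."""
        self.previous = self.live
        self.live = candidate

    def rollback(self):
        """Restore the model that was live before the last promotion."""
        if self.previous is None:
            raise RuntimeError("no previous model to roll back to")
        self.live, self.previous = self.previous, None


def error_rate_spiked(baseline_rate, live_rate, tolerance=0.05):
    """True when the live error rate exceeds the baseline by more than tolerance."""
    return live_rate - baseline_rate > tolerance


slot = ModelSlot("base")
slot.promote("cycle-1")
print(slot.live)      # cycle-1

# Hypothetical monitoring numbers: the live error rate jumped after promotion.
if error_rate_spiked(baseline_rate=0.10, live_rate=0.22):
    slot.rollback()
print(slot.live)      # base
```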
Where Karl fits
Karl is a self-trained LLM that runs this entire loop for you, locally, on Windows, with Linux coming soon. You write code in VS Code as usual; Karl mines the FIM pairs, runs the overnight LoRA fine-tune, scores the candidate against the incumbent on a held-out benchmark, and only swaps in the new model when it wins. Every cycle writes a JSON evidence artefact you can open and inspect.
In one recorded overnight cycle, the base model qwen2.5-coder:1.5b-base scored 0.406 on a held-out set of 71 holes from a real repo. Karl’s candidate scored 0.438, a +3.2 pt uplift, with first-line exact-match rising from 0.352 to 0.408 (+5.6 pp). The promotion rule discards any candidate that does not beat the paired baseline on the same seed and holes by at least a full point.
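The arithmetic behind those figures can be checked directly. The record below uses the numbers quoted above, but its field names are hypothetical and are not Karl's actual evidence schema; it only illustrates how the uplift and the one-point rule relate.

```python
import json

# Figures from the recorded cycle; the field names are illustrative only.
record = {
    "base_model": "qwen2.5-coder:1.5b-base",
    "holes": 71,
    "base_score": 0.406,
    "candidate_score": 0.438,
    "first_line_exact_base": 0.352,
    "first_line_exact_candidate": 0.408,
    "margin_points": 1.0,
}

# Scores are fractions, so multiply by 100 to express the difference in points.
record["uplift_points"] = round(
    (record["candidate_score"] - record["base_score"]) * 100, 1
)
record["first_line_uplift_pp"] = round(
    (record["first_line_exact_candidate"] - record["first_line_exact_base"]) * 100, 1
)
record["passes_rule"] = record["uplift_points"] >= record["margin_points"]

print(json.dumps(record, indent=2))
```

With a 3.2-point uplift against a one-point threshold, this candidate clears the rule as described.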
Frequently asked questions
Can you train an LLM on your own code?
Yes. You build a FIM corpus from your repository, then run a LoRA fine-tune on a small base coding model. The result is a self-trained LLM that predicts the code your team actually writes. Karl automates the loop locally.
Do I need a GPU to train an AI on my code?
No. A LoRA fine-tune of a 1.5B–7B base model runs on CPU, just slower. Karl ships a CPU-only path that runs overnight. A single GPU box speeds it up and can train for a whole team.
How do I know the fine-tuned model is actually better?
Score the base model and the candidate on the same held-out holes with the same seed, and only promote the candidate if it beats the baseline by a set margin. Karl writes a JSON evidence artefact for every cycle.
Does training an LLM on my code send it to the cloud?
It does not have to. Karl runs entirely on your machine with no outbound traffic after the first model pull and no telemetry, so your source never leaves your hardware.