How to train an LLM on your own code

Q: Can you train an LLM on your own code?

Yes. You build a training corpus from your repository — fill-in-the-middle pairs derived from your commits and the completions you accept — then run a parameter-efficient fine-tune (LoRA) on a small base coding model. The result is a self-trained LLM that predicts the code your team actually writes. Karl automates the whole loop locally on Windows, with Linux coming soon.

Q: Do I need a GPU to train an AI on my code?

No. A LoRA fine-tune of a 1.5B–7B base coding model runs on CPU, just slower. Karl ships a CPU-only path that runs overnight. A single GPU box speeds it up and can train for a whole team, with the rest of the team pulling the promoted model automatically.

Q: How do I know the fine-tuned model is actually better?

Hold out a set of code from your repo, score the base model and the candidate on the same holes with the same seed, and only promote the candidate if it beats the baseline by a set margin. Karl writes a JSON evidence artefact for every cycle and only swaps the live model when the candidate wins.

Q: Does training an LLM on my code send it to the cloud?

It does not have to. Karl runs entirely on your machine against a local model backend. There is no outbound traffic after the first model pull and no telemetry, so your source never leaves your hardware.

A self-trained LLM learns the patterns your team actually writes — your naming, your helpers, your house style — instead of the average of the public internet. This is the practical loop for training an AI on your own code locally: build a corpus from your repo, run a LoRA fine-tune, and prove the new model improved before you ship it. No cloud round-trip, no telemetry.


Why train a model on your own codebase at all?

General coding assistants are trained on public code. They are good at generic patterns and bad at yours — the internal framework, the deprecated helper nobody should call anymore, the way your team names things. A model trained on your own code closes that gap. It completes the line the way your repo would, not the way Stack Overflow would.

Relevance: predicts your patterns

A small base model fine-tuned on your repo can out-complete a much larger generic model on the code your team writes every day.

Privacy: source never leaves the building

Training locally means no source upload, no cloud round-trip, and no telemetry. This matters for IP-sensitive and regulated teams.

Ownership: a model you keep

No per-seat meter. The weights live on your hardware; the cost is one-time, not monthly per developer.

The loop, in four steps

1. Build a corpus from your repository

The training signal for code completion is fill-in-the-middle (FIM): take a chunk of real code, hide a span, and ask the model to reconstruct it given the prefix and suffix. You generate these pairs from your commits and from the completions your team accepts and rejects as they work. Accepted completions are positive signal; reverted patches and rejected ghost text are negative signal.
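As a rough illustration of the hide-a-span idea, the sketch below cuts a run of whole lines out of a snippet and lays the result out in prefix-suffix-middle order. The function names, the line-level span choice, and the `max_span_lines` parameter are hypothetical. The sentinel strings follow the style used by Qwen2.5-Coder and are an assumption here, so check your base model's tokenizer before relying on them. This is not Karl's implementation.

```python
import random

# Assumed sentinel tokens (Qwen2.5-Coder style); verify against your tokenizer.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def make_fim_pair(source: str, rng: random.Random, max_span_lines: int = 3) -> dict:
    """Hide a run of whole lines; return the prefix, hidden middle, and suffix."""
    lines = source.splitlines(keepends=True)
    if not lines:
        raise ValueError("source is empty")
    span = rng.randint(1, min(max_span_lines, len(lines)))  # how many lines to hide
    start = rng.randint(0, len(lines) - span)               # where the hole begins
    return {
        "prefix": "".join(lines[:start]),
        "middle": "".join(lines[start:start + span]),
        "suffix": "".join(lines[start + span:]),
    }


def to_training_text(pair: dict) -> str:
    """Prefix-suffix-middle layout: the model sees both sides, then emits the hole."""
    return (
        f"{FIM_PREFIX}{pair['prefix']}"
        f"{FIM_SUFFIX}{pair['suffix']}"
        f"{FIM_MIDDLE}{pair['middle']}"
    )


snippet = (
    "def area(w, h):\n"
    "    if w < 0 or h < 0:\n"
    "        raise ValueError('negative')\n"
    "    return w * h\n"
)
pair = make_fim_pair(snippet, random.Random(0))
print(to_training_text(pair))
```

Concatenating prefix, middle, and suffix always reproduces the original snippet, which is a cheap sanity check to run over a whole corpus before training.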

2. Fine-tune a small base model with LoRA

You do not retrain a model from scratch. You take a small base coding model (a 1.5B–7B parameter model is plenty) and apply a LoRA fine-tune — a parameter-efficient method that trains a small set of adapter weights instead of the whole network. It runs on CPU overnight, or far faster on a single GPU.
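To make "a small set of adapter weights" concrete: LoRA leaves a weight matrix W (d_out by d_in) frozen and learns a low-rank update B·A, where B is d_out by r and A is r by d_in. The arithmetic below compares the trainable parameter counts for one such matrix. The dimensions and rank are hypothetical, chosen for illustration, and are not taken from any particular model or from Karl's configuration.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen weight matrix W."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in the adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer shape and rank, for illustration only.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
adapter = lora_params(d_out, d_in, rank)
fraction = adapter / full

print(f"full matrix:   {full:,} parameters")
print(f"LoRA adapter:  {adapter:,} parameters")
print(f"trainable share of this layer: {fraction:.2%}")
```

Under these assumed numbers the adapter is 1/256 of the layer, which is why the optimizer state and gradients fit on modest hardware.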

3. Prove the candidate beats the base model

This is the step everyone skips, and it is the one that matters. Hold out a slice of your code, score both the base model and the freshly trained candidate on the same holes with the same seed, and compare. Only promote the candidate if it wins by a margin you fixed in advance. Otherwise you risk shipping a regression you cannot see.
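A minimal sketch of a paired comparison, assuming a model can be treated as a function from (prefix, suffix) to a predicted middle. The metric here is first-line exact match; the type aliases, function names, toy models, and the 0.01 default margin are illustrative assumptions, not Karl's scoring code.

```python
import random
from typing import Callable, List, Sequence, Tuple

Hole = Tuple[str, str, str]         # (prefix, suffix, expected middle)
Model = Callable[[str, str], str]   # (prefix, suffix) -> predicted middle


def first_line(text: str) -> str:
    """First non-empty line of a completion, with surrounding whitespace removed."""
    stripped = text.strip()
    return stripped.splitlines()[0].strip() if stripped else ""


def first_line_accuracy(model: Model, holes: Sequence[Hole]) -> float:
    """Share of holes where the predicted first line equals the expected one."""
    hits = sum(first_line(model(p, s)) == first_line(m) for p, s, m in holes)
    return hits / len(holes)


def pick_holes(pool: Sequence[Hole], k: int, seed: int) -> List[Hole]:
    """Same seed, same holes: both models must be scored on an identical sample."""
    return random.Random(seed).sample(list(pool), k)


def should_promote(base: float, candidate: float, margin: float = 0.01) -> bool:
    """Promote only if the candidate beats the paired baseline by the margin."""
    return candidate - base >= margin


# Toy stand-ins for real models, for illustration only.
pool = [("x = ", "\n", "1"), ("y = ", "\n", "2"), ("z = ", "\n", "3"), ("w = ", "\n", "4")]
answers = {prefix: middle for prefix, _, middle in pool}


def base_model(prefix: str, suffix: str) -> str:
    return "1"                      # always guesses the same line


def candidate_model(prefix: str, suffix: str) -> str:
    return answers[prefix]          # has "learned" the repo


holes = pick_holes(pool, k=4, seed=7)
base_score = first_line_accuracy(base_model, holes)
candidate_score = first_line_accuracy(candidate_model, holes)
print(base_score, candidate_score, should_promote(base_score, candidate_score))
```

The important design choice is that `holes` is drawn once and passed to both models, so any score difference comes from the models and not from sampling luck.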

4. Promote, and keep the loser for rollback

When the candidate wins, swap it in as the live model. Keep the previous version so you can roll back instantly if the live error rate spikes. Repeat on a schedule. Over weeks the model converges on your codebase.
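One way to picture the promote-and-rollback step is a two-slot record: the live model and the one it replaced. The class, the model names, and the error-rate threshold below are hypothetical and only sketch the bookkeeping; a real setup would also move weight files or update a model tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSlots:
    live: str
    previous: Optional[str] = None

    def promote(self, candidate: str) -> None:
        """Swap the candidate in and keep the old live model for rollback."""
        self.previous, self.live = self.live, candidate

    def rollback(self) -> None:
        """Restore the previous model; the failed one stays available in `previous`."""
        if self.previous is None:
            raise RuntimeError("no previous model to roll back to")
        self.live, self.previous = self.previous, self.live


def rollback_if_unhealthy(slots: ModelSlots, error_rate: float, threshold: float) -> bool:
    """Roll back when the observed live error rate exceeds the (assumed) threshold."""
    if error_rate > threshold:
        slots.rollback()
        return True
    return False


slots = ModelSlots(live="base-v0")   # hypothetical model names
slots.promote("candidate-v1")
print(slots)                          # live is candidate-v1, previous is base-v0
```

Calling `rollback_if_unhealthy(slots, error_rate=0.30, threshold=0.10)` would then restore `base-v0` as the live model.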

Where Karl fits

Karl is a self-trained LLM that runs this entire loop for you, locally, on Windows, with Linux coming soon. You write code in VS Code as usual; Karl mines the FIM pairs, runs the overnight LoRA fine-tune, scores the candidate against the incumbent on a held-out benchmark, and only swaps in the new model when it wins. Every cycle writes a JSON evidence artefact you can open and inspect.

In one recorded overnight cycle, the base model qwen2.5-coder:1.5b-base scored 0.406 on a held-out set of 71 holes from a real repo. Karl’s candidate scored 0.438, an uplift of 3.2 points, with first-line exact-match rising from 0.352 to 0.408 (+5.6 points). The promotion rule discards any candidate that does not beat the paired baseline on the same seed and holes by at least one full point (0.01 on this 0 to 1 scale).
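The figures above can be rechecked with a few lines of arithmetic. The sketch below recomputes both uplifts from the reported scores and writes them into a small JSON record. The field names and record layout are invented for illustration and are not the format of Karl's evidence artefact; only the numbers and the model name come from the text.

```python
import json

# Scores as reported for the recorded cycle (0 to 1 scale).
base = {"score": 0.406, "first_line_exact_match": 0.352}
candidate = {"score": 0.438, "first_line_exact_match": 0.408}
MARGIN_POINTS = 1.0  # "at least a full point", read as 0.01 on the 0 to 1 scale


def uplift_points(before: float, after: float) -> float:
    """Difference expressed in points (hundredths of the 0 to 1 scale)."""
    return round((after - before) * 100, 1)


# Hypothetical record layout, for illustration only.
evidence = {
    "base_model": "qwen2.5-coder:1.5b-base",
    "holes": 71,
    "score_uplift_points": uplift_points(base["score"], candidate["score"]),
    "exact_match_uplift_points": uplift_points(
        base["first_line_exact_match"], candidate["first_line_exact_match"]
    ),
}
evidence["passes_margin"] = evidence["score_uplift_points"] >= MARGIN_POINTS

print(json.dumps(evidence, indent=2))
```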


Frequently asked questions

Can you train an LLM on your own code?

Yes. You build a FIM corpus from your repository, then run a LoRA fine-tune on a small base coding model. The result is a self-trained LLM that predicts the code your team actually writes. Karl automates the loop locally.

Do I need a GPU to train an AI on my code?

No. A LoRA fine-tune of a 1.5B–7B base model runs on CPU, just slower. Karl ships a CPU-only path that runs overnight. A single GPU box speeds it up and can train for a whole team.

How do I know the fine-tuned model is actually better?

Score the base model and the candidate on the same held-out holes with the same seed, and only promote the candidate if it beats the baseline by a set margin. Karl writes a JSON evidence artefact for every cycle.

Does training an LLM on my code send it to the cloud?

It does not have to. Karl runs entirely on your machine with no outbound traffic after the first model pull and no telemetry, so your source never leaves your hardware.