LeVCS: A Technical Report
Status: v0.1.0 — protocol substrate complete, workflow surface deferred. Audience: engineers evaluating LeVCS for their own projects, or designing the workflow tooling that will sit on top of it.
TL;DR
- LeVCS is a distributed version control system in the same lineage as git, fossil, pijul, and sapling — content-addressed objects, signed history, three-way merge.
- Five things are different by design: identity is in the protocol, federation is a first-class concept, the merge engine is a cascading pipeline of format-aware handlers, hashes are BLAKE3, and releases are signed objects rather than ad-hoc tags.
- It is a substrate, not a workflow tool. There is no PR review surface, no issue tracker, no CI integration, no web UI today. Those are the next layer up.
- You can host an instance on a small VPS behind nginx or Caddy. The protocol terminates over HTTP; signing is at the application layer.
- The codebase is small (~10 crates) and runs cargo test in under a minute on a laptop. 194 tests pass at v0.1.0; baseline benchmarks are in the repo.
1. Why a new VCS?
Git is the dominant DVCS. It is also a tool that grew up in 2005 around a hashing algorithm that had visible cracks (SHA-1) and a federation model that was, fundamentally, "your remote is a URL string." Twenty years on, the world git serves looks different:
- Identity is no longer optional. Many projects need to know not just who claims to have authored a commit but who is authorized to alter the repo's history.
- Replication is more complex than push/pull. Mirrors, archives, cold-storage replicas, and read-only forks are all common — and git treats them with the same primitives as the source-of-truth remote.
- Merge conflicts are still mostly resolved at the line level. JSON, YAML, TOML, source code with semantic structure — the line-diff treatment is wrong for all of these and produces false conflicts on reformats every team has hit.
- SHA-1 is broken; git's SHA-256 transition has been "in progress" for most of a decade.
LeVCS is an attempt at a clean restart that takes the DAG model and content addressing as obvious wins, and rebuilds identity, federation, merging, and hashing as protocol-level concerns rather than conventions or sidecar tools.
2. The shape of the system
LeVCS is layered:
┌──────────────────────────────────────────────────────────┐
│ Workflow tools (TBD: review, issues, web UI)             │
├──────────────────────────────────────────────────────────┤
│ CLI: `levcs init / commit / push / merge / release`      │
├──────────────────────────────────────────────────────────┤
│ Federation HTTP API (instances, mirrors, releases)       │
├──────────────────────────────────────────────────────────┤
│ Object model: Blob / Tree / Commit / Release / Authority │
│ Merge engine: textual → format-aware → tree-sitter       │
│ Trust root: signed authority chain (Ed25519)             │
│ Content addressing: BLAKE3                               │
└──────────────────────────────────────────────────────────┘
Five object kinds, all content-addressed by their BLAKE3 digest:
- Blob: raw file contents.
- Tree: (name, type, mode, hash) entries; sorted, no duplicates.
- Commit: tree + parents + authority + author key + message, signed.
- Release: a first-class artifact (tree + predecessor commit + parent release + label + notes), signed by a maintainer or owner.
- Authority: the membership document for a repo (who has what role), signed and chained.
A repository is the set of these objects plus a refs/ map (branches/*,
releases/*, authority/{genesis,current}) indexing into them. The
repo_id is the BLAKE3 of the genesis authority — globally unique by
construction, no central registrar needed.
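The Tree invariant listed above (sorted entries, no duplicates) is what makes a tree's digest depend only on its contents. A minimal sketch of that canonicalization step follows. The type names, the Blob/Tree entry kinds, and sorting by raw name bytes are illustrative assumptions, not taken from the spec.

```rust
/// Hypothetical entry kinds; the real set is defined by the spec.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EntryType {
    Blob,
    Tree,
}

/// One (name, type, mode, hash) entry. Field names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TreeEntry {
    pub name: String,
    pub kind: EntryType,
    pub mode: u32,
    pub hash: [u8; 32], // object IDs are 32 bytes
}

#[derive(Debug, PartialEq, Eq)]
pub enum TreeError {
    DuplicateName(String),
}

/// Sort entries by name (byte order is an assumption) and reject duplicates,
/// so that equal sets of entries always serialize identically.
pub fn canonicalize(mut entries: Vec<TreeEntry>) -> Result<Vec<TreeEntry>, TreeError> {
    entries.sort_by(|a, b| a.name.as_bytes().cmp(b.name.as_bytes()));
    // After sorting, any duplicate names are adjacent.
    for pair in entries.windows(2) {
        if pair[0].name == pair[1].name {
            return Err(TreeError::DuplicateName(pair[0].name.clone()));
        }
    }
    Ok(entries)
}
```

Hashing the canonical form, rather than whatever order the entries arrived in, is what lets two instances agree on a tree's ID without coordinating.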
3. What's different from git
| Axis | git | LeVCS |
|---|---|---|
| Hash | SHA-1 (deprecated, transitioning) | BLAKE3 |
| Identity | Author string in commit | Signed authority object with explicit roles |
| Push authorization | Server-side hook or hosting platform | Protocol-level role check (Reader/Contributor/Maintainer/Owner) |
| Force-push rule | Server policy (off-protocol) | Protocol enforces maintainer-or-owner role |
| Federation | URL-bound remotes | Global repo_id + replicating instances |
| Mirror replication | git fetch --mirror (best-effort) | First-class with three storage modes |
| Tags / releases | Mutable string refs (often) | Signed objects with predecessor + parent_release chain |
| Merge granularity | Line-level (myers / patience) | Cascade: textual → format → tree-sitter → plugin |
| Merge audit | No artifact | .levcs/merge-record TOML, signed with the commit |
| Web UI / issues | Provided by hosting platform | Out of scope for v1 |
The rest of this section unpacks each axis.
3.1 Identity in the protocol, not on top
Git stores Author: Name <email> and Committer: Name <email> strings
in commits. There is nothing cryptographic about either. Signed commits
are opt-in (GPG signing via commit -S since Git 1.7.9 in 2012, SSH
signing since Git 2.34 in 2021), but even when signed they answer "did
some key sign this?", not "is the signer authorized to write to this
repo right now?"
LeVCS makes membership a first-class object. An authority body has:
schema_version repo_id previous_authority version created_micros
members: [(public_key, handle, role, added_micros, added_by), ...]
policy: [(key, value), ...]
Roles are a strict ordering: Reader < Contributor < Maintainer < Owner.
Every commit references the authority hash that was current when it was
signed. Updating membership is a versioned operation: you write a new
authority object, signed by an Owner, with previous_authority pointing
at the prior one. The instance walks the chain on push and rejects any
push whose author key isn't a current member.
The practical consequence: "give Bob push access" is not a hosting-platform toggle. It is a signed authority update that travels in the repo and is auditable for the lifetime of the project.
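A sketch of the two checks described above: walking the authority chain back to genesis, and testing a pusher's role against the current authority. Signature verification is left out because it needs an Ed25519 library. The zero-hash sentinel for genesis, the "version increases by exactly one" rule, and the Contributor threshold for ordinary pushes are assumptions made for illustration. The maintainer-or-owner threshold for force-pushes comes from the comparison table.

```rust
use std::collections::HashMap;

pub type Hash = [u8; 32];
pub type PublicKey = [u8; 32];
/// Assumed sentinel: the genesis authority has no predecessor.
pub const ZERO: Hash = [0u8; 32];

/// Declaration order gives Reader < Contributor < Maintainer < Owner.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Reader,
    Contributor,
    Maintainer,
    Owner,
}

#[derive(Debug, Clone)]
pub struct Member {
    pub public_key: PublicKey,
    pub handle: String,
    pub role: Role,
}

/// A trimmed-down authority body; the real one carries more fields.
#[derive(Debug, Clone)]
pub struct Authority {
    pub previous_authority: Hash,
    pub version: u64,
    pub members: Vec<Member>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ChainError {
    Missing,
    VersionGap,
    WrongGenesis,
}

/// Walk `previous_authority` links from `current` back to the genesis object
/// and return the number of authority objects in the chain.
pub fn walk_chain(
    store: &HashMap<Hash, Authority>,
    current: Hash,
    genesis: Hash,
) -> Result<usize, ChainError> {
    let mut cursor = current;
    let mut length = 1;
    loop {
        let node = store.get(&cursor).ok_or(ChainError::Missing)?;
        if node.previous_authority == ZERO {
            return if cursor == genesis { Ok(length) } else { Err(ChainError::WrongGenesis) };
        }
        let prev = store.get(&node.previous_authority).ok_or(ChainError::Missing)?;
        // Assumed rule: each update bumps `version` by exactly one.
        if prev.version.checked_add(1) != Some(node.version) {
            return Err(ChainError::VersionGap);
        }
        cursor = node.previous_authority;
        length += 1;
    }
}

/// Role check against the current authority. Non-members never pass.
pub fn may_push(current: &Authority, author_key: &PublicKey, force: bool) -> bool {
    let required = if force { Role::Maintainer } else { Role::Contributor };
    current
        .members
        .iter()
        .any(|m| &m.public_key == author_key && m.role >= required)
}
```

Deriving `Ord` on the enum keeps the role comparison a single `>=`, which matches the strict ordering the protocol defines.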
3.2 Federation, not "remotes"
A git remote is a URL plus some credentials. There is no fact of the matter about whether two URLs refer to the same repository: git checks by walking commits, but "same project" is by convention.
LeVCS has a global repo_id. It is the BLAKE3 of the genesis
authority object, so two clones of the same project have the same
repo_id even if they live on instances on opposite continents. An
instance is a federation peer: it serves /levcs/v1/repos/<repo_id>/...
endpoints and replicates state from other instances when configured to.
Mirroring is the protocol's normal mode, not a git fetch --mirror cron
job.
This composes with three storage modes (§4.3 of the spec):
- Full — every reachable object. The source-of-truth instance.
- Release — only release objects, their reachable trees and blobs, and the authority chain. Skips inter-release commits. For long-lived archive replicas.
- Metadata — authority objects, release headers, signed refs only. No content. For "is this project still alive?" pings.
The instance enforces these on push: a release-mode replica refuses pushes that update branches; a metadata-mode replica refuses all pushes (it's populated by mirroring).
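The push rules for the three storage modes can be written as a small decision function. This is a sketch: the type names and rejection messages are invented for illustration, and the only ref prefix it relies on is the refs/branches/ form used elsewhere in this report.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageMode {
    Full,
    Release,
    Metadata,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PushDecision {
    Accept,
    Reject(&'static str),
}

/// Decide whether an instance in `mode` accepts a push that updates `ref_name`.
/// Role checks happen separately; this only covers the storage-mode rule.
pub fn check_push(mode: StorageMode, ref_name: &str) -> PushDecision {
    match mode {
        // Metadata replicas are populated by mirroring, never by push.
        StorageMode::Metadata => PushDecision::Reject("metadata-mode replica refuses all pushes"),
        // Release replicas skip inter-release commits, so branch updates are refused.
        StorageMode::Release if ref_name.starts_with("refs/branches/") => {
            PushDecision::Reject("release-mode replica refuses branch updates")
        }
        _ => PushDecision::Accept,
    }
}
```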
3.3 The merge cascade
This is the technical centerpiece.
A traditional three-way merge — git, mercurial, fossil — works at the line level. It is correct for prose and acceptable for code, but it generates false conflicts on:
- Reformats (linters, prettifiers, whitespace-policy bumps).
- Key reorderings in JSON / YAML / TOML.
- Imports lists in source files that two branches both edited.
- Markdown files where two contributors modified disjoint sections of the same paragraph.
LeVCS dispatches per-file to a handler cascade ranked by aggressiveness:
rank 0 textual universal line-level fallback
rank 1 format-aware json | yaml | toml | xml | markdown | prose
rank 2 tree-sitter rust | python | js | ts | go | c | cpp |
java | ruby | bash
rank 3 plugin wasm-sandboxed, user-supplied
A repo's .levcs/merge.toml maps glob patterns to handlers. Per-user
.levcs/merge.local.toml can demote but never promote, so a
distrusted plugin can be locally turned off without a repo edit. Each
merged file produces a FileRecord in .levcs/merge-record listing the
handler used and its hash; the merge-record blob is committed alongside
the resolved tree, so every merge in history is auditable.
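The "demote but never promote" rule for the per-user override reduces to taking the lower of two ranks. A minimal sketch, with hypothetical names; how the two TOML files are parsed into ranks is out of scope here.

```rust
/// Handler ranks from the cascade table, least to most aggressive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Rank {
    Textual = 0,
    FormatAware = 1,
    TreeSitter = 2,
    Plugin = 3,
}

/// Combine the repo-wide rank with an optional local override.
/// A local entry can lower the rank but can never raise it.
pub fn effective_rank(repo: Rank, local: Option<Rank>) -> Rank {
    match local {
        Some(local_rank) => repo.min(local_rank),
        None => repo,
    }
}
```

With this shape, a local file that tries to promote a path to a plugin handler is simply ignored, which is the safe direction for an untrusted override.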
Format-aware example: package.json where Alice adds a dependency at
the top of dependencies and Bob adds one at the bottom. In a short list
Git produces a conflict because the changed lines are adjacent
(appending at the bottom also edits the previous last line to add its
trailing comma). The JSON handler parses both sides, computes the
structural diff, and merges them: both new entries appear in the
output, no conflict.
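To make the structural merge concrete, here is a sketch of a three-way merge over a flat key-to-value object such as a dependencies map. It is not the LeVCS JSON handler: values are plain strings, nesting is ignored, and `BTreeMap` orders keys alphabetically, whereas a real handler would have to preserve the file's own key order. All names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A flat JSON-like object: key -> value, both as strings for simplicity.
pub type Object = BTreeMap<String, String>;

#[derive(Debug, PartialEq, Eq)]
pub enum Merged {
    Clean(Object),
    /// Keys that both sides changed in different ways.
    Conflict(Vec<String>),
}

/// Three-way merge, key by key. `None` for a key means "absent on that side".
pub fn merge_objects(base: &Object, ours: &Object, theirs: &Object) -> Merged {
    let keys: BTreeSet<&String> = base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut out = Object::new();
    let mut conflicts = Vec::new();
    for key in keys {
        let (b, o, t) = (base.get(key), ours.get(key), theirs.get(key));
        let chosen = if o == t {
            o // both sides agree (including both deleting the key)
        } else if o == b {
            t // only theirs changed this key
        } else if t == b {
            o // only ours changed this key
        } else {
            conflicts.push(key.clone());
            continue;
        };
        if let Some(value) = chosen {
            out.insert(key.clone(), value.clone());
        }
    }
    if conflicts.is_empty() { Merged::Clean(out) } else { Merged::Conflict(conflicts) }
}
```

Two additions under different keys never interact here, which is why the package.json case merges cleanly; a conflict only arises when both sides change the same key to different values.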
Tree-sitter example: two contributors add unrelated use statements to
a Rust file. Line diff conflicts. Tree-sitter handler treats the
use_declaration list as an ordered set, merges both additions, no
conflict.
The cascade is fail-safe: a tree-sitter handler that bails on a syntax error falls through to the format-aware handler if applicable, then to textual. The textual handler always merges — it might produce conflicts, but it never fails to produce some output.
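The fall-through behaviour can be sketched as a loop over handlers ordered most aggressive first, with a textual fallback that always returns something. The handler signature, the `Outcome` type, and the whole-file stand-in for the textual merge are assumptions made to keep the example short; the real textual handler works line by line.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    Merged(String),
    /// Output containing conflict markers.
    Conflicted(String),
}

/// A handler returns `None` when it bails (for example on a syntax error).
pub type Handler = fn(&str, &str, &str) -> Option<Outcome>;

/// Try each handler in order; the first one that does not bail wins.
/// Returns the name of the handler that produced the outcome.
pub fn run_cascade(
    handlers: &[(&'static str, Handler)],
    base: &str,
    ours: &str,
    theirs: &str,
) -> (&'static str, Outcome) {
    for (name, handler) in handlers {
        if let Some(outcome) = handler(base, ours, theirs) {
            return (*name, outcome);
        }
    }
    ("textual", textual(base, ours, theirs))
}

/// Stand-in for the textual fallback: a whole-file three-way comparison.
/// It may report a conflict, but it never fails to produce output.
fn textual(base: &str, ours: &str, theirs: &str) -> Outcome {
    if ours == theirs || theirs == base {
        Outcome::Merged(ours.to_string())
    } else if ours == base {
        Outcome::Merged(theirs.to_string())
    } else {
        Outcome::Conflicted(format!(
            "<<<<<<< ours\n{}\n=======\n{}\n>>>>>>> theirs\n",
            ours, theirs
        ))
    }
}
```

Returning the handler name alongside the outcome is what would let a merge record say which handler actually ran after a fall-through.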
3.4 Hashing
Git uses SHA-1. SHAttered (2017) was a practical collision. The SHA-256 transition is still incomplete in 2026 and is unlikely to ever finish for the long tail of git infrastructure.
LeVCS uses BLAKE3 from day one. Faster than SHA-256 in practice (the
benchmarks in bench-results/ show ~5 GiB/s on a laptop for blob
serialize+hash), tree-hashed, no commitment to a specific length-tag
convention. Object IDs are 32 bytes everywhere.
3.5 Releases as objects
Git tags are refs that point to commits — or to tag objects, if you
remember to use -a. Either way, they are names, not artifacts. A
release in LeVCS is a signed object:
tree commit's root tree
predecessor commit being released
parent_release prior release in the chain (or zero)
authority authority hash at release time
declarer_key public key of the signing maintainer/owner
timestamp Unix micros
label "v1.0.0" or similar
notes release notes (UTF-8, up to 4 GiB)
The chain parent_release → parent_release → ... gives you a clean
release history independent of branch topology. The replica modes
above can replicate just releases (and their trees and authority) for
archive instances that don't need the inter-release commit history.
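As a sketch of how the parent_release chain yields a release history, the struct below mirrors the field list above and a helper walks the chain newest-first. The in-memory `HashMap` store and the function name are hypothetical, and signatures are not checked here.

```rust
use std::collections::HashMap;

pub type Hash = [u8; 32];
/// "or zero" in the field list: a release with no parent.
pub const ZERO: Hash = [0u8; 32];

/// Fields follow the release layout described above.
#[derive(Debug, Clone)]
pub struct Release {
    pub tree: Hash,
    pub predecessor: Hash,
    pub parent_release: Hash,
    pub authority: Hash,
    pub declarer_key: [u8; 32],
    pub timestamp: u64, // Unix micros
    pub label: String,
    pub notes: String,
}

/// Collect release labels from `head` back to the first release.
/// Stops early if an object is missing from the store.
pub fn release_history(store: &HashMap<Hash, Release>, head: Hash) -> Vec<String> {
    let mut labels = Vec::new();
    let mut cursor = head;
    while cursor != ZERO {
        match store.get(&cursor) {
            Some(release) => {
                labels.push(release.label.clone());
                cursor = release.parent_release;
            }
            None => break,
        }
        // Defensive guard: a well-formed chain cannot be longer than the store.
        if labels.len() > store.len() {
            break;
        }
    }
    labels
}
```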
4. How you use it
4.1 Bootstrap
levcs key generate --label primary
levcs init --key primary
levcs track --all
levcs commit -m "initial import"
After init, .levcs/ exists alongside your tree. The genesis authority
names your key as the sole Owner; the repo_id is fixed forever. After
commit, you have one commit on refs/branches/main.
4.2 Branch and merge
levcs branch feature/x
# ... edit files ...
levcs commit -m "wip on x"
levcs branch main # switch back
levcs merge feature/x
If the merge produces conflicts, drop into the resolution TUI:
levcs merge --resolve
The TUI shows each conflicted file with the ours/base/theirs panes the
handler emitted, plus the cascade decision (which handler ran, why it
fell through if it did). On accept, it writes the resolved file and a
signed .levcs/merge-record entry.
4.3 Release
levcs release v1.0.0 --notes "first release"
Writes a Release object with the current commit as predecessor, signs
it with your active key, and adds refs/releases/v1.0.0. If you've cut
prior releases, parent_release chains to the most recent one
automatically.
4.4 Federation
levcs instance --set https://levcs.example.com/levcs/v1
levcs push refs/branches/main
The first push to a fresh instance auto-inits the repo using your
genesis authority. Subsequent pushes are role-checked. Pulls are
public-read by default (the public_read policy bit on the genesis
authority).
To migrate to a new home:
levcs migrate https://new-host.example.com/levcs/v1 --set-active
migrate re-inits and replays the full history at the destination, then
points your local repo at it. The repo_id is unchanged — it's the
same project at a new location.
5. Operating an instance
A single binary, levcs-instance, reads a TOML config and listens on
HTTP. Production deployments terminate TLS at a reverse proxy; the
instance binds to localhost. See deploy/README.md for a full
walkthrough — systemd unit, Caddy and nginx examples, firewall, and the
laptop-side bootstrap.
The protocol surface is small:
GET /health
GET /levcs/v1/instance/info
GET /levcs/v1/instance/peers
GET /levcs/v1/repos/<repo_id>/info
GET /levcs/v1/repos/<repo_id>/refs
GET /levcs/v1/repos/<repo_id>/objects/<hash>
GET /levcs/v1/repos/<repo_id>/pack?have=...&want=...
POST /levcs/v1/repos/<repo_id>/init
POST /levcs/v1/repos/<repo_id>/push
That's it. No admin endpoints, no users-and-passwords table, no web UI
to firewall. POSTs require a signed LeVCS-Signature header
(Ed25519-over-canonical-request, with timestamp and nonce for replay
protection). GETs are public unless the genesis authority's policy
turned that off.
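The timestamp-and-nonce part of the signed header can be sketched without any crypto. The guard below assumes the Ed25519 signature has already been verified, and that replay protection means "timestamp within a skew window, nonce not seen before for this key". The window size, the nonce encoding, and every name here are assumptions; the spec defines the real rules.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    StaleTimestamp,
    ReplayedNonce,
}

/// Tracks nonces already seen per signing key.
/// This sketch never evicts old nonces; a real server would expire entries
/// once their timestamps fall outside the skew window.
pub struct ReplayGuard {
    max_skew_micros: u64,
    seen: HashSet<([u8; 32], String)>,
}

impl ReplayGuard {
    pub fn new(max_skew_micros: u64) -> Self {
        ReplayGuard { max_skew_micros, seen: HashSet::new() }
    }

    /// Call only after the request signature has been verified.
    pub fn check(
        &mut self,
        key: [u8; 32],
        timestamp_micros: u64,
        nonce: &str,
        now_micros: u64,
    ) -> Result<(), Rejection> {
        if now_micros.abs_diff(timestamp_micros) > self.max_skew_micros {
            return Err(Rejection::StaleTimestamp);
        }
        // `insert` returns false when the (key, nonce) pair was already present.
        if !self.seen.insert((key, nonce.to_string())) {
            return Err(Rejection::ReplayedNonce);
        }
        Ok(())
    }
}
```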
Storage is a directory tree. Per-object atomic writes via
temp-then-rename, per-repo serializing mutex on push. A consistent
backup is just a snapshot of /var/lib/levcs.
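The temp-then-rename pattern looks roughly like this with the standard library. The flat one-file-per-object layout and the temp-file naming are assumptions; the instance's actual directory structure may differ.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write an object so that readers see either nothing or the complete file.
/// `hex_id` is the object's hex-encoded ID, used here directly as the file name.
pub fn write_object_atomic(root: &Path, hex_id: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    let final_path = root.join(hex_id);
    // Content-addressed: an existing file with this ID already holds these bytes.
    if final_path.exists() {
        return Ok(final_path);
    }
    let tmp_path = root.join(format!(".tmp-{}-{}", std::process::id(), hex_id));
    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(bytes)?;
        // Flush file contents to disk before the rename makes them visible.
        tmp.sync_all()?;
    }
    // A rename within one filesystem replaces the target atomically on POSIX systems.
    fs::rename(&tmp_path, &final_path)?;
    Ok(final_path)
}
```

Because a partially written file only ever exists under its temporary name, a snapshot taken mid-write may contain stray temp files but never a truncated object under a real ID.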
6. What LeVCS isn't (yet)
The honest list of things you'd want for a full project home that LeVCS does not provide:
- Code review. No PR object, no review threads, no comments. The workflow spec coming next defines these.
- Issue tracking. Same — protocol substrate doesn't cover it.
- CI integration. No webhooks. CI systems would need to poll /refs on a cadence, which is fine but not turnkey.
- Web UI. No branch browser, no diff view, no blame. These can be built atop the existing GET endpoints; nothing in the protocol is hostile to a UI, but none ship.
- Search. No git grep equivalent on the server side. Local-only.
- Submodules / monorepo tooling. No analog yet.
If your use case requires any of the above today, run LeVCS in parallel
with your existing platform. Forgejo, GitHub, or Gitea continues to
host the workflow; the LeVCS instance acts as a dogfood replica that
gets the same commits via a push-both wrapper. When the workflow
surface lands, the migration story flips.
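For the CI polling case mentioned in the list above, the core of a poller is a diff between two snapshots of the refs map. The sketch below assumes the /refs response has already been fetched and parsed into a name-to-ID map; the response format itself is not described in this report, so that parsing step is deliberately omitted and the types are hypothetical.

```rust
use std::collections::BTreeMap;

/// Ref name -> hex object ID, as a poller might hold it after parsing.
pub type Refs = BTreeMap<String, String>;

#[derive(Debug, PartialEq, Eq)]
pub enum RefChange {
    Created(String),
    Updated(String),
    Deleted(String),
}

/// Compare the previous poll's refs with the current ones.
/// Created and updated refs come first in name order, then deletions.
pub fn diff_refs(previous: &Refs, current: &Refs) -> Vec<RefChange> {
    let mut changes = Vec::new();
    for (name, id) in current {
        match previous.get(name) {
            None => changes.push(RefChange::Created(name.clone())),
            Some(old_id) if old_id != id => changes.push(RefChange::Updated(name.clone())),
            Some(_) => {} // unchanged
        }
    }
    for name in previous.keys() {
        if !current.contains_key(name) {
            changes.push(RefChange::Deleted(name.clone()));
        }
    }
    changes
}
```

A CI wrapper would trigger a build for each `Created` or `Updated` branch ref and store `current` as the baseline for the next poll.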
7. What is true today (and how we know)
The repo at v0.1.0 has 194 passing tests covering:
- The full object model and protocol surface (§2-§7 of the spec).
- A 14-scenario merge conformance corpus, eight of which are git false-conflict cases the cascade resolves cleanly.
- Property tests on the pack codec and object parsers (fuzz + structured proptest round-trip).
- An end-to-end "dogfood" integration test that stands up three instances (source-of-truth, peer, mirror), pushes a chain of commits plus a release, replicates via mirror sync, migrates to the peer, and asserts byte-for-byte object equality across all three.
A baseline microbenchmark suite is checked in (scripts/bench.sh). On
a Ryzen 7 laptop:
- Pack decode of a 10 × 1 MiB pack: ~2.3 ms (4.3 GiB/s).
- BLAKE3+serialize on 1 MiB blobs: ~190 µs (5.1 GiB/s).
- Textual three-way merge of a 100 KiB document: ~4.6 ms (~80 MiB/s).
- Encode is the bottleneck — zstd level 3 at ~380 MiB/s on incompressible data.
Numbers are reproducible via scripts/bench.sh --quick.
8. Where the project goes next
The immediate roadmap, in order:
- Workflow spec — the missing layer above. PR/review object, discussion threads, CI hook conventions, web UI design. This is the document the rest of v1 builds toward.
- Reference workflow tools — a minimal web UI that reads the federation API and lets you browse, review, and merge. Probably a separate repo and process, not bundled into the instance.
- CI conventions — a published webhook protocol so existing CI systems can integrate without polling.
- Plugin handler examples — a few real wasm handlers (e.g. protobuf, SQL migrations) to validate the plugin protocol.
- Git import — a one-way import path so existing projects can adopt LeVCS without hand-replaying history.
If you're reading this because you might write that workflow spec: the substrate guarantees you have are (a) signed objects with a verifiable authority chain, (b) per-file merge records that travel with each commit, (c) a content-addressed object store that doesn't care what kind of content it stores, and (d) federation as a normal operating mode rather than a special case. Workflow surface is free to use these as building blocks — a "PR" is just an object kind we don't have yet, an "issue" is another, and the storage modes already define how a CI system would replicate the metadata it needs without pulling source.
9. Trying it
Build:
git clone <this repo>
cargo build --release
sudo install -m 0755 target/release/levcs target/release/levcs-instance /usr/local/bin/
Local single-machine tour:
levcs key generate --label me
levcs init --key me /tmp/demo
cd /tmp/demo
echo "hello" > a.txt
levcs track --all
levcs commit -m "first"
levcs log
Self-host: see deploy/README.md.
Read the spec: spec/levcs-spec.pdf (kept private until the workflow
spec lands; ask the maintainer for a copy).
Read the code: every crate is small and documented. crates/levcs-core
is the object model, crates/levcs-merge is the cascade,
crates/levcs-instance is the server, crates/levcs-cli is the
user-facing tool.
Comments and corrections welcome to the maintainer. The next document in this series is the workflow spec.