Preserve external works the site cites against link rot, host them at permanent /archive/<slug>/ URLs in site chrome, and treat them as first-class citizens of the backlinks and similar-pages indexes. Curated, not crawled: the author adds one line to archive/manifest.yaml and the build fetches, hashes, snapshots, and indexes the work.

* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback / check / gc): PDFs downloaded directly, HTML pages snapshotted with a vendored monolith (tools/bin/monolith @ 2.10.1) into a single self-contained file with the archive CSP and a noarchive robots meta injected. Per-entry PROVENANCE.json committed; gitignored .txt sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs: Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc filter that appends an archive affordance to live citations and flips dead ones to the local copy on archive.py check's asymmetric hysteresis (rotted needs 3 fails over >= 14 days; one ok recovers).
* build/Backlinks.hs: keeps archived external URLs through pass 1 and canonicalises them to /archive/<slug>/ in pass 2, producing a "Referenced by" section grouped by the fragment each citation targets. build/Stats.hs gains a "Link archive" telemetry block on /build/ (count, total size, median age, by-status / by-quality / by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum) both re-hash every committed artifact, so a tampered file halts the build even with cabal invoked directly or no .venv present. refresh refuses to replace an uncommitted prior snapshot and rolls back atomically on any exit path. removed.yaml is honoured by fetch, wayback, and check using canonical-form (tracking-stripped, arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed. nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw artifacts that cannot carry meta directives.

The full design, phase plan (1-5), and three refinement passes live in ARCHIVE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
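The asymmetric hysteresis mentioned in the commit message (rotted needs 3 fails over >= 14 days; one ok recovers) can be sketched roughly as follows. This is one possible reading, not the logic in tools/archive.py: it assumes the three failures must be consecutive and that the 14 days are measured between the oldest and newest failure in that run. The names Check and is_rotted are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

FAILS_REQUIRED = 3             # threshold quoted in the commit message
MIN_SPAN = timedelta(days=14)  # threshold quoted in the commit message


@dataclass(frozen=True)
class Check:
    """One recorded link check (hypothetical record shape)."""
    day: date
    ok: bool


def is_rotted(history: list[Check]) -> bool:
    """True when the trailing run of failures meets both thresholds."""
    # Walk from newest to oldest; a single ok ends the run, which is
    # what makes recovery immediate while rot is slow to declare.
    streak: list[Check] = []
    for check in sorted(history, key=lambda c: c.day, reverse=True):
        if check.ok:
            break
        streak.append(check)
    if len(streak) < FAILS_REQUIRED:
        return False
    # streak[0] is the newest failure, streak[-1] the oldest in the run.
    return streak[0].day - streak[-1].day >= MIN_SPAN
```

Under this reading, three failures on days 1, 8, and 15 mark a link as rotted, while the same three failures followed by one successful check do not.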
| Name | | |
|---|---|---|
| archive | ||
| build | ||
| content | ||
| data | ||
| nginx | ||
| static | ||
| templates | ||
| tools | ||
| .env.example | ||
| .gitignore | ||
| .python-version | ||
| ARCHIVE.md | ||
| AUDIT.md | ||
| HOMEPAGE.md | ||
| LICENSE | ||
| MARKS.md | ||
| Makefile | ||
| PHOTOGRAPHY.md | ||
| README.md | ||
| WRITING.md | ||
| cabal.project | ||
| cabal.project.freeze | ||
| levineuwirth.cabal | ||
| pyproject.toml | ||
| uv.lock | ||
README.md
levineuwirth.org
Personal site of Levi Neuwirth — essays, blog posts, poetry, fiction, and music.
Built with Hakyll and Pandoc,
with a custom build system in build/ and a Haskell + JS + Python toolchain.
Quickstart
make build # one-shot production build into _site/
make dev # dev build (drafts visible) + local server on :8000
make watch # cabal-watch rebuild (drafts visible)
make clean # cabal run site -- clean
make deploy # clean → build → sign → push → rsync to VPS
When invoked from make deploy, make build is always preceded by an implicit make clean.
For day-to-day work, prefer make dev (which serves the site on
http://localhost:8000) or make watch (rebuilds on save without a server).
Run make build any time you add or replace binary assets (JPEG/PNG
figures, PDFs, music assets). make dev and make watch skip the
convert-images.sh / pdf-thumbs preprocessing steps, so a fresh JPEG
will have no .webp companion and a fresh PDF will have no thumbnail
until a full make build regenerates them. Once the companions exist
they survive subsequent make dev runs.
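To see which images still lack a companion after a run of make dev or make watch, a small check along these lines may help. It is a sketch under assumptions: it takes the companion of foo.jpg to be foo.webp in the same directory, and the function name and the choice of static/ as the root are hypothetical, not taken from tools/convert-images.sh.

```python
from pathlib import Path

# Suffixes treated as raster sources (assumed; matched case-insensitively).
RASTER_SUFFIXES = {".jpg", ".jpeg", ".png"}


def missing_webp_companions(root: Path) -> list[Path]:
    """Return JPEG/PNG files under root with no sibling .webp file."""
    missing = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in RASTER_SUFFIXES:
            # Assumed naming: same directory, same stem, .webp suffix.
            if not path.with_suffix(".webp").exists():
                missing.append(path)
    return missing


if __name__ == "__main__":
    for image in missing_webp_companions(Path("static")):
        print(f"no .webp companion: {image}")
```

A non-empty listing suggests a full make build is due.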
Optional features
- Similar-links and embeddings. tools/embed.py precomputes page-level embeddings for the "Related" block. To enable:

      uv sync    # creates .venv with sentence-transformers, faiss-cpu

  The build silently skips embedding when .venv is absent.
- Client-side semantic search. Downloads a quantized ONNX model used by static/js/semantic-search.js (run once; files are gitignored):

      make download-model

- Image conversion. make build calls tools/convert-images.sh to produce .webp companions next to every JPEG/PNG. Requires cwebp (libwebp on Arch, webp on Debian/Ubuntu).
- PDF thumbnails. make pdf-thumbs generates first-page thumbnails for PDFs in static/papers/ using pdftoppm (poppler on Arch, poppler-utils on Debian/Ubuntu). Skipped silently when missing.
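Because some of these steps are skipped silently, it can be useful to check up front which optional prerequisites are present. The following is a minimal sketch, not part of the repository: the executable names cwebp and pdftoppm and the .venv directory come from the list above, while the function name and the feature labels are hypothetical.

```python
import shutil
from pathlib import Path

# Executable names are taken from the README; labels are for display only.
OPTIONAL_TOOLS = {
    "image conversion": "cwebp",
    "PDF thumbnails": "pdftoppm",
}


def optional_feature_status(repo_root: Path = Path(".")) -> dict[str, bool]:
    """Map each optional feature to whether its prerequisite is present."""
    status = {
        feature: shutil.which(exe) is not None
        for feature, exe in OPTIONAL_TOOLS.items()
    }
    # Embeddings depend on the .venv directory that uv sync creates.
    status["embeddings"] = (repo_root / ".venv").is_dir()
    return status


if __name__ == "__main__":
    for feature, present in optional_feature_status().items():
        print(f"{feature}: {'available' if present else 'prerequisite missing'}")
```

Run from the repository root, this prints one line per feature.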
Configuration
.env (gitignored, copy from .env.example) holds the GitHub PAT and
the VPS rsync target consumed by make deploy. Never commit it.
Repository layout
- build/: Haskell build system (Hakyll rules, Pandoc filters, contexts). See build/Filters/ for the Pandoc AST transforms (sidenotes, wikilinks, transclusion, score embedding, viz, …).
- content/: authored Markdown (essays, blog, poetry, fiction, music).
- templates/: Hakyll/Pandoc HTML templates.
- static/: CSS, JS, fonts, images, vendored PDF.js.
- tools/: Python tooling (embeddings, importers) and shell scripts.
- data/: generated and source data (commonplace.yaml, annotations.json, bibliographies, similar-links.json).
- nginx/: vhost snippets shipped to the VPS (security-headers.conf, static-assets.conf, popup-proxy.conf). The live vhost on the VPS is the source of truth; see nginx/vhost.conf.example for the canonical structure and the include order these snippets expect.
Architecture pointers
- build/Site.hs is the Hakyll rules entry point.
- build/Patterns.hs defines canonical content patterns shared by Backlinks, Authors, Tags, and Site.
- build/Compilers.hs wires the Pandoc filter chain into Hakyll.
- build/Filters/Images.hs does WebP <picture> wrapping; requires the .webp companions produced by tools/convert-images.sh.
License
See LICENSE.