This repository contains the source files for levineuwirth.org in their entirety and is automatically updated whenever the website is rebuilt. https://levineuwirth.org
Go to file
Levi Neuwirth 77e31efdae Add link archive system: snapshots, backlinks, link-rot
Preserve external works the site cites against link rot, host them at
permanent /archive/<slug>/ URLs in site chrome, and treat them as
first-class citizens of the backlinks and similar-pages indexes.
Curated, not crawled: the author adds one line to archive/manifest.yaml
and the build fetches, hashes, snapshots, and indexes the work.

* archive/manifest.yaml + tools/archive.py (fetch / refresh / wayback /
  check / gc) — PDFs downloaded directly, HTML pages snapshotted with a
  vendored monolith (tools/bin/monolith @ 2.10.1) into a single
  self-contained file with the archive CSP and a noarchive robots meta
  injected. Per-entry PROVENANCE.json committed; gitignored .txt
  sidecars regenerated from the artifact's SHA-256.
* build/Archive.hs + build/ArchiveIndex.hs + build/Filters/Archive.hs
  — Hakyll rules for /archive/ and /archive/<slug>/, a body Pandoc
  filter that appends an archive affordance to live citations and
  flips dead ones to the local copy on archive.py check's asymmetric
  hysteresis (rotted needs 3 fails over >= 14 days; one ok recovers).
* build/Backlinks.hs — keeps archived external URLs through pass 1 and
  canonicalises them to /archive/<slug>/ in pass 2, producing a
  "Referenced by" section grouped by the fragment each citation
  targets. build/Stats.hs gains a "Link archive" telemetry block on
  /build/ (count, total size, median age, by-status / by-quality /
  by-visibility, orphans).
* Integrity: archive.py fetch and build/Archive.hs (via sha256sum)
  both re-hash every committed artifact, so a tampered file halts the
  build even with cabal invoked directly or no .venv present. refresh
  refuses to replace an uncommitted prior snapshot and rolls back
  atomically on any exit path. removed.yaml is honoured by fetch,
  wayback, and check using canonical-form (tracking-stripped,
  arXiv-canonicalised) comparison.
* visibility: private keeps an entry in-repo but undeployed.
  nginx/archive.conf emits X-Robots-Tag: noindex, noarchive for raw
  artifacts that cannot carry meta directives.

The full design, phase plan (1-5), and three refinement passes live
in ARCHIVE.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 10:06:33 -04:00
archive Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
build Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
content auto: 2026-05-16T23:42:26Z [skip ci] 2026-05-16 19:42:26 -04:00
data Internal audit 2026-05-07 17:20:27 -04:00
nginx Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
static Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
templates Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
tools Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
.env.example States/Context/Embeddings fixes 2026-04-26 11:22:57 -04:00
.gitignore Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
.python-version GPG signing, embedding pipeline, visualization filter, search timing, sig popups 2026-03-20 20:14:49 -04:00
ARCHIVE.md Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
AUDIT.md Add AUDIT.md 2026-05-07 15:07:48 -04:00
HOMEPAGE.md affiliation, cabal helper script 2026-03-26 08:14:50 -04:00
LICENSE Add MIT License to the project 2026-03-15 19:02:33 +00:00
MARKS.md Marks I 2026-05-07 23:51:14 -04:00
Makefile Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
PHOTOGRAPHY.md Spec dilemma 2026-05-01 21:22:01 -04:00
README.md nginx: ship security baseline, reference vhost, and tighter cache 2026-05-07 15:08:03 -04:00
WRITING.md Marks I 2026-05-07 23:51:14 -04:00
cabal.project initial deploy! whoop 2026-03-17 21:56:14 -04:00
cabal.project.freeze Internal audit 2026-05-07 17:20:27 -04:00
levineuwirth.cabal Add link archive system: snapshots, backlinks, link-rot 2026-05-23 10:06:33 -04:00
pyproject.toml Bump requires-python floor to 3.14 2026-05-07 15:08:20 -04:00
uv.lock Internal audit 2026-05-07 17:20:27 -04:00

README.md

levineuwirth.org

Personal site of Levi Neuwirth — essays, blog posts, poetry, fiction, and music. Built with Hakyll and Pandoc, with a custom build system in build/ and a Haskell + JS + Python toolchain.

Quickstart

make build              # one-shot production build into _site/
make dev                # dev build (drafts visible) + local server on :8000
make watch              # cabal-watch rebuild (drafts visible)
make clean              # cabal run site -- clean
make deploy             # clean → build → sign → push → rsync to VPS

make build always runs make clean implicitly when invoked from make deploy. For day-to-day work, prefer make dev (which serves the site on http://localhost:8000) or make watch (rebuilds on save without a server).

Run make build any time you add or replace binary assets (JPEG/PNG figures, PDFs, music assets). make dev and make watch skip the convert-images.sh / pdf-thumbs preprocessing steps, so a fresh JPEG will have no .webp companion and a fresh PDF will have no thumbnail until a full make build regenerates them. Once the companions exist they survive subsequent make dev runs.

Optional features

  • Similar-links and embeddings. tools/embed.py precomputes page-level embeddings for the "Related" block. To enable:

    uv sync                 # creates .venv with sentence-transformers, faiss-cpu
    

    The build silently skips embedding when .venv is absent.

  • Client-side semantic search. Downloads a quantized ONNX model used by static/js/semantic-search.js (run once; files are gitignored):

    make download-model
    
  • Image conversion. make build calls tools/convert-images.sh to produce .webp companions next to every JPEG/PNG. Requires cwebp (libwebp on Arch, webp on Debian/Ubuntu).

  • PDF thumbnails. make pdf-thumbs generates first-page thumbnails for PDFs in static/papers/ using pdftoppm (poppler on Arch, poppler-utils on Debian/Ubuntu). Skipped silently when missing.

Configuration

.env (gitignored, copy from .env.example) holds the GitHub PAT and the VPS rsync target consumed by make deploy. Never commit it.

Repository layout

  • build/ — Haskell build system (Hakyll rules, Pandoc filters, contexts). See build/Filters/ for the Pandoc AST transforms (sidenotes, wikilinks, transclusion, score embedding, viz, …).
  • content/ — authored Markdown (essays, blog, poetry, fiction, music).
  • templates/ — Hakyll/Pandoc HTML templates.
  • static/ — CSS, JS, fonts, images, vendored PDF.js.
  • tools/ — Python tooling (embeddings, importers) and shell scripts.
  • data/ — generated and source data (commonplace.yaml, annotations.json, bibliographies, similar-links.json).
  • nginx/ — vhost snippets shipped to the VPS (security-headers.conf, static-assets.conf, popup-proxy.conf). The live vhost on the VPS is the source of truth; see nginx/vhost.conf.example for the canonical structure and the include order these snippets expect.

Architecture pointers

  • build/Site.hs is the Hakyll rules entry point.
  • build/Patterns.hs defines canonical content patterns shared by Backlinks, Authors, Tags, and Site.
  • build/Compilers.hs wires the Pandoc filter chain into Hakyll.
  • build/Filters/Images.hs does WebP <picture> wrapping; requires the .webp companions produced by tools/convert-images.sh.

License

See LICENSE.