title: Repository audit
date: 2026-05-07

Repository audit — levineuwirth.org

Comprehensive audit of the repo on main at commit 670d477 (working tree modified: data/now.yaml, static/cv.pdf, static/resume.pdf; untracked Fermata_2.pdf).

Severity legend: HIGH (likely to break a build, cause data loss, or expose a security weakness) — MED (latent bug, brittleness, or documentation drift) — LOW (minor robustness gap or fragile assumption) — NIT (style, polish, or paranoia).

Numbers are file:line. "Unverified" means I noticed the issue but did not reproduce its consequence; the line still appears load-bearing enough to flag.


1. Build & dependency chain

1.1 cabal build from scratch is unsolvable with the current freeze — HIGH

Running cabal build resolves the dependency tree freshly because no fresh .ghc.environment link exists for the current GHC. The freeze pins aeson ==2.2.1.0, but warp (pulled in by hakyll +previewserver) needs hashable ==1.4.7.0/installed, while aeson 2.2.1.0 needs hashable >=1.4.2.0 && <1.4.5.0. Result:

[__8] fail (backjumping, conflict set: aeson, levineuwirth, warp)
After searching the rest of the dependency tree exhaustively, these were
the goals I've had most trouble fulfilling: aeson, warp, hakyll, http2,
async, network-control, unliftio, levineuwirth, hakyll:previewserver

Day-to-day this is masked because dist-newstyle/ has cached binaries from an earlier successful resolve. A fresh clone, a cabal clean, or a GHC upgrade will make make build fail. (cabal.project.freeze:9 pins aeson; levineuwirth.cabal:60 allows >= 2.1 && < 2.3.)

Fix: regenerate the freeze (tools/refreeze.sh) against the current hackage index. If tools/refreeze.sh is what produced the broken freeze, a manual cabal freeze --constraint='aeson >= 2.2.2' is needed.

1.2 levineuwirth.cabal upper bounds are tight — MED

  • hakyll >= 4.16 && < 4.17 (levineuwirth.cabal:52) — pins to a single minor line. 4.17.x is already on Hackage; the freeze is one rebase away from forcing a bound bump.
  • pandoc >= 3.1 && < 3.7 (levineuwirth.cabal:53) — pandoc historically ships breaking changes on minor bumps, so the caution is fair, but 3.7 exists.
  • aeson >= 2.1 && < 2.3 (levineuwirth.cabal:60) — see 1.1; this bound combined with the freeze conflict is what makes the build unsolvable.

1.3 Python version mismatch — HIGH

.python-version says 3.14. pyproject.toml:5 says requires-python = ">=3.12". uv.lock:3 agrees with pyproject. Anyone who clones with pyenv/asdf will install Python 3.14. Anyone whose system ships 3.12/3.13 only will be told the project is fine, then hit .python-version later. Either bump requires-python to >=3.14 or downgrade .python-version to a release that's actually a baseline.
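Whichever side moves, a small check can keep the two from drifting apart again. The following is a minimal sketch, not something the repo contains: the function names are hypothetical, and it assumes requires-python is a single ">=X.Y" specifier, as it is today.

```python
import re


def pinned_minor(python_version_text: str) -> tuple[int, int]:
    """Read '3.14' or '3.14.1' from .python-version as (3, 14)."""
    major, minor = python_version_text.strip().split(".")[:2]
    return int(major), int(minor)


def floor_minor(requires_python: str) -> tuple[int, int]:
    """Read the floor of a single '>=X.Y' specifier; anything else is rejected."""
    match = re.fullmatch(r"\s*>=\s*(\d+)\.(\d+)(?:\.\d+)?\s*", requires_python)
    if match is None:
        raise ValueError(f"unsupported requires-python: {requires_python!r}")
    return int(match.group(1)), int(match.group(2))


def pin_matches_floor(python_version_text: str, requires_python: str) -> bool:
    """True when the pinned interpreter is exactly the declared minimum."""
    return pinned_minor(python_version_text) == floor_minor(requires_python)
```

With the current values, pin_matches_floor("3.14", ">=3.12") is False. Reading the real inputs (for example with tomllib for pyproject.toml) is left out of the sketch.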

1.4 No tools/model-checksums.sha256 despite supply-chain hardening in download-model.sh — HIGH

tools/download-model.sh:75-78 reads the checksum file when present and falls through with a printed note when it's missing. The file is absent from the tree. So today: model weights are pulled from Hugging Face unverified. If the upstream is compromised or MITM'd, the embedding + client-side semantic search ship trojaned weights. The fix path is already documented in the script comments — generate and commit the checksum file.
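The check that is skipped today amounts to the following, sketched in Python rather than the script's own shell. It assumes the checksum file uses the two-column format that sha256sum writes (hex digest, then path). The function names are hypothetical, and unlike the current fall-through the sketch treats a missing file as a failure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights are never read in whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines: '<hex digest>  <path>'."""
    expected = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading '*' on the path.
        expected[name.lstrip("*").strip()] = digest.lower()
    return expected


def verify(root: Path, checksums: dict[str, str]) -> list[str]:
    """Return the names that are missing or whose digest does not match."""
    failures = []
    for name, digest in checksums.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != digest:
            failures.append(name)
    return failures
```

An empty list from verify means every pinned file is present and matches; anything else should stop the download step.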

1.5 Cabal modules vs filesystem — verified consistent

Every .hs under build/ is listed in levineuwirth.cabal's other-modules. No orphan files. No phantom modules.


2. Makefile

2.1 rsync line does not quote variables — MED (security-shaped)

Makefile:147:

rsync -avz --delete _site/ $(VPS_USER)@$(VPS_HOST):$(VPS_PATH)/

If VPS_PATH ever contains a space or a shell metacharacter, the expansion splits and rsync is handed extra arguments. The Makefile does guard VPS_PATH against /, /srv, etc., but does not guard against whitespace or against ; / &&. Most variables in this Makefile are already quoted (@test -s _site/index.html), so this is the odd one out. Quote with "$(VPS_USER)@$(VPS_HOST):$(VPS_PATH)/".

2.2 > IGNORE.txt line — NIT

Makefile:55. The recipe truncates IGNORE.txt at the repo root. It is gitignored. The purpose is undocumented in this Makefile (its intent seems to be "tell whatever sync tool watches the workspace to ignore the build output"). Either replace with : > IGNORE.txt (POSIX no-op) and a one-line comment explaining its consumer, or drop it.

2.3 notify-send … || true swallows errors — NIT

Makefile:141. Fine for a desktop notification, but the || true silently masks notify-send failures. Acceptable.

2.4 Auto-snapshot recipe — NIT (worth re-reading)

Makefile:12-26 runs git add content/ and creates an automatic auto: <ts> [skip ci] commit before every build. The .gitignore excludes credential-shaped patterns under content/, so accidental secrets won't be staged. But:

  • The commit happens regardless of the build outcome. A build that starts and crashes mid-way still leaves a snapshot commit. The comment says this is intentional. It does mean the recent commit history is full of auto: commits even for failed builds.
  • The recipe reads .env via -include .env and exports VPS_USER VPS_HOST VPS_PATH GITHUB_REPO. The comment claims this prevents future GITHUB_TOKEN from leaking. That's correct only if GITHUB_TOKEN is never added to the explicit export list. Worth a comment in .env.example reminding the future author.

2.5 Nested $(MAKE) and parallelism — LOW

Makefile:29 (@$(MAKE) -s pdf-thumbs) and :126 (@$(MAKE) -C yaml-source all) — fine in serial mode, but make -j build will parallelize sub-makes against the parent's job server only if they inherit MAKEFLAGS. The -s flag is fine, but if parallelism is ever desired, audit this.


3. Haskell build code (build/)

3.1 unsafePerformIO with module-global IORef — MED

build/Filters/SourceRefs.hs:155:

{-# NOINLINE existsCacheRef #-}
existsCacheRef :: IORef (Map.Map Text Bool)
existsCacheRef = unsafePerformIO (newIORef Map.empty)

Standard "global mutable cache" pattern. NOINLINE is correct. cabal run site -- watch and cabal run site -- build are single-threaded today (Hakyll's compile loop is sequential), but the cabal file enables -threaded, and the cache is reachable from any compiler thread. Cache entries can also become stale between watches if a referenced source file is deleted: the map still stores True for the path (a lookup returns Just True), but doesFileExist would now return False. Two practical consequences:

  1. If a file is moved, watch may keep treating wikilinks/source-refs to the old path as live until the build server restarts.
  2. If existsCacheRef is ever read concurrently by two threads, the atomicModifyIORef' is safe but the underlying check race could let two threads call doesFileExist on the same path. Harmless.

Acceptable as-is; document the staleness caveat.

3.2 Lazy readFile in IO — MED

  • build/Stats.hs:857-860: readFile "data/last-build-seconds.txt" is lazy, wrapped in a catch that returns "\x2014" on any IOException. The em-dash fallback hides "file missing", "permission denied", and "encoding error" alike. Worse, lazy IO means the handle may be open at the time the catch fires. Use Data.Text.IO.readFile or withFile+hGetContents'.
  • build/BibExtras.hs:66: parseBibExtras path = … <$> readFile path. Same concern. Failure surfaces only when the result is forced.

Fix: standardize on strict Data.Text.IO.readFile (already used in build/Stability.hs:56,144 and build/Now.hs).

3.3 Defensive but technically partial pattern matches — LOW

These are "this case can't happen because of the guard" patterns. They all carry a comment, so they're not bugs, but -Wall may warn (and they reduce confidence under refactor). Cite-and-fix is straightforward.

  • build/Stats.hs:169-172: median falls through to 0 on unreachable empty after a length-based guard.
  • build/Stability.hs:109-113: stabilityFromDates falls through.
  • build/Catalog.hs:233-235: renderGroup [] when groupBy cannot produce empty groups.
  • build/Tags.hs:181: init segs after a length > 1 guard.
  • build/Stability.hs:297, 311, 324: last (newest:more).

Replace each with a structural pattern match ((x:xs)) or Data.List.NonEmpty, or suppress the warning with a pragma.

3.4 Magic offsets / hardcoded prefixes — LOW

  • build/Site.hs:388, 392: replaceExtension (drop 8 fp) "html". Here drop 8 is "strip content/". T.stripPrefix reads better and fails closed.
  • build/Filters/Wikilinks.hs:43, 77-78: assumes destination URLs end with .html. Documented in code; brittle if routing changes.

3.5 fail for parse errors aborts the entire build — LOW

  • build/Commonplace.hs:144 and build/Now.hs:258: a malformed commonplace.yaml or now.yaml aborts the build. The data is hand-edited and small, so this is fine; a friendly error message would be nicer.
  • build/Backlinks.hs:359: fail "backlinks: could not parse data/backlinks.json" aborts every page that uses the backlinks context. The file is generated at build time, so corruption is unlikely, but consider degrading to "no backlinks" instead.

3.6 Silent-drop parsers — LOW

  • build/BibExtras.hs:95: malformed .bib entries become [] with no warning. The author edits these by hand; a stderr note for dropped entries would catch typos.
  • build/Contexts.hs:198-205: malformed history entries are silently dropped. Same trade-off.
  • build/Stats.hs:464: listDirectory dir catch returns [] on any IOException. Acceptable for stats.

3.7 trim does double-reverse — NIT

build/Utils.hs:61. dropWhileEnd (Data.List) avoids the second reverse. Cosmetic.


4. Tools (tools/)

4.1 tools/extract-exif.py:292 uses Pillow's deprecated _getexif() — MED

exif = img._getexif() or {}

Pillow has marked _getexif private since 9.0. The public API is img.getexif(). The bound in pyproject.toml allows up to Pillow 12, so a future uv sync could break this silently. One-line fix.

4.2 embed.py and import-poetry.py are not executable — LOW

Both have #!/usr/bin/env python3 shebangs but bits are 0644, while their siblings (extract-*.py) are 0755. The Makefile invokes them via uv run python tools/embed.py, so this is cosmetic — unless a future contributor tries ./tools/embed.py. chmod +x both.

4.3 tools/import-photo.sh does not check magick exit codes — MED

  • Lines ~115-122: the resize/-strip magick call has no || exit.
  • Line ~144: magick mogrify -strip "$TARGET" likewise. If mogrify fails, EXIF survives, but the script proceeds to write frontmatter asserting the photo was stripped.

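The shell prelude already runs set -euo pipefail, so a failing magick aborts the script in most positions, and pipefail covers a failure inside magick … | …. But set -e is suppressed in some contexts (conditions, && / || lists), so the guarantee is implicit and easy to lose in a refactor. A direct magick … "$TARGET" || exit 1 is clearer.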

4.4 tools/import-photo.sh does not validate $SLUG — LOW

The slug is taken from CLI input and used as content/photography/$SLUG. A slug containing ../ traverses out of the photography tree. The Hakyll build would refuse to ingest it later, but the import has already written files. Add a [[ "$SLUG" =~ ^[a-z0-9-]+$ ]] || exit 1 near the argument parse.
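To make the rule concrete, here is the same allow-list expressed in Python, plus a containment check as a second layer. This is a sketch for illustration only: the real fix belongs in the shell script, and is_valid_slug and target_dir are hypothetical names.

```python
import re
from pathlib import Path

# Same character class as the shell test; fullmatch anchors both ends.
SLUG_RE = re.compile(r"[a-z0-9-]+")


def is_valid_slug(slug: str) -> bool:
    """Accept only lowercase letters, digits, and hyphens."""
    return SLUG_RE.fullmatch(slug) is not None


def target_dir(root: Path, slug: str) -> Path:
    """Resolve root/slug, refusing anything that would land outside root."""
    if not is_valid_slug(slug):
        raise ValueError(f"invalid slug: {slug!r}")
    target = (root / slug).resolve()
    if root.resolve() not in target.parents:
        raise ValueError(f"slug escapes {root}: {slug!r}")
    return target
```

A slug such as "../drafts" fails the first check because "." and "/" are outside the character class, so no path is ever built from it.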

4.5 subset-fonts.sh hardcodes Arch font paths — LOW

SPECTRAL=/usr/share/fonts/ttf-spectral, FIRA=/usr/share/fonts/TTF, etc. macOS / Debian put fonts elsewhere. Doesn't break the site (the script is rarely run), but the README does not mention this constraint.

4.6 download-pdfjs.sh checksum scope is narrow — LOW

tools/pdfjs-checksums.sha256 pins only the archive. After extraction, the unpacked tree is trusted blindly. Compare to tools/leaflet-checksums.sha256, which pins individual extracted files. The archive pin is sufficient against tampered downloads but offers nothing against a corrupted unzip on disk.

4.7 add-popup-source.sh masks curl failures — LOW

Lines ~67, ~98: curl -sSI … 2>&1 || true followed by piping into grep. A network failure produces an empty $HEADERS, and the downstream "CORS allowed?" detection silently reports OK. The script is interactive, so a user notices, but a stricter if curl … ; then guard would be better.

4.8 embed.py model staleness window — LOW

tools/embed.py:39 hardcodes MODEL_NAME = "all-MiniLM-L6-v2" and DIM = 384. The Hugging Face cache is unpinned, so a model bump would silently change embedding semantics. The script regenerates everything, so the immediate breakage would be benign, but commits referencing the similar-links file would then drift. Pin to a model revision SHA.

4.9 embed.py needs_update race — LOW

tools/embed.py:79-84 calls .stat().st_mtime while iterating SITE_DIR.rglob("*.html"). A file deleted mid-walk raises FileNotFoundError. In practice the build runs solo, so this never triggers; mention it.
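If the race is ever worth closing, the guard is small. The sketch below is a stand-in, not the body of needs_update (which this audit does not reproduce): newest_mtime is a hypothetical helper that collects the newest modification time and skips files that vanish between the directory walk and the stat call.

```python
from pathlib import Path


def newest_mtime(site_dir: Path, pattern: str = "*.html") -> float:
    """Newest mtime under site_dir; files deleted mid-walk are skipped."""
    newest = 0.0
    for path in site_dir.rglob(pattern):
        try:
            newest = max(newest, path.stat().st_mtime)
        except FileNotFoundError:
            # The file existed when listed but was gone by the time of stat().
            continue
    return newest
```

This guards the stat call only; a directory removed while rglob is still descending into it is a separate case.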

4.10 extract-*.py swallow exceptions without traceback — LOW

extract-dimensions.py:101, extract-exif.py:424, extract-palette.py:105: each prints f"…: {e}" and continues. When a file is corrupt, the operator sees the exception message but neither its type nor a stack trace. Adding traceback.format_exc() to the stderr line costs nothing.
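A minimal sketch of the pattern follows, assuming a per-file loop of the same general shape as the extract scripts. process_all and its extract callback are hypothetical names, not the scripts' actual structure.

```python
import sys
import traceback


def process_all(paths, extract, log=sys.stderr):
    """Run extract on each path; log failures with a full traceback and continue."""
    failed = []
    for path in paths:
        try:
            extract(path)
        except Exception as exc:
            # One summary line, then the stack trace for the same failure.
            print(f"{path}: {type(exc).__name__}: {exc}", file=log)
            print(traceback.format_exc(), file=log, end="")
            failed.append(path)
    return failed
```

Returning the failed paths also lets the caller decide whether a non-empty list should fail the build.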

4.11 Other shell scripts

  • All shell scripts in tools/ already use set -euo pipefail.
  • convert-images.sh, compress-assets.sh, download-leaflet.sh, sign-site.sh, preset-signing-passphrase.sh are clean.
  • compress-assets.sh:21 — no validation that MIN_SIZE is numeric. A misconfigured env var fails with a cryptic arithmetic error. NIT.

4.12 Stray TODOs in tooling — NIT

  • tools/add-popup-source.sh:12,128,131,134,137,156,194 — by design; the script is a scaffolder.
  • tools/import-photo.sh:185 — emits caption: TODO — short caption… into the generated index.md. Authors who forget to edit will ship the literal TODO. A make-time check (! grep -r "TODO " content/photography) would catch it.
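The placeholder check can also be written as a small script that reports where the placeholder sits, which a bare grep exit code does not. This is a sketch under assumptions: find_todo_captions is a hypothetical helper, and it only looks at index.md files for lines that start with the caption: TODO prefix.

```python
from pathlib import Path


def find_todo_captions(root: Path, marker: str = "caption: TODO") -> list[tuple[Path, int]]:
    """Return (file, line number) for every index.md line that starts with marker."""
    hits = []
    for md in sorted(root.rglob("index.md")):
        lines = md.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if line.strip().startswith(marker):
                hits.append((md, lineno))
    return hits
```

A build step could call it on content/photography and exit non-zero when the list is not empty.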

5. Content & frontmatter (content/)

5.1 Every date: in frontmatter is unquoted — MED

WRITING.md:103 shows the canonical form as date: "2026-03-01". Across all of content/ (sample: 40+ files), every date: line is unquoted. Examples:

  • content/index.md, content/about.md, content/colophon.md, content/library.md, content/search.md, content/current.md, content/links.md, content/gpg.md, content/commonplace.md
  • All essays under content/essays/ and drafts under content/drafts/
  • All tag-meta files

In YAML 1.1, an unquoted ISO 8601 value can resolve to a timestamp rather than a string. Hakyll's dateField has historically read the value back as a string, but that behavior depends on the YAML decoder underneath and can shift as it evolves. Either the documentation is wrong (and dates are deliberately stored as YAML dates) or the corpus is. Reconcile by either:

  1. Quoting all dates project-wide (sed across content/).
  2. Updating WRITING.md:103 to show the unquoted form.
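If option 1 is chosen, a blind sed risks touching date: lines in page bodies. The sketch below restricts the rewrite to the leading frontmatter block. It assumes frontmatter is delimited by --- lines and that dates are bare YYYY-MM-DD values; quote_frontmatter_dates is a hypothetical name.

```python
import re

# A bare ISO date on its own 'date:' line; already-quoted values do not match.
DATE_LINE = re.compile(r"^(date:[ \t]*)(\d{4}-\d{2}-\d{2})[ \t]*$", re.MULTILINE)


def quote_frontmatter_dates(text: str) -> str:
    """Quote bare ISO dates inside the leading '---' frontmatter block only."""
    if not text.startswith("---\n"):
        return text
    end = text.find("\n---", 4)
    if end == -1:
        return text
    front, body = text[:end], text[end:]
    return DATE_LINE.sub(r'\1"\2"', front) + body
```

The function is idempotent, so running it over content/ twice leaves already-quoted files unchanged.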

5.2 content/tag-meta/*.md lack title: — MED (likely intentional, undocumented)

Nine files under content/tag-meta/ have only tooltip: in their frontmatter, no title:. WRITING.md documents title: as required on every authored page. Either:

  • The Hakyll rules for tag-meta consume a different schema (likely — the title comes from the tag itself), in which case WRITING.md should mention this exception, or
  • Hakyll is silently inserting empty titles into rendered tag pages.

Files: ai.md, fiction.md, miscellany.md, music.md, nonfiction.md, photography.md, poetry.md, research.md, tech.md.

5.3 Fermata_2.pdf at the repo root — MED

48 KB PDF, untracked, not in .gitignore, not referenced by any template/CSS/script/Markdown. git log shows no history. Likely dropped by accident during writing. Either move it under static/papers/ (with thumbnail) or delete it. While present at the root, the auto-snapshot git add content/ will not pick it up — but any future git add . typo will.

5.4 data/now.yaml shows last-updated: 2026-05-06, today is 2026-05-07 — NIT

Working-tree modification, not yet committed. If the page is meant to read "yesterday", it's fine; if it's meant to read "today", refresh.

5.5 Wikilink targets — verified (sampled)

A spot-grep for [[...]] references against the page slugs found nothing pointing outside the corpus. The audit only verified the high-traffic pages (essays, drafts, photography); a complete walk-through would need a Hakyll-aware checker.

5.6 Image references — verified

All relative image references in essays I sampled (memento-mori, specification-dilemma, beyond-comorbidity-indices, where-does-simd-help-post-quantum-cryptography) resolve to existing files.


6. Static assets (static/)

6.1 static/images/canto31.jpg is 4.0 MB — MED

Single largest static asset. Loads on whichever page references it. A 2400px JPEG should be ≤ 800 KB at quality 85. WebP companion will help modern browsers, but the legacy JPEG still ships. Either re-export at quality 80 / 2400px, or move to content/ so the photography pipeline can manage it.

6.2 No console.log survivors — verified

A grep across static/js/ finds none.

6.3 No orphaned vendored libraries — verified

pdfjs/, leaflet/, models/ are all .gitignore'd and downloaded fresh by the Makefile.

6.4 No http:// references in CSS / templates — verified

Only the SVG/XML namespace declarations in vendored pdfjs/ use http://, which is the correct (non-fetched) form for XML.


7. Templates (templates/)

7.1 No robots.txt and no sitemap.xml are emitted — MED (SEO)

_site/ after a build does not contain either file. build/Site.hs has no rule for them. templates/ has no template for them. For a content-heavy personal site this is meaningful: search engines have no crawl guidance and no canonical URL list. Add a create "robots.txt" and a create "sitemap.xml" rule (Hakyll supports both via makeItem/renderRss-style compilers).

7.2 No <meta name="robots"> — NIT

templates/partials/head.html has og:image, canonical, og:title / og:description. No <meta name="robots" content="…"> and no fallback indexing hint. Together with §7.1, this is "search visibility is unconfigured".

7.3 Tag-balance — verified

A pairing check across templates/*.html for $if$/$endif$ and $for$/$endfor$ blocks (accounting for partial inheritance) reported no mismatches. The earlier flagged occurrences resolve when the relevant partial is included.


8. Data files (data/)

8.1 data/annotations.json is {} — NIT

Empty object. Either populate or document that it's intentionally a schema slot.

8.2 data/now.yaml — see §5.4.

8.3 Generated files (semantic-index.bin, semantic-meta.json, similar-links.json, build-start.txt, last-build-seconds.txt) — verified gitignored.


9. nginx (nginx/)

9.1 No security headers — HIGH (security)

nginx/static-assets.conf and nginx/popup-proxy.conf set none of:

  • server_tokens off;
  • Strict-Transport-Security (HSTS, with preload if HSTS-preload-listed)
  • Content-Security-Policy (or at minimum a CSP report-only)
  • X-Content-Type-Options: nosniff
  • X-Frame-Options: SAMEORIGIN (or frame-ancestors in CSP)
  • Referrer-Policy: strict-origin-when-cross-origin
  • Permissions-Policy (camera/microphone/geolocation deny)

These would normally live in the vhost rather than these include snippets, which is presumably where they belong on the VPS. But the repo has no vhost file checked in, which means the configuration in the repo is incomplete. Either commit a nginx/vhost.conf with the security headers or document explicitly that the vhost is owned outside the repo.

9.2 nginx/static-assets.conf:75-78 — CSS/JS must-revalidate with max-age=86400 — MED

CSS/JS filenames are not fingerprinted (no app.abc123.css). A 1-day must-revalidate means a stylesheet bug ships for up to 24 hours per client. Either drop max-age to 3600 or add a build-time content hash to filenames (and switch to immutable).
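For the content-hash option, the naming step is the simple part. The sketch below derives a name of the app.abc123.css shape from the file's bytes. fingerprinted_name is a hypothetical helper; using SHA-256 and an 8-character prefix are assumptions, not anything the build does today.

```python
import hashlib
from pathlib import Path


def fingerprinted_name(path: Path, digest_len: int = 8) -> str:
    """Return 'stem.<hash prefix>.suffix', where the hash covers the file contents."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_len]
    return f"{path.stem}.{digest}{path.suffix}"
```

The name changes whenever the contents change, which is what makes immutable caching safe. The larger piece of work, not shown here, is rewriting every template reference to point at the fingerprinted name.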

9.3 popup-proxy.conf:28 — public DNS resolver — LOW

resolver 1.1.1.1 8.8.8.8 ipv6=off valid=300s;. Fine on a VPS without local DNS, but if the host runs systemd-resolved, prefer its stub listener at 127.0.0.53. The public resolvers also see every upstream lookup, which leaks "this server proxies to {arxiv, internet-archive, ncbi}" to whichever one answers.

9.4 popup-proxy caching — verified

30-day cache on arXiv/PubMed metadata, 7-day on Internet Archive, proxy_cache_lock on, proxy_cache_use_stale. PubMed has limit_req zone=pubmed burst=3 nodelay;, which matches NCBI etiquette.


10. README and ancillary docs

10.1 README.md references files that do not exist — MED

  • README.md:70-71: "paper/ — LaTeX source for in-progress academic papers." There is no paper/ directory.
  • README.md:71, 82: "spec.md — full architectural notes". There is no spec.md.

yaml-source/ is mentioned and explained as local-only on README.md:118. paper/ and spec.md are not. Either create the files (even as stubs) or remove the references.

10.2 README "Repository layout" section is otherwise current — verified

build/, content/, templates/, static/, tools/, data/, all present and described accurately.

10.3 checklist.md, HOMEPAGE.md, PHOTOGRAPHY.md, WRITING.md not shipped to _site/ — verified

checklist.md is gitignored. HOMEPAGE.md, PHOTOGRAPHY.md, WRITING.md are tracked but not copied into _site/ (no Hakyll rule matches them). Acceptable.


11. .env, .env.example, .gitignore

11.1 .env is mode 0600 and gitignored — verified.

11.2 .env.example documents every variable the Makefile reads — verified.

11.3 .gitignore defense-in-depth credential exclusion — verified (.gitignore:10-27).

11.4 Redundant entries — NIT

.gitignore:81-86 lists README.profile.md, README.arcana.md, README.simd.md, README.icd.md, README.neuropose.md. None exist. These are presumably scratch-pad names; harmless but cluttering.


12. Repo hygiene

12.1 Working-tree dirty on main — NIT

data/now.yaml, static/cv.pdf, static/resume.pdf are modified; Fermata_2.pdf is untracked. The CV/resume PDFs are produced by make pdfs, so the diff is presumably expected. Commit or revert before the next deploy.

12.2 Cache size — NIT

_cache/ 8.5 MB, dist-newstyle/ 22 MB. Reasonable.

12.3 Auto-commit pollution — NIT

git log --oneline -20 shows ~12 of the last 20 commits are auto: <timestamp> [skip ci]. This is by design (see §2.4); just note that git log for narrative review needs --invert-grep --grep='^auto:'.


In rough order of cost-to-impact:

  1. §1.1 — Regenerate cabal.project.freeze so a fresh clone can build. (Single command if tools/refreeze.sh works; otherwise a manual cabal freeze after bumping aeson.)
  2. §9.1 — Commit a vhost (or document explicitly that the vhost lives on the VPS) and add the standard security header set.
  3. §1.4 — Generate and commit tools/model-checksums.sha256.
  4. §1.3 — Reconcile .python-version (3.14) and requires-python (>= 3.12).
  5. §5.1 — Decide canonical date form, then sweep content/ or WRITING.md:103.
  6. §10.1 — Drop paper/ + spec.md references from README.md (or write them).
  7. §7.1 — Emit robots.txt + sitemap.xml from Hakyll.
  8. §5.3 — Move or delete Fermata_2.pdf.
  9. §4.1, §4.3, §4.4, §3.2 — Small Python, shell, and Haskell hardening.
  10. §3.3 — Replace defensive partial matches with structural ones.

Everything else in this document is style/polish or low-risk brittleness.