levineuwirth.org/AUDIT-2026-06-09.md


Repository audit — levineuwirth.org (2026-06-09)

Comprehensive audit of the repo on main at commit 620b974 (working tree modified: branding refresh across static/ + templates/partials/, plus tools/embed.py rework; untracked static/og-image.png, templates/partials/logo-mark.svg, data/embed-cache-pages.npz.tmp.npz).

Severity legend: HIGH (likely to break a build, cause data loss, or expose a security weakness) — MED (latent bug, brittleness, or documentation drift) — LOW (minor robustness gap or fragile assumption) — NIT (style, polish, or paranoia).

Numbers are file:line against the working tree at audit time. Findings marked "verified" were reproduced empirically (solver runs, built _site/ output inspection, live HTTP checks, binary parsing); the rest were confirmed by reading the code.

Prior audit: AUDIT.md (2026-05-07). Follow-up status in §10.


1. Build & dependency chain

1.1 cabal.project.freeze is unsolvable again — next clean build fails — HIGH

cabal build --dry-run fails today (verified): the freeze pins distributive ==0.6.2.1, but the system (pacman) GHC package db has comonad-5.0.10 built against distributive-0.6.3:

rejecting: distributive-0.6.3/installed... (constraint from
cabal.project.freeze requires ==0.6.2.1)
After searching the rest of the dependency tree exhaustively...

The conflict set also names aeson, warp, hakyll, http2, semigroupoids. This is the same failure mode as prior-audit §1.1 — that audit's specific aeson pin was fixed (now 2.2.2.0/hashable 1.4.7.0), but a different package broke the same way after a system update. Recent builds succeed only off the cached dist-newstyle/cache/plan.json; the freeze file has since changed, so the next cabal invocation re-solves and fails. Because make deploy starts with make clean, the next deploy hits this. levineuwirth.cabal's own bounds are compatible with the freeze — the conflict is freeze-vs-installed-db, not freeze-vs-cabal-file.

Fix: tools/refreeze.sh (written for exactly this post-pacman -Syu situation). The underlying fragility — freezing against a mutable system package db — remains; consider documenting the refreeze step as part of any system-upgrade ritual. (In progress at time of writing.)

1.2 Missing data/archive-index.json / archive-state.json crashes the build — HIGH

build/ArchiveIndex.hs:134-146. The module doc (lines 18-22) promises "An absent or malformed file degrades safely: an empty index makes the link consumers no-op; an absent state file makes every entry @Live@." But rawIndex = unsafePerformIO $ do decoded <- A.eitherDecodeFileStrict' indexPath (and identically rawState) never checks doesFileExist, and aeson's eitherDecodeFileStrict' throws an uncaught IOException on a missing file (verified: withBinaryFile: does not exist). Both files are gitignored (.gitignore:84-85), so a fresh clone or a no-.venv build — the exact path build/Archive.hs:20-24 promises to support — throws when the CAF is first forced. Contrast readUrlSet (line 109) in the same file, which guards correctly. Currently latent on this machine only because both generated files happen to exist.

1.3 embed.py trust_remote_code=True executes unpinned third-party code — HIGH

tools/embed.py:329 (line ~341 in the uncommitted version). The new page-model load is SentenceTransformer(PAGE_MODEL_NAME, revision=PAGE_MODEL_REVISION, trust_remote_code=True). The revision arg pins only the nomic-ai/nomic-embed-text-v1.5 repo; the actual modeling code is pulled via auto_map from a different repo — verified in the local HF cache: the executed code lives under transformers_modules/nomic_hyphen_ai/nomic_hyphen_bert_hyphen_2048/..., i.e. nomic-ai/nomic-bert-2048 at its current head, which nothing pins. A compromise of that second repo runs arbitrary Python at build time, in a repo whose every other download path (download-model.sh, pdfjs, leaflet) is sha256-pinned. The comment "Both pins are deliberate" is therefore misleading. Fix: pin via code_revision, or run with HF_HUB_OFFLINE=1 after first fetch, or document the accepted risk.

1.4 Working-tree commit hazard: tracked templates reference untracked files — HIGH (process)

templates/partials/nav.html:5 (tracked, modified) adds $partial("templates/partials/logo-mark.svg")$ and templates/partials/head.html references /og-image.png — both target files are untracked (no git history). Committing the template diff without git add-ing both breaks every page's Hakyll build on a fresh clone ($partial$ aborts compilation) and 404s the og:image. They must land in the same commit. Conversely, data/embed-cache-pages.npz.tmp.npz must not be committed (see §4.1). The partial itself is safe as a Hakyll template (verified: zero $ characters; match "templates/**" compiles it).

1.5 einops dependency: undocumented, unbounded, imported nowhere — LOW

pyproject.toml:27 adds einops>=0.8.2. Nothing under tools/, build/, or static/js/ imports it; its only consumer is nomic's trust_remote_code module (§1.3). Every sibling dependency has an explanatory comment and an upper bound per the file's own stated policy ("Upper bounds are intentionally generous (next major) but always present"); einops has neither. uv lock --check passes (0.8.2 pinned).


2. Haskell build code — core

build/Site.hs:50-60 (homePortals contains ("Fiction","fiction"), ("Poetry","poetry")), templates/partials/nav.html:56,61, templates/library.html:44,58. No rule generates either index: fiction and poetry are not in tagIndexable (build/Patterns.hs:148-151 = essays + blog + photos) and Site.hs has no landing rule. Verified: _site/fiction does not exist; _site/poetry/ has no index.html. nginx has no redirects. Both links 404 in production today.

2.2 Tag/route collisions guarded for photography only — MED

build/Tags.hs:98-99. tagIdentifier maps a tag t to t ++ "/index.html"; sectionOwnedTopLevelTags = ["photography"] is the only guard. A tagIndexable item tagged music (or music/x, which expands to music) emits music/index.html, already owned by the music index route (build/Site.hs:486-487); the same holds for essays, blog, cv, archive, authors, and bibliography. Hakyll does not error on duplicate routes; one silently overwrites the other.

2.3 Sidenotes filter destroys the documented no-JS fallback — MED

build/Filters/Sidenotes.hs:30-36 vs static/css/sidenotes.css:125-135. The module doc claims the Pandoc <section class="footnotes"> "serves as fallback," but apply replaces every Note, so the writer never emits the section. CSS depends on it below 1500px. Verified in output: _site/essays/scaling_outage.html has 3 class="sidenote" and zero footnotes occurrences. With JS disabled, footnote content is invisible on narrow viewports. The comment, the CSS, and ozymandias.md's own prose all contradict actual behavior.

2.4 Sidenote bodies rendered without the KaTeX writer — MED

build/Filters/Sidenotes.hs:103-115. inlinesToHtml/blocksToHtml use writeHtml5String (def :: WriterOptions) (PlainMath), while the main pipeline uses KaTeX "" (build/Compilers.hs:47). Math inside a footnote never gets <span class="math inline">\(...\)</span>, so KaTeX never renders it — degrades to plain italics, silently inconsistent with body math.

2.5 SourceRefs whitelist vs /source/ serving whitelist have drifted — MED

build/Filters/SourceRefs.hs:114-141 vs build/Site.hs:217-240. Site.hs:209 says "must stay aligned with 'isSourcePath'". Mismatches: SourceRefs wraps content/ and yaml-source/ (no Site counterpart); static/ + any known ext vs Site's static/js/** and static/css/** only; tools/ + any ext vs Site's tools/**.sh and tools/**.py; data/ at any depth vs Site's top-level data/*.{json,yaml,md,bib}. Each mismatch yields a wrapped source-ref whose popup fetch 404s (Forgejo href fallback still works). Inverse: Site serves data/*.bib but .bib is missing from hasKnownExt, leaving a dead whitelist entry.

2.6 epistemicEntry ignores confidence: proved — MED

build/Site.hs:1014-1024. Comment: "Compute overall-score the same way Contexts.overallScoreField does," but it uses readMaybe =<< lookupString "confidence" meta, which is Nothing for "proved"/"proven", whereas Contexts.overallScoreField (build/Contexts.hs:574-576) substitutes 100 via isProvedConfidence. Proved pages get no score in data/epistemic-meta.json and export the raw string under confidence, so client-side filtering silently misses them.

2.7 Empty affiliation <div> ships on every essay without affiliation: — MED

build/Contexts.hs:84-89 + templates/partials/metadata-tail.html:12. affiliationField returns an empty list instead of noResult; Hakyll's $if$ is truthy for empty list fields (the codebase knows this — tagLinksFieldExcludingScope uses noResult for exactly this reason). Verified in output: _site/essays/asymmetric-forgetting.html contains <div class="meta-row meta-affiliation"> with whitespace-only content.

2.8 Library page hard-depends on content/library.md — LOW

build/Site.hs:675. _ <- loadSnapshot libraryIntroId "body" is a top-level compiler statement (not inside a field), so it's a hard failure. The block is documented as "optional prose block"; deleting content/library.md breaks the whole library.html compile. Contrast the existence-guarded sidecars at build/Tags.hs:277-283 and build/Site.hs:843-850.

2.9 Library primaryPortalOf reads only list-form tags: — LOW

build/Site.hs:632-638. lookupStringList "tags" returns Nothing for scalar comma form (tags: research, ai), which Hakyll's getTags accepts. Such an item appears on tag pages but is silently dropped from the library. All current content uses list form — latent.

build/Patterns.hs:124-133, used by build/Backlinks.hs:334,345. Despite "Every content file the backlinks pass should index," content/me/index.md and content/memento-mori/index.md (full essays, rendered with backlinksField) never have their outgoing links extracted; photography likewise. Either deliberate-but-undocumented or the exact silent omission the module header says it exists to prevent.

2.11 Paginated tag pages: split by creation date, sorted by display date — LOW

build/Tags.hs:371-377. buildPaginateWith (sortAndGroupAt tagPageSize) partitions via sortRecentFirst (creation date), then each page re-sorts with recentFirstByDisplay (revision-aware). A recently revised old item stays on a late page but jumps to its top — cross-page ordering is not monotone. Only fires above the 150-item threshold.

2.12 fill:#000 replacement corrupts longer hex colors — LOW

build/Filters/Score.hs:118-133 (and Filters/Viz.hs processColors). The 6-digit pass protects only #000000; for fill:#000080 the 3-digit pass produces fill:currentColor80 — invalid CSS, silently mangled SVG. Quoted attribute forms are safe; only unquoted style-property forms are exposed.

2.13 Source-level preprocessors rewrite inside fenced code blocks — LOW

build/Filters/Wikilinks.hs:24-31, Filters/Transclusion.hs:18-20, Filters/EmbedPdf.hs. All run on the raw source before Pandoc parses fences: [[anything]] in a code block becomes a link; a code-block line that is exactly {{slug}} or {{pdf:...}} becomes raw HTML. Transclusion's comment ("prevents accidental substitution inside prose or code") is false for full-line directives in code blocks. A live foot-gun for a site that documents its own syntax (ozymandias.md does exactly this).

2.14 domainIcon matches substrings of the whole URL, not the host — LOW

build/Filters/Links.hs:120-153. The checks have the form "x.com" `T.isInfixOf` url, so https://example.org/why-x.com-failed gets the Twitter icon. This contradicts the strict-hostname discipline that isExternal documents at lines 95-101 of the same file. Cosmetic (icon only).
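For comparison, a host-only match can be sketched in a few lines. This is a Python illustration using the standard library's URL parser, not a proposal for the Haskell code; host_matches and substring_matches are hypothetical names.

```python
from urllib.parse import urlsplit


def substring_matches(url: str, domain: str) -> bool:
    """The reported behavior: the domain may appear anywhere in the URL."""
    return domain in url


def host_matches(url: str, domain: str) -> bool:
    """True only when the URL's host is `domain` or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)
```

Requiring a leading dot in the suffix check keeps look-alike hosts such as notx.com from matching x.com.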

2.15 gsubRoute "content/" strips every occurrence, not just the prefix — LOW

build/Site.hs:171,357,417 etc. Hakyll's gsubRoute is replace-all; a co-located directory literally named content would be silently mangled (content/essays/slug/content/data.csv → essays/slug/data.csv). Same for gsubRoute "static/". Improbable but silent.
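The difference between replace-all and prefix-only stripping can be shown with plain strings. The sketch below is Python, not Hakyll; strip_all and strip_prefix are hypothetical helpers that stand in for the two behaviors.

```python
def strip_all(path: str, segment: str) -> str:
    """Replace-all semantics, as the finding describes for gsubRoute."""
    return path.replace(segment, "")


def strip_prefix(path: str, prefix: str) -> str:
    """Remove `prefix` only when the path starts with it."""
    return path[len(prefix):] if path.startswith(prefix) else path
```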

2.16 existsCached memoizes non-existence for the process lifetime — LOW

build/Filters/SourceRefs.hs:160-166. Under make watch, a source file created after first reference stays cached as absent until restart.

2.17 Core NITs

  • build/Site.hs:42-44: comment says "eight portals"; the list has nine. Echoed at Site.hs:606 ("the eight") vs line 657's "nine times".
  • build/Site.hs:866-877: random-pages.json comment says "essays + blog posts only" but the rule loads fiction and flat poetry too; uses flat-only content/poetry/*.md while the epistemic rule uses allPoetry — collection poems are epistemic-indexed but never randomizable.
  • build/Utils.hs:64-73: authorSlugify comment claims runs of spaces collapse; code maps each space individually ("A  B", with two spaces, yields "a--b"). Consistent everywhere, so links work; comment wrong.
  • build/Utils.hs:31-32: readingTime truncates (div 200) — 399 words reports "1 min"; comment implies ceiling semantics.
  • build/Pagination.hs:42 + build/Site.hs:77-82: hardcoded pattern literals duplicate Patterns.hs, defeating that module's stated purpose (Patterns.hs:6-10).
  • build/Contexts.hs:174-180: plain tagLinksField returns an empty list rather than noResult, so $if(item-tags)$ is true and templates emit empty tag wrappers (author-index.html, item-card.html).
  • build/Tags.hs:296-304: tagItemCtx composes defaultContext, not siteCtx, so $if(has-monogram)$ never fires on tag pages — monograms render on new.html/library but silently never on tag indexes.
  • build/Contexts.hs:485-492: dotsField comment says "1–5" but the code accepts 0 (max 0 (min 5 n)), so importance: 0 renders five empty circles.
  • build/Contexts.hs:375-381: descriptionField doc says noResult; code uses fail — behaviorally fine under Hakyll 4.16 $if$ (verified against Hakyll 4.16.7.1 source) but logs [ERROR] debug noise per abstract-less page. Same in abstractField, summaryField, bibliographyField.
  • build/Filters/Images.hs:233-234: webpSrc interpolated into srcset unescaped while sibling src goes through esc.
  • build/Filters/Links.hs:37-46,63-69: internal PDF links double-classified (pdf-link + link-internal chrome) despite the "no overlap" comment.
  • build/Filters/Smallcaps.hs:31-34 + Filters/Archive.hs:42-44: "headers are skipped" only at top level; a Header nested in a Div/BlockQuote is processed, contradicting the comments.

Verified clean: no unguarded head/fromJust/read/!! hazards in the core modules; filter composition order matches its documenting comments; Hakyll 4.16.7.1 $if$ treats both fail and noResult as false.


3. Haskell build code — feature modules

3.1 Stats heatmap day-of-week off-by-one: Sunday clipped out of the SVG — MED

build/Stats.hs:185,300,317. dowOf d = fromEnum (dayOfWeek d) -- Mon=0..Sun=6 — but time-1.12.2 is ISO-numbered (verified: map fromEnum [Monday..Sunday] == [1..7]). So Sunday lands at y=106 while svgH = 104 — every Sunday cell is clipped out of the viewBox and grid row 0 is permanently blank. Relatedly, weekStart returns the previous Sunday (and for a Sunday, 7 days back), not the "first Monday on or before" its comment claims; builds run on a Sunday also clip the newest column horizontally.
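Python's date.isoweekday() uses the same ISO numbering (Monday=1 .. Sunday=7) that the finding reports for time-1.12.2, so it makes a convenient analogy. The sketch below shows the mapping to grid rows 0..6 and the Monday-anchored week start that the comment describes. It is an illustration in Python, not the code in build/Stats.hs; grid_row and week_start are hypothetical names.

```python
from datetime import date, timedelta


def grid_row(d: date) -> int:
    """Row index with Monday=0 .. Sunday=6, derived from ISO numbering 1..7."""
    return d.isoweekday() - 1


def week_start(d: date) -> date:
    """The Monday on or before `d` (a Monday maps to itself)."""
    return d - timedelta(days=grid_row(d))
```

Subtracting one from the ISO number is the entire correction: it keeps Sunday inside a seven-row grid and makes week_start agree with the row index.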

3.2 Commonplace.hs uses Char8.pack — non-ASCII YAML corruption — MED

build/Commonplace.hs:143. Y.decodeEither' (BS.pack raw) with Data.ByteString.Char8 truncates each Char to 8 bits — the exact hazard build/Now.hs:249-253 documents and fixes with TE.encodeUtf8. data/commonplace.yaml is currently pure ASCII, so latent — but a commonplace book of quotations is the likeliest file to acquire an em-dash or curly quote, which will then either fail the YAML parse or publish mojibake.
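The truncation can be modeled byte for byte. The following Python sketch imitates the 8-bit truncation that Data.ByteString.Char8.pack performs and compares it with UTF-8 encoding; pack_char8 and pack_utf8 are hypothetical names used only for this illustration.

```python
def pack_char8(text: str) -> bytes:
    """Keep only the low 8 bits of each code point, as Char8.pack does."""
    return bytes(ord(ch) & 0xFF for ch in text)


def pack_utf8(text: str) -> bytes:
    """Encode the text as UTF-8, preserving every code point."""
    return text.encode("utf-8")
```

An em-dash (U+2014) survives UTF-8 encoding as three bytes but collapses to the single control byte 0x14 under truncation, which is why the YAML parser either rejects the input or yields mojibake.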

build/Backlinks.hs:220-226. extractLinksWithContext's go handles Para, BlockQuote, Div, BulletList, OrderedList, then go _ = []. Tight list items (the default - item form) are Plain blocks, not Para, so recursion into list children yields nothing. Every internal link written in a tight list never produces a backlink. Header, Table, and DefinitionList blocks are likewise skipped. The doc comment implies coverage it doesn't deliver.

3.4 Stability "age" is the first→last commit span, not time since first commit — MED

build/Stability.hs:89-93,99-112. Docs say "age in days since first commit," but classify (length dates) (daySpan (last dates) newest) computes the span between first and most recent commit, with no reference to today. A piece written in a one-week burst years ago reports "volatile" forever; time passing without commits can never increase stability. Either the comment or the metric is wrong.
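The two candidate metrics diverge as soon as commits stop. A minimal Python sketch, assuming a list of commit dates and an explicit "today" (both function names are hypothetical, and this is not the Haskell in build/Stability.hs):

```python
from datetime import date
from typing import Sequence


def commit_span_days(dates: Sequence[date]) -> int:
    """Days between the first and the most recent commit (what the code computes)."""
    return (max(dates) - min(dates)).days


def age_days(dates: Sequence[date], today: date) -> int:
    """Days since the first commit (what the documentation describes)."""
    return (today - min(dates)).days
```

For a piece written in a one-week burst in March 2022, commit_span_days stays at 7 indefinitely, while age_days keeps growing with the calendar.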

3.5 Frontmatter history: assumed newest-first; WRITING.md documents oldest-first — MED

build/Stability.hs:204-217,299-336 vs WRITING.md:105-109. loadVersionHistory keeps authored order and all range fields treat the head as newest (es@(newest:_) -> let oldest = last es). Git history is newest-first, but WRITING.md's history: example is oldest-first. With the documented ordering, version-history-range renders reversed ("14 March 2026 – 1 March 2026"), range-start returns the newest date, and version-history-primary shows the three oldest entries.

3.6 Archive manifest→provenance join is exact-string, rest of system is normalized — MED

build/Archive.hs:269. Map.lookup (meUrl me) provByUrl joins on the raw URL; everywhere else equivalence is normalizeUrl (ArchiveIndex filtering, dup detection, ARCHIVE.md:189-192). Editing a manifest URL to a normalization-equivalent form (http → https, trailing slash, tracking param) silently unpublishes /archive/<slug>/ while ArchiveIndex's normalized filter keeps the slug active, so links keep pointing at a 404.

3.7 Photography buildPin computes wrong slug/thumb/title for flat entries — MED

build/Photography.hs:354,362. slug = takeFileName (takeDirectory fp) — for a flat content/photography/foo.md this yields "photography", so map.json gets "slug": "photography", the title fallback is wrong, and thumb = "/photography/photography/<p>" 404s (flat-single assets route to /photography/<asset>). PHOTOGRAPHY.md:214 explicitly supports flat singles. Latent — content/photography/ currently has only index.md — but breaks the first geo-tagged flat single.

3.8 geo-precision fails open: a typo'd "hidden" publishes coordinates — MED

build/Photography.hs:347-349,312-320. Only the exact string matches ((_, Just "hidden", _) -> return Nothing); any other value (e.g. Hidden, hiddn) falls into roundCoord, whose catch-all treats unknown values as city (~10 km rounding) — publishing coordinates the author meant to suppress. Contradicts the file's own privacy comment (lines 287-289) and the fail-closed precedent for visibility: in build/Archive.hs:77-83.
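Fail-open versus fail-closed comes down to which set is enumerated. In the Python sketch below, the precision vocabulary ("exact", "city") is a placeholder: the audit names only "hidden" and the city fallback, so treat the set as an assumption, and the function names as hypothetical.

```python
# Hypothetical vocabulary of values that permit publishing coordinates.
PUBLISHABLE = {"exact", "city"}


def fail_open(raw_precision: str) -> bool:
    """Reported behavior: anything other than the exact string "hidden" publishes."""
    return raw_precision != "hidden"


def fail_closed(raw_precision: str) -> bool:
    """Publish only for explicitly recognized values; typos suppress coordinates."""
    return raw_precision in PUBLISHABLE
```

A stricter variant could reject unknown values with a build error instead of suppressing silently, which would match the fail-closed precedent the finding cites for visibility.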

3.9 Archive state is process-lifetime cached — watch goes stale — LOW

build/ArchiveIndex.hs:123-146 + build/Archive.hs:304. activeUrls/rawIndex/rawState are NOINLINE unsafePerformIO CAFs read once per process, and archiveRules reads the manifest in preprocess. Under site watch, edits to manifest.yaml, removed.yaml, or the regenerated state JSONs are never re-read until restart. One-shot builds unaffected.

3.10 Pinned pages render raw ISO in $last-reviewed$LOW

build/Stability.hs:166-170. The git branch formats via fmtIso ("1 May 2026"); the IGNORE.txt-pinned branch returns the frontmatter value verbatim ("2026-05-01") — inconsistent display formatting.

3.11 Empty/all-comments manifest.yaml halts the build — LOW

build/Archive.hs:158-170. An empty YAML stream decodes as Null, which fails to parse as [ManifestEntry] and takes the exitFailure branch — draining the manifest to zero entries is fatal rather than the empty archive the absent-file branch supports.

build/Backlinks.hs:275-281. Strips .html but not index.html/trailing slash: a page routed essays/foo/index.html keys as /essays/foo/index, but a body link authored /essays/foo/ doesn't match — backlink silently dropped. build/SimilarLinks.hs:97-99 handles exactly this case and its comment flags the divergence.
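A single canonical key function removes this class of mismatch. The following is a Python sketch of one possible normalization, not the logic in build/SimilarLinks.hs; page_key is a hypothetical name, and the choice to drop the trailing slash is an assumption.

```python
def page_key(path: str) -> str:
    """Canonical key: drop a trailing index.html, a .html suffix, and a trailing slash."""
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    elif path.endswith(".html"):
        path = path[: -len(".html")]
    # Keep the site root as "/" rather than collapsing it to an empty string.
    return path.rstrip("/") or "/"
```

With this, /essays/foo/index.html, /essays/foo/, and /essays/foo.html all map to /essays/foo, so a lookup succeeds regardless of which form the author typed.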

build/SimilarLinks.hs:155-164. viewerUrl = "/pdfjs/web/viewer.html?file=" ++ escapeHtml raw. escapeHtml handles HTML metacharacters only; a path containing &, ?, #, or spaces breaks the file= query value.
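The two encodings are separate layers: percent-encoding for the query value, then HTML escaping for the attribute. A Python sketch of that ordering, using only the standard library (viewer_url is a hypothetical name, and the sample path in the note is invented for illustration):

```python
from html import escape
from urllib.parse import quote


def viewer_url(raw_path: str) -> str:
    """Percent-encode the path for the query string, then escape for HTML."""
    encoded = quote(raw_path, safe="/")  # keep slashes, encode & ? # and spaces
    return escape("/pdfjs/web/viewer.html?file=" + encoded, quote=True)
```

For a path such as /papers/a&b #1.pdf, the query value becomes /papers/a%26b%20%231.pdf, so the ampersand and hash no longer terminate the file= parameter.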

3.14 Photography feed thumbnails only for directory-form entries — LOW

build/Photography.hs:449-453. imgTag requires isDir; flat singles and series children (<series>/<photo>.md) get text-only feed entries, against PHOTOGRAPHY.md's "thumbnails embedded inline" (lines 36, 445) and the feed's deliberate inclusion of series children.

3.15 Marks: missing confidence/evidence renders a literal "0 TRUST" — LOW

build/Marks.hs:272-278,565. computeTrust _ _ = 0 with a comment claiming the figure "collapses to the bare frame," but renderEpistemicFigure unconditionally calls renderTrustLabel, so a piece with status: but no confidence/evidence (a case MARKS.md:696 says should render) displays a prominent center "0" — indistinguishable from an authored zero-trust score.

3.16 Feature-module NITs

  • build/Catalog.hs:228-235: two distinct unknown categories render as adjacent duplicate "Other" sections (equal rank, groupBy on raw string).
  • build/Stats.hs:754-777: pageTOC comment says "nine h2 sections"; lists eleven (matching the eleven rendered).
  • build/SimilarLinks.hs:51-54: comment says "the template caps the display"; the code caps it (take maxSimilar at line 80).
  • build/Stats.hs:169-171, build/Archive.hs:564-569: "median" is the upper-median for even-length lists.
  • build/Backlinks.hs:133-153: protocol-relative //host/path URLs pass isPageLink and pollute backlinks.json.
  • build/BibExtras.hs:75-98: @string/@comment/@preamble blocks parsed as citekey entries — only consequential on a citekey/macro-name collision.

Verified clean: Marks tick positions/axis order/radii match MARKS.md §3; proved-confidence trust substitution matches §4.3; Archive's fail-closed visibility validation, removed.yaml conflict rejection, and double-sided SHA-256 verification all match ARCHIVE.md.


4. Python & shell tooling

4.1 data/embed-cache-pages.npz.tmp.npz orphan: explained; cleanup + ignore gaps — MED

The orphan (mtime May 26) is the fossil of a fixed bug: an earlier embed.py passed a bare path to np.savez_compressed, numpy appended .npz (verified in numpy's _savez source), and the subsequent os.replace raised FileNotFoundError, stranding the file. The current file-handle code (tools/embed.py:173-183) is correct, but: (a) nothing deletes the stale orphan — delete it, don't commit it; (b) the tmp write has no try/finally, so any mid-write exception strands embed-cache-pages.npz.tmp; (c) the new .gitignore entry is exact-path (data/embed-cache-pages.npz) and covers neither .tmp nor .tmp.npz variants — widen to data/embed-cache-pages.npz*; (d) the fixed tmp name means two concurrent runs interleave writes.
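Points (b) and (d) can be addressed together with a unique temp name and a cleanup path. The sketch below is a generic standard-library pattern, not the helper in tools/embed.py; the function name mirrors the existing atomic_write_bytes only for readability, and the signature is an assumption.

```python
import os
import tempfile


def atomic_write_bytes(target: str, data: bytes) -> None:
    """Write `data` to `target` via a unique temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(target))
    # A unique name avoids collisions between concurrent runs; the prefix keeps
    # the temp file next to the target and under a "<target>*" ignore pattern.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(target) + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # data reaches disk before the rename
        os.replace(tmp_path, target)
    except BaseException:
        # Any failure, including an interrupt mid-write, removes the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Because the temp file is created in the target's directory, os.replace stays on one filesystem and remains atomic.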

4.2 Corrupt embed cache crashes instead of being discarded — MED

tools/embed.py:154. The discard path catches (OSError, KeyError, ValueError), but np.load on a truncated .npz raises zipfile.BadZipFile (verified MRO: BadZipFile → Exception), and EOFError is also uncaught. A half-written cache (exactly what §4.1(b) can produce) makes every subsequent build print "Warning: embedding failed" and leaves similar-links/semantic index stale until the file is manually deleted — the opposite of the docstring's "unreadable → discarding" contract.
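Since an .npz file is a zip archive, the gap can be demonstrated with the standard library alone. In the Python sketch below, zipfile.ZipFile stands in for np.load so the example runs without numpy; the tuple name and the helper are hypothetical.

```python
import io
import zipfile

# The three exception types from the finding, plus the two that escape today.
DISCARDABLE = (OSError, KeyError, ValueError, EOFError, zipfile.BadZipFile)


def read_member_names(blob: bytes):
    """Stand-in for loading a cache: list archive members, or None if unreadable."""
    try:
        with zipfile.ZipFile(io.BytesIO(blob)) as archive:
            return archive.namelist()
    except DISCARDABLE:
        return None  # caller treats None as "discard the cache and rebuild"
```

zipfile.BadZipFile derives directly from Exception, so it is not covered by catching OSError, KeyError, or ValueError; it has to be listed explicitly.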

4.3 embed.py staleness check structurally defeated by stamp-build-time — MED

tools/embed.py:195-200 + Makefile:68. needs_update() compares _site/**/*.html mtimes against embed's outputs, but the build order is embed.py → stamp-build-time.py _site, and the stamper rewrites the footer timestamp in essentially every HTML file each build. So every page is always newer than embed's outputs and the "skip if fresh" fast path never fires: the full paragraph-embedding pass (and model load) runs on every build. The new page cache papers over half the cost; the paragraph pass pays full price every time. Related (tools/embed.py:297-299): model/config changes never invalidate outputs. This is currently masked by the staleness bug; fixing one exposes the other.

4.4 archive.py writes provenance/index/state non-atomically — MED

tools/archive.py:718-721,734-737,953-957,1077-1080. All plain write_text(). An interrupt mid-write truncates PROVENANCE.json; the next build's json.loads (line 642) raises an unhandled JSONDecodeError — and a truncated provenance is indistinguishable from corruption in a tool whose whole contract is integrity checking. embed.py got atomic-write helpers; archive.py did not.

4.5 download-leaflet.sh: checksum verification bypassable — MED

tools/download-leaflet.sh:43-47,90. The early-exit skip checks file existence only (download-model.sh re-verifies on its skip path), and curl -o "$target" writes directly to the final path: a download that fails verify_or_warn aborts via set -e after the bad file is in place, and the next run's existence check accepts it permanently. A MITM'd unpkg.com download survives one failed run and is silently vendored on the next.

4.6 Other download/convert scripts leave partial files in final paths — LOW

tools/download-model.sh:84: interrupted curl leaves a partial model_quantized.onnx; caught today only because model-checksums.sha256 pins all five files — any unpinned file would persist forever. Use -o "$dst.part" && mv. tools/convert-images.sh:33: interrupted cwebp leaves a partial .webp that the -nt staleness gate then skips forever — a truncated WebP ships until manually deleted.

4.7 archive.py robustness gaps — LOW

  • tools/archive.py:788,795-799: provenance missing the artifact key makes prev_artifact == slug_dir, then sha256_of raises an uncaught IsADirectoryError instead of the structured "prior snapshot incomplete" error.
  • tools/archive.py:614-617,938-940,1066-1068: non-dict manifest entries (- https://example.com instead of - url: ...) crash with AttributeError: 'str' object has no attribute 'get'.
  • tools/archive.py:896: wayback_save concatenates the raw URL (contrast wayback_lookup at 909, which uses quote(url, safe="")).
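The manifest-shape crash in the second item can be turned into a structured report by validating entries before use. A minimal Python sketch, assuming the manifest has already been parsed into a list; split_manifest and its message wording are hypothetical and not taken from tools/archive.py.

```python
from typing import Any, List, Tuple


def split_manifest(entries: List[Any]) -> Tuple[List[dict], List[str]]:
    """Separate usable mapping entries from malformed ones, with a reason for each."""
    good: List[dict] = []
    problems: List[str] = []
    for position, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(
                f"entry {position}: expected a mapping with a 'url' key, "
                f"got {type(entry).__name__}"
            )
        elif not isinstance(entry.get("url"), str) or not entry["url"]:
            problems.append(f"entry {position}: missing or empty 'url'")
        else:
            good.append(entry)
    return good, problems
```

The caller can then print every problem and exit non-zero once, instead of stopping at the first AttributeError.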

4.8 add-popup-source.sh: dead CSP reminder + unvalidated nginx interpolation — LOW

tools/add-popup-source.sh:214: the connect-src reminder gates on [[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]], but UPSTREAM_HOST is only set in the NEEDS_PROXY -eq 1 branch (lines 124-131). The reminder can therefore never print, and the no-proxy case is exactly when it's needed (the provider will be CSP-blocked with no hint). Line 71: NAME from a free-text prompt is interpolated into location /proxy/$NAME/ and set $upstream_$NAME with no ^[a-z0-9-]+$ validation (import-photo.sh validates; this doesn't).

4.9 refreeze.sh deletes the freeze before the replacement succeeds — LOW

tools/refreeze.sh:13-16. rm -f "$FREEZE" then cabal freeze; a failed resolve leaves no freeze file (recoverable via git, but write-temp-then-move is safer).

4.10 embed.py / atomic-write NITs — LOW/NIT

tools/embed.py:109-115: atomic_write_bytes uses a fixed .tmp name (concurrent-run collision) and no fsync before os.replace (power loss can leave an empty target). Same pattern in _atomic_write_yaml of extract-exif.py:377, extract-palette.py:65, extract-dimensions.py:65. tools/embed.py:144: NpzFile never closed — use with np.load(...) as npz:.

4.11 Tooling NITs

  • tools/import-photo.sh:147-155: on mogrify -strip failure the EXIF-laden JPEG (GPS, serials) remains under content/, where make build's git add content/ could auto-commit it. Delete $TARGET on that failure path.
  • tools/hooks/pre-commit-marks.sh:28-31: awk '{ print $2 }' truncates paths with spaces; the status: probe reads the working tree, not the staged blob. Advisory-only hook.
  • tools/preset-signing-passphrase.sh:30: echo -n "$PASSPHRASE" eats a passphrase starting with -e/-n/-E; use printf '%s'.
  • tools/stamp-build-time.py:52-54: in-place non-atomic rewrite of _site/ HTML.
  • tools/archive.py:244: pdftotext without --; a slug starting with - parses as an option. Same in extract-exif.py:159.
  • tools/monolith-version.txt records a sha256 (matches the binary today, verified) but find_monolith() never checks it.

Verified clean: sign-site.sh (atomic sig writes, post-pass manifest verification); compress-assets.sh and download-pdfjs.sh (mktemp + EXIT trap, hash verified before extraction); audit-marks.py, viz_theme.py, extract-dimensions.py, extract-palette.py; embed.py's faiss -1 padding is safely filtered; uv lock --check passes; model-checksums.sha256 pins all five model files.


5. Frontend JavaScript

5.1 Score-reader pages never restore theme/settings — MED

templates/score-reader-default.html:10 + static/js/theme.js:12-13. The template loads theme.js without utils.js (unlike head.html:66-67), so window.lnUtils.safeStorage is undefined and theme/text-size/focus-mode/reduce-motion all silently fail to restore: a dark-theme user gets a light flash-and-stay on every score page. Compounding: settings.js (line 15; the template does render the settings toggle) falls back to its no-op store, so theme picks made on score pages never persist either.

5.2 search-filters.js: epistemic filters silently bypass clean-URL pages — MED

static/js/search-filters.js:117-125. normUrl() returns u.pathname verbatim and looks it up in epistemicMeta[url]. Verified: _site/data/epistemic-meta.json keys include /essays/beyond-comorbidity-indices/index.html while rendered result links use /essays/beyond-comorbidity-indices/. The lookup misses, passes(null) returns true ("no metadata = don't filter"), so every directory-style page bypasses all active epistemic filters. Flat .html pages match fine, which hides the bug.

5.3 viz.js ignores the cappuccino theme — MED

static/js/viz.js:94-99. isDark() knows only 'dark'/'light'/OS-preference, but theme.js/settings.js support 'cappuccino' — a dark-brown theme (--bg: #553a28, base.css:203). With OS-light + cappuccino, charts render the LIGHT config (near-black marks and axis labels) on a dark background.

5.4 collapse.js localStorage keys collide across pages — MED

static/js/collapse.js:44,83. Key is 'section-collapsed:' + heading.id with no pathname namespace (contrast annotations.js). Pandoc auto-slugs (#introduction, #background) recur across essays, so collapsing "Introduction" on one essay collapses it everywhere. Also uses raw localStorage rather than lnUtils.safeStorage.

5.5 semantic-search.js: stale-response race + duplicate index fetch — MED

static/js/semantic-search.js:117-144. runSearch has no generation token; overlapping queries render in promise-resolution order, so an older query's hits can replace a newer one's (with setStatus('') masking it). loadIndex() (42-59) has no in-flight-promise dedup (unlike loadModel's loadModelPromise), so concurrent first searches fetch semantic-index.bin + semantic-meta.json twice.

5.6 lightbox.js: aria-modal with no focus trap, no keyboard activation — MED

static/js/lightbox.js. Overlay sets role="dialog" + aria-modal="true" but has no Tab handling (gallery.js's trapTab at 235-257 shows the in-repo pattern) — focus walks into the obscured page. Trigger images get only a click listener and no tabindex/keydown, so keyboard users can't open it; close() focuses a non-focusable <img>, which no-ops.

5.7 Frontend LOWs

  • static/js/gallery.js:122-125,270-275: math/score overlay is click-only (no role/tabindex/keydown); closeOverlay() focus-returns to a non-focusable div — focus drops to <body>.
  • static/js/popups.js:478,515: the Wikipedia provider's decodeURIComponent runs synchronously before the .catch attaches — a malformed percent sequence in a link path throws an uncaught URIError per hover.
  • static/js/popups.js:359,390: fetched monogram SVG injected via innerHTML unescaped — the single unsanitized path in an otherwise fully escaped pipeline. Build-authored content, so not exploitable today; the comment acknowledges the trust assumption.
  • static/js/citations.js: dead file — no template loads it; popups.js supersedes it. If ever re-added it would double-bind and inject bibliography innerHTML without popups.js's cloned-node hardening. Delete.
  • static/js/nav.js:26,30-31: raw localStorage unguarded; if storage access throws, the throw lands before toggle.addEventListener, leaving the Portals toggle completely dead (utils.js exists precisely for this).
  • static/js/annotations.js:209-215: marks are mouse-only; the tooltip's Delete button is unreachable by keyboard (only recourse is the all-or-nothing "Clear Annotations").
  • static/js/search.js:10: unguarded new PagefindUI(...) — if the pagefind bundle 404s, the ReferenceError aborts the whole handler including the ?q= pre-fill that the selection-popup "Here" flow depends on.
  • static/js/semantic-search.js:55-56,96-107: no vectors.length === meta.length * DIM consistency check — a stale CDN-cached mismatch yields NaN scores and silently garbage ranking. (Current files verified consistent: 1,256,448 bytes = 818 × 384 × 4.)
  • static/js/transclude.js:149-151 + collapse.js:111-114: nested transcludes render a bare placeholder (no rescan of injected content); reinitCollapse is not idempotent (would stack toggle buttons if ever called twice on the same container).
  • static/js/popups.js:985-988,1009-1014: daysBetween uses Math.abs, so future dates render "N days ago" (now.js:17 handles this correctly).
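
Three of the items above (the decodeURIComponent throw, the missing index consistency check, and the Math.abs date bug) reduce to small guards. The sketch below uses hypothetical helper names; the "in N days" wording for future dates is an assumption, not necessarily what now.js prints.

```javascript
// Returns the decoded string, or null when the input contains a malformed
// percent sequence (decodeURIComponent throws URIError in that case).
function safeDecode(text) {
  try { return decodeURIComponent(text); } catch (e) { return null; }
}

const DIM = 384; // embedding width of the browser-side index, per the audit

// True when the flat vector buffer holds exactly one DIM-wide row per meta entry.
function indexIsConsistent(vectors, meta, dim = DIM) {
  return vectors.length === meta.length * dim;
}

// Signed day difference, so future dates are not reported as "N days ago".
// `then` and `now` are Date objects or millisecond timestamps.
function relativeDayLabel(then, now) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const days = Math.round((now - then) / MS_PER_DAY);
  if (days === 0) return 'today';
  const n = Math.abs(days);
  const unit = n === 1 ? 'day' : 'days';
  return days > 0 ? `${n} ${unit} ago` : `in ${n} ${unit}`;
}
```

A caller would skip the popup when safeDecode returns null, and refuse to rank (with a visible status message) when indexIsConsistent is false.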

5.8 Frontend NITs

  • static/js/copy.js:20-22,39: code-less <pre> fallback copies the "copy" button label along with content.
  • static/js/score-reader.js:50: URL rewritten to ?p=1 on every load even without a ?p= param.
  • static/js/search-filters.js:271: parseInt(v,10) || 0 turns junk threshold input into an active ≥0 filter that matches everything.
  • static/js/selection-popup.js:90-95: shift-keyup while typing capitals in the annotation picker re-summons the selection toolbar over it.
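
For the search-filters.js threshold item, one way to separate "no filter" from "filter at zero" is to return null for anything that is not a plain non-negative integer. parseThreshold is a hypothetical name; this is a sketch, not the existing code.

```javascript
// Returns a non-negative integer, or null meaning "no threshold filter".
// Unlike parseInt(v, 10) || 0, junk such as "abc" or "12abc" is rejected
// instead of becoming 0 or 12.
function parseThreshold(value) {
  const text = String(value).trim();
  if (!/^\d+$/.test(text)) return null;
  return Number.parseInt(text, 10);
}
```

An explicit "0" still parses to 0, so a user who really wants the permissive filter can ask for it.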

Verified clean: the semantic-search ↔ embed.py contract post-model-split (DIM 384, 818-entry meta, no prefix for MiniLM — the nomic search_document: prefix is confined to the build-only page path); XSS escaping across semantic-search, popups providers, map tooltips, annotations (sole exception §5.7 monogram); theme.js ↔ settings.js storage schema identical; all JS selector contracts against templates (including the uncommitted head/nav edits); popups/sidenotes double-init guards; settings.js and gallery.js focus traps.


6. Templates & content

6.1 Draft in undocumented location is never built — MED

content/drafts/inclusionist-manifesto.md. WRITING.md:34 says drafts go under content/drafts/essays/; draftEssayPattern (build/Patterns.hs:46-49) matches only that, so this file is invisible even to make watch/make dev — silently orphaned.

6.2 SIMD/PQC essay repository: URL 404s — MED

content/essays/where-does-simd-help-post-quantum-cryptography/index.md:24. https://git.levineuwirth.org/where-simd-helps is missing the owner segment — verified HTTP 404, while the sibling essay's .../neuwirth/beyond_comorbidity_indices returns 200.

6.3 Tracked drafts contradict the gitignore policy — MED

.gitignore:88 ignores content/drafts/ as local-only "working notes," but git ls-files -i -c --exclude-standard shows four tracked drafts (digital_progeny.md, modern_idolatry.md, test-essay.md, university_care.md). Ignore rules do not untrack files that are already in the index, so edits to these drafts are auto-staged by make build and pushed publicly by deploy. The over-broad **/.env.* pattern also matches the tracked .env.example.

6.4 Template/content LOWs and NITs

  • content/colophon.md:5: modified: is dead frontmatter — nothing reads it; $date-modified$ (page-footer.html:108) is Hakyll's dateField over the date key.
  • Seven files end frontmatter with a valueless confidence-history: (YAML null; WRITING.md:97 documents a list of ints) — harmless, but content/essays/scaling_outage.md also retains the full WRITING.md scaffold comments in a published essay.
  • static/images/canto31.jpg: still 4.0 MB (prior-audit §6.1 unfixed).
  • templates/blog-post.html:25,34: id="similar-links" appears twice in mutually exclusive $if$ branches — safe, fragile under edit.
  • content/drafts/essays/digital_progeny.md: title duplicates the published "The Specification Dilemma" — stale draft.
  • Frontmatter flags home:/library:/links:/search:/portal: are consumed (head.html CSS gates, default.html:6 data-portal) but undocumented in WRITING.md.

Verified clean: all $partial(...)$ includes resolve; all ~140 distinct template variables have context providers; no missing alt attributes, tag-balance failures, or within-page duplicate IDs in composed pages; all 26 CSS files referenced by head.html exist; sampled enum values across all sections are legal per WRITING.md and Contexts.hs validation lists.


7. Documentation / spec drift (WRITING.md, README.md)

7.1 js: page-script paths documented as content-relative; emitted root-relative — MED

WRITING.md:773-775 vs templates/default.html:37 (<script src="/$script-src$" defer>). The doc claims a composition's js: scripts/widget.js is served at /music/symphony/scripts/widget.js; the template instead prefixes the raw frontmatter value with /, so the emitted path is root-relative (/scripts/widget.js). The only current user (memento-mori) works by coincidence of its root-level route. A composition following the doc would 404.
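
The mismatch is easy to see by resolving the same js: value both ways. This is an illustrative sketch only: the function names are hypothetical and example.org is a placeholder origin used purely for URL resolution.

```javascript
const PLACEHOLDER_ORIGIN = 'https://example.org'; // not the real site origin

// What WRITING.md describes: the value resolved relative to the page's route.
function documentedSrc(pageRoute, js) {
  return new URL(js, PLACEHOLDER_ORIGIN + pageRoute).pathname;
}

// What the template emits: the raw value with a leading slash.
function emittedSrc(js) {
  return '/' + js;
}
```

For a page at /music/symphony/ with js: scripts/widget.js, documentedSrc gives /music/symphony/scripts/widget.js while emittedSrc gives /scripts/widget.js; the two agree only when the page route is the site root.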

7.2 "Standalone page content/my-page/index.md" has no generic rule — MED

WRITING.md:20 presents directory-form standalone pages as a general capability; build/Site.hs hardcodes only content/me/index.md (293) and content/memento-mori/index.md (307); the generic rule (351) matches flat content/*.md only. A new content/my-page/index.md silently doesn't build.

7.3 Portal table lists 8 portals; the build has 9 — MED

WRITING.md:221-231 omits Photography, which is in homePortals (build/Site.hs:50-60), the nav, and content/tag-meta/photography.md.

7.4 Three implemented frontmatter fields undocumented — MED

WRITING.md:3 claims to cover "all frontmatter fields"; zero hits for: summary: (build/Contexts.hs:415-427, rendered by essay.html:16 and reading.html:12, in live use), revised: (build/Contexts.hs:815 getRevisions, which drives $date-display$/$date-original$/$revision-note$ and list sort order), keywords: (build/Contexts.hs:283, which produces the /bibliography/<kw>/ links).

7.5 Documentation LOWs

  • WRITING.md:268-269,82: default citation style called "Chicago Author-Date"; the injected CSL (build/Citations.hs:114,167-168) is data/chicago-notes.csl, titled "Chicago Notes Bibliography".
  • README.md:12,19: make watch described as "rebuilds on save without a server"; it runs Hakyll's preview server (WRITING.md:1139 has it right).
  • WRITING.md:105-109: history: example ordering contradicts the code (see §3.5).

8. nginx, Makefile & deployment

8.1 Multi-line CSP value embeds literal \ + LF bytes — MED

nginx/security-headers.conf:60-71. The Content-Security-Policy-Report-Only value is a single quoted string spanning 12 lines with trailing \ characters — nginx has no line-continuation inside quoted strings, so the emitted header contains raw backslash, LF, and leading-space bytes between directives. Raw LF in a header value is illegal in HTTP/2 (vhost example enables http2 on); strict clients reject the whole response. Sent on every response even as Report-Only. Must be collapsed to one line.
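
A small lint could catch this class of problem before deploy. The sketch below is an assumption about how such a check might look, with hypothetical function names; it is not part of the repo's tooling.

```javascript
// Collapse a multi-line, backslash-continued value into a single header line.
function collapseHeaderValue(raw) {
  return raw
    .replace(/\\\r?\n/g, ' ') // backslash + newline is not a continuation in nginx
    .replace(/\s+/g, ' ')     // squeeze remaining newlines and indentation
    .trim();
}

// True when a value still contains CR, LF, or NUL, which must never appear
// in an emitted header value.
function hasIllegalHeaderBytes(value) {
  return /[\r\n\0]/.test(value);
}
```

Running hasIllegalHeaderBytes over each quoted add_header value in the nginx config would flag the current file; collapseHeaderValue shows the one-line form the directive needs.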

8.2 CSP gaps that will fire under enforcement — MED

nginx/security-headers.conf:66-67. (a) font-src 'self' data: blocks KaTeX webfonts: head.html:61 loads katex.min.css from cdn.jsdelivr.net, whose relative font URLs resolve to the CDN. (b) connect-src 'self' blocks the onnxruntime .wasm that transformers.js v2 (dynamically imported in static/js/semantic-search.js:25) fetches from jsdelivr — the config comment covers the same-origin model files but not the runtime. Both latent while Report-Only.

8.3 Makefile auto-commit sweeps any pre-staged changes — MED

Makefile:28-29. git add content/ followed by git diff --cached --quiet || git commit -m "auto: ..." commits the entire index — anything previously staged gets folded into an auto: <timestamp> [skip ci] commit and pushed publicly on deploy. Use git commit -- content/ or verify no foreign paths are staged.

8.4 Makefile LOWs

  • pdf-thumbs: the find | while read pipeline swallows pdftoppm failures (loop exit status is the last iteration's) — a corrupt PDF silently ships without a thumbnail.
  • deploy: prerequisite order clean build sign is guaranteed only under serial make; no .NOTPARALLEL: guard for -j invocations. (Confirmed: deploy does run clean first; .PHONY is complete; .env export allowlist is sound.)
  • tools/hooks/pre-commit-marks.sh is documented (Makefile:175 comment) but not installed — .git/hooks/ has only samples and core.hooksPath is unset.

Verified clean: all seven data/ JSON/YAML files parse; data/embed-cache-pages.npz is untracked, so the new gitignore entry is fully effective; nginx archive.conf's add_header-inheritance re-include is correct; no redirect loops; popup-proxy rate-limit/cache zones correctly documented for http{} scope.


9. Working-tree diff review (branding refresh + embed split)

The model contract is intact — the diff splits one MiniLM pipeline into two: pages now use nomic-embed-text-v1.5 (768d, build-only, for similar-links.json); paragraphs stay on all-MiniLM-L6-v2@c9745ed (384d, the browser contract). download-model.sh, model-checksums.sha256, semantic-search.js (DIM = 384), and both WRITING.md lines (1108 nomic for Related-pages, 1128 MiniLM for client search) are all consistent. Icon declarations all match real files (verified with file: apple-touch 180×180, favicon-96 96×96, manifest PNGs 192/512, og-image 1200×630 matching declared og:image dimensions; the webp sidecar was regenerated).

Open items beyond §1.3/§1.4/§4.1:

9.1 32.8 KB traced SVG inlined into every page — MED

templates/partials/logo-mark.svg (32,818 bytes, potrace-style single giant <path>) is inlined via the nav partial into every HTML page — a ~33 KB per-page weight regression (pre-compression). The two-tone --logo-ink/--logo-bg cutout (components.css:72-98) genuinely needs inline SVG or <use>; an external sprite + <use href> restores cacheability. Better still: a hand-drawn or simplified path — a traced bitmap at nav size carries detail that can never resolve.

9.2 Icon asset bloat — LOW

static/favicon.ico is now 71,766 bytes; parsed directory shows 16/32/48/64/128/256 px entries, the 128+256 pair alone 55.8 KB. The .ico is only the legacy fallback (modern browsers take the SVG); 16+32+48 (~8 KB) is conventional. static/favicon.svg is a 32,844-byte traced path. static/images/link-icons/internal.svg went ~2 KB → 32,818 bytes yet renders at 0.7 to 1.6 rem via CSS mask in three stylesheets (components.css:853, typography.css:833, popups.css:161).

9.3 Webmanifest regressions — NIT

static/site.webmanifest: purpose changed maskable→any for both icons (Android adaptive launchers will letterbox; convention is separate any + maskable entries); still no start_url/scope/description (Lighthouse installability warnings). JSON valid; icons verified.
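
The two manifest complaints can be expressed as a simple check. This is a hedged sketch with a hypothetical function name; it covers only the fields named above and is not a substitute for Lighthouse.

```javascript
// Returns a list of human-readable warnings for a parsed web app manifest.
function manifestWarnings(manifest) {
  const warnings = [];
  for (const field of ['start_url', 'scope', 'description']) {
    if (!manifest[field]) warnings.push(`missing ${field}`);
  }
  // An icon with no "purpose" member defaults to "any"; a purpose string may
  // list several space-separated values.
  const purposes = new Set(
    (manifest.icons || []).flatMap((icon) => (icon.purpose || 'any').split(/\s+/))
  );
  for (const purpose of ['any', 'maskable']) {
    if (!purposes.has(purpose)) warnings.push(`no icon with purpose "${purpose}"`);
  }
  return warnings;
}
```

Fed a manifest whose two icons both declare purpose "any" and which lacks the three optional fields, it reports four warnings; adding a separate maskable entry for each size clears the last one.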


10. Prior audit (AUDIT.md 2026-05-07) follow-up

| Finding | Status |
|---|---|
| §1.1 freeze unsolvable | Effectively still open: aeson pin fixed, but the freeze broke again via distributive after a system update (§1.1 above); the underlying freeze-vs-system-db fragility is unaddressed |
| §1.3 Python version mismatch | Fixed (requires-python = ">=3.14" matches .python-version) |
| §1.4 model checksums | Fixed (tools/model-checksums.sha256, 5 entries) |
| §9.1 nginx headers | Fixed (nginx/security-headers.conf + vhost example, README'd), but see §8.1/§8.2 for new issues in that file |
| §6.1 canto31.jpg 4 MB | Unfixed |
| robots.txt / sitemap | Fixed (Site.hs:941/963, present in _site/) |
| README paper/ and spec.md ghosts | Fixed |
| rsync target quoting | Fixed |
| date-quoting doc | Fixed (WRITING.md:106) |
| tag-meta no-title exception | Fixed (WRITING.md:238-251) |

Suggested triage order

  1. tools/refreeze.sh (§1.1 — in progress)
  2. Delete data/embed-cache-pages.npz.tmp.npz; widen the gitignore pattern; git add logo-mark.svg + og-image.png before committing the branding diff (§1.4, §4.1)
  3. Guard ArchiveIndex.hs file reads with doesFileExist (§1.2)
  4. Pin or sandbox the nomic remote code (§1.3)
  5. Fix the /fiction/ and /poetry/ 404s (§2.1) and the production-visible frontend MEDs (§5.1, §5.2)
  6. Collapse the nginx CSP to one line before ever flipping it to enforcing (§8.1, §8.2)
  7. The rest by severity as time allows