From 70ad44e9f42c9a57731aac80cc88de2ba4365fc3 Mon Sep 17 00:00:00 2001
From: Levi Neuwirth
Date: Tue, 9 Jun 2026 18:57:43 -0400
Subject: [PATCH] Add 2026-06-09 repository audit findings

Co-Authored-By: Claude Fable 5
---
 AUDIT-2026-06-09.md | 931 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 931 insertions(+)
 create mode 100644 AUDIT-2026-06-09.md

diff --git a/AUDIT-2026-06-09.md b/AUDIT-2026-06-09.md
new file mode 100644
index 0000000..30affcd
--- /dev/null
+++ b/AUDIT-2026-06-09.md
@@ -0,0 +1,931 @@
+---
+title: Repository audit
+date: 2026-06-09
+---
+
+# Repository audit — levineuwirth.org (2026-06-09)
+
+Comprehensive audit of the repo on `main` at commit `620b974` (working tree
+modified: branding refresh across `static/` + `templates/partials/`, plus
+`tools/embed.py` rework; untracked `static/og-image.png`,
+`templates/partials/logo-mark.svg`, `data/embed-cache-pages.npz.tmp.npz`).
+
+Severity legend: **HIGH** (likely to break a build, cause data loss, or
+expose a security weakness) — **MED** (latent bug, brittleness, or
+documentation drift) — **LOW** (minor robustness gap or fragile assumption) —
+**NIT** (style, polish, or paranoia).
+
+Numbers are file:line against the working tree at audit time. Findings
+marked "verified" were reproduced empirically (solver runs, built `_site/`
+output inspection, live HTTP checks, binary parsing); the rest were
+confirmed by reading the code.
+
+Prior audit: `AUDIT.md` (2026-05-07). Follow-up status in §10.
+
+---
+
+## 1. Build & dependency chain
+
+### 1.1 `cabal.project.freeze` is unsolvable again — next clean build fails — **HIGH**
+
+`cabal build --dry-run` fails today (verified): the freeze pins
+`distributive ==0.6.2.1`, but the system (pacman) GHC package db has
+`comonad-5.0.10` built against `distributive-0.6.3`:
+
+```
+rejecting: distributive-0.6.3/installed... (constraint from
+cabal.project.freeze requires ==0.6.2.1)
+After searching the rest of the dependency tree exhaustively...
+```
+
+The conflict set also names aeson, warp, hakyll, http2, semigroupoids. This
+is the same failure mode as prior-audit §1.1 — that audit's specific aeson
+pin was fixed (now 2.2.2.0/hashable 1.4.7.0), but a different package broke
+the same way after a system update. Recent builds succeed only off the
+cached `dist-newstyle/cache/plan.json`; the freeze file has since changed,
+so the next cabal invocation re-solves and fails. Because `make deploy`
+starts with `make clean`, the next deploy hits this. `levineuwirth.cabal`'s
+own bounds are compatible with the freeze — the conflict is
+freeze-vs-installed-db, not freeze-vs-cabal-file.
+
+Fix: `tools/refreeze.sh` (written for exactly this post-`pacman -Syu`
+situation). The underlying fragility — freezing against a mutable system
+package db — remains; consider documenting the refreeze step as part of any
+system-upgrade ritual. *(In progress at time of writing.)*
+
+### 1.2 Missing `data/archive-index.json` / `archive-state.json` crashes the build — **HIGH**
+
+`build/ArchiveIndex.hs:134-146`. The module doc (lines 18-22) promises "An
+absent or malformed file degrades safely: an empty index makes the link
+consumers no-op; an absent state file makes every entry @Live@." But
+`rawIndex = unsafePerformIO $ do decoded <- A.eitherDecodeFileStrict' indexPath`
+(and identically `rawState`) never checks `doesFileExist`, and aeson's
+`eitherDecodeFileStrict'` throws an uncaught `IOException` on a missing
+file (verified: `withBinaryFile: does not exist`). Both files are
+gitignored (`.gitignore:84-85`), so a fresh clone or a no-`.venv` build —
+the exact path `build/Archive.hs:20-24` promises to support — throws when
+the CAF is first forced. Contrast `readUrlSet` (line 109) in the same file,
+which guards correctly. Currently latent on this machine only because both
+generated files happen to exist.
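The contract the module doc promises is small enough to state as code. The Haskell fix would be the existence guard `readUrlSet` already has, plus treating a decode failure as empty. The sketch below models that contract in Python purely for illustration. `load_json_or_default` is a hypothetical helper, not code in this repo.

```python
import json
from typing import Any


def load_json_or_default(path: str, default: Any) -> Any:
    """Decode the JSON file at `path`, or return `default` when the file is
    absent, unreadable, or malformed, instead of raising."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except (OSError, ValueError):
        # OSError covers FileNotFoundError (fresh clone, generator never run)
        # and other read failures; ValueError covers json.JSONDecodeError and
        # UnicodeDecodeError (malformed or truncated content).
        return default


if __name__ == "__main__":
    # Hypothetical usage against the two gitignored files named in §1.2.
    index = load_json_or_default("data/archive-index.json", {})
    state = load_json_or_default("data/archive-state.json", {})
    print(len(index), len(state))
```

Suppose both generated files fall back to an empty mapping. A fresh clone then gets an empty index, so link consumers no-op. It also gets an empty state map, which is presumably how an absent state file yields "every entry Live" on the Haskell side.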
+
+### 1.3 `embed.py` `trust_remote_code=True` executes unpinned third-party code — **HIGH**
+
+`tools/embed.py:329` (line ~341 in the uncommitted version). The new
+page-model load is
+`SentenceTransformer(PAGE_MODEL_NAME, revision=PAGE_MODEL_REVISION, trust_remote_code=True)`.
+The `revision` arg pins only the `nomic-ai/nomic-embed-text-v1.5` repo; the
+actual modeling code is pulled via `auto_map` from a *different* repo —
+verified in the local HF cache: the executed code lives under
+`transformers_modules/nomic_hyphen_ai/nomic_hyphen_bert_hyphen_2048/...`,
+i.e. `nomic-ai/nomic-bert-2048` at its current head, which nothing pins. A
+compromise of that second repo runs arbitrary Python at build time, in a
+repo whose every other download path (download-model.sh, pdfjs, leaflet) is
+sha256-pinned. The comment "Both pins are deliberate" is therefore
+misleading. Fix: pin via `code_revision`, or run with `HF_HUB_OFFLINE=1`
+after first fetch, or document the accepted risk.
+
+### 1.4 Working-tree commit hazard: tracked templates reference untracked files — **HIGH (process)**
+
+`templates/partials/nav.html:5` (tracked, modified) adds
+`$partial("templates/partials/logo-mark.svg")$` and
+`templates/partials/head.html` references `/og-image.png` — both target
+files are **untracked** (no git history). Committing the template diff
+without `git add`-ing both breaks every page's Hakyll build on a fresh
+clone (`$partial$` aborts compilation) and 404s the og:image. They must
+land in the same commit. Conversely, `data/embed-cache-pages.npz.tmp.npz`
+must **not** be committed (see §4.1). The partial itself is safe as a
+Hakyll template (verified: zero `$` characters; `match "templates/**"`
+compiles it).
+
+### 1.5 `einops` dependency: undocumented, unbounded, imported nowhere — **LOW**
+
+`pyproject.toml:27` adds `einops>=0.8.2`. No import anywhere in
+`tools/`/`build/`/`static/js/`; its only consumer is nomic's
+`trust_remote_code` module (§1.3).
+Every sibling dependency has an
+explanatory comment and an upper bound per the file's own stated policy
+("Upper bounds are intentionally generous (next major) but always
+present"); einops has neither. `uv lock --check` passes (0.8.2 pinned).
+
+---
+
+## 2. Haskell build code — core
+
+### 2.1 Nav, home grid, and library link `/fiction/` and `/poetry/` — confirmed 404s — **MED**
+
+`build/Site.hs:50-60` (`homePortals` contains `("Fiction","fiction")`,
+`("Poetry","poetry")`), `templates/partials/nav.html:56,61`,
+`templates/library.html:44,58`. No rule generates either index: fiction and
+poetry are not in `tagIndexable` (`build/Patterns.hs:148-151` = essays +
+blog + photos) and Site.hs has no landing rule. Verified: `_site/fiction`
+does not exist; `_site/poetry/` has no `index.html`. nginx has no
+redirects. Both links 404 in production today.
+
+### 2.2 Tag/route collisions guarded for `photography` only — **MED**
+
+`build/Tags.hs:98-99`. `tagIdentifier` maps tag `t` → `t ++ "/index.html"`;
+`sectionOwnedTopLevelTags = ["photography"]` is the only guard. A
+tagIndexable item tagged `music` (or `music/x`, which expands to `music`)
+emits `music/index.html`, already owned by the music index route
+(`build/Site.hs:486-487`); similarly `essays`, `blog`, `cv`, `archive`,
+`authors`, `bibliography`. Hakyll does not error on duplicate routes — one
+silently overwrites the other.
+
+### 2.3 Sidenotes filter destroys the documented no-JS fallback — **MED**
+
+`build/Filters/Sidenotes.hs:30-36` vs `static/css/sidenotes.css:125-135`.
+The module doc claims the Pandoc footnotes `<section>` "serves as
+fallback," but `apply` replaces every `Note`, so the writer never emits the
+section. CSS depends on it below 1500px. Verified in output:
+`_site/essays/scaling_outage.html` has 3 `class="sidenote"` and zero
+`footnotes` occurrences. With JS disabled, footnote content is invisible on
+narrow viewports. The comment, the CSS, and ozymandias.md's own prose all
+contradict actual behavior.
+
+### 2.4 Sidenote bodies rendered without the KaTeX writer — **MED**
+
+`build/Filters/Sidenotes.hs:103-115`. `inlinesToHtml`/`blocksToHtml` use
+`writeHtml5String (def :: WriterOptions)` (PlainMath), while the main
+pipeline uses `KaTeX ""` (`build/Compilers.hs:47`). Math inside a footnote
+never gets the `\(...\)` delimiters, so KaTeX never renders it; it
+degrades to plain italics, silently inconsistent with body math.
+
+### 2.5 SourceRefs whitelist vs `/source/` serving whitelist have drifted — **MED**
+
+`build/Filters/SourceRefs.hs:114-141` vs `build/Site.hs:217-240`. Site.hs:209
+says "must stay aligned with 'isSourcePath'". Mismatches: SourceRefs wraps
+`content/` and `yaml-source/` (no Site counterpart); `static/` + any known
+ext vs Site's `static/js/**`/`static/css/**` only; `tools/` + any ext vs
+Site's `tools/**.sh`/`tools/**.py`; `data/` at any depth vs Site's
+top-level `data/*.{json,yaml,md,bib}`. Each mismatch yields a wrapped
+source-ref whose popup fetch 404s (Forgejo href fallback still works).
+Inverse: Site serves `data/*.bib` but `.bib` is missing from
+`hasKnownExt` — dead whitelist entry.
+
+### 2.6 `epistemicEntry` ignores `confidence: proved` — **MED**
+
+`build/Site.hs:1014-1024`. Comment: "Compute overall-score the same way
+Contexts.overallScoreField does," but it uses
+`readMaybe =<< lookupString "confidence" meta`, which is `Nothing` for
+`"proved"`/`"proven"`, whereas `Contexts.overallScoreField`
+(`build/Contexts.hs:574-576`) substitutes 100 via `isProvedConfidence`.
+Proved pages get no `score` in `data/epistemic-meta.json` and export the
+raw string under `confidence`, so client-side filtering silently misses
+them.
+
+### 2.7 Empty affiliation element ships on every essay without `affiliation:` — **MED**
+
+`build/Contexts.hs:84-89` + `templates/partials/metadata-tail.html:12`.
+`affiliationField` returns an empty list instead of `noResult`; Hakyll's
+`$if$` is truthy for empty list fields (the codebase knows this —
+`tagLinksFieldExcludingScope` uses `noResult` for exactly this reason).
+Verified in output: `_site/essays/asymmetric-forgetting.html` contains
+the affiliation element with whitespace-only content.
+
+### 2.8 Library page hard-depends on `content/library.md` — **LOW**
+
+`build/Site.hs:675`. `_ <- loadSnapshot libraryIntroId "body"` is a
+top-level compiler statement (not inside a `field`), so it's a hard
+failure. The block is documented as "optional prose block"; deleting
+`content/library.md` breaks the whole `library.html` compile. Contrast the
+existence-guarded sidecars at `build/Tags.hs:277-283` and
+`build/Site.hs:843-850`.
+
+### 2.9 Library `primaryPortalOf` reads only list-form `tags:` — **LOW**
+
+`build/Site.hs:632-638`. `lookupStringList "tags"` returns `Nothing` for
+scalar comma form (`tags: research, ai`), which Hakyll's `getTags`
+accepts. Such an item appears on tag pages but is silently dropped from
+the library. All current content uses list form — latent.
+
+### 2.10 `allContent` omits me/, memento-mori/, photography from the link graph — **LOW**
+
+`build/Patterns.hs:124-133`, used by `build/Backlinks.hs:334,345`. Despite
+"Every content file the backlinks pass should index," `content/me/index.md`
+and `content/memento-mori/index.md` (full essays, rendered with
+`backlinksField`) never have their outgoing links extracted; photography
+likewise. Either deliberate-but-undocumented or the exact silent omission
+the module header says it exists to prevent.
+
+### 2.11 Paginated tag pages: split by creation date, sorted by display date — **LOW**
+
+`build/Tags.hs:371-377`. `buildPaginateWith (sortAndGroupAt tagPageSize)`
+partitions via `sortRecentFirst` (creation date), then each page re-sorts
+with `recentFirstByDisplay` (revision-aware). A recently revised old item
+stays on a late page but jumps to its top — cross-page ordering is not
+monotone. Only fires above the 150-item threshold.
+
+### 2.12 `fill:#000` replacement corrupts longer hex colors — **LOW**
+
+`build/Filters/Score.hs:118-133` (and `Filters/Viz.hs` `processColors`).
+The 6-digit pass handles only `#000000`, so `fill:#000080` passes through it
+untouched; the 3-digit pass then matches its `fill:#000` prefix and produces
+`fill:currentColor80`: invalid CSS and a silently mangled SVG. Quoted
+attribute forms are safe; only unquoted style-property forms are exposed.
+
+### 2.13 Source-level preprocessors rewrite inside fenced code blocks — **LOW**
+
+`build/Filters/Wikilinks.hs:24-31`, `Filters/Transclusion.hs:18-20`,
+`Filters/EmbedPdf.hs`. All run on the raw source before Pandoc parses
+fences: `[[anything]]` in a code block becomes a link; a code-block line
+that is exactly `{{slug}}` or `{{pdf:...}}` becomes raw HTML.
+Transclusion's comment ("prevents accidental substitution inside prose or
+code") is false for full-line directives in code blocks. A live foot-gun
+for a site that documents its own syntax (ozymandias.md does exactly
+this).
+
+### 2.14 `domainIcon` matches substrings of the whole URL, not the host — **LOW**
+
+`build/Filters/Links.hs:120-153`. ``"x.com" `T.isInfixOf` url`` etc. —
+`https://example.org/why-x.com-failed` gets the Twitter icon. Contradicts
+the strict-hostname discipline `isExternal` documents at lines 95-101 of
+the same file. Cosmetic (icon only).
+
+### 2.15 `gsubRoute "content/"` strips every occurrence, not just the prefix — **LOW**
+
+`build/Site.hs:171,357,417` etc. Hakyll's `gsubRoute` is replace-all; a
+co-located directory literally named `content` would be silently mangled
+(`content/essays/slug/content/data.csv` → `essays/slug/data.csv`). Same
+for `gsubRoute "static/"`. Improbable but silent.
+
+### 2.16 `existsCached` memoizes non-existence for the process lifetime — **LOW**
+
+`build/Filters/SourceRefs.hs:160-166`. Under `make watch`, a source file
+created after first reference stays cached as absent until restart.
+
+### 2.17 Core NITs
+
+- `build/Site.hs:42-44`: comment says "eight portals"; the list has nine.
+  Echoed at Site.hs:606 ("the eight") vs line 657's "nine times".
+- `build/Site.hs:866-877`: random-pages.json comment says "essays + blog
+  posts only" but the rule loads fiction and flat poetry too; uses
+  flat-only `content/poetry/*.md` while the epistemic rule uses
+  `allPoetry` — collection poems are epistemic-indexed but never
+  randomizable.
+- `build/Utils.hs:64-73`: `authorSlugify` comment claims runs of spaces
+  collapse; code maps each space (`"A  B"` → `"a--b"`). Consistent
+  everywhere, so links work; comment wrong.
+- `build/Utils.hs:31-32`: `readingTime` truncates (`div 200`) — 399 words
+  reports "1 min"; comment implies ceiling semantics.
+- `build/Pagination.hs:42` + `build/Site.hs:77-82`: hardcoded pattern
+  literals duplicate `Patterns.hs`, defeating that module's stated purpose
+  (Patterns.hs:6-10).
+- `build/Contexts.hs:174-180`: plain `tagLinksField` returns an empty list
+  rather than `noResult` — `$if(item-tags)$` is true and templates emit
+  empty tag wrappers (author-index.html, item-card.html).
+- `build/Tags.hs:296-304`: `tagItemCtx` composes `defaultContext`, not
+  `siteCtx`, so `$if(has-monogram)$` never fires on tag pages — monograms
+  render on new.html/library but silently never on tag indexes.
+- `build/Contexts.hs:485-492`: `dotsField` comment says "1–5" but accepts
+  0 (`max 0 (min 5 n)`) — `importance: 0` renders five empty circles.
+- `build/Contexts.hs:375-381`: `descriptionField` doc says `noResult`;
+  code uses `fail` — behaviorally fine under Hakyll 4.16 `$if$` (verified
+  against Hakyll 4.16.7.1 source) but logs `[ERROR]` debug noise per
+  abstract-less page. Same in `abstractField`, `summaryField`,
+  `bibliographyField`.
+- `build/Filters/Images.hs:233-234`: `webpSrc` interpolated into `srcset`
+  unescaped while sibling `src` goes through `esc`.
+- `build/Filters/Links.hs:37-46,63-69`: internal PDF links double-classified
+  (`pdf-link` + `link-internal` chrome) despite the "no overlap" comment.
+- `build/Filters/Smallcaps.hs:31-34` + `Filters/Archive.hs:42-44`:
+  "headers are skipped" only at top level; a Header nested in a
+  Div/BlockQuote is processed, contradicting the comments.
+
+Verified clean: no unguarded `head`/`fromJust`/`read`/`!!` hazards in the
+core modules; filter composition order matches its documenting comments;
+Hakyll 4.16.7.1 `$if$` treats both `fail` and `noResult` as false.
+
+---
+
+## 3. Haskell build code — feature modules
+
+### 3.1 Stats heatmap day-of-week off-by-one: Sunday clipped out of the SVG — **MED**
+
+`build/Stats.hs:185,300,317`. `dowOf d = fromEnum (dayOfWeek d) -- Mon=0..Sun=6`
+— but `time-1.12.2` is ISO-numbered (verified:
+`map fromEnum [Monday .. Sunday] == [1..7]`). So Sunday lands at y=106 while
+`svgH` = 104 — every Sunday cell is clipped out of the viewBox and grid
+row 0 is permanently blank. Relatedly, `weekStart` returns the previous
+*Sunday* (and for a Sunday, 7 days back), not the "first Monday on or
+before" its comment claims; builds run on a Sunday also clip the newest
+column horizontally.
+
+### 3.2 `Commonplace.hs` uses `Char8.pack` — non-ASCII YAML corruption — **MED**
+
+`build/Commonplace.hs:143`. `Y.decodeEither' (BS.pack raw)` with
+`Data.ByteString.Char8` truncates each `Char` to 8 bits — the exact hazard
+`build/Now.hs:249-253` documents and fixes with `TE.encodeUtf8`.
+`data/commonplace.yaml` is currently pure ASCII, so latent — but a
+commonplace book of quotations is the likeliest file to acquire an em-dash
+or curly quote, which will then either fail the YAML parse or publish
+mojibake.
+
+### 3.3 Backlinks: links inside tight lists are invisible — **MED**
+
+`build/Backlinks.hs:220-226`. `extractLinksWithContext`'s `go` handles
+`Para`, `BlockQuote`, `Div`, `BulletList`, `OrderedList`, then `go _ = []`.
+Tight list items (the default `- item` form) are `Plain` blocks, not
+`Para`, so recursion into list children yields nothing.
+Every internal link written in a tight list never produces a backlink.
+`Header`, `Table`, and `DefinitionList` blocks are likewise skipped. The
+doc comment implies coverage it doesn't deliver.
+
+### 3.4 Stability "age" is the first→last commit span, not time since first commit — **MED**
+
+`build/Stability.hs:89-93,99-112`. Docs say "age in days since first
+commit," but `classify (length dates) (daySpan (last dates) newest)`
+computes the span between first and most recent *commit*, with no
+reference to today. A piece written in a one-week burst years ago reports
+"volatile" forever; time passing without commits can never increase
+stability. Either the comment or the metric is wrong.
+
+### 3.5 Frontmatter `history:` assumed newest-first; WRITING.md documents oldest-first — **MED**
+
+`build/Stability.hs:204-217,299-336` vs `WRITING.md:105-109`.
+`loadVersionHistory` keeps authored order and all range fields treat the
+head as newest (`es@(newest:_) -> let oldest = last es`). Git history is
+newest-first, but WRITING.md's `history:` example is oldest-first. With
+the documented ordering, `version-history-range` renders reversed
+("14 March 2026 – 1 March 2026"), `range-start` returns the newest date,
+and `version-history-primary` shows the three *oldest* entries.
+
+### 3.6 Archive manifest→provenance join is exact-string, rest of system is normalized — **MED**
+
+`build/Archive.hs:269`. `Map.lookup (meUrl me) provByUrl` joins on the raw
+URL; everywhere else equivalence is `normalizeUrl` (ArchiveIndex
+filtering, dup detection, ARCHIVE.md:189-192). Editing a manifest URL to a
+normalization-equivalent form (`http`→`https`, trailing slash, tracking
+param) silently unpublishes `/archive//` while ArchiveIndex's
+normalized filter keeps the slug active — links keep pointing at a 404.
+
+### 3.7 Photography `buildPin` computes wrong slug/thumb/title for flat entries — **MED**
+
+`build/Photography.hs:354,362`.
+`slug = takeFileName (takeDirectory fp)` —
+for a flat `content/photography/foo.md` this yields `"photography"`, so
+map.json gets `"slug": "photography"`, the title fallback is wrong, and
+`thumb = "/photography/photography/"` 404s (flat-single assets route to
+`/photography/`). PHOTOGRAPHY.md:214 explicitly supports flat
+singles. Latent — `content/photography/` currently has only `index.md` —
+but breaks the first geo-tagged flat single.
+
+### 3.8 `geo-precision` fails open: a typo'd "hidden" publishes coordinates — **MED**
+
+`build/Photography.hs:347-349,312-320`. Only the exact string matches
+(`(_, Just "hidden", _) -> return Nothing`); any other value (e.g.
+`Hidden`, `hiddn`) falls into `roundCoord`, whose catch-all treats unknown
+values as `city` (~10 km rounding) — publishing coordinates the author
+meant to suppress. Contradicts the file's own privacy comment (lines
+287-289) and the fail-closed precedent for `visibility:` in
+`build/Archive.hs:77-83`.
+
+### 3.9 Archive state is process-lifetime cached — `watch` goes stale — **LOW**
+
+`build/ArchiveIndex.hs:123-146` + `build/Archive.hs:304`.
+`activeUrls`/`rawIndex`/`rawState` are NOINLINE `unsafePerformIO` CAFs read
+once per process, and `archiveRules` reads the manifest in `preprocess`.
+Under `site watch`, edits to `manifest.yaml`, `removed.yaml`, or the
+regenerated state JSONs are never re-read until restart. One-shot builds
+unaffected.
+
+### 3.10 Pinned pages render raw ISO in `$last-reviewed$` — **LOW**
+
+`build/Stability.hs:166-170`. The git branch formats via `fmtIso`
+("1 May 2026"); the IGNORE.txt-pinned branch returns the frontmatter value
+verbatim ("2026-05-01") — inconsistent display formatting.
+
+### 3.11 Empty/all-comments `manifest.yaml` halts the build — **LOW**
+
+`build/Archive.hs:158-170`. An empty YAML stream decodes as `Null`, which
+fails to parse as `[ManifestEntry]` and takes the `exitFailure` branch —
+draining the manifest to zero entries is fatal rather than the empty
+archive the absent-file branch supports.
+
+### 3.12 Backlinks `normaliseUrl` misses directory-form canonical URLs — **LOW**
+
+`build/Backlinks.hs:275-281`.
+Strips `.html` but not
+`index.html`/trailing slash: a page routed `essays/foo/index.html` keys as
+`/essays/foo/index`, but a body link authored `/essays/foo/` doesn't
+match — backlink silently dropped. `build/SimilarLinks.hs:97-99` handles
+exactly this case and its comment flags the divergence.
+
+### 3.13 SimilarLinks PDF viewer URL not percent-encoded — **LOW**
+
+`build/SimilarLinks.hs:155-164`.
+`viewerUrl = "/pdfjs/web/viewer.html?file=" ++ escapeHtml raw` —
+`escapeHtml` handles HTML metachars only; a path containing `&`, `?`, `#`,
+or spaces breaks the `file=` query value.
+
+### 3.14 Photography feed thumbnails only for directory-form entries — **LOW**
+
+`build/Photography.hs:449-453`. `imgTag` requires `isDir`; flat singles
+and series children (`/.md`) get text-only feed entries,
+against PHOTOGRAPHY.md's "thumbnails embedded inline" (lines 36, 445) and
+the feed's deliberate inclusion of series children.
+
+### 3.15 Marks: missing confidence/evidence renders a literal "0 TRUST" — **LOW**
+
+`build/Marks.hs:272-278,565`. `computeTrust _ _ = 0` with a comment
+claiming the figure "collapses to the bare frame," but
+`renderEpistemicFigure` unconditionally calls `renderTrustLabel`, so a
+piece with `status:` but no `confidence`/`evidence` (a case MARKS.md:696
+says should render) displays a prominent center "0" — indistinguishable
+from an authored zero-trust score.
+
+### 3.16 Feature-module NITs
+
+- `build/Catalog.hs:228-235`: two distinct unknown categories render as
+  adjacent duplicate "Other" sections (equal rank, `groupBy` on raw
+  string).
+- `build/Stats.hs:754-777`: `pageTOC` comment says "nine h2 sections";
+  lists eleven (matching the eleven rendered).
+- `build/SimilarLinks.hs:51-54`: comment says "the template caps the
+  display"; the code caps it (`take maxSimilar` at line 80).
+- `build/Stats.hs:169-171`, `build/Archive.hs:564-569`: "median" is the
+  upper-median for even-length lists.
+- `build/Backlinks.hs:133-153`: protocol-relative `//host/path` URLs pass
+  `isPageLink` and pollute backlinks.json.
+- `build/BibExtras.hs:75-98`: `@string`/`@comment`/`@preamble` blocks
+  parsed as citekey entries — only consequential on a citekey/macro-name
+  collision.
+
+Verified clean: Marks tick positions/axis order/radii match MARKS.md §3;
+proved-confidence trust substitution matches §4.3; Archive's fail-closed
+`visibility` validation, removed.yaml conflict rejection, and double-sided
+SHA-256 verification all match ARCHIVE.md.
+
+---
+
+## 4. Python & shell tooling
+
+### 4.1 `data/embed-cache-pages.npz.tmp.npz` orphan: explained; cleanup + ignore gaps — **MED**
+
+The orphan (mtime May 26) is the fossil of a fixed bug: an earlier
+embed.py passed a bare path to `np.savez_compressed`, numpy appended
+`.npz` (verified in numpy's `_savez` source), and the subsequent
+`os.replace` raised FileNotFoundError, stranding the file. The current
+file-handle code (`tools/embed.py:173-183`) is correct, but: (a) nothing
+deletes the stale orphan — **delete it, don't commit it**; (b) the tmp
+write has no try/finally, so any mid-write exception strands
+`embed-cache-pages.npz.tmp`; (c) the new `.gitignore` entry is exact-path
+(`data/embed-cache-pages.npz`) and covers neither `.tmp` nor `.tmp.npz`
+variants — widen to `data/embed-cache-pages.npz*`; (d) the fixed tmp name
+means two concurrent runs interleave writes.
+
+### 4.2 Corrupt embed cache crashes instead of being discarded — **MED**
+
+`tools/embed.py:154`. The discard path catches
+`(OSError, KeyError, ValueError)`, but `np.load` on a truncated `.npz`
+raises `zipfile.BadZipFile` (verified MRO: `BadZipFile → Exception`), and
+`EOFError` is also uncaught.
+A half-written cache (exactly what §4.1(b)
+can produce) makes every subsequent build print "Warning: embedding
+failed" and leaves similar-links/semantic index stale until the file is
+manually deleted — the opposite of the docstring's "unreadable →
+discarding" contract.
+
+### 4.3 embed.py staleness check structurally defeated by stamp-build-time — **MED**
+
+`tools/embed.py:195-200` + `Makefile:68`. `needs_update()` compares
+`_site/**/*.html` mtimes against embed's outputs — but the build order is
+`embed.py` → `stamp-build-time.py _site`, and the stamper rewrites the
+footer timestamp in essentially every HTML file each build. So every page
+is always newer than embed's outputs and the "skip if fresh" fast path
+never fires: the full paragraph-embedding pass (and model load) runs on
+every build. The new page cache papers over half the cost; the paragraph
+pass pays full price every time. Related (`tools/embed.py:297-299`):
+model/config changes never invalidate outputs — currently masked by this
+bug; fixing one exposes the other.
+
+### 4.4 archive.py writes provenance/index/state non-atomically — **MED**
+
+`tools/archive.py:718-721,734-737,953-957,1077-1080`. All plain
+`write_text()`. An interrupt mid-write truncates `PROVENANCE.json`; the
+next build's `json.loads` (line 642) raises an unhandled
+`JSONDecodeError` — and a truncated provenance is indistinguishable from
+corruption in a tool whose whole contract is integrity checking. embed.py
+got atomic-write helpers; archive.py did not.
+
+### 4.5 download-leaflet.sh: checksum verification bypassable — **MED**
+
+`tools/download-leaflet.sh:43-47,90`. The early-exit skip checks file
+existence only (download-model.sh re-verifies on its skip path), and
+`curl -o "$target"` writes directly to the final path: a download that
+*fails* `verify_or_warn` aborts via `set -e` *after* the bad file is in
+place, and the next run's existence check accepts it permanently.
+A MITM'd unpkg.com download survives one failed run and is silently
+vendored on the next.
+
+### 4.6 Other download/convert scripts leave partial files in final paths — **LOW**
+
+`tools/download-model.sh:84`: interrupted curl leaves a partial
+`model_quantized.onnx`; caught today only because model-checksums.sha256
+pins all five files — any unpinned file would persist forever. Use
+`-o "$dst.part" && mv`. `tools/convert-images.sh:33`: interrupted cwebp
+leaves a partial `.webp` that the `-nt` staleness gate then skips forever
+— a truncated WebP ships until manually deleted.
+
+### 4.7 archive.py robustness gaps — **LOW**
+
+- `tools/archive.py:788,795-799`: provenance missing the `artifact` key
+  makes `prev_artifact == slug_dir`, then `sha256_of` raises an uncaught
+  `IsADirectoryError` instead of the structured "prior snapshot
+  incomplete" error.
+- `tools/archive.py:614-617,938-940,1066-1068`: non-dict manifest entries
+  (`- https://example.com` instead of `- url: ...`) crash with
+  `AttributeError: 'str' object has no attribute 'get'`.
+- `tools/archive.py:896`: `wayback_save` concatenates the raw URL
+  (contrast `wayback_lookup` at 909, which uses `quote(url, safe="")`).
+
+### 4.8 add-popup-source.sh: dead CSP reminder + unvalidated nginx interpolation — **LOW**
+
+`tools/add-popup-source.sh:214`: the connect-src reminder gates on
+`[[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]]`, but `UPSTREAM_HOST`
+is only set in the `NEEDS_PROXY -eq 1` branch (lines 124-131) — the
+reminder can never print, and the no-proxy case is exactly when it's
+needed (the provider will be CSP-blocked with no hint). Line 71: `NAME`
+from a free-text prompt is interpolated into
+`location /proxy/$NAME/`/`set $upstream_$NAME` with no
+`^[a-z0-9-]+$` validation (import-photo.sh validates; this doesn't).
+
+### 4.9 refreeze.sh deletes the freeze before the replacement succeeds — **LOW**
+
+`tools/refreeze.sh:13-16`.
+`rm -f "$FREEZE"` then `cabal freeze`; a failed
+resolve leaves no freeze file (recoverable via git, but write-temp-then-move
+is safer).
+
+### 4.10 embed.py / atomic-write NITs — **LOW/NIT**
+
+`tools/embed.py:109-115`: `atomic_write_bytes` uses a fixed `.tmp` name
+(concurrent-run collision) and no `fsync` before `os.replace` (power loss
+can leave an empty target). Same pattern in `_atomic_write_yaml` of
+extract-exif.py:377, extract-palette.py:65, extract-dimensions.py:65.
+`tools/embed.py:144`: NpzFile never closed — use
+`with np.load(...) as npz:`.
+
+### 4.11 Tooling NITs
+
+- `tools/import-photo.sh:147-155`: on `mogrify -strip` failure the
+  EXIF-laden JPEG (GPS, serials) remains under `content/`, where
+  `make build`'s `git add content/` could auto-commit it. Delete `$TARGET`
+  on that failure path.
+- `tools/hooks/pre-commit-marks.sh:28-31`: `awk '{ print $2 }'` truncates
+  paths with spaces; the `status:` probe reads the working tree, not the
+  staged blob. Advisory-only hook.
+- `tools/preset-signing-passphrase.sh:30`: `echo -n "$PASSPHRASE"` eats a
+  passphrase made up only of echo option letters (`-e`, `-n`, `-E`, or a
+  combination such as `-neE`); use `printf '%s'`.
+- `tools/stamp-build-time.py:52-54`: in-place non-atomic rewrite of
+  `_site/` HTML.
+- `tools/archive.py:244`: `pdftotext` without `--`; a slug starting with
+  `-` parses as an option. Same in extract-exif.py:159.
+- `tools/monolith-version.txt` records a sha256 (matches the binary
+  today, verified) but `find_monolith()` never checks it.
+
+Verified clean: sign-site.sh (atomic sig writes, post-pass manifest
+verification); compress-assets.sh and download-pdfjs.sh (mktemp + EXIT
+trap, hash verified before extraction); audit-marks.py, viz_theme.py,
+extract-dimensions.py, extract-palette.py; embed.py's faiss `-1` padding
+is safely filtered; `uv lock --check` passes; model-checksums.sha256 pins
+all five model files.
+
+---
+
+## 5. Frontend JavaScript
+
+### 5.1 Score-reader pages never restore theme/settings — **MED**
+
+`templates/score-reader-default.html:10` + `static/js/theme.js:12-13`. The
+template loads `theme.js` without `utils.js` (unlike head.html:66-67), so
+`window.lnUtils.safeStorage` is undefined and theme/text-size/focus-mode/
+reduce-motion all silently fail to restore — a dark-theme user gets a
+light flash-and-stay on every score page. Compounding: settings.js (line
+15; the template does render the settings toggle) falls back to its no-op
+store, so theme picks made on score pages never persist either.
+
+### 5.2 search-filters.js: epistemic filters silently bypass clean-URL pages — **MED**
+
+`static/js/search-filters.js:117-125`. `normUrl()` returns `u.pathname`
+verbatim and looks it up in `epistemicMeta[url]`. Verified:
+`_site/data/epistemic-meta.json` keys include
+`/essays/beyond-comorbidity-indices/index.html` while rendered result
+links use `/essays/beyond-comorbidity-indices/`. The lookup misses, so
+`passes(null)` returns true ("no metadata = don't filter") and every
+directory-style page bypasses all active epistemic filters. Flat `.html`
+pages match fine, which hides the bug.
+
+### 5.3 viz.js ignores the cappuccino theme — **MED**
+
+`static/js/viz.js:94-99`. `isDark()` knows only
+`'dark'`/`'light'`/OS-preference, but theme.js/settings.js support
+`'cappuccino'` — a dark-brown theme (`--bg: #553a28`, base.css:203). With
+OS-light + cappuccino, charts render the LIGHT config (near-black marks
+and axis labels) on a dark background.
+
+### 5.4 collapse.js localStorage keys collide across pages — **MED**
+
+`static/js/collapse.js:44,83`. Key is
+`'section-collapsed:' + heading.id` with no pathname namespace (contrast
+annotations.js). Pandoc auto-slugs (`#introduction`, `#background`) recur
+across essays, so collapsing "Introduction" on one essay collapses it
+everywhere. Also uses raw `localStorage` rather than
+`lnUtils.safeStorage`.
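A minimal fix is to fold the page path into the key. The sketch below models only the key scheme, written in Python; in collapse.js the path would come from `location.pathname`. `collapse_key` and `KEY_PREFIX` are hypothetical names. Folding a trailing `index.html` into the directory form is an added assumption, so that the two URL shapes noted in §5.2 share one collapse state.

```python
from urllib.parse import urlsplit

KEY_PREFIX = "section-collapsed:"


def collapse_key(page_url: str, heading_id: str) -> str:
    """Return a per-page storage key for a collapsed section.

    `page_url` may be a full URL or a bare path such as the browser's
    `location.pathname`.
    """
    path = urlsplit(page_url).path or "/"
    # Fold /essays/foo/index.html into /essays/foo/ so both URL forms
    # map to the same key.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return f"{KEY_PREFIX}{path}#{heading_id}"


if __name__ == "__main__":
    # The same Pandoc slug on two essays now yields two distinct keys.
    print(collapse_key("/essays/a/", "introduction"))
    # section-collapsed:/essays/a/#introduction
    print(collapse_key("/essays/b/", "introduction"))
    # section-collapsed:/essays/b/#introduction
```

In collapse.js, a string built this way would replace `'section-collapsed:' + heading.id` at the cited locations (lines 44 and 83). State saved under the old un-namespaced keys would no longer be read, which is harmless for a collapse preference.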
+
+### 5.5 semantic-search.js: stale-response race + duplicate index fetch — **MED**
+
+`static/js/semantic-search.js:117-144`. `runSearch` has no generation
+token; overlapping queries render in promise-resolution order, so an
+older query's hits can replace a newer one's (with `setStatus('')`
+masking it). `loadIndex()` (42-59) has no in-flight-promise dedup (unlike
+`loadModel`'s `loadModelPromise`), so concurrent first searches fetch
+`semantic-index.bin` + `semantic-meta.json` twice.
+
+### 5.6 lightbox.js: aria-modal with no focus trap, no keyboard activation — **MED**
+
+`static/js/lightbox.js`. The overlay sets `role="dialog"` +
+`aria-modal="true"` but has no Tab handling (gallery.js's `trapTab` at
+235-257 shows the in-repo pattern), so focus walks into the obscured page.
+Trigger images get only a `click` listener and no `tabindex`/keydown, so
+keyboard users can't open it; `close()` focuses a non-focusable `<img>`,
+which no-ops.
+
+### 5.7 Frontend LOWs
+
+- `static/js/gallery.js:122-125,270-275`: the math/score overlay is
+  click-only (no role/tabindex/keydown); `closeOverlay()` returns focus
+  to a non-focusable div, so focus drops to `<body>`.
+- `static/js/popups.js:478,515`: the Wikipedia provider's
+  `decodeURIComponent` runs synchronously before the `.catch` attaches,
+  so a malformed percent sequence in a link path throws an uncaught
+  `URIError` per hover.
+- `static/js/popups.js:359,390`: the fetched monogram SVG is injected via
+  `innerHTML` unescaped, the single unsanitized path in an otherwise
+  fully escaped pipeline. It is build-authored content, so not exploitable
+  today; the comment acknowledges the trust assumption.
+- `static/js/citations.js`: dead file; no template loads it, and popups.js
+  supersedes it. If ever re-added it would double-bind and inject
+  bibliography innerHTML without popups.js's cloned-node hardening.
+  Delete.
+- `static/js/nav.js:26,30-31`: raw `localStorage` is unguarded; if storage
+  access throws, the throw lands before `toggle.addEventListener`,
+  leaving the Portals toggle completely dead (utils.js exists precisely
+  for this).
+- `static/js/annotations.js:209-215`: marks are mouse-only; the tooltip's
+  Delete button is unreachable by keyboard (the only recourse is the
+  all-or-nothing "Clear Annotations").
+- `static/js/search.js:10`: unguarded `new PagefindUI(...)`; if the
+  pagefind bundle 404s, the ReferenceError aborts the whole handler,
+  including the `?q=` pre-fill that the selection-popup "Here" flow
+  depends on.
+- `static/js/semantic-search.js:55-56,96-107`: no
+  `vectors.length === meta.length * DIM` consistency check, so a stale
+  CDN-cached mismatch yields NaN scores and a silently garbage ranking.
+  (Current files verified consistent: 1,256,448 bytes = 818 × 384 × 4.)
+- `static/js/transclude.js:149-151` + `collapse.js:111-114`: nested
+  transcludes render a bare placeholder (no rescan of injected content);
+  `reinitCollapse` is not idempotent (it would stack toggle buttons if
+  ever called twice on the same container).
+- `static/js/popups.js:985-988,1009-1014`: `daysBetween` uses `Math.abs`,
+  so future dates render "N days ago" (now.js:17 handles this correctly).
+
+### 5.8 Frontend NITs
+
+- `static/js/copy.js:20-22,39`: code-less `<pre>` fallback copies the
+  "copy" button label along with content.
+- `static/js/score-reader.js:50`: URL rewritten to `?p=1` on every load
+  even without a `?p=` param.
+- `static/js/search-filters.js:271`: `parseInt(v,10) || 0` turns junk
+  threshold input into an active ≥0 filter that matches everything.
+- `static/js/selection-popup.js:90-95`: shift-keyup while typing capitals
+  in the annotation picker re-summons the selection toolbar over it.
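One option for the `search-filters.js:271` threshold NIT is to parse strictly and treat anything that is not a whole number as "no filter". Below is a sketch. It assumes thresholds are non-negative integers and that the caller skips the filter when the result is `null`. `parseThreshold` is a hypothetical name.

```typescript
// Hypothetical strict parser: returns null for junk so the caller can leave
// the filter inactive, instead of `parseInt(v, 10) || 0` silently producing
// an active ">= 0" filter. Assumes thresholds are non-negative integers.
function parseThreshold(raw: string): number | null {
  const trimmed = raw.trim();
  // Digits only: parseInt would accept "12abc" as 12.
  if (!/^\d+$/.test(trimmed)) return null;
  return Number.parseInt(trimmed, 10);
}
```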
+
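The §5.7 `daysBetween` issue comes from discarding the sign of the difference. Below is a sketch of a signed formatter, assuming day granularity and English strings modeled on the existing "N days ago". `relativeDays` and the "in N days"/"today" wording are hypothetical. now.js:17 may already contain equivalent logic worth reusing instead.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical signed relative-date formatter. Unlike a Math.abs-based
// daysBetween, it keeps the direction so future dates are not shown as past.
function relativeDays(then: Date, now: Date = new Date()): string {
  // Positive when `then` lies in the past, negative when it lies in the future.
  const days = Math.round((now.getTime() - then.getTime()) / MS_PER_DAY);
  if (days === 0) return "today";
  const n = Math.abs(days);
  const unit = n === 1 ? "day" : "days";
  return days > 0 ? `${n} ${unit} ago` : `in ${n} ${unit}`;
}
```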
+Verified clean: the semantic-search ↔ embed.py contract post-model-split
+(DIM 384, 818-entry meta, no prefix for MiniLM — the nomic
+`search_document:` prefix is confined to the build-only page path); XSS
+escaping across semantic-search, popups providers, map tooltips,
+annotations (sole exception: the §5.7 monogram SVG); theme.js ↔ settings.js
+storage schema identical; all JS selector contracts against templates
+(including the uncommitted head/nav edits); popups/sidenotes
+double-init guards; settings.js and gallery.js focus traps.
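For §5.2, the lookup only needs both sides to agree on one canonical path. Below is a sketch that maps directory-style paths onto the `.../index.html` form used by the `epistemic-meta.json` keys quoted in that finding. It reuses the name `normUrl` but is not the current implementation. The default base URL is an assumption; in the browser, `location.href` would be passed instead.

```typescript
// Sketch of a canonicalizing normUrl. Assumption: epistemic-meta.json keys
// use the ".../index.html" form, so directory-style paths are rewritten to it.
function normUrl(href: string, base = "https://levineuwirth.org"): string {
  // URL parsing drops any query string and fragment from the pathname.
  const path = new URL(href, base).pathname;
  // Clean URLs ("/essays/x/") map onto the index.html key form.
  return path.endsWith("/") ? `${path}index.html` : path;
}
```

Normalizing the keys when `epistemic-meta.json` is generated would work equally well. What matters is that result links and keys pass through the same canonicalization.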
+
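The two §5.5 problems have standard fixes:

- Share one in-flight promise for the index load.
- Tag each query with a generation number so that only the newest result renders.

Below is a sketch of both as generic helpers. `memoizeOnce` and `latestOnly` are hypothetical names. Wiring them into `loadIndex`/`runSearch`, including where `setStatus` is called, is left out.

```typescript
// Hypothetical helper: concurrent callers share one in-flight load, the same
// idea as loadModel's loadModelPromise.
function memoizeOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending !== null) return pending;
    const p = load().catch((err: unknown) => {
      // Forget a failed load so a later search can retry the fetch.
      pending = null;
      throw err;
    });
    pending = p;
    return p;
  };
}

// Hypothetical helper: each call takes a generation number, and results from
// a call that has since been superseded are dropped instead of rendered.
function latestOnly<A, R>(
  run: (arg: A) => Promise<R>,
  render: (result: R) => void,
): (arg: A) => Promise<void> {
  let generation = 0;
  return async (arg: A) => {
    const mine = ++generation;
    const result = await run(arg);
    // A newer query started while this one was awaiting: discard.
    if (mine !== generation) return;
    render(result);
  };
}
```

Clearing the memoized promise on rejection is a design choice. It keeps a transient network failure from permanently disabling search; `loadModelPromise` may or may not behave the same way.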
+---
+
+## 6. Templates & content
+
+### 6.1 Draft in undocumented location is never built — **MED**
+
+`content/drafts/inclusionist-manifesto.md`. WRITING.md:34 says drafts go
+under `content/drafts/essays/`; `draftEssayPattern`
+(`build/Patterns.hs:46-49`) matches only that, so this file is invisible
+even to `make watch`/`make dev` — silently orphaned.
+
+### 6.2 SIMD/PQC essay `repository:` URL 404s — **MED**
+
+`content/essays/where-does-simd-help-post-quantum-cryptography/index.md:24`.
+`https://git.levineuwirth.org/where-simd-helps` is missing the owner
+segment — verified HTTP 404, while the sibling essay's
+`.../neuwirth/beyond_comorbidity_indices` returns 200.
+
+### 6.3 Tracked drafts contradict the gitignore policy — **MED**
+
+`.gitignore:88` ignores `content/drafts/` as local-only "working notes,"
+but `git ls-files -i -c --exclude-standard` shows four tracked drafts
+(`digital_progeny.md`, `modern_idolatry.md`, `test-essay.md`,
+`university_care.md`) — ignore rules don't untrack, so edits are
+auto-staged by `make build` and pushed publicly by deploy. The over-broad
+`**/.env.*` pattern also matches the tracked `.env.example`.
+
+### 6.4 Template/content LOWs and NITs
+
+- `content/colophon.md:5`: `modified:` is dead frontmatter — nothing
+  reads it; `$date-modified$` (page-footer.html:108) is Hakyll's
+  `dateField` over the `date` key.
+- Seven files end frontmatter with a valueless `confidence-history:`
+  (YAML null; WRITING.md:97 documents a list of ints) — harmless, but
+  `content/essays/scaling_outage.md` also retains the full WRITING.md
+  scaffold comments in a published essay.
+- `static/images/canto31.jpg`: still 4.0 MB (prior-audit §6.1 unfixed).
+- `templates/blog-post.html:25,34`: `id="similar-links"` appears twice in
+  mutually exclusive `$if$` branches: safe today, but fragile under edits.
+- `content/drafts/essays/digital_progeny.md`: title duplicates the
+  published "The Specification Dilemma" — stale draft.
+- Frontmatter flags `home:`/`library:`/`links:`/`search:`/`portal:` are
+  consumed (head.html CSS gates, default.html:6 `data-portal`) but
+  undocumented in WRITING.md.
+
+Verified clean: all `$partial(...)$` includes resolve; all ~140 distinct
+template variables have context providers; no missing `alt` attributes,
+tag-balance failures, or within-page duplicate IDs in composed pages; all
+26 CSS files referenced by head.html exist; sampled enum values across
+all sections are legal per WRITING.md and Contexts.hs validation lists.
+
+---
+
+## 7. Documentation / spec drift (WRITING.md, README.md)
+
+### 7.1 `js:` page-script paths documented as content-relative; emitted root-relative — **MED**
+
+`WRITING.md:773-775` vs `templates/default.html:37`
+(`