---
title: Repository audit
date: 2026-06-09
---
# Repository audit — levineuwirth.org (2026-06-09)
Comprehensive audit of the repo on `main` at commit `620b974` (working tree
modified: branding refresh across `static/` + `templates/partials/`, plus
`tools/embed.py` rework; untracked `static/og-image.png`,
`templates/partials/logo-mark.svg`, `data/embed-cache-pages.npz.tmp.npz`).

Severity legend: **HIGH** (likely to break a build, cause data loss, or
expose a security weakness) — **MED** (latent bug, brittleness, or
documentation drift) — **LOW** (minor robustness gap or fragile assumption) —
**NIT** (style, polish, or paranoia).

Numbers are file:line against the working tree at audit time. Findings
marked "verified" were reproduced empirically (solver runs, built `_site/`
output inspection, live HTTP checks, binary parsing); the rest were
confirmed by reading the code.

Prior audit: `AUDIT.md` (2026-05-07). Follow-up status in §10.

---
## 1. Build & dependency chain
### 1.1 `cabal.project.freeze` is unsolvable again — next clean build fails — **HIGH**
`cabal build --dry-run` fails today (verified): the freeze pins
`distributive ==0.6.2.1`, but the system (pacman) GHC package db has
`comonad-5.0.10` built against `distributive-0.6.3`:
```
rejecting: distributive-0.6.3/installed... (constraint from
cabal.project.freeze requires ==0.6.2.1)
After searching the rest of the dependency tree exhaustively...
```
The conflict set also names aeson, warp, hakyll, http2, semigroupoids. This
is the same failure mode as prior-audit §1.1 — that audit's specific aeson
pin was fixed (now 2.2.2.0/hashable 1.4.7.0), but a different package broke
the same way after a system update. Recent builds succeed only off the
cached `dist-newstyle/cache/plan.json`; the freeze file has since changed,
so the next cabal invocation re-solves and fails. Because `make deploy`
starts with `make clean`, the next deploy hits this. `levineuwirth.cabal`'s
own bounds are compatible with the freeze — the conflict is
freeze-vs-installed-db, not freeze-vs-cabal-file.
Fix: `tools/refreeze.sh` (written for exactly this post-`pacman -Syu`
situation). The underlying fragility — freezing against a mutable system
package db — remains; consider documenting the refreeze step as part of any
system-upgrade ritual. *(In progress at time of writing.)*
### 1.2 Missing `data/archive-index.json` / `archive-state.json` crashes the build — **HIGH**
`build/ArchiveIndex.hs:134-146`. The module doc (lines 18-22) promises "An
absent or malformed file degrades safely: an empty index makes the link
consumers no-op; an absent state file makes every entry @Live@." But
`rawIndex = unsafePerformIO $ do decoded <- A.eitherDecodeFileStrict' indexPath`
(and identically `rawState`) never checks `doesFileExist`, and aeson's
`eitherDecodeFileStrict'` throws an uncaught `IOException` on a missing
file (verified: `withBinaryFile: does not exist`). Both files are
gitignored (`.gitignore:84-85`), so a fresh clone or a no-`.venv` build —
the exact path `build/Archive.hs:20-24` promises to support — throws when
the CAF is first forced. Contrast `readUrlSet` (line 109) in the same file,
which guards correctly. Currently latent on this machine only because both
generated files happen to exist.
### 1.3 `embed.py` `trust_remote_code=True` executes unpinned third-party code — **HIGH**
`tools/embed.py:329` (line ~341 in the uncommitted version). The new
page-model load is
`SentenceTransformer(PAGE_MODEL_NAME, revision=PAGE_MODEL_REVISION, trust_remote_code=True)`.
The `revision` arg pins only the `nomic-ai/nomic-embed-text-v1.5` repo; the
actual modeling code is pulled via `auto_map` from a *different* repo —
verified in the local HF cache: the executed code lives under
`transformers_modules/nomic_hyphen_ai/nomic_hyphen_bert_hyphen_2048/...`,
i.e. `nomic-ai/nomic-bert-2048` at its current head, which nothing pins. A
compromise of that second repo runs arbitrary Python at build time, in a
repo whose every other download path (download-model.sh, pdfjs, leaflet) is
sha256-pinned. The comment "Both pins are deliberate" is therefore
misleading. Fix: pin via `code_revision`, or run with `HF_HUB_OFFLINE=1`
after first fetch, or document the accepted risk.
### 1.4 Working-tree commit hazard: tracked templates reference untracked files — **HIGH (process)**
`templates/partials/nav.html:5` (tracked, modified) adds
`$partial("templates/partials/logo-mark.svg")$` and
`templates/partials/head.html` references `/og-image.png` — both target
files are **untracked** (no git history). Committing the template diff
without `git add`-ing both breaks every page's Hakyll build on a fresh
clone (`$partial$` aborts compilation) and 404s the og:image. They must
land in the same commit. Conversely, `data/embed-cache-pages.npz.tmp.npz`
must **not** be committed (see §4.1). The partial itself is safe as a
Hakyll template (verified: zero `$` characters; `match "templates/**"`
compiles it).
### 1.5 `einops` dependency: undocumented, unbounded, imported nowhere — **LOW**
`pyproject.toml:27` adds `einops>=0.8.2`. No import anywhere in
`tools/`/`build/`/`static/js/`; its only consumer is nomic's
`trust_remote_code` module (§1.3). Every sibling dependency has an
explanatory comment and an upper bound per the file's own stated policy
("Upper bounds are intentionally generous (next major) but always
present"); einops has neither. `uv lock --check` passes (0.8.2 pinned).

---
## 2. Haskell build code — core
### 2.1 Nav, home grid, and library link `/fiction/` and `/poetry/` — confirmed 404s — **MED**
`build/Site.hs:50-60` (`homePortals` contains `("Fiction","fiction")`,
`("Poetry","poetry")`), `templates/partials/nav.html:56,61`,
`templates/library.html:44,58`. No rule generates either index: fiction and
poetry are not in `tagIndexable` (`build/Patterns.hs:148-151` = essays +
blog + photos) and Site.hs has no landing rule. Verified: `_site/fiction`
does not exist; `_site/poetry/` has no `index.html`. nginx has no
redirects. Both links 404 in production today.
### 2.2 Tag/route collisions guarded for `photography` only — **MED**
`build/Tags.hs:98-99`. `tagIdentifier` maps tag `t` → `t ++ "/index.html"`;
`sectionOwnedTopLevelTags = ["photography"]` is the only guard. A
tagIndexable item tagged `music` (or `music/x`, which expands to `music`)
emits `music/index.html`, already owned by the music index route
(`build/Site.hs:486-487`); similarly `essays`, `blog`, `cv`, `archive`,
`authors`, `bibliography`. Hakyll does not error on duplicate routes — one
silently overwrites the other.
### 2.3 Sidenotes filter destroys the documented no-JS fallback — **MED**
`build/Filters/Sidenotes.hs:30-36` vs `static/css/sidenotes.css:125-135`.
The module doc claims the Pandoc `<section class="footnotes">` "serves as
fallback," but `apply` replaces every `Note`, so the writer never emits the
section. CSS depends on it below 1500px. Verified in output:
`_site/essays/scaling_outage.html` has 3 `class="sidenote"` and zero
`footnotes` occurrences. With JS disabled, footnote content is invisible on
narrow viewports. The comment, the CSS, and ozymandias.md's own prose all
contradict actual behavior.
### 2.4 Sidenote bodies rendered without the KaTeX writer — **MED**
`build/Filters/Sidenotes.hs:103-115`. `inlinesToHtml`/`blocksToHtml` use
`writeHtml5String (def :: WriterOptions)` (PlainMath), while the main
pipeline uses `KaTeX ""` (`build/Compilers.hs:47`). Math inside a footnote
never gets `<span class="math inline">\(...\)</span>`, so KaTeX never
renders it — degrades to plain italics, silently inconsistent with body
math.
### 2.5 SourceRefs whitelist vs `/source/` serving whitelist have drifted — **MED**
`build/Filters/SourceRefs.hs:114-141` vs `build/Site.hs:217-240`. Site.hs:209
says "must stay aligned with 'isSourcePath'". Mismatches: SourceRefs wraps
`content/` and `yaml-source/` (no Site counterpart); `static/` + any known
ext vs Site's `static/js/**`/`static/css/**` only; `tools/` + any ext vs
Site's `tools/**.sh`/`tools/**.py`; `data/` at any depth vs Site's
top-level `data/*.{json,yaml,md,bib}`. Each mismatch yields a wrapped
source-ref whose popup fetch 404s (Forgejo href fallback still works).
Inverse: Site serves `data/*.bib` but `.bib` is missing from
`hasKnownExt` — dead whitelist entry.
### 2.6 `epistemicEntry` ignores `confidence: proved` — **MED**
`build/Site.hs:1014-1024`. Comment: "Compute overall-score the same way
Contexts.overallScoreField does," but it uses
`readMaybe =<< lookupString "confidence" meta`, which is `Nothing` for
`"proved"`/`"proven"`, whereas `Contexts.overallScoreField`
(`build/Contexts.hs:574-576`) substitutes 100 via `isProvedConfidence`.
Proved pages get no `score` in `data/epistemic-meta.json` and export the
raw string under `confidence`, so client-side filtering silently misses
them.
### 2.7 Empty affiliation `<div>` ships on every essay without `affiliation:` — **MED**
`build/Contexts.hs:84-89` + `templates/partials/metadata-tail.html:12`.
`affiliationField` returns an empty list instead of `noResult`; Hakyll's
`$if$` is truthy for empty list fields (the codebase knows this —
`tagLinksFieldExcludingScope` uses `noResult` for exactly this reason).
Verified in output: `_site/essays/asymmetric-forgetting.html` contains
`<div class="meta-row meta-affiliation">` with whitespace-only content.
### 2.8 Library page hard-depends on `content/library.md` — **LOW**
`build/Site.hs:675`. `_ <- loadSnapshot libraryIntroId "body"` is a
top-level compiler statement (not inside a `field`), so it's a hard
failure. The block is documented as "optional prose block"; deleting
`content/library.md` breaks the whole `library.html` compile. Contrast the
existence-guarded sidecars at `build/Tags.hs:277-283` and
`build/Site.hs:843-850`.
### 2.9 Library `primaryPortalOf` reads only list-form `tags:` — **LOW**
`build/Site.hs:632-638`. `lookupStringList "tags"` returns `Nothing` for
scalar comma form (`tags: research, ai`), which Hakyll's `getTags`
accepts. Such an item appears on tag pages but is silently dropped from
the library. All current content uses list form — latent.
### 2.10 `allContent` omits me/, memento-mori/, photography from the link graph — **LOW**
`build/Patterns.hs:124-133`, used by `build/Backlinks.hs:334,345`. Despite
"Every content file the backlinks pass should index," `content/me/index.md`
and `content/memento-mori/index.md` (full essays, rendered with
`backlinksField`) never have their outgoing links extracted; photography
likewise. Either deliberate-but-undocumented or the exact silent omission
the module header says it exists to prevent.
### 2.11 Paginated tag pages: split by creation date, sorted by display date — **LOW**
`build/Tags.hs:371-377`. `buildPaginateWith (sortAndGroupAt tagPageSize)`
partitions via `sortRecentFirst` (creation date), then each page re-sorts
with `recentFirstByDisplay` (revision-aware). A recently revised old item
stays on a late page but jumps to its top — cross-page ordering is not
monotone. Only fires above the 150-item threshold.
### 2.12 `fill:#000` replacement corrupts longer hex colors — **LOW**
`build/Filters/Score.hs:118-133` (and `Filters/Viz.hs` `processColors`).
The 6-digit pass protects only `#000000`; for `fill:#000080` the 3-digit
pass produces `fill:currentColor80` — invalid CSS, silently mangled SVG.
Quoted attribute forms are safe; only unquoted style-property forms are
exposed.
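
One way to close the gap is to require that the 3-digit match is not followed by another hex digit. The sketch below shows that idea in Python rather than in the Haskell of `Filters/Score.hs`; the function name `replace_black` is hypothetical, and it handles only the `fill:` form named above.

```python
import re

# Hypothetical stand-in for the colour pass; the real code is Haskell.
# The negative lookahead refuses to match when more hex digits follow,
# so "#000080" is left alone instead of becoming "currentColor80".
_BLACK6 = re.compile(r"fill:#000000(?![0-9a-fA-F])")
_BLACK3 = re.compile(r"fill:#000(?![0-9a-fA-F])")


def replace_black(style: str) -> str:
    """Rewrite pure-black fills to currentColor, leaving other colours intact."""
    style = _BLACK6.sub("fill:currentColor", style)
    return _BLACK3.sub("fill:currentColor", style)
```

With this guard the order of the two passes no longer matters, because neither pattern can match a prefix of a longer hex colour.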
### 2.13 Source-level preprocessors rewrite inside fenced code blocks — **LOW**
`build/Filters/Wikilinks.hs:24-31`, `Filters/Transclusion.hs:18-20`,
`Filters/EmbedPdf.hs`. All run on the raw source before Pandoc parses
fences: `[[anything]]` in a code block becomes a link; a code-block line
that is exactly `{{slug}}` or `{{pdf:...}}` becomes raw HTML.
Transclusion's comment ("prevents accidental substitution inside prose or
code") is false for full-line directives in code blocks. A live foot-gun
for a site that documents its own syntax (ozymandias.md does exactly
this).
### 2.14 `domainIcon` matches substrings of the whole URL, not the host — **LOW**
`build/Filters/Links.hs:120-153`. `"x.com" `T.isInfixOf` url` etc. —
`https://example.org/why-x.com-failed` gets the Twitter icon. Contradicts
the strict-hostname discipline `isExternal` documents at lines 95-101 of
the same file. Cosmetic (icon only).
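
A host-based match would look roughly like the following Python sketch. It is an illustration of the hostname discipline, not the Haskell in `Filters/Links.hs`; the `ICON_BY_HOST` table and the `twitter.com` entry are assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical icon table; the real mapping lives in build/Filters/Links.hs.
ICON_BY_HOST = {"x.com": "twitter", "twitter.com": "twitter"}


def domain_icon(url: str) -> Optional[str]:
    """Pick an icon from the URL's host only, never from its path or query."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, icon in ICON_BY_HOST.items():
        # Exact host or a true subdomain; "notx.com" must not match "x.com".
        if host == domain or host.endswith("." + domain):
            return icon
    return None
```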
### 2.15 `gsubRoute "content/"` strips every occurrence, not just the prefix — **LOW**
`build/Site.hs:171,357,417` etc. Hakyll's `gsubRoute` is replace-all; a
co-located directory literally named `content` would be silently mangled
(`content/essays/slug/content/data.csv` → `essays/slug/data.csv`). Same
for `gsubRoute "static/"`. Improbable but silent.
### 2.16 `existsCached` memoizes non-existence for the process lifetime — **LOW**
`build/Filters/SourceRefs.hs:160-166`. Under `make watch`, a source file
created after first reference stays cached as absent until restart.
### 2.17 Core NITs
- `build/Site.hs:42-44`: comment says "eight portals"; the list has nine.
Echoed at Site.hs:606 ("the eight") vs line 657's "nine times".
- `build/Site.hs:866-877`: random-pages.json comment says "essays + blog
posts only" but the rule loads fiction and flat poetry too; uses
flat-only `content/poetry/*.md` while the epistemic rule uses
`allPoetry` — collection poems are epistemic-indexed but never
randomizable.
- `build/Utils.hs:64-73`: `authorSlugify` comment claims runs of spaces
collapse; code maps each space (`"A B"` → `"a--b"`). Consistent
everywhere, so links work; comment wrong.
- `build/Utils.hs:31-32`: `readingTime` truncates (`div 200`) — 399 words
reports "1 min"; comment implies ceiling semantics.
- `build/Pagination.hs:42` + `build/Site.hs:77-82`: hardcoded pattern
literals duplicate `Patterns.hs`, defeating that module's stated purpose
(Patterns.hs:6-10).
- `build/Contexts.hs:174-180`: plain `tagLinksField` returns an empty list
rather than `noResult`, so `$if(item-tags)$` is true and templates emit
empty tag wrappers (author-index.html, item-card.html).
- `build/Tags.hs:296-304`: `tagItemCtx` composes `defaultContext`, not
`siteCtx`, so `$if(has-monogram)$` never fires on tag pages — monograms
render on new.html/library but silently never on tag indexes.
- `build/Contexts.hs:485-492`: `dotsField` comment says "1–5" but accepts
0 (`max 0 (min 5 n)`) — `importance: 0` renders five empty circles.
- `build/Contexts.hs:375-381`: `descriptionField` doc says `noResult`;
code uses `fail` — behaviorally fine under Hakyll 4.16 `$if$` (verified
against Hakyll 4.16.7.1 source) but logs `[ERROR]` debug noise per
abstract-less page. Same in `abstractField`, `summaryField`,
`bibliographyField`.
- `build/Filters/Images.hs:233-234`: `webpSrc` interpolated into `srcset`
unescaped while sibling `src` goes through `esc`.
- `build/Filters/Links.hs:37-46,63-69`: internal PDF links double-classified
(`pdf-link` + `link-internal` chrome) despite the "no overlap" comment.
- `build/Filters/Smallcaps.hs:31-34` + `Filters/Archive.hs:42-44`:
"headers are skipped" only at top level; a Header nested in a
Div/BlockQuote is processed, contradicting the comments.

Verified clean: no unguarded `head`/`fromJust`/`read`/`!!` hazards in the
core modules; filter composition order matches its documenting comments;
Hakyll 4.16.7.1 `$if$` treats both `fail` and `noResult` as false.

---
## 3. Haskell build code — feature modules
### 3.1 Stats heatmap day-of-week off-by-one: Sunday clipped out of the SVG — **MED**
`build/Stats.hs:185,300,317`. `dowOf d = fromEnum (dayOfWeek d) -- Mon=0..Sun=6`
— but `time-1.12.2` is ISO-numbered (verified:
`map fromEnum [Monday..Sunday] == [1..7]`). So Sunday lands at y=106 while
`svgH` = 104 — every Sunday cell is clipped out of the viewBox and grid
row 0 is permanently blank. Relatedly, `weekStart` returns the previous
*Sunday* (and for a Sunday, 7 days back), not the "first Monday on or
before" its comment claims; builds run on a Sunday also clip the newest
column horizontally.
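
The intended mapping can be sketched in Python, whose `date.isoweekday()` uses the same ISO numbering (Mon=1..Sun=7) that the finding reports for `time-1.12.2`. The helper names are hypothetical, and this illustrates only the arithmetic, not the SVG layout in `build/Stats.hs`.

```python
from datetime import date, timedelta


def dow_row(d: date) -> int:
    """Grid row with Monday=0 .. Sunday=6."""
    # isoweekday() is ISO-numbered (Mon=1 .. Sun=7), so shift down by one.
    return d.isoweekday() - 1


def week_start(d: date) -> date:
    """First Monday on or before d (d itself when d is a Monday)."""
    return d - timedelta(days=dow_row(d))
```

Subtracting one from the ISO number keeps Sunday on row 6 and restores row 0 for Monday; deriving `week_start` from the same row index keeps the two helpers from drifting apart.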
### 3.2 `Commonplace.hs` uses `Char8.pack` — non-ASCII YAML corruption — **MED**
`build/Commonplace.hs:143`. `Y.decodeEither' (BS.pack raw)` with
`Data.ByteString.Char8` truncates each `Char` to 8 bits — the exact hazard
`build/Now.hs:249-253` documents and fixes with `TE.encodeUtf8`.
`data/commonplace.yaml` is currently pure ASCII, so latent — but a
commonplace book of quotations is the likeliest file to acquire an em-dash
or curly quote, which will then either fail the YAML parse or publish
mojibake.
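
The truncation is easy to reproduce outside Haskell. The Python sketch below mimics the 8-bit packing with a hypothetical `char8_pack` helper and compares it with a UTF-8 encode, which is the approach the finding attributes to `build/Now.hs`.

```python
def char8_pack(text: str) -> bytes:
    """Mimic 8-bit packing: keep only the low byte of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


quote = "wait \u2014 then"           # contains an em-dash, U+2014
truncated = char8_pack(quote)        # the em-dash collapses to the single byte 0x14
utf8 = quote.encode("utf-8")         # the em-dash becomes the bytes E2 80 94
```

Pure-ASCII input is identical under both encodings, which is why the bug stays latent until the first non-ASCII character arrives.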
### 3.3 Backlinks: links inside tight lists are invisible — **MED**
`build/Backlinks.hs:220-226`. `extractLinksWithContext`'s `go` handles
`Para`, `BlockQuote`, `Div`, `BulletList`, `OrderedList`, then `go _ = []`.
Tight list items (the default `- item` form) are `Plain` blocks, not
`Para`, so recursion into list children yields nothing. Every internal
link written in a tight list never produces a backlink. `Header`, `Table`,
and `DefinitionList` blocks are likewise skipped. The doc comment implies
coverage it doesn't deliver.
### 3.4 Stability "age" is the first→last commit span, not time since first commit — **MED**
`build/Stability.hs:89-93,99-112`. Docs say "age in days since first
commit," but `classify (length dates) (daySpan (last dates) newest)`
computes the span between first and most recent *commit*, with no
reference to today. A piece written in a one-week burst years ago reports
"volatile" forever; time passing without commits can never increase
stability. Either the comment or the metric is wrong.
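
The difference between the two readings is small in code and large in effect. A minimal Python sketch, with hypothetical function names and made-up dates, assuming the documented meaning is "days since first commit":

```python
from datetime import date


def commit_span_days(commit_dates):
    """What the code is described as computing: first commit to last commit."""
    return (max(commit_dates) - min(commit_dates)).days


def age_days(commit_dates, today):
    """What the docs describe: first commit to today."""
    return (today - min(commit_dates)).days


# A piece written in a one-week burst, then left untouched.
burst = [date(2022, 3, 1), date(2022, 3, 8)]
span = commit_span_days(burst)
age = age_days(burst, date(2026, 6, 9))
```

Here `span` stays fixed at one week no matter how much time passes, while `age` keeps growing; only the second can ever move a quiet page toward a more stable classification.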
### 3.5 Frontmatter `history:` assumed newest-first; WRITING.md documents oldest-first — **MED**
`build/Stability.hs:204-217,299-336` vs `WRITING.md:105-109`.
`loadVersionHistory` keeps authored order and all range fields treat the
head as newest (`es@(newest:_) -> let oldest = last es`). Git history is
newest-first, but WRITING.md's `history:` example is oldest-first. With
the documented ordering, `version-history-range` renders reversed
("14 March 2026 – 1 March 2026"), `range-start` returns the newest date,
and `version-history-primary` shows the three *oldest* entries.
### 3.6 Archive manifest→provenance join is exact-string, rest of system is normalized — **MED**
`build/Archive.hs:269`. `Map.lookup (meUrl me) provByUrl` joins on the raw
URL; everywhere else equivalence is `normalizeUrl` (ArchiveIndex
filtering, dup detection, ARCHIVE.md:189-192). Editing a manifest URL to a
normalization-equivalent form (`http`→`https`, trailing slash, tracking
param) silently unpublishes `/archive/<slug>/` while ArchiveIndex's
normalized filter keeps the slug active — links keep pointing at a 404.
### 3.7 Photography `buildPin` computes wrong slug/thumb/title for flat entries — **MED**
`build/Photography.hs:354,362`. `slug = takeFileName (takeDirectory fp)`;
for a flat `content/photography/foo.md` this yields `"photography"`, so
map.json gets `"slug": "photography"`, the title fallback is wrong, and
`thumb = "/photography/photography/<p>"` 404s (flat-single assets route to
`/photography/<asset>`). PHOTOGRAPHY.md:214 explicitly supports flat
singles. Latent (`content/photography/` currently has only `index.md`),
but it breaks on the first geo-tagged flat single.
### 3.8 `geo-precision` fails open: a typo'd "hidden" publishes coordinates — **MED**
`build/Photography.hs:347-349,312-320`. Only the exact string matches
(`(_, Just "hidden", _) -> return Nothing`); any other value (e.g.
`Hidden`, `hiddn`) falls into `roundCoord`, whose catch-all treats unknown
values as `city` (~10 km rounding) — publishing coordinates the author
meant to suppress. Contradicts the file's own privacy comment (lines
287-289) and the fail-closed precedent for `visibility:` in
`build/Archive.hs:77-83`.
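
A fail-closed version treats the set of publishable precisions as a whitelist and suppresses everything else. The Python sketch below illustrates that shape only; the `DECIMALS` table, the `exact` level, and the choice to suppress a missing value are assumptions, since the audit names just `city` and `hidden`.

```python
from typing import Optional, Tuple

# Hypothetical precision table: decimal places kept per level.
# Only "city" and "hidden" are named in the audit; "exact" is illustrative.
DECIMALS = {"exact": 6, "city": 1}


def publishable_coord(
    lat: float, lon: float, precision: Optional[str]
) -> Optional[Tuple[float, float]]:
    """Return rounded coordinates only for a recognised precision level."""
    key = (precision or "").strip().lower()
    places = DECIMALS.get(key)
    if places is None:
        # "hidden", typos such as "hiddn", and missing values all suppress.
        return None
    return (round(lat, places), round(lon, places))
```

Because `hidden` is simply absent from the table, a misspelling of it lands on the same suppressing branch instead of on a default rounding.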
### 3.9 Archive state is process-lifetime cached — `watch` goes stale — **LOW**
`build/ArchiveIndex.hs:123-146` + `build/Archive.hs:304`.
`activeUrls`/`rawIndex`/`rawState` are NOINLINE `unsafePerformIO` CAFs read
once per process, and `archiveRules` reads the manifest in `preprocess`.
Under `site watch`, edits to `manifest.yaml`, `removed.yaml`, or the
regenerated state JSONs are never re-read until restart. One-shot builds
unaffected.
### 3.10 Pinned pages render raw ISO in `$last-reviewed$` — **LOW**
`build/Stability.hs:166-170`. The git branch formats via `fmtIso`
("1 May 2026"); the IGNORE.txt-pinned branch returns the frontmatter value
verbatim ("2026-05-01") — inconsistent display formatting.
### 3.11 Empty/all-comments `manifest.yaml` halts the build — **LOW**
`build/Archive.hs:158-170`. An empty YAML stream decodes as `Null`, which
fails to parse as `[ManifestEntry]` and takes the `exitFailure` branch —
draining the manifest to zero entries is fatal rather than the empty
archive the absent-file branch supports.
### 3.12 Backlinks `normaliseUrl` misses directory-form canonical URLs — **LOW**
`build/Backlinks.hs:275-281`. Strips `.html` but not
`index.html`/trailing slash: a page routed `essays/foo/index.html` keys as
`/essays/foo/index`, but a body link authored `/essays/foo/` doesn't
match — backlink silently dropped. `build/SimilarLinks.hs:97-99` handles
exactly this case and its comment flags the divergence.
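
The missing cases amount to one extra normalisation step. A Python sketch of a key function that treats the three spellings of a directory-form page as equal (the name `canonical_key` is hypothetical, and this is not a transcription of either Haskell module):

```python
def canonical_key(url: str) -> str:
    """Map /a/b.html, /a/b/index.html and /a/b/ to the same key, /a/b."""
    # Drop any fragment or query string before comparing paths.
    path = url.split("#", 1)[0].split("?", 1)[0]
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]   # leaves the trailing slash
    elif path.endswith(".html"):
        path = path[: -len(".html")]
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")
    return path or "/"
```

Note that this deliberately collapses `/essays/foo.html` and `/essays/foo/index.html` onto one key, which is only safe if the site never routes both forms for different pages.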
### 3.13 SimilarLinks PDF viewer URL not percent-encoded — **LOW**
`build/SimilarLinks.hs:155-164`.
`viewerUrl = "/pdfjs/web/viewer.html?file=" ++ escapeHtml raw`, but
`escapeHtml` handles HTML metachars only; a path containing `&`, `?`, `#`,
or spaces breaks the `file=` query value.
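
The two escaping layers are independent: the path must be percent-encoded as a query value first, and the finished URL HTML-escaped second. A Python sketch of that ordering, using a hypothetical `viewer_href` helper and an invented example path:

```python
from html import escape
from urllib.parse import quote


def viewer_href(pdf_path: str) -> str:
    """Build an attribute-safe viewer URL for a site-relative PDF path."""
    # Layer 1: percent-encode so &, ?, # and spaces stay inside file=...
    file_param = quote(pdf_path, safe="/")
    # Layer 2: HTML-escape the whole URL for use inside an href attribute.
    return escape("/pdfjs/web/viewer.html?file=" + file_param, quote=True)


href = viewer_href("/papers/a&b #1.pdf")
```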
### 3.14 Photography feed thumbnails only for directory-form entries — **LOW**
`build/Photography.hs:449-453`. `imgTag` requires `isDir`; flat singles
and series children (`<series>/<photo>.md`) get text-only feed entries,
against PHOTOGRAPHY.md's "thumbnails embedded inline" (lines 36, 445) and
the feed's deliberate inclusion of series children.
### 3.15 Marks: missing confidence/evidence renders a literal "0 TRUST" — **LOW**
`build/Marks.hs:272-278,565`. `computeTrust _ _ = 0` with a comment
claiming the figure "collapses to the bare frame," but
`renderEpistemicFigure` unconditionally calls `renderTrustLabel`, so a
piece with `status:` but no `confidence`/`evidence` (a case MARKS.md:696
says should render) displays a prominent center "0" — indistinguishable
from an authored zero-trust score.
### 3.16 Feature-module NITs
- `build/Catalog.hs:228-235`: two distinct unknown categories render as
adjacent duplicate "Other" sections (equal rank, `groupBy` on raw
string).
- `build/Stats.hs:754-777`: `pageTOC` comment says "nine h2 sections";
lists eleven (matching the eleven rendered).
- `build/SimilarLinks.hs:51-54`: comment says "the template caps the
display"; the code caps it (`take maxSimilar` at line 80).
- `build/Stats.hs:169-171`, `build/Archive.hs:564-569`: "median" is the
upper-median for even-length lists.
- `build/Backlinks.hs:133-153`: protocol-relative `//host/path` URLs pass
`isPageLink` and pollute backlinks.json.
- `build/BibExtras.hs:75-98`: `@string`/`@comment`/`@preamble` blocks
parsed as citekey entries — only consequential on a citekey/macro-name
collision.

Verified clean: Marks tick positions/axis order/radii match MARKS.md §3;
proved-confidence trust substitution matches §4.3; Archive's fail-closed
`visibility` validation, removed.yaml conflict rejection, and double-sided
SHA-256 verification all match ARCHIVE.md.

---
## 4. Python & shell tooling
### 4.1 `data/embed-cache-pages.npz.tmp.npz` orphan: explained; cleanup + ignore gaps — **MED**
The orphan (mtime May 26) is the fossil of a fixed bug: an earlier
embed.py passed a bare path to `np.savez_compressed`, numpy appended
`.npz` (verified in numpy's `_savez` source), and the subsequent
`os.replace` raised FileNotFoundError, stranding the file. The current
file-handle code (`tools/embed.py:173-183`) is correct, but: (a) nothing
deletes the stale orphan — **delete it, don't commit it**; (b) the tmp
write has no try/finally, so any mid-write exception strands
`embed-cache-pages.npz.tmp`; (c) the new `.gitignore` entry is exact-path
(`data/embed-cache-pages.npz`) and covers neither `.tmp` nor `.tmp.npz`
variants — widen to `data/embed-cache-pages.npz*`; (d) the fixed tmp name
means two concurrent runs interleave writes.
### 4.2 Corrupt embed cache crashes instead of being discarded — **MED**
`tools/embed.py:154`. The discard path catches
`(OSError, KeyError, ValueError)`, but `np.load` on a truncated `.npz`
raises `zipfile.BadZipFile` (verified MRO: `BadZipFile → Exception`), and
`EOFError` is also uncaught. A half-written cache (exactly what §4.1(b)
can produce) makes every subsequent build print "Warning: embedding
failed" and leaves similar-links/semantic index stale until the file is
manually deleted — the opposite of the docstring's "unreadable →
discarding" contract.
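
A discard path that honours the docstring needs the two missing exception types in its tuple. The sketch below uses the standard-library `zipfile` module as a stand-in for `np.load` (an `.npz` file is a zip archive) so that it runs without numpy; `load_cache_or_discard` and `CACHE_ERRORS` are hypothetical names, not the ones in `tools/embed.py`.

```python
import zipfile
from pathlib import Path
from typing import Optional

# Exceptions that mean "cache unreadable, rebuild it". BadZipFile and
# EOFError are the two the audit found missing from the existing tuple.
CACHE_ERRORS = (OSError, KeyError, ValueError, EOFError, zipfile.BadZipFile)


def load_cache_or_discard(path: Path) -> Optional[dict]:
    """Return the archive members, or delete the file and return None."""
    try:
        with zipfile.ZipFile(path) as zf:   # stand-in for np.load(path)
            return {name: zf.read(name) for name in zf.namelist()}
    except CACHE_ERRORS:
        path.unlink(missing_ok=True)        # discard so the next run rebuilds
        return None
```

Deleting the unreadable file inside the handler is a design choice in this sketch: it means a half-written cache costs one rebuild instead of failing every subsequent build.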
### 4.3 embed.py staleness check structurally defeated by stamp-build-time — **MED**
`tools/embed.py:195-200` + `Makefile:68`. `needs_update()` compares
`_site/**/*.html` mtimes against embed's outputs — but the build order is
`embed.py` → `stamp-build-time.py _site`, and the stamper rewrites the
footer timestamp in essentially every HTML file each build. So every page
is always newer than embed's outputs and the "skip if fresh" fast path
never fires: the full paragraph-embedding pass (and model load) runs on
every build. The new page cache papers over half the cost; the paragraph
pass pays full price every time. Related (`tools/embed.py:297-299`):
model/config changes never invalidate outputs — currently masked by this
bug; fixing one exposes the other.
### 4.4 archive.py writes provenance/index/state non-atomically — **MED**
`tools/archive.py:718-721,734-737,953-957,1077-1080`. All plain
`write_text()`. An interrupt mid-write truncates `PROVENANCE.json`; the
next build's `json.loads` (line 642) raises an unhandled
`JSONDecodeError` — and a truncated provenance is indistinguishable from
corruption in a tool whose whole contract is integrity checking. embed.py
got atomic-write helpers; archive.py did not.
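
An atomic JSON write for these call sites could follow the usual write-temp, fsync, rename pattern. This is a minimal sketch, not the helper embed.py already has; `atomic_write_json` is a hypothetical name, and it assumes the target directory exists and sits on one filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(target: Path, payload) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    # A unique temp name in the same directory avoids concurrent-run
    # collisions and keeps os.replace on a single filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())      # data reaches disk before the rename
        os.replace(tmp_name, target)   # atomic swap into the final path
    except BaseException:
        # Never strand a partial temp file next to the real one.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With this shape an interrupt mid-write leaves the previous `PROVENANCE.json` untouched, so a truncated file can again be read as evidence of corruption.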
### 4.5 download-leaflet.sh: checksum verification bypassable — **MED**
`tools/download-leaflet.sh:43-47,90`. The early-exit skip checks file
existence only (download-model.sh re-verifies on its skip path), and
`curl -o "$target"` writes directly to the final path: a download that
*fails* `verify_or_warn` aborts via `set -e` *after* the bad file is in
place, and the next run's existence check accepts it permanently. A
MITM'd unpkg.com download survives one failed run and is silently
vendored on the next.
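
Both halves of the fix are about ordering: verify before the file reaches its final path, and re-verify on the skip path. The Python sketch below shows that ordering only; the script itself is shell, and `install_verified` and `is_vendored` are hypothetical helpers that take already-downloaded bytes.

```python
import hashlib
import os
from pathlib import Path


def install_verified(data: bytes, expected_sha256: str, target: Path) -> None:
    """Place data at target only after its checksum has been confirmed."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {target.name}: got {actual}")
    part = target.with_name(target.name + ".part")
    part.write_bytes(data)
    os.replace(part, target)   # the final path only ever holds verified bytes


def is_vendored(target: Path, expected_sha256: str) -> bool:
    """Skip-path check: existence alone is not enough, so re-hash the file."""
    if not target.is_file():
        return False
    actual = hashlib.sha256(target.read_bytes()).hexdigest()
    return actual == expected_sha256.lower()
```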
### 4.6 Other download/convert scripts leave partial files in final paths — **LOW**
`tools/download-model.sh:84`: interrupted curl leaves a partial
`model_quantized.onnx`; caught today only because model-checksums.sha256
pins all five files — any unpinned file would persist forever. Use
`-o "$dst.part" && mv`. `tools/convert-images.sh:33`: interrupted cwebp
leaves a partial `.webp` that the `-nt` staleness gate then skips forever
— a truncated WebP ships until manually deleted.
### 4.7 archive.py robustness gaps — **LOW**
- `tools/archive.py:788,795-799`: provenance missing the `artifact` key
makes `prev_artifact == slug_dir`, then `sha256_of` raises an uncaught
`IsADirectoryError` instead of the structured "prior snapshot
incomplete" error.
- `tools/archive.py:614-617,938-940,1066-1068`: non-dict manifest entries
(`- https://example.com` instead of `- url: ...`) crash with
`AttributeError: 'str' object has no attribute 'get'`.
- `tools/archive.py:896`: `wayback_save` concatenates the raw URL
(contrast `wayback_lookup` at 909, which uses `quote(url, safe="")`).
### 4.8 add-popup-source.sh: dead CSP reminder + unvalidated nginx interpolation — **LOW**
`tools/add-popup-source.sh:214`: the connect-src reminder gates on
`[[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]]`, but `UPSTREAM_HOST`
is only set in the `NEEDS_PROXY -eq 1` branch (lines 124-131) — the
reminder can never print, and the no-proxy case is exactly when it's
needed (the provider will be CSP-blocked with no hint). Line 71: `NAME`
from a free-text prompt is interpolated into
`location /proxy/$NAME/`/`set $upstream_$NAME` with no
`^[a-z0-9-]+$` validation (import-photo.sh validates; this doesn't).
### 4.9 refreeze.sh deletes the freeze before the replacement succeeds — **LOW**
`tools/refreeze.sh:13-16`. `rm -f "$FREEZE"` then `cabal freeze`; a failed
resolve leaves no freeze file (recoverable via git, but write-temp-then-move
is safer).
### 4.10 embed.py / atomic-write NITs — **LOW/NIT**
`tools/embed.py:109-115`: `atomic_write_bytes` uses a fixed `.tmp` name
(concurrent-run collision) and no `fsync` before `os.replace` (power loss
can leave an empty target). Same pattern in `_atomic_write_yaml` of
extract-exif.py:377, extract-palette.py:65, extract-dimensions.py:65.
`tools/embed.py:144`: NpzFile never closed — use
`with np.load(...) as npz:`.
### 4.11 Tooling NITs
- `tools/import-photo.sh:147-155`: on `mogrify -strip` failure the
EXIF-laden JPEG (GPS, serials) remains under `content/`, where
`make build`'s `git add content/` could auto-commit it. Delete `$TARGET`
on that failure path.
- `tools/hooks/pre-commit-marks.sh:28-31`: `awk '{ print $2 }'` truncates
paths with spaces; the `status:` probe reads the working tree, not the
staged blob. Advisory-only hook.
- `tools/preset-signing-passphrase.sh:30`: `echo -n "$PASSPHRASE"` eats a
passphrase starting with `-e`/`-n`/`-E`; use `printf '%s'`.
- `tools/stamp-build-time.py:52-54`: in-place non-atomic rewrite of
`_site/` HTML.
- `tools/archive.py:244`: `pdftotext` without `--`; a slug starting with
`-` parses as an option. Same in extract-exif.py:159.
- `tools/monolith-version.txt` records a sha256 (matches the binary
today, verified) but `find_monolith()` never checks it.

Verified clean: sign-site.sh (atomic sig writes, post-pass manifest
verification); compress-assets.sh and download-pdfjs.sh (mktemp + EXIT
trap, hash verified before extraction); audit-marks.py, viz_theme.py,
extract-dimensions.py, extract-palette.py; embed.py's faiss `-1` padding
is safely filtered; `uv lock --check` passes; model-checksums.sha256 pins
all five model files.

---
## 5. Frontend JavaScript
### 5.1 Score-reader pages never restore theme/settings — **MED**
`templates/score-reader-default.html:10` + `static/js/theme.js:12-13`. The
template loads `theme.js` without `utils.js` (unlike head.html:66-67), so
`window.lnUtils.safeStorage` is undefined and theme/text-size/focus-mode/
reduce-motion all silently fail to restore — a dark-theme user gets a
light flash-and-stay on every score page. Compounding: settings.js (line
15; the template does render the settings toggle) falls back to its no-op
store, so theme picks made on score pages never persist either.
### 5.2 search-filters.js: epistemic filters silently bypass clean-URL pages — **MED**
`static/js/search-filters.js:117-125`. `normUrl()` returns `u.pathname`
verbatim and looks it up in `epistemicMeta[url]`. Verified:
`_site/data/epistemic-meta.json` keys include
`/essays/beyond-comorbidity-indices/index.html` while rendered result
links use `/essays/beyond-comorbidity-indices/`. The lookup misses,
`passes(null)` returns true ("no metadata = don't filter"), so every
directory-style page bypasses all active epistemic filters. Flat `.html`
pages match fine, which hides the bug.
### 5.3 viz.js ignores the cappuccino theme — **MED**
`static/js/viz.js:94-99`. `isDark()` knows only
`'dark'`/`'light'`/OS-preference, but theme.js/settings.js support
`'cappuccino'` — a dark-brown theme (`--bg: #553a28`, base.css:203). With
OS-light + cappuccino, charts render the LIGHT config (near-black marks
and axis labels) on a dark background.
### 5.4 collapse.js localStorage keys collide across pages — **MED**
`static/js/collapse.js:44,83`. Key is
`'section-collapsed:' + heading.id` with no pathname namespace (contrast
annotations.js). Pandoc auto-slugs (`#introduction`, `#background`) recur
across essays, so collapsing "Introduction" on one essay collapses it
everywhere. Also uses raw `localStorage` rather than
`lnUtils.safeStorage`.
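A minimal sketch of a page-scoped key with a guarded read. The stored value `'1'` and the names `collapseKey` and `readCollapsed` are assumptions for illustration; the `try`/`catch` stands in for whatever `lnUtils.safeStorage` provides:

```javascript
// Hypothetical key builder: scope the collapsed-state key to the page.
function collapseKey(pathname, headingId) {
  return 'section-collapsed:' + pathname + '#' + headingId;
}

// Read the collapsed flag from a Storage-like object.
// Assumption: collapsed sections are stored as the string '1'.
function readCollapsed(storage, pathname, headingId) {
  try {
    return storage.getItem(collapseKey(pathname, headingId)) === '1';
  } catch (e) {
    return false; // storage unavailable: treat the section as expanded
  }
}
```

Existing un-namespaced keys would simply stop matching under this scheme, so no migration step is strictly required.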
### 5.5 semantic-search.js: stale-response race + duplicate index fetch — **MED**
`static/js/semantic-search.js:117-144`. `runSearch` has no generation
token; overlapping queries render in promise-resolution order, so an
older query's hits can replace a newer one's (with `setStatus('')`
masking it). `loadIndex()` (42-59) has no in-flight-promise dedup (unlike
`loadModel`'s `loadModelPromise`), so concurrent first searches fetch
`semantic-index.bin` + `semantic-meta.json` twice.
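Both problems have small, standard fixes. The sketch below shows a generation guard and an in-flight promise cache; `createGenerationGuard` and `dedupeInFlight` are hypothetical names, and the wiring into `runSearch` and `loadIndex()` is left as an assumption:

```javascript
// Generation token: each search takes a ticket; only the newest may render.
function createGenerationGuard() {
  let current = 0;
  return {
    begin() { current += 1; return current; },
    isCurrent(token) { return token === current; },
  };
}

// In-flight dedup: concurrent callers share one promise. A failure clears
// the cached promise so a later call can retry.
function dedupeInFlight(loader) {
  let pending = null;
  return function load() {
    if (pending === null) {
      // The executor runs synchronously, so the loader starts exactly once,
      // and a synchronous throw becomes a rejection.
      pending = new Promise((resolve) => resolve(loader())).catch((err) => {
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}
```

A search would call `begin()` before awaiting and return early if `isCurrent(token)` is false afterwards, so a late response from an older query never reaches the render or the `setStatus('')` call.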
### 5.6 lightbox.js: aria-modal with no focus trap, no keyboard activation — **MED**
`static/js/lightbox.js`. Overlay sets `role="dialog"` +
`aria-modal="true"` but has no Tab handling (gallery.js's `trapTab` at
235-257 shows the in-repo pattern) — focus walks into the obscured page.
Trigger images get only a `click` listener and no `tabindex`/keydown, so
keyboard users can't open it; `close()` focuses a non-focusable `<img>`,
which no-ops.
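The wrap-around step at the heart of a Tab trap can be isolated as a pure function. This is a sketch, not gallery.js's `trapTab`; `nextTrapIndex` is a hypothetical name, and the caller is assumed to hold the dialog's focusable elements in an array:

```javascript
// Given the index of the focused element among `count` focusable elements,
// return the index to focus next when Tab (or Shift+Tab) is pressed.
function nextTrapIndex(index, count, shiftKey) {
  if (count === 0) return -1; // nothing focusable inside the dialog
  if (index < 0) return shiftKey ? count - 1 : 0; // focus was outside
  const step = shiftKey ? -1 : 1;
  return (index + step + count) % count; // wrap at both ends
}
```

A `keydown` handler for `Tab` would call `preventDefault()` and focus the element at the returned index. The trigger images would additionally need `tabindex="0"` and an Enter/Space handler, and `close()` would need a focusable element to return to.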
### 5.7 Frontend LOWs
- `static/js/gallery.js:122-125,270-275`: math/score overlay is
click-only (no role/tabindex/keydown); `closeOverlay()` focus-returns
to a non-focusable div — focus drops to `<body>`.
- `static/js/popups.js:478,515`: the Wikipedia provider's
`decodeURIComponent` runs synchronously before the `.catch` attaches —
a malformed percent sequence in a link path throws an uncaught
`URIError` per hover.
- `static/js/popups.js:359,390`: fetched monogram SVG injected via
`innerHTML` unescaped — the single unsanitized path in an otherwise
fully escaped pipeline. Build-authored content, so not exploitable
today; the comment acknowledges the trust assumption.
- `static/js/citations.js`: dead file — no template loads it; popups.js
supersedes it. If ever re-added it would double-bind and inject
bibliography innerHTML without popups.js's cloned-node hardening.
Delete.
- `static/js/nav.js:26,30-31`: raw `localStorage` unguarded; if storage
access throws, the throw lands before `toggle.addEventListener`,
leaving the Portals toggle completely dead (utils.js exists precisely
for this).
- `static/js/annotations.js:209-215`: marks are mouse-only; the tooltip's
Delete button is unreachable by keyboard (only recourse is the
all-or-nothing "Clear Annotations").
- `static/js/search.js:10`: unguarded `new PagefindUI(...)` — if the
pagefind bundle 404s, the ReferenceError aborts the whole handler
including the `?q=` pre-fill that the selection-popup "Here" flow
depends on.
- `static/js/semantic-search.js:55-56,96-107`: no
`vectors.length === meta.length * DIM` consistency check — a stale
CDN-cached mismatch yields NaN scores and silently garbage ranking.
(Current files verified consistent: 1,256,448 bytes = 818 × 384 × 4.)
- `static/js/transclude.js:149-151` + `collapse.js:111-114`: nested
transcludes render a bare placeholder (no rescan of injected content);
`reinitCollapse` is not idempotent (would stack toggle buttons if ever
called twice on the same container).
- `static/js/popups.js:985-988,1009-1014`: `daysBetween` uses `Math.abs`,
so future dates render "N days ago" (now.js:17 handles this correctly).
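Three of the items above (the `URIError`, the index consistency check, and the `daysBetween` sign) reduce to small guards. A sketch under stated assumptions: all function names are hypothetical, the vectors are taken to be float32 (4 bytes each, per the 818 × 384 × 4 figure), and singular/plural wording is left out:

```javascript
// Decode a percent-encoded segment without throwing on malformed input.
function safeDecode(segment) {
  try {
    return decodeURIComponent(segment);
  } catch (e) {
    return null; // URIError: the caller can skip the popup
  }
}

// Check that a float32 vector blob matches the metadata length.
function indexIsConsistent(byteLength, metaLength, dim) {
  return byteLength === metaLength * dim * 4; // 4 bytes per float32
}

// Signed whole-day difference: negative means `then` is in the future.
function signedDaysBetween(then, now) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return Math.trunc((now.getTime() - then.getTime()) / MS_PER_DAY);
}

// Render the difference without losing its direction.
function describeDays(then, now) {
  const d = signedDaysBetween(then, now);
  if (d === 0) return 'today';
  return d > 0 ? d + ' days ago' : 'in ' + -d + ' days';
}
```

On an inconsistent index, refusing to rank and surfacing a status message is likely preferable to rendering NaN-scored results.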
### 5.8 Frontend NITs
- `static/js/copy.js:20-22,39`: code-less `<pre>` fallback copies the
"copy" button label along with content.
- `static/js/score-reader.js:50`: URL rewritten to `?p=1` on every load
even without a `?p=` param.
- `static/js/search-filters.js:271`: `parseInt(v,10) || 0` turns junk
threshold input into an active ≥0 filter that matches everything.
- `static/js/selection-popup.js:90-95`: shift-keyup while typing capitals
in the annotation picker re-summons the selection toolbar over it.
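For the threshold NIT, the fix is to distinguish "no usable input" from an explicit zero. A minimal sketch; `parseThreshold` is a hypothetical name and the digits-only rule is an assumption about what counts as valid input:

```javascript
// Parse a threshold input. Returns null (filter inactive) for junk or empty
// input, and a non-negative integer otherwise.
function parseThreshold(value) {
  const text = String(value).trim();
  if (!/^\d+$/.test(text)) return null; // reject "abc", "", "12abc", "-5"
  return Number.parseInt(text, 10);
}
```

An explicit `0` still produces an active filter here; only unparseable input leaves the filter off.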
Verified clean: the semantic-search ↔ embed.py contract post-model-split
(DIM 384, 818-entry meta, no prefix for MiniLM — the nomic
`search_document:` prefix is confined to the build-only page path); XSS
escaping across semantic-search, popups providers, map tooltips,
annotations (sole exception §5.7 monogram); theme.js ↔ settings.js
storage schema identical; all JS selector contracts against templates
(including the uncommitted head/nav edits); popups/sidenotes
double-init guards; settings.js and gallery.js focus traps.
---
## 6. Templates & content
### 6.1 Draft in undocumented location is never built — **MED**
`content/drafts/inclusionist-manifesto.md`. WRITING.md:34 says drafts go
under `content/drafts/essays/`; `draftEssayPattern`
(`build/Patterns.hs:46-49`) matches only that, so this file is invisible
even to `make watch`/`make dev` — silently orphaned.
### 6.2 SIMD/PQC essay `repository:` URL 404s — **MED**
`content/essays/where-does-simd-help-post-quantum-cryptography/index.md:24`.
`https://git.levineuwirth.org/where-simd-helps` is missing the owner
segment — verified HTTP 404, while the sibling essay's
`.../neuwirth/beyond_comorbidity_indices` returns 200.
### 6.3 Tracked drafts contradict the gitignore policy — **MED**
`.gitignore:88` ignores `content/drafts/` as local-only "working notes,"
but `git ls-files -i -c` shows four tracked drafts
(`digital_progeny.md`, `modern_idolatry.md`, `test-essay.md`,
`university_care.md`) — ignore rules don't untrack, so edits are
auto-staged by `make build` and pushed publicly by deploy. The over-broad
`**/.env.*` pattern also matches the tracked `.env.example`.
### 6.4 Template/content LOWs and NITs
- `content/colophon.md:5`: `modified:` is dead frontmatter — nothing
reads it; `$date-modified$` (page-footer.html:108) is Hakyll's
`dateField` over the `date` key.
- Seven files end frontmatter with a valueless `confidence-history:`
(YAML null; WRITING.md:97 documents a list of ints) — harmless, but
`content/essays/scaling_outage.md` also retains the full WRITING.md
scaffold comments in a published essay.
- `static/images/canto31.jpg`: still 4.0 MB (prior-audit §6.1 unfixed).
- `templates/blog-post.html:25,34`: `id="similar-links"` appears twice in
mutually exclusive `$if$` branches — safe, fragile under edit.
- `content/drafts/essays/digital_progeny.md`: title duplicates the
published "The Specification Dilemma" — stale draft.
- Frontmatter flags `home:`/`library:`/`links:`/`search:`/`portal:` are
consumed (head.html CSS gates, default.html:6 `data-portal`) but
undocumented in WRITING.md.
Verified clean: all `$partial(...)$` includes resolve; all ~140 distinct
template variables have context providers; no missing `alt` attributes,
tag-balance failures, or within-page duplicate IDs in composed pages; all
26 CSS files referenced by head.html exist; sampled enum values across
all sections are legal per WRITING.md and Contexts.hs validation lists.
---
## 7. Documentation / spec drift (WRITING.md, README.md)
### 7.1 `js:` page-script paths documented as content-relative; emitted root-relative — **MED**
`WRITING.md:773-775` vs `templates/default.html:37`
(`<script src="/$script-src$" defer>`). The doc claims a composition's
`js: scripts/widget.js` serves at `/music/symphony/scripts/widget.js`; the
template emits raw root-relative frontmatter. The only current user
(memento-mori) works by coincidence of its root-level route. A
composition following the doc would 404.
### 7.2 "Standalone page `content/my-page/index.md`" has no generic rule — **MED**
`WRITING.md:20` presents directory-form standalone pages as a general
capability; `build/Site.hs` hardcodes only `content/me/index.md` (293) and
`content/memento-mori/index.md` (307); the generic rule (351) matches flat
`content/*.md` only. A new `content/my-page/index.md` silently doesn't
build.
### 7.3 Portal table lists 8 portals; the build has 9 — **MED**
`WRITING.md:221-231` omits Photography, which is in `homePortals`
(`build/Site.hs:50-60`), the nav, and `content/tag-meta/photography.md`.
### 7.4 Three implemented frontmatter fields undocumented — **MED**
WRITING.md:3 claims to cover "all frontmatter fields"; zero hits for:
`summary:` (`build/Contexts.hs:415-427`, rendered by essay.html:16 and
reading.html:12, in live use), `revised:` (`build/Contexts.hs:815`
`getRevisions` — drives `$date-display$`/`$date-original$`/
`$revision-note$` and list sort order), `keywords:`
(`build/Contexts.hs:283` → `/bibliography/<kw>/` links).
### 7.5 Documentation LOWs
- `WRITING.md:268-269,82`: default citation style called "Chicago
Author-Date"; the injected CSL (`build/Citations.hs:114,167-168`) is
`data/chicago-notes.csl`, titled "Chicago Notes Bibliography".
- `README.md:12,19`: `make watch` described as "rebuilds on save without
a server"; it runs Hakyll's preview server (WRITING.md:1139 has it
right).
- `WRITING.md:105-109`: `history:` example ordering contradicts the code
(see §3.5).
---
## 8. nginx, Makefile & deployment
### 8.1 Multi-line CSP value embeds literal `\` + LF bytes — **MED**
`nginx/security-headers.conf:60-71`. The
`Content-Security-Policy-Report-Only` value is a single quoted string
spanning 12 lines with trailing `\` characters — nginx has no
line-continuation inside quoted strings, so the emitted header contains
raw backslash, LF, and leading-space bytes between directives. Raw LF in
a header value is illegal in HTTP/2 (vhost example enables `http2 on`);
strict clients reject the whole response. Sent on every response even as
Report-Only. Must be collapsed to one line.
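A build-time check could catch this class of error before deploy. The sketch below joins directives into one line and flags values that would not survive as a header; both function names are hypothetical, and the two directives in the usage note are the only ones quoted in this audit, not the full policy:

```javascript
// Join CSP directives into the single-line form a header value requires.
function joinDirectives(directives) {
  return directives.map((d) => d.trim()).filter(Boolean).join('; ');
}

// Flag CR and LF (not allowed in a header value) and backslash (a legal
// byte, but here a sign of a failed line continuation).
function hasSuspectHeaderBytes(value) {
  return /[\r\n\\]/.test(value);
}
```

For example, `joinDirectives(["font-src 'self' data:", "connect-src 'self'"])` yields a value that passes `hasSuspectHeaderBytes`, while the current 12-line string would be flagged.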
### 8.2 CSP gaps that will fire under enforcement — **MED**
`nginx/security-headers.conf:66-67`. (a) `font-src 'self' data:` blocks
KaTeX webfonts: head.html:61 loads `katex.min.css` from cdn.jsdelivr.net,
whose relative font URLs resolve to the CDN. (b) `connect-src 'self'`
blocks the onnxruntime `.wasm` that transformers.js v2 (dynamically
imported in `static/js/semantic-search.js:25`) fetches from jsdelivr —
the config comment covers the same-origin model files but not the
runtime. Both latent while Report-Only.
### 8.3 Makefile auto-commit sweeps any pre-staged changes — **MED**
`Makefile:28-29`. `git add content/` followed by
`git diff --cached --quiet || git commit -m "auto: ..."` commits the
*entire index* — anything previously staged gets folded into an
`auto: <timestamp> [skip ci]` commit and pushed publicly on deploy. Use
`git commit -- content/` or verify no foreign paths are staged.
### 8.4 Makefile LOWs
- pdf-thumbs: the `find | while read` pipeline swallows `pdftoppm`
failures (loop exit status is the last iteration's) — a corrupt PDF
silently ships without a thumbnail.
- deploy: prerequisite order `clean build sign` is guaranteed only under
serial make; no `.NOTPARALLEL:` guard for `-j` invocations. (Confirmed:
deploy does run `clean` first; `.PHONY` is complete; `.env` export
allowlist is sound.)
- `tools/hooks/pre-commit-marks.sh` is documented (Makefile:175 comment)
but not installed — `.git/hooks/` has only samples and `core.hooksPath`
is unset.
Verified clean: all seven `data/` JSON/YAML files parse;
`data/embed-cache-pages.npz` is untracked, so the new gitignore entry is
fully effective; nginx archive.conf's add_header-inheritance re-include is
correct; no redirect loops; popup-proxy rate-limit/cache zones correctly
documented for http{} scope.
---
## 9. Working-tree diff review (branding refresh + embed split)
The model contract is **intact** — the diff splits one MiniLM pipeline
into two: pages now use nomic-embed-text-v1.5 (768d, build-only, for
similar-links.json); paragraphs stay on all-MiniLM-L6-v2@c9745ed (384d,
the browser contract). download-model.sh, model-checksums.sha256,
semantic-search.js (`DIM = 384`), and both WRITING.md lines (1108 nomic
for Related-pages, 1128 MiniLM for client search) are all consistent.
Icon declarations all match real files (verified with `file`: apple-touch
180×180, favicon-96 96×96, manifest PNGs 192/512, og-image 1200×630
matching declared og:image dimensions; the webp sidecar was regenerated).
Open items beyond §1.3/§1.4/§4.1:
### 9.1 32.8 KB traced SVG inlined into every page — **MED**
`templates/partials/logo-mark.svg` (32,818 bytes, potrace-style single
giant `<path>`) is inlined via the nav partial into every HTML page —
a ~33 KB per-page weight regression (pre-compression). The two-tone
`--logo-ink`/`--logo-bg` cutout (components.css:72-98) genuinely needs
inline SVG or `<use>`; an external sprite + `<use href>` restores
cacheability. Better still: a hand-drawn or simplified path — a traced
bitmap at nav size carries detail that can never resolve.
### 9.2 Icon asset bloat — **LOW**
`static/favicon.ico` is now 71,766 bytes; parsed directory shows
16/32/48/64/128/256 px entries, the 128+256 pair alone 55.8 KB. The .ico
is only the legacy fallback (modern browsers take the SVG); 16+32+48
(~8 KB) is conventional. `static/favicon.svg` is a 32,844-byte traced
path. `static/images/link-icons/internal.svg` went ~2 KB → 32,818 bytes
yet renders at 0.7–1.6 rem via CSS mask in three stylesheets
(components.css:853, typography.css:833, popups.css:161).
### 9.3 Webmanifest regressions — **NIT**
`static/site.webmanifest`: `purpose` changed maskable→`any` for both
icons (Android adaptive launchers will letterbox; convention is separate
`any` + `maskable` entries); still no `start_url`/`scope`/`description`
(Lighthouse installability warnings). JSON valid; icons verified.
---
## 10. Prior audit (AUDIT.md 2026-05-07) follow-up
| Finding | Status |
|---|---|
| §1.1 freeze unsolvable | **Effectively still open** — aeson pin fixed, but the freeze broke again via `distributive` after a system update (§1.1 above); the underlying freeze-vs-system-db fragility is unaddressed |
| §1.3 Python version mismatch | Fixed (`requires-python = ">=3.14"` matches `.python-version`) |
| §1.4 model checksums | Fixed (`tools/model-checksums.sha256`, 5 entries) |
| §9.1 nginx headers | Fixed (`nginx/security-headers.conf` + vhost example, README'd) — but see §8.1/§8.2 for new issues in that file |
| §6.1 `canto31.jpg` 4 MB | **Unfixed** |
| robots.txt / sitemap | Fixed (Site.hs:941/963, present in `_site/`) |
| README `paper/`/`spec.md` ghosts | Fixed |
| rsync target quoting | Fixed |
| date-quoting doc | Fixed (WRITING.md:106) |
| tag-meta no-title exception | Fixed (WRITING.md:238-251) |
---
## Suggested triage order
1. ~~`tools/refreeze.sh`~~ (§1.1 — in progress)
2. Delete `data/embed-cache-pages.npz.tmp.npz`; widen the gitignore
pattern; `git add` `logo-mark.svg` + `og-image.png` before committing
the branding diff (§1.4, §4.1)
3. Guard `ArchiveIndex.hs` file reads with `doesFileExist` (§1.2)
4. Pin or sandbox the nomic remote code (§1.3)
5. Fix the `/fiction/` and `/poetry/` 404s (§2.1) and the production-visible
frontend MEDs (§5.1, §5.2)
6. Collapse the nginx CSP to one line before ever flipping it to
enforcing (§8.1, §8.2)
7. The rest by severity as time allows