| title | date |
|---|---|
| Repository audit | 2026-06-09 |
Repository audit — levineuwirth.org (2026-06-09)
Comprehensive audit of the repo on main at commit 620b974 (working tree
modified: branding refresh across static/ + templates/partials/, plus
tools/embed.py rework; untracked static/og-image.png,
templates/partials/logo-mark.svg, data/embed-cache-pages.npz.tmp.npz).
Severity legend: HIGH (likely to break a build, cause data loss, or expose a security weakness) — MED (latent bug, brittleness, or documentation drift) — LOW (minor robustness gap or fragile assumption) — NIT (style, polish, or paranoia).
Numbers are file:line against the working tree at audit time. Findings
marked "verified" were reproduced empirically (solver runs, built _site/
output inspection, live HTTP checks, binary parsing); the rest were
confirmed by reading the code.
Prior audit: AUDIT.md (2026-05-07). Follow-up status in §10.
1. Build & dependency chain
1.1 cabal.project.freeze is unsolvable again — next clean build fails — HIGH
cabal build --dry-run fails today (verified): the freeze pins
distributive ==0.6.2.1, but the system (pacman) GHC package db has
comonad-5.0.10 built against distributive-0.6.3:
rejecting: distributive-0.6.3/installed... (constraint from
cabal.project.freeze requires ==0.6.2.1)
After searching the rest of the dependency tree exhaustively...
The conflict set also names aeson, warp, hakyll, http2, semigroupoids. This
is the same failure mode as prior-audit §1.1 — that audit's specific aeson
pin was fixed (now 2.2.2.0/hashable 1.4.7.0), but a different package broke
the same way after a system update. Recent builds succeed only off the
cached dist-newstyle/cache/plan.json; the freeze file has since changed,
so the next cabal invocation re-solves and fails. Because make deploy
starts with make clean, the next deploy hits this. levineuwirth.cabal's
own bounds are compatible with the freeze — the conflict is
freeze-vs-installed-db, not freeze-vs-cabal-file.
Fix: tools/refreeze.sh (written for exactly this post-pacman -Syu
situation). The underlying fragility — freezing against a mutable system
package db — remains; consider documenting the refreeze step as part of any
system-upgrade ritual. (In progress at time of writing.)
1.2 Missing data/archive-index.json / archive-state.json crashes the build — HIGH
build/ArchiveIndex.hs:134-146. The module doc (lines 18-22) promises "An
absent or malformed file degrades safely: an empty index makes the link
consumers no-op; an absent state file makes every entry @Live@." But
rawIndex = unsafePerformIO $ do decoded <- A.eitherDecodeFileStrict' indexPath
(and identically rawState) never checks doesFileExist, and aeson's
eitherDecodeFileStrict' throws an uncaught IOException on a missing
file (verified: withBinaryFile: does not exist). Both files are
gitignored (.gitignore:84-85), so a fresh clone or a no-.venv build —
the exact path build/Archive.hs:20-24 promises to support — throws when
the CAF is first forced. Contrast readUrlSet (line 109) in the same file,
which guards correctly. Currently latent on this machine only because both
generated files happen to exist.
1.3 embed.py trust_remote_code=True executes unpinned third-party code — HIGH
tools/embed.py:329 (line ~341 in the uncommitted version). The new
page-model load is
SentenceTransformer(PAGE_MODEL_NAME, revision=PAGE_MODEL_REVISION, trust_remote_code=True).
The revision arg pins only the nomic-ai/nomic-embed-text-v1.5 repo; the
actual modeling code is pulled via auto_map from a different repo —
verified in the local HF cache: the executed code lives under
transformers_modules/nomic_hyphen_ai/nomic_hyphen_bert_hyphen_2048/...,
i.e. nomic-ai/nomic-bert-2048 at its current head, which nothing pins. A
compromise of that second repo runs arbitrary Python at build time, in a
repo whose every other download path (download-model.sh, pdfjs, leaflet) is
sha256-pinned. The comment "Both pins are deliberate" is therefore
misleading. Fix: pin via code_revision, or run with HF_HUB_OFFLINE=1
after first fetch, or document the accepted risk.
1.4 Working-tree commit hazard: tracked templates reference untracked files — HIGH (process)
templates/partials/nav.html:5 (tracked, modified) adds
$partial("templates/partials/logo-mark.svg")$ and
templates/partials/head.html references /og-image.png — both target
files are untracked (no git history). Committing the template diff
without git add-ing both breaks every page's Hakyll build on a fresh
clone ($partial$ aborts compilation) and 404s the og:image. They must
land in the same commit. Conversely, data/embed-cache-pages.npz.tmp.npz
must not be committed (see §4.1). The partial itself is safe as a
Hakyll template (verified: zero $ characters; match "templates/**"
compiles it).
1.5 einops dependency: undocumented, unbounded, imported nowhere — LOW
pyproject.toml:27 adds einops>=0.8.2. Nothing under tools/, build/, or
static/js/ imports it; its only consumer is nomic's
trust_remote_code module (§1.3). Every sibling dependency has an
explanatory comment and an upper bound per the file's own stated policy
("Upper bounds are intentionally generous (next major) but always
present"); einops has neither. uv lock --check passes (0.8.2 pinned).
2. Haskell build code — core
2.1 Nav, home grid, and library link /fiction/ and /poetry/ — confirmed 404s — MED
build/Site.hs:50-60 (homePortals contains ("Fiction","fiction"),
("Poetry","poetry")), templates/partials/nav.html:56,61,
templates/library.html:44,58. No rule generates either index: fiction and
poetry are not in tagIndexable (build/Patterns.hs:148-151 = essays +
blog + photos) and Site.hs has no landing rule. Verified: _site/fiction
does not exist; _site/poetry/ has no index.html. nginx has no
redirects. Both links 404 in production today.
2.2 Tag/route collisions guarded for photography only — MED
build/Tags.hs:98-99. tagIdentifier maps tag t → t ++ "/index.html";
sectionOwnedTopLevelTags = ["photography"] is the only guard. A
tagIndexable item tagged music (or music/x, which expands to music)
emits music/index.html, already owned by the music index route
(build/Site.hs:486-487); similarly essays, blog, cv, archive,
authors, bibliography. Hakyll does not error on duplicate routes — one
silently overwrites the other.
2.3 Sidenotes filter destroys the documented no-JS fallback — MED
build/Filters/Sidenotes.hs:30-36 vs static/css/sidenotes.css:125-135.
The module doc claims the Pandoc <section class="footnotes"> "serves as
fallback," but apply replaces every Note, so the writer never emits the
section. CSS depends on it below 1500px. Verified in output:
_site/essays/scaling_outage.html has 3 class="sidenote" and zero
footnotes occurrences. With JS disabled, footnote content is invisible on
narrow viewports. The comment, the CSS, and ozymandias.md's own prose all
contradict actual behavior.
2.4 Sidenote bodies rendered without the KaTeX writer — MED
build/Filters/Sidenotes.hs:103-115. inlinesToHtml/blocksToHtml use
writeHtml5String (def :: WriterOptions) (PlainMath), while the main
pipeline uses KaTeX "" (build/Compilers.hs:47). Math inside a footnote
never gets <span class="math inline">\(...\)</span>, so KaTeX never
renders it — degrades to plain italics, silently inconsistent with body
math.
2.5 SourceRefs whitelist vs /source/ serving whitelist have drifted — MED
build/Filters/SourceRefs.hs:114-141 vs build/Site.hs:217-240. Site.hs:209
says "must stay aligned with 'isSourcePath'". Mismatches: SourceRefs wraps
content/ and yaml-source/ (no Site counterpart); static/ with any known
extension vs Site's static/js/** and static/css/** only; tools/ with any
extension vs Site's tools/**.sh and tools/**.py; data/ at any depth vs
Site's top-level data/*.{json,yaml,md,bib}. Each mismatch yields a wrapped
source-ref whose popup fetch 404s (the Forgejo href fallback still works).
Inverse: Site serves data/*.bib but .bib is missing from
hasKnownExt — dead whitelist entry.
2.6 epistemicEntry ignores confidence: proved — MED
build/Site.hs:1014-1024. Comment: "Compute overall-score the same way
Contexts.overallScoreField does," but it uses
readMaybe =<< lookupString "confidence" meta, which is Nothing for
"proved"/"proven", whereas Contexts.overallScoreField
(build/Contexts.hs:574-576) substitutes 100 via isProvedConfidence.
Proved pages get no score in data/epistemic-meta.json and export the
raw string under confidence, so client-side filtering silently misses
them.
2.7 Empty affiliation <div> ships on every essay without affiliation: — MED
build/Contexts.hs:84-89 + templates/partials/metadata-tail.html:12.
affiliationField returns an empty list instead of noResult; Hakyll's
$if$ is truthy for empty list fields (the codebase knows this —
tagLinksFieldExcludingScope uses noResult for exactly this reason).
Verified in output: _site/essays/asymmetric-forgetting.html contains
<div class="meta-row meta-affiliation"> with whitespace-only content.
2.8 Library page hard-depends on content/library.md — LOW
build/Site.hs:675. _ <- loadSnapshot libraryIntroId "body" is a
top-level compiler statement (not inside a field), so it's a hard
failure. The block is documented as "optional prose block"; deleting
content/library.md breaks the whole library.html compile. Contrast the
existence-guarded sidecars at build/Tags.hs:277-283 and
build/Site.hs:843-850.
2.9 Library primaryPortalOf reads only list-form tags: — LOW
build/Site.hs:632-638. lookupStringList "tags" returns Nothing for
scalar comma form (tags: research, ai), which Hakyll's getTags
accepts. Such an item appears on tag pages but is silently dropped from
the library. All current content uses list form — latent.
2.10 allContent omits me/, memento-mori/, photography from the link graph — LOW
build/Patterns.hs:124-133, used by build/Backlinks.hs:334,345. Despite
"Every content file the backlinks pass should index," content/me/index.md
and content/memento-mori/index.md (full essays, rendered with
backlinksField) never have their outgoing links extracted; photography
likewise. Either deliberate-but-undocumented or the exact silent omission
the module header says it exists to prevent.
2.11 Paginated tag pages: split by creation date, sorted by display date — LOW
build/Tags.hs:371-377. buildPaginateWith (sortAndGroupAt tagPageSize)
partitions via sortRecentFirst (creation date), then each page re-sorts
with recentFirstByDisplay (revision-aware). A recently revised old item
stays on a late page but jumps to its top — cross-page ordering is not
monotone. Only fires above the 150-item threshold.
2.12 fill:#000 replacement corrupts longer hex colors — LOW
build/Filters/Score.hs:118-133 (and Filters/Viz.hs processColors).
The 6-digit pass protects only #000000; for fill:#000080 the 3-digit
pass produces fill:currentColor080, which is invalid CSS and silently mangles the SVG.
Quoted attribute forms are safe; only unquoted style-property forms are
exposed.
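A boundary-aware replacement avoids this. The sketch below is in Python purely for illustration (the modules in question are Haskell); the name recolor_black is hypothetical, and it assumes only an exact #000 or #000000 in a style property should map to currentColor.

```python
import re

# Match fill:#000 or fill:#000000 only when no further hex digit follows,
# so longer colors such as #000080 are left untouched.
_BLACK_FILL = re.compile(r"fill:#(?:000000|000)(?![0-9A-Fa-f])")


def recolor_black(svg: str) -> str:
    """Replace pure-black fills in style properties with currentColor."""
    return _BLACK_FILL.sub("fill:currentColor", svg)
```

The negative lookahead is the whole fix: the same guard could be expressed in the Haskell passes by checking the character after the match before substituting.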
2.13 Source-level preprocessors rewrite inside fenced code blocks — LOW
build/Filters/Wikilinks.hs:24-31, Filters/Transclusion.hs:18-20,
Filters/EmbedPdf.hs. All run on the raw source before Pandoc parses
fences: [[anything]] in a code block becomes a link; a code-block line
that is exactly {{slug}} or {{pdf:...}} becomes raw HTML.
Transclusion's comment ("prevents accidental substitution inside prose or
code") is false for full-line directives in code blocks. A live foot-gun
for a site that documents its own syntax (ozymandias.md does exactly
this).
2.14 domainIcon matches substrings of the whole URL, not the host — LOW
build/Filters/Links.hs:120-153. Checks such as "x.com" `T.isInfixOf` url
test the whole URL, so https://example.org/why-x.com-failed gets the
Twitter icon. Contradicts
the strict-hostname discipline isExternal documents at lines 95-101 of
the same file. Cosmetic (icon only).
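A host-only match is small. The following is a Python sketch of the check, not the module's Haskell; host_matches is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def host_matches(url: str, domain: str) -> bool:
    """True when the URL's host is `domain` itself or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)
```

Comparing against the parsed host, with a leading dot for subdomains, rejects both path matches (why-x.com-failed) and look-alike hosts (notx.com).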
2.15 gsubRoute "content/" strips every occurrence, not just the prefix — LOW
build/Site.hs:171,357,417 etc. Hakyll's gsubRoute is replace-all; a
co-located directory literally named content would be silently mangled
(content/essays/slug/content/data.csv → essays/slug/data.csv). Same
for gsubRoute "static/". Improbable but silent.
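The difference between replace-all and prefix-only stripping, sketched in Python (strip_prefix_route is a hypothetical name; in Hakyll a prefix-anchored equivalent would presumably be written with customRoute, which is an assumption about the fix, not something the repo does today):

```python
def strip_prefix_route(path: str, prefix: str) -> str:
    """Drop `prefix` only when it starts the path (str.removeprefix, Python 3.9+)."""
    return path.removeprefix(prefix)


nested = "content/essays/slug/content/data.csv"
mangled = nested.replace("content/", "")             # replace-all drops both occurrences
preserved = strip_prefix_route(nested, "content/")   # only the leading prefix is dropped
```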
2.16 existsCached memoizes non-existence for the process lifetime — LOW
build/Filters/SourceRefs.hs:160-166. Under make watch, a source file
created after first reference stays cached as absent until restart.
2.17 Core NITs
- build/Site.hs:42-44: comment says "eight portals"; the list has nine. Echoed at Site.hs:606 ("the eight") vs line 657's "nine times".
- build/Site.hs:866-877: random-pages.json comment says "essays + blog posts only" but the rule loads fiction and flat poetry too; it uses flat-only content/poetry/*.md while the epistemic rule uses allPoetry, so collection poems are epistemic-indexed but never randomizable.
- build/Utils.hs:64-73: authorSlugify comment claims runs of spaces collapse; code maps each space individually (two spaces, as in "A  B", yield "a--b"). Consistent everywhere, so links work; comment wrong.
- build/Utils.hs:31-32: readingTime truncates (div 200), so 399 words reports "1 min"; comment implies ceiling semantics.
- build/Pagination.hs:42 + build/Site.hs:77-82: hardcoded pattern literals duplicate Patterns.hs, defeating that module's stated purpose (Patterns.hs:6-10).
- build/Contexts.hs:174-180: plain tagLinksField returns an empty list rather than noResult, so $if(item-tags)$ is true and templates emit empty tag wrappers (author-index.html, item-card.html).
- build/Tags.hs:296-304: tagItemCtx composes defaultContext, not siteCtx, so $if(has-monogram)$ never fires on tag pages: monograms render on new.html/library but silently never on tag indexes.
- build/Contexts.hs:485-492: dotsField comment says "1–5" but accepts 0 (max 0 (min 5 n)); importance: 0 renders five empty circles.
- build/Contexts.hs:375-381: descriptionField doc says noResult; code uses fail. Behaviorally fine under Hakyll 4.16 $if$ (verified against Hakyll 4.16.7.1 source) but logs [ERROR] debug noise per abstract-less page. Same in abstractField, summaryField, bibliographyField.
- build/Filters/Images.hs:233-234: webpSrc interpolated into srcset unescaped while sibling src goes through esc.
- build/Filters/Links.hs:37-46,63-69: internal PDF links double-classified (pdf-link + link-internal chrome) despite the "no overlap" comment.
- build/Filters/Smallcaps.hs:31-34 + Filters/Archive.hs:42-44: "headers are skipped" only at top level; a Header nested in a Div/BlockQuote is processed, contradicting the comments.
Verified clean: no unguarded head/fromJust/read/!! hazards in the
core modules; filter composition order matches its documenting comments;
Hakyll 4.16.7.1 $if$ treats both fail and noResult as false.
3. Haskell build code — feature modules
3.1 Stats heatmap day-of-week off-by-one: Sunday clipped out of the SVG — MED
build/Stats.hs:185,300,317. dowOf d = fromEnum (dayOfWeek d) -- Mon=0..Sun=6
— but time-1.12.2 is ISO-numbered (verified:
map fromEnum [Monday..Sunday] == [1..7]). So Sunday lands at y=106 while
svgH = 104 — every Sunday cell is clipped out of the viewBox and grid
row 0 is permanently blank. Relatedly, weekStart returns the previous
Sunday (and for a Sunday, 7 days back), not the "first Monday on or
before" its comment claims; builds run on a Sunday also clip the newest
column horizontally.
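The intended arithmetic, sketched in Python rather than the module's Haskell. Python's isoweekday() happens to use the same ISO numbering (Mon=1 .. Sun=7) that the finding reports for fromEnum, so the off-by-one is the same subtraction; row_of and week_start are hypothetical names.

```python
from datetime import date, timedelta


def row_of(d: date) -> int:
    """Monday-based grid row: Mon=0 .. Sun=6 (ISO number minus one)."""
    return d.isoweekday() - 1


def week_start(d: date) -> date:
    """First Monday on or before d."""
    return d - timedelta(days=row_of(d))
```

With rows in 0..6, Sunday occupies the last row instead of falling one row past the grid, and a Sunday's week starts six days earlier rather than seven.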
3.2 Commonplace.hs uses Char8.pack — non-ASCII YAML corruption — MED
build/Commonplace.hs:143. Y.decodeEither' (BS.pack raw) with
Data.ByteString.Char8 truncates each Char to 8 bits — the exact hazard
build/Now.hs:249-253 documents and fixes with TE.encodeUtf8.
data/commonplace.yaml is currently pure ASCII, so latent — but a
commonplace book of quotations is the likeliest file to acquire an em-dash
or curly quote, which will then either fail the YAML parse or publish
mojibake.
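What the truncation does to a single em-dash, modelled in Python. char8_pack is a hypothetical stand-in that mimics an 8-bit truncating pack; it is not the Haskell function itself.

```python
def char8_pack(text: str) -> bytes:
    """Keep only the low byte of each code point, as an 8-bit truncating pack would."""
    return bytes(ord(ch) & 0xFF for ch in text)


quote = "wait \u2014 what"        # contains an em-dash, U+2014
truncated = char8_pack(quote)     # the em-dash collapses to the control byte 0x14
utf8 = quote.encode("utf-8")      # the em-dash becomes the three bytes E2 80 94
```

Only the UTF-8 bytes round-trip back to the original string; the truncated form hands the YAML parser a control character where the dash used to be.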
3.3 Backlinks: links inside tight lists are invisible — MED
build/Backlinks.hs:220-226. extractLinksWithContext's go handles
Para, BlockQuote, Div, BulletList, OrderedList, then go _ = [].
Tight list items (the default - item form) are Plain blocks, not
Para, so recursion into list children yields nothing. Every internal
link written in a tight list never produces a backlink. Header, Table,
and DefinitionList blocks are likewise skipped. The doc comment implies
coverage it doesn't deliver.
3.4 Stability "age" is the first→last commit span, not time since first commit — MED
build/Stability.hs:89-93,99-112. Docs say "age in days since first
commit," but classify (length dates) (daySpan (last dates) newest)
computes the span between first and most recent commit, with no
reference to today. A piece written in a one-week burst years ago reports
"volatile" forever; time passing without commits can never increase
stability. Either the comment or the metric is wrong.
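The two candidate metrics side by side, as a Python sketch with hypothetical names (the module is Haskell; this only illustrates the arithmetic):

```python
from datetime import date
from typing import Sequence


def commit_span_days(commit_dates: Sequence[date]) -> int:
    """Days between the first and the most recent commit (what the finding says is computed)."""
    return (max(commit_dates) - min(commit_dates)).days


def age_days(commit_dates: Sequence[date], today: date) -> int:
    """Days since the first commit (what the docs describe)."""
    return (today - min(commit_dates)).days
```

For a piece committed on 2023-03-01 and 2023-03-08 and never touched again, the span stays at 7 days forever, while the age keeps growing with the calendar.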
3.5 Frontmatter history: assumed newest-first; WRITING.md documents oldest-first — MED
build/Stability.hs:204-217,299-336 vs WRITING.md:105-109.
loadVersionHistory keeps authored order and all range fields treat the
head as newest (es@(newest:_) -> let oldest = last es). Git history is
newest-first, but WRITING.md's history: example is oldest-first. With
the documented ordering, version-history-range renders reversed
("14 March 2026 – 1 March 2026"), range-start returns the newest date,
and version-history-primary shows the three oldest entries.
3.6 Archive manifest→provenance join is exact-string, rest of system is normalized — MED
build/Archive.hs:269. Map.lookup (meUrl me) provByUrl joins on the raw
URL; everywhere else equivalence is normalizeUrl (ArchiveIndex
filtering, dup detection, ARCHIVE.md:189-192). Editing a manifest URL to a
normalization-equivalent form (http→https, trailing slash, tracking
param) silently unpublishes /archive/<slug>/ while ArchiveIndex's
normalized filter keeps the slug active — links keep pointing at a 404.
3.7 Photography buildPin computes wrong slug/thumb/title for flat entries — MED
build/Photography.hs:354,362. slug = takeFileName (takeDirectory fp) —
for a flat content/photography/foo.md this yields "photography", so
map.json gets "slug": "photography", the title fallback is wrong, and
thumb = "/photography/photography/<p>" 404s (flat-single assets route to
/photography/<asset>). PHOTOGRAPHY.md:214 explicitly supports flat
singles. Latent — content/photography/ currently has only index.md —
but breaks the first geo-tagged flat single.
3.8 geo-precision fails open: a typo'd "hidden" publishes coordinates — MED
build/Photography.hs:347-349,312-320. Only the exact string matches
((_, Just "hidden", _) -> return Nothing); any other value (e.g.
Hidden, hiddn) falls into roundCoord, whose catch-all treats unknown
values as city (~10 km rounding) — publishing coordinates the author
meant to suppress. Contradicts the file's own privacy comment (lines
287-289) and the fail-closed precedent for visibility: in
build/Archive.hs:77-83.
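A fail-closed shape for the same decision, sketched in Python. Only the values "hidden" and "city" come from the finding; the "exact" level, the digit counts, and the name publishable_coord are illustrative assumptions.

```python
from typing import Optional, Tuple

# Decimal places per recognised precision level (assumed values, see lead-in).
_DIGITS = {"exact": 5, "city": 1}


def publishable_coord(lat: float, lon: float,
                      precision: str) -> Optional[Tuple[float, float]]:
    """Return rounded coordinates, or None when they must not be published."""
    key = precision.strip().lower()
    if key == "hidden":
        return None
    digits = _DIGITS.get(key)
    if digits is None:
        # Unknown or misspelled value: fail closed and publish nothing.
        return None
    return (round(lat, digits), round(lon, digits))
```

A stricter variant could abort the build on an unrecognised value instead of silently suppressing the pin; either way a typo never widens what is published.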
3.9 Archive state is process-lifetime cached — watch goes stale — LOW
build/ArchiveIndex.hs:123-146 + build/Archive.hs:304.
activeUrls/rawIndex/rawState are NOINLINE unsafePerformIO CAFs read
once per process, and archiveRules reads the manifest in preprocess.
Under site watch, edits to manifest.yaml, removed.yaml, or the
regenerated state JSONs are never re-read until restart. One-shot builds
unaffected.
3.10 Pinned pages render raw ISO in $last-reviewed$ — LOW
build/Stability.hs:166-170. The git branch formats via fmtIso
("1 May 2026"); the IGNORE.txt-pinned branch returns the frontmatter value
verbatim ("2026-05-01") — inconsistent display formatting.
3.11 Empty/all-comments manifest.yaml halts the build — LOW
build/Archive.hs:158-170. An empty YAML stream decodes as Null, which
fails to parse as [ManifestEntry] and takes the exitFailure branch —
draining the manifest to zero entries is fatal rather than the empty
archive the absent-file branch supports.
3.12 Backlinks normaliseUrl misses directory-form canonical URLs — LOW
build/Backlinks.hs:275-281. Strips .html but not
index.html/trailing slash: a page routed essays/foo/index.html keys as
/essays/foo/index, but a body link authored /essays/foo/ doesn't
match — backlink silently dropped. build/SimilarLinks.hs:97-99 handles
exactly this case and its comment flags the divergence.
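One way to make both spellings key identically, sketched in Python (canonical_key is a hypothetical name, and the sketch assumes no site has both /a/b.html and /a/b/index.html, since the two would collapse to the same key):

```python
def canonical_key(url_path: str) -> str:
    """Collapse /a/b/index.html, /a/b/index, /a/b/ and /a/b.html to /a/b."""
    path = url_path.split("#", 1)[0].split("?", 1)[0]
    if path.endswith(".html"):
        path = path[: -len(".html")]
    if path.endswith("/index"):
        path = path[: -len("/index")]
    path = path.rstrip("/")
    return path or "/"
```

Applying the same function to route-derived keys and to authored link targets is what removes the mismatch; normalising only one side does not.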
3.13 SimilarLinks PDF viewer URL not percent-encoded — LOW
build/SimilarLinks.hs:155-164.
viewerUrl = "/pdfjs/web/viewer.html?file=" ++ escapeHtml raw —
escapeHtml handles HTML metachars only; a path containing &, ?, #,
or spaces breaks the file= query value.
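Percent-encoding the query value, sketched in Python with the standard library's quote (viewer_url here is a hypothetical Python counterpart to the Haskell binding, and the example path is made up):

```python
from urllib.parse import quote


def viewer_url(pdf_path: str) -> str:
    """Build the viewer URL with the file parameter fully percent-encoded."""
    return "/pdfjs/web/viewer.html?file=" + quote(pdf_path, safe="")
```

With safe="" every reserved character, including / and &, is encoded, so the result contains no HTML metacharacters either.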
3.14 Photography feed thumbnails only for directory-form entries — LOW
build/Photography.hs:449-453. imgTag requires isDir; flat singles
and series children (<series>/<photo>.md) get text-only feed entries,
against PHOTOGRAPHY.md's "thumbnails embedded inline" (lines 36, 445) and
the feed's deliberate inclusion of series children.
3.15 Marks: missing confidence/evidence renders a literal "0 TRUST" — LOW
build/Marks.hs:272-278,565. computeTrust _ _ = 0 with a comment
claiming the figure "collapses to the bare frame," but
renderEpistemicFigure unconditionally calls renderTrustLabel, so a
piece with status: but no confidence/evidence (a case MARKS.md:696
says should render) displays a prominent center "0" — indistinguishable
from an authored zero-trust score.
3.16 Feature-module NITs
- build/Catalog.hs:228-235: two distinct unknown categories render as adjacent duplicate "Other" sections (equal rank, groupBy on raw string).
- build/Stats.hs:754-777: pageTOC comment says "nine h2 sections"; lists eleven (matching the eleven rendered).
- build/SimilarLinks.hs:51-54: comment says "the template caps the display"; the code caps it (take maxSimilar at line 80).
- build/Stats.hs:169-171, build/Archive.hs:564-569: "median" is the upper-median for even-length lists.
- build/Backlinks.hs:133-153: protocol-relative //host/path URLs pass isPageLink and pollute backlinks.json.
- build/BibExtras.hs:75-98: @string/@comment/@preamble blocks parsed as citekey entries; only consequential on a citekey/macro-name collision.
Verified clean: Marks tick positions/axis order/radii match MARKS.md §3;
proved-confidence trust substitution matches §4.3; Archive's fail-closed
visibility validation, removed.yaml conflict rejection, and double-sided
SHA-256 verification all match ARCHIVE.md.
4. Python & shell tooling
4.1 data/embed-cache-pages.npz.tmp.npz orphan: explained; cleanup + ignore gaps — MED
The orphan (mtime May 26) is the fossil of a fixed bug: an earlier
embed.py passed a bare path to np.savez_compressed, numpy appended
.npz (verified in numpy's _savez source), and the subsequent
os.replace raised FileNotFoundError, stranding the file. The current
file-handle code (tools/embed.py:173-183) is correct, but: (a) nothing
deletes the stale orphan — delete it, don't commit it; (b) the tmp
write has no try/finally, so any mid-write exception strands
embed-cache-pages.npz.tmp; (c) the new .gitignore entry is exact-path
(data/embed-cache-pages.npz) and covers neither .tmp nor .tmp.npz
variants — widen to data/embed-cache-pages.npz*; (d) the fixed tmp name
means two concurrent runs interleave writes.
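Points (b) and (d) can be addressed together. A minimal sketch, assuming a unique temp file in the target's directory is acceptable; write_atomically is a hypothetical helper, not the function currently in embed.py.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write data to path via a unique temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path) + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make the bytes durable before the rename
        os.replace(tmp, path)      # atomic on the same filesystem
    except BaseException:
        # Remove the temp file on any failure so no orphan is stranded.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

The temp name starts with the target's basename, so a widened ignore pattern such as data/embed-cache-pages.npz* would also cover it.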
4.2 Corrupt embed cache crashes instead of being discarded — MED
tools/embed.py:154. The discard path catches
(OSError, KeyError, ValueError), but np.load on a truncated .npz
raises zipfile.BadZipFile (verified MRO: BadZipFile → Exception), and
EOFError is also uncaught. A half-written cache (exactly what §4.1(b)
can produce) makes every subsequent build print "Warning: embedding
failed" and leaves similar-links/semantic index stale until the file is
manually deleted — the opposite of the docstring's "unreadable →
discarding" contract.
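A widened discard path, sketched with the standard library only. load_or_discard and its loader parameter are hypothetical; in embed.py the loader would presumably be the existing np.load call.

```python
import zipfile
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# BadZipFile and EOFError are not subclasses of OSError, KeyError, or
# ValueError, so they have to be listed explicitly.
_CACHE_ERRORS = (OSError, KeyError, ValueError, EOFError, zipfile.BadZipFile)


def load_or_discard(path: str, loader: Callable[[str], T]) -> Optional[T]:
    """Return loader(path), or None when the cache is missing or unreadable."""
    try:
        return loader(path)
    except _CACHE_ERRORS:
        return None
```

Returning None lets the caller rebuild the cache from scratch, which is the behaviour the docstring already promises.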
4.3 embed.py staleness check structurally defeated by stamp-build-time — MED
tools/embed.py:195-200 + Makefile:68. needs_update() compares
_site/**/*.html mtimes against embed's outputs — but the build order is
embed.py → stamp-build-time.py _site, and the stamper rewrites the
footer timestamp in essentially every HTML file each build. So every page
is always newer than embed's outputs and the "skip if fresh" fast path
never fires: the full paragraph-embedding pass (and model load) runs on
every build. The new page cache papers over half the cost; the paragraph
pass pays full price every time. Related (tools/embed.py:297-299):
model/config changes never invalidate outputs — currently masked by this
bug; fixing one exposes the other.
4.4 archive.py writes provenance/index/state non-atomically — MED
tools/archive.py:718-721,734-737,953-957,1077-1080. All plain
write_text(). An interrupt mid-write truncates PROVENANCE.json; the
next build's json.loads (line 642) raises an unhandled
JSONDecodeError — and a truncated provenance is indistinguishable from
corruption in a tool whose whole contract is integrity checking. embed.py
got atomic-write helpers; archive.py did not.
4.5 download-leaflet.sh: checksum verification bypassable — MED
tools/download-leaflet.sh:43-47,90. The early-exit skip checks file
existence only (download-model.sh re-verifies on its skip path), and
curl -o "$target" writes directly to the final path: a download that
fails verify_or_warn aborts via set -e after the bad file is in
place, and the next run's existence check accepts it permanently. A
MITM'd unpkg.com download survives one failed run and is silently
vendored on the next.
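The safe ordering is download to a side path, verify, then move. The script is shell; the sketch below shows only the verify-then-move step in Python, with install_verified as a hypothetical name and the download itself left out.

```python
import hashlib
import os


def install_verified(part_path: str, target: str, expected_sha256: str) -> bool:
    """Move part_path to target only if its SHA-256 matches; otherwise delete it."""
    digest = hashlib.sha256()
    with open(part_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        os.unlink(part_path)       # never leave an unverified file behind
        return False
    os.replace(part_path, target)  # the final path only ever holds verified bytes
    return True
```

Because the final path is written only after verification, an existence-only skip check becomes safe again, although re-verifying on the skip path remains the stronger option.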
4.6 Other download/convert scripts leave partial files in final paths — LOW
tools/download-model.sh:84: interrupted curl leaves a partial
model_quantized.onnx; caught today only because model-checksums.sha256
pins all five files — any unpinned file would persist forever. Use
-o "$dst.part" && mv. tools/convert-images.sh:33: interrupted cwebp
leaves a partial .webp that the -nt staleness gate then skips forever
— a truncated WebP ships until manually deleted.
4.7 archive.py robustness gaps — LOW
- tools/archive.py:788,795-799: provenance missing the artifact key makes prev_artifact == slug_dir, then sha256_of raises an uncaught IsADirectoryError instead of the structured "prior snapshot incomplete" error.
- tools/archive.py:614-617,938-940,1066-1068: non-dict manifest entries (- https://example.com instead of - url: ...) crash with AttributeError: 'str' object has no attribute 'get'.
- tools/archive.py:896: wayback_save concatenates the raw URL (contrast wayback_lookup at 909, which uses quote(url, safe="")).
4.8 add-popup-source.sh: dead CSP reminder + unvalidated nginx interpolation — LOW
tools/add-popup-source.sh:214: the connect-src reminder gates on
[[ "$NEEDS_PROXY" -eq 0 && -n "$UPSTREAM_HOST" ]], but UPSTREAM_HOST
is only set in the NEEDS_PROXY -eq 1 branch (lines 124-131) — the
reminder can never print, and the no-proxy case is exactly when it's
needed (the provider will be CSP-blocked with no hint). Line 71: NAME
from a free-text prompt is interpolated into
location /proxy/$NAME/ and set $upstream_$NAME with no
^[a-z0-9-]+$ validation (import-photo.sh validates; this doesn't).
4.9 refreeze.sh deletes the freeze before the replacement succeeds — LOW
tools/refreeze.sh:13-16. rm -f "$FREEZE" then cabal freeze; a failed
resolve leaves no freeze file (recoverable via git, but write-temp-then-move
is safer).
4.10 embed.py / atomic-write NITs — LOW/NIT
tools/embed.py:109-115: atomic_write_bytes uses a fixed .tmp name
(concurrent-run collision) and no fsync before os.replace (power loss
can leave an empty target). Same pattern in _atomic_write_yaml of
extract-exif.py:377, extract-palette.py:65, extract-dimensions.py:65.
tools/embed.py:144: NpzFile never closed — use
with np.load(...) as npz:.
4.11 Tooling NITs
- tools/import-photo.sh:147-155: on mogrify -strip failure the EXIF-laden JPEG (GPS, serials) remains under content/, where make build's git add content/ could auto-commit it. Delete $TARGET on that failure path.
- tools/hooks/pre-commit-marks.sh:28-31: awk '{ print $2 }' truncates paths with spaces; the status: probe reads the working tree, not the staged blob. Advisory-only hook.
- tools/preset-signing-passphrase.sh:30: echo -n "$PASSPHRASE" eats a passphrase that is itself an echo option string (-e, -n, -E); use printf '%s'.
- tools/stamp-build-time.py:52-54: in-place non-atomic rewrite of _site/ HTML.
- tools/archive.py:244: pdftotext without --; a slug starting with - parses as an option. Same in extract-exif.py:159.
- tools/monolith-version.txt records a sha256 (matches the binary today, verified) but find_monolith() never checks it.
Verified clean: sign-site.sh (atomic sig writes, post-pass manifest
verification); compress-assets.sh and download-pdfjs.sh (mktemp + EXIT
trap, hash verified before extraction); audit-marks.py, viz_theme.py,
extract-dimensions.py, extract-palette.py; embed.py's faiss -1 padding
is safely filtered; uv lock --check passes; model-checksums.sha256 pins
all five model files.
5. Frontend JavaScript
5.1 Score-reader pages never restore theme/settings — MED
templates/score-reader-default.html:10 + static/js/theme.js:12-13. The
template loads theme.js without utils.js (unlike head.html:66-67), so
window.lnUtils.safeStorage is undefined and theme/text-size/focus-mode/
reduce-motion all silently fail to restore — a dark-theme user gets a
light flash-and-stay on every score page. Compounding: settings.js (line
15; the template does render the settings toggle) falls back to its no-op
store, so theme picks made on score pages never persist either.
5.2 search-filters.js: epistemic filters silently bypass clean-URL pages — MED
static/js/search-filters.js:117-125. normUrl() returns u.pathname
verbatim and looks it up in epistemicMeta[url]. Verified:
_site/data/epistemic-meta.json keys include
/essays/beyond-comorbidity-indices/index.html while rendered result
links use /essays/beyond-comorbidity-indices/. The lookup misses,
passes(null) returns true ("no metadata = don't filter"), so every
directory-style page bypasses all active epistemic filters. Flat .html
pages match fine, which hides the bug.
5.3 viz.js ignores the cappuccino theme — MED
static/js/viz.js:94-99. isDark() knows only
'dark'/'light'/OS-preference, but theme.js/settings.js support
'cappuccino' — a dark-brown theme (--bg: #553a28, base.css:203). With
OS-light + cappuccino, charts render the LIGHT config (near-black marks
and axis labels) on a dark background.
5.4 collapse.js localStorage keys collide across pages — MED
static/js/collapse.js:44,83. Key is
'section-collapsed:' + heading.id with no pathname namespace (contrast
annotations.js). Pandoc auto-slugs (#introduction, #background) recur
across essays, so collapsing "Introduction" on one essay collapses it
everywhere. Also uses raw localStorage rather than
lnUtils.safeStorage.
5.5 semantic-search.js: stale-response race + duplicate index fetch — MED
static/js/semantic-search.js:117-144. runSearch has no generation
token; overlapping queries render in promise-resolution order, so an
older query's hits can replace a newer one's (with setStatus('')
masking it). loadIndex() (42-59) has no in-flight-promise dedup (unlike
loadModel's loadModelPromise), so concurrent first searches fetch
semantic-index.bin + semantic-meta.json twice.
5.6 lightbox.js: aria-modal with no focus trap, no keyboard activation — MED
static/js/lightbox.js. Overlay sets role="dialog" +
aria-modal="true" but has no Tab handling (gallery.js's trapTab at
235-257 shows the in-repo pattern) — focus walks into the obscured page.
Trigger images get only a click listener and no tabindex/keydown, so
keyboard users can't open it; close() focuses a non-focusable <img>,
which no-ops.
5.7 Frontend LOWs
- static/js/gallery.js:122-125,270-275: math/score overlay is click-only (no role/tabindex/keydown); closeOverlay() focus-returns to a non-focusable div, so focus drops to <body>.
- static/js/popups.js:478,515: the Wikipedia provider's decodeURIComponent runs synchronously before the .catch attaches, so a malformed percent sequence in a link path throws an uncaught URIError per hover.
- static/js/popups.js:359,390: fetched monogram SVG injected via innerHTML unescaped; the single unsanitized path in an otherwise fully escaped pipeline. Build-authored content, so not exploitable today; the comment acknowledges the trust assumption.
- static/js/citations.js: dead file; no template loads it and popups.js supersedes it. If ever re-added it would double-bind and inject bibliography innerHTML without popups.js's cloned-node hardening. Delete.
- static/js/nav.js:26,30-31: raw localStorage unguarded; if storage access throws, the throw lands before toggle.addEventListener, leaving the Portals toggle completely dead (utils.js exists precisely for this).
- static/js/annotations.js:209-215: marks are mouse-only; the tooltip's Delete button is unreachable by keyboard (only recourse is the all-or-nothing "Clear Annotations").
- static/js/search.js:10: unguarded new PagefindUI(...); if the pagefind bundle 404s, the ReferenceError aborts the whole handler, including the ?q= pre-fill that the selection-popup "Here" flow depends on.
- static/js/semantic-search.js:55-56,96-107: no vectors.length === meta.length * DIM consistency check; a stale CDN-cached mismatch yields NaN scores and a silently garbage ranking. (Current files verified consistent: 1,256,448 bytes = 818 × 384 × 4.)
- static/js/transclude.js:149-151 + collapse.js:111-114: nested transcludes render a bare placeholder (no rescan of injected content); reinitCollapse is not idempotent (would stack toggle buttons if ever called twice on the same container).
- static/js/popups.js:985-988,1009-1014: daysBetween uses Math.abs, so future dates render "N days ago" (now.js:17 handles this correctly).
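Three of the guards named in this list are small enough to sketch as pure helpers. The function names and return conventions here are assumptions for illustration, not the repo's code.

```javascript
// popups.js: decode a link path without letting a malformed percent
// sequence throw synchronously before any .catch is attached.
function safeDecode(path) {
  try {
    return decodeURIComponent(path);
  } catch (err) {
    if (err instanceof URIError) return null; // caller skips the popup
    throw err;
  }
}

// semantic-search.js: refuse to rank when the vector buffer and the
// metadata disagree, for example after a stale CDN-cached fetch.
function vectorsMatchMeta(vectors, meta, dim) {
  return vectors.length === meta.length * dim;
}

// popups.js: keep the sign of the difference so that future dates are
// not reported as "N days ago".
const MS_PER_DAY = 24 * 60 * 60 * 1000;
function describeDays(date, now) {
  const days = Math.round((now.getTime() - date.getTime()) / MS_PER_DAY);
  if (days === 0) return "today";
  const n = Math.abs(days);
  const unit = n === 1 ? "day" : "days";
  return days > 0 ? `${n} ${unit} ago` : `in ${n} ${unit}`;
}
```

With the current files, vectorsMatchMeta would compare a Float32Array of 818 × 384 elements (1,256,448 bytes at 4 bytes each) against an 818-entry meta array.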
5.8 Frontend NITs
- static/js/copy.js:20-22,39: code-less <pre> fallback copies the "copy" button label along with content.
- static/js/score-reader.js:50: URL rewritten to ?p=1 on every load even without a ?p= param.
- static/js/search-filters.js:271: parseInt(v,10) || 0 turns junk threshold input into an active ≥0 filter that matches everything.
- static/js/selection-popup.js:90-95: shift-keyup while typing capitals in the annotation picker re-summons the selection toolbar over it.
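For the search-filters.js threshold, one possible shape is a parser that reports "no filter" instead of coercing junk to 0. parseThreshold is a hypothetical name, and rejecting partially numeric input such as "12abc" is a design choice of this sketch (plain parseInt would accept it as 12).

```javascript
// Hypothetical parser: returns null (filter inactive) for empty or
// non-numeric input, and a non-negative integer otherwise.
function parseThreshold(raw) {
  const text = String(raw ?? "").trim();
  if (!/^\d+$/.test(text)) return null;
  return parseInt(text, 10);
}
```

A caller would then apply the filter only when the result is not null, so an explicit 0 still works as a threshold.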
Verified clean: the semantic-search ↔ embed.py contract post-model-split
(DIM 384, 818-entry meta, no prefix for MiniLM — the nomic
search_document: prefix is confined to the build-only page path); XSS
escaping across semantic-search, popups providers, map tooltips,
annotations (sole exception §5.7 monogram); theme.js ↔ settings.js
storage schema identical; all JS selector contracts against templates
(including the uncommitted head/nav edits); popups/sidenotes
double-init guards; settings.js and gallery.js focus traps.
6. Templates & content
6.1 Draft in undocumented location is never built — MED
content/drafts/inclusionist-manifesto.md. WRITING.md:34 says drafts go
under content/drafts/essays/; draftEssayPattern
(build/Patterns.hs:46-49) matches only that, so this file is invisible
even to make watch/make dev — silently orphaned.
6.2 SIMD/PQC essay repository: URL 404s — MED
content/essays/where-does-simd-help-post-quantum-cryptography/index.md:24.
https://git.levineuwirth.org/where-simd-helps is missing the owner
segment — verified HTTP 404, while the sibling essay's
.../neuwirth/beyond_comorbidity_indices returns 200.
6.3 Tracked drafts contradict the gitignore policy — MED
.gitignore:88 ignores content/drafts/ as local-only "working notes,"
but git ls-files -i -c shows four tracked drafts
(digital_progeny.md, modern_idolatry.md, test-essay.md,
university_care.md) — ignore rules don't untrack, so edits are
auto-staged by make build and pushed publicly by deploy. The over-broad
**/.env.* pattern also matches the tracked .env.example.
6.4 Template/content LOWs and NITs
- content/colophon.md:5: modified: is dead frontmatter; nothing reads it. $date-modified$ (page-footer.html:108) is Hakyll's dateField over the date key.
- Seven files end frontmatter with a valueless confidence-history: (YAML null; WRITING.md:97 documents a list of ints). Harmless, but content/essays/scaling_outage.md also retains the full WRITING.md scaffold comments in a published essay.
- static/images/canto31.jpg: still 4.0 MB (prior-audit §6.1 unfixed).
- templates/blog-post.html:25,34: id="similar-links" appears twice in mutually exclusive $if$ branches; safe, but fragile under edit.
- content/drafts/essays/digital_progeny.md: title duplicates the published "The Specification Dilemma"; stale draft.
- Frontmatter flags home:/library:/links:/search:/portal: are consumed (head.html CSS gates, default.html:6 data-portal) but undocumented in WRITING.md.
Verified clean: all $partial(...)$ includes resolve; all ~140 distinct
template variables have context providers; no missing alt attributes,
tag-balance failures, or within-page duplicate IDs in composed pages; all
26 CSS files referenced by head.html exist; sampled enum values across
all sections are legal per WRITING.md and Contexts.hs validation lists.
7. Documentation / spec drift (WRITING.md, README.md)
7.1 js: page-script paths documented as content-relative; emitted root-relative — MED
WRITING.md:773-775 vs templates/default.html:37
(<script src="/$script-src$" defer>). The doc claims a composition's
js: scripts/widget.js serves at /music/symphony/scripts/widget.js; the
template emits the raw frontmatter value root-relative (/scripts/widget.js). The only current user
(memento-mori) works by coincidence of its root-level route. A
composition following the doc would 404.
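The gap between the two readings can be reproduced with the standard URL resolver. This is only an illustration: the origin is a placeholder, and the function names are hypothetical.

```javascript
// Placeholder origin; only the resulting pathname matters here.
const ORIGIN = "https://example.org";

// What the template emits: "/" + the frontmatter value, independent of
// the page's own route.
function rootRelative(scriptSrc) {
  return new URL("/" + scriptSrc, ORIGIN).pathname;
}

// What the doc describes: the value resolved against the page's directory.
function contentRelative(scriptSrc, pageRoute) {
  return new URL(scriptSrc, ORIGIN + pageRoute).pathname;
}

console.log(contentRelative("scripts/widget.js", "/music/symphony/"));
console.log(rootRelative("scripts/widget.js"));
```

In this sketch the two functions agree only when pageRoute is "/".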
7.2 "Standalone page content/my-page/index.md" has no generic rule — MED
WRITING.md:20 presents directory-form standalone pages as a general
capability; build/Site.hs hardcodes only content/me/index.md (293) and
content/memento-mori/index.md (307); the generic rule (351) matches flat
content/*.md only. A new content/my-page/index.md silently doesn't
build.
7.3 Portal table lists 8 portals; the build has 9 — MED
WRITING.md:221-231 omits Photography, which is in homePortals
(build/Site.hs:50-60), the nav, and content/tag-meta/photography.md.
7.4 Three implemented frontmatter fields undocumented — MED
WRITING.md:3 claims to cover "all frontmatter fields"; zero hits for:
summary: (build/Contexts.hs:415-427, rendered by essay.html:16 and
reading.html:12, in live use), revised: (build/Contexts.hs:815
getRevisions — drives $date-display$/$date-original$/
$revision-note$ and list sort order), keywords:
(build/Contexts.hs:283 → /bibliography/<kw>/ links).
7.5 Documentation LOWs
- WRITING.md:268-269,82: default citation style called "Chicago Author-Date"; the injected CSL (build/Citations.hs:114,167-168) is data/chicago-notes.csl, titled "Chicago Notes Bibliography".
- README.md:12,19: make watch described as "rebuilds on save without a server"; it runs Hakyll's preview server (WRITING.md:1139 has it right).
- WRITING.md:105-109: history: example ordering contradicts the code (see §3.5).
8. nginx, Makefile & deployment
8.1 Multi-line CSP value embeds literal \ + LF bytes — MED
nginx/security-headers.conf:60-71. The
Content-Security-Policy-Report-Only value is a single quoted string
spanning 12 lines with trailing \ characters — nginx has no
line-continuation inside quoted strings, so the emitted header contains
raw backslash, LF, and leading-space bytes between directives. Raw LF in
a header value is illegal in HTTP/2 (vhost example enables http2 on);
strict clients reject the whole response. Sent on every response even as
Report-Only. Must be collapsed to one line.
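A small lint along these lines could catch a regression before deploy. It is a sketch under stated assumptions: both function names are hypothetical, the policy string in the usage is a placeholder rather than the repo's actual CSP, and the check operates on the value as nginx would emit it, not on the config file syntax.

```javascript
// Report bytes that should not appear in the emitted header value.
// A backslash is flagged as suspicious rather than illegal: it is a valid
// header byte but has no meaning between CSP directives.
function findSuspiciousHeaderBytes(value) {
  const problems = [];
  if (/[\r\n]/.test(value)) problems.push("contains CR or LF");
  if (/\\/.test(value)) problems.push("contains a literal backslash");
  return problems;
}

// Collapse a multi-line, backslash-continued policy into a single line.
function collapsePolicy(value) {
  return value
    .replace(/\\\s*\r?\n/g, " ") // drop backslash + newline pairs
    .replace(/\s+/g, " ")        // squeeze remaining whitespace runs
    .trim();
}

// Placeholder policy, written the way the config currently spells it.
const multiLine =
  "default-src 'self'; \\\n    font-src 'self' data:; \\\n    connect-src 'self'";
console.log(findSuspiciousHeaderBytes(multiLine));
console.log(collapsePolicy(multiLine));
```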
8.2 CSP gaps that will fire under enforcement — MED
nginx/security-headers.conf:66-67. (a) font-src 'self' data: blocks
KaTeX webfonts: head.html:61 loads katex.min.css from cdn.jsdelivr.net,
whose relative font URLs resolve to the CDN. (b) connect-src 'self'
blocks the onnxruntime .wasm that transformers.js v2 (dynamically
imported in static/js/semantic-search.js:25) fetches from jsdelivr —
the config comment covers the same-origin model files but not the
runtime. Both latent while Report-Only.
8.3 Makefile auto-commit sweeps any pre-staged changes — MED
Makefile:28-29. git add content/ followed by
git diff --cached --quiet || git commit -m "auto: ..." commits the
entire index — anything previously staged gets folded into an
auto: <timestamp> [skip ci] commit and pushed publicly on deploy. Use
git commit -- content/ or verify no foreign paths are staged.
8.4 Makefile LOWs
- pdf-thumbs: the find | while read pipeline swallows pdftoppm failures (loop exit status is the last iteration's), so a corrupt PDF silently ships without a thumbnail.
- deploy: prerequisite order clean build sign is guaranteed only under serial make; no .NOTPARALLEL: guard for -j invocations. (Confirmed: deploy does run clean first; .PHONY is complete; .env export allowlist is sound.)
- tools/hooks/pre-commit-marks.sh is documented (Makefile:175 comment) but not installed; .git/hooks/ has only samples and core.hooksPath is unset.
Verified clean: all seven data/ JSON/YAML files parse;
data/embed-cache-pages.npz is untracked, so the new gitignore entry is
fully effective; nginx archive.conf's add_header-inheritance re-include is
correct; no redirect loops; popup-proxy rate-limit/cache zones correctly
documented for http{} scope.
9. Working-tree diff review (branding refresh + embed split)
The model contract is intact — the diff splits one MiniLM pipeline
into two: pages now use nomic-embed-text-v1.5 (768d, build-only, for
similar-links.json); paragraphs stay on all-MiniLM-L6-v2@c9745ed (384d,
the browser contract). download-model.sh, model-checksums.sha256,
semantic-search.js (DIM = 384), and both WRITING.md lines (1108 nomic
for Related-pages, 1128 MiniLM for client search) are all consistent.
Icon declarations all match real files (verified with file: apple-touch
180×180, favicon-96 96×96, manifest PNGs 192/512, og-image 1200×630
matching declared og:image dimensions; the webp sidecar was regenerated).
Open items beyond §1.3/§1.4/§4.1:
9.1 32.8 KB traced SVG inlined into every page — MED
templates/partials/logo-mark.svg (32,818 bytes, potrace-style single
giant <path>) is inlined via the nav partial into every HTML page —
a ~33 KB per-page weight regression (pre-compression). The two-tone
--logo-ink/--logo-bg cutout (components.css:72-98) genuinely needs
inline SVG or <use>; an external sprite + <use href> restores
cacheability. Better still: a hand-drawn or simplified path — a traced
bitmap at nav size carries detail that can never resolve.
9.2 Icon asset bloat — LOW
static/favicon.ico is now 71,766 bytes; parsed directory shows
16/32/48/64/128/256 px entries, the 128+256 pair alone 55.8 KB. The .ico
is only the legacy fallback (modern browsers take the SVG); 16+32+48
(~8 KB) is conventional. static/favicon.svg is a 32,844-byte traced
path. static/images/link-icons/internal.svg went ~2 KB → 32,818 bytes
yet renders at 0.7–1.6 rem via CSS mask in three stylesheets
(components.css:853, typography.css:833, popups.css:161).
9.3 Webmanifest regressions — NIT
static/site.webmanifest: purpose changed maskable→any for both
icons (Android adaptive launchers will letterbox; convention is separate
any + maskable entries); still no start_url/scope/description
(Lighthouse installability warnings). JSON valid; icons verified.
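The two manifest points can be expressed as a check over the parsed JSON. This is a hedged sketch: manifestWarnings is a hypothetical name, the icon file names in the usage are placeholders, and the list of keys simply mirrors the ones named above.

```javascript
// Return human-readable warnings for a parsed web app manifest object.
function manifestWarnings(manifest) {
  const warnings = [];
  for (const key of ["start_url", "scope", "description"]) {
    if (!(key in manifest)) warnings.push(`missing ${key}`);
  }
  // An icon without an explicit purpose counts as "any".
  const purposes = new Set(
    (manifest.icons ?? []).flatMap((icon) =>
      (icon.purpose ?? "any").split(/\s+/)
    )
  );
  for (const purpose of ["any", "maskable"]) {
    if (!purposes.has(purpose)) {
      warnings.push(`no icon with purpose "${purpose}"`);
    }
  }
  return warnings;
}

// Placeholder manifest resembling the state described above.
console.log(
  manifestWarnings({
    icons: [
      { src: "icon-192.png", sizes: "192x192", purpose: "any" },
      { src: "icon-512.png", sizes: "512x512", purpose: "any" },
    ],
  })
);
```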
10. Prior audit (AUDIT.md 2026-05-07) follow-up
| Finding | Status |
|---|---|
| §1.1 freeze unsolvable | Effectively still open — aeson pin fixed, but the freeze broke again via distributive after a system update (§1.1 above); the underlying freeze-vs-system-db fragility is unaddressed |
| §1.3 Python version mismatch | Fixed (requires-python = ">=3.14" matches .python-version) |
| §1.4 model checksums | Fixed (tools/model-checksums.sha256, 5 entries) |
| §9.1 nginx headers | Fixed (nginx/security-headers.conf + vhost example, README'd) — but see §8.1/§8.2 for new issues in that file |
| §6.1 canto31.jpg 4 MB | Unfixed |
| robots.txt / sitemap | Fixed (Site.hs:941/963, present in _site/) |
| README paper//spec.md ghosts | Fixed |
| rsync target quoting | Fixed |
| date-quoting doc | Fixed (WRITING.md:106) |
| tag-meta no-title exception | Fixed (WRITING.md:238-251) |
Suggested triage order
- tools/refreeze.sh (§1.1, in progress)
- Delete data/embed-cache-pages.npz.tmp.npz; widen the gitignore pattern; git add logo-mark.svg + og-image.png before committing the branding diff (§1.4, §4.1)
- Guard ArchiveIndex.hs file reads with doesFileExist (§1.2)
- Pin or sandbox the nomic remote code (§1.3)
- Fix the /fiction/–/poetry/ 404s (§2.1) and the production-visible frontend MEDs (§5.1, §5.2)
- Collapse the nginx CSP to one line before ever flipping it to enforcing (§8.1, §8.2)
- The rest by severity as time allows