levineuwirth.org/audit.md

740 lines
51 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# levineuwirth.org — Comprehensive Audit
**Auditor:** Independent code review (read-only, no changes made)
**Date:** 2026-04-09
**Scope:** ~15,400 lines across Haskell build system (`build/**/*.hs`), Pandoc filters (`build/Filters/*.hs`), static JavaScript (`static/js/*.js`), CSS (`static/css/*.css`), templates (`templates/**`), Python tooling (`tools/*.py`), shell scripts (`tools/*.sh`), `Makefile`, cabal/pyproject configuration, and repository hygiene.
**Methodology:** Direct reading of critical modules (`Site.hs`, `Contexts.hs`, `Stats.hs`, `Backlinks.hs`, `Compilers.hs`, `Citations.hs`, `Stability.hs`, `Catalog.hs`, `Commonplace.hs`, `Filters/*.hs`, `Makefile`, shell scripts, `embed.py`); parallel exploration of JS, CSS, templates, and the larger Python tools.
Each finding is labeled by **severity** (`CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, `NIT`) and cites file + line. The codebase is generally well-written — architecture is clean, modules are tightly scoped, YAML/frontmatter is parsed defensively, and escaping is applied in most HTML rendering sites. Most findings are local issues; the codebase does not exhibit systemic rot.
---
## Executive summary
**Confirmed correctness bugs (by impact):**
| # | File | Severity | Summary |
|---|------|----------|---------|
| 1 | `build/Filters/Images.hs:110` | **CRITICAL** | `lowerExt` is mathematically wrong — returns `"image."` for `"image.jpg"`. Every local raster fails `isLocalRaster`, so **no `<picture>` / WebP wrapping happens site-wide**. The entire WebP pipeline is dead code. |
| 2 | `build/Commonplace.hs:126-131` | **HIGH** | Operator-precedence bug in `renderChronoView`: `a ++ if c then x else y ++ z` parses as `a ++ (if c then x else (y ++ z))`, so `</div>` is never emitted when the commonplace book is empty → unclosed tag. |
| 3 | `tools/embed.py:68-73` | **HIGH** | Root `index.html` yields URL `"/./"` instead of `"/"`. Homepage is never matched by `SimilarLinks.hs`, so the "Related" block never renders on the home page. |
| 4 | `build/Authors.hs:50` | **HIGH** | `allContent` pattern does not include `content/essays/*/index.md` (directory-form essays). Author pages silently omit those essays. Compare against `Tags.hs:69`, which *does* include them. |
| 5 | `build/Filters/Score.hs:40` | **HIGH** | `TIO.readFile fullPath` is called with no existence check and no exception catch. A missing SVG aborts the entire build with a bare `openFile: does not exist` — no file name context, no graceful fallback. |
| 6 | `build/Filters/Viz.hs:96-99` | **HIGH** | Same pattern: `readProcessWithExitCode "python3" [fullPath]` runs even when `fullPath` doesn't exist; the only signal the author gets is a generic "non-zero exit". |
| 7 | `build/Filters/Sidenotes.hs:38` | **HIGH** | Sidenote labels wrap after the 26th note: `(n - 1) mod 26` turns note 27 into `a` again, creating duplicate `id="sn-a"` / `id="snref-a"` across the same document. Breaks in-page links and screen-readers. |
| 8 | `build/Filters/Images.hs:77` | **MEDIUM** | `passedKvs` filters only `loading` and `data-lightbox`, but not `id`, `class`, `alt`, or `title` — all of which are already emitted explicitly above. Any author-set `id=` or `class=` kv on an image is emitted **twice** in the `<img>`, producing invalid HTML (`<img id="x" id="x">`). |
| 9 | `build/Contexts.hs:263-264` | **MEDIUM** | `confidenceTrendField` uses `xs !! (length xs - 2)` (O(n) indexing) and `last xs`. They are guarded by a length check so they're safe, but this is a partial idiom in a module that otherwise uses total patterns. |
| 10 | `build/Filters/Links.hs:59` | **MEDIUM** | `not ("levineuwirth.org" 'T.isInfixOf' url)` — substring match. `https://evil-levineuwirth.org.attacker.com` is classified as *internal*, skipping `rel=noopener noreferrer target=_blank`. |
**Defense-in-depth findings:**
- `build/Filters/Transclusion.hs:41` interpolates the author-controlled `sec` section name into a `data-section="..."` attribute with no escaping. In a static site where all Markdown is author-authored this is not an exploitable XSS, but it is a raw-HTML injection primitive — a stray `"` in a section name will break markup, and any future lowering of the "author is trusted" assumption (PRs, multi-author site, user submissions) turns it into one.
- `build/Stats.hs:161-169` implements a correct URL allowlist (`isSafeUrl`) but accepts `"/"` as a prefix, which also matches `//evil.com` (protocol-relative URLs). Mostly cosmetic here since inputs come from Hakyll-computed routes, but the allowlist comment claims strict defense and this is a hole.
- Two different `authorSlugify` / `nameOf` implementations exist (`Authors.hs:30-39` and `Contexts.hs:147-154`). They'll drift the moment one is edited.
- Five copies of `escHtml``Utils.hs:18-26` (the "real" one), `Filters/Images.hs:135-142`, `Filters/Score.hs:88-92`, `Filters/Smallcaps.hs` (per the filter audit), `Filters/Viz.hs:178-182`, plus identical ones in JS (`annotations.js`, `popups.js`, `semantic-search.js`). Any fix must be made in 7+ places.
**Repository hygiene:**
- `.env` is gitignored and not tracked — good.
- `~5.4 MB` of `.docx` binaries (`BeyondComorbidityIndices*.docx`) sit in the repo root, untracked but present; they're build input for the new essay but should be moved under `paper/` or similar rather than the project root.
- `HOMEPAGE.md~` (zero-byte editor backup) is on disk; gitignore catches it, but it should be removed.
- `content/modern_idolatry.md` is untracked and not under `content/drafts/` — either it's a ready-to-publish draft that escaped the drafts workflow, or a forgotten scratch file.
- `build/Metadata.hs` contains only `module Metadata where` — a no-op placeholder dragged along since Phase 2. Delete or populate.
- `build/Filters/Math.hs` and `build/Filters/Dropcaps.hs` are `apply = id` placeholders; fine as TODO anchors, but `-Wno-unused-imports` in `levineuwirth.cabal` is masking warnings that would otherwise tell you so.
---
## 1. Haskell build system (`build/*.hs`)
### 1.1 `Site.hs`
**L-1.1.1 — LOW — Blog posts do not support directory-form pages.** `content/blog/*.md` (line 249) only matches flat posts; compare to essays and poetry, which accept both flat and `*/index.md`. If the author ever wants to co-locate blog assets, they'll have to edit both the rule and `Backlinks.hs:allContent`.
**L-1.1.2 — LOW — Backlinks pattern drift.** `allContent` in `Backlinks.hs:200-208`, `Authors.hs:50`, `Tags.hs:69`, and the implicit patterns in `Site.hs` all enumerate the same content types, slightly differently. Authors omits directory essays; Backlinks omits fiction/*/index.md; Tags includes both essay forms but not fiction. This divergence is the root of finding #4 (Authors missing directory essays) and will continue to produce silent bugs. Extract one canonical `Patterns.hs`.
**L-1.1.3 — LOW — `draftEssays` → `isDev` ties build correctness to an environment variable read at rule registration.** `isDev <- preprocess $ ... lookupEnv "SITE_ENV"` runs once at startup. Correct — but a developer toggling `SITE_ENV` mid-`cabal run site -- watch` will be confused. Worth a comment at the `preprocess` call, not just near `draftEssays`.
**L-1.1.4 — LOW — `library.html` loads all content four times.** `portalList` calls `loadAll essays`, `loadAll posts`, `loadAll fiction`, `loadAll poetry` **inside the inner list body**, which is re-evaluated for each of the eight `portalList` calls. That's 32 `loadAll` calls for eight portals. Hakyll caches identifiers so the impact is bounded, but it's still unnecessary work; hoist the loads into the outer `compile` block.
**NIT — `random-pages.json` (line 445).** The type annotation `:: Compiler [Item String]` on every binding is load-bearing because without it Hakyll can't infer the snapshot type. Fine, but a quick comment would save a future reader from thinking they're decorative.
### 1.2 `Contexts.hs`
**M-1.2.1 — MEDIUM — `authorLinksField` produces empty-slug URLs for empty author names.** `authorLinksField` (line 161) splits on `|`, trims, and calls `authorSlugify`. An entry like `"| https://url"` or `" "` produces name `""` → slug `""` → URL `/authors//`. Guard against empty names (fall back to `defaultAuthor` or skip the entry).
**M-1.2.2 — MEDIUM — `parseMovements` silently drops malformed entries.** `parseMovements` (line 380-397) uses `catMaybes $ map parseOne` — an entry missing `name` or `page` is dropped with zero diagnostic. Compositions with a typo in one movement silently lose it. Add at least a `putStrLn` warning via `unsafeCompiler` or fail loudly.
**L-1.2.3 — LOW — `abstractField` only strips single-`Para` abstracts.** Line 184-186: `Pandoc m [Para ils] -> Pandoc m [Plain ils]`. An abstract with inline `<br>` or line breaks becomes multiple `Para` blocks and the outer `<p>` is not stripped. Harmless but inconsistent.
**L-1.2.4 — LOW — `confidenceTrendField` threshold of ±5 is undocumented.** Line 267-269: `c - p > 5` → up, `p - c > 5` → down. The comment in the header describes behavior but not the threshold. Magic number.
**L-1.2.5 — LOW — `pageScriptsField` uses the script path as the item identifier.** Line 123: `Item (fromFilePath s) s`. If two separate frontmatter entries both load `shared.js`, they collide in Hakyll's item-store the first time `listField` evaluates them. Probably works by accident because the inner `script-src` field just returns `itemBody`; note the risk.
**NIT — `getInt` via `Rational → Double → floor`** (line 396). If a page number is `1000000000000000000` (unlikely), Double precision loss. Use `Scientific.floatingOrInteger` from `scientific` (already transitively available via Aeson).
### 1.3 `Stats.hs`
**M-1.3.1 — MEDIUM — `stripHtmlTags` is naive.** Line 108-111 strips `<...>` greedily, ignoring `>` inside attribute values, `<!-- ... -->` comments, and `<![CDATA[...]]>`. Used to compute word count and reading time for the `/build/` page so the impact is limited, but if a future author writes `alt="a > b"` (rare but legal) it'll slice the content.
**M-1.3.2 — MEDIUM — `walkDir` has no symlink-loop protection.** Line 406-416 recurses through `_site` via `doesDirectoryExist`, which follows symlinks. A developer who accidentally symlinks `_site/a → _site` will infinite-loop the build. Use `doesDirectoryExist` + `pathIsSymbolicLink` (in `directory >= 1.3.6`).
**L-1.3.3 — LOW — `isSafeUrl` allows protocol-relative URLs.** Line 161-164 accepts `"/"`-prefixed values. `"//evil.com"` matches this prefix. All current inputs are Hakyll-derived routes so the exposure is nil, but the comment ("Defense-in-depth URL allowlist") claims more rigor than the implementation provides. Fix: reject `u` that begins with `//`.
**L-1.3.4 — LOW — `readFile`/`Aeson.decodeStrict` round-trip.** Line 741 decodes backlinks via `TE.encodeUtf8 (T.pack rawBL)` where `rawBL :: String`. That is `String → Text → ByteString` — three copies. Read the item as `Item ByteString` via `getResourceLBS` (or keep backlinks.json as bytes throughout) to avoid two conversions.
**L-1.3.5 — LOW — Two separate tag sections.** `renderStatsTags` (line 380) and `renderTagsSection` (line 568) are the same function with different names. Consolidate.
**L-1.3.6 — LOW — Lazy `readFile` in `countLinesDir`.** Line 455: `readFile (dir </> e)` holds the handle open until `length (lines content)` is fully forced. Under `forM`, multiple handles may be concurrently open. For a 30-file build directory it's fine; use `Data.Text.IO.readFile` for explicit strictness.
**NIT — `lookupString "title" meta` fallback `"(untitled)"`** (line 71 and many siblings). Fine, but consider extracting a `titleOr` helper since it appears ~6 times.
### 1.4 `Backlinks.hs`
**L-1.4.1 — LOW — `normaliseUrl` does not URL-decode.** Line 188-194: stripping `?` and `#` is done on the raw URL without percent-decoding. A path like `/essays/caf%C3%A9` won't normalize to `/essays/café`. Current build likely does not emit percent-encoded routes, so this is latent.
**L-1.4.2 — LOW — `backlinksField` does not handle the "item with noResult route" case explicitly.** When `getRoute item` is `Nothing`, it fails with `"backlinks: item has no route"`. Fine, but that path is unreachable for items that have an associated rule. Note it, remove if always reachable.
**NIT — `renderBacklinks` concatenates strings; use blaze-html** to match `Stats.hs`. Not urgent; the output is static per build.
### 1.5 `Citations.hs`
**L-1.5.1 — LOW — Partial functions in `transformInline`.** Line 142: `head keys` / `head nums`. Guarded by `null nums` check above and by the structure of Pandoc `Cite` (never empty from the parser), so this is safe in practice. Swap to `case nums of (n:_) -> ...`.
**L-1.5.2 — LOW — `markerHtml` concatenates `T.unpack . show` via `tshow`** but also builds `data-cite-keys` as a space-separated list of HTML IDs with no escaping. If a citation key contains a quote character (unusual but legal), the attribute breaks.
**NIT — `stripRefPrefix` (line 209)** is `"ref-"`-specific; should be renamed `stripPandocRefPrefix` or documented with a pointer to the Pandoc source that emits it.
### 1.6 `Compilers.hs`
**L-1.6.1 — LOW — `pageCompiler` does not save a `toc` snapshot.** OK for pages that use `pageCtx`, but the commonplace, landing, and standalone pages that would benefit from a TOC get no opportunity. Not a bug — an architectural choice worth documenting.
**NIT — `stringify`** is redefined here (line 56-77) in addition to `Filters/Images.hs:119-132` and the one `Text.Pandoc.Shared` exports. Three implementations. Pick one.
### 1.7 `Stability.hs`
**M-1.7.1 — MEDIUM — `readIgnore` uses lazy `readFile`.** Line 44: handle stays open until the whole list is forced. Fine for a single-shot read but the pattern is fragile; `Data.Text.IO.readFile` is strict.
**L-1.7.2 — LOW — `unsafeCompiler` for git subprocess breaks Hakyll's dep tracking.** `stabilityField` calls `git log` via `unsafeCompiler`. Hakyll will not re-run the compiler when HEAD moves. Expected — `make build` always runs `git add content/` + commit first, which updates mtimes — but it's fragile to reason about. Worth a note at the `unsafeCompiler` call site rather than the header docs.
**L-1.7.3 — LOW — `gitDates` ignores `stderr`.** Line 54: `(ec, out, _) <- readProcessWithExitCode ...``_` drops the error. If the file isn't tracked yet, git prints a warning to stderr; user sees nothing. Log it.
**NIT — `stabilityFromDates` classification is undocumented magic.** `n <= 5 && age < 90` → "revising". These thresholds should be constants with intent comments.
### 1.8 `Catalog.hs`
**M-1.8.1 — MEDIUM — `renderEntry` does not escape frontmatter.** `ceTitle`, `ceYear`, `ceDuration`, `ceInstrumentation`, and `ceUrl` are pasted directly into HTML via `concat`. This is consistent with the site's "author-controlled trusted HTML in titles" convention (`Stats.hs:180-186` calls this out explicitly), but `Catalog.hs` has *no such comment*. If a collaborator's frontmatter contains a stray `<` or a malformed entry, the HTML breaks silently.
Suggest: adopt the `pageLink` convention from `Stats.hs` — escape `href` via `safeHref`, pass title through `preEscapedToHtml` with a documented comment.
**L-1.8.2 — LOW — `renderCategorySection` assumes non-empty group.** Line 194: `categoryLabel (ceCategory (head g))`. `groupBy` on a non-empty list produces non-empty sublists, so this is safe, but partial.
**NIT — `categoryRank` uses `lookup` instead of `elemIndex`.** Shorter:
```haskell
categoryRank c = fromMaybe (length categoryOrder) (elemIndex c categoryOrder)
```
### 1.9 `Commonplace.hs`
**H-1.9.1 — HIGH — Operator-precedence bug in `renderChronoView` (line 126-131).**
```haskell
renderChronoView entries =
"<div class=\"cp-chrono\" id=\"cp-chrono\" hidden>"
++ if null sorted
then "<p class=\"cp-empty\">No entries yet.</p>"
else concatMap renderEntry sorted
++ "</div>"
```
Parses as `"..." ++ (if null sorted then "..." else (concatMap renderEntry sorted ++ "</div>"))`. When `sorted` is empty, the closing `</div>` is silently dropped. Fix: parenthesize the `if`, or split into two lines with explicit binding.
**L-1.9.2 — LOW — `renderText` replaces `\n` with `<br>\n`** after escaping, which is correct, but does not escape `\r`. Windows-style line endings would produce `\r<br>`, leaving stray `\r` in HTML. Normalize line endings in `stripTrailingNL`.
### 1.10 `Authors.hs`
**H-1.10.1 — HIGH — `allContent` omits directory-form essays.** Line 50:
```haskell
allContent = ("content/essays/*.md" .||. "content/blog/*.md") .&&. hasNoVersion
```
Compare to `Tags.hs:69`, which adds `"content/essays/*/index.md"`. Any essay stored as `content/essays/foo/index.md` will NOT appear on its author's index page. This is the most likely source of silent "why isn't this essay on my author page" bugs.
**L-1.10.2 — LOW — Duplicate of `Contexts.authorSlugify`.** `Authors.slugify` and `Contexts.authorSlugify` do the same thing with different definitions (the Contexts version normalizes before filtering, Authors version filters after lowercasing). The two will diverge on Unicode edge cases. Consolidate.
### 1.11 `Utils.hs`
**L-1.11.1 — LOW — `wordCount` counts HTML tokens as words.** Called from `Compilers.hs:172` on raw source `src` (Markdown, including any raw HTML) and from `Stats.hs:809` on tag-stripped HTML. On raw Markdown this miscounts `[display](url)` as three "words". Low-severity because the stat is approximate anyway, but worth noting when comparing `/stats/` numbers to `wc`.
### 1.12 `Pagination.hs`, `Tags.hs`, `SimilarLinks.hs`, `Metadata.hs`, `Main.hs`
No material issues. `Metadata.hs` is a two-line empty-module placeholder — delete or populate.
---
## 2. Pandoc filters (`build/Filters/*.hs`)
### 2.1 `Filters/Images.hs` — the big one
**C-2.1.1 — CRITICAL — `lowerExt` returns the basename, not the extension.** Line 110:
```haskell
lowerExt = map toLower . reverse . ('.' :) . takeWhile (/= '.') . tail . dropWhile (/= '.') . reverse
```
Trace for `"image.jpg"`:
1. `reverse``"gpj.egami"`
2. `dropWhile (/= '.')``".egami"`
3. `tail``"egami"`
4. `takeWhile (/= '.')``"egami"`
5. `('.' :)``".egami"`
6. `reverse``"image."`
7. `toLower``"image."`
So `lowerExt "image.jpg" == "image."` — which does not equal `.jpg`, `.jpeg`, `.png`, or `.gif`. **`isLocalRaster` is therefore `False` for every file**, the entire `<picture>`/WebP dispatch is dead code, and `tools/convert-images.sh` produces `.webp` companions that are never referenced.
Fix: `System.FilePath.takeExtension` is already imported elsewhere and already pulled in transitively; replace with
```haskell
lowerExt = map toLower . takeExtension
```
**M-2.1.2 — MEDIUM — `passedKvs` duplicate-emits `id`, `class`, `alt`, `title`.** Line 77:
```haskell
passedKvs = filter (\(k, _) -> k `notElem` ["loading", "data-lightbox"]) kvs
```
But above, `attrId`, `attrClasses`, `attrAlt`, and `attrTitle` already emit those attributes from `(ident, classes, kvs)`. If an author writes `![alt](img.jpg){.foo title="bar"}`, Pandoc places `title` into `kvs`, so the output becomes `<img ... class="foo" title="bar" title="bar">`. Expand the blacklist:
```haskell
passedKvs = filter (\(k, _) -> k `notElem` ["loading", "data-lightbox", "id", "class", "alt", "title"]) kvs
```
Side-note: the same issue affects the non-picture branch at line 47 indirectly (via the `Image` constructor Pandoc emits), but Pandoc's HTML writer handles dedup there.
**M-2.1.3 — MEDIUM — `stringify` catches most but not all inline variants.** Line 119-132: handles `Str`, `Space`, `SoftBreak`, `LineBreak`, `Emph`, `Strong`, `Code`, `Link`, `Image`, `Span`. Misses `Strikeout`, `Superscript`, `Subscript`, `SmallCaps`, `Quoted`, `Cite`, `Math`, `RawInline`. Alt text for an image captioned `~subscript~` will be empty.
**L-2.1.4 — LOW — `renderKvs` does not escape the key.** Line 94: `" " <> k <> "=\"" <> esc v <> "\""`. Keys in Pandoc come from Markdown attribute syntax and can only be identifiers, so this is safe in practice; but it's asymmetric with `v` and deserves either `esc k` or an assertion comment.
**L-2.1.5 — LOW — `isUrl` misses `data:`, fine; misses `file://`, OK; misses `mailto:` not relevant here.** Accurate for the intended domain.
### 2.2 `Filters/Transclusion.hs`
**M-2.2.1 — MEDIUM — `sec` attribute not HTML-escaped.** Line 41:
```haskell
Just (slugToUrl slug, " data-section=\"" ++ sec ++ "\"")
```
`sec` is everything after `#` up to `}}` in the Markdown source. If an author writes `{{essay#a"b}}`, the emitted HTML is `<div … data-section="a"b">` — invalid markup. Not a realistic XSS vector on a single-author static site (would be a self-attack), but:
- It's an injection primitive. The moment content ever comes from a PR, a collaborator, or an imported source, it becomes one.
- The fix is one line: escape `"`, `<`, `>`, `&` before interpolation.
**L-2.2.2 — LOW — `slugToUrl` appends `.html` unconditionally.** Line 46-49: `slug ++ ".html"`. If the slug is already `page.html`, you get `page.html.html`. Unlikely in practice (source convention is `{{essay-slug}}` with no extension), but guard against it.
**NIT — `trim` re-implemented yet again.** Same function appears at least four times (`Transclusion.hs:59`, `EmbedPdf.hs:80`, `Wikilinks.hs:59`, plus `Contexts.hs`'s `strip`). Factor.
### 2.3 `Filters/Score.hs`
**H-2.3.1 — HIGH — `TIO.readFile fullPath` with no existence check and no exception handling.** Line 40. A Markdown file that references a missing SVG aborts the entire Hakyll build with nothing more than:
```
openFile: does not exist (No such file or directory)
```
No filename, no page context, no recovery. Fix:
```haskell
existed <- doesFileExist fullPath
if not existed
then do putStrLn $ "[Score] missing: " ++ fullPath
return (Div ("", cls, attrs) blocks)
else do svgRaw <- TIO.readFile fullPath
...
```
Or wrap in `try` and fall back to an `errorBlock` mirroring `Filters.Viz.errorBlock`.
**M-2.3.2 — MEDIUM — Lazy-I/O `readFile` under `walkM`.** Using `Data.Text.IO.readFile` forces immediately, so this is actually OK — I retract the generic concern. The real issue is #H-2.3.1 above.
**L-2.3.3 — LOW — `processColors` is order-sensitive.** The comment on line 56-58 acknowledges it: the 6-digit hex replacements come *last* in the function composition chain, which means they're applied *first*. That's correct and the comment is helpful. Keep the comment.
**L-2.3.4 — LOW — `escHtml` reorder bug.** Line 88-92:
```haskell
escHtml = T.replace "\"" "&quot;"
. T.replace ">" "&gt;"
. T.replace "<" "&lt;"
. T.replace "&" "&amp;"
```
`&` must be replaced *first*, else the `&amp;` injected by other replacements gets its `&` replaced by `&amp;` to become `&amp;amp;`. Read bottom-up because of function composition: `&``<``>``"`. Wait — function composition: `f . g . h` applied to `x` is `f (g (h x))`. So the order executed is `&`, then `<`, then `>`, then `"`. **This is correct** (`&` first). Retracted — the `Viz.escHtml` at `Viz.hs:178-182` has the same composition order and is also correct. Nit only: write the function as a single chain with a comment stating the invariant.
### 2.4 `Filters/Viz.hs`
**H-2.4.1 — HIGH — No file-existence check before `readProcessWithExitCode`.** Line 96-99. Same class of bug as Score; the user sees `"non-zero exit"` with no path. Add `doesFileExist fullPath` before spawning.
**M-2.4.2 — MEDIUM — Exception handler drops the exception detail.** Line 99:
```haskell
`catch` (\e -> return (ExitFailure 1, "", show (e :: IOException)))
```
The third tuple element is set to `show e`, but then on line 102 the caller reads it as `err` and displays it. That's actually correct — retracted. BUT the error bubbles up to `errorBlock` which renders `<div class="viz-error">...</div>` inline in the page. That's actually graceful. Good.
**L-2.4.3 — LOW — `escScriptTag` only replaces `</`.** Line 133: correct for JSON embedding but not for content that contains `<!--` or `]]>` inside strings. Vega-Lite specs won't contain those, so fine.
**L-2.4.4 — LOW — `warn` uses `putStrLn` to stdout, not stderr.** Line 176. Mixes with Hakyll's build progress output. Use `hPutStrLn stderr`.
### 2.5 `Filters/Sidenotes.hs`
**H-2.5.1 — HIGH — Label wrap at 26 produces duplicate IDs.** Line 38:
```haskell
toLabel n = T.singleton (toEnum (fromEnum 'a' + (n - 1) `mod` 26))
```
Note 27 → `a` again. Two `<sup id="snref-a">` and two `<sup id="sn-a">` in the same document. Duplicate IDs are invalid HTML, break `href="#sn-a"` fragment navigation, and confuse ATs.
Fix options:
1. Use numeric labels: `"sn" ++ show n`.
2. Use two-letter labels for n > 26: `aa`, `ab`, …, `zz`.
3. Fail loudly with `error`: essays with >26 footnotes are rare and the user should know.
**M-2.5.2 — MEDIUM — `replacePTags` is a string-level hack.** Line 57-60:
```haskell
replacePTags =
T.replace "<p>" "<span class=\"sidenote-para\">"
. T.replace "</p>" "</span>"
```
A footnote whose content contains the literal text `<p>` (e.g., a code sample discussing HTML) will be mangled. Rare but possible. The correct fix is to transform the AST before writing, not the post-rendered HTML.
### 2.6 `Filters/Links.hs`
**M-2.6.1 — MEDIUM — `isExternal` uses substring match for the site domain.** Line 59:
```haskell
isExternal url =
("http://" `T.isPrefixOf` url || "https://" `T.isPrefixOf` url)
&& not ("levineuwirth.org" `T.isInfixOf` url)
```
`https://evil-levineuwirth.org.attacker.com/phish` contains `levineuwirth.org` as a substring → classified as *internal* → no `rel=noopener noreferrer target=_blank`. In 2026 with partitioned cookies this is mostly a cosmetic concern, but fix is trivial:
```haskell
isSameHost url =
case T.stripPrefix "https://" url <|> T.stripPrefix "http://" url of
Nothing -> False
Just rest ->
let host = T.takeWhile (\c -> c /= '/' && c /= ':') rest
in host == "levineuwirth.org" || "." `T.isSuffixOf` ("." <> host) -- etc.
```
or simpler: `host == "levineuwirth.org" || T.isSuffixOf ".levineuwirth.org" host`.
**M-2.6.2 — MEDIUM — PDF links with fragment are not rewritten.** Line 30-36 requires `.pdf" T.isSuffixOf` url` — a URL like `/papers/foo.pdf#page=5` has suffix `5`, not `.pdf`, so it doesn't route through the PDF.js viewer. Compare to `EmbedPdf.hs` which does handle fragments in the source preprocessor path. Inconsistent.
**L-2.6.3 — LOW — `domainIcon` duplicates twitter/x and youtube/youtu.be mappings.** Fine. Nit: table-driven via `lookup` would be cleaner than the chain of guards.
### 2.7 `Filters/Wikilinks.hs`
**M-2.7.1 — MEDIUM — `toMarkdownLink` does not escape `]` or `)`.** Line 33-36:
```haskell
toMarkdownLink inner =
let (title, display) = splitOnPipe inner
url = "/" ++ slugify title
in "[" ++ display ++ "](" ++ url ++ ")"
```
If the display text contains `]` or `)`, the generated Markdown is broken and Pandoc will parse it as raw text or as a weird link. Rare in practice (wikilink display is usually a plain name), but worth escaping.
**L-2.7.2 — LOW — `slugify` uses `intercalate "-" . words . ...` — "a.b" → "a b" → "a-b".** That's by design (punctuation becomes space becomes hyphen). Note the trailing hyphen for inputs like "end.": space after "end" → `["end"]` → "end". OK.
**NIT — Inefficient `trim` — `reverse . dropWhile ' ' . reverse . dropWhile ' '`.** Use `T.strip` if inputs were Text. `String`-based pipeline makes this unavoidable.
### 2.8 `Filters/EmbedPdf.hs`
**M-2.8.1 — MEDIUM — `encodeQueryValue` does not encode `#`.** Line 68-76: the encoder is called on `filePath`, which is already split on `#` by `parseDirective` (line 38). So the unencoded `#` issue doesn't bite here. However, the docstring at line 65 says "percent-encode characters that would break a query-string value" — `#` is such a character. Add it for defense in depth, even if the current call site doesn't benefit.
**L-2.8.2 — LOW — `parsePageHash` silently produces `""` for invalid fragments.** Line 45-51. An author writing `{{pdf:/foo.pdf#garbage}}` silently drops the fragment. No warning.
### 2.9 `Filters/Typography.hs`, `Filters/Code.hs`, `Filters/Smallcaps.hs`, `Filters/Dropcaps.hs`, `Filters/Math.hs`
Scanned via the parallel sub-audit; only nit-level findings apply (duplicate `escHtml`, smart-quote edge case in abbreviation matching, `apply = id` placeholders).
---
## 3. Static JavaScript (`static/js/*.js`)
Audited by parallel exploration. The full per-file list is long; the aggregate pattern is: **no user-authored content is ever injected**, so `innerHTML` usage across `popups.js`, `annotations.js`, `citations.js`, and `selection-popup.js` is **not an XSS vector under the current authoring model**. The risk profile changes the moment the site accepts PRs, gains an annotations-backend, or proxies third-party content (none of which are planned per `spec.md`).
### 3.1 XSS surface (all author-trust scoped)
**M-3.1.1 — MEDIUM — `popups.js:608-614` copies `innerHTML` from the page into the popup.** The `epistemicContent` provider does `html += '<div class="ep-compact">' + compact.innerHTML + '</div>'`. Because the source (`.ep-compact`) is emitted by our own Haskell code (`Contexts.hs` + templates), this is safe under the trust model. Switch to `compact.cloneNode(true)` + `popup.appendChild()` for a defense-in-depth fix that costs nothing.
**M-3.1.2 — MEDIUM — `popups.js` cross-origin fetches (Wikipedia, arXiv, CrossRef, GitHub, etc.) don't validate `Content-Type`.** A malicious CORS-enabled endpoint could return HTML that the popup would render. Every fetch already pipes through an `esc()` call (line 655-661), so the risk is bounded to text that escapes in some corner.
**L-3.1.3 — LOW — `citations.js:15, 56` and `annotations.js:167-172` use `innerHTML` with escaped data.** The escaping is correct; the fragility is that the escape-before-concat pattern is easy to get wrong in the future.
### 3.2 Event handling / lifecycles
**M-3.2.1 — MEDIUM — `sidenotes.js:73-94` attaches listeners per-sidenote with no cleanup path.** When `transclude.js` re-renders a fragment on resize, sidenotes accumulate duplicate handlers. Net effect: `update()` gets called 2×, 3×, … on hover over the same sidenote. Not a bug in the output, but a measurable leak over a long session.
**M-3.2.2 — MEDIUM — `popups.js` attaches listeners at load time and never re-binds for transcluded content.** A transcluded essay's internal links have no popup previews. If transclusion is meant to feel "live", this is a user-visible gap.
**M-3.2.3 — MEDIUM — `semantic-search.js:66-74` race in `loadModel`.** If two searches fire before the first model-load resolves, both call `import()` and `pipeline()`. Second call wastes CPU + memory. Track in-flight Promise:
```js
if (loadPromise) return loadPromise;
loadPromise = import(CDN).then(...);
```
### 3.3 Accessibility
**H-3.3.1 — HIGH — `gallery.js` overlay has no focus trap.** `openOverlay()` focuses the close button, but Tab escapes into the backdrop. Pattern to copy: `settings.js:35-49`.
**M-3.3.2 — MEDIUM — `selection-popup.js` annotation picker color swatches are mouse-only.** Arrow-key navigation + Enter to select would make it keyboard-accessible.
**M-3.3.3 — MEDIUM — `sidenotes.js` sidenote focus toggle is click-only.** No keyboard equivalent.
**L-3.3.4 — LOW — `lightbox.js:18,42` defaults `img.alt` to `""` and only later populates from source.** If source alt is missing, the lightbox image has no accessible name. Use `img.alt = srcAlt || 'Lightbox image'`.
**L-3.3.5 — LOW — `theme.js:9-28` does not `try/catch` around `localStorage.getItem`.** Private-browsing Safari throws. The code happens to work because `getItem` returns `null` on failure *in most browsers*, but not all.
### 3.4 Duplication and style
**L-3.4.1 — LOW — HTML escaping reimplemented 3× across `annotations.js`, `popups.js`, `semantic-search.js`.** Add a shared `utils.js` (one function).
**L-3.4.2 — LOW — Mixed `var` vs `const`/`let`.** `citations.js`, `nav.js`, `sidenotes.js`, `toc.js` use modern ES6+; `popups.js`, `annotations.js`, `gallery.js` use `var`. Pick one.
**NIT — Magic-number sprinkles** for delays (`SHOW_DELAY=250`, `HIDE_DELAY=150`, `SHOW_DELAY=450`, swipe threshold `30`, etc.). Not worth a refactor.
---
## 4. CSS and HTML templates
Audited by parallel exploration. Highlights:
### 4.1 CSS
**H-4.1.1 — HIGH — Undefined CSS custom properties.** `build.css` uses `--rule` (lines 21, 30, 39, 69) and `--bg-subtle` (components.css:1448) and `--font-ui` (many places) that have no definition in `base.css`. Browsers treat `var(--undefined)` as the initial value → silent visual degradation on the `/build/` and annotation-related pages.
Fix:
```css
:root {
--rule: var(--border-muted);
--font-ui: var(--font-sans);
--bg-subtle: #f5f5f5;
}
[data-theme="dark"] { --bg-subtle: #1f1f1f; }
```
**H-4.1.2 — HIGH — Dark-mode `--text-faint` contrast fails WCAG AA.** `#6a6660` on `#121212` ≈ 2.8:1. Used for sidenote numbers (0.65em!) and disabled-state icons. Bump to ~`#8b8680` (≈3.5:1) at minimum.
**H-4.1.3 — HIGH — TOC collapse hides content from keyboard + AT.** `components.css:433-436` uses `visibility: hidden` on collapsed TOC, which removes it from the accessibility tree. Use `aria-expanded` + height transition, or `aria-hidden="true"` explicitly, or `display: none` (losing the smooth collapse).
**H-4.1.4 — HIGH — No consistent `:focus-visible` ring across interactive elements.** `.nav-portal-toggle`, `.settings-toggle`, `.toc-toggle`, `.annotation-toggle` lack focus styles. Add a global:
```css
button:focus-visible, a:focus-visible {
outline: 2px solid var(--text);
outline-offset: 2px;
}
```
**M-4.1.5 — MEDIUM — Hardcoded hex in `print.css`.** `#fff`, `#000`, `#f9f9f9`, `#ddd` bypass variables. Move into a `@media print` `:root` overrides block.
**M-4.1.6 — MEDIUM — Breakpoints are scattered.** `540px`, `680px`, `900px`, `1100px`, `1500px` appear across files with no central definition. Define once in `base.css`:
```css
:root {
--bp-phone: 540px;
--bp-tablet: 680px;
--bp-desktop: 900px;
--bp-wide: 1500px;
}
```
(Note: CSS variables cannot be used inside `@media` queries; use Sass or a preprocessor, or settle for a comment + grep discipline.)
**L-4.1.7 — LOW — Inconsistent transition timings.** `0.15s`, `0.28s`, `0.3s`, `0.35s`, `0.5s` scattered. Three tokens would cover all cases.
**L-4.1.8 — LOW — Deprecated `font-variant` shorthand.** `reading.css:95` and `library.css:22` use `font-variant: small-caps`, which resets other OpenType features (like kerning). Use `font-variant-caps: small-caps`.
### 4.2 HTML templates
**M-4.2.1 — MEDIUM — `templates/default.html:30-33` inline onload script.** The KaTeX bootstrap is an inline onload attribute containing a multi-line JS expression. Works, but blocks any future strict CSP (`unsafe-inline`). Move to an external `katex-bootstrap.js` served from `/js/`.
**L-4.2.2 — LOW — `templates/partials/nav.html` buttons lack `type="button"`.** If any nav is ever placed inside a `<form>`, Enter will submit. Belt-and-suspenders fix: add `type="button"` to every `<button>` that isn't a submit.
**L-4.2.3 — LOW — `templates/partials/head.html` loads all component CSS unconditionally plus three conditional files.** Not a perf bug on HTTP/2, but `components.css` (1464 lines) is loaded even on the homepage. Split.
---
## 5. Python tooling (`tools/*.py`)
### 5.1 `tools/embed.py`
**H-5.1.1 — HIGH — Root URL becomes `"/./"`.** Line 68-73:
```python
def _url_from_path(html_path: Path) -> str:
rel = html_path.relative_to(SITE_DIR)
if rel.name == "index.html":
url = "/" + str(rel.parent) + "/"
return url.replace("//", "/")
return "/" + str(rel)
```
For `_site/index.html`, `rel.parent` is `Path(".")`. `str(Path("."))` is `"."` on Linux. Result: `"/./"`. Haskell's `SimilarLinks.normaliseUrl` produces `"/"` for the same route, so lookup fails and the homepage never gets similar-links suggestions.
Fix:
```python
if rel.name == "index.html":
parent = str(rel.parent)
if parent in (".", ""):
return "/"
return "/" + parent + "/"
```
**L-5.1.2 — LOW — No `--quiet` mode.** `embed.py` prints progress unconditionally; CI builds get noise.
**L-5.1.3 — LOW — `needs_update()` uses `rglob("*.html")` over `_site`.** Fine, but for a large `_site/` this re-stat's every HTML on every build. Could be cached via a single inode-level watermark file.
**NIT — `EXCLUDE_URLS` comparison against `/search/`, `/build/`, etc. works only because `_url_from_path` matches those exact forms.** A refactor could break the set. Document.
### 5.2 `tools/import-poetry.py`
**H-5.2.1 — HIGH — `yaml_str()` does not escape newlines.** Lines 193-203. An `abstract`, `attribution`, or first-line containing `\n` yields invalid YAML. Add `\n`, `\r` to the `needs_quote` character set.
**H-5.2.2 — HIGH — Empty `title_prefix` / `collection_slug` silently collide.** Line 328. If `--collection` is all punctuation, the slug becomes empty and every poem writes to the same path. Add an up-front assertion:
```python
if not collection_slug or collection_slug == "-":
sys.exit(f"error: collection slug is empty (check --collection={args.collection!r})")
```
**M-5.2.3 — MEDIUM — `--date` is unvalidated.** Line 313. User can pass `--date "last tuesday"` and it flows into YAML unchanged. Parse as `int` in `[1, 2100]`.
**M-5.2.4 — MEDIUM — `roman_to_int` has no explicit bounds check.** Line 45-52. Guarded by regex at the call site; fine today, but make the function defensive for its own protection.
**L-5.2.5 — LOW — `write_text(content, encoding="utf-8")` with no `errors=` argument.** Will raise on unmappable codepoints. Pick `errors="strict"` or `errors="replace"` intentionally.
### 5.3 `tools/viz_theme.py`
**L-5.3.1 — LOW — `save_svg` has no `try/finally` around `plt.close(fig)`.** If `savefig` raises, matplotlib state leaks. Standalone CLI-tool-y, so the impact is one figure.
### 5.4 `pyproject.toml` / `uv.lock`
**M-5.4.1 — MEDIUM — Upper-bound-free pins.** `torch>=2.5`, `sentence-transformers>=3.4`, `faiss-cpu>=1.9`, `numpy>=2.0`. A future major release can break the build silently. Pin with `<4` upper bounds.
---
## 6. Shell scripts and `Makefile`
### 6.1 `Makefile`
**M-6.1.1 — MEDIUM — `make deploy` pushes to GitHub *before* rsync.** Line 57-58:
```makefile
deploy: clean build sign
git push -u origin main
rsync -avz --delete _site/ $(VPS_USER)@$(VPS_HOST):$(VPS_PATH)/
```
If `rsync` fails, the GitHub push has already succeeded — the remote is ahead of the deployed site. Inverse order (rsync first, then push on success) would be safer, though `make` won't auto-rollback either way.
**M-6.1.2 — MEDIUM — `deploy` uses `$(VPS_USER)`, `$(VPS_HOST)`, `$(VPS_PATH)` with no definition in the Makefile.** They must come from `.env`. If any is unset, rsync runs as `@:/_site/` → silently opens an SSH connection to the wrong place or errors out obliquely. Add a guard:
```makefile
deploy: clean build sign
@test -n "$(VPS_USER)" || (echo "VPS_USER not set" >&2; exit 1)
@test -n "$(VPS_HOST)" || (echo "VPS_HOST not set" >&2; exit 1)
@test -n "$(VPS_PATH)" || (echo "VPS_PATH not set" >&2; exit 1)
...
```
**M-6.1.3 — MEDIUM — `build` commits content before building but never cleans up on failure.** Line 9-10:
```makefile
@git add content/
@git diff --cached --quiet || git commit -m "auto: $$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```
If the subsequent `cabal run site -- build` fails, the commit is already in place. Subsequent `make build` retries will see a clean diff and succeed, masking the original failure in history as a "trailing" auto-commit. Low severity — the memory note about always `make clean && make build` + the `deploy: clean build` target make this infrequent.
**L-6.1.4 — LOW — `clean` only runs `cabal run site -- clean`.** That cleans `_site` and `_cache` (Hakyll's store), but not `dist-newstyle/` (cabal build output) or the embeddings under `data/` (gitignored but stale). Arguably correct-as-designed: deep clean is `git clean -fdX`. Document.
**L-6.1.5 — LOW — `pdf-thumbs` recipe uses unquoted `$$pdf` in `-nt` test.** If a PDF filename contains a space (rare), the test misparses. Quote:
```makefile
if [ ! -f "$${thumb}.png" ] || [ "$$pdf" -nt "$${thumb}.png" ]; then
```
(The file path IS quoted — false alarm. Retracted.)
**L-6.1.6 — LOW — `export` on line 6 is blunt.** Every Make variable and every variable inherited from the shell becomes available to every recipe. For a solo build this is fine; be aware of scope creep.
**NIT — `build-start.txt` via file I/O instead of make variable.** Line 11, 23. A single recipe could use shell arithmetic, avoiding the scratch file (and the need to gitignore it).
### 6.2 `tools/sign-site.sh`
**L-6.2.1 — LOW — `find | xargs -I{}` with `-P $(nproc)` is vulnerable to pathological filenames.** The `-I{}` substitution plus `-0` is safe for spaces, but if `$(nproc)` returns `0` (cgroup edge cases), `-P 0` means "as many as possible", which is arguably fine but non-obvious. Explicit: `-P "${JOBS:-$(nproc)}"`.
**NIT — Hardcoded key fingerprint `C9A42A6FAD444FBE566FD738531BDC1CC2707066`.** Expected — but document how a key-rotation requires editing both this script and `preset-signing-passphrase.sh`.
### 6.3 `tools/convert-images.sh`
No issues of note. `set -euo pipefail`, `find -print0 | read -d ''`, quoting are all correct.
### 6.4 `tools/download-model.sh`
**L-6.4.1 — LOW — No checksum verification on downloaded ONNX.** Line 26 `curl -fsSL $BASE_URL/$src -o $dst`. If HuggingFace is compromised or returns a different build of the model, the site ships a trojan without warning. Pin expected SHA-256 and verify.
**NIT — Hardcoded HuggingFace URL.** Document that an official mirror is unavailable; that's why you pull from `resolve/main` rather than a pinned revision.
### 6.5 `tools/subset-fonts.sh`
**L-6.5.1 — LOW — Paths are Arch-specific.** `/usr/share/fonts/ttf-spectral`, `/usr/share/fonts/TTF`. On Debian/Ubuntu, JetBrains Mono lives at `/usr/share/fonts/truetype/jetbrains-mono/`. Detect via `fc-match` or document as Arch-only.
### 6.6 `tools/preset-signing-passphrase.sh`, `tools/refreeze.sh`
Clean. No material issues.
---
## 7. Repository hygiene and configuration
### 7.1 `.gitignore`
**L-7.1.1 — LOW — `.gitignore` lacks `*.swp`/`*.swo` for vim users, but has `*.swp`/`*.swo`.** OK. ✓
**L-7.1.2 — LOW — `dist-newstyle/`, `_site/`, `_cache/`, `.env`, `IGNORE.txt` all correctly ignored.**
**NIT — `paper/` is tracked but its purpose is unclear.** Not in this audit's scope but worth a README.
### 7.2 Files in repo root that shouldn't be
- `BeyondComorbidityIndices.docx` (3.8 MB) + `BeyondComorbidityIndicesSupplement.docx` (1.6 MB) — untracked, but 5.4 MB of binary clutter at the project root. Move into `paper/` or `drafts/`.
- `HOMEPAGE.md~` — empty editor backup, gitignored but on disk.
- `HOMEPAGE.md`, `WRITING.md`, `migrate_html.md` — workspace notes without a home. Consider `docs/` or `notes/`.
- `content/modern_idolatry.md` — untracked Markdown file in `content/` that isn't `content/drafts/`. Either move under drafts or commit.
- `IGNORE.txt` — exists (empty), gitignored, used by the stability pin mechanism. Clean.
### 7.3 `levineuwirth.cabal`
**L-7.3.1 — LOW — `-Wno-unused-imports` masks real unused imports.** Set at the executable level. This hid the `Metadata.hs` no-op and the `Data.List.intercalate` in several modules. Delete the flag and fix the warnings.
**L-7.3.2 — LOW — Version bounds are present but `< 4.17` on Hakyll and `< 3.7` on Pandoc pin the project to a specific minor release window.** Good discipline, but document the refreeze cadence (there's `tools/refreeze.sh` — reference it in README).
**NIT — `bytestring < 0.13`** is ambitiously loose; the Pandoc ecosystem tends to follow `bytestring < 0.12`. Verify by running `cabal outdated --v2-freeze-file`.
### 7.4 `cabal.project`
`-O1` for the build program is the right call — Hakyll build time is dominated by Pandoc, not the wrapper. ✓
### 7.5 `pyproject.toml`
See finding M-5.4.1. Otherwise clean.
### 7.6 README
**M-7.6.1 — MEDIUM — `README.md` is a single line: `# levineuwirth.org`.** The project has a 63 KB `spec.md`, multiple build flows, optional features (.venv for embeddings, download-model for semantic search), a signing setup, and an rsync deployment target. None of this is documented. A new contributor (or future-you after a two-year hiatus) cannot get started from what's here.
Minimum viable README:
1. One-sentence description.
2. `make build`, `make dev`, `make deploy` entrypoints.
3. Optional: `.venv` setup via `uv sync` for embeddings; `make download-model` for client-side semantic search.
4. `.env` format (link to `.env.example`).
5. Pointer to `spec.md` for architecture.
---
## 8. Cross-cutting observations
### 8.1 Duplicate code
At least five independent implementations of HTML escaping (`Utils.hs`, `Images.hs`, `Score.hs`, `Smallcaps.hs`, `Viz.hs`, plus 3× in JS). At least four implementations of `trim` (`Transclusion`, `EmbedPdf`, `Wikilinks`, plus `Contexts.strip`). Two of `slugify`/`authorSlugify`. Two of `stringify` (`Compilers.hs`, `Images.hs`, plus `Text.Pandoc.Shared.stringify` in the library). Two `normaliseUrl` (`Backlinks.hs`, `SimilarLinks.hs`) — **almost** identical but with different `index.html` handling, so they cannot be naively merged.
Recommendation: create a `build/Common.hs` (or separate `build/Text.hs`) for `escapeHtml`, `trim`, `stringify`, and consolidate where possible.
### 8.2 Partial functions
Partial functions are used in several places with explicit guards (`last`, `head`, `!!`, `fromJust`). All are safe in their current guards, but the pattern is riskier than case-analysis. Audit: `Contexts.hs:263`, `Citations.hs:142`, `Stats.hs:125 (median)`, `Stability.hs:75`, `Catalog.hs:194`.
### 8.3 Error handling consistency
Two different patterns:
- `Score.hs` and older filters: `readFile` blows up, no diagnostic.
- `Viz.hs` and `Stability.hs`: `catch` + `errorBlock` / fallback.
Standardize on the second pattern across all IO-performing filters.
### 8.4 Trust boundary is unstated
The codebase leans on a "the author writes and reviews everything" assumption for:
- Frontmatter metadata (used raw in HTML by `Catalog.hs`, `Stats.hs`, `Contexts.hs`).
- Wikilink / transclusion slugs (used raw in HTML by `Transclusion.hs`).
- Bibliography entries (used raw in HTML by `Citations.hs`, `Backlinks.hs`).
This is a defensible design for a single-author site. Document it in `spec.md`. If the day ever comes that a PR from a collaborator is accepted, or that a user-provided input feeds any of these fields, the trust boundary needs to be revisited across all of these call sites simultaneously.
### 8.5 Build reproducibility
- Python dependencies are upper-bound-free (M-5.4.1).
- Model download (`tools/download-model.sh`) is unpinned by SHA (L-6.4.1).
- KaTeX and Vega are loaded from CDN (`templates/partials/head.html:26, 34-36`) without SRI hashes.
- Pandoc version is bounded (`>= 3.1 && < 3.7`), good but citeproc behavior varies subtly across these.
Add SRI to CDN assets; pin the ONNX model to a specific revision + SHA; tighten Python pins.
### 8.6 Accessibility posture
Strong foundation (skip link, ARIA on nav, semantic elements, reduce-motion support), with localized gaps:
- Gallery overlay focus trap (H-3.3.1)
- Collapsed TOC (H-4.1.3)
- Dark-mode text-faint contrast (H-4.1.2)
- Keyboard-only equivalents for sidenote + annotation picker interactions
Addressing H-3.3.1, H-4.1.3, and H-4.1.2 alone would raise the overall a11y grade meaningfully.
---
## 9. Fix priority (recommended order)
### P0 — correctness blockers
1. `Filters/Images.hs:110` `lowerExt` bug. One-line fix (`takeExtension`). Restores the entire WebP pipeline.
2. `Commonplace.hs:126-131` parenthesize the `if`. One line.
3. `tools/embed.py:68-73` fix root URL. Three lines.
4. `Authors.hs:50` add `content/essays/*/index.md` to `allContent`.
5. `Filters/Sidenotes.hs:38` numeric labels (or error on >26).
### P1 — silent-failure hardening
6. `Filters/Score.hs:40` — missing file handling.
7. `Filters/Viz.hs:96` — missing file handling.
8. `Filters/Images.hs:77` — dedup `passedKvs` blacklist.
9. `Filters/Links.hs:59` — proper hostname match.
10. `tools/import-poetry.py:193` — escape newlines in YAML strings.
### P2 — accessibility
11. Dark-mode `--text-faint` contrast.
12. Gallery focus trap.
13. TOC collapsed-state keyboard access.
14. Global `:focus-visible` styles.
### P3 — hygiene and refactor
15. Missing CSS variables (`--rule`, `--font-ui`, `--bg-subtle`).
16. Consolidate duplicate `escapeHtml`/`trim`/`stringify`.
17. `README.md` with actual contents.
18. Delete `build/Metadata.hs` or populate.
19. Remove `-Wno-unused-imports` from `levineuwirth.cabal` and fix what surfaces.
20. Relocate `.docx` binaries out of repo root.
### P4 — nice to have
21. Reproducibility: SRI on CDN, pinned ONNX, tightened Python bounds.
22. Consolidate `Backlinks`/`Authors`/`Tags`/`Site` content patterns into a single `Patterns.hs`.
23. Defense-in-depth escaping in `Transclusion.hs`, `Catalog.hs`.
24. `make deploy` guard for `VPS_*` variables.
---
## Appendix A — files scanned in full
- **Haskell (build system):** `Main.hs`, `Site.hs`, `Contexts.hs`, `Stats.hs`, `Backlinks.hs`, `Compilers.hs`, `Citations.hs`, `Stability.hs`, `Catalog.hs`, `Commonplace.hs`, `Authors.hs`, `Tags.hs`, `Pagination.hs`, `SimilarLinks.hs`, `Utils.hs`, `Metadata.hs`, `Filters.hs`.
- **Haskell (filters):** `Filters/Images.hs`, `Filters/Transclusion.hs`, `Filters/Score.hs`, `Filters/Viz.hs`, `Filters/Sidenotes.hs`, `Filters/Links.hs`, `Filters/Wikilinks.hs`, `Filters/EmbedPdf.hs`. Others via parallel audit.
- **JavaScript:** 20 files under `static/js/` via parallel audit (prism.min.js excluded as vendor).
- **CSS:** 22 files under `static/css/` via parallel audit.
- **Templates:** `default.html`, `partials/head.html`, `partials/nav.html`, plus the full template tree via parallel audit.
- **Python:** `tools/embed.py`, plus `tools/import-poetry.py`, `tools/viz_theme.py` via parallel audit.
- **Shell:** `tools/convert-images.sh`, `tools/sign-site.sh`, `tools/download-model.sh`, `tools/subset-fonts.sh`, `tools/preset-signing-passphrase.sh`, `tools/refreeze.sh`, `Makefile`.
- **Config:** `levineuwirth.cabal`, `cabal.project`, `pyproject.toml`, `.gitignore`, `.env.example`.
## Appendix B — what was not audited
- `templates/partials/metadata.html`, `footer.html`, `page-footer.html`, `paginate-nav.html` — inspected briefly via the CSS/template sub-audit only.
- `static/css/build.css` — cited by the CSS audit for undefined variable usage; rules not fully traced.
- `data/*.bib`, `data/*.csl` — treated as data, not audited for CSL correctness.
- `content/**/*.md` — authored content, out of scope.
- `_site/`, `_cache/`, `dist-newstyle/`, `.venv/` — build outputs.
- `spec.md` — design document, referenced but not audited line-by-line.
- `prism.min.js`, `pagefind` output, KaTeX, Vega — vendor / third-party.
— End of audit —